## Why annotate experiments?
Choices a data scientist makes for one model rarely carry over to a new model or experiment. Data may need exploration and manipulation to produce a set of features a model can use, and models often have so many parameters that they are not amenable to formal analysis, so candidate configurations have to be compared empirically.

The Cortex Python SDK provides facilities for these activities. The principal abstraction that supports them is the experiment. Within an experiment, you can create a collection of `runs` that systematically explore different views of the data and different parameters for the predictive models. Runs can be annotated with parameters, metrics, artifacts, and metadata.

Parameters describe how the model is configured. Metrics describe how well the model did in making predictions. Artifacts hold other information or software objects associated with a particular run. Metadata holds any other descriptive information about the run.
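
As a rough illustration of how the four kinds of annotation differ, the sketch below models each run as a plain dictionary and then selects the best run by a metric. The `make_run_record` helper, the dictionary layout, and every value are hypothetical stand-ins. This is not the Cortex SDK API, which appears in the cells later in this notebook.

```python
# Hypothetical stand-in for run annotations; NOT the Cortex SDK API.
def make_run_record(params, metrics, artifacts, meta):
    """Bundle the four kinds of annotation for a single run."""
    return {
        'params': dict(params),        # how the model was configured
        'metrics': dict(metrics),      # how well the model predicted
        'artifacts': dict(artifacts),  # objects kept with the run
        'meta': dict(meta),            # anything else describing the run
    }

# Made-up values, for illustration only
runs = [
    make_run_record({'epochs': 1, 'batch size': 3}, {'mean % acc': 60.0},
                    {'model': 'builder-a'}, {'note': 'baseline'}),
    make_run_record({'epochs': 10, 'batch size': 3}, {'mean % acc': 90.0},
                    {'model': 'builder-b'}, {'note': 'longer training'}),
]

# Consistent annotations make runs comparable: select the best run by a metric
best = max(runs, key=lambda r: r['metrics']['mean % acc'])
best_epochs = best['params']['epochs']
```

Keeping configuration in parameters and outcomes in metrics is what makes this kind of selection a one-line lookup.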

This notebook demonstrates how to use annotations for `runs` and `experiments`.

## Setup

In [None]:
!pip install tensorflow==1.12.0
!pip install keras==2.2.2

import numpy
import pandas

from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier

from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.model_selection import train_test_split

from cortex import Cortex

client = Cortex.local()
exp = client.experiment('example/experiment-anno')

This example uses the same data set that was used in the basic experiment notebook.

In [None]:
df = pandas.read_csv('./data/iris.data')
dataset = df.values
ds_classes = pandas.get_dummies(df).values # one-hot encode the classes

X = dataset[:,0:4].astype(float)
Y = ds_classes[:,4:7].astype(int) # columns 4-6 hold the three one-hot class indicators

(train_inputs, test_inputs, train_classes, test_classes) = train_test_split(X, Y, test_size=0.333, train_size=0.667)
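
To make the column layout that the slicing relies on concrete, here is a small sketch of `pandas.get_dummies` applied to a toy frame. The column names and rows are made up for illustration and are not taken from `iris.data`; the point is only that the numeric columns come first and the indicator columns follow.

```python
import pandas

# Made-up rows in the same layout as the iris data: four measurements, then a class label
toy = pandas.DataFrame({
    'sepal_len': [5.1, 7.0, 6.3],
    'sepal_wid': [3.5, 3.2, 3.3],
    'petal_len': [1.4, 4.7, 6.0],
    'petal_wid': [0.2, 1.4, 2.5],
    'species': ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'],
})

# get_dummies leaves numeric columns alone and expands the text column
# into one indicator column per class, appended after the numeric columns
encoded = pandas.get_dummies(toy)
indicator_columns = list(encoded.columns[4:])

X_toy = encoded.iloc[:, 0:4].to_numpy(dtype=float)  # the four measurements
Y_toy = encoded.iloc[:, 4:].to_numpy(dtype=int)     # one-hot class matrix
```

Each row of `Y_toy` contains a single 1 in the column of its class, which is the target shape a three-unit softmax output layer expects.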

## Experiment models

This example uses a neural network model. Neural networks have many possible configurations and model parameters.

In [None]:
lf = 'categorical_crossentropy'

def adam_opt_model():
    model = Sequential()
    model.name = 'adam optimizer'
    model.add(Dense(16, input_dim=4, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss=lf, optimizer='adam', metrics=['accuracy'])
    return model

Create a second model with a different layer structure and a different optimizer function.

In [None]:
def small_sgd_opt_model():
    model = Sequential()
    model.name = 'small sgd optimizer'
    model.add(Dense(8, input_dim=4, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss=lf, optimizer='sgd', metrics=['accuracy'])
    return model
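
One way to see how the two configurations differ is to count their trainable parameters. The sketch below does the arithmetic for fully connected layers (one weight per input-output pair plus one bias per output); `dense_param_count` is a hypothetical helper written for this illustration, not a Keras function.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a stack of fully connected layers.

    layer_sizes lists the width of the input followed by each layer's width.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# 4 inputs -> 16 hidden units -> 3 classes (the adam model's layout)
adam_params = dense_param_count([4, 16, 3])

# 4 inputs -> 8 hidden units -> 3 classes (the small sgd model's layout)
small_sgd_params = dense_param_count([4, 8, 3])
```

Under these assumptions the first network has 131 parameters and the second has 67, so halving the hidden layer roughly halves the model size.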

## Annotate the experiment

To record settings that are shared by every run, annotate the experiment itself with metadata:

In [None]:
exp.meta = {'loss function':lf, 'nn type':'sequential'}

## Annotate runs 

The `runs` in this experiment compare the results given by the two models. Training results vary with the batch size and the number of training epochs. For brevity, this notebook tests a single batch size and two epoch counts; add values to `epoch_cts` to see how the run time and the metrics change. Each run uses [cross validation](https://scikit-learn.org/stable/modules/cross_validation.html) to train and score the model on different subsets of the data, which gives a more reliable estimate of each metric than a single split:

In [None]:
epoch_cts = [1, 10]
batch_sz = 3

for model in [adam_opt_model, small_sgd_opt_model]:
    for epoch_ct in epoch_cts:
        with exp.start_run() as run: 
            estimator = KerasClassifier(build_fn=model, epochs=epoch_ct, batch_size=batch_sz, verbose=0)
            kfold = KFold(n_splits=3, shuffle=True)
            # Cross-validate on the training split so the held-out test split stays unseen
            results = cross_val_score(estimator, train_inputs, train_classes, cv=kfold)
            run.log_artifact('model', model)
            run.log_param('batch size', batch_sz)
            run.log_param('epochs', epoch_ct)
            run.log_param('model name', model().name)
            run.log_metric('mean % acc', results.mean()*100)
            run.log_metric('margin of err', results.std()*100)
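
The following sketch illustrates the idea behind the three-fold split and the two logged metrics, using only the standard library. It is a simplified stand-in written for this explanation, not scikit-learn's implementation; `kfold_indices` is a hypothetical helper and the per-fold scores are made up.

```python
import random
import statistics

def kfold_indices(n_samples, n_splits, seed=0):
    """Shuffle the sample indices and yield (train, test) index lists per fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(kfold_indices(n_samples=10, n_splits=3))

# Made-up per-fold accuracies, standing in for the array cross_val_score returns
fold_scores = [0.90, 0.80, 1.00]
mean_pct = statistics.mean(fold_scores) * 100
# Population standard deviation, matching numpy's default for .std()
spread_pct = statistics.pstdev(fold_scores) * 100
```

Every sample lands in exactly one test fold, so each score comes from data the model did not see during that fold's training. The value logged as `margin of err` is the standard deviation of the fold scores, which indicates how much the accuracy varies from fold to fold.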

Examining the experiment shows how the two models compare. 

In [None]:
exp

## Use artifacts

Examine the table for speed, accuracy, and the lowest prediction errors. Models that performed well can be retrieved and used for predictions or for further experiments. The following example retrieves the model artifact from the last run and uses it to make a prediction.

In [None]:
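run = exp.last_run()
model = run.get_artifact('model') # the logged artifact is the model-building function

sample = numpy.array([[4.9,3.1,1.5,0.2]]) # sample of one

# model() returns a newly built network with untrained weights, so fit it before predicting
net = model()
net.fit(train_inputs, train_classes, epochs=10, batch_size=batch_sz, verbose=0)
pred = net.predict(sample)
pred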

Reverse the one-hot encoding and display the model's prediction.

In [None]:
x = numpy.argmax(pred, axis=1).item(0)

iris_dict = {0:'Iris-setosa',1:'Iris-versicolor',2:'Iris-virginica'}

iris_dict[x]
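
The same decoding step extends to several predictions at once. The sketch below assumes a made-up probability matrix in place of real model output and reuses the class order from `iris_dict`; `numpy.argmax` along axis 1 picks the most probable class for each row.

```python
import numpy

labels = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']

# Made-up softmax outputs for two samples; each row sums to 1
probs = numpy.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
])

# Index of the largest probability in each row, mapped back to a class name
class_indices = numpy.argmax(probs, axis=1)
predicted = [labels[i] for i in class_indices]
```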