# Step 1: Building the network structure

## Define the layers
This step defines the layer structure for the neural network. The hidden layer uses a `sigmoid` activation function for each of its neurons. The output layer gets a `log_softmax` activation function.

In [400]:
from cntk import default_options, input_variable
from cntk.layers import Dense, Sequential
from cntk.ops import log_softmax, relu, sigmoid

In [401]:
model = Sequential([
    Dense(4, activation=sigmoid),
    Dense(3, activation=log_softmax)
])
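To get a feel for what this stack of layers computes, here is a minimal NumPy sketch of the same forward pass. It is not how CNTK implements `Dense`; the weights `W1`, `b1`, `W2`, `b2` and the input values are made up for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def log_softmax(x):
    # Subtract the row maximum first for numerical stability.
    shifted = x - x.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

rng = np.random.default_rng(0)

# Hypothetical weights: 4 inputs -> 4 hidden units -> 3 output units.
W1, b1 = rng.normal(size=(4, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 3)), np.zeros(3)

def forward(x):
    hidden = sigmoid(x @ W1 + b1)         # hidden layer with sigmoid activation
    return log_softmax(hidden @ W2 + b2)  # output layer with log_softmax activation

# Illustrative measurements for one flower (4 features).
sample_input = np.array([[5.1, 3.5, 1.4, 0.2]])
log_probs = forward(sample_input)

print(log_probs.shape)          # one row with three log-probabilities
print(np.exp(log_probs).sum())  # the probabilities sum to roughly 1
```

Each output value is the logarithm of a class probability, which is why all three values are zero or negative.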

## Define the input for the neural network
The input for the model is a vector with four features:
 
 - Sepal length
 - Sepal width
 - Petal length
 - Petal width
 
In order for the model to work we need to define its input as an `input_variable`. This variable should have the same size as the number of features that we want to use for making a prediction. In this case it should be 4, because we have 4 different features in our dataset.

In [402]:
features = input_variable(4)

## Finalize the neural network structure
The last step is to finalize the neural network structure. We define a new variable `z` and invoke the model function with the input variable to bind it as the input for our model. 

In [403]:
z = model(features)

# Train the model
After we've defined the model we need to set up the training logic. This is done in four steps:

 1. Load the dataset and prepare it for use.
 2. Define the loss for the model.
 3. Set up the trainer and learner for the model.
 4. Use the trainer to train the model with the loaded data.

## Loading the data
Before we can actually train the model, we need to load the data from disk. We will use pandas for this.
Pandas is a widely used Python library for working with data. It contains functions to load and process data 
as well as a large number of functions to perform statistical operations.

In [404]:
import pandas as pd

In [405]:
df_source = pd.read_csv('iris.csv', 
    names=['sepal_length', 'sepal_width','petal_length','petal_width', 'species'], 
    index_col=False)

In [406]:
df_source.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
sepal_length    150 non-null float64
sepal_width     150 non-null float64
petal_length    150 non-null float64
petal_width     150 non-null float64
species         150 non-null object
dtypes: float64(4), object(1)
memory usage: 5.9+ KB


In [407]:
df_source.describe()

| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.054 | 3.758667 | 1.198667 |
| std | 0.828066 | 0.433594 | 1.76442 | 0.763161 |
| min | 4.3 | 2.0 | 1.0 | 0.1 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 |
| max | 7.9 | 4.4 | 6.9 | 2.5 |


We split the dataset into features `X` and labels `y`. We need to feed these separately to the trainer later on to train the model. We convert the features and labels to numpy arrays as this is what CNTK expects as input.

In [408]:
import numpy as np

In [409]:
X = df_source.iloc[:, :4].values
y = df_source.iloc[:, -1:].values

Our model doesn't take strings as values. It needs floating point values to do its job. So we need to encode the species names into a numeric representation. We can do this using the `LabelBinarizer` from the `scikit-learn` Python package, which turns each label into a one-hot vector with one position per species.

In [410]:
from sklearn.preprocessing import LabelBinarizer

In [411]:
label_encoder = LabelBinarizer()

In [412]:
y = label_encoder.fit_transform(y)
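To see what the encoded labels look like, the following NumPy-only sketch builds the same kind of one-hot matrix by hand. It is an illustration of the idea rather than the scikit-learn implementation, and the `species` array is a made-up stand-in for the label column.

```python
import numpy as np

# Hypothetical labels standing in for the `species` column.
species = np.array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica', 'Iris-setosa'])

# np.unique returns the sorted class names and, for each label, the index of its class.
classes, class_index = np.unique(species, return_inverse=True)

# Selecting rows of the identity matrix turns each class index into a one-hot vector.
one_hot = np.eye(len(classes), dtype=int)[class_index]

print(classes)
print(one_hot)
```

Every row contains a single 1 in the column of its species, so the first and last rows are identical here.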

CNTK is configured to use 32-bit floats by default. Right now the features are stored as 64-bit floats and the labels are stored as integers. In order to help CNTK make sense of this, we will have to convert our data to 32-bit floats.

In [413]:
X = X.astype(np.float32)
y = y.astype(np.float32)

One of the challenges with machine learning is the fact that your model may try to memorize every bit of data it saw. This is called overfitting, and it is bad for your model because it can no longer correctly predict the outcome for samples it didn't see before. We want our model to learn a set of rules that predict the correct class of flower.

In order for us to detect overfitting we need to split the dataset into a training set and a test set. This is done using a utility function found in the scikit-learn Python package, which is included with your standard Anaconda installation.

In [414]:
from sklearn.model_selection import train_test_split

In [415]:
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.2, stratify=y)
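The `stratify` argument keeps the proportion of each species the same in both sets. The sketch below shows the idea with plain NumPy; `stratified_split` is a hypothetical helper written for this illustration and not the scikit-learn implementation.

```python
import numpy as np

def stratified_split(labels, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) with roughly `test_size` of every class in the test set."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        # Shuffle the positions of this class, then cut off the test share.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(len(idx) * test_size))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(train_idx), np.array(test_idx)

# Hypothetical class labels: 50 samples for each of 3 classes, like the iris dataset.
labels = np.repeat([0, 1, 2], 50)
train_idx, test_idx = stratified_split(labels)

print(np.bincount(labels[train_idx]))  # [40 40 40]
print(np.bincount(labels[test_idx]))   # [10 10 10]
```

Without stratification a random split of such a small dataset could easily end up with noticeably fewer samples of one species in the test set.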

## Defining the target and loss
Let's define a target for our model and a loss function. The loss function measures the distance between the actual and predicted value. The loss is later used by the learner to optimize the parameters in the model.

In [416]:
from cntk.losses import cross_entropy_with_softmax
from cntk.metrics import classification_error

In [417]:
label = input_variable(3)

In [418]:
loss = cross_entropy_with_softmax(z, label)

In [419]:
error_rate = classification_error(z, label)
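As a rough illustration of what these two functions measure, here is a NumPy sketch that computes a softmax cross-entropy loss and a classification error per sample. The helper names and the example values are made up, and this is not CNTK's own implementation.

```python
import numpy as np

def softmax_cross_entropy(scores, one_hot_labels):
    # Log-softmax of the scores, computed in a numerically stable way.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # The loss is the negative log-probability assigned to the true class.
    return -(one_hot_labels * log_probs).sum(axis=1)

def misclassified(scores, one_hot_labels):
    # 1.0 where the highest score is not on the true class, 0.0 otherwise.
    return (scores.argmax(axis=1) != one_hot_labels.argmax(axis=1)).astype(np.float32)

scores = np.array([[2.0, 0.5, 0.1],   # highest score on class 0: correct
                   [0.0, 0.0, 0.0]],  # no preference: argmax picks class 0, which is wrong
                  dtype=np.float32)
targets = np.array([[1, 0, 0],
                    [0, 0, 1]], dtype=np.float32)

losses = softmax_cross_entropy(scores, targets)
errors = misclassified(scores, targets)

print(losses)         # the confident, correct sample has the lower loss
print(errors.mean())  # 0.5: one of the two samples is misclassified
```

The loss is what the learner minimizes, while the error rate is only reported so that we can follow how well the classifier is doing.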

## Setting up the learner and trainer
When we have a model and loss we can set up the learner and trainer to train the model.
We first define the learner, which is going to use the loss function and target to optimize the model.

In [420]:
from cntk.learners import sgd
from cntk.train.trainer import Trainer

In [421]:
learner = sgd(z.parameters, 0.001)

In [422]:
trainer = Trainer(z, (loss, error_rate), [learner])
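The learner applies stochastic gradient descent: every parameter is moved a small step against its gradient, scaled by the learning rate. The sketch below shows that update rule on a toy problem; `sgd_step` is a hypothetical helper and the toy function is an assumption made for illustration.

```python
import numpy as np

def sgd_step(params, grads, learning_rate=0.001):
    """Return new parameters after one plain gradient-descent update."""
    return [p - learning_rate * g for p, g in zip(params, grads)]

# Hypothetical example: minimise f(w) = sum(w ** 2), whose gradient is 2 * w.
weights = [np.array([1.0, -2.0])]
for _ in range(3):
    gradients = [2.0 * w for w in weights]
    weights = sgd_step(weights, gradients, learning_rate=0.1)

print(weights[0])  # both values have moved towards 0
```

With a learning rate as small as 0.001 each step changes the parameters only slightly, so many updates are needed before the loss drops noticeably.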

# Train the model
To train the model you can use different methods on the trainer. The `train_minibatch` method can be used to manually feed data into the model as minibatches. You typically use this method when you have a dataset that you've loaded manually using Pandas or numpy. 

We're going to train our model by running our dataset 5 times through the trainer. Each time we perform a full pass over the dataset we perform one training epoch.

At the end of the training process we have a fully trained model that we can use to make predictions.

In [423]:
for _ in range(5):
    trainer.train_minibatch({ features: X_train, label: y_train })
    
    # The evaluation average is the classification error (lower is better), not the accuracy.
    print('Loss: {}, Error: {}'.format(
        trainer.previous_minibatch_loss_average,
        trainer.previous_minibatch_evaluation_average))

Loss: 1.336875279744466, Error: 0.6666666666666666
Loss: 1.3364276885986328, Error: 0.6666666666666666
Loss: 1.3359808603922525, Error: 0.6666666666666666
Loss: 1.3355345408121744, Error: 0.6666666666666666
Loss: 1.335089111328125, Error: 0.6666666666666666
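The loop above feeds the whole training set to the trainer as a single minibatch. For larger datasets you would normally split each epoch into smaller minibatches. The following sketch shows one way to do that with NumPy; `iterate_minibatches` is a hypothetical helper and the arrays are placeholders for `X_train` and `y_train`.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, seed=0):
    """Yield shuffled (features, labels) minibatches that together cover the dataset once."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = order[start:start + batch_size]
        yield X[batch], y[batch]

# Hypothetical stand-ins for X_train and y_train: 120 samples, 4 features, 3 classes.
X_demo = np.zeros((120, 4), dtype=np.float32)
y_demo = np.zeros((120, 3), dtype=np.float32)

batch_shapes = [xb.shape for xb, yb in iterate_minibatches(X_demo, y_demo, batch_size=16)]

# 8 minibatches in total; the last one holds the remaining 8 samples.
print(len(batch_shapes), batch_shapes[-1])
```

Inside the training loop, each pair could then be passed to the trainer with `trainer.train_minibatch({ features: xb, label: yb })`.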


# Evaluate the model
After we've trained the model using the training set we can measure the model's performance using a call to the `test_minibatch` method on the trainer instance we used earlier. This outputs a value between 0 and 1. Because the trainer evaluates the `classification_error` metric we defined earlier, this value is the fraction of misclassified test samples: a value closer to 0 indicates a better classifier.

Please note that at this point the model performance may be a little underwhelming. You can try running all the cells in the notebook again and it may improve. This happens because the weights are initialized with random values, which change every time you rerun all the cells in this notebook. You may get lucky!

In [424]:
trainer.test_minibatch( {features: X_test, label: y_test })

0.6666666666666666

# Make a prediction with the trained model
Once trained we can make predictions with our model by simply invoking the model. This produces a vector with the activation values of the output layer of our model. The neuron with the highest activation corresponds to the species the flower was classified as. The `inverse_transform` method of the label encoder picks that neuron and maps it back to the species name.

In [425]:
sample_index = np.random.choice(X_test.shape[0])
sample = X_test[sample_index]

In [426]:
prediction = z(sample)
predicted_label = label_encoder.inverse_transform(prediction)

print(predicted_label)

['Iris-setosa']
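The same lookup can be done by hand with the `argmax` function from NumPy. The class names and output values below are made up for illustration; in the notebook they would come from the label encoder and the model.

```python
import numpy as np

# Hypothetical class names, in the sorted order a label encoder would store them.
classes = np.array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])

# Hypothetical log-softmax output of the model for a single sample.
output = np.array([[-0.2, -1.9, -3.1]])

# The index of the largest value is the index of the predicted class.
predicted_index = np.argmax(output, axis=1)
predicted_species = classes[predicted_index]

print(predicted_species)  # ['Iris-setosa']
```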
