# Training with a manual minibatch loop
In this notebook we'll retrain our flower classification model using a manual minibatch loop.
The model is the same as before: 4 input features and a one-hot (binary) encoded label as output.

We're going to pretend that the dataset is too big to fit in memory, so we'll load it in chunks. Since our data is stored as CSV, we can still use pandas, but with a different configuration. The LabelBinarizer that we used in previous samples no longer works because you can't fit that component in chunks. Instead, we'll use a different technique to encode the labels.
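To make the chunked loading concrete, here is a minimal sketch of how pandas behaves when `chunksize` is set. The in-memory `csv_text` is a made-up stand-in for a large file and is not the real iris data.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for a large CSV file without a header row.
csv_text = "\n".join(
    "{0}.0,{0}.1,{0}.2,{0}.3,Iris-setosa".format(i) for i in range(10)
)

columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

# With chunksize set, read_csv returns an iterator of DataFrames
# instead of a single DataFrame that holds the whole file.
reader = pd.read_csv(io.StringIO(csv_text), names=columns, chunksize=4)

# Each chunk holds at most 4 rows; the last one holds whatever is left.
chunk_sizes = [len(chunk) for chunk in reader]
print(chunk_sizes)
```

Only one chunk is held in memory at a time, which is what lets the training loop work on data that would not fit in memory as a whole.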

In [1]:
# Fix the random seed so you get the same results every time.

import cntk
import numpy 

cntk._cntk_py.set_fixed_random_seed(1337)
# Call the seed function; assigning to it would replace it without seeding anything.
numpy.random.seed(1337)

## The model
The model is a classification neural network with 4 inputs and 3 outputs. The 4 inputs correspond to the 4 input features in our dataset. The 3 outputs represent a one-hot (binary) encoding of the 3 possible species of flowers that we can classify.

The loss function for the model is a categorical cross entropy function because we're dealing with a multi-class classification problem. The learner is a standard SGD (Stochastic Gradient Descent) algorithm.
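As a rough illustration of what these two pieces compute, the sketch below implements a categorical cross entropy on softmax outputs and a single plain SGD update in NumPy. The function names and the sample numbers are made up for this example; this is not how CNTK implements them internally.

```python
import numpy as np

def softmax(logits):
    # Subtract the maximum for numerical stability; the result is unchanged.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

def categorical_cross_entropy(logits, one_hot_target):
    # The loss is the negative log-probability assigned to the true class.
    probabilities = softmax(logits)
    return -np.sum(one_hot_target * np.log(probabilities))

def sgd_step(weights, gradient, learning_rate=0.1):
    # Plain SGD: move the weights a small step against the gradient.
    return weights - learning_rate * gradient

logits = np.array([2.0, 1.0, 0.1])       # made-up raw network outputs
right_target = np.array([1.0, 0.0, 0.0])  # true class is the highest-scoring one
wrong_target = np.array([0.0, 0.0, 1.0])  # true class is the lowest-scoring one

loss_right = categorical_cross_entropy(logits, right_target)
loss_wrong = categorical_cross_entropy(logits, wrong_target)
print(loss_right < loss_wrong)

updated = sgd_step(np.array([1.0, 2.0]), np.array([0.5, -0.5]))
```

The loss is small when the network gives the true class a high score and grows as that score drops, which is the signal the learner uses to adjust the parameters.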

In [2]:
from cntk import default_options, input_variable
from cntk.layers import Dense, Sequential
from cntk.ops import log_softmax, sigmoid

model = Sequential([
    Dense(4, activation=sigmoid),
    Dense(3, activation=log_softmax)
])

features = input_variable(4)

z = model(features)

## Encoding the labels
We still have a set of labels in our dataset, so we need to encode them to a binary representation. Sadly, sklearn requires us to load the whole dataset into memory if we want to fit a LabelBinarizer for this purpose, as we did in previous samples. So instead of using a LabelBinarizer, we create a manual mapping between the labels and their encoded values.

In [3]:
label_mapping = {
    'Iris-setosa': 0,
    'Iris-versicolor': 1,
    'Iris-virginica': 2
}
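The mapping only yields an integer index per label. The sketch below shows one way to turn those indices into one-hot rows with NumPy. The `species` list is a made-up sample, and `float32` is chosen here on the assumption that it should match the data type of the feature values.

```python
import numpy as np

label_mapping = {
    'Iris-setosa': 0,
    'Iris-versicolor': 1,
    'Iris-virginica': 2
}

# Made-up sample of labels, as they would appear in the species column.
species = ['Iris-virginica', 'Iris-setosa', 'Iris-versicolor']

# Look up the class index for every label.
indices = np.array([label_mapping[name] for name in species])

# Start with all zeros, then set a single 1 per row at the class index.
one_hot = np.zeros((len(indices), len(label_mapping)), dtype=np.float32)
one_hot[np.arange(len(indices)), indices] = 1.0

print(one_hot)
```

Because the mapping is fixed up front, every chunk is encoded the same way without the encoder ever having to see the full dataset.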

## Training the model
This next section implements a single epoch of training using a manual minibatch loop.
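You can wrap the code after the creation of the trainer in an extra for-loop to introduce multiple epochs. The CSV reader then has to be recreated for every epoch, because the chunk iterator is exhausted after one pass over the file.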

In [4]:
import pandas as pd
import numpy as np
from cntk.logging import ProgressPrinter
from cntk.losses import cross_entropy_with_softmax
from cntk.learners import sgd 
from cntk.train import Trainer

labels = input_variable(3)
loss = cross_entropy_with_softmax(z, labels)
learner = sgd(z.parameters, 0.1)

progress_writer = ProgressPrinter(0)
trainer = Trainer(z, (loss, None), learner, progress_writer)

input_data = pd.read_csv('iris.csv', 
    names=['sepal_length', 'sepal_width','petal_length','petal_width', 'species'], 
    index_col=False, chunksize=16)

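for df_batch in input_data:
    # The first four columns hold the features.
    feature_values = df_batch.iloc[:,:4].values
    feature_values = feature_values.astype(np.float32)
    
    # The last column holds the species name; map it to a class index.
    label_values = df_batch.iloc[:,-1]

    label_values = label_values.map(lambda x: label_mapping[x])
    label_values = label_values.values

    # One-hot encode the class indices. Use float32 to match the features
    # and avoid a data type mismatch warning from the trainer.
    encoded_labels = np.zeros((label_values.shape[0], 3), dtype=np.float32)
    encoded_labels[np.arange(label_values.shape[0]), label_values] = 1.

    trainer.train_minibatch({features: feature_values, labels: encoded_labels})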

 average      since    average      since      examples
    loss       last     metric       last              
 ------------------------------------------------------


Learning rate per minibatch: 0.1
      1.1        1.1          0          0            16
    0.835      0.704          0          0            48
    0.993       1.11          0          0           112
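The multi-epoch variant mentioned at the start of this section could look roughly like the sketch below. To keep it runnable without CNTK, `train_on_chunk` is a hypothetical placeholder for the encoding steps and the call to `trainer.train_minibatch`, and `csv_text` is a made-up stand-in for `iris.csv`.

```python
import io
import pandas as pd

# Hypothetical stand-in for iris.csv: 5 identical rows, no header.
csv_text = "\n".join("5.1,3.5,1.4,0.2,Iris-setosa" for _ in range(5))
columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

def open_reader():
    # A chunked reader can only be iterated once, so build a fresh one per epoch.
    return pd.read_csv(io.StringIO(csv_text), names=columns,
                       index_col=False, chunksize=2)

seen = []  # records (epoch, rows in chunk) in place of a real training step

def train_on_chunk(epoch, df_batch):
    # Placeholder: in the notebook, this is where the features and labels
    # would be encoded and passed to trainer.train_minibatch.
    seen.append((epoch, len(df_batch)))

num_epochs = 3
for epoch in range(num_epochs):
    for df_batch in open_reader():
        train_on_chunk(epoch, df_batch)

print(len(seen))
```

Creating the trainer once outside the loops keeps the learned parameters across epochs, while reopening the reader inside the epoch loop makes every epoch see the full file again.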
