# Training with a minibatch source
This notebook demonstrates how to use minibatch sources in CNTK to work with datasets that don't fully fit in memory.
We'll build a basic classification model, just like in the other notebooks for this chapter, except that this time we use a minibatch source to feed the training data to the neural network.

In [1]:
# Fix the random seed so you get the same results every time.

import cntk
import numpy

cntk._cntk_py.set_fixed_random_seed(1337)
# Call seed() instead of assigning to it; assignment would replace the function and seed nothing.
numpy.random.seed(1337)
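A quick way to check that seeding works is to draw random numbers twice from the same seed. The minimal sketch below uses only NumPy's legacy global generator; the variable names are illustrative.

```python
import numpy

# Seed, draw, then reseed and draw again: both draws should match exactly.
numpy.random.seed(1337)
first = numpy.random.rand(3)

numpy.random.seed(1337)
second = numpy.random.rand(3)

print(numpy.array_equal(first, second))  # True
```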

## Creating the minibatch source
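In order to train the model we'll create a minibatch source. A minibatch source in CNTK needs a deserializer that can read the input data. We're using a CTF deserializer here because the data is stored in a CTF (CNTK Text Format) file. The file contains two streams, features and labels, so we define a separate stream definition for each; the `field` of each `StreamDef` must match the stream name used in the file. Setting `randomize=True` shuffles the samples, and `max_sweeps=INFINITELY_REPEAT` lets the source loop over the file indefinitely, so the training loop decides when to stop.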

In [2]:
from cntk.io import StreamDef, StreamDefs, MinibatchSource, CTFDeserializer, INFINITELY_REPEAT

labels_stream = StreamDef(field='labels', shape=3, is_sparse=False)
features_stream = StreamDef(field='features', shape=4, is_sparse=False)

deserializer = CTFDeserializer('iris.ctf', StreamDefs(labels=labels_stream, features=features_stream))

minibatch_source = MinibatchSource(deserializer, randomize=True, max_sweeps=INFINITELY_REPEAT)
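To make the stream definitions concrete, here is a minimal pure-Python sketch of what one line of a CTF file with these two streams could look like: each stream starts with `|` followed by its name, then its values. It assumes `iris.ctf` stores dense one-hot labels (3 values) and 4 feature values per line; the helpers `to_ctf_line` and `parse_ctf_line` are hypothetical and not part of CNTK, and the sample values are just an illustrative Iris row.

```python
def to_ctf_line(features, label_index, num_classes=3):
    """Format one sample as a CTF line with a dense one-hot label stream (hypothetical helper)."""
    one_hot = ["1" if i == label_index else "0" for i in range(num_classes)]
    feature_text = " ".join(str(value) for value in features)
    return f"|labels {' '.join(one_hot)} |features {feature_text}"


def parse_ctf_line(line):
    """Split a CTF line into {stream_name: [float, ...]} (hypothetical helper)."""
    streams = {}
    for chunk in line.split("|")[1:]:  # skip the empty text before the first '|'
        name, *values = chunk.split()
        streams[name] = [float(v) for v in values]
    return streams


line = to_ctf_line([5.1, 3.5, 1.4, 0.2], label_index=0)
print(line)                  # |labels 1 0 0 |features 5.1 3.5 1.4 0.2
print(parse_ctf_line(line))  # {'labels': [1.0, 0.0, 0.0], 'features': [5.1, 3.5, 1.4, 0.2]}
```

The number of values per stream must agree with the `shape` given in each `StreamDef` (3 for labels, 4 for features).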

## Building the model
The model is a basic classification model with one hidden layer and an output layer.
The hidden layer uses a sigmoid activation function; the output layer uses `log_softmax`.

In [3]:
from cntk import input_variable
from cntk.layers import Dense, Sequential
from cntk.ops import log_softmax, sigmoid

model = Sequential([
    Dense(4, activation=sigmoid),
    Dense(3, activation=log_softmax)
])

features = input_variable(4)
labels = input_variable(3)

z = model(features)
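As a rough illustration of what this network computes, the NumPy sketch below mirrors the two `Dense` layers for a batch of two samples. The weights are random placeholders (CNTK uses its own initializer), and the sample rows are illustrative; only the shapes and activations follow the model above.

```python
import numpy as np

rng = np.random.default_rng(1337)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def log_softmax(x):
    # Subtract the row max before exponentiating for numerical stability.
    shifted = x - x.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


# Placeholder parameters: 4 inputs -> 4 hidden units -> 3 outputs.
W1, b1 = rng.normal(size=(4, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 3)), np.zeros(3)


def forward(x):
    hidden = sigmoid(x @ W1 + b1)          # Dense(4, activation=sigmoid)
    return log_softmax(hidden @ W2 + b2)  # Dense(3, activation=log_softmax)


batch = np.array([[5.1, 3.5, 1.4, 0.2],
                  [6.3, 3.3, 6.0, 2.5]])
out = forward(batch)
print(out.shape)                # (2, 3)
print(np.exp(out).sum(axis=1))  # each row sums to ~1: the outputs are log-probabilities
```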

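The loss for this model is `cross_entropy_with_softmax`, which applies a softmax to the model output and then computes the cross-entropy against the labels. We use an SGD learner with a learning rate of 0.1 to optimize the parameters of the network.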

In [4]:
from cntk.losses import cross_entropy_with_softmax
from cntk.learners import sgd 

loss = cross_entropy_with_softmax(z, labels)
learner = sgd(z.parameters, 0.1)
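The NumPy sketch below spells out the loss and a single gradient step. It reimplements softmax cross-entropy by hand (not CNTK's code) and shows that, because softmax ignores a constant shift, feeding the loss `log_softmax` outputs gives the same value as feeding it raw scores. For simplicity the SGD step is applied directly to the scores using the gradient `softmax(z) - y`; the real `sgd` learner updates the weights via backpropagation.

```python
import numpy as np


def log_softmax(x):
    shifted = x - x.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def cross_entropy_with_softmax(z, y):
    # Softmax is applied to z inside the loss; the one-hot target picks out one log-probability.
    return -(y * log_softmax(z)).sum(axis=-1)


scores = np.array([[2.0, 0.5, -1.0]])  # illustrative raw output for one sample
target = np.array([[1.0, 0.0, 0.0]])   # one-hot label

loss_raw = cross_entropy_with_softmax(scores, target)
loss_from_log_softmax = cross_entropy_with_softmax(log_softmax(scores), target)
print(np.allclose(loss_raw, loss_from_log_softmax))  # True

# One plain gradient-descent step with learning rate 0.1 on the scores themselves.
learning_rate = 0.1
grad = np.exp(log_softmax(scores)) - target  # d(loss)/d(scores) = softmax(scores) - target
updated = scores - learning_rate * grad
new_loss = cross_entropy_with_softmax(updated, target)
print(bool(new_loss[0] < loss_raw[0]))  # True: the step lowers the loss
```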

## Training the model
Now that we have a minibatch source, a loss, and a learner, we can train the model. The `train` method on the loss function takes care of the plumbing: it creates a trainer that uses the loss and learner to optimize the model's parameters, and it reads minibatches from our source and feeds them into the trainer. The input map tells it which stream of the minibatch source feeds which input variable of the model.

In [5]:
from cntk.logging import ProgressPrinter

minibatch_size = 32      # samples per minibatch (also the default of loss.train)
samples_per_epoch = 150  # one epoch covers the whole Iris dataset
num_epochs = 30

# Map each input variable of the model to a stream of the minibatch source.
input_map = {
    features: minibatch_source.streams.features,
    labels: minibatch_source.streams.labels
}

progress_writer = ProgressPrinter(0)

train_history = loss.train(minibatch_source,
                           minibatch_size=minibatch_size,
                           parameter_learners=[learner],
                           model_inputs_to_streams=input_map,
                           callbacks=[progress_writer],
                           epoch_size=samples_per_epoch,
                           max_epochs=num_epochs)

 average      since    average      since      examples
    loss       last     metric       last              
 ------------------------------------------------------
Learning rate per minibatch: 0.1
     1.21       1.21          0          0            32
     1.15       1.12          0          0            96
     1.09       1.09          0          0            32
     1.03       1.01          0          0            96
    0.999      0.999          0          0            32
    0.999      0.998          0          0            96
    0.972      0.972          0          0            32
    0.968      0.966          0          0            96
    0.928      0.928          0          0            32
    0.957      0.972          0          0            96
    0.928      0.928          0          0            32
    0.936       0.94          0          0            96
    0.922      0.922          0          0            32
    0.919      0.917          0          0            96
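Because the source repeats indefinitely, the length of training is set by `epoch_size` and `max_epochs`: 150 × 30 = 4,500 samples, which is 30 passes over the data, assuming `iris.ctf` holds the 150 Iris samples. The pure-Python sketch below imitates a randomized, infinitely repeating source to show that arithmetic. `RepeatingSource` is a hypothetical stand-in, not CNTK's reader, which streams data from disk with its own randomization scheme; its handling of the last minibatch may also differ from the capping used here.

```python
import random


class RepeatingSource:
    """Hypothetical stand-in for a randomized, infinitely repeating minibatch source."""

    def __init__(self, samples, randomize=True, seed=1337):
        self.samples = list(samples)
        self.randomize = randomize
        self.rng = random.Random(seed)
        self.order = list(range(len(self.samples)))
        self.position = len(self.order)  # triggers a new sweep on the first read
        self.sweeps = 0

    def next_minibatch(self, size):
        batch = []
        while len(batch) < size:
            if self.position == len(self.order):  # start a new sweep over the data
                if self.randomize:
                    self.rng.shuffle(self.order)
                self.position = 0
                self.sweeps += 1
            batch.append(self.samples[self.order[self.position]])
            self.position += 1
        return batch


source = RepeatingSource(range(150))  # stand-in for the 150 Iris samples
epoch_size, max_epochs, minibatch_size = 150, 30, 32
max_samples = epoch_size * max_epochs

seen = 0
while seen < max_samples:
    # Cap the last request so exactly max_samples samples are consumed.
    batch = source.next_minibatch(min(minibatch_size, max_samples - seen))
    seen += len(batch)

print(seen, source.sweeps)  # 4500 30
```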
 