# Writing a custom dataset
This notebook walks you through designing a network for the Street View House Numbers (SVHN) dataset.

## SVHN dataset

This dataset is a collection of 73,257 images of house numbers collected from Google Street View. The original dataset has bounding boxes for all the digits in the image:

<img src="http://ufldl.stanford.edu/housenumbers/examples_new.png" width=500px>

We have modified the dataset such that each image is 64x64 pixels (with 3 color channels), and the target is a *single* bounding box over all the digits. Your goal is to build a network that, given an image, returns bounding box coordinates for the location of the digit sequence.

## Data

We've saved the dataset as a pickle file `svhn_64.p`. This file has a few variables:
- `X_train`: a numpy array of shape `(num_examples, num_features)`, where `num_examples = 26624`, and `num_features = 3*64*64 = 12288`
- `y_train`: a numpy array of shape `(num_examples, 4)`, with the target bounding box coordinates in `(x_min, y_min, w, h)` format.
- `X_test`: a numpy array of shape `(3328, 12288)`
- `y_test`: a numpy array of shape `(3328, 4)`
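
To make the layout concrete, here is a minimal sketch that uses random stand-in data rather than the real pickle file. It assumes each row stores the image channel by channel, which matches the `lshape=(3, 64, 64)` and the `reshape(3, 64, 64)` call used later in this notebook. The names `X_fake`, `y_fake`, and `box_to_corners` are hypothetical and not part of the dataset code.

```python
import numpy as np

# Stand-in data with the documented per-example layout (not the real SVHN file).
num_examples, num_features = 8, 3 * 64 * 64
rng = np.random.RandomState(0)
X_fake = rng.randint(0, 256, size=(num_examples, num_features)).astype(np.float32)
y_fake = np.tile(np.array([10.0, 20.0, 30.0, 15.0]), (num_examples, 1))

# Assumption: features are stored channel-major, so one row reshapes to (3, 64, 64).
image_chw = X_fake[0].reshape(3, 64, 64)
# Plotting libraries expect (height, width, channels).
image_hwc = image_chw.transpose(1, 2, 0)


def box_to_corners(box):
    """Convert (x_min, y_min, w, h) to (x_min, y_min, x_max, y_max)."""
    x_min, y_min, w, h = box
    return (x_min, y_min, x_min + w, y_min + h)


corners = box_to_corners(y_fake[0])
```

The corner form is convenient for computing overlaps, while the `(x_min, y_min, w, h)` form is what `plt.Rectangle` expects when drawing.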

Let's first set up our backend:

In [None]:
from neon.backends import gen_backend

be = gen_backend(batch_size=128, backend='gpu')

# set the log level to 10 (logging.DEBUG, the most
# verbose standard level) to see all the output
import logging
main_logger = logging.getLogger('neon')
main_logger.setLevel(10)

Next, we load the pickle file with our SVHN dataset.

In [None]:
import cPickle

fileName = 'data/svhn_64.p'
print("Loading {}...".format(fileName))

# pickle files must be opened in binary mode
with open(fileName, 'rb') as f:
    svhn = cPickle.load(f)

We've written a custom data iterator for this dataset which will, with each call, return a tuple of `(X, Y)` for the input and the target bounding boxes.

In [None]:
from data.SVHN import SVHN
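
The actual iterator lives in `data/SVHN.py` and is not reproduced here. Purely as an illustration of the idea, the sketch below is a hypothetical NumPy-only iterator (`MinibatchIterator` is an invented name) that slices the data into minibatches and transposes each one to the `(num_features, batch_size)` layout. A real neon iterator would also copy each batch to the backend device, which this sketch skips.

```python
import numpy as np


class MinibatchIterator(object):
    """Hypothetical stand-in for a custom dataset iterator (NumPy only)."""

    def __init__(self, X, Y, batch_size):
        assert X.shape[0] == Y.shape[0], "X and Y need one row per example"
        self.X, self.Y = X, Y
        self.batch_size = batch_size
        # Assumption: drop a final partial batch so every batch has the same size.
        self.nbatches = X.shape[0] // batch_size

    def __iter__(self):
        for b in range(self.nbatches):
            rows = slice(b * self.batch_size, (b + 1) * self.batch_size)
            # Transpose so that each column holds one example.
            yield self.X[rows].T, self.Y[rows].T


# Usage with zero-filled stand-in data
X_demo = np.zeros((300, 12288), dtype=np.float32)
Y_demo = np.zeros((300, 4), dtype=np.float32)
batches = list(MinibatchIterator(X_demo, Y_demo, batch_size=128))
first_x, first_y = batches[0]
```

The partial-batch question does not arise for this dataset, since both 26624 and 3328 are exact multiples of the batch size of 128.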

Below we grab an iteration and print out the output of the dataset.

In [None]:
# setup datasets
train_set = SVHN(X=svhn['X_train'], Y=svhn['y_train'], lshape=(3, 64, 64))

# grab one iteration from the train_set
iterator = iter(train_set)
(X, Y) = next(iterator)
print X.shape  # this should be (12288, 128)
print Y.shape  # this should be (4, 128)

You are now ready to try training on this data! First, let's reset the dataset iterator to the beginning (since you drew one minibatch from it above). We also add a test set for evaluation.

In [None]:
train_set.reset()

# generate test set
test_set = SVHN(X=svhn['X_test'], Y=svhn['y_test'], lshape=(3, 64, 64))

### Model architecture
We recommend using a VGG-style convolutional neural network to train this model (alternating Convolution and Pooling layers). We've imported some relevant packages that you may want to use, and put in a tiny toy network. Experiment with networks of different sizes and depths!

Some tips:
- Training should take roughly 30 seconds per epoch, or about 5 minutes for 10 epochs. If it takes longer than that, your network is too large.
- Compare the training set cost with the validation set loss to make sure you are not overfitting the data.
- Try to reach a validation set loss of about 220 after 10 epochs.

Note: Because the goal of the network is to output a bounding box, the last layer has 4 units for the four coordinates of the bounding box.
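
When experimenting with depth, it helps to track how each layer changes the feature map size. The sketch below uses the standard output-size formula `(size + 2*padding - filter) // stride + 1`. It assumes a stride of 1 for the convolutions and a pooling stride equal to the window size, which you should check against the neon defaults; `out_size` is a hypothetical helper, not a neon function.

```python
def out_size(size, fsize, padding=0, stride=1):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - fsize) // stride + 1


size = 64
size = out_size(size, 3, padding=1)   # 3x3 conv with padding 1 keeps the size
after_conv = size
size = out_size(size, 2, stride=2)    # 2x2 pooling halves it
after_pool1 = size
size = out_size(size, 3, padding=1)   # second conv stage, size unchanged
size = out_size(size, 2, stride=2)    # second pooling halves it again
after_pool2 = size
```

Under these assumptions the toy network goes from 64x64 to 32x32 to 16x16, so each extra pooling stage cuts the work of later layers by about a factor of four.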

In [None]:
from neon.callbacks.callbacks import Callbacks
from neon.initializers import Gaussian
from neon.layers import GeneralizedCost, Affine, Conv, Pooling, Linear, Dropout
from neon.models import Model
from neon.optimizers import GradientDescentMomentum, RMSProp
from neon.transforms import Rectlin, Logistic, CrossEntropyMulti, Misclassification, SumSquared

init_norm = Gaussian(loc=0.0, scale=0.01)

# set up model layers
conv = dict(init=init_norm, batch_norm=True, activation=Rectlin())
convp1 = dict(init=init_norm, batch_norm=True, activation=Rectlin(), padding=1)

layers = [Conv((3, 3, 64), **convp1),  # 64x64 feature map
          Conv((3, 3, 64), **convp1),
          Pooling((2, 2)),
          Dropout(keep=.5),
          Conv((3, 3, 96), **convp1),  # 32x32 feature map
          Conv((3, 3, 96), **convp1),
          Pooling((2, 2)),
          Linear(nout=4, init=init_norm)] # last layer good for bbox

# use SumSquared cost
cost = GeneralizedCost(costfunc=SumSquared())

# setup optimizer
optimizer = RMSProp()

# initialize model object
mlp = Model(layers=layers)

# configure callbacks
callbacks = Callbacks(mlp, eval_set=test_set, output_file='data.h5', eval_freq=1)

# run fit
mlp.fit(train_set, optimizer=optimizer, num_epochs=10, cost=cost, callbacks=callbacks)
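
To help interpret the cost values printed during training, the sum-of-squares cost can be sketched in NumPy as below. This assumes the cost is half the sum of squared differences over the four coordinates, averaged over the minibatch; the exact scaling used by neon's `SumSquared` may differ, so treat the numbers as illustrative. `sum_squared_cost` is a hypothetical helper.

```python
import numpy as np


def sum_squared_cost(y, t):
    """Half the squared error summed over coordinates, averaged over the batch.

    y, t: arrays of shape (4, batch_size).
    """
    per_example = np.sum((y - t) ** 2, axis=0) / 2.0
    return float(np.mean(per_example))


# Two example boxes; the first prediction is off by 2 in x and 4 in width.
t_demo = np.array([[10.0, 12.0], [20.0, 18.0], [30.0, 28.0], [15.0, 16.0]])
y_demo = t_demo + np.array([[2.0, 0.0], [0.0, 0.0], [-4.0, 0.0], [0.0, 0.0]])
cost_demo = sum_squared_cost(y_demo, t_demo)
```

Under this assumed scaling, a loss of about 220 corresponds to a root-mean-square error of roughly 10 pixels per coordinate.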

Below we plot the cost data over time to help you visualize the training progress. This is similar to using the `nvis` command line tool to generate plots.

In [None]:
from neon.visualizations.figure import cost_fig, hist_fig, deconv_summary_page
from neon.visualizations.data import h5_cost_data, h5_hist_data, h5_deconv_data
from bokeh.plotting import output_notebook, show

cost_data = h5_cost_data('data.h5', False)
output_notebook()
show(cost_fig(cost_data, 300, 600, epoch_axis=False))

To understand how the network performed, we sample images and plot the network's predicted bounding box against the ground truth bounding box. We evaluate this on the `test_set`, which was not used to train the network.

In [None]:
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt

# get a minibatch's worth of
# inputs (X) and targets (T)
iterator = iter(test_set)
(X, T) = next(iterator)

# fprop the input to get the model output
y = mlp.fprop(X)

# transfer from device to numpy arrays
y = y.get()
T = T.get()

Our ground truth boxes `T` and the model predictions `y` are both arrays of shape `(4, batch_size)`, with one column per image. We plot four test images below. Feel free to change the indices in `imgs_to_plot` to check performance on other test images in the minibatch. Red boxes are the model's guess, and blue boxes are the ground truth boxes.

In [None]:
plt.figure(2)
imgs_to_plot = [0, 1, 2, 3]
for pos, i in enumerate(imgs_to_plot):
    # place each image by its position in the list,
    # so any four indices from the minibatch work
    plt.subplot(2, 2, pos + 1)

    title = "test {}".format(i)
    plt.imshow(X.get()[:, i].reshape(3, 64, 64).transpose(1, 2, 0))
    ax = plt.gca()
    ax.add_patch(plt.Rectangle((y[0,i], y[1,i]), y[2,i], y[3,i], fill=False, edgecolor="red")) # model guess
    ax.add_patch(plt.Rectangle((T[0,i], T[1,i]), T[2,i], T[3,i], fill=False, edgecolor="blue")) # ground truth
    plt.title(title)
    plt.axis('off')

In [None]:
i=0
print "Target box had coordinates: {}".format(T[:,i])
print "Model prediction has coordinates: {}".format(y[:, i])
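
Raw coordinates are hard to compare by eye. One common summary for bounding boxes, which is not part of the original notebook, is intersection over union (IoU): the overlap area divided by the area of the union. The sketch below is a hypothetical helper for boxes in `(x_min, y_min, w, h)` format, and it assumes non-negative widths and heights.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    # float() keeps true division under Python 2 as well.
    return float(inter) / union if union > 0 else 0.0


same = iou((0, 0, 10, 10), (0, 0, 10, 10))
half = iou((0, 0, 10, 10), (5, 0, 10, 10))
disjoint = iou((0, 0, 10, 10), (20, 20, 5, 5))
```

With the arrays above, `iou(T[:, i], y[:, i])` would give a single score between 0 and 1 for image `i`, where 1 means the boxes coincide.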