# Writing a custom dataset
This notebook will walk you through implementing a custom iterator for a modified version of the Street View House Numbers (SVHN) dataset. You will then design a network to train on this dataset.

## SVHN dataset

This dataset is a collection of house-number images collected from Google Street View; its original training split contains 73,257 digits. The original dataset has bounding boxes for all the digits in the image:

<img src="http://ufldl.stanford.edu/housenumbers/examples_new.png" width=500px>

We have modified the dataset such that each image is 64x64 pixels (with 3 color channels), and the target is a *single* bounding box over all the digits. Your goal is to build a network that, given an image, returns bounding box coordinates for the location of the digit sequence.

This notebook is split into two parts:
* Writing a custom data iterator
* Building a prediction network

## Custom dataset

Because the training set of ~27,000 images fits into the memory of a single Titan X GPU, we could use the `ArrayIterator` class to provide data to the model. However, for datasets with more images or larger images, that is no longer an option. Our high-performance `DataLoader`, which loads images in batches and performs complex augmentation, cannot currently handle bounding box data (stay tuned, an object localization dataloader is coming in a future neon release!).

We've saved the dataset as a pickle file `svhn_64_box_truncated.p`. This file has a few variables:
- `X_train`: a numpy array of shape `(num_examples, num_features)`, where `num_examples = 26624`, and `num_features = 3*64*64 = 12288`
- `y_train`: a numpy array of shape `(num_examples, 4)`, with the target bounding box coordinates in `(x_min, y_min, x_max, y_max)` format.
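
To make this layout concrete, here is a minimal sketch that uses synthetic arrays in place of the pickle. It assumes each flattened row is stored channel-first as `(3, 64, 64)`, which matches the `shape` attribute used by the iterator in this notebook. The random data and the helper name `unpack_example` are hypothetical and only for illustration.

```python
import numpy as np

def unpack_example(x_row, y_row):
    """Reshape one flattened image row and unpack its bounding box.

    Hypothetical helper; assumes channel-first (C, H, W) storage.
    """
    image = x_row.reshape(3, 64, 64)
    x_min, y_min, x_max, y_max = y_row
    width = x_max - x_min
    height = y_max - y_min
    return image, (width, height)

# synthetic stand-ins for X_train / y_train (not the real data)
rng = np.random.RandomState(0)
X_fake = rng.randint(0, 256, size=(5, 3 * 64 * 64)).astype(np.uint8)
y_fake = np.array([[10, 20, 50, 40]] * 5)

image, (w, h) = unpack_example(X_fake[0], y_fake[0])
print(image.shape)
print(w)
print(h)
```

With real data, the same reshape (followed by moving the channel axis last) is one way to view an image with a plotting library and overlay its box.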

Let's first import our backend:

In [None]:
from neon.backends import gen_backend

be = gen_backend(batch_size=128)

Below is a skeleton of the SVHN data iterator for you to fill out, with notes to help along the way. The goal is an object that returns, with each call, a tuple of `(X, Y)`, where:
- `X`: tensor of shape (num_features, batch_size)
- `Y`: tensor of shape (4, batch_size)

In [None]:
# import some useful packages
from neon.data import NervanaDataIterator
import numpy as np
import cPickle
import os

class SVHN(NervanaDataIterator):

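    def __init__(self, set_name='train'):

        # load data from pickle file (pickles should be opened in binary mode)
        with open('svhn/svhn_64_box_truncated.p', 'rb') as f:
            data = cPickle.load(f)

        # scale pixel values to [0, 1] and select the requested split;
        # assumes the pickle stores each split as X_<set_name> / y_<set_name>
        self.X = data['X_' + set_name] / 255.
        self.Y = data['y_' + set_name]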
        
        # allocate buffers on the GPU
        self.dev_X = self.be.zeros((self.X.shape[1], self.be.bsz))
        self.dev_Y = self.be.zeros((self.Y.shape[1], self.be.bsz))

        # assign some required attributes
        self.ndata = self.X.shape[0]  # number of examples
        self.nbatches = self.ndata // self.be.bsz  # number of full batches
        self.start = 0  # start at zero
        self.shape = (3, 64, 64)  # shape of the input

    def __iter__(self):
        for index in range(self.start, self.ndata, self.be.bsz):
            # grab the right slice from the numpy arrays
            inputs = self.X[index:(index + self.be.bsz), :]
            outputs = self.Y[index:(index + self.be.bsz), :]
            inputs = np.ascontiguousarray(inputs.T)
            outputs = np.ascontiguousarray(outputs.T)
            # transfer to device
            self.dev_X.set(inputs)
            self.dev_Y.set(outputs)

            yield (self.dev_X, self.dev_Y)

    def reset(self):
        # restart iteration from the first example
        self.start = 0

Check your implementation! Below we grab one batch from the iterator and print it. Importantly, make sure that the output tensors are contiguous (i.e. `is_contiguous = True` in the output below). This means that each tensor is stored in a single contiguous block of memory, which is important for the downstream calculations. Contiguity can be broken by operations like transpose.
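
The effect of a transpose on contiguity can be seen in plain NumPy. This is a small sketch on a toy array, not on the neon tensors themselves: NumPy exposes contiguity through `flags['C_CONTIGUOUS']`, which is only an analogue of the `is_contiguous` attribute printed by the backend.

```python
import numpy as np

# toy batch laid out as (batch_size, num_features)
batch = np.arange(12, dtype=np.float32).reshape(3, 4)

# .T returns a view with swapped strides; no data is copied
transposed = batch.T

# ascontiguousarray copies the view into a fresh row-major buffer
fixed = np.ascontiguousarray(transposed)

print(transposed.flags['C_CONTIGUOUS'])
print(fixed.flags['C_CONTIGUOUS'])
```

The values are identical in both arrays; only the memory layout differs, which is why the iterator calls `np.ascontiguousarray` after transposing each slice.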

In [None]:
# setup datasets
train_set = SVHN(set_name="train")
test_set = SVHN(set_name="test")

# grab one iteration from the train_set
iterator = iter(train_set)
(X, Y) = next(iterator)
print X
print Y

If all goes well, you are ready to train a network on this dataset! First, let's reset the dataset to the beginning (since you drew one batch from it above).

In [None]:
train_set.reset()

## Model architecture
To get you started, below we use a toy example that reaches ?? cost after 10 epochs. But you can do better! Play around with adding more layers. 

If you are feeling ambitious, you can delete the code below and try to build a model from scratch.
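
When adding or removing layers, it helps to track how the feature map size changes. The sketch below applies the standard output-size formula to the pattern used in the starter model (3x3 convolutions with padding 1, followed by 2x2 pooling). The helper names are hypothetical, and the pooling stride is assumed to equal the window size; check the neon `Pooling` defaults if you depend on this.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window, stride=None):
    """Spatial output size of pooling; stride assumed equal to window."""
    stride = window if stride is None else stride
    return (size - window) // stride + 1

size = 64  # input images are 64x64
sizes = []
for block in range(3):
    size = conv_out(size, kernel=3, padding=1)  # 3x3, pad 1 keeps the size
    size = conv_out(size, kernel=3, padding=1)
    size = pool_out(size, window=2)             # 2x2 pooling halves the size
    sizes.append(size)

print(sizes)  # [32, 16, 8]
```

These values agree with the feature map sizes noted in the layer comments of the starter model.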

In [None]:
from neon.callbacks.callbacks import Callbacks
from neon.initializers import Gaussian
from neon.layers import GeneralizedCost, Affine, Conv, Pooling, Linear, Dropout
from neon.models import Model
from neon.optimizers import GradientDescentMomentum, RMSProp
from neon.transforms import Rectlin, Logistic, CrossEntropyMulti, Misclassification, SumSquared

init_norm = Gaussian(loc=0.0, scale=0.01)

# set up model layers
conv = dict(init=init_norm, batch_norm=True, activation=Rectlin())
convp1 = dict(init=init_norm, batch_norm=True, activation=Rectlin(), padding=1)

layers = [Conv((3, 3, 64), **convp1),  # 64x64 feature map
          Conv((3, 3, 64), **convp1),
          Pooling((2, 2)),
          Dropout(keep=.5),
          Conv((3, 3, 96), **convp1),  # 32x32 feature map
          Conv((3, 3, 96), **convp1),
          Pooling((2, 2)),
          Dropout(keep=.5),
          Conv((3, 3, 128), **convp1),  # 16x16 feature map
          Conv((3, 3, 128), **convp1),
          Pooling((2, 2)),
          Dropout(keep=.5),
          Conv((3, 3, 192), **convp1),  # 8x8 feature map
          Conv((1, 1, 192), **conv),
          Linear(nout=4, init=init_norm)]  # 4 outputs, one per bounding box coordinate

# use SumSquared cost
cost = GeneralizedCost(costfunc=SumSquared())

# setup optimizer
optimizer = RMSProp()

# initialize model object
mlp = Model(layers=layers)

# configure callbacks
callbacks = Callbacks(mlp)

# run fit
mlp.fit(train_set, optimizer=optimizer, num_epochs=10, cost=cost, callbacks=callbacks)
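
To get a feel for what the cost value means, here is a plain NumPy sketch of a sum-of-squares cost on `(4, batch_size)` arrays. The factor of 1/2 and the averaging over the batch are assumptions about the convention; neon's exact scaling may differ, so treat the numbers as illustrative only. The function name `sum_squared_cost` and the toy boxes are hypothetical.

```python
import numpy as np

def sum_squared_cost(pred, target):
    """Half the squared error summed over coordinates, averaged over the batch.

    Both inputs have shape (4, batch_size); scaling is an assumed convention.
    """
    per_example = 0.5 * np.sum((pred - target) ** 2, axis=0)
    return per_example.mean()

# two toy boxes in (x_min, y_min, x_max, y_max) order, one per column
pred = np.array([[10., 12.],
                 [20., 20.],
                 [50., 48.],
                 [40., 44.]])
target = np.array([[10., 10.],
                   [20., 20.],
                   [50., 50.],
                   [40., 40.]])

cost_value = sum_squared_cost(pred, target)
print(cost_value)  # 6.0
```

Because the targets are pixel coordinates, errors of a few pixels per coordinate already produce costs in this range.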