# Building a new layer

This tutorial builds a model similar to the one specified in `examples/mnist_mlp.py`.

## Preamble
The first step is to set up our compute backend and initialize our dataset.

In [None]:
import neon
print(neon.__version__)

from neon.backends import gen_backend
be = gen_backend('gpu', batch_size=128)  # use 'cpu' if no supported GPU is available

from neon.data import load_mnist
from neon.data import ArrayIterator

# download or reuse cached data
(X_train, y_train), (X_test, y_test), nclass = load_mnist()

# set up training and test set iterators
train_set = ArrayIterator(X_train, y_train, nclass=nclass)
test_set = ArrayIterator(X_test, y_test, nclass=nclass)
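To make the data layout concrete, here is a NumPy sketch of what an array iterator does conceptually: it slices the data into minibatches, transposes each batch so that features run along axis 0 and examples along axis 1 (the layout the layer code later indexes with `inputs[:, c]`), and one-hot encodes the labels. The function `iterate_minibatches` is hypothetical and is not neon's `ArrayIterator` implementation; for simplicity it drops any final partial batch.

```python
import numpy as np

def iterate_minibatches(X, y, nclass, batch_size):
    """Hypothetical sketch: yield (inputs, one-hot targets) in (features, batch) layout."""
    nbatches = X.shape[0] // batch_size  # drop the final partial batch for simplicity
    for b in range(nbatches):
        sl = slice(b * batch_size, (b + 1) * batch_size)
        inputs = X[sl].T                                  # shape (nfeatures, batch_size)
        targets = np.zeros((nclass, batch_size))
        targets[y[sl], np.arange(batch_size)] = 1.0       # one-hot, one column per example
        yield inputs, targets

# toy data: 300 examples with 784 features, labels in 0..9
rng = np.random.default_rng(0)
X = rng.random((300, 784))
y = rng.integers(0, 10, size=300)

batches = list(iterate_minibatches(X, y, nclass=10, batch_size=128))
print(len(batches))          # 2 full batches; the last 44 examples are dropped
print(batches[0][0].shape)   # (784, 128)
print(batches[0][1].shape)   # (10, 128)
```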

## Adding new functionality
We demonstrate through simple examples how to add an activation function and layer to neon.

### Build your own layer
Instead of importing neon's built-in `Affine` layer, we will build our own.

Note: `Affine` is a container layer, meaning it bundles a core layer with activation and batch normalization layers. The core layer inside `Affine` is a `Linear` layer, which implements a fully connected (MLP) layer.

First, let's build a linear layer, and then we will wrap it in an affine container.

In the implementation below, `fprop` loops over every output element in Python and computes each one with element-wise backend operations, so it is very slow. As an exercise, try replacing the loops with a single call to the backend's `compound_dot`, as the `bprop` method does.
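The slow loop and the suggested replacement compute the same thing: each output element `outputs[r, c]` is the dot product of weight row `r` with input column `c`, which is exactly the matrix product `W @ inputs`. The NumPy sketch below (helper names are hypothetical) checks that equivalence on small random arrays. In the neon layer, the one-line `fprop` would presumably mirror the `bprop` call, e.g. `self.be.compound_dot(A=self.W, B=self.inputs, C=self.outputs, beta=beta)`; treat that call as an assumption to verify against your neon version.

```python
import numpy as np

def fprop_loop(W, inputs):
    """Element-by-element version, mirroring the slow MyLinear.fprop loop."""
    nout, batch = W.shape[0], inputs.shape[1]
    outputs = np.zeros((nout, batch))
    for r in range(nout):
        for c in range(batch):
            outputs[r, c] = np.sum(W[r] * inputs[:, c])
    return outputs

def fprop_dot(W, inputs):
    """Single matrix product: (nout, nin) x (nin, batch) -> (nout, batch)."""
    return W @ inputs

rng = np.random.default_rng(1)
W = rng.standard_normal((5, 8))        # (nout, nin)
inputs = rng.standard_normal((8, 4))   # (nin, batch)
print(np.allclose(fprop_loop(W, inputs), fprop_dot(W, inputs)))  # True
```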

In [None]:
from neon.layers.layer import ParameterLayer, interpret_in_shape

class MyLinear(ParameterLayer):

    def __init__(self, nout, init, name=None):
        super(MyLinear, self).__init__(init, name, "Disabled")
        self.nout = nout
        self.inputs = None

    def __str__(self):
        return "Linear Layer '%s': %d inputs, %d outputs" % (
               self.name, self.nin, self.nout)

    def configure(self, in_obj):
        """
        Define some sizes that get used by the allocate method inherited from Layer.
        """
        super(MyLinear, self).configure(in_obj)

        # nin: number of input features; nsteps: sequence length (1 unless the input is a sequence)
        (self.nin, self.nsteps) = interpret_in_shape(self.in_shape)

        self.out_shape = (self.nout, self.nsteps)
        if self.weight_shape is None:
            self.weight_shape = (self.nout, self.nin)

        return self

    def fprop(self, inputs, inference=False, beta=0.0):
        self.inputs = inputs

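        # Deliberately slow: one backend reduction per (output unit r, example c),
        # i.e. outputs[r, c] = dot(W[r, :], inputs[:, c]).
        for r in range(self.outputs.shape[0]):
            for c in range(self.outputs.shape[1]):
                self.outputs[r, c] = self.be.sum(self.be.multiply(self.W[r], self.inputs[:, c].T))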

        return self.outputs

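    def bprop(self, error, alpha=1.0, beta=0.0):
        # Error for the layer below: deltas = W^T . error (skipped when no deltas
        # buffer was allocated, e.g. for the first layer of the network).
        if self.deltas:
            self.be.compound_dot(A=self.W.T, B=error, C=self.deltas, alpha=alpha, beta=beta)
        # Weight gradient: dW = error . inputs^T (summed over the batch).
        self.be.compound_dot(A=error, B=self.inputs.T, C=self.dW)
        return self.deltas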
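The two `compound_dot` calls in `bprop` implement the standard gradients of `y = W @ x`: the error passed to the previous layer is `W.T @ error`, and the weight gradient is `error @ x.T`, which sums over the examples in the batch. A finite-difference check in NumPy is a quick way to confirm those formulas; the sketch below uses a hypothetical scalar loss `L = sum(err * (W @ x))`, chosen so that the gradient of `L` with respect to `y` is exactly `err`.

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((3, 4))      # (nout, nin)
x = rng.standard_normal((4, 5))      # (nin, batch)
err = rng.standard_normal((3, 5))    # dL/dy from the layer above, (nout, batch)

def loss(W, x):
    # hypothetical scalar loss whose gradient w.r.t. y = W @ x is exactly err
    return np.sum(err * (W @ x))

# analytic gradients, matching the two compound_dot calls in bprop
deltas = W.T @ err   # dL/dx, passed to the previous layer
dW = err @ x.T       # dL/dW, used by the optimizer

# central finite differences for comparison
eps = 1e-6
num_dW = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        Wp, Wm = W.copy(), W.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        num_dW[i, j] = (loss(Wp, x) - loss(Wm, x)) / (2 * eps)

num_deltas = np.zeros_like(x)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        xp, xm = x.copy(), x.copy()
        xp[i, j] += eps
        xm[i, j] -= eps
        num_deltas[i, j] = (loss(W, xp) - loss(W, xm)) / (2 * eps)

print(np.allclose(dW, num_dW, atol=1e-5), np.allclose(deltas, num_deltas, atol=1e-5))
```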

Next, wrap the raw layer in a container that bundles it with an activation and, optionally, bias and batch normalization layers.

In [None]:
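from neon.layers.layer import CompoundLayer

class MyAffine(CompoundLayer):

    def __init__(self, nout, init, bias=None,
                 batch_norm=False, activation=None, name=None):
        # forward batch_norm so the container actually adds batch normalization when requested
        super(MyAffine, self).__init__(bias=bias, batch_norm=batch_norm,
                                       activation=activation, name=name)
        # the core linear layer comes first; postfilter layers (bias/batch norm, activation) follow
        self.append(MyLinear(nout, init, name=name))
        self.add_postfilter_layers()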
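Conceptually, a container such as `MyAffine` just chains its sub-layers in order: the linear transform, then (optionally) a bias, then the activation. The NumPy sketch below composes those steps with a hypothetical `affine_forward` function to show the data flow; it ignores batch normalization and is not how neon's `CompoundLayer` is implemented internally.

```python
import numpy as np

def affine_forward(x, W, b=None, activation=None):
    """Hypothetical sketch of a linear -> bias -> activation chain."""
    out = W @ x                  # linear layer: (nout, nin) x (nin, batch)
    if b is not None:
        out = out + b            # bias of shape (nout, 1) broadcasts over the batch
    if activation is not None:
        out = activation(out)    # activation applied to the linear output
    return out

def rectlin(z):
    return np.maximum(z, 0.0)

W = np.array([[1.0, -1.0], [0.5, 2.0]])
b = np.array([[0.0], [-1.0]])
x = np.array([[1.0, 3.0], [2.0, 0.0]])   # two examples as columns
result = affine_forward(x, W, b, rectlin)
print(result)   # [[0.  3. ] [3.5 0.5]]
```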

## Defining an activation function (transform)

This example exercises more of the backend's element-wise functions.

Implement the Softmax function.  
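For reference, the softmax of a column `z` is `exp(z_i) / sum_j exp(z_j)`. Because `exp` overflows for large inputs, implementations typically subtract the column maximum first; this leaves the result unchanged because the common factor `exp(-max)` cancels between numerator and denominator. A NumPy version operating column-wise (axis 0, matching the (features, batch) layout) might look like this:

```python
import numpy as np

def softmax_columns(x):
    """Numerically stable softmax over axis 0 (one column per example)."""
    shifted = x - np.max(x, axis=0, keepdims=True)   # largest entry per column becomes 0
    e = np.exp(shifted)
    return e / np.sum(e, axis=0, keepdims=True)

x = np.array([[1.0, 1000.0],
              [2.0, 1001.0],
              [3.0, 1002.0]])
p = softmax_columns(x)
print(p.sum(axis=0))                  # each column sums to 1
print(np.allclose(p[:, 0], p[:, 1]))  # True: shifting a column by a constant has no effect
```

Without the shift, `np.exp(1000.0)` would overflow to infinity for the second column.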


In [None]:
from neon.transforms.transform import Transform

class MySoftmax(Transform):
    """
    Softmax activation function. Ensures that the activation outputs for
    each example (column) sum to 1.
    """
    def __init__(self, name=None, epsilon=2**-23):
        """
        Class constructor.
        Arguments:
            name (string, optional): Name (default: None)
            epsilon (float, optional): Not used.
        """
        super(MySoftmax, self).__init__(name)
        self.epsilon = epsilon

    def __call__(self, x):
        """
        Implement the softmax function over axis 0 (one column per example).
        Subtracting the column maximum before exponentiating prevents overflow
        without changing the result.
        """
        return (self.be.reciprocal(self.be.sum(
                self.be.exp(x - self.be.max(x, axis=0)), axis=0)) *
                self.be.exp(x - self.be.max(x, axis=0)))

    def bprop(self, x):
        """
        We take a shortcut here: combined with the cross-entropy cost, the
        softmax derivative cancels out, and the cost's derivative already
        gives the full gradient (output - target). Returning 1 passes it through.
        """
        return 1
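The shortcut in `bprop` relies on a standard identity: for logits `z`, `p = softmax(z)` and a one-hot target `t`, the gradient of the cross-entropy loss `-sum(t * log(p))` with respect to `z` is simply `p - t`. The softmax Jacobian therefore never needs to be formed, provided the cost function supplies `p - t` directly, which is what returning 1 here assumes. A finite-difference check in NumPy confirms the identity:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def cross_entropy(z, t):
    return -np.sum(t * np.log(softmax(z)))

z = np.array([0.5, -1.2, 2.0, 0.1])   # logits for one example
t = np.array([0.0, 0.0, 1.0, 0.0])    # one-hot target

analytic = softmax(z) - t             # the "shortcut" gradient

# central finite differences of the full softmax + cross-entropy loss
eps = 1e-6
numeric = np.zeros_like(z)
for i in range(z.size):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    numeric[i] = (cross_entropy(zp, t) - cross_entropy(zm, t)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # True
```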


### Putting together all of the pieces
The architecture here follows the `mnist_mlp.py` example: two fully connected layers, a larger hidden layer of rectified linear units followed by an output layer with one unit per class and a softmax activation.

Here we use our own layer and activation instead of the ones neon provides.

In [None]:
from neon.initializers import Gaussian
from neon.models import Model
from neon.transforms.activation import Rectlin

init_norm = Gaussian(loc=0.0, scale=0.01)

# assemble all of the pieces
layers = []
layers.append(MyAffine(nout=100, init=init_norm, activation=Rectlin()))
layers.append(MyAffine(nout=10, init=init_norm, activation=MySoftmax()))

# initialize model object
mlp = Model(layers=layers)
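Assuming the MNIST images are flattened to 784 features, this model's weight matrices have shapes (100, 784) and (10, 100), following the (nout, nin) convention in `MyLinear.configure`. Since `MyAffine` defaults to `bias=None`, no bias parameters are added. The NumPy sketch below mirrors the forward pass with random weights of standard deviation 0.01 (as with `Gaussian(scale=0.01)`) and counts the parameters; it is an illustration, not neon code.

```python
import numpy as np

rng = np.random.default_rng(3)
W1 = rng.normal(0.0, 0.01, size=(100, 784))   # hidden layer weights, (nout, nin)
W2 = rng.normal(0.0, 0.01, size=(10, 100))    # output layer weights

def forward(x):
    h = np.maximum(W1 @ x, 0.0)                    # Rectlin hidden layer
    z = W2 @ h
    e = np.exp(z - z.max(axis=0, keepdims=True))   # column-wise softmax
    return e / e.sum(axis=0, keepdims=True)

x = rng.random((784, 128))   # one minibatch of 128 examples, (features, batch) layout
probs = forward(x)
print(probs.shape)           # (10, 128)
print(W1.size + W2.size)     # 79400 trainable weights (no biases)
```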

### Fit
Train the model using a cross-entropy loss and gradient descent with momentum.

In [None]:
from neon.layers import GeneralizedCost
from neon.transforms import CrossEntropyMulti
from neon.optimizers import GradientDescentMomentum
from neon.callbacks.callbacks import Callbacks

cost = GeneralizedCost(costfunc=CrossEntropyMulti())
optimizer = GradientDescentMomentum(0.1, momentum_coef=0.9)
callbacks = Callbacks(mlp, eval_set=test_set)

mlp.fit(train_set, optimizer=optimizer, num_epochs=10, cost=cost,
        callbacks=callbacks)
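`GradientDescentMomentum(0.1, momentum_coef=0.9)` sets a learning rate of 0.1 and a momentum coefficient of 0.9. A common formulation of the momentum update keeps a per-parameter velocity `v` and applies `v = mu * v - lr * grad`, then `W = W + v`. The sketch below applies that rule to a toy quadratic; neon's optimizer may differ in details (for example weight decay or gradient clipping), so treat this as an illustration of the idea rather than its exact implementation.

```python
import numpy as np

def momentum_step(W, grad, velocity, lr=0.1, mu=0.9):
    """One momentum update; returns the new weights and velocity."""
    velocity = mu * velocity - lr * grad
    return W + velocity, velocity

# toy objective f(W) = 0.5 * ||W||^2, whose gradient is W itself
W = np.array([1.0, -2.0])
velocity = np.zeros_like(W)
for _ in range(200):
    W, velocity = momentum_step(W, grad=W, velocity=velocity)
print(np.linalg.norm(W) < 1e-3)   # True: the iterates converge toward the minimum at 0
```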

At the beginning of the fitting procedure, neon uses `train_set` to configure the model, propagating the input shape through the layers to set each layer's input and output shapes. Each layer has a `configure()` method that determines its shapes and an `allocate()` method that sets up the buffers holding its forward-propagation outputs.
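Shape propagation can be pictured as each layer receiving its input shape from the previous layer, recording its own output shape, and deriving its weight shape from the two. The toy classes below (`ToyLinear` and `configure_chain` are hypothetical, not neon classes) mimic the `configure` logic of `MyLinear` for the 784-100-10 model, with allocation reduced to creating zero-filled output buffers.

```python
import numpy as np

class ToyLinear:
    """Hypothetical stand-in for MyLinear's configure/allocate bookkeeping."""
    def __init__(self, nout):
        self.nout = nout

    def configure(self, in_shape):
        self.nin = in_shape[0]
        self.out_shape = (self.nout, in_shape[1])
        self.weight_shape = (self.nout, self.nin)   # (nout, nin), as in MyLinear
        return self.out_shape

    def allocate(self, batch_size):
        # buffer that fprop would write into: (nout, batch)
        self.outputs = np.zeros((self.nout, batch_size))

def configure_chain(layers, in_shape, batch_size):
    shape = in_shape
    for layer in layers:
        shape = layer.configure(shape)   # this layer's output shape feeds the next layer
        layer.allocate(batch_size)
    return shape

layers = [ToyLinear(100), ToyLinear(10)]
final = configure_chain(layers, in_shape=(784, 1), batch_size=128)
print(final)                             # (10, 1)
print([l.weight_shape for l in layers])  # [(100, 784), (10, 100)]
print(layers[1].outputs.shape)           # (10, 128)
```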

During training, neon sends minibatches of training data through the model, calling each layer's `fprop()` and `bprop()` methods to compute the gradients; the optimizer then uses those gradients to update the weights.
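Within each minibatch, the forward pass visits the layers in order and the backward pass visits them in reverse, with each layer's `bprop` receiving the error produced by the layer above it. The hypothetical `TraceLayer` below only records the call order to make that sequencing explicit; it is a sketch, not neon's training loop.

```python
class TraceLayer:
    """Hypothetical layer that logs calls and passes its input through unchanged."""
    def __init__(self, name, log):
        self.name, self.log = name, log

    def fprop(self, x):
        self.log.append(("fprop", self.name))
        return x

    def bprop(self, error):
        self.log.append(("bprop", self.name))
        return error

def train_step(layers, x, cost_grad):
    out = x
    for layer in layers:             # forward: first layer to last
        out = layer.fprop(out)
    error = cost_grad(out)           # gradient of the cost w.r.t. the model output
    for layer in reversed(layers):   # backward: last layer to first
        error = layer.bprop(error)
    # an optimizer would now update each layer's parameters from its stored gradients

log = []
layers = [TraceLayer("affine1", log), TraceLayer("affine2", log)]
train_step(layers, x=1.0, cost_grad=lambda y: y)   # toy cost gradient
print(log)
# [('fprop', 'affine1'), ('fprop', 'affine2'), ('bprop', 'affine2'), ('bprop', 'affine1')]
```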

### Using the trained model
Now that the model is trained, we can use it to classify new images, measure performance, and visualize the weights and training results.

#### Inference
Given a set of images, such as those in the iterable `test_set`, we can fetch the output of the final model layer via:

In [None]:
results = mlp.get_outputs(test_set)

The variable `results` is a NumPy array of shape (num_test_examples, num_outputs) = (10000, 10), holding the model's probability for each label.
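Each row of `results` holds one example's class probabilities, so the predicted label is the index of the largest entry in that row. A small NumPy illustration with made-up probabilities:

```python
import numpy as np

# made-up probabilities for 3 examples over 4 classes, one row per example
results = np.array([[0.70, 0.10, 0.10, 0.10],
                    [0.05, 0.05, 0.80, 0.10],
                    [0.25, 0.30, 0.25, 0.20]])
predicted = np.argmax(results, axis=1)   # index of the most probable class per row
print(predicted)   # [0 2 1]
```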

#### Performance
Neon provides convenience functions for evaluating performance with custom metrics. Here we measure the misclassification rate on the held-out test set.

In [None]:
from neon.transforms import Misclassification

# evaluate the model on test_set using the misclassification metric
error = mlp.eval(test_set, metric=Misclassification())*100
print('Misclassification error = %.1f%%' % error)
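The misclassification metric is the fraction of examples whose highest-probability class differs from the true label; multiplying by 100 gives the percentage printed above. A minimal NumPy equivalent, assuming probabilities in (num_examples, num_classes) layout and integer labels (the helper name `misclassification_rate` is hypothetical):

```python
import numpy as np

def misclassification_rate(probs, labels):
    """Fraction of rows whose argmax differs from the integer label."""
    return np.mean(np.argmax(probs, axis=1) != labels)

probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.6, 0.4],
                  [0.3, 0.7]])
labels = np.array([0, 1, 1, 1])
print('Misclassification error = %.1f%%' % (misclassification_rate(probs, labels) * 100))
# Misclassification error = 25.0%
```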