# Theano and Lasagne for Multi-layer Perceptron

The next architecture we are going to present using Theano is the single-hidden-layer Multi-Layer Perceptron (MLP). An MLP can be viewed as a logistic regression classifier where the input is first transformed using a learnt non-linear transformation $\Phi$. This transformation projects the input data into a space where it becomes linearly separable. This intermediate layer is referred to as a hidden layer. A single hidden layer is sufficient to make MLPs a universal approximator. However we will see later on that there are substantial benefits to using many such hidden layers, i.e. the very premise of deep learning. 

## The Model

Formally, a one-hidden-layer MLP is a function $f: R^D \rightarrow R^L,$ where $D$ is the size of input vector $x$ and $L$ is the size of the output vector $f(x)$, such that, in matrix notation:

\begin{equation}
f(x) = G( b^{(2)} + W^{(2)}( s( b^{(1)} + W^{(1)} x))),
\end{equation}

with bias vectors $b^{(1)}, b^{(2)}$; weight matrices $W^{(1)}, W^{(2)}$ and activation functions $G$ and $s$.

The vector $h(x) = \Phi(x) = s(b^{(1)} + W^{(1)} x)$ constitutes the hidden layer. $W^{(1)} \in R^{D \times D_h}$ is the weight matrix connecting the input vector to the hidden layer. Each column $W^{(1)}_{\cdot i}$ represents the weights from the input units to the i-th hidden unit. Typical choices for $s$ include $tanh$, with $tanh(a)=(e^a-e^{-a})/(e^a+e^{-a})$, or the logistic sigmoid function, with $sigmoid(a)=1/(1+e^{-a})$. We will be using $ReLU = max(0,x)$. 

The output vector is then obtained as: $o(x) = G(b^{(2)} + W^{(2)} h(x))$. As before, class-membership probabilities can be obtained by choosing $G$ as the softmax function (in the case of multi-class classification).

## Lasagne
Lasagne is a lightweight library to build and train neural networks in Theano. we will be using this library for making our code more comprehensible.With Lasagne, we will be able to use better descent algorithms like Nestrove momentum, AdaGrad etc.

We assume that you have already installed Theano. Now for installing Lasagne, open your anaconda prompt and type

<code> conda install lasagne </code>

Press 'y' to confirm installation and you are ready to go. 

You can clone the git repository of Lasagne by opening your GIT Bash, going to the location where you want it to be installed and typing

<code> git clone https://github.com/Lasagne/Lasagne.git </code>

Now, we have installed it in the same folder as the one in which this notebook exists.

### Now lets get started!



In [5]:
# Loading required directories
import numpy as np
import theano
import theano.tensor as T

import lasagne
import time
import csv

In [11]:
# A simple function to load the dataset.
#from code import lasagne_dataset_loader as dl
#X_train, y_train, X_val, y_val, X_test, y_test = dl.load_dataset()
csv_file = csv.reader(open("train.csv"))
header = csv_file.next()
data = []
for row in csv_file:
    data.append(row)
data_full = np.array(data, dtype = "float32")
datax = data_full[:,1:]
datay = data_full[:,0:1]
X_train = datax[0:37000,:]
y_train = datay[0:37000,:]
y_train.shape =(int(np.shape(y_train)[0]))
X_val =datax[37000:,:]
y_val = datay[37000:,:]
y_val.shape =(int(np.shape(y_val)[0]))

In [22]:
from code import lasagne_dataset_loader as dl
X_train, y_train, X_val, y_val, X_test, y_test = dl.load_dataset()

In [13]:
# ############################# Batch iterator ###############################
# This is just a simple helper function iterating over training data in
# mini-batches of a particular size, optionally in random order. It assumes
# data is available as numpy arrays. For big datasets, you could load numpy
# arrays as memory-mapped files (np.load(..., mmap_mode='r')), or write your
# own custom data iteration function. For small datasets, you can also copy
# them to GPU at once for slightly improved performance. This would involve
# several changes in the main program, though, and is not demonstrated here.
# Notice that this function returns only mini-batches of size `batchsize`.
# If the size of the data is not a multiple of `batchsize`, it will not
# return the last (remaining) mini-batch.

def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
    assert len(inputs) == len(targets)
    if shuffle:
        indices = np.arange(len(inputs))
        np.random.shuffle(indices)
    for start_idx in range(0, len(inputs) - batchsize + 1, batchsize):
        if shuffle:
            excerpt = indices[start_idx:start_idx + batchsize]
        else:
            excerpt = slice(start_idx, start_idx + batchsize)
        yield inputs[excerpt], targets[excerpt]

## Multi-Layer Perceptron

The MLP we will be making will be an MLP of two hidden layers of 80 units each, followed by a softmax output layer of 10 units. It applies 20% dropout to the input data and 50% dropout to the hidden layers. It is similar, but not fully equivalent to the smallest MLP in [Hinton2012](http://lasagne.readthedocs.io/en/latest/user/tutorial.html#hinton2012) (that paper uses different nonlinearities, weight initialization and training). We can easily change this by changing the values in the function.

## Glorot's scheme
We use the Glorot's scheme for initialization. More can be read [here](http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization)


In [14]:
def build_mlp(input_var=None):
    # This creates an MLP of two hidden layers of 80 units each, followed by
    # a softmax output layer of 10 units. It applies 20% dropout to the input
    # data and 50% dropout to the hidden layers.

    #The four numbers in the shape tuple represent, in order: (batchsize, channels, rows, columns)
    # Input layer, specifying the expected input shape of the network
    # (unspecified batchsize, 1 channel, 28 rows and 28 columns) and
    # linking it to the given Theano variable `input_var`, if any:
    l_in = lasagne.layers.InputLayer(shape=(None, 1, 28, 28),
                                     input_var=input_var)

    # Apply 20% dropout to the input data:
    l_in_drop = lasagne.layers.DropoutLayer(l_in, p=0.2)

    # Add a fully-connected layer of 80 units, using the linear rectifier, and
    # initializing weights with Glorot's scheme (which is the default anyway):
    l_hid1 = lasagne.layers.DenseLayer(
            l_in_drop, num_units=80,
            nonlinearity=lasagne.nonlinearities.rectify,
            W=lasagne.init.GlorotUniform())

    # We'll now add dropout of 50%:
    l_hid1_drop = lasagne.layers.DropoutLayer(l_hid1, p=0.5)

    # Another 80-unit layer:
    l_hid2 = lasagne.layers.DenseLayer(
            l_hid1_drop, num_units=80,
            nonlinearity=lasagne.nonlinearities.rectify)

    # 50% dropout again:
    l_hid2_drop = lasagne.layers.DropoutLayer(l_hid2, p=0.5)

    # Finally, we'll add the fully-connected output layer, of 10 softmax units:
    l_out = lasagne.layers.DenseLayer(
            l_hid2_drop, num_units=10,
            nonlinearity=lasagne.nonlinearities.softmax)

    # Each layer is linked to its incoming layer(s), so we only need to pass
    # the output layer to give access to a network in Lasagne:
    return l_out

In [15]:
# Prepare Theano variables for inputs and targets
input_var = T.tensor4('inputs')
target_var = T.ivector('targets')

# Create neural network model (depending on first command line parameter)
print("Building model and compiling functions...")
network = build_mlp(input_var)


Building model and compiling functions...


In [16]:
#Create a loss expression for training, i.e., a scalar objective we want
# to minimize (for our multi-class problem, it is the cross-entropy loss):
prediction = lasagne.layers.get_output(network)
loss = lasagne.objectives.categorical_crossentropy(prediction, target_var)
loss = loss.mean()

In [17]:
# Create update expressions for training, i.e., how to modify the
# parameters at each training step. Here, we'll use Stochastic Gradient
# Descent (SGD) with Nesterov momentum, but Lasagne offers plenty more.
params = lasagne.layers.get_all_params(network, trainable=True)
updates = lasagne.updates.nesterov_momentum(
        loss, params, learning_rate=0.01, momentum=0.9)

In [18]:
# Create a loss expression for validation/testing. The crucial difference
# here is that we do a deterministic forward pass through the network,
# disabling dropout layers.
test_prediction = lasagne.layers.get_output(network, deterministic=True)
test_loss = lasagne.objectives.categorical_crossentropy(test_prediction,
                                                        target_var)
test_loss = test_loss.mean()
# As a bonus, also create an expression for the classification accuracy:
test_acc = T.mean(T.eq(T.argmax(test_prediction, axis=1), target_var),
                  dtype=theano.config.floatX)
test_classification = T.argmax(test_prediction,axis=1)

In [20]:
# Compile a function performing a training step on a mini-batch (by giving
# the updates dictionary) and returning the corresponding training loss:
train_fn = theano.function([input_var, target_var], loss, updates=updates)

# Compile a second function computing the validation loss and accuracy:
val_fn = theano.function([input_var, target_var], [test_loss, test_acc])
# for true test set
predict_fn = theano.function([input_var],[test_prediction,test_classification])


In [21]:
# We iterate over epochs:
# setting the number of epochs to train the network over
num_epochs = 15
for epoch in range(num_epochs):
    # In each epoch, we do a full pass over the training data:
    train_err = 0
    train_batches = 0
    start_time = time.time()
    for batch in iterate_minibatches(X_train, y_train, 500, shuffle=True):
        inputs, targets = batch
        train_err += train_fn(inputs, targets)
        train_batches += 1
    # And a full pass over the validation data:
    val_err = 0
    val_acc = 0
    val_batches = 0
    for batch in iterate_minibatches(X_val, y_val, 500, shuffle=False):
        inputs, targets = batch
        err, acc = val_fn(inputs, targets)
        val_err += err
        val_acc += acc
        val_batches += 1
    # Then we print the results for this epoch:
    print("Epoch {} of {} took {:.3f}s".format(
        epoch + 1, num_epochs, time.time() - start_time))
    print("  training loss:\t\t{:.6f}".format(train_err / train_batches))
    print("  validation loss:\t\t{:.6f}".format(val_err / val_batches))
    print("  validation accuracy:\t\t{:.2f} %".format(
        val_acc / val_batches * 100))
# After training, we compute and print the test error:
#test_err = 0
#test_acc = 0
#test_batches = 0
#for batch in iterate_minibatches(X_test, y_test, 500, shuffle=False):
#    inputs, targets = batch
#    err, acc = val_fn(inputs, targets)
#    test_err += err
#    test_acc += acc
#    test_batches += 1
#print("Final results:")
#print("  test loss:\t\t\t{:.6f}".format(test_err / test_batches))
#print("  test accuracy:\t\t{:.2f} %".format(
#    test_acc / test_batches * 100))
# Optionally, you could now dump the network weights to a file like this:
# np.savez('model.npz', *lasagne.layers.get_all_param_values(network))
#
# And load them again later on like this:
# with np.load('model.npz') as f:
#     param_values = [f['arr_%d' % i] for i in range(len(f.files))]
# lasagne.layers.set_all_param_values(network, param_values)

TypeError: ('Bad input argument to theano function with name "<ipython-input-20-b98bcffd9b54>:3"  at index 0(0-based)', 'Wrong number of dimensions: expected 4, got 2 with shape (500L, 784L).')

## Performance on Kaggle Test Set

Now we will load the kaggle test csv file and find out the accuracy of the model in those data points.

In [53]:
#loading the data
import csv
csv_file = csv.reader(open("test.csv"))
header = csv_file.next()
data = []
for row in csv_file:
    data.append(row)
datax = np.array(data, dtype = "float32")
shared_xx = theano.shared(np.asarray(data,
                        dtype=theano.config.floatX),
                        borrow=True)

In [55]:
final_prediction,classes = predict_fn(shared_xx.get_value())
final_prediction[0]#len(final_prediction)
len(classes)

TypeError: ('Bad input argument to theano function with name "<ipython-input-47-b98bcffd9b54>:8"  at index 0(0-based)', 'Wrong number of dimensions: expected 4, got 2 with shape (28000L, 784L).')

In [66]:
np.shape(X_val)

(10000L, 1L, 28L, 28L)