## Keras - Multilayer Perceptron
### Generating an end to end simple MLP for MNIST data classification
New to MLPs? The Wikipedia article on the multilayer perceptron is a good starting point: https://en.wikipedia.org/wiki/Multilayer_perceptron
<img src="./images/MLP-1.png" width="600" height="400">
Now that we have looked into the various layers available and have made ourselves comfortable with prototxt files, we will move a step ahead: design an MLP and write the code that trains it on MNIST data. Our focus will be on building layers with a given number of neurons, selecting appropriate non-linear activation units, and using a standard classification loss function (cross entropy). The network is trained by backpropagation with a gradient-based optimizer (the code below uses RMSprop, an adaptive-learning-rate variant of mini-batch gradient descent). Participants are expected to modify the parameters and observe the impact on performance and training time.
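
Before turning to Keras, it may help to see what an MLP computes. The following is a minimal NumPy sketch of a forward pass through one hidden layer, not the Keras implementation used in this notebook. The toy sizes (4 samples, 16 hidden units) and the names `W1`, `b1`, `W2`, `b2` are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Non-linear activation: keeps positive values, zeroes out negatives.
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability, then normalize.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical toy sizes: 4 samples, 784 inputs, 16 hidden units, 10 classes.
x = rng.random((4, 784))
W1 = rng.normal(0.0, 0.01, (784, 16))
b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.01, (16, 10))
b2 = np.zeros(10)

hidden = relu(x @ W1 + b1)         # fully connected layer + non-linearity
probs = softmax(hidden @ W2 + b2)  # output layer: one probability per class

print(probs.shape)        # (4, 10)
print(probs.sum(axis=1))  # each row sums to 1 (up to rounding)
```

Training adjusts `W1`, `b1`, `W2` and `b2` so that the predicted probabilities match the labels; Keras handles that part for us in the steps below.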

Here is one possible implementation:

Reference: https://github.com/fchollet/keras/issues/112#issuecomment-101079731

#### The basic steps involved would be:

1. Importing the required modules
2. Loading and pre-processing the data
3. Building the sequential net
4. Compiling the net
5. Training the net
6. Plotting the loss and accuracy (not required for actual computation but helps in understanding the process)
7. Testing the net

### 1. Importing the required stuff

In [None]:
from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

#Load the dataset
from keras.datasets import mnist
# Import the sequential module from keras
from keras.models import Sequential
# Import the layers you wish to use in your net
from keras.layers.core import Dense, Dropout, Activation
# Import the optimization algorithms that you wish to use
from keras.optimizers import SGD, Adam, RMSprop
# Import other utilities that help in data formatting etc.
from keras.utils import np_utils

Set the parameters for the network:

In [None]:
batch_size = 512
nb_classes = 10
nb_epoch = 20

### 2. Load input

#### About MNIST
The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems. The database is also widely used for training and testing in the field of machine learning. Sample images may look like:
<img src="./images/mnist_digits.png" width="400" height="400">

To know more: https://en.wikipedia.org/wiki/MNIST_database

Preprocess the data if required. Here we have normalized the data to fall between 0 and 1.

In [None]:
# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# Preprocess the data
# Flatten each 28x28 image into a 784-dimensional vector, the shape the network accepts.
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)

# Here we convert the data type of data to 'float32'
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')

# We normalize the data to lie in the range 0 to 1.
X_train /= 255
X_test /= 255
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)
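
To make the two preprocessing steps concrete, here is a small NumPy sketch on made-up data. The arrays `pixels` and `labels` and the class count of 4 are assumptions for illustration only. The one-hot step imitates the kind of binary class matrix that `np_utils.to_categorical` returns; it is not the Keras implementation itself.

```python
import numpy as np

# Made-up stand-ins for MNIST pixel values (0..255) and integer labels.
pixels = np.array([[0, 128, 255], [255, 0, 64]], dtype='uint8')
labels = np.array([3, 0])
n_toy_classes = 4  # hypothetical; MNIST has 10

# Step 1: convert to float32 and scale into the range [0, 1].
scaled = pixels.astype('float32') / 255

# Step 2: one-hot encode by picking rows of the identity matrix.
one_hot = np.eye(n_toy_classes, dtype='float32')[labels]

print(scaled.min(), scaled.max())  # 0.0 1.0
print(one_hot)
# [[0. 0. 0. 1.]
#  [1. 0. 0. 0.]]
```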

### 3. Building a sequential net
Now we shall write the layers of the network. Initialise a sequential graph as model.

In [None]:
model = Sequential()

Add a fully connected layer that maps the 784 inputs to 512 units:

In [None]:
model.add(Dense(512, input_shape=(784,)))

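On top of it, add a non-linear activation function. (Dropout is imported above but is not applied in this cell; it can be added after the activation as a variation.)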

In [None]:
model.add(Activation('relu'))

Now add the hidden layer of size 512 and a non-linearity

In [None]:
model.add(Dense(512))
model.add(Activation('relu'))

Finally, add an output layer with one unit per class, followed by a softmax:

In [None]:
model.add(Dense(10))
model.add(Activation('softmax'))
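
As a sanity check on the architecture, the number of trainable parameters can be worked out by hand. The sketch below assumes each `Dense` layer holds a weight matrix plus one bias per output unit; the helper `dense_params` is a hypothetical name used only here.

```python
# Layer widths of the network built above: input, two layers of 512, output.
layer_sizes = [784, 512, 512, 10]

def dense_params(n_in, n_out):
    # Weight matrix (n_in x n_out) plus one bias per output unit.
    return n_in * n_out + n_out

counts = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]
total = sum(counts)

print(counts)  # [401920, 262656, 5130]
print(total)   # 669706
```

Most of the parameters sit in the first layer, which is worth keeping in mind when experimenting with the hidden layer size.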

### 4. Compile the net
In Keras, once a model is defined you need to compile it. You need to specify the <b>loss function</b> (here 'categorical_crossentropy'), an <b>optimizer</b> (here RMSprop; SGD and Adam are imported above and can be substituted), and a metric of evaluation (here 'accuracy').

Once your model looks good, configure its learning process with .compile():

In [None]:
model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])
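
To see what the loss and the metric measure, here is a NumPy sketch of categorical cross entropy and accuracy on a made-up batch of two samples. The function names and the clipping constant `eps` are assumptions for illustration; Keras computes these quantities internally and its implementation may differ in detail.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip predictions so log() never sees exactly 0.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # Mean over samples of -sum(true * log(predicted)).
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

def accuracy(y_true, y_pred):
    # Fraction of samples whose most probable class matches the label.
    return float(np.mean(y_true.argmax(axis=1) == y_pred.argmax(axis=1)))

# Made-up one-hot labels and predicted probabilities for 3 classes.
y_true = np.array([[1, 0, 0], [0, 1, 0]], dtype='float32')
y_pred = np.array([[0.7, 0.2, 0.1], [0.3, 0.4, 0.3]], dtype='float32')

loss = categorical_crossentropy(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(loss)
print(acc)  # 1.0
```

Both samples are classified correctly, yet the loss is not zero: cross entropy also penalizes low confidence in the correct class.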

### 5. Training the net

In [None]:
history = model.fit(X_train, Y_train,
                    batch_size=batch_size, nb_epoch=nb_epoch,
                    verbose=1, validation_data=(X_test, Y_test))
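
It can be useful to know how many weight updates this call performs. The arithmetic below is a sketch assuming one update per mini-batch and a final, smaller batch for the leftover samples.

```python
import math

n_samples = 60000  # MNIST training images
batch_size = 512
nb_epoch = 20

batches_per_epoch = math.ceil(n_samples / batch_size)
last_batch_size = n_samples - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * nb_epoch

print(batches_per_epoch)  # 118
print(last_batch_size)    # 96
print(total_updates)      # 2360
```

A smaller batch size means more updates per epoch, which is one reason it changes both training time and the shape of the loss curve.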

### 6. Plotting accuracy and loss

We might be interested in seeing the accuracy or convergence of our model. To do so:

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
# list all data in history
print(history.history.keys())
# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

### 7. Testing the net
Evaluate the trained model on the test set and print the loss and accuracy.

In [None]:
score = model.evaluate(X_test, Y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Now to the fun parts...
### 8. Exercise

Q1: Can you make the confusion matrix for the above network?

(Hint: What is a confusion matrix? Find out here: http://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/)
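
As a further hint, here is a minimal sketch of a confusion matrix on made-up labels for three classes. It does not touch the MNIST model; the label arrays and the helper name `confusion_matrix` are assumptions for illustration. Rows index the true class and columns the predicted class.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, n_classes):
    # cm[t, p] counts samples of true class t that were predicted as class p.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true_labels, predicted_labels):
        cm[t, p] += 1
    return cm

# Made-up integer labels, not model output.
true_labels = np.array([0, 1, 2, 2, 1, 0])
predicted_labels = np.array([0, 2, 2, 2, 1, 1])

cm = confusion_matrix(true_labels, predicted_labels, 3)
print(cm)
# [[1 1 0]
#  [0 1 1]
#  [0 0 2]]
```

For the exercise, the predicted labels would come from the trained network and the true labels from `y_test`.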

In [None]:
# Your *hard work* here

Q2: Try out the following variations and see how they affect loss and accuracy:
- Batch size
- Size of hidden layer
- Choice of activation function ('relu', 'sigmoid', 'tanh')

In [None]:
# Your *hard work* here

Q3: <b>Challenge</b>: Try classification on XOR data using similar MLP.

The problem can be stated as follows. Build a neural network that will produce the following truth table, called the 'exclusive or' or 'XOR' (either A or B but not both):

<img src="./images/xor-table.png" width="200" height="200">

Here (X,Y) will be the input and the label would be X xor Y.

Hint: You will have to downsize the network described above, as XOR is just a four-point, two-dimensional, two-class dataset, as opposed to MNIST, which has tens of thousands of samples with comparatively higher dimension and ten classes. The architecture, however, remains the same.
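
To get started, the four XOR points and their labels can be written out directly. This is a sketch of the data only; the variable names are assumptions, and the one-hot step imitates the label format used for MNIST above.

```python
import numpy as np

# The four input points of the XOR truth table.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype='float32')

# Label is 1 when exactly one of the two inputs is 1.
y = np.logical_xor(X[:, 0], X[:, 1]).astype(int)

# One-hot labels for a two-class softmax output.
Y = np.eye(2, dtype='float32')[y]

print(y)  # [0 1 1 0]
print(Y.shape)  # (4, 2)
```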

In [None]:
# Your *hard work* here

Plot the decision boundary and verify that it looks similar to:
<img src="./images/xor.png" width="200" height="200">

Functions that might come in handy: meshgrid, contour, linspace. (Moreover, Google exists for a reason!)
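
One way these functions fit together is sketched below. The function `toy_classifier` is a hypothetical stand-in for a trained model's predictions, so the boundary it produces is hard-coded rather than learned; the grid range and resolution are also assumptions.

```python
import numpy as np

def toy_classifier(points):
    # Hypothetical stand-in for a trained model: predicts class 1 when
    # exactly one coordinate exceeds 0.5, class 0 otherwise.
    return np.logical_xor(points[:, 0] > 0.5, points[:, 1] > 0.5).astype(int)

# Evenly spaced coordinates covering the four XOR points with some margin.
xs = np.linspace(-0.5, 1.5, 50)
ys = np.linspace(-0.5, 1.5, 50)

# All (x, y) combinations as two 50x50 arrays, flattened into a point list.
xx, yy = np.meshgrid(xs, ys)
grid = np.column_stack([xx.ravel(), yy.ravel()])

# Predict a class for every grid point and reshape back to the grid.
zz = toy_classifier(grid).reshape(xx.shape)
print(zz.shape)  # (50, 50)

# With matplotlib available, the regions can then be drawn with:
#   plt.contourf(xx, yy, zz, alpha=0.3)
```

For the exercise, replace `toy_classifier` with the class predictions of your trained XOR network on `grid`.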