### Building a Multi-Layer Perceptron in Keras

In this demo we'll build a Multi-Layer Perceptron (MLP) using the Keras library. The same functionality is also available in TensorFlow if you use ```tf.keras``` instead of importing ```keras``` directly.

Creating an MLP in Keras is simple and takes only a few steps.

Let's just import some stuff first.

In [1]:
import numpy as np

from keras.layers import Dense, Dropout
from keras.models import Sequential
from keras.datasets import boston_housing

from sklearn.metrics import mean_absolute_error

Using TensorFlow backend.


In [2]:
# Loading the dataset
(x_train, y_train), (x_test, y_test) = boston_housing.load_data()
x_train.shape, y_train.shape, x_test.shape, y_test.shape

((404, 13), (404,), (102, 13), (102,))

We'll use the Boston Housing dataset as an example. It is a regression problem: each sample has 13 features and a single continuous target (the median house price), so the last layer has exactly one neuron.

There are two common ways of creating models in Keras: the ```Sequential``` model and the functional API built around the ```Model``` class. This time we'll use the first one.

**Steps**
1. Initialise model as ```Sequential()```.
2. Keep adding layers to it using the ```add()``` method.
3. Compile model using the ```compile()``` method.
4. Fit the training set using the ```fit()``` method.
5. Make predictions with the ```predict()``` method, or evaluate the model directly with the ```evaluate()``` method.

**Some things to remember**
1. The input dimension of the first layer must equal the number of columns (features) in the dataset, and the number of units in the last layer depends on the problem (see the table below).
2. Hidden layers should usually use ```ReLU``` or ```LeakyReLU``` activations; they are a good default in practice.
3. Deeper networks can learn more complex functions, but they also overfit more easily, so on a small dataset a shallow network is often the better choice.
4. Some things about the architecture vary depending on the problem:

| **Problem type** | **Dimension of output layer** | **Loss** | **Activation of last layer** | **Some metrics** |
| --- |:---:|:---:|:---:|---:|
| Regression | 1 | ```mean_squared_error``` | None (linear) | mae, mse |
| Binary classification | 1 | ```binary_crossentropy``` | Sigmoid | accuracy |
| Multi-class classification | No. of classes | ```categorical_crossentropy``` | Softmax | accuracy |
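To make the list and table above concrete, here is a minimal NumPy sketch of the activations and losses they mention. These are plain re-implementations for illustration only (the `_np` suffix marks them as hypothetical helpers, not Keras functions); Keras computes its own versions internally, for example clipping probabilities inside its cross-entropy losses so that `log(0)` never occurs, so exact numbers can differ slightly.

```python
import numpy as np

EPS = 1e-7  # assumed small constant that keeps log() finite

def relu(z):
    # max(0, z) element-wise
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.3):
    # negative inputs are scaled by alpha instead of being zeroed;
    # 0.3 matches the default slope of Keras's LeakyReLU layer
    return np.where(z > 0, z, alpha * z)

def sigmoid(z):
    # squashes each value into (0, 1): output for binary classification
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # subtract the row max for numerical stability; each row sums to 1
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def mse_np(y_true, y_pred):
    # regression loss: mean of squared errors
    return np.mean((y_true - y_pred) ** 2)

def mae_np(y_true, y_pred):
    # regression metric: mean of absolute errors
    return np.mean(np.abs(y_true - y_pred))

def binary_crossentropy_np(y_true, p):
    # y_true holds 0/1 labels, p holds sigmoid outputs
    p = np.clip(p, EPS, 1 - EPS)
    return np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p)))

def categorical_crossentropy_np(y_true_onehot, p):
    # y_true_onehot is one-hot encoded, p holds softmax outputs (one row per sample)
    p = np.clip(p, EPS, 1.0)
    return np.mean(-np.sum(y_true_onehot * np.log(p), axis=-1))

y = np.array([3.0, 5.0])
y_hat = np.array([2.0, 7.0])
print(mse_np(y, y_hat))  # (1 + 4) / 2 = 2.5
print(mae_np(y, y_hat))  # (1 + 2) / 2 = 1.5
```

MSE squares each error, so it punishes large misses more heavily than MAE; that is why MSE is a common training loss while MAE is easier to read as a metric in the target's own units.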


Finally, Keras offers a wide variety of layers, but the ones we are interested in for this lab are ```Dense``` and ```Dropout```. ```Dense``` is the typical fully connected layer of an MLP, while ```Dropout``` is a layer that randomly drops units during training as a form of regularization (it will be explained later today or in the next lab).
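Under the hood, a `Dense` layer computes `activation(x @ W + b)`, and `Dropout` zeroes a random fraction of its inputs during training only. Keras uses "inverted" dropout: the surviving values are scaled by `1 / (1 - rate)` during training, so nothing needs to change at inference time. The NumPy sketch below re-implements both and stacks them with the same shapes as the model built next. The helper names (`init_dense`, `dense`, `dropout`, `forward`) are hypothetical; the initialisation mirrors Keras's `Dense` defaults (Glorot-uniform weights, zero biases), but this is not Keras's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_dense(n_in, n_out):
    # Glorot-uniform weights and zero biases (mirroring Keras's Dense defaults)
    limit = np.sqrt(6.0 / (n_in + n_out))
    W = rng.uniform(-limit, limit, size=(n_in, n_out))
    b = np.zeros(n_out)
    return W, b

def dense(x, W, b, activation=None):
    # fully connected layer: every input feeds every unit
    z = x @ W + b
    return np.maximum(0.0, z) if activation == "relu" else z

def dropout(x, rate, training):
    # inverted dropout: drop each value with probability `rate`, rescale survivors
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

# Same shapes as the model below: 13 inputs -> 13 -> 5 -> dropout -> 1
mlp_params = [init_dense(13, 13), init_dense(13, 5), init_dense(5, 1)]

def forward(x, training=False):
    h = dense(x, *mlp_params[0], activation="relu")
    h = dense(h, *mlp_params[1], activation="relu")
    h = dropout(h, 0.3, training)
    return dense(h, *mlp_params[2])  # no activation: linear output for regression

x = rng.normal(size=(4, 13))  # 4 fake samples with 13 features each
print(forward(x).shape)       # (4, 1)
```

Because of the `1 / (1 - rate)` rescaling, the expected value of each unit is the same in training and inference, which is why dropout can simply be switched off when predicting.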

In [3]:
# Build the architecture
model = Sequential()
model.add(Dense(13, activation='relu', input_shape=(x_train.shape[1],)))  # input: 13 features
model.add(Dense(5, activation='relu'))
model.add(Dropout(0.3))
model.add(Dense(1))

In [4]:
# Compile: Adam optimizer, MSE loss, MAE reported as an extra metric
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])

In [5]:
# Fit the training set on the model
model.fit(x_train, y_train, validation_split=0.2, epochs=15)

Train on 323 samples, validate on 81 samples
Epoch 1/15
Epoch 2/15
Epoch 3/15
Epoch 4/15
Epoch 5/15
Epoch 6/15
Epoch 7/15
Epoch 8/15
Epoch 9/15
Epoch 10/15
Epoch 11/15
Epoch 12/15
Epoch 13/15
Epoch 14/15
Epoch 15/15


<keras.callbacks.History at 0x7fe056783048>
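The log line "Train on 323 samples, validate on 81 samples" comes from `validation_split=0.2`. According to the Keras documentation, the validation data is taken from the *last* samples of `x` and `y`, before any shuffling. The sketch below reproduces that arithmetic; the helper name is hypothetical, and truncating with `int()` is an assumption about how Keras rounds, so check it against the version you run.

```python
import numpy as np

def keras_style_validation_split(x, y, validation_split):
    # Hypothetical helper: the first (1 - validation_split) fraction trains,
    # the tail of the arrays is held out for validation.
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])

# Stand-in arrays with the same shapes as the Boston Housing training set
x = np.zeros((404, 13))
y = np.zeros(404)
(x_tr, y_tr), (x_val, y_val) = keras_style_validation_split(x, y, 0.2)
print(len(x_tr), len(x_val))  # 323 81
```

One practical consequence: if your data is sorted (for example by price), shuffle it yourself before calling `fit()`, otherwise the validation set will not be representative of the training data.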

**What did I make?**

You can print the summary of the model using the ```summary()``` method as follows:

In [6]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_1 (Dense)              (None, 13)                182       
_________________________________________________________________
dense_2 (Dense)              (None, 5)                 70        
_________________________________________________________________
dropout_1 (Dropout)          (None, 5)                 0         
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 6         
Total params: 258
Trainable params: 258
Non-trainable params: 0
_________________________________________________________________


Right now our model is small and shallow, so you probably don't need the ```summary()``` method. For large networks, typically Convolutional Networks, you should pay attention to the number of parameters: it gives you a rough estimate of whether the model can be trained on your CPU in a reasonable amount of time and memory.
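The parameter counts in the summary follow directly from the layer sizes: a `Dense` layer with `n_in` inputs and `n_out` units has an `n_in × n_out` weight matrix plus `n_out` biases, and `Dropout` has no parameters at all. This short sketch (the helper name `dense_params` is hypothetical) reproduces the numbers above:

```python
def dense_params(n_in, n_out):
    # one weight per input-unit connection plus one bias per unit
    return n_in * n_out + n_out

# (inputs, units) for each Dense layer of the model above; Dropout adds 0
dense_layers = [(13, 13), (13, 5), (5, 1)]
counts = [dense_params(n_in, n_out) for n_in, n_out in dense_layers]
print(counts)       # [182, 70, 6]
print(sum(counts))  # 258
```

The same formula shows how quickly fully connected layers grow: a single `Dense(1000)` layer on 1,000 inputs already has 1,001,000 parameters.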

In [7]:
pred = model.predict(x_test)
y_test = np.reshape(y_test, (y_test.shape[0], 1))

In [8]:
print("MAE on test =", mean_absolute_error(y_test, pred))  # sklearn expects (y_true, y_pred)

MAE on test = 14.288117753801979
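Two details in the evaluation cells are worth spelling out. First, scikit-learn's `mean_absolute_error` takes `(y_true, y_pred)` in that order; MAE is symmetric, so the value is the same either way, but the order matters for other metrics such as `r2_score`. Second, `model.predict()` returns an array of shape `(n_samples, 1)`, while `y_test` starts with shape `(n_samples,)`. scikit-learn accepts that combination, but computing the error by hand in NumPy without a reshape silently broadcasts the two into an `(n, n)` matrix. The toy arrays below are made-up values, not the notebook's predictions:

```python
import numpy as np

y_true = np.array([20.0, 30.0, 40.0])        # shape (3,), like y_test
y_pred = np.array([[22.0], [27.0], [41.0]])  # shape (3, 1), like model.predict()

# Without reshaping, broadcasting compares every prediction with every target
wrong = np.abs(y_true - y_pred)
print(wrong.shape)  # (3, 3)

# Flatten the predictions (or reshape y_true to (n, 1)) before comparing
mae = np.mean(np.abs(y_true - y_pred.ravel()))
print(mae)  # (2 + 3 + 1) / 3 = 2.0
```

Since the Boston Housing targets are median house prices in thousands of dollars, an MAE of about 14.3 means the model's predictions are off by roughly $14,300 on average.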


Other important Keras features, with references (**you should absolutely read these**):
1. [Early Stopping](https://chrisalbon.com/deep_learning/keras/neural_network_early_stopping/)
2. [Model Checkpoint](https://chrisalbon.com/deep_learning/keras/neural_network_early_stopping/)
3. [Dropout](https://machinelearningmastery.com/dropout-regularization-deep-learning-models-keras/)
4. [Batch Normalization](https://www.dlology.com/blog/one-simple-trick-to-train-keras-model-faster-with-batch-normalization/)
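Of these, early stopping is the easiest to picture: training halts once the validation loss has not improved for `patience` consecutive epochs. In Keras this is the `EarlyStopping` callback (for example `EarlyStopping(monitor='val_loss', patience=3)` passed to `fit(..., callbacks=[...])`), and `ModelCheckpoint(..., save_best_only=True)` keeps the weights from the best epoch. The pure-Python sketch below mimics only the stopping rule on a made-up list of validation losses; the function name is hypothetical, and it ignores refinements such as the callback's `min_delta` threshold.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the (1-based) epoch at which training would stop."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # new best validation loss: reset the counter
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)  # never triggered: all epochs were run

# Hypothetical validation losses: improvement stalls after epoch 4
losses = [90.0, 60.0, 45.0, 40.0, 41.0, 42.5, 40.5, 39.0]
print(early_stopping_epoch(losses, patience=3))  # 7
```

Note that with `patience=3` training stops at epoch 7 and never sees the better loss at epoch 8; choosing `patience` is a trade-off between wasted epochs and stopping too early.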