# Implementing MLPs with Keras

### What are **MLPs**?

A Multilayer Perceptron (MLP) is a type of artificial neural network. It consists of an input layer, one or more hidden layers of nodes, and an output layer, where each layer is fully connected to the next one. Each node in a hidden layer computes a weighted sum of its inputs plus a bias (a linear transformation) and passes the result through a non-linear activation function.

A **Perceptron** is a single-layer neural network, while a **Multilayer Perceptron** is a neural network with multiple layers.
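
To see why the hidden layer and its non-linear activation matter, here is a minimal NumPy sketch of a tiny MLP that computes XOR, a function that a single-layer perceptron cannot represent because its two classes are not linearly separable. The weights are hand-picked for illustration rather than learned, and the function and variable names are hypothetical.

```python
import numpy as np

def relu(z):
    # ReLU activation: element-wise max(0, z)
    return np.maximum(0, z)

# Hand-picked (not learned) weights: 2 inputs -> 2 hidden ReLU units -> 1 output
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])  # shape (2 inputs, 2 hidden units)
b1 = np.array([0.0, -1.0])   # one bias per hidden unit
W2 = np.array([[1.0],
               [-2.0]])      # shape (2 hidden units, 1 output)
b2 = np.array([0.0])

def mlp_forward(X):
    # Hidden layer: linear transformation followed by a non-linear activation
    hidden = relu(X @ W1 + b1)
    # Output layer: a linear transformation is enough here
    return hidden @ W2 + b2

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
print(mlp_forward(X).ravel())  # [0. 1. 1. 0.]
```

Without the ReLU, the two layers would collapse into a single linear transformation, and no choice of weights could produce this output.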

**MLPs** are supervised learning models, which means that they are trained using labeled data.

## Building an Image Classifier Using the Sequential API

Here we are going to use **Fashion MNIST**, a drop-in replacement for the **MNIST** dataset. It has exactly the same format as MNIST (70,000 grayscale images of 28 × 28 pixels each, with 10 classes), but the images represent fashion items rather than handwritten digits.

### Load the dataset

Keras provides functions to fetch and load common datasets. **Fashion MNIST** is already shuffled and split into a training set (60,000 images) and a test set (10,000 images).

We'll hold out the last 5,000 images from the training set for validation.

```python
import tensorflow as tf

fashion_mnist = tf.keras.datasets.fashion_mnist.load_data()
(X_train_full, y_train_full), (X_test, y_test) = fashion_mnist
X_train, y_train = X_train_full[:-5000], y_train_full[:-5000]
X_valid, y_valid = X_train_full[-5000:], y_train_full[-5000:]
```



When **Fashion MNIST** is loaded with Keras, every image is represented as a 28 × 28 array, and the pixel intensities are represented as integers (from 0 to 255).

```python
X_train.shape
```

```
(55000, 28, 28)
```

For simplicity, we’ll scale the pixel intensities down to the 0–1 range by dividing them by 255.0 (this also converts them to floats):

```python
X_train, X_valid, X_test = X_train / 255., X_valid / 255., X_test / 255.
```

Since the labels are integers from 0 to 9, we also need the list of class names to know what each label represents.

```python
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]
```


For example, the first image in the training set represents an ankle boot:

```python
class_names[y_train[0]]
```

```
'Ankle boot'
```

### Creating the model using the Sequential API

```python
tf.random.set_seed(42)
model = tf.keras.Sequential()
model.add(tf.keras.layers.Input(shape=[28, 28]))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(300, activation="relu"))
model.add(tf.keras.layers.Dense(100, activation="relu"))
model.add(tf.keras.layers.Dense(10, activation="softmax"))
```

- `tf.random.set_seed(42)`: sets TensorFlow's global random seed so that, for example, the random initial weights are the same every time the notebook is run, which makes the results reproducible.
- `model = tf.keras.Sequential()`: creates a `Sequential` model, the simplest kind of Keras model: a linear stack of layers in which each layer feeds into the next. We then add the layers one by one with the `add()` method.
- `model.add(tf.keras.layers.Input(shape=[28, 28]))`: defines the shape of each input instance, here *(28, 28)*, which matches the dimensions of the images in the Fashion MNIST dataset (the batch size is not part of this shape). Knowing the input shape, Keras can infer the shapes of all the subsequent layers and create their weights right away, because the input layer is the starting point for the forward propagation of the data through the network.
- `model.add(tf.keras.layers.Flatten())`: reshapes each 28 × 28 input image into a 1D array of 784 values. This layer has no parameters; it only changes the shape of the data.
- `model.add(tf.keras.layers.Dense(300, activation="relu"))`: adds a dense hidden layer with 300 neurons and the ReLU activation function. A dense layer is fully connected, which means that each neuron is connected to every neuron in the previous layer.
> ReLU stands for Rectified Linear Unit, and it is a type of activation function used in neural networks. It is defined as f(x) = max(0, x): if the input x is greater than zero, the output is equal to x, and if x is less than or equal to zero, the output is zero.
- `model.add(tf.keras.layers.Dense(100, activation="relu"))`: adds a second hidden layer that takes the 300 outputs of the first dense layer and applies its own weights and biases to produce 100 activations. Stacking a second hidden layer allows the network to learn more complex relationships between the inputs and the outputs.
- `model.add(tf.keras.layers.Dense(10, activation="softmax"))`: adds the output layer, with 10 neurons (one per class in the Fashion MNIST dataset). The *softmax* activation turns the 10 outputs into a probability distribution over the classes: each value is between 0 and 1, and they sum to 1.
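
The forward pass computed by this stack of layers can be sketched in plain NumPy. This is a minimal sketch with randomly initialized, untrained weights, so its outputs are meaningless as predictions; it only shows how `Flatten`, the two ReLU `Dense` layers and the softmax output layer turn a batch of 28 × 28 inputs into one probability per class. The helper names and the initialization scheme are assumptions for illustration; Keras's `Dense` layers use their own default initializers.

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(z):
    # f(x) = max(0, x), applied element-wise
    return np.maximum(0, z)

def softmax(z):
    # Subtracting the row-wise max avoids overflow and does not change the result
    z = z - z.max(axis=1, keepdims=True)
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

def dense_params(n_in, n_out):
    # Hypothetical initialization: small random weights and zero biases
    return rng.normal(0.0, 0.05, size=(n_in, n_out)), np.zeros(n_out)

W1, b1 = dense_params(784, 300)
W2, b2 = dense_params(300, 100)
W3, b3 = dense_params(100, 10)

def forward(images):
    x = images.reshape(len(images), -1)  # Flatten: (batch, 28, 28) -> (batch, 784)
    h1 = relu(x @ W1 + b1)               # Dense(300, activation="relu")
    h2 = relu(h1 @ W2 + b2)              # Dense(100, activation="relu")
    return softmax(h2 @ W3 + b3)         # Dense(10, activation="softmax")

fake_images = rng.random((3, 28, 28))    # stand-in for three scaled images
proba = forward(fake_images)
print(proba.shape)                       # (3, 10)
print(proba.sum(axis=1))                 # each row sums to 1 (up to rounding)
```

Training consists of adjusting `W1` through `b3` so that the probability assigned to the correct class goes up.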

Alternatively, we can pass the list of layers directly to the `Sequential` constructor:

```python
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=[28, 28]),
    tf.keras.layers.Dense(300, activation="relu"),
    tf.keras.layers.Dense(100, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")
])
```

Once the model is built, its `summary()` method displays all of the model's layers, including each layer's name, its output shape, and its number of parameters. `None` in an output shape means that the batch size can be anything.

```python
model.summary()
```

```
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 784)               0         
                                                                 
 dense (Dense)               (None, 300)               235500    
                                                                 
 dense_1 (Dense)             (None, 100)               30100     
                                                                 
 dense_2 (Dense)             (None, 10)                1010      
                                                                 
Total params: 266,610
Trainable params: 266,610
Non-trainable params: 0
_________________________________________________________________
```


Dense layers often have a lot of parameters. For example, the first hidden layer has 784 × 300 connection weights, plus 300 bias terms, which adds up to 235,500 parameters. This gives the model quite a lot of flexibility to fit the training data, but it also means that the model runs the risk of overfitting, especially when you do not have a lot of training data.
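
The same arithmetic reproduces every row of the `summary()` output. Here is a short plain-Python sketch, assuming, as in the summary, that only the `Dense` layers have parameters (`Flatten` has none):

```python
# Layer sizes of the model above: 784 flattened inputs -> 300 -> 100 -> 10
layer_sizes = [784, 300, 100, 10]

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    # One weight per (input, neuron) pair, plus one bias per neuron
    n_params = n_in * n_out + n_out
    total += n_params
    print(f"Dense({n_out}): {n_in} * {n_out} + {n_out} = {n_params}")

print("Total params:", total)  # Total params: 266610
```

The printed per-layer counts are 235500, 30100 and 1010, matching the `Param #` column.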

We can use the `layers` attribute to get the model's list of layers, or the `get_layer()` method to access a layer by name.

```python
model.layers
```

```
[<keras.layers.reshaping.flatten.Flatten at 0x7f5de9a3e190>,
 <keras.layers.core.dense.Dense at 0x7f5d4df59f10>,
 <keras.layers.core.dense.Dense at 0x7f5d4c6c9d90>,
 <keras.layers.core.dense.Dense at 0x7f5d4c6c96a0>]
```

```python
hidden = model.layers[1]
hidden.name
```

```
'dense'
```

```python
model.get_layer('dense') is hidden
```

```
True
```

### Compiling the model

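After a model is created, we call its `compile()` method to specify the loss function and the optimizer to use. We can also specify a list of extra metrics to compute during training and evaluation.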

```python
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="sgd",
              metrics=["accuracy"])
```

In the code we pass the following parameters:

- `loss`: specifies the loss function that the model minimizes during training, that is, how it measures how far its predictions are from the targets. The `"sparse_categorical_crossentropy"` loss is commonly used for multi-class classification problems where each target is a single integer class index (here, 0 to 9). If the targets were one-hot vectors instead, we would use `"categorical_crossentropy"`.
- `optimizer`: `"sgd"` means that the model will be trained using Stochastic Gradient Descent, with Keras's default learning rate (0.01). SGD is a common choice for neural network training.
- `metrics`: specifies the metrics used to monitor the model's performance during training and evaluation. Here we use `"accuracy"` to track how often the model predicts the correct class.
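
To make the loss and the metric concrete, here is a small NumPy sketch of what they compute for a batch of predictions. The probabilities and labels are made-up values for illustration (4 classes instead of 10), and the sketch ignores implementation details such as the numerical safeguards Keras uses to avoid taking `log(0)`.

```python
import numpy as np

# Made-up predicted probabilities for 3 instances over 4 classes (each row sums to 1)
y_proba = np.array([[0.7, 0.1, 0.1, 0.1],
                    [0.2, 0.5, 0.2, 0.1],
                    [0.1, 0.1, 0.2, 0.6]])
y_true = np.array([0, 2, 3])  # integer class indices, as in Fashion MNIST

# Sparse categorical cross-entropy: -log of the probability given to the true class
sparse_ce = -np.log(y_proba[np.arange(len(y_true)), y_true])

# The same loss computed with one-hot targets (the categorical cross-entropy form)
one_hot = np.eye(y_proba.shape[1])[y_true]
categorical_ce = -(one_hot * np.log(y_proba)).sum(axis=1)

print(np.allclose(sparse_ce, categorical_ce))  # True
print(sparse_ce.mean())                        # mean loss over the batch

# Accuracy: fraction of instances whose most probable class is the true class
accuracy = (y_proba.argmax(axis=1) == y_true).mean()
print(accuracy)  # 2 of 3 predictions are correct
```

The "sparse" variant simply avoids building the one-hot matrix: it indexes the true class's probability directly.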

We can now train the model using its `fit()` method.

### Training and evaluating the model

```python
history = model.fit(X_train, y_train, epochs=30,
                    validation_data=(X_valid, y_valid))
```

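
Under the hood, `fit()` runs mini-batch gradient descent: by default it shuffles the training set at each epoch, splits it into batches of 32 instances, and performs one optimizer step per batch. The sketch below is a toy NumPy illustration of that loop on a made-up linear-regression problem, not Keras's actual implementation; the data, model, and learning rate are assumptions chosen so the example converges quickly.

```python
import math
import numpy as np

# With the default batch_size of 32, one epoch over 55,000 images takes:
print(math.ceil(55_000 / 32))  # 1719 gradient steps per epoch

# Toy data: y = 3x + 2 plus a little noise
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=1_000)
y = 3 * X + 2 + rng.normal(0, 0.1, size=1_000)

w, b = 0.0, 0.0
learning_rate, batch_size = 0.1, 32

for epoch in range(30):
    order = rng.permutation(len(X))            # reshuffle the data every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]  # indices of one mini-batch
        error = (w * X[idx] + b) - y[idx]
        # Gradients of the mean squared error with respect to w and b
        grad_w = 2 * np.mean(error * X[idx])
        grad_b = 2 * np.mean(error)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(round(w, 2), round(b, 2))  # close to 3 and 2
```

Keras does the same thing for our MLP, except that the gradients of the cross-entropy loss with respect to all 266,610 parameters are computed automatically by backpropagation.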


### Using the model to make predictions

Now we can use the model's `predict()` method to make predictions on new instances. For this example, we'll treat the first three instances of the test set as new instances:

```python
X_new = X_test[:3]
y_proba = model.predict(X_new)
y_proba.round(2)
```

```
array([[0.  , 0.  , 0.  , 0.  , 0.  , 0.01, 0.  , 0.01, 0.  , 0.98],
       [0.  , 0.  , 1.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ],
       [0.  , 1.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  , 0.  ]],
      dtype=float32)
```

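For each instance, the model estimates one probability per class, from class 0 to class 9. For example, for the first image it estimates that the probability of class 9 (ankle boot) is 98%, that the probabilities of classes 5 (sandal) and 7 (sneaker) are 1% each, and that the other classes are very unlikely. To get the predicted class of each instance, we take the index of the highest probability with `np.argmax()`: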

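```python
import numpy as np

# The predicted class is the index of the highest estimated probability;
# reuse y_proba from the previous cell instead of calling predict() again
classes_y = np.argmax(y_proba, axis=1)
```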



```python
classes_y
```

```
array([9, 2, 1])
```

```python
np.array(class_names)[classes_y]
```

```
array(['Ankle boot', 'Pullover', 'Trouser'], dtype='<U11')
```