# Multilayer Perceptron (MLP)

A multilayer perceptron (MLP) extends the single-layer perceptron with one or more hidden layers between the input and output layers. Because the hidden units apply non-linear activation functions, the mapping from inputs to outputs is non-linear. An MLP is trained with the backpropagation algorithm, which consists of two phases: in the forward phase, activations are propagated from the input layer to the output layer; in the backward phase, the error between the network's actual output and the desired target value is propagated backwards to update the weights and biases.
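The two phases can be sketched in plain NumPy. This is a minimal illustration, not how scikit-learn or Keras implement training: the toy data (the logical OR function), the single hidden layer of 4 units, the squared-error loss, and the learning rate are all assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed for illustration): 4 samples, 2 features, target = logical OR
X_toy = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y_toy = np.array([[0.], [1.], [1.], [1.]])

# One hidden layer with 4 units; sizes are arbitrary
W1 = rng.normal(scale=0.5, size=(2, 4))
b1 = np.zeros((1, 4))
W2 = rng.normal(scale=0.5, size=(4, 1))
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X):
    """Forward phase: propagate activations from input to output."""
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    return h, out

def mse(out, y):
    return float(np.mean((out - y) ** 2))

lr = 0.5  # learning rate (assumed)
loss_before = mse(forward(X_toy)[1], y_toy)

for _ in range(200):
    h, out = forward(X_toy)
    # Backward phase: gradients of 0.5 * sum((out - y)^2),
    # propagated from the output layer back to the hidden layer
    d_out = (out - y_toy) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent update of weights and biases
    W2 -= lr * (h.T @ d_out)
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X_toy.T @ d_h)
    b1 -= lr * d_h.sum(axis=0, keepdims=True)

loss_after = mse(forward(X_toy)[1], y_toy)
print("loss before:", loss_before, "loss after:", loss_after)
```

The error term `d_out` is computed at the output first and then reused to obtain `d_h`, which is what "propagating the error backwards" refers to.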

#### Load dataset

In [1]:
import numpy as np
import pandas as pd

# Load data
data=pd.read_csv('HR_comma_sep.csv')

data.head()

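| | satisfaction_level | last_evaluation | number_project | average_montly_hours | time_spend_company | Work_accident | left | promotion_last_5years | sales | salary |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.38 | 0.53 | 2 | 157 | 3 | 0 | 1 | 0 | sales | low |
| 1 | 0.80 | 0.86 | 5 | 262 | 6 | 0 | 1 | 0 | sales | medium |
| 2 | 0.11 | 0.88 | 7 | 272 | 4 | 0 | 1 | 0 | sales | medium |
| 3 | 0.72 | 0.87 | 5 | 223 | 5 | 0 | 1 | 0 | sales | low |
| 4 | 0.37 | 0.52 | 2 | 159 | 3 | 0 | 1 | 0 | sales | low |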


#### Preprocessing: Label Encoding

In [4]:
# Import LabelEncoder
from sklearn import preprocessing

# Creating labelEncoder
le = preprocessing.LabelEncoder()

# Converting string labels into numbers.
data['salary']=le.fit_transform(data['salary'])
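It may help to see which integers LabelEncoder actually assigns. The sketch below uses a small made-up series rather than the HR data; the explicit `order` mapping is a hypothetical alternative for cases where the natural order low < medium < high should be preserved.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy values mirroring the salary column (not the real dataset)
salary_demo = pd.Series(['low', 'medium', 'high', 'low'])

demo_le = LabelEncoder()
codes = demo_le.fit_transform(salary_demo)

# Classes are stored in sorted (alphabetical) order, so 'high' gets 0
print(list(demo_le.classes_))
print(list(codes))

# Hypothetical alternative: an explicit mapping that keeps the ordinal order
order = {'low': 0, 'medium': 1, 'high': 2}
ordinal_codes = salary_demo.map(order)
print(list(ordinal_codes))
```

Because the encoder sorts the labels, the resulting codes do not follow the salary ranking; whether that matters depends on the model being trained.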


#### Split the dataset

In [5]:
# Splitting data into features (X) and target (y)
X=data[['satisfaction_level', 'last_evaluation', 'number_project',
        'average_montly_hours', 'time_spend_company', 'Work_accident',
        'promotion_last_5years','salary']]
y=data['left']

# Import train_test_split function
from sklearn.model_selection import train_test_split

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)  # 70% training and 30% test

#### Build Classification Model
First, import MLPClassifier and create a classifier object with MLPClassifier(). Then fit the model on the training set with fit() and generate predictions for the test set with predict().

##### Parameters:
- hidden_layer_sizes: a tuple in which each element represents one hidden layer and its value is the number of neurons in that layer.
- learning_rate_init: the initial learning rate, which controls the step size used when updating the weights.
- activation: the activation function for the hidden layers. The options are identity, logistic, tanh, and relu; relu is the default.
- random_state: the seed for the random number generator used to initialize the weights and biases.
- verbose: whether to print progress messages to standard output.
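To make the effect of hidden_layer_sizes concrete, the following sketch fits a small model on synthetic data and inspects the shapes of the learned weight matrices and bias vectors. The data from `make_classification`, the `demo_` variable names, and `max_iter=50` are assumptions for illustration only; the convergence warning is silenced because the goal here is to look at shapes, not accuracy.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Synthetic binary problem with 8 features (stand-in for the real data)
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

demo_clf = MLPClassifier(hidden_layer_sizes=(6, 5),
                         activation='relu',
                         learning_rate_init=0.01,
                         random_state=5,
                         max_iter=50)

with warnings.catch_warnings():
    warnings.simplefilter('ignore', ConvergenceWarning)
    demo_clf.fit(X_demo, y_demo)

# One weight matrix and one bias vector per layer transition
weight_shapes = [w.shape for w in demo_clf.coefs_]
bias_shapes = [b.shape for b in demo_clf.intercepts_]
print(weight_shapes)  # 8 inputs -> 6 -> 5 -> 1 output unit
print(bias_shapes)
```

The fitted attributes `coefs_` and `intercepts_` hold the weights and biases, so the same two lines can be used to display them after training any MLPClassifier.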

In [6]:
# Import MLPClassifier
from sklearn.neural_network import MLPClassifier

# Create model object
clf = MLPClassifier(hidden_layer_sizes=(6,5),
                    random_state=5,
                    verbose=True,
                    learning_rate_init=0.01)

# Fit data onto the model
clf.fit(X_train,y_train)

Iteration 1, loss = 0.58375699
Iteration 2, loss = 0.55089476
Iteration 3, loss = 0.53811793
Iteration 4, loss = 0.51521772
Iteration 5, loss = 0.50091513
Iteration 6, loss = 0.50279914
Iteration 7, loss = 0.47967376
Iteration 8, loss = 0.48207220
Iteration 9, loss = 0.48021354
Iteration 10, loss = 0.47412735
Iteration 11, loss = 0.47203837
Iteration 12, loss = 0.46928072
Iteration 13, loss = 0.45686180
Iteration 14, loss = 0.45709884
Iteration 15, loss = 0.45132223
Iteration 16, loss = 0.46630415
Iteration 17, loss = 0.45162435
Iteration 18, loss = 0.46152781
Iteration 19, loss = 0.45317280
Iteration 20, loss = 0.45251283
Iteration 21, loss = 0.46049935
Iteration 22, loss = 0.45274210
Iteration 23, loss = 0.45780399
Iteration 24, loss = 0.45557925
Iteration 25, loss = 0.45298486
Iteration 26, loss = 0.44800479
Iteration 27, loss = 0.44150279
Iteration 28, loss = 0.43542267
Iteration 29, loss = 0.42998854
Iteration 30, loss = 0.45012144
Iteration 31, loss = 0.43029912
Iteration 32, los

MLPClassifier(hidden_layer_sizes=(6, 5), learning_rate_init=0.01,
              random_state=5, verbose=True)

#### Make Prediction and Evaluate the Model

In [7]:
# Make prediction on test dataset
ypred=clf.predict(X_test)

# Import accuracy score 
from sklearn.metrics import accuracy_score

# Calculate accuracy
accuracy_score(y_test,ypred)

0.9148888888888889
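Beyond accuracy, the same kind of predictions can be summarized with precision, recall, and F1-score. The sketch below uses made-up label arrays (not the HR results above) so the numbers can be checked by hand.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Made-up labels for illustration: 1 = left, 0 = stayed
y_true_demo = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_demo = [1, 0, 0, 1, 0, 1, 1, 1]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]]
cm = confusion_matrix(y_true_demo, y_pred_demo)

acc = accuracy_score(y_true_demo, y_pred_demo)     # (TP + TN) / total
prec = precision_score(y_true_demo, y_pred_demo)   # TP / (TP + FP)
rec = recall_score(y_true_demo, y_pred_demo)       # TP / (TP + FN)
f1 = f1_score(y_true_demo, y_pred_demo)            # harmonic mean of the two

print(cm)
print(acc, prec, rec, f1)
```

With 3 true positives, 2 false positives, and 1 false negative, precision is 3/5 and recall is 3/4, which shows how the two can differ even when accuracy looks reasonable.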

### Using the TensorFlow library

In [1]:
# importing modules
import tensorflow as tf
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Activation
import matplotlib.pyplot as plt


In [2]:
# Download the dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

In [3]:
# Cast the records into float values
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# normalize image pixel values by dividing by 255
x_train = x_train / 255.
x_test = x_test / 255.


In [4]:
print("Training features:", x_train.shape)
print("Test features:", x_test.shape)
print("Training labels:", y_train.shape)
print("Test labels:", y_test.shape)


Training features: (60000, 28, 28)
Test features: (10000, 28, 28)
Training labels: (60000,)
Test labels: (10000,)


- The Sequential model builds a network layer by layer, which is what a multilayer perceptron needs. It is limited to single-input, single-output stacks of layers.
- Flatten reshapes each input into a one-dimensional vector without affecting the batch size.
- Activation applies an activation function, such as sigmoid, as a separate layer. In the model below the activation is instead passed to each Dense layer through its activation argument.
- The first two Dense layers are fully connected and act as the hidden layers.
- The last Dense layer is the output layer. It contains 10 neurons, one per digit class, and determines which category the image belongs to.
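As a quick check on the size of such a network, the number of trainable parameters can be worked out by hand: each Dense layer has one weight per input-output pair plus one bias per output. The sketch below assumes the layer widths 256, 128, and 10 used in this notebook and needs no TensorFlow.

```python
# Layer widths: flattened 28x28 input, two hidden layers, 10 output classes
layer_sizes = [28 * 28, 256, 128, 10]

# Each Dense layer: (inputs * outputs) weights + (outputs) biases
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)
print(total_params)
```

Flatten contributes no parameters; it only changes the shape of the data.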

In [5]:
# Form the Input, hidden, and output layers

model = Sequential([
    # flatten each 28x28 image into a vector of 784 values
    Flatten(input_shape=(28, 28)),
    # dense layer 1
    Dense(256, activation='sigmoid'),
    # dense layer 2
    Dense(128, activation='sigmoid'),
    # output layer: softmax turns the 10 scores into class probabilities
    Dense(10, activation='softmax'),
])


In [6]:
# Compile the model

model.compile(optimizer='adam',loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])


- Epochs is the number of complete passes over the training data; each pass consists of forward and backward passes over every batch.
- Batch size is the number of samples processed per gradient update. If it is unspecified, batch_size defaults to 32.
- Validation split is a float between 0 and 1. The model sets apart this fraction of the training data and uses it to evaluate the loss and any model metrics at the end of each epoch. The model is not trained on this data.
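The interaction of these three settings is simple arithmetic. The sketch below assumes the values used in this notebook (60000 MNIST training images, validation_split=0.2, batch_size=2000, epochs=10) and only counts samples and updates; it does not call Keras.

```python
import math

n_samples = 60000        # MNIST training images
validation_split = 0.2
batch_size = 2000
epochs = 10

# Samples held out for validation and samples left for training
n_val = round(n_samples * validation_split)
n_train = n_samples - n_val

# Gradient updates per epoch and over the whole run
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(n_train, n_val, steps_per_epoch, total_updates)
```

A large batch size such as 2000 therefore means relatively few weight updates per epoch, which is one reason the choice of batch size affects how quickly the loss falls.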

In [7]:
# Fit the model

model.fit(x_train, y_train, epochs=10,
          batch_size=2000,
          validation_split=0.2)


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x25081d1d4f0>

In [None]:
# Find Accuracy of the model

results = model.evaluate(x_test, y_test, verbose = 0)
print('test loss, test acc:', results)


# Batch Normalization

In his 1998 paper Efficient BackProp, Yann LeCun highlighted the importance of normalizing the inputs. Batch normalization extends this idea to the hidden layers: it is a technique for training very deep neural networks that normalizes the inputs to a layer over each mini-batch. This stabilizes the learning process and can substantially reduce the number of training epochs required to train deep neural networks.
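The normalization step itself is short enough to sketch in NumPy. This covers only the training-time forward computation on one mini-batch; the function name `batch_norm_forward`, the `eps` value, and the random batch are assumptions for illustration. The Keras layer additionally learns `gamma` and `beta` and keeps moving averages for use at inference time.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mean = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                       # per-feature variance over the batch
    x_hat = (x - mean) / np.sqrt(var + eps)   # zero mean, unit variance
    return gamma * x_hat + beta               # learnable scale and shift

rng = np.random.default_rng(0)
# Made-up mini-batch: 64 samples, 4 features, far from zero mean / unit variance
batch = rng.normal(loc=5.0, scale=3.0, size=(64, 4))

normalized = batch_norm_forward(batch, gamma=np.ones(4), beta=np.zeros(4))
print(normalized.mean(axis=0))
print(normalized.std(axis=0))
```

With `gamma` set to ones and `beta` to zeros, each feature of the output has approximately zero mean and unit standard deviation regardless of the scale of the incoming batch.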


In [19]:
from tensorflow.keras.layers import Dense, BatchNormalization

model = Sequential([
    Flatten(input_shape=(28, 28)),
    # dense layer 1
    Dense(256, activation='sigmoid'),
    BatchNormalization(),
    # dense layer 2
    Dense(128, activation='sigmoid'),
    BatchNormalization(),
    # output layer: softmax turns the 10 scores into class probabilities
    Dense(10, activation='softmax'),
])


## Assignment

### Wine dataset

These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.

##### Attribute information
- Class: target, the cultivar the wine comes from, discrete, values: 1, 2, 3
- Alcohol: the alcohol content in the wine, numerical
- Malic acid: the malic acid content in the wine, numerical
- Ash: the ash content in the wine, numerical
- Alcalinity of ash: the alkalinity of the ash in the wine, numerical
- Magnesium: the magnesium content in the wine, numerical
- Total phenols: the total amount of phenols in the wine, numerical
- Flavanoids: the flavanoids content in the wine, numerical
- Nonflavanoid phenols: the content of nonflavanoid phenols in the wine, numerical
- Proanthocyanins: the content of proanthocyanins in the wine, numerical
- Color intensity: the color intensity of the wine, numerical
- Hue: the hue of the wine, numerical
- OD280/OD315 of diluted wines: a standard measure of the quality of wine, numerical
- Proline: the proline content in the wine, numerical

##  Question 1
- Perform the required preprocessing, split the data 70/30 into training and test sets, and build a classifier using the Multi-Layer Perceptron Classifier model. Use three layers, each with as many neurons as there are features.
- Tabulate the accuracy, precision, recall and F1-score.
- Display the MLP weights and biases after training your model.

###  The CIFAR-10 dataset

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

##  Question 2

- Using the CIFAR-10 dataset, build a Multi-Layer Perceptron Classifier model both with and without batch normalization.
- Tabulate the accuracy, precision, recall and F1-score.
