# Multilayer Neural Network

---
## Table of contents
* [Introduction](#Introduction)
* [Types of Neural Network](#type)
* [Multi-Layer Neural Network](#multi)
* [Formula for Multi-Layered Neural Network](#formula)
* [Application](#app)



---
## Introduction <a class="anchor" id="Introduction"></a>

A Neural Network is a set of algorithms that tries to recognize the underlying relationships in a data set through a process that mimics the operation of the human brain. The term can therefore refer to systems of neurons that are either organic or artificial in nature. A neural network adapts to changing input, so it can generate the best possible result without the output criteria having to be redesigned.

---
## Types of Neural Network<a class="anchor" id="type"></a >

Neural Networks can be classified into multiple types based on their layers, depth, activation filters, structure, the neurons used, neuron density, data flow, and so on. The types of Neural Networks are as follows:

* Perceptron
* Multi-Layer Perceptron or Multi-Layer Neural Network
* Feed Forward Neural Networks
* Convolutional Neural Networks
* Radial Basis Function Neural Networks
* Recurrent Neural Networks
* Sequence to Sequence Model
* Modular Neural Network

---
## Multi-Layer Neural Network<a class="anchor" id="multi"></a >

Strictly speaking, a fully connected Multi-Layered Neural Network is known as a Multi-Layer Perceptron. A Multi-Layered Neural Network consists of multiple layers of artificial neurons or nodes. In recent times, most networks are multi-layered rather than single-layer. The following diagram is a visualization of a multi-layer neural network.
![image](https://media.geeksforgeeks.org/wp-content/uploads/20200702205951/nn.PNG)

### Explanation: 
Here the nodes marked as “1” are known as bias units. The leftmost layer or Layer 1 is the input layer, the middle layer or Layer 2 is the hidden layer, and the rightmost layer or Layer 3 is the output layer. We can say that the above diagram has 3 input units (excluding the bias unit), 1 output unit, and 3 hidden units.
A Multi-layered Neural Network is the typical example of a Feed Forward Neural Network. The number of neurons and the number of layers are hyperparameters of the network and need tuning. To find good values for these hyperparameters, one typically uses cross-validation techniques. The weights themselves are trained using the Back-Propagation technique.
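
Back-propagation adjusts the weights using the gradient of the cost with respect to each weight. As a minimal sketch only, assuming a single sigmoid layer and the squared-error cost `0.5 * sum((a - y)**2)`, the back-propagated gradient can be compared against a finite-difference estimate. The names `cost`, `analytic_gradient`, and `numeric_gradient`, and the random test values, are hypothetical and chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(W, b, x, y):
    """Squared-error cost 0.5 * sum((a - y)^2) for one sigmoid layer."""
    a = sigmoid(W @ x + b)
    return 0.5 * np.sum((a - y) ** 2)

def analytic_gradient(W, b, x, y):
    """Back-propagated gradients using delta = (a - y) * F'(z)."""
    z = W @ x + b
    a = sigmoid(z)
    delta = (a - y) * a * (1.0 - a)   # sigmoid'(z) = a * (1 - a)
    return delta @ x.T, delta          # dC/dW, dC/db

def numeric_gradient(W, b, x, y, eps=1e-6):
    """Central finite differences over every entry of W."""
    grad = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            W_plus, W_minus = W.copy(), W.copy()
            W_plus[i, j] += eps
            W_minus[i, j] -= eps
            grad[i, j] = (cost(W_plus, b, x, y) - cost(W_minus, b, x, y)) / (2 * eps)
    return grad

# Assumed toy problem: 3 inputs, 2 outputs, random parameters
rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))
b = rng.standard_normal((2, 1))
x = rng.standard_normal((3, 1))
y = np.array([[1.0], [0.0]])

grad_W, grad_b = analytic_gradient(W, b, x, y)
max_diff = np.max(np.abs(grad_W - numeric_gradient(W, b, x, y)))
print(max_diff)  # should be very close to zero
```

If the two gradients agree to within a small tolerance, the delta formula is implemented consistently with the cost.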

## Formula for Multi-Layered Neural Network<a class="anchor" id="formula"></a >

Suppose we have n inputs (x1, x2, …, xn) and a bias unit. Let the weights applied be w1, w2, …, wn. Taking the dot product of the inputs and the weights and adding the bias gives the weighted sum:
$$r = \sum_{i=1}^{n} w_i x_i + bias$$
Feeding r into the activation function F gives the output of a neuron in the hidden layer. For the first neuron of the first hidden layer, h1, the output is calculated as:
$$h_1^1 = F(r)$$
For all the other hidden layers, repeat the same procedure, using the outputs of the previous layer as inputs. Keep repeating the process until the last weight set is reached.
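
The weighted sum and activation above can be sketched in NumPy. This is an illustrative example only: the input values, the weights, and the choice of sigmoid for F are assumptions and do not come from the original text.

```python
import numpy as np

def F(r):
    """Activation function; sigmoid is assumed here for illustration."""
    return 1.0 / (1.0 + np.exp(-r))

# One neuron: r = sum_i w_i * x_i + bias
x = np.array([1.0, 2.0, 3.0])      # inputs x1..xn (made-up values)
w = np.array([0.5, -0.25, 0.0])    # weights w1..wn (made-up values)
bias = 0.0
r = np.dot(w, x) + bias            # 0.5 - 0.5 + 0.0 + 0.0 = 0.0
h = F(r)                           # F(0) = 0.5

# A whole network: repeat the same step once per weight set
layer_sizes = [3, 3, 1]            # 3 inputs, 3 hidden units, 1 output
rng = np.random.default_rng(0)
weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

activation = x
for W, b in zip(weights, biases):
    activation = F(W @ activation + b)

print(r, h, activation.shape)
```

The loop at the end is the "repeat until the last weight set" step: each layer's output becomes the next layer's input.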


## Application<a class="anchor" id="app"></a >


In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from keras.datasets import fashion_mnist

from sklearn.metrics import accuracy_score
from sklearn import metrics
from sklearn.preprocessing import StandardScaler
import seaborn as sns
class MultilayerPerceptron():
  
    def __init__(self, layers = [784, 60, 60, 10], actFun_type='relu'):
        self.actFun_type = actFun_type
        self.layers = layers
        self.L = len(self.layers)
        self.W =[[0.0]]
        self.B = [[0.0]]
        for i in range(1, self.L):
            w_temp = np.random.randn(self.layers[i], self.layers[i-1]) * np.sqrt(2/self.layers[i-1])
            b_temp = np.random.randn(self.layers[i], 1) * np.sqrt(2/self.layers[i-1])

            self.W.append(w_temp)
            self.B.append(b_temp)

    def reset_weights(self, layers = [784, 60, 60, 10]):
        self.layers = layers
        self.L = len(self.layers)
        self.W = [[0.0]]
        self.B = [[0.0]]
        for i in range(1, self.L):
            w_temp = np.random.randn(self.layers[i], self.layers[i-1])*np.sqrt(2/self.layers[i-1])
            b_temp = np.random.randn(self.layers[i], 1)*np.sqrt(2/self.layers[i-1])

            self.W.append(w_temp)
            self.B.append(b_temp)

    def forward_pass(self, p, predict_vector = False):
        Z =[[0.0]]
        A = [p[0]]
        for i in range(1, self.L):
            z = (self.W[i] @ A[i-1]) + self.B[i]
            a = self.actFun(z, self.actFun_type)
            Z.append(z)
            A.append(a)

        if predict_vector:
            return A[-1]
        else:
            return Z, A

    def mse(self, a, y):
        # Half the squared error, summed over all output units
        return .5*sum((a[i]-y[i])**2 for i in range(len(y)))[0]

    def MSE(self, data):
        c = 0.0
        for p in data:
            a = self.forward_pass(p, predict_vector=True)
            c += self.mse(a, p[1])
        return c/len(data)

    def actFun(self, z, type):
        if type == 'tanh':
            return np.tanh(z)
        elif type == 'sigmoid':
            return 1.0 / (1.0 + np.exp(-z))
        elif type == 'relu':
            return np.maximum(0, z)
        else:
            return None

    def diff_actFun(self, z, type):
        if type == 'tanh':
            return 1.0 - (np.tanh(z))**2
        elif type == 'sigmoid':
            return self.actFun(z, type) * (1-self.actFun(z, type))
        elif type == 'relu':
            return np.where(z > 0, 1.0, 0)
        else:
            return None

    def deltas_dict(self, p):
        Z, A = self.forward_pass(p)
        deltas = dict()
        deltas[self.L-1] = (A[-1] - p[1])*self.diff_actFun(Z[-1], self.actFun_type)
        for l in range(self.L-2, 0, -1):
            deltas[l] = (self.W[l+1].T @ deltas[l+1]) *self.diff_actFun(Z[l], self.actFun_type)

        return A, deltas

    def stochastic_gradient_descent(self, data, alpha = 0.04, epochs = 3):
        print(f"Initial Cost = {self.MSE(data)}")
        for k in range(epochs):
            for p in data:
                A, deltas = self.deltas_dict(p)
                for i in range(1, self.L):
                    self.W[i] = self.W[i] - alpha*deltas[i]@A[i-1].T
                    self.B[i] = self.B[i] - alpha*deltas[i]
            print(f"{k} Cost = {self.MSE(data)}")


    def mini_batch_gradient_descent(self, data, batch_size = 15, alpha = 0.04, epochs = 3):
        print(f"Initial Cost = {self.MSE(data)}")
        data_length = len(data)
        for k in range(epochs):
            # The +1 keeps the last full batch from being skipped
            for j in range(0, data_length-batch_size+1, batch_size):
                delta_list = []
                A_list = []
                for p in data[j:j+batch_size]:
                    A, deltas = self.deltas_dict(p)
                    delta_list.append(deltas)
                    A_list.append(A)

                for i in range(1, self.L):
                    self.W[i] = self.W[i] - (alpha/batch_size)*sum(da[0][i]@da[1][i-1].T for da in zip(delta_list, A_list))
                    self.B[i] = self.B[i] - (alpha/batch_size)*sum(deltas[i] for deltas in delta_list)
            print(f"{k} Cost = {self.MSE(data)}")

Using TensorFlow backend.


In [2]:
(train_X, train_y), (test_X, test_y) = fashion_mnist.load_data()

# Check the shape of the training set
train_X.shape

(60000, 28, 28)

In [3]:
# Check the shape of the first matrix in the training set
train_X[0].shape

(28, 28)

In [4]:
# Check the shape of the test set
test_X.shape

(10000, 28, 28)

In [5]:
train_X = train_X/255
test_X = test_X/255
train_X[0].flatten().reshape(28*28, 1).shape

(784, 1)

In [6]:
X = []
for x in train_X:
    X.append(x.flatten().reshape(28*28, 1))

# Y temporarily stores the one-hot encoded label vectors
Y = []
for y in train_y:
    temp_vec = np.zeros((10, 1))
    temp_vec[y][0] = 1.0
    Y.append(temp_vec)

# Our data will be stored as a list of tuples. 
train_data = [p for p in zip(X, Y)]

# Process the test data the same way
X = []
for x in test_X:
    X.append(x.flatten().reshape(784, 1))

Y = []
for y in test_y:
    temp_vec = np.zeros((10, 1))
    temp_vec[y][0] = 1.0
    Y.append(temp_vec)

test_data = [p for p in zip(X, Y)]
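
The one-hot loops above can also be written in vectorized form. The sketch below uses a hypothetical helper `one_hot` and a small made-up label array rather than the Fashion-MNIST labels; it is meant to produce the same `(10, 1)` column vectors as the loop.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return an array of shape (len(labels), num_classes, 1) of one-hot columns."""
    labels = np.asarray(labels)
    encoded = np.eye(num_classes)[labels]        # shape (n, num_classes)
    return encoded.reshape(len(labels), num_classes, 1)

sample_labels = np.array([9, 0, 3])              # made-up labels for illustration
Y_vec = one_hot(sample_labels)

# The explicit loop, reproduced for comparison
Y_loop = []
for y in sample_labels:
    temp_vec = np.zeros((10, 1))
    temp_vec[y][0] = 1.0
    Y_loop.append(temp_vec)

same = all(np.array_equal(a, b) for a, b in zip(Y_vec, Y_loop))
print(Y_vec.shape, same)
```

The loop version is easier to read step by step; the vectorized version avoids a Python-level loop over all 60,000 training labels.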

We will train MLPs using the hyperbolic tangent, rectified linear, and sigmoid activation functions by mini-batch gradient descent, and compare their performance.

In [7]:
net_tanh = MultilayerPerceptron(layers=[784, 60, 60, 10], actFun_type='tanh')
net_tanh.mini_batch_gradient_descent(train_data, batch_size = 16, alpha = 0.01, epochs = 5)

Initial Cost = 1.576435207350686
0 Cost = 0.17838318571101405
1 Cost = 0.15550323442493222
2 Cost = 0.14375768223367047
3 Cost = 0.13607407914044428
4 Cost = 0.13047589072584215


In [8]:
net_tanh.MSE(test_data)

0.13757809796154796

In [9]:
net_relu = MultilayerPerceptron(layers=[784, 100, 100, 10], actFun_type='relu')
net_relu.mini_batch_gradient_descent(train_data, batch_size = 16, alpha = 0.01, epochs = 5)

Initial Cost = 0.6419399947820255
0 Cost = 0.28038644767539217
1 Cost = 0.2706564139346542
2 Cost = 0.26597957673599076
3 Cost = 0.2627727399124922
4 Cost = 0.2599156259155153


In [10]:
net_relu.MSE(test_data)

0.26594929199648265

In [11]:
net_sig = MultilayerPerceptron(layers=[784, 100, 100, 10], actFun_type='sigmoid')
net_sig.mini_batch_gradient_descent(train_data, batch_size = 16, alpha = 0.01, epochs = 5)

Initial Cost = 1.037490266181791
0 Cost = 0.43563291676217736
1 Cost = 0.4082627949629969
2 Cost = 0.3645917560730352
3 Cost = 0.3256380543481417
4 Cost = 0.2968073652881028


In [12]:
net_sig.MSE(test_data)

0.2977860616592044
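
MSE alone does not say how often a network picks the right class. A minimal sketch of a classification-accuracy check follows, assuming data stored as `(input, one_hot_label)` tuples as in the cells above. The names `accuracy` and `identity_predict` and the toy data are hypothetical and only serve to make the example self-contained.

```python
import numpy as np

def accuracy(predict, data):
    """Fraction of samples whose arg-max prediction matches the one-hot label.

    predict: callable mapping an (x, y) tuple to an output column vector.
    data: list of (x, y) tuples with y one-hot encoded, shape (num_classes, 1).
    """
    correct = sum(int(np.argmax(predict(p)) == np.argmax(p[1])) for p in data)
    return correct / len(data)

# Stand-in predictor for illustration: returns the input unchanged
def identity_predict(p):
    return p[0]

toy_data = [
    (np.array([[0.9], [0.1], [0.0]]), np.array([[1.0], [0.0], [0.0]])),  # correct
    (np.array([[0.2], [0.7], [0.1]]), np.array([[0.0], [1.0], [0.0]])),  # correct
    (np.array([[0.6], [0.3], [0.1]]), np.array([[0.0], [0.0], [1.0]])),  # wrong
    (np.array([[0.1], [0.1], [0.8]]), np.array([[0.0], [0.0], [1.0]])),  # correct
]
toy_accuracy = accuracy(identity_predict, toy_data)
print(toy_accuracy)  # 0.75
```

With the networks trained above, one could pass `lambda p: net_tanh.forward_pass(p, predict_vector=True)` as `predict` and `test_data` as `data`.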

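### Conclusion:
A lower MSE is better, so from the output the tanh activation function has the best performance on the test data (MSE ≈ 0.138), followed by ReLU (≈ 0.266) and sigmoid (≈ 0.298). Note that the tanh network uses hidden layers of 60 units while the other two use 100, so the comparison is not fully controlled.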