# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint


## Learning Objectives



At the end of the experiment, you will be able to:

* Understand the challenges of handwriting recognition

* Understand the MNIST dataset and classify it using an MLP trained with backpropagation

* Understand how feedforward and backpropagation are implemented

## Dataset




### Description


1. The dataset contains 60,000 handwritten digits as training samples and 10,000 as test samples, so each digit occurs roughly 6,000 times in the training set and roughly 1,000 times in the test set (the classes are close to, but not exactly, balanced).
2. Each image is size-normalized and centered.
3. Each image is 28 x 28 pixels, with grayscale values from 0 to 255.
4. This means each image can be represented as a 784-dimensional (28 x 28) vector in which each value lies in the range 0-255.
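As a quick illustration of point 4, here is a minimal NumPy sketch that flattens a 28 x 28 grayscale image into a 784-dimensional vector. The image is a randomly generated stand-in, not an actual MNIST sample, and the division by 255 is an optional rescaling step that is common in practice but is not part of the dataset itself.

```python
import numpy as np

# A random stand-in for one 28 x 28 MNIST image with 0-255 gray values
# (hypothetical data, not a real MNIST digit).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)

# Flatten row by row into a single 784-dimensional feature vector.
vector = image.reshape(-1)
print(vector.shape)  # (784,)

# Optional: rescale the 0-255 gray values to the range [0, 1].
vector_scaled = vector.astype(np.float64) / 255.0
```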

### History

Yann LeCun (Director of AI Research, Facebook, Courant Institute, NYU) was given the task (in the '90s) of identifying cheque numbers and the amounts associated with those cheques without manual intervention. That is when this dataset was created; it raised the bar and became a benchmark.

Yann LeCun and Corinna Cortes (Google Labs, New York) hold the copyright of the MNIST dataset, which is a subset of the original NIST datasets. The dataset is made available under the terms of the Creative Commons Attribution-Share Alike 3.0 license.

Half of the digits were written by Census Bureau employees and the other half by high school students. The digits written by the Census Bureau employees are cleaner and easier to recognize than those written by the students.



### Challenges

If you look at the images below, you will notice that any two characters share certain similarities and also differ in certain ways. The challenge is to teach a machine to recognize these patterns and identify the correct output.

![altxt](https://www.researchgate.net/profile/Radu_Tudor_Ionescu/publication/282924675/figure/fig3/AS:319968869666820@1453297931093/A-random-sample-of-6-handwritten-digits-from-the-MNIST-data-set-before-and-after.png)

Hence, all these challenges make this a good problem to solve in Machine Learning.

## Domain Information


Handwriting varies from person to person. Some of us have neat handwriting and some, doctors being the proverbial example, have illegible handwriting. Yet even a child who knows the letters and numerals can read text written by a stranger, while even a technically knowledgeable adult cannot describe the process by which he or she recognizes the letters. A task that people perform easily but cannot spell out as rules is an excellent challenge for machine learning.

![altxt](https://i.pinimg.com/originals/f2/7a/ac/f27aac4542c0090872110836d65f4c99.jpg)

The experiment handles a subset of text recognition, namely recognizing the 10 numerals (0 to 9) from scanned images.


## Block Diagram 



The block diagram below gives an overview of the flow of the experiment.

![alttxt](https://cdn.talentsprint.com/aiml/Experiment_related_data/IMAGES/flow%20chart.png)



## AI/ML Technique


### What is an MLP?


A multilayer perceptron (MLP) is a class of feedforward artificial neural network. An MLP consists of at least three layers of nodes, as shown in the image below:

**Layer 1**: Input layer

**Layer 2**: Hidden layer

**Layer 3**: Output layer

![alt text](https://www.researchgate.net/profile/Mohamed_Zahran6/publication/303875065/figure/fig4/AS:371118507610123@1465492955561/A-hypothetical-example-of-Multilayer-Perceptron-Network.png)

The number of nodes in the input layer is determined by the dimensionality of our data. 

The number of nodes in the output layer is determined by the number of classes we have.


### Making predictions using Feedforward Propagation

Our network makes predictions using *forward propagation*, which is just a sequence of matrix multiplications, each followed by an activation function: $\tanh$ in the hidden layer and softmax in the output layer. If $x$ is the $N$-dimensional input to our network, we calculate our prediction $\hat{y}$ (of dimension $C$, the number of classes) as follows:

![alttxt]( https://cdn.talentsprint.com/aiml/Experiment_related_data/IMAGES/1_b.png)

$$
\begin{aligned}
z_1 & = xW_1 + b_1 \\
a_1 & = \tanh(z_1) \\
z_2 & = a_1W_2 + b_2 \\
a_2 & = \hat{y} = \mathrm{softmax}(z_2)
\end{aligned}
$$

$z_i$ is the weighted sum of the inputs to layer $i$ (bias included) and $a_i$ is the output of layer $i$ after applying the activation function. $W_1, b_1, W_2, b_2$ are the parameters of our network, which we learn from the training data. You can think of them as matrices (and vectors) that transform data between the layers of the network.

From the matrix multiplications above, we can work out the dimensions of these matrices. If we use 100 nodes for our hidden layer, then $W_1 \in \mathbb{R}^{N\times100}$, $b_1 \in \mathbb{R}^{100}$, $W_2 \in \mathbb{R}^{100\times C}$ and $b_2 \in \mathbb{R}^{C}$.
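The forward pass and the shapes above can be checked with a short NumPy sketch. It assumes a single MNIST-sized input ($N = 784$), 100 hidden nodes and $C = 10$ classes, and uses random parameters, so the output probabilities are meaningless; the point is only that each matrix product lines up and that the softmax output is a valid probability vector.

```python
import numpy as np

N, H, C = 784, 100, 10          # input dim, hidden nodes, classes (assumed sizes)
rng = np.random.default_rng(0)

# Random parameters with the shapes derived above.
W1 = rng.standard_normal((N, H)) / np.sqrt(N)
b1 = np.zeros(H)
W2 = rng.standard_normal((H, C)) / np.sqrt(H)
b2 = np.zeros(C)

x = rng.random((1, N))          # one input as a 1 x N row vector

z1 = x @ W1 + b1                # shape (1, H)
a1 = np.tanh(z1)                # shape (1, H)
z2 = a1 @ W2 + b2               # shape (1, C)
exp_z2 = np.exp(z2 - z2.max(axis=1, keepdims=True))
y_hat = exp_z2 / exp_z2.sum(axis=1, keepdims=True)   # softmax, shape (1, C)

print(z1.shape, z2.shape, y_hat.shape)   # (1, 100) (1, 10) (1, 10)
print(y_hat.sum())                       # approximately 1.0
```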

### Using Backpropagation

Learning the parameters of our network means finding the parameters ($W_1, b_1, W_2, b_2$) that minimize the error on our training data. But how do we define error? We call the function that measures our error the *loss function*. A common choice with a softmax output is the cross-entropy loss. If we have $m$ training examples and $C$ classes, with one-hot labels ($y_{n,i} = 1$ if example $n$ belongs to class $i$ and $0$ otherwise), then the loss of our prediction $\hat{y}$ with respect to the true labels $y$ is:

$$
\begin{aligned}
L(y,\hat{y}) = - \frac{1}{m} \sum_{n=1}^{m} \sum_{i=1}^{C} y_{n,i} \log\hat{y}_{n,i}
\end{aligned}
$$

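The formula looks complicated, but all it really does is sum over the training examples and, for each one, add $-\log$ of the probability our model assigns to the correct class (the one-hot label zeroes out every other term). This term is small when the correct class gets a probability close to 1 and grows as that probability shrinks, so the further $\hat{y}$ (our predictions) is from $y$ (the correct labels), the greater our loss.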
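A small worked example may help. The sketch below evaluates the formula for two made-up predictions over three classes with one-hot labels; only the log-probability of the correct class contributes, so a confident correct prediction costs little and a low probability on the true class costs a lot.

```python
import numpy as np

# Two hypothetical examples, three classes. Labels are one-hot encoded.
y = np.array([[1, 0, 0],
              [0, 1, 0]])
# Hypothetical softmax outputs (each row sums to 1).
y_hat = np.array([[0.7, 0.2, 0.1],
                  [0.6, 0.3, 0.1]])

# Per-example loss: -sum_i y_{n,i} * log(y_hat_{n,i})
per_example = -np.sum(y * np.log(y_hat), axis=1)
print(per_example)   # approximately [0.357 1.204]

# Average over the examples, matching the averaging factor in the formula.
loss = per_example.mean()
print(loss)          # approximately 0.780
```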

Remember that our goal is to find the parameters that minimize our loss function. We use gradient descent to find its minimum. Here, we implement the most basic ("vanilla") version of gradient descent: batch gradient descent with a fixed learning rate, which computes the gradient over the whole training set at every step. Variations such as SGD (stochastic gradient descent) or minibatch gradient descent often work better in practice, but we do not use them in this experiment.
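To make the update rule concrete, here is a minimal sketch of gradient descent with a fixed learning rate on a toy one-parameter function, $f(w) = (w - 3)^2$, whose gradient is $2(w - 3)$. The function, starting point and learning rate are arbitrary choices for illustration; in the experiment, the same kind of update, $w \leftarrow w - \text{lr} \cdot \partial L / \partial w$, is applied to $W_1, b_1, W_2, b_2$ using gradients computed over the whole training set.

```python
def gradient_descent(grad, w0, lr, num_iters):
    """Repeatedly step against the gradient with a fixed learning rate."""
    w = w0
    for _ in range(num_iters):
        w = w - lr * grad(w)
    return w


def grad_f(w):
    # Gradient of the toy objective f(w) = (w - 3)^2, minimized at w = 3.
    return 2.0 * (w - 3.0)


w_final = gradient_descent(grad_f, w0=0.0, lr=0.1, num_iters=100)
print(w_final)   # very close to 3.0
```

With `lr=0.1`, each step shrinks the distance to the minimum by a factor of 0.8; a learning rate that is too large (here, above 1.0) would make the iterates diverge instead.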

As an input, gradient descent needs the gradients (vector of derivatives) of the loss function with respect to our parameters: $\frac{\partial{L}}{\partial{W_1}}$, $\frac{\partial{L}}{\partial{b_1}}$, $\frac{\partial{L}}{\partial{W_2}}$, $\frac{\partial{L}}{\partial{b_2}}$. To calculate these gradients we use the famous *back propagation algorithm*, which efficiently calculates the gradient starting from the output.

Applying the chain rule, backpropagation gives the following:
![alttxt]( https://cdn.talentsprint.com/aiml/Experiment_related_data/IMAGES/2_b.png)

$$
\begin{aligned}
& \delta_3 = \frac{\partial{L}}{\partial{z_2}} = \frac{\partial{L}}{\partial{a_2}}\times\frac{\partial{a_2}}{\partial{z_2}} = -(y - \hat{y})\\
\end{aligned}
$$
where $a_2$ is $\hat{y}$
$$
\begin{aligned}
& \frac{\partial{L}}{\partial{W_2}} = \frac{\partial{L}}{\partial{z_2}}\times\frac{\partial{z_2}}{\partial{W_2}} = a_1^T \delta_3  \\
& \frac{\partial{L}}{\partial{b_2}} = \frac{\partial{L}}{\partial{z_2}}\times\frac{\partial{z_2}}{\partial{b_2}} = \delta_3\\
& \delta_2 = \frac{\partial{L}}{\partial{z_1}} = \frac{\partial{L}}{\partial{z_2}}\times\frac{\partial{z_2}}{\partial{a_1}}\times\frac{\partial{a_1}}{\partial{z_1}} = (1 - \tanh^2z_1) \circ \delta_3W_2^T \\
& \frac{\partial{L}}{\partial{W_1}} = \frac{\partial{L}}{\partial{z_1}}\times\frac{\partial{z_1}}{\partial{W_1}} = x^T \delta_2\\
& \frac{\partial{L}}{\partial{b_1}} = \frac{\partial{L}}{\partial{z_1}}\times\frac{\partial{z_1}}{\partial{b_1}} = \delta_2 \\
\end{aligned}
$$

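$\delta_3$ is the derivative of the cross-entropy loss with respect to $z_2$, the input to the softmax activation (we will not go into its derivation). Note that the averaging factor from the loss has been dropped in these gradients; the code below drops it as well, so it is effectively absorbed into the learning rate. For a batch of examples stacked as rows, the bias gradients are the column sums of $\delta_3$ and $\delta_2$.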
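One way to convince yourself that these formulas are right is a numerical gradient check: nudge a single parameter by a small $\epsilon$ in both directions, measure the change in the loss with a central difference, and compare the result with the analytic gradient. The sketch below does this for a tiny network with random data (all sizes are arbitrary). It uses the summed cross-entropy loss with no averaging factor, so that the analytic gradients correspond exactly to the formulas above, and it encodes $y$ as one-hot rows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, D, H, C = 5, 4, 3, 3                      # examples, inputs, hidden nodes, classes (arbitrary)
x = rng.standard_normal((n, D))
y = np.eye(C)[rng.integers(0, C, size=n)]    # one-hot labels, shape (n, C)
theta = {
    "W1": rng.standard_normal((D, H)), "b1": np.zeros((1, H)),
    "W2": rng.standard_normal((H, C)), "b2": np.zeros((1, C)),
}

def forward(p):
    z1 = x @ p["W1"] + p["b1"]
    a1 = np.tanh(z1)
    z2 = a1 @ p["W2"] + p["b2"]
    e = np.exp(z2 - z2.max(axis=1, keepdims=True))
    return z1, a1, e / e.sum(axis=1, keepdims=True)

def loss(p):
    # Summed cross-entropy (no averaging), to match the formulas above.
    _, _, y_hat = forward(p)
    return -np.sum(y * np.log(y_hat))

def analytic_grads(p):
    z1, a1, y_hat = forward(p)
    delta3 = y_hat - y                                       # -(y - y_hat)
    delta2 = (1 - np.tanh(z1) ** 2) * (delta3 @ p["W2"].T)
    return {
        "W2": a1.T @ delta3, "b2": delta3.sum(axis=0, keepdims=True),
        "W1": x.T @ delta2,  "b1": delta2.sum(axis=0, keepdims=True),
    }

def numerical_grad(p, name, eps=1e-6):
    grad = np.zeros_like(p[name])
    for idx in np.ndindex(p[name].shape):
        original = p[name][idx]
        p[name][idx] = original + eps
        plus = loss(p)
        p[name][idx] = original - eps
        minus = loss(p)
        p[name][idx] = original                              # restore the parameter
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

grads = analytic_grads(theta)
for name in ["W1", "b1", "W2", "b2"]:
    diff = np.max(np.abs(grads[name] - numerical_grad(theta, name)))
    print(name, diff)    # should be tiny, well below 1e-5
```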


## Keywords



* Multilayer Perceptron (MLP)

* Backpropagation

* Chain Rule

* Softmax

* Activation Function

* Gradient Descent

## Expected time to complete the experiment: 60 minutes

### Setup Steps

In [0]:
#@title Please enter your registration id to start: (e.g. P181900101) { run: "auto", display-mode: "form" }
Id = "P181902118" #@param {type:"string"}


In [0]:
#@title Please enter your password (normally your phone number) to continue: { run: "auto", display-mode: "form" }
password = "8860303743" #@param {type:"string"}


In [3]:
#@title Run this cell to complete the setup for this Notebook

from IPython import get_ipython
ipython = get_ipython()
  
notebook="Backpropagation_Backup" #name of the notebook
Answer = 'UNGRADED'
def setup():
#  ipython.magic("sx pip3 install torch")
   
   print ("Setup completed successfully")
   return
def submit_notebook():
    
    ipython.magic("notebook -e "+ notebook + ".ipynb")
    
    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:        
        print(r["err"])
        return None        
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None

    elif getAnswer() and getComplexity() and getAdditional() and getConcepts():
      with open(notebook + ".ipynb", "rb") as f:
        file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional, 
              "concepts" : Concepts, "record_id" : submission_id, 
              "answer" : Answer, "id" : Id, "file_hash" : file_hash,
              "notebook" : notebook}

      r = requests.post(url, data = data)
      r = json.loads(r.text)
      print("Your submission is successful.")
      print("Ref Id:", submission_id)
      print("Date of submission: ", r["date"])
      print("Time of submission: ", r["time"])
      print("View your submissions: https://iiith-aiml.talentsprint.com/notebook_submissions")
      print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
      return submission_id
    else: return submission_id
    

def getAdditional():
  try:
    if Additional: return Additional      
    else: raise NameError('')
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None
  
def getConcepts():
  try:
    return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None

def getAnswer():
  try:
    return Answer
  except NameError:
    print ("Please answer Question")
    return None

def getId():
  try: 
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup 
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup()
  
else:
  print ("Please complete Id and Password cells before running setup")



Setup completed successfully


We will perform the experiment in three steps:

#### 1. Loading the dataset

In [0]:
# Importing Required Packages
import numpy as np
from scipy import ndimage 
from matplotlib import pyplot as plt
from sklearn import manifold, datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score

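#### Now we will load the digits dataset from the sklearn `datasets` package

Note that `datasets.load_digits` does not return MNIST itself. It returns scikit-learn's copy of the UCI handwritten digits dataset: 1,797 images of 8 x 8 pixels (64 features) with gray values from 0 to 16, a much smaller MNIST-like dataset.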

In [5]:
# Load scikit-learn's 8x8 handwritten digits dataset (an MNIST-like dataset)
digits = datasets.load_digits(n_class=10)
# Storing features in 'X' variable
X = digits.data
# Storing labels in 'Y' variable
Y = digits.target
# Checking the shape of variable 'X' and 'Y'
print(X.shape, Y.shape)
# Number of samples in the full dataset (before the train/test split)
num_examples = X.shape[0]
## input layer dimensionality
nn_input_dim = X.shape[1]
print(nn_input_dim)
## output layer dimensionality
nn_output_dim = len(np.unique(Y))
print(nn_output_dim)
# Defining the hyperparameters
params = {
    "lr": 1e-5,        ## learning_rate
    "max_iter": 1000,  ## number of gradient descent iterations
    "h_dimn": 40,      ## hidden_layer_size
}

(1797, 64) (1797,)
64
10


#### 2. Writing helper functions for the MLP

Let us define the skeleton of the model: a 3-layer neural network with one input layer, one hidden layer and one output layer.

In [0]:
def build_model():
    hdim = params["h_dimn"]
    # Initialize the parameters to random values (fixed seed for reproducibility).
    np.random.seed(0)
    # nn_input_dim is the number of features and hdim is the size of the hidden layer,
    # so the input-to-hidden weight matrix W1 has shape (nn_input_dim, hdim).
    # Dividing by sqrt(nn_input_dim) is a common scaling heuristic that keeps the
    # spread of the initial weighted sums from growing with the number of inputs.
    W1 = np.random.randn(nn_input_dim, hdim) / np.sqrt(nn_input_dim)
    b1 = np.zeros((1, hdim))
    # The hidden-to-output weight matrix W2 has shape (hdim, nn_output_dim).
    W2 = np.random.randn(hdim, nn_output_dim) / np.sqrt(hdim)
    b2 = np.zeros((1, nn_output_dim))

    # Return all parameters in a dictionary
    model = {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}
    return model

**Softmax**

The softmax function is often used in the final layer of a neural-network-based classifier. Its main advantage is that it turns the layer's raw scores into probabilities: each output lies between 0 and 1, and the outputs of the layer sum to one. In a multi-class classifier, softmax returns a probability for each class, and the predicted class is the one with the highest probability.
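Written out, the softmax of a score vector $z$ is $\mathrm{softmax}(z)_i = e^{z_i} / \sum_j e^{z_j}$. A small worked example with made-up scores $z = (1, 2, 3)$:

$$
\begin{aligned}
e^{1} + e^{2} + e^{3} &\approx 2.718 + 7.389 + 20.086 = 30.193 \\
\mathrm{softmax}(z) &\approx \left(\tfrac{2.718}{30.193},\ \tfrac{7.389}{30.193},\ \tfrac{20.086}{30.193}\right) \approx (0.090,\ 0.245,\ 0.665)
\end{aligned}
$$

The three probabilities sum to 1, and the highest score receives the highest probability, so the third class would be predicted.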

Now let us define a function to calculate the softmax value:

In [0]:
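def softmax(x):
    # Subtract each row's maximum before exponentiating; this leaves the result
    # unchanged but prevents np.exp from overflowing on large scores.
    exp_scores = np.exp(x - np.max(x, axis=1, keepdims=True))
    probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
    return probs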
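Exponentiating raw scores directly can overflow: `np.exp(1000)` is already `inf` in double precision, and `inf / inf` gives `nan`. A common remedy is to subtract each row's maximum score before exponentiating; the common factor cancels in the ratio, so the probabilities are unchanged. The sketch below compares the two versions on made-up scores; the names `softmax_naive` and `softmax_stable` are illustrative only.

```python
import numpy as np

def softmax_naive(x):
    exp_scores = np.exp(x)
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

def softmax_stable(x):
    # exp(x - max) / sum(exp(x - max)) equals exp(x) / sum(exp(x)),
    # but the largest exponent is now exp(0) = 1, so nothing overflows.
    shifted = x - np.max(x, axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores, axis=1, keepdims=True)

scores = np.array([[1.0, 2.0, 3.0],
                   [1000.0, 1001.0, 1002.0]])   # hypothetical large scores in row 2

with np.errstate(over="ignore", invalid="ignore"):
    print(softmax_naive(scores))   # second row becomes nan (inf / inf)
print(softmax_stable(scores))      # both rows approximately [0.090 0.245 0.665]
```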

Let us define a function for forward propagation.

In [0]:
def feedforward(model, x):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    z1 = x.dot(W1) + b1
    a1 = np.tanh(z1)  # tanh non-linearity applied to the hidden layer's weighted sums
    z2 = a1.dot(W2) + b2
    probs = softmax(z2)
    return a1, probs

Let us define a function for backpropagation, which computes the gradients of the loss with respect to each parameter.

In [0]:
def backpropagation(model, x, y, a1, probs):
    W2 = model['W2']

    # delta3 = y_hat - y: subtract 1 from the predicted probability of the true class.
    # Work on a copy so that the caller's probs array is not modified in place.
    delta3 = probs.copy()
    delta3[range(y.shape[0]), y] -= 1
    dW2 = (a1.T).dot(delta3)
    db2 = np.sum(delta3, axis=0, keepdims=True)
    # tanh'(z1) = 1 - tanh(z1)^2 = 1 - a1^2
    delta2 = delta3.dot(W2.T) * (1 - np.power(a1, 2))
    dW1 = np.dot(x.T, delta2)
    db1 = np.sum(delta2, axis=0, keepdims=True)
    return dW2, db2, dW1, db1

Now let us write a function to calculate loss.

In [0]:
def calculate_loss(model, x, y):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
   
    # Forward propagation to calculate predictions
    _, probs = feedforward(model, x)
    
    # Calculating the cross entropy loss
    correct_logprobs = -np.log(probs[range(y.shape[0]), y])
    data_loss = np.sum(correct_logprobs)
    
    return 1./y.shape[0] * data_loss
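Both `backpropagation` and `calculate_loss` work with integer labels rather than the one-hot vectors $y$ used in the equations. The sketch below, with made-up probabilities, checks that the two indexing shortcuts agree with the one-hot formulas: `probs[range(n), y]` selects the probability of the correct class for the loss, and subtracting 1 at the label index produces $\hat{y} - y$, which is $\delta_3$.

```python
import numpy as np

# Hypothetical softmax outputs for 3 examples and 4 classes, with integer labels.
probs = np.array([[0.1, 0.6, 0.2, 0.1],
                  [0.3, 0.3, 0.3, 0.1],
                  [0.05, 0.05, 0.1, 0.8]])
labels = np.array([1, 0, 3])
n = labels.shape[0]
one_hot = np.eye(probs.shape[1])[labels]

# Loss shortcut: pick the probability of the correct class in each row.
picked = probs[np.arange(n), labels]             # [0.6, 0.3, 0.8]
loss_shortcut = -np.log(picked).mean()
loss_one_hot = -np.sum(one_hot * np.log(probs)) / n
print(np.isclose(loss_shortcut, loss_one_hot))   # True

# Gradient shortcut: subtract 1 at the label index (on a copy, to keep probs intact).
delta3 = probs.copy()
delta3[np.arange(n), labels] -= 1
print(np.allclose(delta3, probs - one_hot))      # True, i.e. -(y - y_hat)
```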

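Now let us define a function that calculates the predictions by forward propagation and returns the accuracy, i.e. the fraction of examples whose predicted class matches the true label.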

In [0]:
def test(model, x, y):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    # Forward propagation to calculate predictions
    _, probs = feedforward(model, x)
    preds = np.argmax(probs, axis=1)
    return np.count_nonzero(y==preds)/y.shape[0]

Now let us define a function to train the model. In each iteration, we first perform forward propagation, then backpropagation to compute the gradients, then a gradient descent step that updates the parameters, which we assign back to the model.

In [0]:
def train(model, X_train, X_test, Y_train, Y_test, print_loss=True):
    # Full-batch gradient descent: every iteration uses the entire training set.
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    for i in range(0, params["max_iter"]):

        # Forward propagation
        a1, probs = feedforward(model, X_train)

        # Backpropagation
        dW2, db2, dW1, db1 = backpropagation(model, X_train, Y_train, a1, probs)

        # Gradient descent parameter update
        W1 += -params["lr"] * dW1
        b1 += -params["lr"] * db1
        W2 += -params["lr"] * dW2
        b2 += -params["lr"] * db2
        
        # Assign new parameters to the model
        model = { 'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}
        if print_loss and i % 50 == 0:
            print("Loss after iteration %i: %f" %(i, calculate_loss(model, X_train, Y_train)),
                  ", Test accuracy:", test(model, X_test, Y_test), "\n")
    return model
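`train` above uses full-batch gradient descent, as this experiment intends. If you want to experiment with the minibatch variant mentioned earlier, one possible starting point is a helper like the one sketched below (the name `iterate_minibatches` is hypothetical, not part of this notebook). It shuffles the training set once per epoch and yields aligned batches; inside the training loop you would then run `feedforward`, `backpropagation` and the parameter update on each batch instead of on all of `X_train`. This is only a sketch and is not how the results below were produced.

```python
import numpy as np

def iterate_minibatches(X, Y, batch_size, rng):
    """Yield shuffled (X_batch, Y_batch) pairs that cover every example once per epoch."""
    indices = rng.permutation(X.shape[0])
    for start in range(0, X.shape[0], batch_size):
        batch = indices[start:start + batch_size]
        yield X[batch], Y[batch]

# Usage with stand-in data: 10 examples, 3 features, batches of 4.
rng = np.random.default_rng(0)
X_demo = np.arange(30, dtype=float).reshape(10, 3)
Y_demo = np.arange(10)
batch_sizes = [len(yb) for _, yb in iterate_minibatches(X_demo, Y_demo, 4, rng)]
print(batch_sizes)   # [4, 4, 2]
```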

#### 3. Training the model

In [14]:
model = build_model()

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.5)

model = train(model, X_train, X_test, Y_train, Y_test)

Loss after iteration 0: 2.492749 , Test accuracy: 0.13236929922135707 

Loss after iteration 50: 1.712895 , Test accuracy: 0.45050055617352613 

Loss after iteration 100: 1.311046 , Test accuracy: 0.6596218020022246 

Loss after iteration 150: 1.043388 , Test accuracy: 0.7686318131256952 

Loss after iteration 200: 0.858650 , Test accuracy: 0.8131256952169077 

Loss after iteration 250: 0.725068 , Test accuracy: 0.8498331479421579 

Loss after iteration 300: 0.625338 , Test accuracy: 0.8654060066740823 

Loss after iteration 350: 0.548757 , Test accuracy: 0.882091212458287 

Loss after iteration 400: 0.486890 , Test accuracy: 0.8921023359288098 

Loss after iteration 450: 0.436110 , Test accuracy: 0.8987764182424917 

Loss after iteration 500: 0.392825 , Test accuracy: 0.9087875417130145 

Loss after iteration 550: 0.355599 , Test accuracy: 0.917686318131257 

Loss after iteration 600: 0.324155 , Test accuracy: 0.9210233592880979 

Loss after iteration 650: 0.297017 , Test accuracy: 0.

### Please answer the questions below to complete the experiment:

In [0]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "Good and Challenging me" #@param ["Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging me", "Was Tough, but I did it", "Too Difficult for me"]


In [0]:
#@title If it was very easy, what more you would have liked to have been added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "test" #@param {type:"string"}

In [0]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "Yes" #@param ["Yes", "No"]

In [18]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id =return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")

Your submission is successful.
Ref Id: 4275
Date of submission:  27 Apr 2019
Time of submission:  14:18:57
View your submissions: https://iiith-aiml.talentsprint.com/notebook_submissions
For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.
