<a href="https://colab.research.google.com/github/deepakk7195/IISC_CDS_DS/blob/foundations_of_ds/M1_AST_06_Automatic_differentiation_A.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Advanced Certification Program in Computational Data Science
## A program by IISc and TalentSprint
### Assignment 6: Automatic differentiation

## Learning Objectives

At the end of the experiment, you will be able to:

* understand the basics of automatic differentiation

* understand the backward and forward propagation for a given neural network

## Information

**Understanding Basics of Neural Network and its parameters**

**Neural Network**

Neural networks are a class of machine learning algorithms used to model complex patterns in datasets using multiple hidden layers and non-linear activation functions. A neural network takes an input, passes it through multiple layers of hidden neurons (mini-functions with unique coefficients that must be learned), and outputs a prediction representing the combined input of all the neurons.

Neural networks are trained iteratively using optimization techniques like gradient descent. After each cycle of training, an error metric is calculated based on the difference between prediction and target.
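
As a minimal sketch of this training loop, assume a one-parameter model `y = w * x` and a mean squared error metric. Both are chosen here purely for illustration and are not part of the assignment. Each cycle computes the error metric and then moves the parameter a small step against the gradient:

```python
import numpy as np

# Toy data generated by the rule y = 3x (hypothetical, for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0])
y_true = 3.0 * x

w = 0.0                  # initial guess for the single coefficient
step_size = 0.01         # assumed learning rate for this sketch
losses = []

for step in range(200):
    prediction = w * x
    error = prediction - y_true
    losses.append(np.mean(error ** 2))   # error metric for this cycle
    grad = 2.0 * np.mean(error * x)      # derivative of the loss w.r.t. w
    w -= step_size * grad                # move against the gradient

print("learned w:", w)
print("first loss:", losses[0], "last loss:", losses[-1])
```

The learned coefficient approaches 3 and the recorded loss shrinks from cycle to cycle, which is the behaviour the error metric is meant to expose.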

**Neuron**

A neuron takes a group of weighted inputs, applies an activation function, and returns an output.

**Weights**

Weights are values that control the strength of the connection between two neurons. Inputs are multiplied by weights, which determines how much influence each input has on the output. When signals pass between neurons, the weights are applied to the inputs and an additional value (the bias) is added.

**Bias**

Bias terms are additional constants attached to neurons and added to the weighted input before the activation function is applied. Bias terms help models represent patterns that do not necessarily pass through the origin.
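
The definitions of neuron, weights, and bias can be put together in a few lines. The sketch below assumes a tanh activation and made-up input values; the names `neuron`, `neuron_weights`, and `neuron_bias` are hypothetical and used only for this illustration:

```python
import numpy as np

def neuron(inputs, neuron_weights, neuron_bias):
    """Weighted sum of the inputs plus a bias, passed through tanh."""
    weighted_input = np.dot(inputs, neuron_weights) + neuron_bias
    return np.tanh(weighted_input)

inputs = np.array([0.5, -1.0])          # illustrative values
neuron_weights = np.array([0.8, 0.2])
neuron_bias = 0.1

out = neuron(inputs, neuron_weights, neuron_bias)

# With all-zero inputs the weighted sum vanishes, so only the bias
# determines the output: the bias shifts the neuron away from the origin.
out_zero_input = neuron(np.zeros(2), neuron_weights, neuron_bias)
print(out, out_zero_input)
```

Without the bias, an all-zero input would always produce tanh(0) = 0, whatever the weights are.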

**Layers**

*Input Layer*

Holds the data your model will train on. Each neuron in the input layer represents a unique attribute in your dataset (e.g. height, hair color, etc.).

*Hidden Layer*

Sits between the input and output layers and applies an activation function before passing on the results. There are often multiple hidden layers in a network.

*Output Layer*

The final layer in a network. It receives input from the previous hidden layer, optionally applies an activation function, and returns an output representing your model’s prediction.

### Setup Steps:

In [1]:
#@title Please enter your registration id to start: { run: "auto", display-mode: "form" }
Id = "2301931" #@param {type:"string"}

In [2]:
#@title Please enter your password (your registered phone number) to continue: { run: "auto", display-mode: "form" }
password = "9665220904" #@param {type:"string"}

In [3]:
#@title Run this cell to complete the setup for this Notebook
from IPython import get_ipython

ipython = get_ipython()

notebook= "M1_AST_06_Automatic_differentiation_A" #name of the notebook

def setup():
#  ipython.magic("sx pip3 install torch")
    from IPython.display import HTML, display
    display(HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id)))
    print("Setup completed successfully")
    return

def submit_notebook():
    ipython.magic("notebook -e "+ notebook + ".ipynb")

    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:
        print(r["err"])
        return None
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None

    elif getAnswer() and getComplexity() and getAdditional() and getConcepts() and getComments() and getMentorSupport():
      f = open(notebook + ".ipynb", "rb")
      file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional,
              "concepts" : Concepts, "record_id" : submission_id,
              "answer" : Answer, "id" : Id, "file_hash" : file_hash,
              "notebook" : notebook,
              "feedback_experiments_input" : Comments,
              "feedback_mentor_support": Mentor_support}
      r = requests.post(url, data = data)
      r = json.loads(r.text)
      if "err" in r:
        print(r["err"])
        return None
      else:
        print("Your submission is successful.")
        print("Ref Id:", submission_id)
        print("Date of submission: ", r["date"])
        print("Time of submission: ", r["time"])
        print("View your submissions: https://cds-iisc.talentsprint.com/notebook_submissions")
        #print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
        return submission_id
    else: return submission_id


def getAdditional():
  try:
    if not Additional:
      raise NameError
    else:
      return Additional
  except NameError:
    print ("Please answer Additional Question")
    return None

def getComplexity():
  try:
    if not Complexity:
      raise NameError
    else:
      return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None

def getConcepts():
  try:
    if not Concepts:
      raise NameError
    else:
      return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None


# def getWalkthrough():
#   try:
#     if not Walkthrough:
#       raise NameError
#     else:
#       return Walkthrough
#   except NameError:
#     print ("Please answer Walkthrough Question")
#     return None

def getComments():
  try:
    if not Comments:
      raise NameError
    else:
      return Comments
  except NameError:
    print ("Please answer Comments Question")
    return None


def getMentorSupport():
  try:
    if not Mentor_support:
      raise NameError
    else:
      return Mentor_support
  except NameError:
    print ("Please answer Mentor support Question")
    return None

def getAnswer():
  try:
    if not Answer:
      raise NameError
    else:
      return Answer
  except NameError:
    print ("Please answer Question")
    return None


def getId():
  try:
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup()
else:
  print ("Please complete Id and Password cells before running setup")



Setup completed successfully


#### Importing required packages

In [4]:
import numpy as np
import scipy
import matplotlib.pyplot as plt
import numpy.linalg as npl #Linear algebra from numpy
from scipy.optimize import differential_evolution #Finds the global minimum of a multivariate function.
import math
from scipy.stats import norm

## Automatic Differentiation

How do neural networks calculate the partial derivatives of an expression? The answer lies in a process known as automatic differentiation. Automatic differentiation evaluates the partial derivatives of an expression at a specific point; it does not return a symbolic formula for the derivative.

## Backpropagation

Backpropagation is a special case of automatic differentiation. We can think of automatic differentiation as a set of techniques to numerically (in contrast to
symbolically) evaluate the exact gradient of a
function by working with intermediate variables and applying the chain
rule. Automatic differentiation applies a series of elementary arithmetic operations, e.g., addition and multiplication and elementary functions,
e.g., sin, cos, exp, log. By applying the chain rule to these operations, the
gradient of quite complicated functions can be computed automatically.
Automatic differentiation applies to general computer programs and has
forward and reverse modes.
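
To make the forward mode concrete, here is a minimal sketch based on dual numbers, where every value carries its derivative along with it. The `Dual` class, `dual_sin`, and the test function `g` are assumptions introduced for this illustration only, and the class supports just addition, multiplication, and sine:

```python
import math

class Dual:
    """A value paired with its derivative with respect to the input."""
    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # sum rule
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

def dual_sin(u):
    # chain rule for an elementary function
    return Dual(math.sin(u.value), math.cos(u.value) * u.deriv)

def g(x):
    return x * x + dual_sin(x)      # g(x) = x^2 + sin(x)

x = Dual(1.5, 1.0)                  # seed derivative dx/dx = 1
result = g(x)
print("g(1.5)  =", result.value)
print("g'(1.5) =", result.deriv)    # agrees with 2x + cos(x) at x = 1.5
```

The derivative is obtained at the single point x = 1.5 and is exact up to floating-point rounding, with no symbolic expression ever being built.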

Let us look at an instructive example to understand the reverse mode of propagation.

*Example:* Consider the function

$f(x) = \sqrt{x^{2}+ \exp{(x^{2})}} + \cos{(x^{2}+ \exp{(x^{2})}) } $

If we were to implement a function f on a computer, we would be able to save some computation by using intermediate variables:

$a=x^{2},$

$b=\exp{(a)},$

$c=a+b,$

$d=\sqrt{c},$

$e=\cos{c},$

$f=d+e.$

![Image]( https://cdn.iisc.talentsprint.com/CDS/Images/Automatic_differentiation.png)

$\text{Figure: Computation graph with inputs x, function values f, and intermediate variables a, b, c, d, e.}$

The set of equations that include intermediate variables can be thought
of as a computation graph, a representation that is widely used in implementations of neural network software libraries. We can directly compute
the derivatives of the intermediate variables with respect to their corresponding inputs by recalling the definition of the derivative of elementary
functions. We obtain the following:

$\frac{\partial a}{\partial x} = 2x $

$\frac{\partial b}{\partial a} = \exp{(a)} $

$\frac{\partial c}{\partial a} = 1 = \frac{\partial c}{\partial b} $

$\frac{\partial d}{\partial c}= \frac{1}{2\sqrt{c}}$

$\frac{\partial e}{\partial c} = -\sin{(c)} $

$\frac{\partial f}{\partial e} = 1 = \frac{\partial f}{\partial d}$

By looking at the computation graph in the figure above, we can compute
$∂f /∂x$ by working backward from the output and obtain:

$\frac{∂f}{∂c} = \frac{∂f}{∂d}  \frac{∂d}{∂c} + \frac{∂f}{∂e}  \frac{∂e}{∂c}$

$\frac{∂f}{∂b} = \frac{∂f}{∂c}\frac{∂c}{∂b}$

$\frac{∂f}{∂a} = \frac{∂f}{∂b}\frac{∂b}{∂a} + \frac{∂f}{∂c}\frac{∂c}{∂a}$

$\frac{∂f}{∂x} = \frac{∂f}{∂a}\frac{∂a}{∂x}$

Note that we implicitly applied the chain rule to obtain $∂f/∂x$. By substituting the results of the derivatives of the elementary functions, we get

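$\frac{\partial f}{\partial c} = 1 \cdot \frac{1}{2\sqrt{c}} + 1 \cdot (-\sin{(c)})$

$\frac{\partial f}{\partial b} = \frac{\partial f}{\partial c} \cdot 1$

$\frac{\partial f}{\partial a} = \frac{\partial f}{\partial b} \exp{(a)} + \frac{\partial f}{\partial c} \cdot 1$

$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial a} \cdot 2x$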

By thinking of each of the derivatives above as a variable, we observe
that the computation required for calculating the derivative is of similar
complexity as the computation of the function itself.
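
The forward and backward sweeps for this example can be sketched directly in Python. The function name `f_and_grad`, the evaluation point `x0 = 0.7`, and the finite-difference check are assumptions added for illustration; the intermediate variables follow the definitions of a, b, c, d, e above:

```python
import numpy as np

def f_and_grad(x):
    # forward pass: evaluate and store the intermediate variables
    a = x ** 2
    b = np.exp(a)
    c = a + b
    d = np.sqrt(c)
    e = np.cos(c)
    f = d + e

    # backward pass: accumulate df/d(variable) from the output to the input
    df_dd = 1.0
    df_de = 1.0
    df_dc = df_dd * (1.0 / (2.0 * np.sqrt(c))) + df_de * (-np.sin(c))
    df_db = df_dc * 1.0
    df_da = df_db * np.exp(a) + df_dc * 1.0
    df_dx = df_da * 2.0 * x
    return f, df_dx

x0 = 0.7
value, grad = f_and_grad(x0)

# central finite difference as an independent check of the gradient
h = 1e-6
numeric = (f_and_grad(x0 + h)[0] - f_and_grad(x0 - h)[0]) / (2 * h)
print("f(x0) =", value)
print("reverse-mode gradient =", grad)
print("finite-difference gradient =", numeric)
```

The backward pass has one line per intermediate variable, just like the forward pass, which illustrates why the cost of the derivative is comparable to the cost of the function.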

Backpropagation is a standard method of training artificial neural networks.  This method is used for fine-tuning the weights of a neural net based on the error rate obtained in the previous iteration. Proper tuning of the weights reduces the error rates and allows the model to make increasingly reliable predictions. It traverses the network in reverse order, from the output to the input layer, according to the chain rule from calculus and helps to calculate the gradient of a loss function with respect to all the weights in the network.

![NN](https://cdn.iisc.talentsprint.com/CDS/NN.jpg)

**Training a neural network: Forward and Backward propagation**

- Forward propagation sequentially calculates and stores intermediate variables within the computational graph defined by the neural network. It proceeds from the input to the output layer.

- Backpropagation sequentially calculates and stores the gradients of intermediate variables and parameters within the neural network in the reversed order.

**Implementing backpropagation**

The back propagation algorithm begins by comparing the actual value output by the forward propagation process to the expected value and then moves backward through the network, slightly adjusting each of the weights in a direction that reduces the size of the error by a small degree. Both forward and back propagation are re-run thousands of times on each input combination until the network can accurately predict the expected output of the possible inputs using forward propagation.
Here we take a simple example consisting of input X and output y (the XOR truth table), as given below.

In [5]:
# Initialize the input and output
X = np.array(([0, 0], [0, 1], [1, 0], [1, 1]), dtype=float)
y = np.array(([0], [1], [1], [0]), dtype=float)

In [7]:
# Initialize the parameters
output = None
weights = [np.random.uniform(low=-0.2, high=0.2, size=(2, 2)), np.random.uniform(low=-2, high=2, size=(2, 1)) ]

# YOUR CODE HERE
iterations = 2000
learning_rate = 0.1

#### Forward propagation

In forward propagation, the input is multiplied by the weights of the first layer, the result is passed through the activation function and fed to the next layer, and the final layer produces the output.

The function `feed_forward_pass()` below takes the input as its argument and produces the output by multiplying with the weights of successive layers. It returns the list of all layer outputs, which is needed in backpropagation, together with the final layer output.


In [9]:
def feed_forward_pass(x_values):
    # forward
    input_layer = x_values
    print("1 - ",input_layer)
    hidden_layer = tang(np.dot(input_layer, weights[0]))
    print("2 - ",hidden_layer)
    # dot product of hidden layer output with weights and applying activation over it
    output_layer = tang(np.dot(hidden_layer, weights[1]))
    print("3 - ",output_layer)
    # YOUR CODE HERE
    layers = [input_layer,hidden_layer,output_layer]
    return layers, layers[2]
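
To see how the array shapes flow through such a forward pass, the sketch below repeats the two matrix products with stand-alone names (`X_demo`, `W_hidden`, `W_output`, and a fixed random seed are assumptions for this illustration, so the numbers differ from the notebook's own run):

```python
import numpy as np

rng = np.random.default_rng(0)                  # fixed seed, illustration only
X_demo = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # shape (4, 2)
W_hidden = rng.uniform(-0.2, 0.2, size=(2, 2))  # input -> hidden weights
W_output = rng.uniform(-2.0, 2.0, size=(2, 1))  # hidden -> output weights

hidden = np.tanh(X_demo @ W_hidden)     # (4, 2) @ (2, 2) -> (4, 2)
out_demo = np.tanh(hidden @ W_output)   # (4, 2) @ (2, 1) -> (4, 1)

print(X_demo.shape, hidden.shape, out_demo.shape)

# The all-zero input row yields zeros at every layer because this
# network has no bias terms and tanh(0) = 0.
print(hidden[0], out_demo[0])
```

Each row of the input is one sample, so all four samples pass through the network in a single pair of matrix products.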

#### Backpropagation

Backpropagation is an algorithm commonly used to train neural networks. When the neural network is initialized, weights are set for its individual elements, called neurons. Inputs are loaded, they are passed through the network of neurons, and the network provides an output for each one, given the initial weights. Backpropagation helps to adjust the weights of the neurons so that the result comes closer and closer to the known true result.

![image.png](https://cdn.iisc.talentsprint.com/CDS/BP.JPG)

In [10]:
# backpropagate the error through the network layers
def backward_pass(target_output, actual_output, layers):
    global weights
    # error at the network output (target minus prediction)
    err = (target_output - actual_output)
    # walk backward from the output layer to the input layer,
    # propagating gradients with the chain rule
    for backward in range(2, 0, -1):
        err_delta = err * derivative_tang(layers[backward])
        # propagate the error to the previous layer using the weights
        # as they were in the forward pass, i.e. before they are updated
        err = np.dot(err_delta, weights[backward - 1].T)
        # update weights using the computed gradient
        weights[backward - 1] += learning_rate * np.dot(layers[backward - 1].T, err_delta)
    return err
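
A common way to gain confidence in a backpropagation routine is a gradient check against finite differences. The sketch below assumes the squared-error loss 0.5 * sum((y - output)^2) for a 2-2-1 tanh network without biases; the names `X_chk`, `y_chk`, `W0`, `W1`, and the helper functions are hypothetical and independent of the notebook's globals:

```python
import numpy as np

rng = np.random.default_rng(1)          # fixed seed, illustration only
X_chk = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_chk = np.array([[0], [1], [1], [0]], dtype=float)
W0 = rng.uniform(-0.2, 0.2, size=(2, 2))
W1 = rng.uniform(-2.0, 2.0, size=(2, 1))

def loss(W0, W1):
    out = np.tanh(np.tanh(X_chk @ W0) @ W1)
    return 0.5 * np.sum((y_chk - out) ** 2)

def backprop_grads(W0, W1):
    hidden = np.tanh(X_chk @ W0)
    out = np.tanh(hidden @ W1)
    # gradient w.r.t. the output layer's pre-activation
    delta_out = (out - y_chk) * (1.0 - out ** 2)
    grad_W1 = hidden.T @ delta_out
    # chain rule back through W1 and the hidden tanh
    delta_hidden = (delta_out @ W1.T) * (1.0 - hidden ** 2)
    grad_W0 = X_chk.T @ delta_hidden
    return grad_W0, grad_W1

def numeric_grad(W0, W1, which, h=1e-6):
    # central differences, one weight at a time
    W = (W0, W1)[which]
    grad = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        plus, minus = W.copy(), W.copy()
        plus[idx] += h
        minus[idx] -= h
        args_p = (plus, W1) if which == 0 else (W0, plus)
        args_m = (minus, W1) if which == 0 else (W0, minus)
        grad[idx] = (loss(*args_p) - loss(*args_m)) / (2 * h)
    return grad

g0, g1 = backprop_grads(W0, W1)
max_diff = max(np.max(np.abs(g0 - numeric_grad(W0, W1, 0))),
               np.max(np.abs(g1 - numeric_grad(W0, W1, 1))))
print("largest difference between backprop and finite differences:", max_diff)
```

Because the error in `backward_pass` is defined as target minus output, adding `learning_rate` times the product of layer output and `err_delta` corresponds to a descent step on this assumed loss.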

#### Activation functions

An activation function determines the output of a neuron from its weighted input and introduces non-linearity into the network. It maps the resulting values into a fixed range, such as 0 to 1 for the sigmoid or -1 to 1 for tanh.

In [11]:
# activation functions
def tang(y):
    # YOUR CODE HERE
    return np.tanh(y)

# derivative of the tanh function, used in backpropagation
# note: y is the tanh *output* (the stored layer activation), not the pre-activation
def derivative_tang(y):
    # YOUR CODE HERE
    return 1.0 - y ** 2

def sigmoid(y):
    # YOUR CODE HERE
    return 1 / (1 + np.exp(-y))

# derivative of the sigmoid function, used in backpropagation
# note: y is the sigmoid *output*, not the pre-activation
def derivative_sigmoid(y):
    # YOUR CODE HERE
    return y * (1 - y)
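
Both derivative helpers are written in terms of the activation's output rather than its input. The following sketch checks that convention numerically; the grid of pre-activation values `z`, the step `h`, and the helper `sig` are assumptions for this illustration:

```python
import numpy as np

z = np.linspace(-2.0, 2.0, 9)      # pre-activation values
h = 1e-6                            # finite-difference step

# tanh: derivative expressed through the output t = tanh(z)
t = np.tanh(z)
analytic_tanh = 1.0 - t ** 2
numeric_tanh = (np.tanh(z + h) - np.tanh(z - h)) / (2 * h)

# sigmoid: derivative expressed through the output s = sigmoid(z)
def sig(v):
    return 1.0 / (1.0 + np.exp(-v))

s = sig(z)
analytic_sig = s * (1.0 - s)
numeric_sig = (sig(z + h) - sig(z - h)) / (2 * h)

err_tanh = np.max(np.abs(analytic_tanh - numeric_tanh))
err_sig = np.max(np.abs(analytic_sig - numeric_sig))
print("max tanh derivative error:", err_tanh)
print("max sigmoid derivative error:", err_sig)
```

This is why the backward pass can reuse the stored layer outputs: no pre-activation values need to be kept from the forward pass.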

#### Train the network by calling `feed_forward_pass` and `backward_pass`

In [12]:
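def train(x_values, target):
    # YOUR CODE HERE
    # forward pass, then backpropagate the error and update the weights
    layers, output = feed_forward_pass(x_values)
    backward_pass(target, output, layers)
    # return the prediction made before this update so the caller can track the loss
    return output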

##### We train the network for n iterations to update the weights accordingly

In [15]:
# train the network for `iterations` iterations (2000 here), i.e. the weights are updated 2000 times
for i in range(iterations):
  output = train(X, y)
  ten = iterations // 10
  # if i % ten == 0:
  #   print("Iteration number: {} / Squared loss:{} ".format(str(i), str(np.mean(np.square(y - output)))))



#### Predict

Let us define a function that predicts the output for a given input by running a forward pass with the updated weights.

In [16]:
def predict(x_values):
    # passing inputs through the forward pass
    # YOUR CODE HERE
    return feed_forward_pass(x_values)[1]

In [17]:
# predict
for i in range(len(X)):
    # YOUR CODE HERE
    print('-' * 20)
    print('Input value: ' + str(X[i]))
    print('Predicted target: ' + str(predict(X[i])))
    print('Actual target: ' + str(y[i]))

--------------------
Input value: [0. 0.]
1 -  [0. 0.]
2 -  [0. 0.]
3 -  [0.]
Predicted target: [0.]
Actual target: [0.]
--------------------
Input value: [0. 1.]
1 -  [0. 1.]
2 -  [-0.4282938  -0.99059803]
3 -  [0.90785754]
Predicted target: [0.90785754]
Actual target: [1.]
--------------------
Input value: [1. 0.]
1 -  [1. 0.]
2 -  [-0.42823521 -0.99052242]
3 -  [0.90786067]
Predicted target: [0.90786067]
Actual target: [1.]
--------------------
Input value: [1. 1.]
1 -  [1. 1.]
2 -  [-0.72378014 -0.99995502]
3 -  [0.02779896]
Predicted target: [0.02779896]
Actual target: [0.]


### Please answer the questions below to complete the experiment:




In [18]:
# @title Backpropagation is a way of computing the partial derivatives of a loss function with respect to: { run: "auto", form-width: "500px", display-mode: "form" }
Answer = "the weights and biases of the network" #@param ["","the weights and biases of the network","the weights and biases of only the first layer of the network","the weights and biases of only the last layer of the network"]

In [20]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "Too Difficult for me" #@param ["","Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging for me", "Was Tough, but I did it", "Too Difficult for me"]


In [19]:
#@title If it was too easy, what more would you have liked to be added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "Found it difficult." #@param {type:"string"}


In [21]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "Yes" #@param ["","Yes", "No"]


In [22]:
#@title  Text and image description/explanation and code comments within the experiment: { run: "auto", vertical-output: true, display-mode: "form" }
Comments = "Somewhat Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [23]:
#@title Mentor Support: { run: "auto", vertical-output: true, display-mode: "form" }
Mentor_support = "Very Useful" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [24]:
#@title Run this cell to submit your notebook for grading { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id = return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")

Your submission is successful.
Ref Id: 1628
Date of submission:  23 Jan 2024
Time of submission:  23:01:17
View your submissions: https://cds-iisc.talentsprint.com/notebook_submissions
