# Advanced Certification Program in Computational Data Science
## A program by IISc and TalentSprint
### Demo Notebook: Autograd

## Learning Objectives

At the end of the experiment, you will be able to:

* understand and implement a linear regression model and calculate the gradient of its loss function using autograd

## Information

**Goal:** Fit the parameters (slope and intercept) of a simple linear regression model via gradient descent (GD), using Autograd.

**Dataset:** The dataset consists of age vs. systolic blood pressure (SBP) measurements of 33 American women.

**Task-flow:**

* Fit a linear model with the sklearn machine learning library to get the fitted values of the intercept and slope as a reference.

* Use the autograd library and its *grad* function to fit the parameters of the simple linear model via GD, with the objective of minimizing the MSE loss.
    * define the MSE loss function
    * determine the gradients of the loss w.r.t. the parameters via automatic differentiation
    * use these gradients to update the parameter values via the update formula
    * repeat the two previous steps many times, checking the current values of the estimated model parameters and the loss along the way
    * verify that the estimated parameter values converge to the values obtained from the sklearn fit
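
The loss and the update formula referred to in the task-flow can be written out as follows. This is a sketch in notation chosen here for illustration: $x_i$ is the age and $y_i$ the blood pressure of observation $i$, $n$ is the number of observations, and $\eta$ is the learning rate.

$$
\mathrm{MSE}(a, b) = \frac{1}{n} \sum_{i=1}^{n} \left(a\,x_i + b - y_i\right)^2
$$

$$
a \leftarrow a - \eta \,\frac{\partial\, \mathrm{MSE}}{\partial a}, \qquad b \leftarrow b - \eta \,\frac{\partial\, \mathrm{MSE}}{\partial b}
$$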

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('default')
# import the linear regression model from sklearn
from sklearn.linear_model import LinearRegression

Here we read in the systolic blood pressure and the age of the 33 American women in our dataset. Then we use the sklearn library to find the optimal values for the slope a and the intercept b.

In [None]:
# Blood Pressure data
x = [22, 41, 52, 23, 41, 54, 24, 46, 56, 27, 47, 57, 28, 48, 58,  9, 
     49, 59, 30, 49, 63, 32, 50, 67, 33, 51, 71, 35, 51, 77, 40, 51, 81]
y = [131, 139, 128, 128, 171, 105, 116, 137, 145, 106, 111, 141, 114, 
     115, 153, 123, 133, 157, 117, 128, 155, 122, 183,
     176,  99, 130, 172, 121, 133, 178, 147, 144, 217] 
# Convert the input to an array
x = np.asarray(x, np.float32) 
y = np.asarray(y, np.float32)

In [None]:
# A scatter plot of y (Blood Pressure) vs. x (age)
plt.scatter(x=x,y=y)
plt.title("blood pressure vs age")
plt.xlabel("x (age)")
plt.ylabel("y (sbp)")

# create the linear regression model
model = LinearRegression()
# fit() expects a 2D feature matrix X of shape (n_samples, n_features) and the targets y
X = x.reshape((len(x), 1))  # reshape the 1D age array into a single-column 2D array
res = model.fit(X, y)
predictions = model.predict(X)
plt.plot(x, predictions)
plt.show()
print("intercept = ", res.intercept_, "slope = ", res.coef_[0])
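
For a model with a single feature, the least-squares slope and intercept also have a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the two means. The sketch below applies this to a small made-up dataset as one way to cross-check a fitted line. The arrays `x_toy` and `y_toy` and the helper `closed_form_fit` are hypothetical and not part of the notebook.

```python
import numpy as np

def closed_form_fit(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    x_mean, y_mean = x.mean(), y.mean()
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept

# Hypothetical toy data lying exactly on the line y = 2x + 1
x_toy = [0.0, 1.0, 2.0, 3.0]
y_toy = [1.0, 3.0, 5.0, 7.0]
slope, intercept = closed_form_fit(x_toy, y_toy)
print("slope =", slope, "intercept =", intercept)
```

Applied to the notebook's `x` and `y` arrays, the same function should agree with the sklearn intercept and slope up to floating-point precision.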

## Autograd

Now we want to use Autograd, a library for automatic differentiation. First we need to install it. Then we can define our MSE loss again and evaluate it at the optimal values for the slope a and the intercept b from above, which gives the minimal loss.

In [None]:
!pip install autograd        

In [None]:
import autograd.numpy as np  # thin wrapper around NumPy that lets autograd differentiate NumPy code
from autograd import grad

In [None]:
# define the loss function as the mean squared error (MSE)
def loss(a, b):
    y_hat = a*x + b
    return np.sum((y_hat - y)**2) / len(x)

In [None]:
loss(1.1050216, 87.67143)    # loss at the optimal slope a and intercept b from the sklearn fit (the minimal loss)

Now we request the gradients of the loss w.r.t. our two model parameters, the slope a and the intercept b. In the next cell we print the gradient of the loss w.r.t. a and the gradient of the loss w.r.t. b. Note that we calculate the loss over all data points and therefore get different gradients than in nb_04, where we used only one data point. Autograd's *grad* function takes a function as input and returns a function that computes its derivative. By default it differentiates w.r.t. the first argument; an optional second argument selects the index of the argument to differentiate with respect to. You can call the returned derivative function to compute the gradient at a specific position of the loss function.

In [None]:
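# build the derivative functions of the loss and evaluate them at a=0, b=139
grad_loss_a = grad(loss, 0)   # derivative w.r.t. argument 0 (slope a)
grad_loss_b = grad(loss, 1)   # derivative w.r.t. argument 1 (intercept b)
print(grad_loss_a(0., 139.))
print(grad_loss_b(0., 139.))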

Now, let's use gradient descent to optimize the slope a and the intercept b. The start values are a=0 and b=139 (139 is approximately the mean of the blood pressure values, and a slope of a=0 means the model predicts this constant for every age). Our learning rate eta is 0.0004 and we do 80000 update steps, each using all 33 observations.

In [None]:
eta = 0.0004    # learning rate
a = 0.0         # initial guess for slope
b = 139.0       # initial guess for intercept

# run 80000 gradient descent steps to find the optimal values of a and b
for i in range(80000):
    grad_a, grad_b = grad_loss_a(a, b), grad_loss_b(a, b)
    a = a - eta * grad_a
    b = b - eta * grad_b
    if i % 5000 == 0:
        # the printed gradients were evaluated before this update, the mse after it
        print("Epoch:", i, "slope=", a, "intercept=", b, "gradient_a", grad_a, "gradient_b", grad_b, "mse=", loss(a, b))
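
For the MSE of a linear model the gradients can also be written by hand, which makes it possible to look at the update loop in isolation. The sketch below swaps autograd for these analytic gradients and runs plain gradient descent on a small made-up dataset. The arrays `x_toy` and `y_toy`, the helper `mse_gradients`, the learning rate and the step count are all illustrative choices, not values from the notebook.

```python
import numpy as np

# Hypothetical toy data lying exactly on the line y = 2x + 1
x_toy = np.array([0.0, 1.0, 2.0, 3.0])
y_toy = np.array([1.0, 3.0, 5.0, 7.0])

def mse_gradients(a, b, x, y):
    """Analytic gradients of mean((a*x + b - y)**2) w.r.t. a and b."""
    residual = a * x + b - y
    grad_a = 2.0 * np.mean(residual * x)
    grad_b = 2.0 * np.mean(residual)
    return grad_a, grad_b

eta = 0.1        # illustrative learning rate for this toy problem
a, b = 0.0, 0.0  # initial guesses for slope and intercept

for step in range(1000):
    grad_a, grad_b = mse_gradients(a, b, x_toy, y_toy)
    a = a - eta * grad_a
    b = b - eta * grad_b

print("slope =", a, "intercept =", b)
```

The learning rate has to match the scale of the inputs: a value that works for this toy data can make the updates diverge on larger-valued inputs such as ages, which is why the notebook uses a much smaller eta.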


Let's look at the final values for the slope a, the intercept b and the MSE loss. We know from the closed-form solution that:

1.   optimal value for a: 1.1050216
2.   optimal value for b: 87.67143
3.   minimal loss: 349.200787168560

After 80000 update steps we are very close to the optimal values.


In [None]:
# display the final values of a and b and the corresponding loss
print(a,b, loss(a,b))