## Learning Objectives

At the end of the experiment, you will be able to:

* Understand the intuition behind Gradient Descent

## Overview 

In general terms, gradient means the slope or slant of a surface, so gradient descent means descending a slope to reach the lowest point on that surface. Gradient Descent aims to minimize the cost function; at the minimum of a smooth function, the slope is equal to 0. Gradient descent is an iterative algorithm that starts from a random point on a function and travels down its slope in steps until it reaches the minimum point of that function.

### Import required packages

In [None]:
import matplotlib.pyplot as plt
import numpy as np

### Intuition behind Gradient Descent

Some mathematical groundwork before looking at gradient descent itself:

* Let's take a function,  $f(x) = x^{2} - x + 3$ 

* The task is to find the minimum value of the function, that is, the value of X at which Y is smallest.

In [None]:
# Evaluate the function at each given value of x to get the corresponding y values.
def func(x):
    return x**2 - x + 3

X = np.array([-6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7]) # Input values
Y = func(X) # Call the function to get the y values

print("X values: " , X)
print("f(x) values: \n", Y)

Now let's plot the function $f(x) = x^{2} - x + 3$  for the given values of X

In [None]:
# Plotting x and y values
plt.plot(X, Y, marker='o',color='b',linestyle='-');
plt.plot(X, Y, 'o', color='r');

## Gradient Descent 

It is an iterative optimization algorithm that finds the minimum value of a function. For this function, the minimum occurs at X = 0.5, where the slope $f'(x) = 2x - 1$ is zero.
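
As a quick check of that value, here is a minimal sketch that finds the point of zero slope with NumPy's `poly1d`. The names `f`, `critical_points`, and `x_min` are illustrative and not part of the original notebook.

```python
import numpy as np

# f(x) = x^2 - x + 3, written as polynomial coefficients (highest power first)
f = np.poly1d([1, -1, 3])

# The roots of the derivative are the points where the slope is zero
critical_points = f.deriv().roots
x_min = float(critical_points[0])

print("x at zero slope:", x_min)
print("f(x) at that point:", f(x_min))
```

Because the coefficient of $x^{2}$ is positive, the parabola opens upward, so this zero-slope point is a minimum rather than a maximum.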



### How does gradient descent converge?

* Where to start?

* In which direction to move?

* How to reach the minimum point?


The general idea is to start from a random point X. The gradient of a function always points in the direction of the greatest rate of increase, so moving in the opposite (negative gradient) direction decreases the value of the function.
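
To make the direction rule concrete, the sketch below takes a single step from a point on each side of the minimum. The helper names `slope_at` and `one_step`, and the step size of 0.1, are assumptions chosen for illustration.

```python
def slope_at(x):
    # Derivative of f(x) = x^2 - x + 3
    return 2 * x - 1

def one_step(x, eta):
    # Move against the slope: left if the slope is positive, right if negative
    return x - eta * slope_at(x)

for x in (6.0, -3.0):
    x_next = one_step(x, eta=0.1)
    print(f"x = {x:5.1f}, slope = {slope_at(x):6.1f}, next x = {x_next:5.2f}")
```

From x = 6 the slope is positive, so the step moves left; from x = -3 the slope is negative, so the step moves right. In both cases the new point is closer to 0.5.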



### The steps of the Gradient Descent algorithm 

1. Compute the gradient of the function (its first-order derivative).

2. Start from a random point, and choose a learning rate ($\eta$) and the number of iterations.

3. At each iteration, update the point by moving against the slope, $x \leftarrow x - \eta f'(x)$, so that it moves towards the minimum point.



The **learning rate ($\eta$)** is a parameter that influences the convergence of the algorithm. A large learning rate makes the algorithm take huge steps down the slope, so it might jump across the minimum point and miss it. A small learning rate is more precise, but the algorithm then needs many more iterations to reach the minimum, which is time-consuming.
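
The effect of the learning rate can be seen by running the same number of steps with different values of $\eta$. This is a rough sketch: the function name `run_gradient_descent`, the three learning rates, and the choice of 50 steps are all illustrative assumptions.

```python
def run_gradient_descent(eta, x_start, iterations):
    """Apply x <- x - eta * f'(x) for f(x) = x^2 - x + 3, where f'(x) = 2x - 1."""
    x = x_start
    for _ in range(iterations):
        x = x - eta * (2 * x - 1)
    return x

# Same start point and step count, three different learning rates
for eta in (0.001, 0.1, 1.1):
    x_final = run_gradient_descent(eta, x_start=6.0, iterations=50)
    print(f"eta = {eta}: x after 50 steps = {x_final:.4f}")
```

With the smallest rate, x has barely moved after 50 steps; with the middle rate it is already close to 0.5; with the largest rate each step overshoots the minimum by more than the previous one, so x moves further and further away.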


In [None]:
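def f_prime(x1):
    # Represent f(x) = x^2 - x + 3 by its coefficients (highest power first)
    var = np.poly1d([1, -1, 3])
    # Differentiate to get f'(x) = 2x - 1
    derivative = var.deriv()
    # Evaluate the derivative at the point x1
    f_derivative = derivative(x1)
    return f_derivative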

In [None]:
# Gradient descent: repeatedly step against the slope.
# Note: gradient_x holds the current x position, not the gradient itself.
def gradient_converge(eta, gradient_x, iterations):
    for i in range(iterations):
        # The derivative is the rate of change (slope) of the function at the current point
        deriv_x = f_prime(gradient_x)
        # Update rule: x = x - eta * f'(x)
        gradient_x = gradient_x - eta * deriv_x

    return gradient_x
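
A fixed iteration count is the simplest stopping rule. One common variation, sketched here as an assumption rather than as part of the original notebook, is to stop early once the slope is close to zero. The function name `gradient_converge_with_tolerance` and the `tolerance` parameter are hypothetical.

```python
import numpy as np

def f_prime(x):
    # Derivative of f(x) = x^2 - x + 3, evaluated at x
    return np.poly1d([1, -1, 3]).deriv()(x)

def gradient_converge_with_tolerance(eta, x_start, max_iterations, tolerance=1e-6):
    """Gradient descent that stops early once the slope is nearly zero."""
    x = x_start
    for i in range(max_iterations):
        slope = f_prime(x)
        if abs(slope) < tolerance:
            return x, i  # converged after i updates
        x = x - eta * slope
    return x, max_iterations  # iteration budget used up

x_final, steps_used = gradient_converge_with_tolerance(0.01, 6, 10000)
print("x =", x_final, "after", steps_used, "updates")
```

This returns both the final point and the number of updates actually performed, which makes it easier to compare learning rates without guessing an iteration count in advance.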

Set the parameters for gradient descent: iterations = 200, eta = 0.001

In [None]:
# Specify the number of iterations for the process to repeat.
iterations = 200

# Set an initial value, to start with
Initial_point = 6

# Learning rate  
eta = 0.001         

In [None]:
6-0.001*200

In [None]:
# Calling the gradient_converge() function
gradient_x = gradient_converge(eta, Initial_point, iterations)
print("iterations = ",iterations,"\ngradient_x = ",gradient_x)

Visualization of the Gradient Convergence for the above parameters

In [None]:
plt.plot(X, Y, marker='o',color='b',linestyle='-');
plt.plot(X, Y, 'o', color='r');
y_gradient = func(gradient_x)
plt.plot(gradient_x, y_gradient,'ko',  markersize = 10)
plt.show()

From the above graph, with iterations = 200 and eta = 0.001, x has only moved from 6 down to about 4.19, which is still far from the minimum at x = 0.5.
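
For this particular quadratic, the slow progress can be explained directly. Since $f'(x) = 2x - 1 = 2(x - 0.5)$, each update gives $x_{new} - 0.5 = (1 - 2\eta)(x - 0.5)$, so the distance to the minimum shrinks by a constant factor of $1 - 2\eta$ per step. The sketch below uses that relation to predict where x ends up; the names `predicted_x` and `shrink` are illustrative, and the formula applies only to this function.

```python
def predicted_x(n, eta=0.001, x_start=6, x_min=0.5):
    # Each step multiplies the distance to the minimum by (1 - 2 * eta)
    shrink = 1 - 2 * eta
    return x_min + (x_start - x_min) * shrink ** n

for n in (200, 9000):
    print(f"predicted x after {n} iterations: {predicted_x(n):.6f}")
```

With eta = 0.001 the factor is 0.998, so each step removes only 0.2% of the remaining distance, which is why 200 steps are not enough.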

Now let's try changing the parameters to iterations = 9000, eta = 0.001

In [None]:
# Change the no.of iterations to 9000
iterations = 9000

# Start point
Initial_point = 6 

# Learning rate
eta = 0.001 

# Calling the gradient_converge() function
gradient_x = gradient_converge(eta, Initial_point, iterations)
print("iterations = ",iterations,"\ngradient_x = ",gradient_x)

Visualize the Gradient Convergence for the changed parameters

In [None]:
plt.plot(X, Y, marker='o',color='b',linestyle='-');
plt.plot(X, Y, 'o', color='r');
y_gradient = func(gradient_x)
plt.plot(gradient_x, y_gradient,'ko',  markersize = 10)
plt.show()

From the above graph, observe that after 9000 iterations x reaches the minimum point in the plot, which is very close to 0.5.

Also try a learning rate ($\eta$) of 0.01 and observe how gradient descent converges more quickly, reaching the minimum point within 1000 iterations.
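
One way to compare the two learning rates is to count how many updates each needs to get within a chosen distance of the minimum. This is a minimal sketch: the helper `steps_to_converge` and the distance of 1e-3 are assumptions, and the known minimum at 0.5 is used only to measure progress.

```python
def steps_to_converge(eta, x_start=6, x_min=0.5, distance=1e-3, max_steps=100_000):
    """Count updates until x is within `distance` of the known minimum."""
    x = x_start
    for step in range(max_steps):
        if abs(x - x_min) < distance:
            return step
        x = x - eta * (2 * x - 1)  # f'(x) = 2x - 1
    return max_steps

for eta in (0.001, 0.01):
    print(f"eta = {eta}: {steps_to_converge(eta)} updates")
```

The larger learning rate needs roughly a tenth as many updates here, which is consistent with 1000 iterations being enough for eta = 0.01 while eta = 0.001 needed several thousand.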