# Understanding Gradient Descent

Gradient descent is one of the most widely used optimisation algorithms in machine learning and deep learning. The idea is simple: find the parameters that minimize the __cost function__ (for example, the __Sum of Squared Errors__, or SSE) by repeatedly moving in the direction opposite to the gradient of the cost function until we reach a __local or global minimum__.
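
In symbols, one step of this idea can be written as below. The notation is an assumption made here for illustration (the notebook does not fix one): $x_t$ is the current parameter value, $J$ is the cost function, and $\eta > 0$ is a step size that controls how far we move.

$$x_{t+1} = x_t - \eta \, \frac{dJ}{dx}(x_t)$$

When the gradient is positive the update decreases $x$, and when it is negative the update increases $x$, so in both cases we move downhill on $J$ as long as $\eta$ is small enough.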

<img src="attachment:0*fU8XFt-NCMZGAWND..png" width="500"/>

# Example

Suppose we want to find the local minimum of the loss function given by the equation $y = (x + 5)^2$. This parabola has a single minimum at $x = -5$, so we know what answer to expect.
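
As a quick sanity check, the loss can be written as a small Python function and evaluated at a few points. This is only a sketch; the name `loss` and the sample points are chosen here for illustration and are not part of the original notebook.

```python
def loss(x):
    """Loss function from the example: y = (x + 5)^2."""
    return (x + 5) ** 2

# Evaluate the loss at a few points on either side of x = -5.
for x in [-7, -6, -5, -4, -3]:
    print("x =", x, "loss =", loss(x))
```

The loss is zero at `x = -5` and grows on both sides, which is the shape gradient descent will exploit.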


<img src="attachment:Screen%20Shot%202021-03-16%20at%2011.40.38%20pm.png" width="500"/>

## Step 1:

For the first step of this algorithm, we pick an arbitrary starting point, here x = 3.

We then compute the gradient of the loss function: $dy/dx = 2(x+5)$

In [22]:
def df(x): #Gradient of our function: dy/dx = 2(x+5)
    return 2*(x+5)
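
One way to gain confidence in a hand-derived gradient is to compare it with a numerical estimate. The sketch below uses a central finite difference; the helper `numerical_gradient` and the step `h` are assumptions introduced here, not part of the original notebook.

```python
def f(x):
    """Loss function from the example: y = (x + 5)^2."""
    return (x + 5) ** 2

def df(x):
    """Analytic gradient: dy/dx = 2(x + 5)."""
    return 2 * (x + 5)

def numerical_gradient(func, x, h=1e-5):
    """Central finite-difference estimate of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 3  # the starting point used in this example
analytic = df(x0)
numeric = numerical_gradient(f, x0)
print("Analytic gradient:", analytic)
print("Numerical gradient:", numeric)
```

The two values should agree up to floating-point rounding. A large gap would suggest a mistake in the derivative.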

## Step 2:

We then move in the direction of the negative gradient. To control how far we move at each step, we specify a __learning rate__. Here we set it to __0.01__.

Also, to decide when to stop the algorithm, we specify a __precision__ level (the step size at or below which we stop) and a __maximum number of iterations__.

In [18]:
cur_x = 3 #The algorithm starts at x=3
rate = 0.01 #Learning rate
precision = 0.000001 #Stop once the step size is no larger than this
previous_step_size = 1 #Initial value above precision so the loop starts
max_iters = 10000 # maximum number of iterations
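
To make the update concrete, here is the very first step worked out by hand in code. The variable names (`start_x`, `first_gradient`, `first_step`, `next_x`) are illustrative and do not appear in the original notebook.

```python
def df(x):
    """Gradient of y = (x + 5)^2."""
    return 2 * (x + 5)

start_x = 3           # starting point from Step 1
learning_rate = 0.01  # learning rate from Step 2

first_gradient = df(start_x)                 # 2 * (3 + 5) = 16
first_step = learning_rate * first_gradient  # 0.01 * 16 = 0.16
next_x = start_x - first_step                # 3 - 0.16 = 2.84

print("Gradient at start:", first_gradient)
print("Step taken:", first_step)
print("New x:", next_x)
```

The gradient at the start is positive, so the update moves x to the left, towards the minimum.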

## Step 3: 

Now we move to the fun part: performing the iterations of gradient descent.

In [19]:
iters = 0 #iteration counter

while previous_step_size > precision and iters < max_iters:
    prev_x = cur_x #Store current x value in prev_x
    cur_x = cur_x - rate * df(prev_x) #Grad descent
    previous_step_size = abs(cur_x - prev_x) #Change in x
    iters = iters+1 #iteration count
    print("Iteration #",iters,"\nX value is",cur_x) #Print iterations

Iteration # 1 
X value is 2.84
Iteration # 2 
X value is 2.6832
Iteration # 3 
X value is 2.529536
Iteration # 4 
X value is 2.37894528
Iteration # 5 
X value is 2.2313663744
Iteration # 6 
X value is 2.0867390469119997
Iteration # 7 
X value is 1.9450042659737599
Iteration # 8 
X value is 1.8061041806542846
Iteration # 9 
X value is 1.669982097041199
Iteration # 10 
X value is 1.5365824551003748
Iteration # 11 
X value is 1.4058508059983674
Iteration # 12 
X value is 1.2777337898784
Iteration # 13 
X value is 1.152179114080832
Iteration # 14 
X value is 1.0291355317992152
Iteration # 15 
X value is 0.9085528211632309
Iteration # 16 
X value is 0.7903817647399662
Iteration # 17 
X value is 0.6745741294451669
Iteration # 18 
X value is 0.5610826468562635
Iteration # 19 
X value is 0.44986099391913825
Iteration # 20 
X value is 0.3408637740407555
Iteration # 21 
X value is 0.23404649855994042
Iteration # 22 
X value is 0.1293655685887416
Iteration # 23 
X value is 0.026778257216966764
[... output truncated ...]

X value is -4.99953731593078
Iteration # 484 
X value is -4.999546569612165
Iteration # 485 
X value is -4.999555638219921
Iteration # 486 
X value is -4.999564525455523
Iteration # 487 
X value is -4.999573234946412
Iteration # 488 
X value is -4.9995817702474845
Iteration # 489 
X value is -4.999590134842535
Iteration # 490 
X value is -4.999598332145684
Iteration # 491 
X value is -4.99960636550277
Iteration # 492 
X value is -4.999614238192715
Iteration # 493 
X value is -4.999621953428861
Iteration # 494 
X value is -4.999629514360284
Iteration # 495 
X value is -4.999636924073078
Iteration # 496 
X value is -4.999644185591617
Iteration # 497 
X value is -4.999651301879784
Iteration # 498 
X value is -4.999658275842188
Iteration # 499 
X value is -4.999665110325345
Iteration # 500 
X value is -4.999671808118838
Iteration # 501 
X value is -4.9996783719564615
Iteration # 502 
X value is -4.999684804517332
Iteration # 503 
X value is -4.999691108426985
Iteration # 504 
X value is -4 [... output truncated ...]

In [21]:
print("The local minimum occurs at", cur_x)
print("It took ", iters, "iterations to end!")
print("The last change in step size was", previous_step_size)

The local minimum occurs at -4.9999518490318176
It took  595 iterations to end!
The last change in step size was 9.826728204487267e-07
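
For reuse, the same loop can be wrapped in a function that returns its results instead of printing every iteration. This is a sketch under the same settings as above; the function name `gradient_descent` and its signature are assumptions made here. For this particular loss, each update multiplies the distance to -5 by (1 - 2 * rate) = 0.98, so after n steps from x = 3 we expect x = -5 + 8 * 0.98^n, and the code compares the result against that formula.

```python
def gradient_descent(df, start_x, rate=0.01, precision=0.000001, max_iters=10000):
    """Run gradient descent and return (final x, iterations used, last step size)."""
    cur_x = start_x
    previous_step_size = precision + 1  # any value above precision starts the loop
    iters = 0
    while previous_step_size > precision and iters < max_iters:
        prev_x = cur_x
        cur_x = cur_x - rate * df(prev_x)         # gradient descent update
        previous_step_size = abs(cur_x - prev_x)  # change in x
        iters += 1
    return cur_x, iters, previous_step_size

def df(x):
    """Gradient of y = (x + 5)^2."""
    return 2 * (x + 5)

x_min, n_iters, last_step = gradient_descent(df, start_x=3)

# Closed-form prediction for this specific loss, start point and rate.
closed_form = -5 + 8 * 0.98 ** n_iters

print("Minimum found at", x_min, "after", n_iters, "iterations")
print("Closed-form prediction:", closed_form)
```

The slow shrink factor of 0.98 per step is why several hundred iterations are needed before the step size drops below the precision.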


## Remarks

Through this small demonstration, we got a brief idea of how the gradient descent algorithm works. To learn more, I suggest watching the video below, which explains the algorithm step by step.

https://www.youtube.com/watch?v=sDv4f4s2SB8&t=110s