# Gradient Descent

## 02. Learning using the gradient

![image.png](attachment:8e96abd9-dc59-4a0b-846e-f21b27137643.png)

![image.png](attachment:3c4efde2-7b0c-4180-ba5c-8e0a7546536d.png)

### Follow the gradient

![image.png](attachment:d0756adc-6652-4e2d-a01a-8e7f76a9dec2.png)

![image.png](attachment:31adb055-249f-4f18-87e2-2a5f689c817f.png)

![image.png](attachment:bf34746a-0e90-4a12-8019-d6973d270aa6.png)

![image.png](attachment:ddc67d1e-8286-4509-be5d-f5975d3e0adc.png)

### Local minimum

![image.png](attachment:70329b06-f5bd-41f9-92c6-66cb38cec46c.png)

### Summary
Let’s summarize what we’ve learned in this unit. Here are a few takeaways.

- Gradient descent moves in the direction opposite to the **gradient**, which is the direction of steepest descent.
- It may return **suboptimal solutions** when the function is not **convex**, because it can stop at a local minimum.
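
To make the second takeaway concrete, here is a minimal sketch on a one-dimensional non-convex function. The function `f`, the learning rate, and the two starting points are assumptions chosen for illustration; they do not come from the course material.

```python
# Hypothetical non-convex function with two minima (chosen for illustration)
def f(w):
    return w**4 - 3 * w**2 + w

# Its derivative, computed by hand
def f_grad(w):
    return 4 * w**3 - 6 * w + 1

def gradient_descent_1d(w_init, lr=0.01, n_steps=1000):
    """Repeatedly step in the direction opposite to the derivative."""
    w = w_init
    for _ in range(n_steps):
        w -= lr * f_grad(w)
    return w

# Same algorithm, two different starting points
w_from_right = gradient_descent_1d(2.0)
w_from_left = gradient_descent_1d(-2.0)

print(f"start at  2.0: w = {w_from_right:.3f}, f(w) = {f(w_from_right):.3f}")
print(f"start at -2.0: w = {w_from_left:.3f}, f(w) = {f(w_from_left):.3f}")
```

Starting on the right, the algorithm settles in the shallower minimum; starting on the left, it reaches the deeper one. Which solution we get depends only on the initial value.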

In the next unit, we will see how to implement the algorithm for the simple linear regression model with the mean squared error (MSE) objective function.

## 03. Gradient descent algorithm

![image.png](attachment:49f5dce8-5fa1-44be-92be-bcfb9c7f75ac.png)

### Step size

![image.png](attachment:14b79027-7138-418c-8f5a-0668bc6d86d1.png)

### Compute the gradient

![image.png](attachment:af2deeea-e672-4e55-a9b2-bb6985aaea08.png)

![image.png](attachment:11492c88-13dd-4fb2-8b4d-e06834a6af74.png)

As we can see, both partial derivatives are proportional to the residuals, so the algorithm takes **bigger steps** when it is farther from the optimal solution.
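
For reference, the derivatives can be written out as follows. This is a sketch that assumes the MSE is the mean of the squared residuals over $n$ data points and that the model is $a x_i + b$, which matches the code in this unit; the notation is ours.

$$
\text{MSE}(a, b) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - (a x_i + b)\right)^2
$$

$$
\frac{\partial\,\text{MSE}}{\partial a} = -\frac{2}{n}\sum_{i=1}^{n} x_i\left(y_i - (a x_i + b)\right),
\qquad
\frac{\partial\,\text{MSE}}{\partial b} = -\frac{2}{n}\sum_{i=1}^{n}\left(y_i - (a x_i + b)\right)
$$

Each residual $y_i - (a x_i + b)$ enters both sums linearly, so scaling every residual by some factor scales the gradient by the same factor.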

### Algorithm

We can now use these partial derivatives to write the algorithm. Note that the learning rate lr, the initial a, b values and the number of iterations n_steps depend on the dataset. For this reason, we don’t assign them specific values in the code below. The code also assumes that x and y are Numpy arrays holding the input and target values.

In [None]:
import numpy as np

# Initialization
lr = ... # learning rate
a, b = ... # initial a,b values
n_steps = ... # number of iterations

# n_steps iterations
for step in range(n_steps):
    # Predictions with the current a,b values
    y_pred = a*x + b

    # Compute the error vector
    error = y - y_pred

    # Partial derivative with respect to a
    a_grad = -2*np.mean(x*error)

    # Partial derivative with respect to b
    b_grad = -2*np.mean(error)

    # Update a and b
    a -= lr*a_grad
    b -= lr*b_grad

At each iteration, we compute predictions y_pred using the current a, b values. Then, we create an error variable which is an array of shape (n,), and use it to compute the two partial derivatives. Finally, we update the parameters using the learning rate lr.
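
To see the loop run end to end, here is a minimal sketch on synthetic data. The data, the true slope and intercept (3 and 1), and the values of lr, a, b and n_steps are all assumptions made for this example. They are not recommended settings for a real dataset.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y is roughly 3*x + 1
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=200)
y = 3.0 * x + 1.0 + rng.normal(0, 0.1, size=200)

# Illustrative settings, chosen by hand for this synthetic data
lr = 0.5
a, b = 0.0, 0.0
n_steps = 2000

for step in range(n_steps):
    # Predictions and residuals with the current a, b values
    y_pred = a * x + b
    error = y - y_pred

    # Partial derivatives of the MSE with respect to a and b
    a_grad = -2 * np.mean(x * error)
    b_grad = -2 * np.mean(error)

    # Step in the direction opposite to the gradient
    a -= lr * a_grad
    b -= lr * b_grad

# Closed-form least squares fit, used only as a sanity check
a_ref, b_ref = np.polyfit(x, y, deg=1)
print(f"gradient descent: a = {a:.4f}, b = {b:.4f}")
print(f"np.polyfit:       a = {a_ref:.4f}, b = {b_ref:.4f}")
```

Comparing against `np.polyfit` is a convenient check here because simple linear regression with MSE has a closed-form solution that gradient descent should approach.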

### Summary
In this unit, we saw how to derive and implement the gradient descent algorithm for the simple linear regression model with MSE. Here are a few takeaways.

- Gradient descent uses a **learning rate** and the gradient of the loss function to compute the **parameter updates**.
- It’s an **iterative algorithm** that takes **small steps** in the opposite direction of the gradient.
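
The role of the learning rate can be illustrated with a small experiment. The sketch below runs the same update rule with three learning rates on synthetic data; the data, the helper names `mse` and `final_loss`, and the three rates are assumptions made for this example.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y is roughly 3*x + 1
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=200)
y = 3.0 * x + 1.0 + rng.normal(0, 0.1, size=200)

def mse(a, b):
    """Mean squared error of the line a*x + b on the synthetic data."""
    return np.mean((y - (a * x + b)) ** 2)

def final_loss(lr, n_steps=100):
    """Run gradient descent from a = b = 0 and return the final MSE."""
    a, b = 0.0, 0.0
    for _ in range(n_steps):
        error = y - (a * x + b)
        a_grad = -2 * np.mean(x * error)
        b_grad = -2 * np.mean(error)
        a -= lr * a_grad
        b -= lr * b_grad
    return mse(a, b)

initial_loss = mse(0.0, 0.0)
losses = {lr: final_loss(lr) for lr in (0.01, 0.5, 1.0)}
for lr, loss in losses.items():
    print(f"lr={lr}: final MSE = {loss:.4g} (initial MSE = {initial_loss:.4g})")
```

On this synthetic data, the smallest rate makes progress but is still far from the minimum after 100 steps, while the largest one overshoots at every step and the loss grows instead of shrinking.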

In the next unit, we will test our implementation on the bike sharing dataset.


## 04. NumPy implementation

In the last unit, we derived the gradient descent algorithm and implemented it with NumPy. We will now test our implementation on the bike sharing dataset. You can download it from the resource section.

At the end of this unit, you should have a better understanding of the algorithm and know how to track its progress.

### Monitor the loss value

Let’s start by loading the dataset.


In [1]:
import pandas as pd

# Load the data
data_df = pd.read_csv("Ressources/c3_bike-sharing.csv")
data_df.head()

|   | temp | users |
|---|---|---|
| 0 | 0.1964 | 120 |
| 1 | 0.2 | 108 |
| 2 | 0.227 | 82 |
| 3 | 0.2043 | 88 |
| 4 | 0.1508 | 41 |
