# Gradient Descent

In this exercise, you will create the functions needed to go through the steps of a single Gradient Descent epoch. You will then combine them into a loop that runs the entire Gradient Descent procedure.

## 1. Data Exploration

👇 Import the dataset `data.csv` located in the exercise folder

In [None]:
import pandas as pd

data = pd.read_csv("data.csv")

data.head()

👇 Check for missing values
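Here is a minimal way to count missing values per column. It runs on a small hypothetical frame whose values are made up for illustration. In the notebook, apply the same call to `data`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: values are invented for illustration only
data = pd.DataFrame({
    "Zinc (mg/100g)": [1.2, np.nan, 3.1, 0.8],
    "Phosphorus (mg/100g)": [150.0, 210.0, np.nan, 95.0],
})

# isnull() flags each missing cell; sum() counts the flags per column
missing_per_column = data.isnull().sum()
print(missing_per_column)
```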

👇 Visualize the relation between the variables `Phosphorus (mg/100g)` and `Zinc (mg/100g)`.
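A scatter plot is one simple way to eyeball the relation. The sketch below uses a few hypothetical toy values. In the notebook, pass the two columns of `data` instead. Zinc goes on the x-axis because it is the feature `x` used later.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy values; in the notebook, use the `data` frame read from data.csv
data = pd.DataFrame({
    "Zinc (mg/100g)": [0.5, 1.0, 1.8, 2.4, 3.0],
    "Phosphorus (mg/100g)": [80.0, 120.0, 170.0, 230.0, 260.0],
})

fig, ax = plt.subplots(figsize=(6, 4))
# One dot per food item: zinc (feature) vs. phosphorus (target)
ax.scatter(data["Zinc (mg/100g)"], data["Phosphorus (mg/100g)"], alpha=0.6)
ax.set_xlabel("Zinc (mg/100g)")
ax.set_ylabel("Phosphorus (mg/100g)")
ax.set_title("Phosphorus vs. Zinc")
plt.show()
```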

The visualization should hint at a roughly linear relationship between the two variables. Let's use Gradient Descent to find the line of best fit between them!

## 2. Data Preprocessing

👇 Before you start, scale the two features with a `MinMaxScaler`. This will allow the Gradient Descent to be more efficient and converge faster. Add the scaled features as new columns in the dataframe.

👇 Create the two `pd.Series`
- `x` for zinc scaled
- `y` for phosphorus scaled

In [None]:
x = ?
y = ?
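A possible sketch of the scaling step runs on hypothetical toy values. The new column names `zinc_scaled` and `phosphorus_scaled` are illustrative choices, not names required by the exercise. `MinMaxScaler.fit_transform` expects a 2D input and returns a NumPy array, so its columns are copied back into the frame one by one.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy values; in the notebook, start from the `data` frame read from data.csv
data = pd.DataFrame({
    "Zinc (mg/100g)": [0.5, 1.0, 1.8, 2.4, 3.0],
    "Phosphorus (mg/100g)": [80.0, 120.0, 170.0, 230.0, 260.0],
})

# MinMaxScaler rescales each column independently to the [0, 1] range
scaler = MinMaxScaler()
scaled = scaler.fit_transform(data[["Zinc (mg/100g)", "Phosphorus (mg/100g)"]])

# Store the scaled features as new columns (column names are illustrative)
data["zinc_scaled"] = scaled[:, 0]
data["phosphorus_scaled"] = scaled[:, 1]

x = data["zinc_scaled"]        # feature
y = data["phosphorus_scaled"]  # target
```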

## 3. Code one Epoch

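In this section of the exercise, you will define the key functions used to update the parameters during one epoch $\color {red}{(k)}$ of gradient descent. Recall the update rules below, where $\eta$ is the learning rate, $L$ is the loss function, $\beta_1$ is the slope (`a` below) and $\beta_0$ is the intercept (`b` below):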

$$
\beta_0^{\color {red}{(k+1)}} = \beta_0^{\color {red}{(k)}} - \eta \frac{\partial L}{\partial \beta_0}(\beta^{\color{red}{(k)}})
$$


$$
\beta_1^{\color {red}{(k+1)}} = \beta_1^{\color {red}{(k)}} - \eta \frac{\partial L}{\partial \beta_1}(\beta^{\color {red}{(k)}})
$$


### Hypothesis Function

$$
\hat{y} =  a x + b
$$

👇 Define the hypothesis function of a Linear Regression. Let `a` be the slope and `b` the intercept.


In [None]:
def h(x,a,b):
    pass

❓ What would be your predicted amount of phosphorus if:
- zinc = 0.1
- a = 1
- b = 1

Use your hypothesis function to compute the answer. 

In [None]:
h(0.1,1,1)

⚠️ If the answer is not 1.1, something is wrong with your function. Fix it before moving on!
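Here is a minimal sketch of the hypothesis function. It uses plain arithmetic, so the same code works on a single number or element-wise on a `pd.Series` or NumPy array.

```python
def h(x, a, b):
    """Linear hypothesis: slope a times x plus intercept b.

    Works for a single number as well as element-wise on a pandas Series or NumPy array.
    """
    return a * x + b

print(h(0.1, 1, 1))  # 1.1
```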

### Loss Function

$$
Least\ Squares\ Loss = L(a, b) = \sum_{i=1}^{n} (y^{(i)} - \hat{y}^{(i)})^2
$$

👇 Define the Least Squares Loss Function for the Hypothesis Function defined above.



<details>
<summary>💡 Hint</summary>
You must use the Hypothesis Function within the Loss function to compute the predictions at given parameter values.
</details>



In [None]:
import numpy as np

def loss(x,y,a,b):
    pass

❓ What would be the total Loss if:
- a = 1 
- b = 1

⚠️ You should be getting 63.86. If not, something is wrong with your function. Fix it before moving on!
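One way to write the loss is sketched below, reusing the hypothesis function. It is checked on two made-up points rather than the exercise data, so the output is not the 63.86 expected above.

```python
import numpy as np

def h(x, a, b):
    return a * x + b

def loss(x, y, a, b):
    """Sum of squared residuals between observed y and predictions h(x, a, b)."""
    y_pred = h(x, a, b)
    return np.sum((y - y_pred) ** 2)

# Toy check on made-up points (not the exercise data)
x_toy = np.array([0.0, 1.0])
y_toy = np.array([1.0, 3.0])
print(loss(x_toy, y_toy, a=1, b=1))  # predictions [1, 2], residuals [0, 1] -> 1.0
```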

### Gradient

$$
\frac{\partial L}{\partial a} = \sum_{i=1}^{n} -2\left(y^{(i)} - \hat{y}^{(i)}\right) x^{(i)}
$$

$$
\frac{\partial L}{\partial b} = \sum_{i=1}^{n} -2\left(y^{(i)} - \hat{y}^{(i)}\right)
$$
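Both formulas follow from the chain rule applied to a single squared residual, since $\hat{y}^{(i)} = a x^{(i)} + b$:

$$
\frac{\partial}{\partial a}\left(y^{(i)} - a x^{(i)} - b\right)^2 = 2\left(y^{(i)} - \hat{y}^{(i)}\right)\cdot\left(-x^{(i)}\right)
$$

$$
\frac{\partial}{\partial b}\left(y^{(i)} - a x^{(i)} - b\right)^2 = 2\left(y^{(i)} - \hat{y}^{(i)}\right)\cdot(-1)
$$

Summing these terms over all samples gives the two partial derivatives of the loss.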

👇 Define a function that computes the partial derivatives of the Loss Function with respect to parameters `a` and `b`, at given parameter values.


<details>
<summary>💡 Hint</summary>
Again, you must use the Hypothesis Function inside it to compute the predictions at the given parameter values.
</details>

In [None]:
def gradient(x,y,a,b):
    
    return derivative_a, derivative_b

❓ Using your function, what would be the partial derivatives of each parameter if:
- a = 1
- b = 1

⚠️ You should be getting 48.45 for `a` and 115.17 for `b`. If not, fix your function!
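Below is a minimal sketch of the gradient, written directly from the two formulas and vectorized with NumPy. The toy points are made up. Negative derivatives here mean that increasing `a` or `b` would lower the loss.

```python
import numpy as np

def h(x, a, b):
    return a * x + b

def gradient(x, y, a, b):
    """Partial derivatives of the sum of squared residuals with respect to a and b."""
    residuals = y - h(x, a, b)
    derivative_a = np.sum(-2 * residuals * x)
    derivative_b = np.sum(-2 * residuals)
    return derivative_a, derivative_b

# Toy check on made-up points: residuals are [0, 1] at (a, b) = (1, 1)
x_toy = np.array([0.0, 1.0])
y_toy = np.array([1.0, 3.0])
derivative_a, derivative_b = gradient(x_toy, y_toy, a=1, b=1)  # both equal -2.0 here
```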

### Step Sizes

$$
step\ size = gradient \cdot learning\ rate
$$

👇 Define a function that calculates the step size along each parameter (a, b) from its derivative (derivative_a, derivative_b) and a `learning_rate` that defaults to 0.01.

In [None]:
def steps(derivative_a,derivative_b, learning_rate = 0.01):
    
    return (step_a, step_b)

❓ What would be the steps to take for the derivatives computed above for (a,b) = (1,1)?

⚠️ The steps should be 0.48 for `a` and 1.15 for `b`.
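The step sizes are just the derivatives scaled by the learning rate. This is a minimal sketch, checked against the derivatives quoted above.

```python
def steps(derivative_a, derivative_b, learning_rate=0.01):
    """Scale each partial derivative by the learning rate."""
    step_a = derivative_a * learning_rate
    step_b = derivative_b * learning_rate
    return step_a, step_b

step_a, step_b = steps(48.45, 115.17)
print(round(step_a, 2), round(step_b, 2))  # 0.48 1.15
```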

### Update parameters (a, b)

$$
updated\ parameter = old\ parameter\ value - step\ size
$$

👇 Define a function that computes the updated parameter values from the old parameter values and the step sizes.

In [None]:
def update_params(a, b, step_a, step_b):
    
    return a_new, b_new
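The update subtracts each step from its parameter, which moves it against the gradient. Here is a minimal sketch, called with the unrounded steps from (a, b) = (1, 1).

```python
def update_params(a, b, step_a, step_b):
    """Move each parameter against its gradient by the corresponding step."""
    a_new = a - step_a
    b_new = b - step_b
    return a_new, b_new

a_new, b_new = update_params(1, 1, 0.4845, 1.1517)  # about 0.5155 and -0.1517
```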

### Gradient Descent Epoch

👇 Using the functions you just created, compute the updated parameters `a_new` and `b_new` at the end of the first epoch, starting from the parameters:
- a = 1
- b = 1

⚠️ You should be getting approximately the following values:
   - a_new = 0.52 (that is, 1 - 0.48)
   - b_new = -0.15 (that is, 1 - 1.15)
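One epoch chains the three pieces: gradient, then step sizes, then parameter update. In the sketch below, the helpers are reimplemented compactly so the cell runs on its own. `one_epoch` is a hypothetical wrapper name, and the toy points are made up, so the result differs from the values above.

```python
import numpy as np

def h(x, a, b):
    return a * x + b

def gradient(x, y, a, b):
    residuals = y - h(x, a, b)
    return np.sum(-2 * residuals * x), np.sum(-2 * residuals)

def steps(derivative_a, derivative_b, learning_rate=0.01):
    return derivative_a * learning_rate, derivative_b * learning_rate

def update_params(a, b, step_a, step_b):
    return a - step_a, b - step_b

def one_epoch(x, y, a, b, learning_rate=0.01):
    """Hypothetical helper: one Gradient Descent epoch (gradient -> steps -> update)."""
    derivative_a, derivative_b = gradient(x, y, a, b)
    step_a, step_b = steps(derivative_a, derivative_b, learning_rate)
    return update_params(a, b, step_a, step_b)

# Toy points (made up); on the exercise data, pass the scaled Series x and y instead
x_toy = np.array([0.0, 1.0])
y_toy = np.array([1.0, 3.0])
a_new, b_new = one_epoch(x_toy, y_toy, a=1, b=1)  # gradient (-2, -2) -> (1.02, 1.02)
```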

## 4. Gradient Descent

👇 Now that you have the necessary functions for Gradient Descent, loop through the epochs.

- Initialize the parameters to `a = 1` and `b = 1`
- Run a fixed number of **100 epochs** (for this exercise, treat this as convergence)
- Start each new epoch with the parameters updated in the previous one
- At each epoch, append the loss, `a` and `b` to lists called `loss_history`, `a_history` and `b_history` respectively
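The loop can be sketched as follows. This version runs on synthetic data generated with NumPy as a stand-in for the scaled zinc and phosphorus Series. Because of that, the resulting `a_100` and `b_100` will not match the exercise checker. The learning rate of 0.01 is an assumption matching the default of `steps()`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the scaled zinc (x) and phosphorus (y) Series, NOT the exercise data
rng = np.random.default_rng(0)
x = pd.Series(rng.uniform(0, 1, 50))
y = 0.6 * x + 0.2 + rng.normal(0, 0.05, 50)

def h(x, a, b):
    return a * x + b

def loss(x, y, a, b):
    return np.sum((y - h(x, a, b)) ** 2)

def gradient(x, y, a, b):
    residuals = y - h(x, a, b)
    return np.sum(-2 * residuals * x), np.sum(-2 * residuals)

a, b = 1, 1
learning_rate = 0.01
loss_history, a_history, b_history = [], [], []

for epoch in range(100):
    # Both derivatives are computed at the current (old) parameters
    derivative_a, derivative_b = gradient(x, y, a, b)
    a = a - learning_rate * derivative_a
    b = b - learning_rate * derivative_b
    # Record the parameters and the loss they produce at the end of this epoch
    a_history.append(a)
    b_history.append(b)
    loss_history.append(loss(x, y, a, b))

a_100, b_100 = a, b
```

Recording the loss after each update keeps every `(a, b, loss)` triple consistent. Recording it before the update would also work; it just shifts the history by one epoch.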

❓ What are the parameter values `a_100` and `b_100` at the end of the 100 epochs?

In [None]:
a_100 = ?
b_100 = ?

In [None]:
# 🧪 Test your code
from nbresult import ChallengeResult
result = ChallengeResult('descent',
                         a_100=a_100,
                         b_100=b_100)
result.write()
print(result.check())

## 5. Visual check

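👇 Wrap this iterative approach into a function `gradient_descent()` that returns the final parameters and a `history` dictionary with the keys `'a'`, `'b'` and `'loss'` (one value per epoch).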

In [None]:
def gradient_descent(x, y, a_init=1, b_init=1, learning_rate=0.001, n_epochs=100):

    return a_new, b_new, history
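Here is one possible body for `gradient_descent()`, keeping the stub's signature and return values. The update rule is inlined so the function has no dependencies. In the usage example, the synthetic data is an illustration only, and `learning_rate=0.01` is passed explicitly because the default of 0.001 converges more slowly.

```python
import numpy as np

def gradient_descent(x, y, a_init=1, b_init=1, learning_rate=0.001, n_epochs=100):
    """Gradient Descent on the sum of squared residuals of y ~ a * x + b.

    Returns the final (a, b) and a history dict with one entry per epoch.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a_new, b_new = a_init, b_init
    history = {"a": [], "b": [], "loss": []}
    for _ in range(n_epochs):
        residuals = y - (a_new * x + b_new)
        derivative_a = np.sum(-2 * residuals * x)
        derivative_b = np.sum(-2 * residuals)
        a_new = a_new - learning_rate * derivative_a
        b_new = b_new - learning_rate * derivative_b
        history["a"].append(a_new)
        history["b"].append(b_new)
        history["loss"].append(np.sum((y - (a_new * x + b_new)) ** 2))
    return a_new, b_new, history

# Usage on synthetic data (a stand-in for the scaled zinc/phosphorus columns)
rng = np.random.default_rng(0)
x_demo = rng.uniform(0, 1, 50)
y_demo = 0.6 * x_demo + 0.2 + rng.normal(0, 0.05, 50)
a_fit, b_fit, history = gradient_descent(x_demo, y_demo, learning_rate=0.01)
```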

👇 Plot the line of best fit through Zinc and Phosphorus using the parameters of your Gradient Descent.
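A sketch of this plot overlays the fitted line on the scatter of the data. To keep the cell self-contained, it uses synthetic data and a short inline Gradient Descent run. In the notebook, use your scaled `x`, `y` and the parameters returned by your own `gradient_descent()`.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the scaled zinc (x) and phosphorus (y) columns
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = 0.6 * x + 0.2 + rng.normal(0, 0.05, 50)

# Parameters from a short Gradient Descent run (same update rule as in this notebook)
a, b = 1.0, 1.0
for _ in range(100):
    residuals = y - (a * x + b)                # computed once, at the old (a, b)
    a -= 0.01 * np.sum(-2 * residuals * x)
    b -= 0.01 * np.sum(-2 * residuals)

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(x, y, alpha=0.6, label="data")
x_line = np.linspace(x.min(), x.max(), 100)
ax.plot(x_line, a * x_line + b, color="red", label=f"y = {a:.2f} x + {b:.2f}")
ax.set_xlabel("Zinc (scaled)")
ax.set_ylabel("Phosphorus (scaled)")
ax.legend()
plt.show()
```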

## 6. Visualize your descent

Our goal is to plot the loss function and the descent steps as a 2D contour plot, using matplotlib `contourf`.

👇 Start by creating the data needed for the plot:
- `range_a`: a range of 100 values for `a`, equally spaced between -1 and 1
- `range_b`: a range of 100 values for `b`, equally spaced between -1 and 1
- `Z`: a 2D array where each element `Z[i,j]` is equal to the value of the loss function at `a` = `range_a[i]` and `b` = `range_b[j]`
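A straightforward sketch builds the grid with a nested loop that mirrors the definition of `Z[i,j]` directly. The synthetic `x` and `y` are placeholders for your scaled data.

```python
import numpy as np

# Synthetic stand-in for the scaled data (replace with your x and y)
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = 0.6 * x + 0.2 + rng.normal(0, 0.05, 50)

def loss(x, y, a, b):
    return np.sum((y - (a * x + b)) ** 2)

range_a = np.linspace(-1, 1, 100)
range_b = np.linspace(-1, 1, 100)

# Z[i, j] holds the loss at a = range_a[i] and b = range_b[j]
Z = np.zeros((len(range_a), len(range_b)))
for i, a in enumerate(range_a):
    for j, b in enumerate(range_b):
        Z[i, j] = loss(x, y, a, b)
```

The 10,000 loss evaluations take well under a second at this data size. A vectorized NumPy version is possible but harder to read.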

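👇 Now, in a single subplot, plot:
- the loss function as a filled contour plot using matplotlib [contourf](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.contourf.html) (`contourf(X, Y, Z)` expects the rows of `Z` to follow `Y`, so with `X = range_a` and `Y = range_b`, pass `Z.T`)
- all historical (a, b) points as red dots, to visualize your gradient descent!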
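Here is a self-contained sketch of the contour plot with the descent path drawn on top. It uses synthetic data, 30 contour levels, the `viridis` colormap and a learning rate of 0.01, all of which are illustrative choices. Note the `Z.T`: rows of `Z` follow `a`, whereas `contourf` maps rows to the y-axis.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the scaled data (replace with your x and y)
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = 0.6 * x + 0.2 + rng.normal(0, 0.05, 50)

def loss(a, b):
    return np.sum((y - (a * x + b)) ** 2)

range_a = np.linspace(-1, 1, 100)
range_b = np.linspace(-1, 1, 100)
Z = np.array([[loss(a, b) for b in range_b] for a in range_a])  # Z[i, j] = loss(range_a[i], range_b[j])

# Gradient Descent history from (a, b) = (1, 1), learning rate 0.01
a, b = 1.0, 1.0
history = {"a": [a], "b": [b]}
for _ in range(100):
    residuals = y - (a * x + b)
    a, b = a - 0.01 * np.sum(-2 * residuals * x), b - 0.01 * np.sum(-2 * residuals)
    history["a"].append(a)
    history["b"].append(b)

fig, ax = plt.subplots(figsize=(6, 5))
# contourf maps Z[row, col] to (Y[row], X[col]); rows of Z follow `a`, so transpose
contour = ax.contourf(range_a, range_b, Z.T, levels=30, cmap="viridis")
fig.colorbar(contour, ax=ax, label="loss")
ax.scatter(history["a"], history["b"], color="red", s=10)
ax.set_xlabel("a (slope)")
ax.set_ylabel("b (intercept)")
plt.show()
```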

Change your learning rate and observe its impact on the graph!

👇 [optional] What about 3D? Try out a [plot.ly 3D surface plot](https://plotly.com/python/3d-surface-plots/) below.

In [None]:
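import plotly.graph_objects as go

# Z[i, j] = loss(range_a[i], range_b[j]), but plotly reads z as z[row = y][col = x],
# so transpose Z to put `a` on the x-axis and `b` on the y-axis
surface = go.Surface(x=range_a, y=range_b, z=Z.T)
# Descent path: one marker per epoch at (a, b, loss)
scatter = go.Scatter3d(x=history['a'], y=history['b'], z=history['loss'], mode='markers')
fig = go.Figure(data=[surface, scatter])

fig.update_layout(title='Loss Function',
                  scene=dict(xaxis_title='a', yaxis_title='b', zaxis_title='loss'))
fig.show()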

👇 Plot the history of the `loss` values as a function of the number of `epochs`. Vary the `learning_rate` from 0.001 to 0.01 and make sure you understand the difference.
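A possible sketch compares the two learning rates on synthetic data. `loss_history_for` is a hypothetical helper, and the log-scaled y-axis is a choice that makes both curves readable. On this synthetic data, 0.01 reaches a lower loss within 100 epochs. A learning rate that is too large can instead make the loss diverge, so larger is not always better.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the scaled data (replace with your x and y)
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = 0.6 * x + 0.2 + rng.normal(0, 0.05, 50)

def loss_history_for(learning_rate, n_epochs=100, a=1.0, b=1.0):
    """Hypothetical helper: loss after each epoch of Gradient Descent started at (a, b)."""
    losses = []
    for _ in range(n_epochs):
        residuals = y - (a * x + b)
        a, b = a - learning_rate * np.sum(-2 * residuals * x), b - learning_rate * np.sum(-2 * residuals)
        losses.append(np.sum((y - (a * x + b)) ** 2))
    return losses

fig, ax = plt.subplots(figsize=(6, 4))
for lr in (0.001, 0.01):
    ax.plot(range(1, 101), loss_history_for(lr), label=f"learning_rate = {lr}")
ax.set_xlabel("epoch")
ax.set_ylabel("loss")
ax.set_yscale("log")
ax.legend()
plt.show()
```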

## 7. With Sklearn...

👇 Using Sklearn, train a Linear Regression model on the same scaled data. Compare its parameters to the ones computed by your Gradient Descent.

They should be almost identical!
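Here is a sketch of the comparison, again on synthetic data so the cell runs on its own. `LinearRegression` expects a 2D feature matrix, hence the `reshape(-1, 1)`. The slope is in `coef_[0]` and the intercept in `intercept_`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the scaled data (replace with your x and y)
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 50)
y = 0.6 * x + 0.2 + rng.normal(0, 0.05, 50)

# scikit-learn expects X with shape (n_samples, n_features)
model = LinearRegression().fit(x.reshape(-1, 1), y)

# Gradient Descent on the same data, for comparison (100 epochs, learning rate 0.01)
a, b = 1.0, 1.0
for _ in range(100):
    residuals = y - (a * x + b)
    a, b = a - 0.01 * np.sum(-2 * residuals * x), b - 0.01 * np.sum(-2 * residuals)

print("sklearn:          a =", model.coef_[0], " b =", model.intercept_)
print("gradient descent: a =", a, " b =", b)
```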

### 🏁 Congratulations! Please push your exercise when you are done