<a href="https://colab.research.google.com/github/chefs-kiss/intro2ml/blob/main/LIN2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Gradient at a point

## Partial derivatives of a function F(x,y)

The code below computes the partial derivative of a function `F` with respect to one of its variables.

To use it:

* Update the function `F` to match the example from the slide deck.

* Modify the `var` variable to specify which parameter you want to differentiate with respect to. This can be `x`, `y`, or any other variable used in your function.

This will return the partial derivative of `F` with respect to the specified variable.

In [25]:
import sympy as sp
def derive(func, var):
  F = sp.sympify(func)
  var = sp.symbols(var)
  return sp.diff(F, var)

F = "x**2 + 4*y*x"
var = "y"

partial = "dFd" + var + " :"
print(partial,derive(F, var))

dFdy : 4*x
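The same idea extends to taking both partials and then plugging in a point. The sketch below is only an illustration: the function `G` and the point `(1, 3)` are hypothetical and do not come from the slide deck. It uses SymPy's `subs` to replace the symbols with numbers.

```python
import sympy as sp

# hypothetical example function (not from the slide deck)
x, y = sp.symbols("x y")
G = x**3 + 2*x*y

# symbolic partial derivatives
dGdx = sp.diff(G, x)   # 3*x**2 + 2*y
dGdy = sp.diff(G, y)   # 2*x

# evaluate each partial at the hypothetical point (1, 3)
example_point = {x: 1, y: 3}
dGdx_at_point = dGdx.subs(example_point)
dGdy_at_point = dGdy.subs(example_point)
print(dGdx_at_point, dGdy_at_point)
```

Keeping the partials symbolic until the last step means the same expressions can be reused for any other point.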


## Gradient

The gradient of a function is a vector that gives the direction in which the function increases the fastest near a point.

We'd like to see how our function `F` is changing at a particular point, `(2,1)`.

Use the `derive` function we created above to find the two partials, then evaluate each one at the point. This will give us the gradient of `F` at `(2,1)`.

In [None]:
point = (2,1)
x = point[0]
y = point[1]

partial_x = #your code here
partial_y = #your code here

partial_x, partial_y

This vector points in the direction of steepest increase from the point `(2,1)`. In other words, if `F` were our cost function, this is the direction to go in order to increase it the fastest.

However, we'd like to minimize our cost function, so we're more interested in going in the direction of the greatest decrease. This is simply the negative gradient of `F(x,y)`. Run the code chunk below to get this value.

In [None]:
-1*partial_x, -1*partial_y
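To see why the sign matters, here is a minimal sketch that takes one small step along the gradient and one against it. The function `H`, the starting point, and the step size are all hypothetical choices made for illustration.

```python
import sympy as sp

# hypothetical function and starting point, chosen only for illustration
u, v = sp.symbols("u v")
H = u**2 + v**2
start = {u: 3, v: -4}

# gradient of H evaluated at the starting point
grad = [sp.diff(H, s).subs(start) for s in (u, v)]

step = sp.Rational(1, 10)  # hypothetical step size

# one step along the gradient (uphill) and one against it (downhill)
uphill = {u: start[u] + step * grad[0], v: start[v] + step * grad[1]}
downhill = {u: start[u] - step * grad[0], v: start[v] - step * grad[1]}

H_start = H.subs(start)
H_up = H.subs(uphill)
H_down = H.subs(downhill)
print(H_start, H_up, H_down)
```

The value of `H` goes up after the step along the gradient and down after the step against it, which is the behavior gradient descent relies on.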

Now that we've seen how to find a gradient at a particular point, we're going to see how this can be used in linear regression.

# Gradient Descent

## General solution to gradient of cost function

Let the cost function be denoted by `C`. The gradient of `C` with respect to the parameters is given by the vector `(dC/dB0, dC/dB1)`.

Each component of the gradient tells us the rate of change of the cost function with respect to one of the parameters.

In this gradient:

* The first component, `dC/dB0`, is the rate at which the cost function changes as `B0` increases while `B1` is held fixed.

* The second component, `dC/dB1`, is the rate at which the cost function changes as `B1` increases while `B0` is held fixed.

This gives us a general expression for the gradient that can be evaluated for any specific values of `B0` and `B1`.
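As a worked version of this general expression, assume the cost is the mean squared error of the line `B0 + B1*x` over `n` data points `(x_i, y_i)`, which is the cost that `compute_gradient` builds in this notebook. Under that assumption, the chain rule gives the two partials:

```latex
C(B_0, B_1) = \frac{1}{n} \sum_{i=1}^{n} \left( B_0 + B_1 x_i - y_i \right)^2

\frac{\partial C}{\partial B_0} = \frac{2}{n} \sum_{i=1}^{n} \left( B_0 + B_1 x_i - y_i \right)

\frac{\partial C}{\partial B_1} = \frac{2}{n} \sum_{i=1}^{n} \left( B_0 + B_1 x_i - y_i \right) x_i
```

Both partials still contain `B0` and `B1`, which is why they can be evaluated for any specific pair of beta values.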

## `compute_gradient` function

Below is a function `compute_gradient` that takes in some data (x_values, y_values) and returns the gradient of the cost function for linear regression.

In [None]:
import sympy as sp

def compute_gradient(x_values, y_values):
    # define the symbols b0, b1, and x
    b0, b1, x = sp.symbols('B0 B1 x')

    # linear regression model
    y_pred = b0 + b1 * x

    # cost function (MSE)
    C = sum((y_pred.subs(x, x_val) - y_val)**2 for x_val, y_val in zip(x_values, y_values)) / len(x_values)

    # partial derivatives of the cost function with respect to b0 and b1
    dC_db0 = sp.diff(C, b0)
    dC_db1 = sp.diff(C, b1)

    print(f"The partial derivative of cost with respect to B0: {dC_db0}")
    print(f"The partial derivative of cost with respect to B1: {dC_db1}")

    return dC_db0, dC_db1




Let's now take some points and store the x-values and y-values separately. We'll use the function above to find the gradient.

In [None]:
x_values = [1, 3, 2, 3, 4]
y_values = [3, 4, 3, 5, 5]

partial_b0, partial_b1 = compute_gradient(x_values, y_values)

These partial derivatives are the general form of the gradient for this particular set of data: they are still expressions in `B0` and `B1`, so they can be evaluated at any beta values.
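One way to sanity-check the symbolic result is to evaluate the MSE gradient numerically at a single pair of betas. The sketch below assumes the standard closed form for the MSE gradient of a line; the helper name `mse_gradient` and the array names are hypothetical and are not part of the notebook.

```python
import numpy as np

# same data as above, stored under separate (hypothetical) names
x_arr = np.array([1, 3, 2, 3, 4])
y_arr = np.array([3, 4, 3, 5, 5])

def mse_gradient(b0, b1, x, y):
    """Gradient of the MSE cost for the line b0 + b1*x (hypothetical helper)."""
    residuals = b0 + b1 * x - y          # prediction minus actual, per point
    d_b0 = 2 * residuals.mean()          # partial with respect to b0
    d_b1 = 2 * (residuals * x).mean()    # partial with respect to b1
    return d_b0, d_b1

# evaluate the gradient at B0 = 1, B1 = 2
g0, g1 = mse_gradient(1, 2, x_arr, y_arr)
print(round(g0, 2), round(g1, 2))
```

Substituting `B0 = 1` and `B1 = 2` into the symbolic partials should give the same two numbers.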

## Finding explicit gradients and updating betas
Once we have a generalized gradient calculated, we'll need to do two steps:
* Step 1: Plug the current `B0` and `B1` values into the gradient above. This gives us an explicit gradient (an actual vector of numbers).
* Step 2: Update the `B0` and `B1` values by taking a small step in the direction opposite to the gradient from Step 1.
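The two steps can be sketched with plain numbers. Everything in this block is hypothetical: the starting betas, the learning rate, and the gradient values are assumed for illustration and are not computed from the data above.

```python
# hypothetical numbers chosen only to illustrate the two steps
b0_current, b1_current = 0.0, 1.0
learning_rate_demo = 0.1

# Step 1: an explicit gradient, i.e. the partials evaluated at the current betas
gradient = (-2.0, 4.0)   # assumed values, not computed from the data above

# Step 2: move each beta a small step against its partial derivative
b0_new = b0_current - learning_rate_demo * gradient[0]
b1_new = b1_current - learning_rate_demo * gradient[1]
print(round(b0_new, 2), round(b1_new, 2))
```

A negative partial pushes its beta up and a positive partial pushes it down, so each update moves toward lower cost as long as the step is small enough.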



## `adjust_gradient` function

Below is a function called `adjust_gradient` that takes in the partials from `compute_gradient`, our current beta values, and a learning rate, and returns the explicit gradient (a pair of numbers) and our new beta values.

In [None]:
def adjust_gradient(partial_b0, partial_b1, b0_init, b1_init, learning_rate):
  # same symbol names that compute_gradient uses
  b0, b1 = sp.symbols('B0 B1')
  # step 1: substitute the current values of b0 and b1 into the derivatives
  # (the data was already substituted in compute_gradient, so only B0 and B1 remain)
  partial_b0_val = partial_b0.subs({b0: b0_init, b1: b1_init})
  partial_b1_val = partial_b1.subs({b0: b0_init, b1: b1_init})

  # step 2: update b0 and b1, taking a small step in the opposite direction of the gradient
  b0_val = b0_init - learning_rate * partial_b0_val
  b1_val = b1_init - learning_rate * partial_b1_val

  return partial_b0_val, partial_b1_val, b0_val, b1_val

Let's see an example of using this function!

In [None]:
b0 = 1
b1 = 2
learning_rate = 0.1
partial_b0_updated, partial_b1_updated, b0_adj, b1_adj = adjust_gradient(partial_b0, partial_b1, b0, b1, learning_rate)

print(f"The gradient is ({partial_b0_updated}, {partial_b1_updated}) \nThe adjusted betas are ({round(b0_adj,2)}, {round(b1_adj,2)})")

Cool, now let's bring it all together!

## Example data
Let's see an example of finding a general gradient for a set of data, and then doing a few iterations of gradient descent. Below is some code to find the general form of a gradient, along with the first couple of iterations of gradient descent.

Run the code chunk as is.

Then we'd like to do a few more iterations: update the for loop so that it runs six iterations.

In [None]:
#data
x_values = [1, 3, 2, 3, 4]
y_values = [3, 4, 3, 5, 5]

#starting values
b0 = 1
b1 = 2
learning_rate = 0.1

#computing general gradient
print("General Gradient")
partial_b0, partial_b1 = compute_gradient(x_values, y_values)

#a few iterations of gradient descent
for i in range(2):
  print(f"\nIteration {i+1}")
  partial_b0_updated, partial_b1_updated, b0, b1 = adjust_gradient(partial_b0, partial_b1, b0, b1, learning_rate)
  print(f"The gradient is ({round(partial_b0_updated,2)}, {round(partial_b1_updated,2)}) \nThe adjusted betas are ({round(b0,2)}, {round(b1,2)})")

# Evaluating the beta values with MSE

Let's build a function `find_MSE` which takes in the beta values `B0`, `B1`, and our data `(x_values, y_values)` and computes the MSE.

* Add code to use our model to predict y-values given our feature set X. The format is `model.predict(features)`
* Add code to compute the MSE between those predicted y-values and the actual y-values. Hint: what function do we import at the top of the code?
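For reference, the MSE is the average of the squared differences between predicted and actual y-values. The sketch below computes it directly with NumPy as a cross-check on whatever `find_MSE` returns; the helper name `mse_by_hand` is hypothetical and is not part of the exercise.

```python
import numpy as np

def mse_by_hand(b0, b1, x, y):
    """Mean squared error of the line b0 + b1*x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y_hat = b0 + b1 * x                 # predicted y-values
    return np.mean((y_hat - y) ** 2)    # average squared error

# same betas and data as the test for find_MSE
print(round(mse_by_hand(0.56, 0.60, [1, 3, 2, 3, 4], [3, 4, 3, 5, 5]), 4))
```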

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np

def find_MSE(B0, B1, x, y):
  #reshaping data to be arrays
  x_train = np.array(x).reshape(-1, 1)
  y_train = np.array(y)
  #building lin reg model with betas from gradient descent
  lin_reg = LinearRegression()
  lin_reg.intercept_ = B0
  lin_reg.coef_ = np.array([B1])
  #predict y values given the model above
  y_pred = #your code here
  #compute mse
  mse = #your code here
  return mse


Testing

In [None]:
# compare with a tolerance, since exact equality on floats is fragile
assert np.isclose(find_MSE(0.56, 0.60, x_values, y_values), 3.7488)

## Example data, now with MSE

Let's add this to our code from above.
* Copy and paste the gradient descent code from the example data section (where we had the for loop) to the code chunk below.

* Add a new line to the body of the for loop to compute and print the mse for the given beta values and data.

In [None]:
#your code here

## How long to iterate?
Now we'd like to keep iterating until we find our ideal beta values. Change the for loop to run 150 times.

* At what iteration do we find our ideal beta values? Hint: find the iteration where the beta values stop changing.

* What do you notice about the gradient? Why is this happening?

* What are the ideal B0 and B1 values? What is the equation of the regression line for this data?

* Update the B0 and B1 variables in the plot below with these values. Does this look like a good fit?

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Plot the points
plt.scatter(x_values, y_values, color='black', marker='o')
plt.grid(True)

#equation of the line
x_line = np.linspace(min(x_values) - 1, max(x_values) + 1, 100)
B1 = 0 #change slope
B0 = 0 #change intercept
y_line = B1 * x_line + B0
plt.plot(x_line, y_line, color='tab:olive', linewidth=2)
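If gradient descent has converged, its beta values should be close to the ordinary least-squares fit of a line to the same data. As a cross-check, and not as a replacement for the exercise, the sketch below uses `np.polyfit` with degree 1. The names `x_data`, `y_data`, `slope`, and `intercept` are hypothetical.

```python
import numpy as np

# same data as above, stored under separate (hypothetical) names
x_data = [1, 3, 2, 3, 4]
y_data = [3, 4, 3, 5, 5]

# with degree 1, np.polyfit returns coefficients highest power first: [slope, intercept]
slope, intercept = np.polyfit(x_data, y_data, 1)
print(f"least-squares slope: {slope:.4f}, intercept: {intercept:.4f}")
```

Compare `slope` with your final `B1` and `intercept` with your final `B0`. A large gap suggests the loop needs more iterations or a different learning rate.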

## Learning rate

The learning rate is the size of the step we take in the direction of steepest descent. If the step is too big, we overshoot the target and the cost can grow instead of shrink (divergence). If it is too small, it takes a very long time to get where we're trying to go (slow convergence). We're looking for a learning rate that gets us to the minimum as efficiently as possible.
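The effect is easy to see on a toy problem. The sketch below runs five steps of gradient descent on the one-variable cost `f(w) = w**2`, whose minimum is at `w = 0`. The function `descend` and the three learning rates are hypothetical choices for illustration and are unrelated to the regression data.

```python
def descend(learning_rate, steps=5, w=1.0):
    """Run gradient descent on the toy cost f(w) = w**2 (gradient 2*w)."""
    for _ in range(steps):
        w = w - learning_rate * 2 * w   # step against the gradient
    return w

# hypothetical rates: one too large, one reasonable, one too small
for lr in (1.1, 0.1, 0.001):
    w_final = descend(lr)
    print(f"learning rate {lr}: w = {w_final:.4f}, cost = {w_final**2:.4f}")
```

With the largest rate, each step overshoots and `w` ends up farther from zero than where it started. With the smallest rate, `w` has barely moved after five steps.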


Now let's see what happens when we change the learning rate.
* Update the loop so that it only runs 5 times.
* Update the learning rate to 0.15. What is happening to the MSE?
* Update the learning rate to 0.3. What is happening to the MSE?
* Update the learning rate to 0.05. What is happening to the MSE?
* Update the learning rate to 0.001. What is happening to the MSE?
* Play around with the learning rate until the MSE at the last iteration is as low as possible. What is this value?