**Homework 10**


In [52]:
import pandas as pd
import numpy as np

*Problem 1.*

Let $f(x,y)=x^2+2xy+2y^2-4x-4y$.

Calculate $\nabla f(x,y)$, the gradient of $f(x,y)$, on paper. (You don't need to turn this in, but you'll need it for the next parts of the problem.) In this problem you will use the gradient to find the minimum of $f(x,y)$. First find the minimum on paper by setting the gradient equal to $\langle 0,0 \rangle$ and solving for $x$ and $y$; you can then check whether gradient descent gives the right answer.
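For reference, here is a worked version of the on-paper step. Subtracting the first equation from the second gives $2y = 0$. The Hessian $H_f$ in the last line goes slightly beyond what the problem asks: its positive determinant and positive top-left entry show that the critical point is a minimum rather than a saddle.

$$
\begin{aligned}
\nabla f(x,y) &= \langle 2x + 2y - 4,\; 2x + 4y - 4 \rangle \\
2x + 2y - 4 &= 0 \quad\text{and}\quad 2x + 4y - 4 = 0 \;\Longrightarrow\; 2y = 0 \;\Longrightarrow\; (x, y) = (2, 0) \\
f(2, 0) &= 4 - 8 = -4 \\
H_f &= \begin{pmatrix} 2 & 2 \\ 2 & 4 \end{pmatrix}, \qquad \det H_f = 4 > 0,\quad 2 > 0
\end{aligned}
$$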


Next, write a function `fGD` that implements gradient descent to find the minimum of $f(x,y)$. Your function should take the following parameters:
* `lr` (learning rate)
* `max_iter` (maximum number of iterations)
* `x_init` (initial value of x)
* `y_init` (initial value of y)

Your function should return the final values of `x` and `y`.

In [53]:
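# Setting the gradient to zero gives x = 2, y = 0, the minimum of f(x,y)

def fGD(lr, max_iter, x_init, y_init):
    x = x_init
    y = y_init
    for i in range(max_iter):
        # Partial derivatives of f at the current point
        p_x = 2*x + 2*y - 4
        p_y = 2*x + 4*y - 4
        # Step against the gradient (both partials were computed before updating)
        x -= lr * p_x
        y -= lr * p_y
    return x, y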

Now check your answer by calling this function with a learning rate of 0.0001, a `max_iter` of 10000, and initial values of 5 and 5 for `x` and `y`. Did your function come close to the correct answer?

In [54]:
xmin1,ymin1=fGD(0.0001,10000,5,5) #Don't change this
xmin1,ymin1

(1.9858947847209645, 0.045139149713967715)
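The result is within about 0.05 of the true minimum $(2, 0)$ but not exactly on it: with `lr=0.0001` every step is tiny, so 10000 iterations are not enough to converge fully. Below is a minimal sketch of a variant that stops once the gradient is close to zero. The name `fGD_tol`, the `tol` parameter, and the returned step count are hypothetical additions, not part of the assignment. The step size also has an upper limit: for this quadratic, gradient descent converges only when `lr` is below $2/\lambda_{\max}$, where $\lambda_{\max} = 3 + \sqrt{5} \approx 5.24$ is the largest eigenvalue of the Hessian of $f$. That works out to roughly `lr < 0.38`; larger values generally make the iterates diverge.

```python
import numpy as np

def fGD_tol(lr, max_iter, x_init, y_init, tol=1e-8):
    """Gradient descent on f(x,y) = x^2 + 2xy + 2y^2 - 4x - 4y that stops
    early once the gradient norm drops below tol (hypothetical variant)."""
    x, y = float(x_init), float(y_init)
    for i in range(max_iter):
        p_x = 2*x + 2*y - 4           # df/dx
        p_y = 2*x + 4*y - 4           # df/dy
        if np.hypot(p_x, p_y) < tol:  # gradient ~ 0, so we are at the minimum
            return x, y, i
        x, y = x - lr * p_x, y - lr * p_y
    return x, y, max_iter

# Largest convergent step size is 2 / lambda_max for the Hessian [[2, 2], [2, 4]]
lam_max = np.linalg.eigvalsh(np.array([[2.0, 2.0], [2.0, 4.0]])).max()
print(2 / lam_max)

x, y, n_steps = fGD_tol(0.1, 10000, 5, 5)
print(x, y, n_steps)
```

With `lr=0.1` the same starting point reaches the minimum in a few hundred steps instead of running out of iterations.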

*Problem 2.*

Write a class `GDRegressor` that implements gradient descent on the mean squared error (MSE) loss to fit a linear model to a given data set.
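For a data matrix $X$ with $n$ rows, coefficient vector $w$ and intercept $b$, the MSE loss and its gradient are given below. This is the standard derivation; the symbols $w$, $b$ and $r$ are our own notation, corresponding to `coef`, `intercept` and `residuals` in the code. Gradient descent then updates $w \leftarrow w - \eta \nabla_w L$ and $b \leftarrow b - \eta\, \partial L/\partial b$ with learning rate $\eta$.

$$
\begin{aligned}
L(w, b) &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - x_i^\top w - b\bigr)^2 = \frac{1}{n}\lVert r \rVert^2, \qquad r = y - Xw - b\mathbf{1} \\
\nabla_w L &= -\frac{2}{n} X^\top r \\
\frac{\partial L}{\partial b} &= -\frac{2}{n}\sum_{i=1}^{n} r_i = -2\,\bar{r}
\end{aligned}
$$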

In [None]:
class GDRegressor:
    def __init__(self, learning_rate, max_iter):
        self.lr = learning_rate
        self.max_iter = max_iter

    def fit(self, X, y):
        n = X.shape[0]
        self.coef = np.ones((X.shape[1],))  # Initial values
        self.intercept = 1.0                # Initial value
        for i in range(self.max_iter):
            residuals = y - self.predict(X)
            # Gradient of MSE = mean((y - X @ coef - intercept)**2)
            coef_grad = -(2/n) * np.dot(X.T, residuals)
            intercept_grad = -2 * np.mean(residuals)
            # Gradient descent step: move against the gradient
            self.coef -= self.lr * coef_grad
            self.intercept -= self.lr * intercept_grad

    def predict(self, X):
        return np.dot(X, self.coef) + self.intercept
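One way to confirm that the gradient formulas inside `fit` are right is to compare them against a central finite-difference approximation of the MSE. The sketch below is standalone (it does not use the class); the helper names `mse` and `mse_grad` and the random test data are illustrative assumptions.

```python
import numpy as np

def mse(X, y, coef, intercept):
    """Mean squared error of the linear model X @ coef + intercept."""
    residuals = y - (X @ coef + intercept)
    return np.mean(residuals**2)

def mse_grad(X, y, coef, intercept):
    """Analytic MSE gradient, the same formulas as in GDRegressor.fit."""
    n = X.shape[0]
    residuals = y - (X @ coef + intercept)
    return -(2/n) * X.T @ residuals, -2 * np.mean(residuals)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
coef, intercept = rng.normal(size=3), 0.5

coef_grad, intercept_grad = mse_grad(X, y, coef, intercept)

# Central differences: (L(p + h) - L(p - h)) / (2h), one parameter at a time
h = 1e-6
num_coef_grad = np.array([
    (mse(X, y, coef + h*e, intercept) - mse(X, y, coef - h*e, intercept)) / (2*h)
    for e in np.eye(3)
])
num_intercept_grad = (mse(X, y, coef, intercept + h)
                      - mse(X, y, coef, intercept - h)) / (2*h)

print(np.max(np.abs(coef_grad - num_coef_grad)),
      abs(intercept_grad - num_intercept_grad))
```

Both printed differences should be tiny; a sign error in the analytic gradient would show up as a difference of order one.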

You can test your code here. Is the result close to what you would expect?

In [56]:
x=np.arange(10)
y=3*x+2
X=x[:,np.newaxis] #Converts shape to (10,1)
lin_mod=GDRegressor(.01,2000)
lin_mod.fit(X,y)
lin_mod.coef, lin_mod.intercept

(array([3.00000128]), 1.999991997794565)
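Since $y = 3x + 2$ exactly, the expected answer is a coefficient of 3 and an intercept of 2, and the output above matches both to within about $10^{-5}$. As an independent check, here is a minimal sketch that solves the same least-squares problem in closed form with `np.linalg.lstsq`, appending a column of ones for the intercept.

```python
import numpy as np

x = np.arange(10)
y = 3*x + 2
# Design matrix with a leading column of ones, so the first parameter is the intercept
A = np.column_stack([np.ones_like(x, dtype=float), x])
params, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, coef = params
print(coef, intercept)  # approximately 3 and 2
```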

We now try your new class on the `disp` vs `mpg` problem from previous assignments. Let's bring those data sets back:

In [57]:
cars=pd.read_csv('https://vincentarelbundock.github.io/Rdatasets/csv/causaldata/auto.csv')
disp=np.array(cars.displacement)
mpg=np.array(cars.mpg)

index=np.argsort(disp)
disp=disp[index]
mpg=mpg[index]

Gradient descent works best with scaled data, so we'll import the `StandardScaler` class from sklearn:
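One way to see why scaling helps: for a linear model the Hessian of the MSE is $\frac{2}{n}A^\top A$, where $A$ is the design matrix with a column of ones, and gradient descent converges slowly when that matrix is badly conditioned (a single learning rate must be small enough for the steepest direction, which leaves progress along the flattest direction very slow). The sketch below compares condition numbers for a raw and a standardized feature. The feature values are made-up numbers in roughly the range of `disp`, not the actual data.

```python
import numpy as np

# Hypothetical feature values, roughly in the range of engine displacement
x = np.linspace(80, 350, 50)

def mse_hessian(feature):
    """Hessian of the MSE for the model b + w * feature: (2/n) A^T A."""
    A = np.column_stack([np.ones_like(feature), feature])
    return (2 / len(feature)) * A.T @ A

x_scaled = (x - x.mean()) / x.std()   # standardize: mean 0, standard deviation 1

cond_raw = np.linalg.cond(mse_hessian(x))
cond_scaled = np.linalg.cond(mse_hessian(x_scaled))
print(cond_raw, cond_scaled)
```

The raw feature gives a condition number in the hundreds of thousands, while the standardized one gives essentially 1, so a single learning rate suits both parameters.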

In [58]:
from sklearn.preprocessing import StandardScaler

This class works almost exactly like the one you wrote in previous assignments, except that it expects a 2D array, even when there is only one column of data. To handle this, we reshape our data into a single column:

In [59]:
disp=disp[:,np.newaxis]
disp

array([[ 79],
       [ 85],
       [ 86],
       [ 86],
       [ 89],
       [ 90],
       [ 91],
       [ 97],
       [ 97],
       [ 97],
       [ 97],
       [ 97],
       [ 98],
       [ 98],
       [105],
       [105],
       [107],
       [119],
       [119],
       [119],
       [121],
       [121],
       [121],
       [131],
       [134],
       [134],
       [140],
       [140],
       [140],
       [146],
       [151],
       [151],
       [151],
       [156],
       [163],
       [163],
       [196],
       [196],
       [200],
       [200],
       [225],
       [225],
       [231],
       [231],
       [231],
       [231],
       [231],
       [231],
       [231],
       [231],
       [231],
       [231],
       [231],
       [231],
       [231],
       [250],
       [250],
       [250],
       [258],
       [302],
       [302],
       [302],
       [302],
       [304],
       [318],
       [318],
       [350],
       [350],
       [350],
       [350],
       [350],
      

Now, fit a `StandardScaler` object to `disp` and transform it:

In [60]:
sd = StandardScaler()
sd.fit(disp)
scaled_disp=sd.transform(disp)
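`StandardScaler` subtracts each column's mean and divides by its standard deviation, using the population standard deviation (`ddof=0`, the default of `np.std`); after fitting, these are stored in `sd.mean_` and `sd.scale_`. A numpy-only sketch of the same transform and its inverse, on a small made-up column:

```python
import numpy as np

col = np.array([[79.0], [97.0], [231.0], [350.0]])  # hypothetical values, shape (4, 1)

mean = col.mean(axis=0)
std = col.std(axis=0)           # ddof=0, the population standard deviation
scaled = (col - mean) / std     # what fit followed by transform computes
restored = scaled * std + mean  # what inverse_transform computes

print(scaled.ravel())
print(scaled.mean(), scaled.std())  # approximately 0 and 1
```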

In [62]:
scaled_disp

array([[-1.29691206],
       [-1.23113311],
       [-1.22016995],
       [-1.22016995],
       [-1.18728047],
       [-1.17631731],
       [-1.16535415],
       [-1.0995752 ],
       [-1.0995752 ],
       [-1.0995752 ],
       [-1.0995752 ],
       [-1.0995752 ],
       [-1.08861204],
       [-1.08861204],
       [-1.01186993],
       [-1.01186993],
       [-0.98994361],
       [-0.85838571],
       [-0.85838571],
       [-0.85838571],
       [-0.83645939],
       [-0.83645939],
       [-0.83645939],
       [-0.7268278 ],
       [-0.69393832],
       [-0.69393832],
       [-0.62815937],
       [-0.62815937],
       [-0.62815937],
       [-0.56238042],
       [-0.50756462],
       [-0.50756462],
       [-0.50756462],
       [-0.45274883],
       [-0.37600672],
       [-0.37600672],
       [-0.01422248],
       [-0.01422248],
       [ 0.02963016],
       [ 0.02963016],
       [ 0.30370913],
       [ 0.30370913],
       [ 0.36948808],
       [ 0.36948808],
       [ 0.36948808],
       [ 0.36948808],

Create a new `GDRegressor` object called `mpg_mod` with a learning rate of 0.1 and a `max_iter` of 1000. Then fit it to `scaled_disp` and `mpg`. (`fit` needs a 2D feature array; `scaled_disp` already has a single column because `disp` was reshaped before scaling.)

In [64]:
mpg_mod= GDRegressor(learning_rate=.1, max_iter=1000)
mpg_mod.fit(scaled_disp, mpg)
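Because the model was fitted to standardized displacement, `mpg_mod.coef` is the change in mpg per standard deviation of displacement, not per unit of displacement. If $\mu$ and $\sigma$ are the mean and standard deviation used by the scaler (`sd.mean_` and `sd.scale_`), then $w z + b$ with $z = (x - \mu)/\sigma$ equals $(w/\sigma)\,x + (b - w\mu/\sigma)$. A sketch with made-up numbers (the feature values and the fitted parameters are hypothetical) that checks this conversion:

```python
import numpy as np

# Hypothetical raw feature and fitted parameters on the standardized scale
x = np.array([79.0, 140.0, 231.0, 350.0])
mu, sigma = x.mean(), x.std()
w_scaled, b_scaled = -4.5, 21.0

# Convert to a slope and intercept on the original (unscaled) feature
w_orig = w_scaled / sigma
b_orig = b_scaled - w_scaled * mu / sigma

pred_scaled = w_scaled * (x - mu) / sigma + b_scaled
pred_orig = w_orig * x + b_orig
print(np.allclose(pred_scaled, pred_orig))  # True
```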

Check the residual sum of squares (RSS) of your model, and compare it to the RSS of the model you found with the normal equations in Homework 7.

In [66]:
preds = mpg_mod.predict(scaled_disp)
RSS=np.sum((mpg - preds)**2)
RSS

1226.7841189255553
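For comparison with Homework 7, here is a minimal sketch of a normal-equations fit next to a gradient-descent fit on synthetic data, so the RSS values themselves mean nothing. The point is that once gradient descent has converged, its RSS should agree with the closed-form RSS: both minimize the same loss, and standardizing the feature only reparametrizes the line, so it does not change the minimal RSS.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(80, 350, size=74)               # synthetic feature
y = 35 - 0.05 * x + rng.normal(0, 3, size=74)   # synthetic response

# Normal equations: solve (A^T A) theta = A^T y, with a column of ones for the intercept
A = np.column_stack([np.ones_like(x), x])
theta = np.linalg.solve(A.T @ A, A.T @ y)
rss_normal = np.sum((y - A @ theta)**2)

# Gradient descent on the standardized feature, same update rule as GDRegressor
z = (x - x.mean()) / x.std()
w, b, lr = 1.0, 1.0, 0.1
for _ in range(1000):
    r = y - (w * z + b)
    w -= lr * (-2 * np.mean(r * z))
    b -= lr * (-2 * np.mean(r))
rss_gd = np.sum((y - (w * z + b))**2)

print(rss_normal, rss_gd)
```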