# Gradient Descent Objectives 
* Understand the general process of gradient descent with the residual sum of squares (RSS) as the cost function 
* Understand how derivatives are used in gradient descent 
* Use OOP to apply gradient descent in linear regression 


## Steps to find the optimal slope and intercept of a line of best fit using RSS as our cost function 

1. Take the derivative of the loss function with respect to each parameter (together these derivatives form the gradient).
2. Pick random starting values for the parameters. 
3. Plug the current parameter values into the derivatives. 
4. Calculate the step sizes (derivative * learning rate). 
5. Calculate the new parameters (old parameters - step sizes). 
6. Repeat steps 3-5 until the maximum number of steps is reached or the step size falls below a minimum threshold. 
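
The derivatives from step 1 can be written out explicitly. This is a sketch under stated assumptions: the line is written as $$\hat{y} = mx + b$$ with slope $$m$$ and intercept $$b$$ (the names used in the code later in this notebook), there are $$n$$ data points $$(x_i, y_i)$$, and the learning rate symbol $$\alpha$$ is introduced here for illustration.

$$ RSS(m, b) = \sum_{i=1}^{n} (y_i - (m x_i + b))^2 $$

Differentiating each squared term with the chain rule (covered in the rules for taking derivatives below) gives:

$$ \frac{\partial RSS}{\partial m} = -2 \sum_{i=1}^{n} x_i (y_i - (m x_i + b)) $$

$$ \frac{\partial RSS}{\partial b} = -2 \sum_{i=1}^{n} (y_i - (m x_i + b)) $$

Steps 4 and 5 then become:

$$ m_{new} = m - \alpha \frac{\partial RSS}{\partial m} $$

$$ b_{new} = b - \alpha \frac{\partial RSS}{\partial b} $$

Dividing RSS by $$n$$ gives the mean squared error (MSE). That scales both derivatives by $$1/n$$ but leaves the best-fit slope and intercept unchanged.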

## Derivatives in gradient descent 
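**A derivative tells us how a function is changing at any given point: it is the instantaneous rate of change of the function's output with respect to its input.** In gradient descent, the derivative of the cost function with respect to a parameter tells us which direction to move that parameter and how steeply the cost changes there. 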
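
To make "rate of change" concrete, the sketch below computes the average rate of change of a function over smaller and smaller intervals. The function `f(x) = x**2`, the point `x0 = 3.0`, and the interval widths are illustrative choices, not part of the original lesson. As the interval shrinks, the rate approaches the derivative at that point, which for this function is `2 * x0 = 6`.

```python
def f(x):
    # Illustrative function; its derivative is 2x
    return x ** 2

x0 = 3.0
rates = []
for h in [1.0, 0.1, 0.001]:
    # Average rate of change of f between x0 and x0 + h
    rate = (f(x0 + h) - f(x0)) / h
    rates.append(rate)
    print(f"h = {h:<6} rate of change = {rate:.4f}")
```

The printed rates get closer to 6 as `h` gets smaller, which is the idea behind the derivative as an instantaneous rate of change.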



## Rules for taking Derivatives 

1. **Power Rule** - If $$f(x) = x^r $$

then the derivative is: 
$$ f'(x) = r*x^{r-1} $$

2. **Constant factor rule** - A constant multiplier carries through the derivative unchanged. For $$f(x) = 2x^2 $$


$$f'(x) = 2*\frac{d}{dx} x^{2} = 2*2*x^{2-1} = 4x^1 = 4x $$

3. **Addition Rule** - To take the derivative of a function that has multiple terms, take the derivative of each term individually and add the results. For example, for the function 

$$ f(x) = 4x^3 - x^2 + 3x $$

$$ f'(x) = 12x^2 - 2x + 3  $$  

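4. **Chain Rule** - allows us to take the derivative of a composite function (one function nested inside another): differentiate the outer function, then multiply by the derivative of the inner function. Gradient descent needs it because the cost function squares an expression that itself contains the parameters. See [learn.co lesson](https://learn.co/tracks/module-3-data-science-career-2-1/appendix/more-on-derivatives/derivatives-the-chain-rule)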
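
One way to sanity-check derivatives obtained from these rules is to compare them with a numerical estimate. The sketch below does that with a central difference. The helper `central_difference`, the step `h`, the test point `x0`, and the examples `x^3` and `(3x + 1)^2` are assumptions added for illustration; the other two functions come from the rules above. For the chain rule example, the outer function is the square and the inner function is `3x + 1`, so the derivative is `2 * (3x + 1) * 3`.

```python
def central_difference(func, x, h=1e-5):
    """Numerical estimate of func'(x); h is an assumed small step."""
    return (func(x + h) - func(x - h)) / (2 * h)

# name -> (function, derivative obtained from the rules)
examples = {
    "power rule: x^3": (lambda x: x**3, lambda x: 3 * x**2),
    "constant factor: 2x^2": (lambda x: 2 * x**2, lambda x: 4 * x),
    "addition: 4x^3 - x^2 + 3x": (lambda x: 4 * x**3 - x**2 + 3 * x,
                                  lambda x: 12 * x**2 - 2 * x + 3),
    "chain: (3x + 1)^2": (lambda x: (3 * x + 1) ** 2,
                          lambda x: 2 * (3 * x + 1) * 3),
}

x0 = 1.5
max_error = 0.0
for name, (f, f_prime) in examples.items():
    # Difference between the rule-based and numerical derivatives at x0
    error = abs(central_difference(f, x0) - f_prime(x0))
    max_error = max(max_error, error)
    print(f"{name:28s} rule-based f'({x0}) = {f_prime(x0):7.3f}  error = {error:.1e}")
```

If a rule were applied incorrectly, the error column would show a value far from zero for that row.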


## OOP gradient descent with Linear Regression using MSE

### Steps to find the optimal slope and intercept of a line of best fit using MSE as our cost function 
1. Take the derivative of the loss function with respect to each parameter (together these derivatives form the gradient).
2. Pick random starting values for the parameters. 
3. Plug the current parameter values into the derivatives. 
4. Calculate the step sizes (derivative * learning rate). 
5. Calculate the new parameters (old parameters - step sizes). 
6. Repeat steps 3-5 until the maximum number of steps is reached or the step size falls below a minimum threshold. 



In [None]:
import numpy as np 

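class GradientDescentLinearRegression:
    """Simple linear regression (y = m*X + b) fit by gradient descent on MSE."""

    def __init__(self, learning_rate=.01, iterations=1000):
        self.learning_rate, self.iterations = learning_rate, iterations 
        
    def fit(self, X, y):
        # Arbitrary starting values for the slope (m) and intercept (b)
        m = 5
        b = 0
        n = X.shape[0]
        for _ in range(self.iterations):
            # Residuals of the current line
            residuals = y - (m*X + b)
            # Partial derivatives of MSE with respect to b and m
            b_gradient = -2 * np.sum(residuals) / n 
            m_gradient = -2 * np.sum(X*residuals) / n
            # Step against the gradient, scaled by the learning rate
            b = b - (self.learning_rate * b_gradient)
            m = m - (self.learning_rate * m_gradient)
        
        self.m, self.b = m, b 
        
    def predict(self, X):
        return self.m*X + self.b 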
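
The class above always runs for a fixed number of iterations, while step 6 also mentions stopping once the step size becomes very small. The function below is a minimal sketch of that second stopping rule, not part of the original class. The name `gradient_descent_line`, the `min_step_size` and `max_steps` parameters, their default values, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def gradient_descent_line(X, y, learning_rate=0.01, max_steps=10000,
                          min_step_size=1e-6):
    """Fit y ~ m*X + b by gradient descent on MSE, stopping early on tiny steps."""
    m, b = 0.0, 0.0  # assumed starting values
    n = X.shape[0]
    for step in range(1, max_steps + 1):
        residuals = y - (m * X + b)
        # Step size = derivative of MSE * learning rate
        m_step = learning_rate * (-2 * np.sum(X * residuals) / n)
        b_step = learning_rate * (-2 * np.sum(residuals) / n)
        m, b = m - m_step, b - b_step
        # Stop once both parameters have essentially stopped moving
        if max(abs(m_step), abs(b_step)) < min_step_size:
            break
    return m, b, step

# Toy data lying exactly on the line y = 2x + 1
X_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_demo = 2.0 * X_demo + 1.0
m_hat, b_hat, steps_used = gradient_descent_line(X_demo, y_demo)
print(m_hat, b_hat, steps_used)
```

On this toy data the loop should stop well before `max_steps`, with `m_hat` close to 2 and `b_hat` close to 1. A smaller `min_step_size` trades more iterations for a tighter fit.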
            

In [None]:
# cloud of points normally distributed around the line y = x
np.random.seed(42)
base = np.repeat(np.arange(5), 20)  # 0, 1, 2, 3, 4, each repeated 20 times
X = base + np.random.normal(size=100, scale=0.5)
y = base + np.random.normal(size=100, scale=0.25)

In [None]:
lr = GradientDescentLinearRegression()
lr.fit(X, y)

In [None]:
import matplotlib.pyplot as plt 
%matplotlib inline 

plt.scatter(X, y, color='black')
plt.plot(X, lr.predict(X))

In [None]:
print(lr.b)
print(lr.m)
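
A useful check on any gradient descent fit is to compare it with the closed-form least-squares solution. The sketch below regenerates the same data (same seed and noise scales as above) and fits a degree-1 polynomial with `np.polyfit`, which returns the slope first and the intercept second. Using `np.polyfit` as the reference is a choice made here for illustration, not something from the original notebook.

```python
import numpy as np

# Same data as above: points scattered around the line y = x
np.random.seed(42)
base = np.repeat(np.arange(5), 20)
X = base + np.random.normal(size=100, scale=0.5)
y = base + np.random.normal(size=100, scale=0.25)

# Closed-form least-squares fit of a straight line
slope, intercept = np.polyfit(X, y, deg=1)
print(intercept)
print(slope)
```

If gradient descent has converged, `lr.b` and `lr.m` should land near these values. A large gap usually points to too few iterations, a poorly chosen learning rate, or an error in the gradient formulas.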