# Gradient Descent Objectives 
* Understand the general process of gradient descent with respect to RSS (the cost function)
* Be able to define parameters, step size and learning rate

### Simple example of Gradient Descent

In [None]:
# imports
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

# read data into a DataFrame
data = pd.read_csv('data/Advertising.csv', index_col=0)
data.head()

In [None]:
data = data.sample(5, random_state=1234)
X = data['TV']
y = data['Sales']
print(data)

In [None]:
plt.scatter(X, y)
plt.ylabel('Sales')
plt.xlabel('TV Advertising Dollars')
plt.show()

In [None]:
data.drop(['Radio', 'Newspaper'], axis=1)

In [None]:
def regression_formula(x, slope, intercept):
    return slope * x + intercept

> Now, we need a starting point for gradient descent. Let's choose 0.1 as our initial slope and 0 as our initial intercept.

In [None]:
slope1 = 0.1
intercept1 = 0

In [None]:
fig = plt.figure(figsize = (12, 8))
plt.scatter(X, y, label = 'Raw Data')
# reuse the axes the scatter was drawn on; plt.axes() would create a new one on top
axes = plt.gca()
axes.set_ylim([0, 30])
plt.plot(X, regression_formula(X, slope1, intercept1), color = 'k', label = 'Regression')
plt.legend()
plt.show()

In [None]:
print(regression_formula(X, slope1, intercept1))

**Arithmetically, our function looks like this:**

    Sales = 0.1(TV $) + 0 
    
Now, let's calculate the residual sum of squares (RSS), our cost function, for this line: plug each x-value into the formula to get the predicted y-value, subtract it from the actual y-value, square each difference, and add the squares together.
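As a rough sketch of that calculation, the cell below computes RSS for the starting line (slope 0.1, intercept 0). The `rss` helper and the `tv_toy` / `sales_toy` arrays are assumptions introduced for illustration; the numbers are made up and are not the five rows sampled from `Advertising.csv`.

```python
import numpy as np

def regression_formula(x, slope, intercept):
    return slope * x + intercept

def rss(x, y, slope, intercept):
    """Residual sum of squares: sum of (actual - predicted)^2."""
    residuals = y - regression_formula(x, slope, intercept)
    return np.sum(residuals ** 2)

# Made-up illustrative values, NOT taken from the Advertising data
tv_toy = np.array([50.0, 100.0, 150.0, 200.0, 250.0])
sales_toy = np.array([8.0, 11.0, 14.0, 17.0, 20.0])

# Predictions are 5, 10, 15, 20, 25, so the residuals are 3, 1, -1, -3, -5
print(rss(tv_toy, sales_toy, slope=0.1, intercept=0))  # prints 45.0
```

With the real notebook data, the same helper could be called as `rss(X, y, slope1, intercept1)`.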

![](https://github.com/justmarkham/DAT4/raw/068d887e4be2eedb1b958b345ae097153f762d75/notebooks/08_estimating_coefficients.png)

## Steps to find the optimal slope and intercept of a line of best fit using RSS as our cost function 

1. Take the derivative of the loss function with respect to each parameter (together these form the gradient).
2. Pick random values for the parameters. 
3. Plug the parameter values into the derivatives. 
4. Calculate the step sizes (derivative value * learning rate) 
5. Calculate new parameters (old parameters - step size) 
6. Repeat steps 3-5 until the maximum number of steps or the minimum step size is reached. 
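The steps above can be sketched as a short loop. This is a minimal illustration, not a definitive implementation: the function name `gradient_descent`, its default learning rate, step limit and stopping threshold, and the toy data are all assumptions. It uses the partial derivatives of RSS for a line, namely -2 * sum(x * residual) for the slope and -2 * sum(residual) for the intercept.

```python
import numpy as np

def gradient_descent(x, y, slope=0.1, intercept=0.0,
                     learning_rate=0.01, max_steps=10_000, min_step_size=1e-8):
    """Fit slope and intercept by minimizing RSS (hypothetical helper)."""
    for _ in range(max_steps):
        residuals = y - (slope * x + intercept)
        # Step 3: plug the current parameters into the partial derivatives of RSS
        d_slope = -2 * np.sum(x * residuals)
        d_intercept = -2 * np.sum(residuals)
        # Step 4: step size = derivative value * learning rate
        step_slope = learning_rate * d_slope
        step_intercept = learning_rate * d_intercept
        # Step 5: new parameter = old parameter - step size
        slope -= step_slope
        intercept -= step_intercept
        # Step 6: stop early once both steps are tiny
        if max(abs(step_slope), abs(step_intercept)) < min_step_size:
            break
    return slope, intercept

# Made-up data lying exactly on y = 2x + 1, NOT the Advertising sample
x_toy = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_toy = 2 * x_toy + 1

slope_fit, intercept_fit = gradient_descent(x_toy, y_toy)
print(slope_fit, intercept_fit)  # should land close to 2 and 1
```

The learning rate of 0.01 suits these small toy values only. Raw TV spend is in the hundreds, so on the notebook's data the same loop would likely need a much smaller learning rate (or rescaled inputs) to avoid overshooting.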

![](https://i1.wp.com/ucanalytics.com/blogs/wp-content/uploads/2016/03/Picture1-1.jpg)

## Derivatives in gradient descent 
**A derivative tells us how a function is changing at any given point: it gives the rate of change of the output with respect to the input.**

## Quick Review - Rules for taking Derivatives

1. **Power Rule** - $$f(x) = x^r $$

Then, the derivative is: 
$$ f'(x) = r*x^{r-1} $$

2. **Constant factor rule** - $$f(x) = 2x^2 $$

Then, the derivative is: 
$$f'(x) = 2 \cdot \frac{d}{dx} x^{2} = 2 \cdot 2 \cdot x^{2-1} = 4x^1 = 4x $$

3. **Addition Rule** - To take the derivative of a function that has multiple terms, simply take the derivative of each term individually. So for the function below, 

$$ f(x) = 4x^3 - x^2 + 3x $$

$$ f'(x) = 12x^2 - 2x + 3  $$  

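4. **Chain Rule** - lets us differentiate a composite function: if $h(x) = f(g(x))$, then $h'(x) = f'(g(x)) \cdot g'(x)$. We need it to take the partial derivatives of RSS with respect to the slope and the intercept. See [Canvas lesson](https://my.learn.co/courses/123/assignments/6485?module_item_id=15085)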
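One way to sanity-check a derivative worked out with these rules is to compare it against a numerical estimate. The sketch below does this for the addition-rule example; the helper `central_difference` and the step `h` are assumptions chosen for illustration.

```python
def f(x):
    return 4 * x**3 - x**2 + 3 * x

def f_prime(x):
    # Derivative obtained term by term with the power and constant factor rules
    return 12 * x**2 - 2 * x + 3

def central_difference(func, x, h=1e-5):
    """Approximate the slope of func at x from two nearby points."""
    return (func(x + h) - func(x - h)) / (2 * h)

for x_val in [-2.0, 0.0, 1.5]:
    # The two numbers on each row should agree to several decimal places
    print(x_val, f_prime(x_val), central_difference(f, x_val))
```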

## Let's walk through the steps with the Advertising Data 
![](img/Intro-gradient-descent.jpg)
![](img/walk-thru-GD.jpg)
![](img/taking_partial_derivatives_GD.jpg)
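
For reference, here is a sketch of how the partial derivatives of RSS can be written out with the chain rule. The symbols $m$ (slope), $b$ (intercept) and the index $i$ over data points are notation assumed here and may differ from the images above.

$$ RSS(m, b) = \sum_{i=1}^{n} \left( y_i - (m x_i + b) \right)^2 $$

Treat each squared term as an outer function $u_i^2$ with inner function $u_i = y_i - (m x_i + b)$. The chain rule gives $2 u_i$ times the derivative of $u_i$, which is $-x_i$ with respect to $m$ and $-1$ with respect to $b$:

$$ \frac{\partial RSS}{\partial m} = -2 \sum_{i=1}^{n} x_i \left( y_i - (m x_i + b) \right) $$

$$ \frac{\partial RSS}{\partial b} = -2 \sum_{i=1}^{n} \left( y_i - (m x_i + b) \right) $$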