# Gradient Descent
---
This project is an example from Chapter 4 of *The Hundred-Page Machine Learning Book*. It applies linear regression to a set of advertising and sales data.


---
The first cell reads the .csv file and creates an array of floats from each column of data, excluding the header.

In [1]:
import matplotlib.pyplot as plt
import matplotlib
import numpy as np
from csv import reader

# Open the csv file, read every row, and close the file when done
with open('advertising.csv', newline='') as marketing_file_open:
    marketing_data_read = reader(marketing_file_open)
    marketing_data_list = list(marketing_data_read)
marketing_data = np.array(marketing_data_list)

# Segment each column into an array and convert the data from string to float.
# Row 0 is the header, so the slice 1: keeps every data row (1:-1 would also drop the last one).
sales = marketing_data[1:, 3].astype(float)
tv_ad = marketing_data[1:, 0].astype(float)
radio_ad = marketing_data[1:, 1].astype(float)
newspaper_ad = marketing_data[1:, 2].astype(float)
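
As a quick check of the loading step, the sketch below runs the same `csv.reader` and NumPy slicing on a small in-memory file. The three data rows and the header names are made-up placeholders, not rows from `advertising.csv`, and the column order (TV, radio, newspaper, sales) is assumed from the indices used in the cell above.

```python
import io
from csv import reader

import numpy as np

# Made-up rows for illustration only; not taken from advertising.csv
sample_csv = io.StringIO(
    "TV,radio,newspaper,sales\n"
    "100.0,20.0,30.0,10.0\n"
    "50.0,10.0,5.0,6.0\n"
    "200.0,40.0,60.0,18.0\n"
)

# Every cell is read as a string, so the array has a string dtype
sample_data = np.array(list(reader(sample_csv)))

# Row 0 is the header; the slice 1: keeps every data row after it
sample_sales = sample_data[1:, 3].astype(float)
sample_tv = sample_data[1:, 0].astype(float)

print(sample_data.shape)       # (4, 4): one header row plus three data rows
print(sample_sales.tolist())   # [10.0, 6.0, 18.0]
```

With this slice all three data rows survive; a slice of `1:-1` would return only the first two.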

## Equations of Gradient Descent

We model the data with a straight line of slope $w$ and intercept $b$:

$$f(x) = wx+b$$

We want to find the values of $w$ and $b$ that minimize the mean squared error:

$$
L = \frac{1}{N} \sum_{i=1}^N [y_i - (wx_i + b)]^2
$$

In the above equation, we compute the value $wx_i + b$ predicted for given values of $w$ and $b$ (provided by the user), subtract it from the actual value $y_i$ at that $x_i$, and square the difference. We then sum these squared differences over all data points and divide by the number of data points, $N$. The result is the mean squared error.
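
The description above can be sketched as a short NumPy function. This is a minimal illustration, not the notebook's own implementation: the name `mean_squared_error` and the demo points are hypothetical.

```python
import numpy as np


def mean_squared_error(w, b, x, y):
    """Return the mean of the squared residuals y_i - (w * x_i + b)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (w * x + b)
    return np.mean(residuals ** 2)


# Made-up points lying exactly on the line y = 2x + 1
x_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = np.array([1.0, 3.0, 5.0, 7.0])

print(mean_squared_error(2.0, 1.0, x_demo, y_demo))  # 0.0, a perfect fit
print(mean_squared_error(0.0, 0.0, x_demo, y_demo))  # 21.0, i.e. (1 + 9 + 25 + 49) / 4
```

The same call should work on the notebook's arrays, for example with `radio_ad` as `x` and `sales` as `y`.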

To minimize the mean squared error, we will take its partial derivatives with respect to $w$ and $b$, for which we will need the chain rule:

$$ \frac{d}{dx} \left[ f (g (x)) \right] = f'(g(x)) \cdot g'(x) $$

In our case, treating $w$ as the variable, the inner function is $g(w) = y_i - (wx_i + b)$ and the outer function is $f(g) = g^2$. Let's start by finding the partial derivative with respect to $w$, $\partial L/\partial w$:

$$ g'(w) = -x_i $$
$$ f'(g(w)) = 2[y_i - (wx_i + b)] $$
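
By the chain rule, the derivative of a single squared term with respect to $w$ is the product of these two factors, $-2x_i[y_i - (wx_i + b)]$. The sketch below compares that product with a central finite-difference estimate. The function names and sample values are hypothetical, chosen only for illustration.

```python
def squared_term(w, b, x_i, y_i):
    """One term of the sum: [y_i - (w * x_i + b)]^2."""
    return (y_i - (w * x_i + b)) ** 2


def squared_term_dw(w, b, x_i, y_i):
    """Chain rule: outer derivative 2 * g times inner derivative -x_i."""
    g = y_i - (w * x_i + b)
    return 2.0 * g * (-x_i)


# Arbitrary illustrative values
w, b, x_i, y_i = 0.5, 1.0, 3.0, 4.0
h = 1e-6

# Central-difference approximation of the derivative with respect to w
numeric = (squared_term(w + h, b, x_i, y_i) - squared_term(w - h, b, x_i, y_i)) / (2 * h)
analytic = squared_term_dw(w, b, x_i, y_i)

print(analytic)                        # -9.0
print(abs(numeric - analytic) < 1e-6)  # True
```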