# ML 02 - Gradient Descent

## Atmospheric CO<sub>2</sub> levels

Download the data file **``co2_mm_mlo.txt``** from <br />
https://climate.nasa.gov/vital-signs/carbon-dioxide/ <br />
(Same data set as in ML 01 - Logistic Regression)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn
%matplotlib inline 

In [None]:
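# skiprows=72 skips the comment header (count may vary by file version); sep=r'\s+' replaces the deprecated delim_whitespace=True
df = pd.read_table('co2_mm_mlo.txt', skiprows=72, sep=r'\s+')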

In [None]:
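x = df.iloc[:, [2]].to_numpy()   # column 2: decimal date, shape (m, 1)
Y = df.iloc[:, [4]].to_numpy()   # column 4: CO2 concentration (ppm), shape (m, 1)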

We have seen that the second-order polynomial $ y = w_0 + w_1 x + w_2 x^2 $, with parameters

$w_0 = 4.75\times 10^4$,

$w_1 = -49.0$, and

$w_2 = 1.27 \times 10^{-2}$,

is a good fit to this data set.<br />
Here we solve the same least-squares problem again, this time with the **gradient descent** numerical algorithm.

## Gradient Descent

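$
\textrm{Cost:}
\quad
J(\theta) = \frac{1}{2m}(X\theta-Y)^T (X\theta-Y)
$

$
\textrm{Gradient:} 
\quad
\nabla_{\theta}J = \frac{1}{m}X^T (X\theta-Y)
$

$
\textrm{Gradient descent:} 
\quad
\theta \leftarrow \theta -\epsilon\ \nabla_{\theta}J \quad \textrm{until convergence to } \theta^*
$

Here $X$ is the $m \times n$ design matrix (one row per sample), $Y$ is the $m \times 1$ vector of targets, and $\epsilon$ is the learning rate.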
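The update rule says "until convergence", while the cells below simply run a fixed number of iterations. A minimal sketch of an explicit stopping rule, assuming convergence means the gradient norm drops below a tolerance `tol`; the helper `gradient_descent` and the noise-free line $y = 2 + 3t$ are hypothetical illustrations, not part of the CO<sub>2</sub> analysis.

```python
import numpy as np


def gradient_descent(X, Y, epsilon=0.5, tol=1e-10, max_iter=100_000):
    """Hypothetical helper: gradient descent on J(theta) = |X theta - Y|^2 / (2m).

    Stops when the gradient norm falls below tol (our stand-in for
    "until convergence") or after max_iter steps. Returns (theta, n_iter).
    """
    m, n = X.shape
    theta = np.zeros((n, 1))
    for n_iter in range(1, max_iter + 1):
        grad = X.T @ (X @ theta - Y) / m   # nabla_theta J
        if np.linalg.norm(grad) < tol:     # converged: gradient is (nearly) zero
            break
        theta = theta - epsilon * grad     # theta <- theta - epsilon * grad J
    return theta, n_iter


# Hypothetical noise-free data y = 2 + 3 t (not the CO2 data set)
t = np.linspace(-1.0, 1.0, 50).reshape(-1, 1)
X = np.hstack((np.ones_like(t), t))
Y = 2.0 + 3.0 * t

theta, n_iter = gradient_descent(X, Y)
print(theta.ravel(), n_iter)
```

A gradient-norm test is only one possible criterion; a threshold on the change in $J$ or in $\theta$ between steps would work similarly.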

In [None]:
def gradient_linear(X, Y, theta):
    # Gradient of the least-squares cost: X^T (X theta - Y) / m
    return np.dot(X.T, np.dot(X, theta) - Y) / len(X)
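One way to convince ourselves that `gradient_linear` really is the gradient of the cost $J(\theta) = \frac{1}{2m}(X\theta-Y)^T(X\theta-Y)$ is to compare it with a central finite-difference approximation. A sketch on random data; `cost` and `numerical_gradient` are hypothetical helpers introduced only for this check, and `gradient_linear` is restated so the block runs on its own.

```python
import numpy as np


def gradient_linear(X, Y, theta):
    # Analytic gradient, same formula as the notebook cell above
    return X.T @ (X @ theta - Y) / len(X)


def cost(X, Y, theta):
    # J(theta) = |X theta - Y|^2 / (2m)
    r = X @ theta - Y
    return np.sum(r**2) / (2 * len(Y))


def numerical_gradient(X, Y, theta, h=1e-6):
    # Central differences: dJ/dtheta_k ~ (J(theta + h e_k) - J(theta - h e_k)) / (2h)
    grad = np.zeros_like(theta)
    for k in range(theta.shape[0]):
        e = np.zeros_like(theta)
        e[k, 0] = h
        grad[k, 0] = (cost(X, Y, theta + e) - cost(X, Y, theta - e)) / (2 * h)
    return grad


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
Y = rng.normal(size=(20, 1))
theta = rng.normal(size=(3, 1))

print(np.max(np.abs(gradient_linear(X, Y, theta) - numerical_gradient(X, Y, theta))))
```

Because $J$ is quadratic, central differences are exact apart from floating-point round-off, so the printed difference should be tiny.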

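### Feature Normalization

The raw features have very different scales ($x \approx 2\times 10^3$ while $x^2 \approx 4\times 10^6$), so any step size small enough to be stable along the $x^2$ direction barely moves the other parameters. We therefore standardize them: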

$$
\chi_0 = 1
\quad , \quad
\chi_1 = \frac{x-\langle x\rangle}{\sigma_x}
\quad , \quad
\chi_2 = \frac{x^2-\langle x^2\rangle}{\sigma_{x^2}}
$$

Therefore,

\begin{eqnarray}
y  & = & w_0 + w_1 x + w_2 x^2  \\
   & = & \omega_0 + \omega_1 \chi_1 + \omega_2 \chi_2
\end{eqnarray}

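where

\begin{eqnarray}
\omega_0 & = & w_0 + \langle x\rangle w_1 + \langle x^2\rangle w_2   \\[1mm]
\omega_1 & = & \sigma_x w_1\\[1mm]
\omega_2 & = & \sigma_{x^2} w_2
\end{eqnarray}

or, inverting these relations,

\begin{eqnarray}
w_0 & = & \omega_0 -\frac{\langle x\rangle}{\sigma_x} \omega_1 -\frac{\langle x^2\rangle}{\sigma_{x^2}} \omega_2   \\[2mm]
w_1 & = & \frac{\omega_1 }{ \sigma_x } \\[2mm]
w_2 & = & \frac{\omega_2 }{ \sigma_{x^2} }
\end{eqnarray}

In matrix form, $w = R\,\omega$ with

$$
R = \begin{pmatrix}
1 & -\langle x\rangle/\sigma_x & -\langle x^2\rangle/\sigma_{x^2} \\
0 & 1/\sigma_x & 0 \\
0 & 0 & 1/\sigma_{x^2}
\end{pmatrix}
$$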
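The forward and inverse relations between $w$ and $\omega$ can be checked numerically: both parameterizations must give identical predictions, and the matrix $R$ must map $\omega$ back to $w$. A sketch, assuming a hypothetical random sample `x` and hypothetical coefficients `w` (not the fitted CO<sub>2</sub> values).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=(100, 1))   # hypothetical sample
w = np.array([[4.0], [-2.0], [0.5]])        # hypothetical w_0, w_1, w_2

# Raw features [1, x, x^2] and normalized features [1, chi_1, chi_2]
mx, sx = x.mean(), x.std()
mx2, sx2 = (x**2).mean(), (x**2).std()
X_raw = np.hstack((np.ones_like(x), x, x**2))
X_norm = np.hstack((np.ones_like(x), (x - mx) / sx, (x**2 - mx2) / sx2))

# Forward relations: omega from w
omega = np.array([[w[0, 0] + mx * w[1, 0] + mx2 * w[2, 0]],
                  [sx * w[1, 0]],
                  [sx2 * w[2, 0]]])

# Inverse relations as a matrix: w = R omega
R = np.array([[1.0, -mx / sx, -mx2 / sx2],
              [0.0, 1.0 / sx, 0.0],
              [0.0, 0.0, 1.0 / sx2]])

print(np.allclose(X_raw @ w, X_norm @ omega))  # same predictions
print(np.allclose(R @ omega, w))               # R undoes the forward map
```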


In [None]:
# Design matrix of normalized features [chi_0, chi_1, chi_2]
chi0 = np.ones_like(x)
chi1 = (x - np.mean(x)) / np.std(x)
chi2 = (x**2 - np.mean(x**2)) / np.std(x**2)
X = np.hstack((chi0, chi1, chi2))
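How much does normalization help? A rough measure is the condition number of the Hessian $X^T X / m$ of the cost. The sketch below compares raw and normalized features, assuming monthly decimal dates from 1958 to 2018 as a hypothetical stand-in for the column read from the file. It suggests that normalization lowers the condition number by many orders of magnitude, but that $\chi_1$ and $\chi_2$ stay almost collinear over such a narrow relative range of $x$, so the normalized problem is still fairly ill-conditioned.

```python
import numpy as np

# Hypothetical stand-in for the decimal dates: monthly points, 1958 to 2018
x = np.linspace(1958.2, 2018.0, 720).reshape(-1, 1)

# Raw polynomial features [1, x, x^2]
X_raw = np.hstack((np.ones_like(x), x, x**2))

# Normalized features [chi_0, chi_1, chi_2], as in the cell above
chi1 = (x - np.mean(x)) / np.std(x)
chi2 = (x**2 - np.mean(x**2)) / np.std(x**2)
X_norm = np.hstack((np.ones_like(x), chi1, chi2))

# Condition number of the Hessian X^T X / m of the least-squares cost
# (for the raw features this may print a huge number or inf)
cond = {name: np.linalg.cond(X.T @ X / len(X))
        for name, X in [("raw", X_raw), ("normalized", X_norm)]}
print(cond)
```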

In [None]:
# Back-transformation w = R omega (inverse relations above)
R = np.array([[1., -np.mean(x) / np.std(x), -np.mean(x**2) / np.std(x**2)],
              [0., 1. / np.std(x),          0.],
              [0., 0.,                      1. / np.std(x**2)]])

In [None]:
theta = np.zeros((3,1))

In [None]:
epsilon = .6
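Why does $\epsilon = 0.6$ work? For a quadratic cost the Hessian $H = X^T X/m$ is constant, and each gradient-descent step multiplies the error along an eigenvector of $H$ with eigenvalue $\lambda$ by $(1 - \epsilon\lambda)$. The iteration therefore converges only if $0 < \epsilon < 2/\lambda_{\max}$, and the slowest direction shrinks like $(1-\epsilon\lambda_{\min})^k$, which takes on the order of $1/(\epsilon\lambda_{\min})$ steps. A sketch that computes these quantities, assuming the same hypothetical 1958 to 2018 monthly dates as a stand-in for the data; `max_stable_step` is a hypothetical helper.

```python
import numpy as np


def max_stable_step(X):
    """Hypothetical helper: step-size bound 2 / lambda_max for least-squares gradient descent."""
    H = X.T @ X / len(X)               # Hessian of J (constant for least squares)
    eigvals = np.linalg.eigvalsh(H)    # H is symmetric: real eigenvalues, ascending order
    return 2.0 / eigvals.max(), eigvals


# Hypothetical stand-in for the decimal dates
x = np.linspace(1958.2, 2018.0, 720).reshape(-1, 1)
chi1 = (x - x.mean()) / x.std()
chi2 = (x**2 - (x**2).mean()) / (x**2).std()
X = np.hstack((np.ones_like(x), chi1, chi2))

eps_max, eigvals = max_stable_step(X)
print("eigenvalues of H:", eigvals)
print("gradient descent diverges for epsilon >", eps_max)
```

With standardized columns, $H$ has eigenvalues $1$ and $1 \pm \rho$, where $\rho$ is the correlation between $\chi_1$ and $\chi_2$. For these assumed dates $\rho$ is very close to 1, so the bound sits just above $\epsilon = 1$ (the value used in the cost-visualization cell) while $\lambda_{\min}$ is tiny, which is consistent with the very large iteration counts used here.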

In [None]:
# Fixed number of gradient-descent steps (no explicit convergence test)
for i in range(int(1e6)):
    theta = theta - epsilon * gradient_linear(X, Y, theta)
print(theta)

In [None]:
np.dot(R, theta)   # w = R omega: coefficients w_0, w_1, w_2 of the original polynomial
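A useful sanity check on `np.dot(R, theta)` is to compare it with the closed-form least-squares solution on the raw features, here computed with `np.linalg.lstsq`. The sketch uses a hypothetical noisy quadratic on $0 \le x \le 10$ instead of the CO<sub>2</sub> file so that it runs on its own and converges in a few thousand steps; on the real dates far more iterations are needed.

```python
import numpy as np

# Hypothetical noisy quadratic (not the CO2 data)
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 200).reshape(-1, 1)
Y = 1.0 - 2.0 * x + 0.3 * x**2 + rng.normal(scale=0.1, size=x.shape)

# Normalized features and back-transformation matrix R (w = R omega)
mx, sx = x.mean(), x.std()
mx2, sx2 = (x**2).mean(), (x**2).std()
X = np.hstack((np.ones_like(x), (x - mx) / sx, (x**2 - mx2) / sx2))
R = np.array([[1.0, -mx / sx, -mx2 / sx2],
              [0.0, 1.0 / sx, 0.0],
              [0.0, 0.0, 1.0 / sx2]])

# Gradient descent on the normalized problem
theta = np.zeros((3, 1))
epsilon = 0.6
for _ in range(5000):
    theta = theta - epsilon * (X.T @ (X @ theta - Y)) / len(X)
w_gd = R @ theta

# Closed-form least squares on the raw features [1, x, x^2]
X_raw = np.hstack((np.ones_like(x), x, x**2))
w_ls, *_ = np.linalg.lstsq(X_raw, Y, rcond=None)

print(np.hstack((w_gd, w_ls)))   # columns: gradient descent, lstsq
```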

### Cost Function Visualization 

In [None]:
# Cost function J(theta) = (X theta - Y)^T (X theta - Y) / (2m)
def cost_function_linear(X, Y, theta):
    r = X.dot(theta) - Y
    return r.T.dot(r)[0][0] / 2. / len(Y)

In [None]:
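theta = np.zeros((3, 1))
epsilon = 1.

step = list()
cost = list()

# Record the cost after every block of 1000 gradient-descent iterations
for i in range(1000):
    for j in range(1000):
        theta = theta - epsilon * gradient_linear(X, Y, theta)
    step.append(i)
    cost.append(cost_function_linear(X, Y, theta))
plt.plot(step, cost, color='dodgerblue')
plt.xlabel('iterations (x 1000)')
plt.ylabel(r'cost $J(\theta)$');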

In [None]:
# Convert back to the original coefficients; 4 significant digits keep w2 ~ 1e-2 readable
w = np.dot(R, theta)
print('w0 = {:.4g}'.format(w[0][0]))
print('w1 = {:.4g}'.format(w[1][0]))
print('w2 = {:.4g}'.format(w[2][0]))