# Gradient Descent

In this exercise we have a data set that consists of some periodic
data plus some noise. The data covers less than one full cycle of the
period, so it isn't easy to work with.

What we'll do is use gradient descent to find a likely set of parameters.

The data is in wheel.csv.

We would like to understand the relationship between _seconds_ and _signal_.

In [1]:
import pandas
wheel = pandas.read_csv('wheel.csv')

In [2]:
# Plot the graph of signal vs seconds.
# Does it look linear?

In the following cell, we define the function that we think might be
the relationship between signal and seconds. It has three parameters (a, b and c).

The argument _t_ is where we will pass in "seconds"; the function returns a value that is hopefully close to signal.

In [3]:
import math
def underlying_function(t, a,b,c):
    return math.sin(a * t + b) + c
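
To get a feel for the three parameters, the sketch below evaluates the same function at a few hand-picked values. The reading given in the comments (a as angular frequency, b as phase shift, c as vertical offset) is the usual one for a sine wave; the specific numbers are made up for illustration and are not taken from wheel.csv.

```python
import math

def underlying_function(t, a, b, c):
    return math.sin(a * t + b) + c

# a scales time: the wave repeats every 2*pi/a seconds.
period = 2 * math.pi / 2.0
same_phase = math.isclose(
    underlying_function(0.3, 2.0, 0.0, 0.0),
    underlying_function(0.3 + period, 2.0, 0.0, 0.0),
)

# b shifts the wave sideways: with b = pi/2 the sine starts at its peak.
at_peak = underlying_function(0.0, 1.0, math.pi / 2, 0.0)

# c moves the whole wave up or down.
lifted = underlying_function(0.0, 1.0, 0.0, 5.0)

print(same_phase, at_peak, lifted)
```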

We need a function that says how close a match a particular set of
parameters is: let's use the mean squared error to measure it.

In [9]:
import sklearn.metrics

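def try_a_b_c(a, b, c):
    # Predict a signal value for every recorded second, then score the guess.
    this_try = [underlying_function(w, a, b, c) for w in wheel.seconds]
    # scikit-learn's argument order is (y_true, y_pred).
    return sklearn.metrics.mean_squared_error(wheel.signal, this_try)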
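
Mean squared error is the average of the squared differences between predictions and observations, so smaller is better and zero means a perfect match. Here is a minimal pure-Python sketch of that calculation, using made-up numbers rather than the wheel data. The helper name mean_squared_error_by_hand is hypothetical.

```python
def mean_squared_error_by_hand(predicted, observed):
    """Average of the squared differences between two equal-length sequences."""
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sum(squared_errors) / len(squared_errors)

# Made-up values for illustration only.
observed = [1.0, 2.0, 3.0]
good_guess = [1.0, 2.0, 4.0]   # off by 1 in one place
bad_guess = [4.0, 5.0, 6.0]    # off by 3 everywhere

good_error = mean_squared_error_by_hand(good_guess, observed)
bad_error = mean_squared_error_by_hand(bad_guess, observed)
print(good_error, bad_error)
```

Because the differences are squared, a guess that is three times further away scores nine times worse.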

We don't need an exact derivative: a close approximation will do. We nudge
each parameter in turn by a small amount (delta) and measure how much the
error changes.

try_a_b_c is a function of three variables (a, b and c) so the gradient
is a 3-dimensional vector (which we store in a numpy array).

In [5]:
import numpy
def gradient_of_try_a_b_c(a, b, c, delta=0.0001):
    here = try_a_b_c(a,b,c)
    a_gradient = (try_a_b_c(a+delta, b, c) - here) / delta
    b_gradient = (try_a_b_c(a, b+delta, c) - here) / delta
    c_gradient = (try_a_b_c(a, b, c+delta) - here) / delta
    return numpy.array([a_gradient,b_gradient,c_gradient])
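
To see how close the approximation gets, the same forward-difference idea can be checked on a function whose gradient is known exactly. This sketch uses a hypothetical test function f(x, y) = x**2 + 3*y rather than try_a_b_c, and the helper name approximate_gradient is our own.

```python
def f(x, y):
    # Hypothetical test function with a known gradient: (2 * x, 3).
    return x ** 2 + 3 * y

def approximate_gradient(func, x, y, delta=0.0001):
    # Forward difference: nudge one input at a time and measure the change.
    here = func(x, y)
    dx = (func(x + delta, y) - here) / delta
    dy = (func(x, y + delta) - here) / delta
    return [dx, dy]

approx = approximate_gradient(f, 2.0, 1.0)
exact = [4.0, 3.0]
print(approx, exact)
```

For this function the error in the x component is about the size of delta, which is why a small delta gives a close approximation.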

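The following code implements gradient descent. gradient_descent takes a
single step from (a0, b0, c0) against the gradient, scaled by the learning
rate l; gradient_descent_iterate repeats that step n times and returns every
point visited, starting point included.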

In [6]:
def gradient_descent(gradient_func, a0, b0, c0, l=0.01):
    vector = numpy.array([a0, b0, c0])
    g = gradient_func(a0, b0, c0)
    return vector - l * g

def gradient_descent_iterate(starting_point, 
                             gradient=gradient_of_try_a_b_c, 
                             n=10,
                             l=0.01):
    points = [starting_point]
    for i in range(n):
        (a,b,c) = points[-1]
        new_point = gradient_descent(gradient, a,b,c, l)
        points.append(new_point)
    return points
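
The same loop is easier to follow on a function whose minimum is known in advance. The sketch below is a pure-Python illustration, not the notebook's code: it descends the hypothetical bowl (x - 3)**2 + (y + 1)**2, whose minimum is at (3, -1), using the exact gradient instead of an approximation.

```python
def bowl_gradient(x, y):
    # Exact gradient of the hypothetical function (x - 3)**2 + (y + 1)**2.
    return (2 * (x - 3), 2 * (y + 1))

def descend(start, n=10, l=0.01):
    # Repeatedly step against the gradient, keeping every point visited.
    points = [start]
    for _ in range(n):
        x, y = points[-1]
        gx, gy = bowl_gradient(x, y)
        points.append((x - l * gx, y - l * gy))
    return points

path = descend((0.0, 0.0), n=500, l=0.1)
final_x, final_y = path[-1]
print(final_x, final_y)
```

As in gradient_descent_iterate, the returned list has n + 1 entries because it includes the starting point.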

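Pick a starting point. You don't need a careful choice for this exercise,
but note that different starting points can settle on different values of
a, b and c: for example, adding 2π to b gives exactly the same curve.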

In [23]:
starting_point = numpy.array([-10,10,10])

gradient_descent_iterate(starting_point, n=10000)

[array([-10,  10,  10]),
 array([-10.01458305,   9.99951116,   9.81691132]),
 array([-10.03012939,   9.9988638 ,   9.63750767]),
 array([-10.04645601,   9.99808072,   9.46171898]),
 array([-10.06332839,   9.99719142,   9.2894762 ]),
 array([-10.08047382,   9.99623048,   9.12071092]),
 array([-10.09760097,   9.99523514,   8.95535505]),
 array([-10.11442296,   9.99424238,   8.79334067]),
 array([-10.13067955,   9.99328614,   8.63460002]),
 array([-10.14615477,   9.99239499,   8.47906559]),
 array([-10.16068739,   9.99159076,   8.32667046]),
 array([-10.17417369,   9.99088811,   8.17734854]),
 array([-10.1865636 ,   9.99029495,   8.0310349 ]),
 array([-10.19785251,   9.98981338,   7.88766596]),
 array([-10.20807097,   9.98944101,   7.74717971]),
 array([-10.21727426,   9.9891722 ,   7.60951569]),
 array([-10.22553314,   9.98899925,   7.47461506]),
 array([-10.23292635,   9.9889133 ,   7.34242054]),
 array([-10.23953499,   9.98890514,   7.21287635]),
 array([-10.24543859,   9.98896558,   7

What's the value of try_a_b_c at this starting point?

In [21]:
try_a_b_c(-10,  10,  10)

84.95009680793008

Iterate one step with gradient_descent_iterate. What does it suggest
is a better point (with a lower error)?

In [22]:
try_a_b_c(-10.01458305,   9.99951116,   9.81691132)

81.609666142906022

What's the value of try_a_b_c at a point further along the descent?

In [24]:
try_a_b_c(-10.38670787,   9.92848139,   0.86178051)

1.1132358439197028

What about if you iterate 100 times? Or 1000 times? Try some different
learning rates.
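
The learning rate l controls the step size, and its effect is easiest to see in one dimension. This sketch minimises the hypothetical function f(x) = x**2, whose gradient is 2 * x, with three made-up learning rates. It stands in for the wheel problem and is not a prediction of how try_a_b_c will behave.

```python
def descend_x_squared(x0, l, n=20):
    # Gradient descent on f(x) = x**2, whose gradient is 2 * x.
    x = x0
    for _ in range(n):
        x = x - l * 2 * x
    return x

too_small = descend_x_squared(1.0, l=0.001)   # barely moves
reasonable = descend_x_squared(1.0, l=0.1)    # heads steadily towards 0
too_large = descend_x_squared(1.0, l=1.1)     # overshoots further each step

print(too_small, reasonable, too_large)
```

For this function each step multiplies x by (1 - 2 * l), so any l above 1 makes the iterates grow instead of shrink. The useful range of learning rates for the wheel data will be different.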

Try to guess the values of a, b and c. Evaluate try_a_b_c at this point.

Use underlying_function with the a, b and c you found to make predictions
for signal from seconds. Plot the results. How well do they compare with the data?