# Chapter 8: Gradient Descent

Suppose we have some function `f` that takes as input a vector of real numbers and outputs a single real number.

One simple such function is:

In [13]:
def sum_of_squares(v):
    """ computes the sum of squared elements in v """
    return sum(v_i ** 2 for v_i in v)

We'll frequently need to maximize (or minimize) such functions. That is, we need to find the input `v` that produces the largest (or smallest) possible value.

For functions like ours, the _gradient_ gives the input direction in which the function most quickly increases.

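Accordingly, one approach to maximizing a function is to pick a random starting point, compute the gradient, take a small step in the direction of the gradient (the direction in which the function increases fastest), and repeat from the new point. To minimize instead, take the small steps in the opposite direction.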
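The loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the objective `negated_sum_of_squares`, its hand-derived gradient, the helper `gradient_ascent`, and the `step_size` and `num_steps` values are all hypothetical choices made for this example, not something fixed by the chapter.

```python
import random

def negated_sum_of_squares(v):
    """hypothetical objective: largest (zero) when every element of v is zero"""
    return -sum(v_i ** 2 for v_i in v)

def negated_sum_of_squares_gradient(v):
    """exact gradient of the objective: the i-th component is -2 * v_i"""
    return [-2 * v_i for v_i in v]

def gradient_ascent(gradient_fn, start, step_size=0.01, num_steps=1000):
    """repeatedly take a small step in the direction of the gradient"""
    v = list(start)
    for _ in range(num_steps):
        grad = gradient_fn(v)
        # move each coordinate a little way along the gradient
        v = [v_i + step_size * g_i for v_i, g_i in zip(v, grad)]
    return v

random.seed(0)                                        # assumed seed, for repeatability
start = [random.uniform(-10, 10) for _ in range(3)]   # random starting point
v_max = gradient_ascent(negated_sum_of_squares_gradient, start)
print(v_max)                                          # every coordinate ends up very close to 0
```

With these settings each step shrinks every coordinate by a factor of 0.98, so the point drifts steadily toward the maximizer at the origin. A step size that is too large would overshoot instead, which is why the step is kept small.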

### Estimating the Gradient

If `f` is a function of one variable, its derivative at a point `x` measures how `f(x)` changes when we make a very small change to `x`. It is defined as the limit of the difference quotients:

In [14]:
def difference_quotient(f, x, h):
    return (f(x + h) - f(x)) / h

as `h` approaches zero.

The derivative is the slope of the tangent line at $(x, f(x))$, while the difference quotient is the slope of the not-quite-tangent line that runs through $(x, f(x))$ and $(x + h, f(x + h))$. As $h$ gets smaller and smaller, the not-quite-tangent line gets closer and closer to the tangent line.
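To see this convergence numerically, the sketch below evaluates the difference quotient of a squaring function at `x = 3`, where the tangent slope is 6, for a few shrinking values of `h`. The definitions are restated so the snippet runs on its own; the point `x = 3` and the particular values of `h` are arbitrary choices for illustration.

```python
def difference_quotient(f, x, h):
    return (f(x + h) - f(x)) / h

def square(x):
    return x * x

x = 3
exact_slope = 6          # slope of the tangent line to x * x at x = 3
errors = []
for h in [1, 0.1, 0.01, 0.001]:
    estimate = difference_quotient(square, x, h)
    error = abs(estimate - exact_slope)
    errors.append(error)
    print(h, estimate, error)
```

For this function the error shrinks roughly in proportion to `h`, up to floating-point rounding.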

For many functions it's easy to exactly calculate derivatives. For example, the `square` function:

In [15]:
def square(x):
    return x * x

has the derivative:

In [16]:
def derivative(x):
    return 2 * x

which you can check by explicitly computing the difference quotient and taking the limit.
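Written out for the squaring function, that check is a short piece of algebra:

$$
\frac{(x + h)^2 - x^2}{h} = \frac{2xh + h^2}{h} = 2x + h \longrightarrow 2x \quad \text{as } h \to 0
$$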

What if you couldn't (or didn't want to) work out the derivative by hand? We can't take limits in Python, but we can estimate a derivative by evaluating the difference quotient for a very small `h`.

In [20]:
def show_example(x):
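    # estimate the derivative of square using a small step h
    derivative_estimate = lambda x: difference_quotient(square, x, h=0.00001)

    # plot to show they're basically the same
    import matplotlib.pyplot as plt
    xs = range(-10, 10)
    plt.title("Actual Derivatives vs. Estimates")
    # build lists explicitly: in Python 3, map returns an iterator, not a sequence
    plt.plot(xs, [derivative(x_i) for x_i in xs], 'rx', label='Actual')
    plt.plot(xs, [derivative_estimate(x_i) for x_i in xs], 'b+', label='Estimate')
    plt.legend(loc=9)  # loc=9 means "upper center"; the labels above fill the legend
    plt.show()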