<h1>Gradient Descent </h1>

function that has as input a vector and as output a real number

In [1]:
def sum_of_squares(v):
    return sum(v_i **2 for v_i in v)

In [2]:
sum_of_squares([10,3,5])

134

<p> <i> Gradient: </i> the vector of
partial derivatives) gives the input direction in which the function most quickly increases.

<h2> Estimate the Gradient </h2>

If f is a function of one variable, its derivative at a point x measures how f(x) changes
when we make a very small change to x . It is defined as the limit of the difference
quotients as h approaches 0

In [3]:
def difference_quotient(f, x, h):
    return (f(x + h) - f(x)) / h

In [32]:
def quadPlus3(x):
    return x**2 +3

In [33]:
a=5
h=0.01
print("{:.2f}:  {:.2f}".format(a,quadPlus3(a)))
print("{:}:  {:.2f}".format(a+h,quadPlus3(a+h)))
print("{:.2f}".format(difference_quotient(quadPlus3,a,h)))

a=7
print("{:.2f}:  {:.2f}".format(a,quadPlus3(a)))
print("{:}:  {:.2f}".format(a+h,quadPlus3(a+h)))
print("{:.2f}".format(difference_quotient(quadPlus3,a,h)))



5.00:  28.00
5.01:  28.10
10.01
7.00:  52.00
7.01:  52.14
14.01


In [6]:
def square(x):
    return x * x

NameError: name 'partial' is not defined

<h4> Partial Derivatives </h4>

We calculate its ith partial derivative by treating it as a function of just its ith variable,
holding the other variables fixed:

In [34]:
def partial_difference_quotient(f, v, i, h):
    """compute the ith partial difference quotient of f at v"""

    w = [v_j + (h if j == i else 0)
    # add h to just the ith element of v
    for j, v_j in enumerate(v)]
    return (f(w) - f(v)) / h

In [8]:
def estimate_gradient(f, v, h=0.00001):
    return [partial_difference_quotient(f, v, i, h)    for i, _ in enumerate(v)]  

<h2> Using the Gradient </h2>

In [9]:
import random
def distance(x,y):
    d=[]
    for x_i, y_i in zip(x,y):
        d.append(abs(x_i - abs(y_i)))
    return sum(d)

def step(v, direction, step_size):
    """move step_size in the direction from v"""
    return [v_i + step_size * direction_i for v_i, direction_i in zip(v, direction)]

def sum_of_squares_gradient(v):
    return [2 * v_i for v_i in v]

# pick a random starting point
v = [random.randint(-10,10) for i in range(3)]
print (v)
tolerance = 0.0000001
while True:
    gradient = sum_of_squares_gradient(v)# compute the gradient at v
    next_v = step(v, gradient, -0.01)# take a negative gradient step
    if distance(next_v, v) < tolerance: # stop if we're converging
        break
    v = next_v #continue if not
v    



[-9, -5, -6]


[-2.2688024792639888e-08, -1.2604458218133295e-08, -1.512534986175991e-08]

In [10]:
step_size= -0.01
v2=[3,4,5] 
print("sum of squares gradient of v2: {:}".format(sum_of_squares_gradient(v2)))

step(v2, sum_of_squares_gradient(v2), step_size)

sum of squares gradient of v2: [6, 8, 10]


[2.94, 3.92, 4.9]

<p > <span style="color:purple" > rewrite the `step(v,direction, step_size)` function  </span>

t is possible that certain step sizes will result in invalid inputs for our function. So we’ll
need to create a “safe apply” function that returns infinity (which should never be the
minimum of anything) for invalid inputs:

In [11]:
def safe(f):
    """return a new function that's the same as f,except that it outputs infinity whenever f produces an error"""
    def safe_f(*args, **kwargs):
        try:
            return f(*args, **kwargs)
        except:
            return float('inf')# this means "infinity" in Python
        return safe_f

putting all together

In [12]:
def minimize_batch(target_fn, gradient_fn, theta_0, tolerance=0.000001):
    """use gradient descent to find theta that minimizes target function"""
    step_sizes = [100, 10, 1, 0.1, 0.01, 0.001, 0.0001, 0.00001]
    theta = theta_0                  #set to initial value
    target_fn = safe(target_fn)      #safe version of target_fn
    value = target_fn(theta)   #value we re minimizing

    while True:
        gradient = gradient_fn(theta)
        next_thetas = [step(theta, gradient, -step_size) for step_size in step_sizes]
# choose the one that minimizes the error function
        next_theta = min(next_thetas, key=target_fn)
        next_value = target_fn(next_theta)
# stop if we're "converging"
    if abs(value - next_value) < tolerance:
        return theta
    else:
        theta, value = next_theta, next_value