Example:
    https://github.com/mattnedrich/GradientDescentExample/blob/master/gradient_descent_example.gif
    
The example above shows how linear regression and gradient descent combined can fit a line to data. 

We start with a random linear function, compute the error, that error value tells us how well the line fits and acts as a compass for redrawing another line. Once we get the "Best" line we finish. 

Below is an example of coding this from scratch, using the error function:
https://spin.atomicobject.com/wp-content/uploads/linear_regression_error1.png

visual for gradient descent error: 
https://spin.atomicobject.com/wp-content/uploads/gradient_descent_error_surface.png

In [13]:
from numpy import *

# this function is the code for the above error function
def error_compute(b, m, points):
    total_error = 0
    # iterate through points 
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        
        # compute the difference and square for error, then add to total error: 
        total_error += (y - (m*x+b))**2
    return total_error/float(len(points)) # average error



def grad_step(b_current, m_current, points, learning_rate):
    b_gradient = 0
    m_gradient = 0
    n = float(len(points))
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        # direction with respect to b and m
        # compute partial derivatives of error function
        b_gradient += (2/n)*(y - ((m_current*x) + b_current))
        m_gradient += (2/n)*x*(y-(m_current*x) + b_current)
        
    #update b and m values using partial derivatives
    new_b = b_current - (learning_rate*b_gradient)
    new_m = m_current - learning_rate*m_gradient
    
    return [new_b, new_m]


# gradient descent function:
def grad_descent(points, starting_b, starting_m, learning_rate, n_iter):
    b = starting_b 
    m = starting_m 
    
    # grad descent
    for i in range(n_iter):
        #update b and m in a step
        b, m = grad_step(b, m, array(points), learning_rate)
    return [b, m]

    
    


def run():
    # step 1: data
    points = genfromtxt('data.csv', delimiter=',') # delimiter is argument
    # for how data values are separated 
    
    # step 2: the parameters for the model
    learning_rate = .0001 # defines how fast the model should converge
    initial_b = 0   # inital slope and y-intercept for y = mx+b fit
    initial_m = 0
    n_iter = 1000 # since data is small this can be small 
    
    # step 3: train the model 
    print 'initializing gradient descent at b = {0}, m = {1} error={2}'.format(initial_b,
                                                                     initial_m,
                                                                     error_compute(initial_b, initial_m, points))
    

    [b, m] = grad_descent(points, initial_b, initial_m, learning_rate, n_iter)
    
    print 'final values are: b = {0}, m = {1} error={2}'.format(n_iter, b,
                                                                     m,
                                                                     error_compute(b, m, points))
    
    
    
if __name__=='__main__':
    run()

initializing gradient descent at b = 0, m = 0 error=5565.10783448
final values are: b = 1000, m = -9.45778571658e+173 error=-4.80825345907e+175


