Goal: Automate the optimization of the parameters w and b using gradient descent.

In [11]:
import math
import numpy as np
import matplotlib.pyplot as plt


In [12]:
# Load our data set
x_train = np.array([1.0, 2.0])      # features (house size in 1000 sqft)
y_train = np.array([300.0, 500.0])  # target values (price in thousands of dollars)

In [13]:
# Function to calculate the cost value
def compute_cost(x, y, w, b):
    m = x.shape[0]
    cost = 0

    for i in range(m):
        f_wb = w*x[i] + b
        cost = cost + (f_wb - y[i])**2
    total_cost = 1/(2*m) * cost

    return total_cost
    

We'll calculate the partial derivatives of the cost function with respect to w and b:

In [19]:
def compute_gradient(x, y, w, b):

    m = x.shape[0]
    dj_dw = 0  # partial derivative of J with respect to w
    dj_db = 0  # partial derivative of J with respect to b

    for i in range(m):
        f_wb = w*x[i] + b
        dj_dw_i = (f_wb - y[i]) * x[i]
        dj_db_i = (f_wb - y[i])
        dj_dw += dj_dw_i
        dj_db += dj_db_i
    dj_dw = dj_dw/m
    dj_db = dj_db/m

    return dj_dw, dj_db
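
As a sanity check on the derivatives, the analytic gradient can be compared with a finite-difference approximation of the cost. The sketch below is not part of the original notebook: the helper names `cost`, `analytic_gradient`, and `numeric_gradient` are hypothetical, and the vectorized expressions are assumed to be equivalent to the loops above.

```python
import numpy as np

def cost(x, y, w, b):
    """Squared-error cost J(w, b) = 1/(2m) * sum((w*x + b - y)^2)."""
    err = w * x + b - y
    return np.sum(err ** 2) / (2 * x.shape[0])

def analytic_gradient(x, y, w, b):
    """Vectorized form of the partial derivatives dJ/dw and dJ/db."""
    err = w * x + b - y
    return np.mean(err * x), np.mean(err)

def numeric_gradient(x, y, w, b, eps=1e-6):
    """Central-difference approximation of dJ/dw and dJ/db."""
    dj_dw = (cost(x, y, w + eps, b) - cost(x, y, w - eps, b)) / (2 * eps)
    dj_db = (cost(x, y, w, b + eps) - cost(x, y, w, b - eps)) / (2 * eps)
    return dj_dw, dj_db

x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

# Evaluate both gradients at the starting point w = 0, b = 0
analytic = analytic_gradient(x_train, y_train, 0.0, 0.0)
numeric = numeric_gradient(x_train, y_train, 0.0, 0.0)
print("analytic:", analytic)
print("numeric: ", numeric)
```

If the two results agree to several decimal places, the derivative formulas are very likely implemented correctly.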

Now that the gradients can be computed, we can implement gradient descent. The function takes the following arguments and returns the following values:
    
    Args:
      x (ndarray (m,))  : Data, m examples 
      y (ndarray (m,))  : target values
      w_in,b_in (scalar): initial values of model parameters  
      alpha (float):     Learning rate
      num_iters (int):   number of iterations to run gradient descent
      cost_function:     function to call to produce cost
      gradient_function: function to call to produce gradient
      
    Returns:
      w (scalar): Updated value of parameter w after running gradient descent
      b (scalar): Updated value of parameter b after running gradient descent
      J_history (list): History of cost values
      p_history (list): History of parameters [w,b] 

In [26]:
def gradient_descent(x, y, w_in, b_in, alpha, num_iters, cost_function, gradient_function):

    J_history = []
    p_history = []
    b = b_in
    w = w_in

    for i in range(num_iters):
        # Calculate the gradient at the current parameters using gradient_function
        dj_dw, dj_db = gradient_function(x, y, w, b)

        # Update both parameters with the gradient descent update rule
        b = b - alpha*dj_db
        w = w - alpha*dj_dw

        # Save cost and parameters at each iteration
        if i < 100000:  # prevent resource exhaustion
            J_history.append(cost_function(x, y, w, b))
            p_history.append([w, b])

        # Print the cost ten times over the run
        if i % math.ceil(num_iters/10) == 0:
            print(f"Iteration {i:4}: Cost {J_history[-1]: 0.2e}",
                  f"dj_dw: {dj_dw: 0.3e}, {dj_db:0.3e}",
                  f" w:{w: 0.3e}, b:{b:0.5e}")

    return w, b, J_history, p_history
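
To make the update rule concrete, here is a minimal sketch of a single gradient descent step taken by hand from w = 0, b = 0 with a learning rate of 1.0e-2. It is an illustration rather than part of the original notebook; the variable names `w_new` and `b_new` are hypothetical, and the gradient is written in vectorized form.

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([300.0, 500.0])
w, b, alpha = 0.0, 0.0, 1.0e-2

# Prediction error for every training example at the current parameters
err = w * x + b - y

# Partial derivatives of the cost with respect to w and b
dj_dw = np.mean(err * x)
dj_db = np.mean(err)

# Both gradients are computed before either parameter changes,
# so the two parameters are updated simultaneously
w_new = w - alpha * dj_dw
b_new = b - alpha * dj_db
print(f"w: {w_new:0.3e}, b: {b_new:0.5e}")
```

The gradients are -650 and -400, so the step moves the parameters to roughly w = 6.5 and b = 4.0.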


In [27]:
# initialize parameters
w_init = 0
b_init = 0
# some gradient descent settings
iterations = 10000
tmp_alpha = 1.0e-2
# run gradient descent
w_final, b_final, J_hist, p_hist = gradient_descent(x_train, y_train, w_init, b_init, tmp_alpha,
                                                    iterations, compute_cost, compute_gradient)
print(f"(w,b) found by gradient descent : ({w_final:8.4f}, {b_final:8.4f})")

Iteration    0: Cost  7.93e+04 dj_dw: -6.500e+02, -4.000e+02  w: 6.500e+00, b:4.00000e+00
Iteration 1000: Cost  3.41e+00 dj_dw: -3.712e-01, 6.007e-01  w: 1.949e+02, b:1.08228e+02
Iteration 2000: Cost  7.93e-01 dj_dw: -1.789e-01, 2.895e-01  w: 1.975e+02, b:1.03966e+02
Iteration 3000: Cost  1.84e-01 dj_dw: -8.625e-02, 1.396e-01  w: 1.988e+02, b:1.01912e+02
Iteration 4000: Cost  4.28e-02 dj_dw: -4.158e-02, 6.727e-02  w: 1.994e+02, b:1.00922e+02
Iteration 5000: Cost  9.95e-03 dj_dw: -2.004e-02, 3.243e-02  w: 1.997e+02, b:1.00444e+02
Iteration 6000: Cost  2.31e-03 dj_dw: -9.660e-03, 1.563e-02  w: 1.999e+02, b:1.00214e+02
Iteration 7000: Cost  5.37e-04 dj_dw: -4.657e-03, 7.535e-03  w: 1.999e+02, b:1.00103e+02
Iteration 8000: Cost  1.25e-04 dj_dw: -2.245e-03, 3.632e-03  w: 2.000e+02, b:1.00050e+02
Iteration 9000: Cost  2.90e-05 dj_dw: -1.082e-03, 1.751e-03  w: 2.000e+02, b:1.00024e+02
(w,b) found by gradient descent : (199.9929, 100.0116)
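
One way to judge this result is to compare it with the exact least-squares line through the two training points. The sketch below uses `np.polyfit` for that purpose; this is a swapped-in closed-form technique used only as a cross-check, not the method of the notebook, and the names `w_exact` and `b_exact` are hypothetical.

```python
import numpy as np

x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

# Fit a degree-1 polynomial; polyfit returns [slope, intercept]
w_exact, b_exact = np.polyfit(x_train, y_train, 1)
print(f"least-squares fit: w = {w_exact:8.4f}, b = {b_exact:8.4f}")

# Values reported by gradient descent after 10000 iterations (copied from the output above)
w_gd, b_gd = 199.9929, 100.0116
print(f"difference: dw = {w_exact - w_gd:0.4f}, db = {b_exact - b_gd:0.4f}")
```

With only two points the line passes through both exactly, so the fit is w = 200 and b = 100; gradient descent is within about 0.01 of each value.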


Now that gradient descent has found values of w and b close to the optimum, we can use the model to predict housing prices from the learned parameters.

In [29]:
print(f"1000 sqft house prediction {w_final*1.0 + b_final:0.1f} Thousand Dollars") # f(x)= w*x +b 
print(f"1700 sqft house prediction {w_final*1.7 + b_final:0.1f} Thousand Dollars")
print(f"2000 sqft house prediction {w_final*2.0 + b_final:0.1f} Thousand Dollars")

1000 sqft house prediction 300.0 Thousand Dollars
1700 sqft house prediction 440.0 Thousand Dollars
2000 sqft house prediction 500.0 Thousand Dollars
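
The same predictions can be produced for several house sizes at once. The following is a minimal sketch, assuming a hypothetical helper named `predict` and the parameter values copied from the gradient descent output above; it is not part of the original notebook.

```python
import numpy as np

def predict(x, w, b):
    """Linear model f(x) = w*x + b; x is in units of 1000 sqft."""
    return w * np.asarray(x) + b

# Parameter values reported by gradient descent
w_final, b_final = 199.9929, 100.0116

sizes = np.array([1.0, 1.7, 2.0])  # 1000, 1700 and 2000 sqft
prices = predict(sizes, w_final, b_final)

for size, price in zip(sizes, prices):
    print(f"{size * 1000:.0f} sqft house prediction {price:0.1f} Thousand Dollars")
```

Because `predict` accepts a NumPy array, no explicit loop is needed to evaluate the model on many inputs.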


In this section, we delved into the details of gradient descent for a single variable!