# GRADIENT DESCENT FOR MULTIPLE LINEAR REGRESSION 

## 1.1 Goals
- Extend our regression model routines to support multiple features
    - Extend data structures to support multiple features
    - Rewrite prediction, cost and gradient routines to support multiple features
    - Utilize NumPy `np.dot` to vectorize their implementations for speed and simplicity

In [71]:
import copy , math 

import numpy as np 
np.set_printoptions(precision=2) # reduced display precision on numpy arrays 




<a name="toc_15456_1.3"></a>
## 1.3 Notation
Here is a summary of some of the notation you will encounter, updated for multiple features.  

| General Notation | Description | Python (if applicable) |
|:-----------------|:------------|:-----------------------|
| $a$ | scalar, non-bold | |
| $\mathbf{a}$ | vector, bold | |
| $\mathbf{A}$ | matrix, bold capital | |
| **Regression** | | |
| $\mathbf{X}$ | training example matrix | `X_train` |
| $\mathbf{y}$ | training example targets | `y_train` |
| $\mathbf{x}^{(i)}$, $y^{(i)}$ | $i^{th}$ training example | `X[i]`, `y[i]` |
| $m$ | number of training examples | `m` |
| $n$ | number of features in each example | `n` |
| $\mathbf{w}$ | parameter: weight | `w` |
| $b$ | parameter: bias | `b` |
| $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ | The result of the model evaluation at $\mathbf{x}^{(i)}$ parameterized by $\mathbf{w},b$: $f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)}+b$ | `f_wb` |


In [72]:
X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

# data is stored in numpy array/matrix 

print(f"X shape : {X_train.shape},X type :{type(X_train)}")
print(X_train)
print(f"y shape : {y_train.shape} y Type : {type(y_train)}" )

print(y_train)

X shape : (3, 4),X type :<class 'numpy.ndarray'>
[[2104    5    1   45]
 [1416    3    2   40]
 [ 852    2    1   35]]
y shape : (3,) y Type : <class 'numpy.ndarray'>
[460 232 178]


In [73]:
b_init = 785.1811367994083 
w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])

print(f"winit shape : {w_init.shape}, b_init type {type(b_init)}")

winit shape : (4,), b_init type <class 'float'>


In [74]:
# use a for loop to make a prediction with multiple linear regression


def predic_single_loop(x ,w , b): 
    """Predict a single example by looping over its n features."""
    n = x.shape[0]           # number of features in the example
    prediction_value = 0

    for i in range(n): 
        prediction_value += x[i]*w[i]      # accumulate w_i * x_i
    total_predicted_value = prediction_value + b
    return total_predicted_value






In [75]:
# get a row from our training data 

x_vec = X_train[0,:]

# make a prediction for that line 

f_wb = predic_single_loop(x_vec, w_init,b_init)
print(f"value of x vect : {x_vec}")
print(f"f_wb shape {f_wb.shape} , prediction : {f_wb}")

value of x vect : [2104    5    1   45]
f_wb shape () , prediction : 459.9999976194083


In [76]:
# Single prediction with NumPy (vectorized)


def predict(x , w ,b): 
    """Make a prediction for one example using np.dot."""
    p = np.dot(x , w) + b     # (n,)·(n,) + scalar -> scalar
    return p 

In [77]:
# get a row from our training data 
b_init = 785.1811367994083 
w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])

x_train_first_row = X_train[0,:]
print(f"x_train_first_row shape { x_train_first_row.shape}, x_vec value : {x_train_first_row}")


# make a prediction 

f_wb = predict(x_train_first_row, w_init, b_init)
print(f"prediction  : {f_wb}")

x_train_first_row shape (4,), x_vec value : [2104    5    1   45]
prediction  : 459.9999976194083
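
The same dot-product idea extends from one row to the whole training matrix. The sketch below is an illustration rather than part of the original lab: `predict_all` is a hypothetical helper name, and it assumes `X` has shape `(m, n)` and `w` has shape `(n,)`.

```python
import numpy as np

# Same training data and parameters as in the cells above.
X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
w_init = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b_init = 785.1811367994083

def predict_all(X, w, b):
    """Hypothetical helper: predictions for all m rows of X at once."""
    return X @ w + b          # (m,n) @ (n,) -> (m,), b is broadcast

f_wb_all = predict_all(X_train, w_init, b_init)
print(f_wb_all.shape)         # (3,)
print(f_wb_all)               # one prediction per training example
```

Each entry of `f_wb_all` equals what `np.dot(X_train[i], w_init) + b_init` would return for the corresponding row.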


<a name="toc_15456_4"></a>
# 4 Compute Cost With Multiple Variables
The equation for the cost function with multiple variables $J(\mathbf{w},b)$ is:
$$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 \tag{3}$$ 
where:
$$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b  \tag{4} $$ 


In contrast to previous labs, $\mathbf{w}$ and $\mathbf{x}^{(i)}$ are vectors rather than scalars, which is what lets the model support multiple features.
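
Because $\mathbf{w}$ and $\mathbf{x}^{(i)}$ are vectors, equation (3) can also be evaluated without an explicit loop. The following is a minimal sketch under that reading; `compute_cost_vectorized` is a hypothetical name, not a function from the lab.

```python
import numpy as np

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

def compute_cost_vectorized(X, y, w, b):
    """Hypothetical vectorized form of equation (3)."""
    m = X.shape[0]
    err = X @ w + b - y                 # (m,) residuals f_wb(x_i) - y_i
    return np.dot(err, err) / (2 * m)   # sum of squared residuals / 2m

# With w = 0 and b = 0 every prediction is 0, so the cost is sum(y**2) / (2m).
cost_zero = compute_cost_vectorized(X_train, y_train, np.zeros(4), 0.0)
print(cost_zero)   # 49518.0
```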

In [78]:
def compute_cost(x,  y , w ,b ): 

    m = x.shape[0]
    cost = 0.0

    for i in range(m):
        f_wb_i = np.dot(x[i], w) +b 
        cost += (f_wb_i - y[i])**2
    cost /= (2*m)

    return cost 






In [79]:
# compute and display cost using our pre-chosen optimal parameters 

cost = compute_cost(X_train, y_train, w_init, b_init)

print(f'Cost at optimal w : {cost}')

Cost at optimal w : 1.5578904428966628e-12


<a name="toc_15456_5.1"></a>
## 5.1 Compute Gradient with Multiple Variables
An implementation for calculating the equations (6) and (7) is below. There are many ways to implement this. In this version, there is an
- outer loop over all m examples. 
    - $\frac{\partial J(\mathbf{w},b)}{\partial b}$ for the example can be computed directly and accumulated
    - in a second loop over all n features:
        - $\frac{\partial J(\mathbf{w},b)}{\partial w_j}$ is computed for each $w_j$.
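
The loops described above accumulate the partial derivatives of the cost in equation (3). Differentiating (3) with the model (4) gives the expressions below. The tags (6) and (7) are assumed to match the numbering referenced in the text, since those equations are not reproduced in this notebook.

$$
\begin{align}
\frac{\partial J(\mathbf{w},b)}{\partial w_j} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \tag{6} \\
\frac{\partial J(\mathbf{w},b)}{\partial b} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{7}
\end{align}
$$

Here $x_{j}^{(i)}$ is feature $j$ of example $i$, which corresponds to `x[i, j]` in the code.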
   

In [80]:
def compute_gradient(x , y ,w ,b ): 
    m, n = x.shape
    dj_dw = np.zeros((n,))
    dj_db = 0.
    for i in range(m):

        err = (np.dot(x[i], w) +b ) - y[i]
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err*x[i , j]
        dj_db = dj_db + err 
    dj_dw = dj_dw / m 
    dj_db = dj_db / m 

    return  dj_db , dj_dw
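
The two nested loops can be collapsed into matrix operations. The sketch below is one possible vectorized form, not the lab's implementation: `compute_gradient_vectorized` and the tiny demo dataset are assumptions made for illustration, chosen so the result can be checked by hand.

```python
import numpy as np

def compute_gradient_vectorized(X, y, w, b):
    """Hypothetical vectorized counterpart of the loop-based gradient."""
    m = X.shape[0]
    err = X @ w + b - y          # (m,) prediction errors
    dj_dw = X.T @ err / m        # (n,) gradient w.r.t. each w_j
    dj_db = err.sum() / m        # scalar gradient w.r.t. b
    return dj_db, dj_dw

# Illustrative data: one feature, three examples.
X_demo = np.array([[1.0], [2.0], [3.0]])
y_demo = np.array([2.0, 4.0, 6.0])
w_demo = np.array([0.5])
b_demo = 1.0

dj_db, dj_dw = compute_gradient_vectorized(X_demo, y_demo, w_demo, b_demo)
print(dj_db)   # -2.0
print(dj_dw)   # [-5.]
```

The errors are -0.5, -2.0 and -3.5, so their mean is -2.0 and the mean of error times feature is -15 / 3 = -5.0.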





In [81]:
# implementing single-variable linear regression to understand how it differs from the multi-variable version

# No. 1: linear regression with a single variable

import numpy as np 


x = np.array([1,2,3])
y = np.array([2,4,6])

w = 0.5 
b = 1.0

# Gradient computation function 

def compute_gradient_single_variable(x , y ,w , b):
    m = len(x)
    dj_dw =0.
    dj_db = 0. 

    for i in range(m):
        err = ((w*x[i] +b)- y[i])
        dj_dw  += err * x[i]
        dj_db += err

    # Average the gradients 

    dj_dw = dj_dw / m 
    dj_db = dj_db / m 

    return dj_dw , dj_db 


# Usage example 


dj_dw , dj_db = compute_gradient_single_variable(x , y , w , b)


print("Gradient for the weight : ",dj_dw)
print("Gradient for the bias : ", dj_db)

Gradient for the weight :  -5.0
Gradient for the bias :  -2.0


In [88]:
def compute_gradient_mutiple_variable(x , y ,w ,b ): 
    """Gradient for multiple features. Returns (dj_dw, dj_db), in that order."""
    m,n = x.shape 

    dj_dw = np.zeros((n,))
    dj_db = 0.

    for i in range(m):
        err =( np.dot(x[i], w) + b ) - y[i]

        for j in range(n): 
            dj_dw[j] += err * x[i, j]
        # accumulate the bias gradient 
        dj_db += err 

    dj_dw = dj_dw / m 
    dj_db = dj_db / m 

    return dj_dw , dj_db  


     









In [89]:
# compute and display gradient (the function returns dj_dw first, then dj_db)


tmp_dj_dw, tmp_dj_db = compute_gradient_mutiple_variable(X_train, y_train, w_init, b_init)
print(f'dj_db at initial w,b: {tmp_dj_db}')
print(f'dj_dw at initial w,b: \n {tmp_dj_dw}')

dj_db at initial w,b: -1.6739251501955248e-06
dj_dw at initial w,b: 
 [-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05]


## 5.2 Gradient Descent With Multiple Variables

In [95]:
def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters): 
    """
    Performs batch gradient descent to learn w and b. Updates w and b by taking 
    num_iters gradient steps with learning rate alpha
    
    Args:
      X (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters  
      b_in (scalar)       : initial model parameter
      cost_function       : function to compute cost
      gradient_function   : function to compute the gradient
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent
      
    Returns:
      w (ndarray (n,)) : Updated values of parameters 
      b (scalar)       : Updated value of parameter 
      """
    
    # A list to store the cost J at each iteration, primarily for graphing later
    J_history = []
    w = copy.deepcopy(w_in)  # avoid modifying the caller's w within the function
    b = b_in
    
    for i in range(num_iters):

        # Calculate the gradient; gradient_function must return (dj_db, dj_dw)
        dj_db,dj_dw = gradient_function(X, y, w, b)

        # Update parameters using w, b, alpha and the gradient
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
      
        # Save cost J at each iteration
        if i<100000:      # prevent resource exhaustion 
            J_history.append( cost_function(X, y, w, b))

        # Print the cost 10 times over the run, or at every iteration if num_iters < 10
        if i% math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f}   ")
        
    return w, b, J_history #return final w,b and J history for graphing

In [96]:
# initialize parameters
initial_w = np.zeros_like(w_init)
initial_b = 0.
# some gradient descent settings
iterations = 1000
alpha = 5.0e-7

w_final, b_final, J_hist  = gradient_descent(X_train, y_train,initial_w, initial_b,compute_cost, compute_gradient_mutiple_variable, alpha, iterations)

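TypeError: unsupported format string passed to numpy.ndarray.__format__

The call fails because `compute_gradient_mutiple_variable` returns `(dj_dw, dj_db)`, while `gradient_descent` unpacks the result as `dj_db, dj_dw`. The swap turns `b`, and therefore the cost, into an array, which the `:8.2f` format in the progress message cannot handle. The cells below redefine `compute_cost` and `compute_gradient`, with the gradient returned as `(dj_db, dj_dw)`, and rerun the descent.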
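
The message above is what NumPy raises when a format spec such as `:8.2f` is applied to an array with one or more dimensions. A minimal reproduction, using made-up values rather than anything computed in this notebook:

```python
import numpy as np

scalar_cost = np.float64(2529.46)          # a scalar formats without trouble
array_cost = np.array([2529.46, 2529.46])  # a 1-D array does not

print(f"{scalar_cost:8.2f}")

message = None
try:
    print(f"{array_cost:8.2f}")
except TypeError as exc:
    message = str(exc)
    print(f"TypeError: {message}")
```

Seeing this error in a training loop is therefore a hint that a value expected to be a scalar, such as the cost or `b`, has become an array.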

In [97]:
def compute_cost(X, y, w, b): 
    """
    compute cost
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
      
    Returns:
      cost (scalar): cost
    """
    m = X.shape[0]
    cost = 0.0
    for i in range(m):                                
        f_wb_i = np.dot(X[i], w) + b           #(n,)(n,) = scalar (see np.dot)
        cost = cost + (f_wb_i - y[i])**2       #scalar
    cost = cost / (2 * m)                      #scalar    
    return cost

In [98]:
def compute_gradient(X, y, w, b): 
    """
    Computes the gradient for linear regression 
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
      
    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. 
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b. 
    """
    m,n = X.shape           #(number of examples, number of features)
    dj_dw = np.zeros((n,))
    dj_db = 0.

    for i in range(m):                             
        err = (np.dot(X[i], w) + b) - y[i]   
        for j in range(n):                         
            dj_dw[j] = dj_dw[j] + err * X[i, j]    
        dj_db = dj_db + err                        
    dj_dw = dj_dw / m                                
    dj_db = dj_db / m                                
        
    return dj_db, dj_dw

In [100]:
# initialize parameters
initial_w = np.zeros_like(w_init)
initial_b = 0.
# some gradient descent settings
iterations = 1000
alpha = 5.0e-7
# run gradient descent 
w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b,
                                                    compute_cost, compute_gradient, 
                                                    alpha, iterations)
print(f"b,w found by gradient descent: {b_final:0.2f},{w_final} ")
m,_ = X_train.shape
for i in range(m):
    print(f"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}")

Iteration    0: Cost  2529.46   
Iteration  100: Cost   695.99   
Iteration  200: Cost   694.92   
Iteration  300: Cost   693.86   
Iteration  400: Cost   692.81   
Iteration  500: Cost   691.77   
Iteration  600: Cost   690.73   
Iteration  700: Cost   689.71   
Iteration  800: Cost   688.70   
Iteration  900: Cost   687.69   
b,w found by gradient descent: -0.00,[ 0.2   0.   -0.01 -0.07] 
prediction: 426.19, target value: 460
prediction: 286.17, target value: 232
prediction: 171.47, target value: 178


In [102]:
def compute_cost( x , y , w ,b): 

    m = x.shape[0]
    squared_error_sum = 0.0               # running sum of squared errors
    for i in range(m) : 
        f_wb_i = (np.dot(x[i], w) +b)     # prediction for example i

        squared_error_sum += (f_wb_i - y[i])**2
    
    total_cost = squared_error_sum / (2*m)

    return total_cost 



In [103]:
def compute_gradient(x , y , w, b):
    m, n = x.shape 

    dj_dw = np.zeros(n)
    dj_db = 0.0

    for i in range(m):
        err =( np.dot(x[i], w) +b ) - y[i]

        for j in range(n):
            dj_dw[j] += err * x[i, j]
        
        dj_db += err

    dj_db = dj_db / m 
    dj_dw = dj_dw / m 

    return dj_dw , dj_db



         
               
               
            
        

In [104]:
# Compute and display gradient (compute_gradient above returns dj_dw first, then dj_db)
tmp_dj_dw, tmp_dj_db = compute_gradient(X_train, y_train, w_init, b_init)
print(f'dj_db at initial w,b: {tmp_dj_db}')
print(f'dj_dw at initial w,b: \n {tmp_dj_dw}')

dj_db at initial w,b: -1.6739251501955248e-06
dj_dw at initial w,b: 
 [-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05]


In [None]:
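def gradient_descent(x , y , w_in ,b_in , cost_function , gradient_function , alpha , num_iters ) : 
    """
    Performs batch gradient descent to learn w and b. Updates w and b by taking 
    num_iters gradient steps with learning rate alpha
    
    Args:
      x (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters  
      b_in (scalar)       : initial model parameter
      cost_function       : function to compute cost
      gradient_function   : function to compute the gradient; must return
                            (dj_dw, dj_db), like compute_gradient above
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent
      
    Returns:
      w (ndarray (n,)) : Updated values of parameters 
      b (scalar)       : Updated value of parameter 
      J_history (list) : cost after each iteration
    """
    J_history = []
    w = copy.deepcopy(w_in)  # avoid modifying the caller's array
    b = b_in

    for i in range(num_iters):
        dj_dw, dj_db = gradient_function(x, y, w, b)
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        J_history.append(cost_function(x, y, w, b))

    return w, b, J_history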
    

    

In [113]:
# understanding deepcopy 

# deepcopy returns an independent copy, so the original values are preserved 


# example without deepcopy 
a = [1, 8 , 89]
b = a 

b[0]= 789

print(f"without deepcopy, b : {b} \n and a was modified as well : {a}")

print("With deepcopy ")

tab_1 = [1, 78, 96]

tab_3 = copy.deepcopy(tab_1)

tab_3[0] = 2356

print(f"tab_1 values remain unchanged : {tab_1} \n tab_3 value got changed : {tab_3}")

without deepcopy, b : [789, 8, 89] 
 and a was modified as well : [789, 8, 89]
With deepcopy 
tab_1 values remain unchanged : [1, 78, 96] 
 tab_3 value got changed : [2356, 78, 96]
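
The flat lists above do not show where a deep copy differs from a shallow one. The sketch below is an illustrative addition with made-up variable names: it contrasts `copy.copy` and `copy.deepcopy` on a nested list, then shows why copying `w_in` protects the caller's array when the copy is updated in place.

```python
import copy
import numpy as np

nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)      # new outer list, inner lists are shared
deep = copy.deepcopy(nested)     # inner lists are copied as well

nested[0][0] = 99
print(shallow[0][0])   # 99  (shares the inner list with nested)
print(deep[0][0])      # 1   (fully independent)

w_in = np.zeros(4)
w = copy.deepcopy(w_in)          # independent array
w += 1.0                         # in-place update touches only the copy
print(w_in)            # [0. 0. 0. 0.]
```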
