# Multiple Linear Regression 
Multiple linear regression is a statistical method that uses multiple independent variables to predict a single dependent variable. <br>
The goal of multiple linear regression is to find a linear relationship between the dependent variable and the independent variables.

The model is:
$$
f_{\vec{w},b}(\vec{x}) = \vec{w} \cdot \vec{x} + b
$$
<em>The notation above is</em> **vector notation**: $\vec{w}$ and $\vec{x}$ are vectors with one entry per feature, and $b$ is a scalar.
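
Written out for $n$ features, the dot product in the model is a weighted sum of the features. As a worked expansion (the feature index $j$ is a convention assumed here, not taken from the text above):

$$
f_{\vec{w},b}(\vec{x}) = \sum_{j=1}^{n} w_j x_j + b = w_1 x_1 + w_2 x_2 + \dots + w_n x_n + b
$$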

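# Problem Statement
Predict the price of a house from four features (size, number of bedrooms, number of floors, and age), using the three training examples below.
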
| Size (sqft) | Number of bedrooms | Number of floors | Age of home | Price (1000s of dollars) |
| ----------- | ------------------ | ---------------- | ----------- | ------------------------ |
| 2104            | 5                   | 1                | 45           | 460           |  
| 1416            | 3                   | 2                | 40           | 232           |  
| 852             | 2                   | 1                | 35           | 178           |  


In [1]:
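# Load the training data
import numpy as np
import matplotlib as mpl
# One row per house; columns: size (sqft), bedrooms, floors, age
X_train = np.array([[2104, 5, 1, 45],
                    [1416, 3, 2, 40],
                    [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])  # price in 1000s of dollars
m = X_train.shape[0]  # number of training examples
n = X_train.shape[1]  # number of features
print(m,n)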

3 4


In [2]:
# Parameters chosen close to the optimum for this data set
# w: np.ndarray of shape (n,), one weight per feature
# b: scalar bias
b_init  =  785.1811367994083 
w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])
print(f"Value of b = {b_init} \nValue of w(vector) = \n{w_init}")
print(f"w_init shape: {w_init.shape}, b_init type: {type(b_init)}")

Value of b = 785.1811367994083 
Value of w(vector) = 
[  0.39133535  18.75376741 -53.36032453 -26.42131618]
w_init shape: (4,), b_init type: <class 'float'>


In [3]:
# Single prediction, vectorized
def predict(x, w, b): 
    """
    Predict the target for a single example using linear regression.
    Args:
      x (ndarray): Shape (n,) example with multiple features
      w (ndarray): Shape (n,) model parameters   
      b (scalar):             model parameter 
      
    Returns:
      p (scalar):  prediction
    """
    p = np.dot(x, w) + b     
    return p    
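
To make the dot product in `predict` concrete, the sketch below computes the same prediction one feature at a time and compares it with `np.dot`. The helper name `predict_loop` is hypothetical and only for illustration; the data and parameters are copied from the cells above.

```python
import numpy as np

def predict_loop(x, w, b):
    """Hypothetical helper: same result as np.dot(x, w) + b, one feature at a time."""
    p = 0.0
    for j in range(x.shape[0]):
        p += w[j] * x[j]   # contribution of feature j
    return p + b

# First training example and the parameters used in this notebook
x_vec = np.array([2104, 5, 1, 45])
w_init = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b_init = 785.1811367994083

p_loop = predict_loop(x_vec, w_init, b_init)
p_dot = np.dot(x_vec, w_init) + b_init
# Both values should be close to the target price of 460 for this example
print(f"loop: {p_loop:.4f}, dot: {p_dot:.4f}")
```

The two versions agree up to floating-point rounding; `np.dot` is generally preferable because it avoids the Python-level loop.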

In [4]:

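def compute_cost(X, y, w, b):
    """
    Compute the squared-error cost over all training examples.
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
    Returns:
      cost (scalar): sum of squared errors divided by 2m
    """
    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b        # prediction for example i
        cost += (f_wb_i - y[i]) ** 2        # accumulate squared error
    return cost / (2 * m)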
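
The loop in `compute_cost` evaluates $J(\vec{w},b) = \frac{1}{2m}\sum_{i=1}^{m}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)^2$. As a minimal sketch, the same quantity can be computed without an explicit loop. The name `compute_cost_vectorized` is an assumption for illustration and is not part of the original notebook.

```python
import numpy as np

def compute_cost_vectorized(X, y, w, b):
    """Hypothetical vectorized variant of compute_cost (illustration only)."""
    m = X.shape[0]
    errors = X @ w + b - y                 # shape (m,): prediction minus target
    return np.sum(errors ** 2) / (2 * m)

# Data and parameters copied from the cells above
X_train = np.array([[2104, 5, 1, 45],
                    [1416, 3, 2, 40],
                    [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
w_init = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b_init = 785.1811367994083

cost_near_opt = compute_cost_vectorized(X_train, y_train, w_init, b_init)
cost_zero = compute_cost_vectorized(X_train, y_train, np.zeros(4), 0.0)
print(f"cost near optimum: {cost_near_opt:.3e}, cost at zero parameters: {cost_zero:.1f}")
```

With all parameters at zero, every prediction is 0, so the cost reduces to the sum of squared targets divided by $2m$.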




In [5]:
cost = compute_cost(X_train, y_train, w_init, b_init)
print(f'Cost at optimal w : {cost}')

Cost at optimal w : 1.5578904428966628e-12


In [6]:
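def compute_gradient(X, y, w, b):
    """
    Compute the gradient of the cost with respect to w and b.
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
    Returns:
      dj_db (scalar)       : gradient of the cost w.r.t. b
      dj_dw (ndarray (n,)) : gradient of the cost w.r.t. w
    """
    m = X.shape[0]
    n = X.shape[1]
    dj_dw = np.zeros(n)
    dj_db = 0.
    for i in range(m):
        err = (np.dot(X[i], w) + b) - y[i]   # prediction error for example i
        for j in range(n):
            dj_dw[j] += err * X[i, j]
        dj_db += err
    dj_dw /= m
    dj_db /= m
    return dj_db, dj_dw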
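
The nested loops in `compute_gradient` accumulate the error times each feature and average over the $m$ examples. A possible vectorized sketch of the same computation follows; `compute_gradient_vectorized` is a hypothetical name used only here, and the zero starting point mirrors the later run from zero parameters.

```python
import numpy as np

def compute_gradient_vectorized(X, y, w, b):
    """Hypothetical vectorized variant of compute_gradient (illustration only)."""
    m = X.shape[0]
    err = X @ w + b - y        # shape (m,): prediction error per example
    dj_dw = X.T @ err / m      # shape (n,): average of err * feature
    dj_db = np.sum(err) / m    # scalar: average error
    return dj_db, dj_dw

# Training data copied from the cells above
X_train = np.array([[2104, 5, 1, 45],
                    [1416, 3, 2, 40],
                    [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

# Gradient at zero parameters, where every error equals minus the target
dj_db0, dj_dw0 = compute_gradient_vectorized(X_train, y_train, np.zeros(4), 0.0)
print(f"dj_db: {dj_db0}")
print(f"dj_dw: {dj_dw0}")
```

The size feature dominates the gradient because its values are far larger than those of the other features.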

In [7]:
#Compute and display gradient 
tmp_dj_db, tmp_dj_dw = compute_gradient(X_train, y_train, w_init, b_init)
print(f'dj_db at initial w,b: {tmp_dj_db}')
print(f'dj_dw at initial w,b: \n {tmp_dj_dw}')

dj_db at initial w,b: -1.6739251501955248e-06
dj_dw at initial w,b: 
 [-2.72623577e-03 -6.27197263e-06 -2.21745578e-06 -6.92403391e-05]


In [8]:
import copy
import math
def gradient_descent(X, y, w_in, b_in, alpha, num_iters):
    """
    Performs batch gradient descent to learn w and b. Updates w and b by taking 
    num_iters gradient steps with learning rate alpha.
    
    Args:
      X (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters  
      b_in (scalar)       : initial model parameter
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent
      
    Returns:
      w (ndarray (n,)) : Updated values of parameters 
      b (scalar)       : Updated value of parameter 
      J_history (list) : cost after each iteration (first 100000 only)
    """
    
    # List to store the cost J at each iteration, primarily for graphing later
    J_history = []
    w = copy.deepcopy(w_in)  # avoid modifying the caller's array
    b = b_in
    
    for i in range(num_iters):

        # Calculate the gradient at the current parameters
        dj_db, dj_dw = compute_gradient(X, y, w, b)

        # Update parameters using alpha and the gradient
        w = w - alpha * dj_dw
        b = b - alpha * dj_db

        # Save cost J at each iteration
        if i < 100000:      # prevent resource exhaustion
            J_history.append(compute_cost(X, y, w, b))

        # Print the cost 10 times over the run, or every iteration if num_iters < 10
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4}: Cost {J_history[-1]:0.2e} w: {w}, b:{b} ")

    return w, b, J_history  # return final w, b and J history for graphing
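
Each pass through the loop in `gradient_descent` applies one simultaneous update to all parameters. Written as equations, with $\alpha$ standing for `alpha` and the partial derivatives standing for `dj_dw` and `dj_db` as returned by `compute_gradient`:

$$
\begin{aligned}
w_j &:= w_j - \alpha \frac{\partial J(\vec{w},b)}{\partial w_j} \quad \text{for } j = 1, \dots, n \\
b &:= b - \alpha \frac{\partial J(\vec{w},b)}{\partial b}
\end{aligned}
$$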

        

    

In [9]:
iters = 100000
alpha = 5.0e-7
# print(f"Iteration   Cost          w0       w1       w2       w3       b       djdw0    djdw1    djdw2    djdw3    djdb  ")
# print(f"---------|------------|--------|--------|--------|--------|--------|--------|--------|--------|--------|--------|")

w_fin,b_fin,J_hist = gradient_descent(X_train,y_train,w_init,b_init,alpha,iters)
# print(w_fin,b_fin,J_hist)

Iteration    0: Cost 5.88e-14 w: [  0.39133535  18.75376741 -53.36032453 -26.42131618], b:785.181136799409 


Iteration 10000: Cost 1.40e-16 w: [  0.39133535  18.75376741 -53.36032453 -26.42131618], b:785.1811367994089 
Iteration 20000: Cost 5.78e-17 w: [  0.39133535  18.75376741 -53.36032453 -26.42131618], b:785.1811367994089 
Iteration 30000: Cost 2.40e-17 w: [  0.39133535  18.75376741 -53.36032453 -26.42131618], b:785.1811367994089 
Iteration 40000: Cost 1.00e-17 w: [  0.39133535  18.75376741 -53.36032453 -26.42131618], b:785.1811367994089 
Iteration 50000: Cost 4.27e-18 w: [  0.39133535  18.75376741 -53.36032453 -26.42131618], b:785.1811367994089 
Iteration 60000: Cost 1.87e-18 w: [  0.39133535  18.75376741 -53.36032453 -26.42131618], b:785.1811367994089 
Iteration 70000: Cost 8.76e-19 w: [  0.39133535  18.75376741 -53.36032453 -26.42131618], b:785.1811367994089 
Iteration 80000: Cost 5.34e-19 w: [  0.39133535  18.75376741 -53.36032453 -26.42131618], b:785.1811367994089 
Iteration 90000: Cost 3.19e-19 w: [  0.39133535  18.75376741 -53.36032453 -26.42131618], b:785.1811367994089 


In [10]:
# initialize parameters
initial_w = np.zeros_like(w_init)
initial_b = 0.
# some gradient descent settings
iterations = 100000
alpha = 5.0e-7
# run gradient descent 
w_final, b_final, J_hist = gradient_descent(X_train,y_train,initial_w,initial_b,alpha,iterations)
# print(f"b,w found by gradient descent: {b_final:0.2f},{w_final} ")
# m,_ = X_train.shape
# for i in range(m):
#     print(f"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}")

Iteration    0: Cost 2.53e+03 w: [2.41334667e-01 5.58666667e-04 1.83666667e-04 6.03500000e-03], b:0.000145 
Iteration 10000: Cost 6.25e+02 w: [ 0.21700016  0.03241102 -0.10756894 -0.58002358], b:-0.01907920362447858 
Iteration 20000: Cost 5.94e+02 w: [ 0.22647014  0.06282451 -0.20441317 -0.95360462], b:-0.03083741402406834 
Iteration 30000: Cost 5.81e+02 w: [ 0.23250291  0.09224609 -0.29425217 -1.19161742], b:-0.037825654441786094 
Iteration 40000: Cost 5.75e+02 w: [ 0.23633235  0.12102008 -0.3795677  -1.34272807], b:-0.04175683097206683 
Iteration 50000: Cost 5.71e+02 w: [ 0.23874942  0.14936719 -0.46195071 -1.43813321], b:-0.043728974303773836 
Iteration 60000: Cost 5.69e+02 w: [ 0.24026117  0.17742894 -0.54242108 -1.49783092], b:-0.0444459589375747 
Iteration 70000: Cost 5.67e+02 w: [ 0.24119261  0.20529605 -0.62163265 -1.5346407 ], b:-0.04435899176198979 
Iteration 80000: Cost 5.66e+02 w: [ 0.24175211  0.2330267  -0.70000456 -1.55678024], b:-0.04375731107539368 
Iteration 90000: Co

In [11]:
# Predict every training example with the parameters returned in In [9]
predict_sample = np.zeros(m)
for i in range(m):
    predict_sample[i] = predict(X_train[i], w_fin, b_fin)
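
As a rough sketch, the per-example loop above can also be written as a single matrix-vector product and compared against the targets. The block below assumes the near-optimal parameters printed earlier in the notebook and uses them in place of `w_fin` and `b_fin` so that it runs on its own; the names `w_near_opt` and `b_near_opt` are hypothetical.

```python
import numpy as np

# Training data and near-optimal parameters copied from the cells above
X_train = np.array([[2104, 5, 1, 45],
                    [1416, 3, 2, 40],
                    [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
w_near_opt = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b_near_opt = 785.1811367994083

# One prediction per row of X_train, computed in a single call
predictions = X_train @ w_near_opt + b_near_opt

for p, target in zip(predictions, y_train):
    print(f"prediction: {p:0.2f}, target value: {target}")
```

Each prediction should land within a small fraction of a unit of its target, which is consistent with the near-zero cost reported for these parameters.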
