# Multiple Linear Regression

- Here the linear regression model computes the prediction `y_hat` from all `n` features of an example, rather than from a single feature

- The `m` training examples are stored in a 2D array `X` of shape `(m, n)`: `X[i]` is the `i`th training example (row) and `X[i, j]` is its `j`th feature (column)

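- Model: $f_{\mathbf{w},b}(\mathbf{x}) = w_0 x_0 + w_1 x_1 + \dots + w_{n-1} x_{n-1} + b = \mathbf{w} \cdot \mathbf{x} + b$, computed in code as `np.dot(w, x) + b`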

- The parameters `w` and `b` are learned by fitting the model to all `m` training examples
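A minimal sketch of the indexing convention and the model above, using a small made-up matrix (the values in `X_demo`, `w_demo` and `b_demo` are hypothetical, chosen only for illustration). Row `X_demo[i]` is one training example, `X_demo[i, j]` is one of its features, and the explicit sum over the features gives the same value as the dot product.

```python
import numpy as np

# Hypothetical data: m = 2 examples (rows), n = 3 features (columns)
X_demo = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])
w_demo = np.array([0.5, -1.0, 2.0])  # one weight per feature
b_demo = 10.0                        # bias

m, n = X_demo.shape   # number of examples, number of features
x_1 = X_demo[1]       # second training example (row 1)
x_1_2 = X_demo[1, 2]  # third feature of the second example

# f(x) = w_0*x_0 + ... + w_{n-1}*x_{n-1} + b, written as an explicit sum
f_sum = sum(w_demo[j] * x_1[j] for j in range(n)) + b_demo
# The same model written as a dot product
f_dot = np.dot(w_demo, x_1) + b_demo
print(m, n, x_1_2, f_sum, f_dot)
```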

In [1]:
# Importing the required libraries
import copy
import math
import numpy as np
import matplotlib.pyplot as plt

In [2]:
# Training data: m = 3 examples (rows), n = 4 features (columns)
X_train = np.array([
    [2104, 5, 1, 45], 
    [1416, 3, 2, 40], 
    [852, 2, 1, 35]
])
y_train = np.array([460, 232, 178])

In [3]:
# Viewing the training data
print(f"X_train.shape: {X_train.shape}, type: {type(X_train)}")
print(f"y_train.shape: {y_train.shape}, type: {type(y_train)}")

X_train.shape: (3, 4), type: <class 'numpy.ndarray'>
y_train.shape: (3,), type: <class 'numpy.ndarray'>


In [4]:
# Pre-chosen parameter values (close to optimal) used to test the functions below
w_in = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b_in = 785.1811367994083

## Predicting a Value Without Vectorisation

In [5]:
# Predicting a single value without vectorisation
def predict_single_loop(x, w, b):
    
    """
    Predicts the target for a single example with an explicit loop over the features (no vectorisation)
    Args:
        x - Features of a single example, shape (n,)
        w, b - Parameters of the model
        
    Returns:
        y_hat - Scalar prediction
    """
    
    # No of features
    n = x.shape[0]
    y_hat = 0
    
    # Accumulating w[i] * x[i] for every feature
    for i in range(n):
        p_i = w[i] * x[i]
        y_hat += p_i
    
    # Adding the bias
    y_hat += b
    return y_hat

In [6]:
# Predicting a single value using the model
vec_x = X_train[0, :]
y_hat = predict_single_loop(vec_x, w_in, b_in)

print("Single Row of Training Set: ", vec_x)
print(f"The prediction: {y_hat} \tActual Value: {y_train[0]}")

Single Row of Training Set:  [2104    5    1   45]
The prediction: 459.9999976194083 	Actual Value: 460


## Predicting a value with vectorisation

In [7]:
def predict(x, w, b):
    
    """
    Computes the prediction of the model using vectorisation
    Args:
        x - Features of a single example, shape (n,)
        w, b - Parameters of the model
        
    Returns:
        y_hat - Scalar prediction for the single example
    """
    
    # np.dot() multiplies each feature by its weight and sums the products
    y_hat = np.dot(x, w) + b
    
    return y_hat

In [8]:
# Predicting a single value using the vectorised model
y_hat = predict(vec_x, w_in, b_in)

print("Single Row of Training Set: ", vec_x)
print(f"The prediction: {y_hat} \tActual Value: {y_train[0]}")

Single Row of Training Set:  [2104    5    1   45]
The prediction: 459.9999976194083 	Actual Value: 460
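`predict()` handles one example at a time. Because `X_train` is an `(m, n)` matrix, the same dot product can be applied to every row at once with the matrix-vector product `X @ w`, which returns one prediction per example. A minimal sketch, assuming the training data and pre-chosen parameters from above (the name `predict_all` is hypothetical and not part of this notebook), checked against a row-by-row loop:

```python
import numpy as np

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
w_in = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b_in = 785.1811367994083

def predict_all(X, w, b):
    """Hypothetical helper: predictions for all m rows of X, shape (m,)."""
    return X @ w + b

# All m predictions in one matrix-vector product
y_hat_all = predict_all(X_train, w_in, b_in)
# The same predictions computed one row at a time, as predict() does
y_hat_rows = np.array([np.dot(x, w_in) + b_in for x in X_train])
print(y_hat_all)
```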


## Computing the cost of the model for multiple variables

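- The functions below vectorise the prediction for each example with `np.dot()`, but still loop over the training examples (and, in the gradient, over the features)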

In [9]:
def compute_cost(X, y, w, b):
    
    """
    Computes the squared-error cost of the multiple-variable linear regression model
    Args:
        X - Features
        y - Target
        w, b - Parameters of the model
        
    Returns:
        cost - Scalar cost of the model after predicting m training examples
    """
    
    # No of training examples
    m = X.shape[0]
    
    # Cost
    cost = 0
    
    # Looping to find the cost
    for i in range(m):
        
        # Prediction
        y_hat = np.dot(X[i], w) + b

        # Squared error
        err = (y_hat - y[i]) ** 2
        
        # Accumulating the cost
        cost += err
        
    # Scaling by 1 / (2m)
    cost = (1 / (2 * m)) * cost
    return cost
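The loop above implements the squared-error cost $J(\mathbf{w},b) = \frac{1}{2m}\sum_{i=0}^{m-1}\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)^2$. A minimal sketch of the same cost with no Python loop (the name `compute_cost_vectorised` is hypothetical): all residuals are computed at once with `X @ w + b - y`, then squared, summed and scaled.

```python
import numpy as np

def compute_cost_vectorised(X, y, w, b):
    """Hypothetical helper: J(w, b) = (1 / (2m)) * sum((X @ w + b - y) ** 2), with no Python loop."""
    m = X.shape[0]
    residuals = X @ w + b - y  # shape (m,): prediction minus target for each example
    return np.sum(residuals ** 2) / (2 * m)

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
w_in = np.array([0.39133535, 18.75376741, -53.36032453, -26.42131618])
b_in = 785.1811367994083

# Near zero for the pre-chosen parameters, as with compute_cost()
cost_in = compute_cost_vectorised(X_train, y_train, w_in, b_in)
# With all-zero parameters every prediction is 0, so J = sum(y ** 2) / (2m)
cost_zero = compute_cost_vectorised(X_train, y_train, np.zeros(4), 0.0)
print(cost_in, cost_zero)
```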

## Computing the cost of the pre-chosen parameters

In [28]:
cost = compute_cost(X_train, y_train, w_in, b_in)
print("The cost of the model is: ", cost)

The cost of the model is:  1.5578904045996674e-12


## Computing the Gradient

In [43]:
def compute_gradient(X, y, w, b):
    
    """
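    Computes the gradient of the cost with respect to all n weights and the bias
    Args:
        X - Features, shape (m, n)
        y - Targets, shape (m,)
        w, b - Parameters of the model
        
    Returns:
        dj_dw - Gradient of the cost with respect to w, shape (n,)
        dj_db - Gradient of the cost with respect to b (scalar)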
    """
    
    # No of training examples and features
    m, n = X.shape
    
    # Initial value of the gradients
    dj_dw = np.zeros((n, ))
    dj_db = 0
    
    for i in range(m):
        
        # Prediction error for example i
        error = (np.dot(w, X[i]) + b) - y[i]
        
        # Accumulating the gradient for the bias b
        dj_db += error
        
        # Accumulating the gradient for each weight w[j]
        for j in range(n):
            dj_dw[j] += error * X[i, j]
        
    # Averaging over the m examples
    dj_dw /= m
    dj_db /= m
    
    return dj_dw, dj_db
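The loops above accumulate $\frac{\partial J}{\partial w_j} = \frac{1}{m}\sum_{i=0}^{m-1}\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)x_j^{(i)}$ and $\frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=0}^{m-1}\left(f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}\right)$. With the error vector $\mathbf{e} = X\mathbf{w} + b - \mathbf{y}$, both sums become matrix operations: the weight gradient is $X^\top \mathbf{e} / m$ and the bias gradient is the mean of $\mathbf{e}$. A minimal sketch (the names `compute_gradient_vectorised` and `compute_gradient_loop` are hypothetical; the loop version mirrors `compute_gradient` for comparison):

```python
import numpy as np

def compute_gradient_vectorised(X, y, w, b):
    """Hypothetical helper: (dj_dw, dj_db) for the squared-error cost without Python loops."""
    m = X.shape[0]
    err = X @ w + b - y      # shape (m,): error for each example
    dj_dw = X.T @ err / m    # shape (n,): sum over i of err[i] * X[i, j], averaged
    dj_db = np.sum(err) / m  # scalar: mean error
    return dj_dw, dj_db

def compute_gradient_loop(X, y, w, b):
    """Reference version with explicit loops, same logic as compute_gradient()."""
    m, n = X.shape
    dj_dw, dj_db = np.zeros(n), 0.0
    for i in range(m):
        error = np.dot(w, X[i]) + b - y[i]
        dj_db += error
        for j in range(n):
            dj_dw[j] += error * X[i, j]
    return dj_dw / m, dj_db / m

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])

# Compare both versions at w = 0, b = 0
dw_vec, db_vec = compute_gradient_vectorised(X_train, y_train, np.zeros(4), 0.0)
dw_loop, db_loop = compute_gradient_loop(X_train, y_train, np.zeros(4), 0.0)
print(dw_vec, db_vec)
```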

In [44]:
# Checking the gradient for given weights
temp_dw, temp_db = compute_gradient(X_train, y_train, w_in, b_in)

print("The First Gradient of dw is: ", temp_dw)
print("The First Gradient of db is: ", temp_db)

The First Gradient of dw is:  [-2.72623574e-03 -6.27197255e-06 -2.21745574e-06 -6.92403377e-05]
The First Gradient of db is:  -1.6739251122999121e-06
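Printed gradients are hard to verify by eye. One common sanity check, sketched here as an addition rather than part of the original workflow, compares the analytic gradient with a central finite-difference estimate $\left(J(w_j + \varepsilon) - J(w_j - \varepsilon)\right) / (2\varepsilon)$ of the cost. The helper names and the randomly generated data are hypothetical; small, well-scaled values are used so that a single step size `eps` suits every parameter.

```python
import numpy as np

def cost(X, y, w, b):
    """Squared-error cost J(w, b)."""
    m = X.shape[0]
    return np.sum((X @ w + b - y) ** 2) / (2 * m)

def analytic_gradient(X, y, w, b):
    """Gradient from the closed-form derivatives of J."""
    m = X.shape[0]
    err = X @ w + b - y
    return X.T @ err / m, np.sum(err) / m

def numerical_gradient(X, y, w, b, eps=1e-6):
    """Central-difference estimate of dJ/dw_j for each j and of dJ/db."""
    dj_dw = np.zeros_like(w)
    for j in range(w.shape[0]):
        step = np.zeros_like(w)
        step[j] = eps
        dj_dw[j] = (cost(X, y, w + step, b) - cost(X, y, w - step, b)) / (2 * eps)
    dj_db = (cost(X, y, w, b + eps) - cost(X, y, w, b - eps)) / (2 * eps)
    return dj_dw, dj_db

# Hypothetical small, well-scaled data
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 3))
y_demo = rng.normal(size=5)
w_demo = rng.normal(size=3)
b_demo = 0.5

dw_a, db_a = analytic_gradient(X_demo, y_demo, w_demo, b_demo)
dw_n, db_n = numerical_gradient(X_demo, y_demo, w_demo, b_demo)
print(np.max(np.abs(dw_a - dw_n)), abs(db_a - db_n))
```

Because the cost is quadratic in `w` and `b`, the central difference agrees with the analytic gradient up to floating-point rounding, so any large mismatch points to a bug in the gradient code.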


## Running Gradient Descent for Multiple Variables

In [47]:
def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iterations):
    
    """
    Runs batch gradient descent for multiple variables to learn the parameters of the model
    Args:
        X - Features, shape (m, n)
        y - Targets, shape (m,)
        w_in, b_in - Initial values of the parameters
        cost_function - Function that returns the cost over all the training examples
        gradient_function - Function that returns the gradients over all the training examples for the n features
        alpha - Learning rate
        num_iterations - Number of gradient descent steps to run
        
    Returns:
        w, b - Parameters after num_iterations updates
        J_history - Cost recorded after each update
    """
    
    # History of cost during gradient descent
    J_history = []
    
    # Copying w_in so that the caller's array is not modified
    w = copy.deepcopy(w_in)
    b = b_in
    
    for i in range(num_iterations):
        
        # Calculating the gradients
        dj_dw, dj_db = gradient_function(X, y, w, b)
        
        # Updating the weights
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        
        # Storing the cost after each update (capped at 10,000 entries to limit memory use)
        if i < 10000:
            J_history.append(cost_function(X, y, w, b))
            
        if i % math.ceil(num_iterations / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f}   ")
            
    return w, b, J_history
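Each iteration of the loop above performs one simultaneous update of all the parameters with learning rate $\alpha$, using the gradients returned by `gradient_function`:

```latex
\begin{aligned}
w_j &:= w_j - \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j}, \qquad j = 0, \dots, n-1 \\
b &:= b - \alpha \frac{\partial J(\mathbf{w},b)}{\partial b}
\end{aligned}
```

Both gradients are computed from the current `w` and `b` before either parameter changes, so all the parameters move together in a single step.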

## Using gradient descent to learn the parameters of the model

In [49]:
# initialize parameters
initial_w = np.zeros_like(w_in)
initial_b = 0.

# some gradient descent settings
iterations = 1000
alpha = 5.0e-7

# run gradient descent 
w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b,
                                            compute_cost, compute_gradient,
                                            alpha, iterations)

print(f"b,w found by gradient descent: {b_final:0.2f},{w_final} ")

Iteration    0: Cost  2529.46   
Iteration  100: Cost   695.99   
Iteration  200: Cost   694.92   
Iteration  300: Cost   693.86   
Iteration  400: Cost   692.81   
Iteration  500: Cost   691.77   
Iteration  600: Cost   690.73   
Iteration  700: Cost   689.71   
Iteration  800: Cost   688.70   
Iteration  900: Cost   687.69   
b,w found by gradient descent: -0.00,[ 0.20396569  0.00374919 -0.0112487  -0.0658614 ] 


## Making predictions

In [50]:
# Making Predictions
m = X_train.shape[0]
for i in range(m):
    print(f"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}")

prediction: 426.19, target value: 460
prediction: 286.17, target value: 232
prediction: 171.47, target value: 178
