
----

## Multiple Linear Regression Algorithm Implementation

Multiple linear regression models the target as a linear combination of several features plus a bias term. During training, the parameters $\vec{w}$ and $b$ are adjusted so that the model's predictions fit the training data.

1. Model function to fit:

$$ f_{\vec{w},b}(\vec{x}^{(i)}) = \vec{w}\cdot \vec{x}^{(i)} + b = w_1x_1^{(i)} + w_2x_2^{(i)} + \cdots + w_nx_n^{(i)} + b $$

2. Cost function to minimize: the squared error cost function.

$$ J(\vec{w}, b) = \frac{1}{2m} \sum _{i=1}^m  \left( f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)} \right)^2$$

3. Gradients:

$$ \frac{\partial J(\vec{w},b)}{\partial w_j} = \frac{1}{m}\sum_{i=1}^m(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}) x_j^{(i)} $$
$$ \frac{\partial J(\vec{w},b)}{\partial b} = \frac{1}{m}\sum_{i=1}^m(f_{\vec{w},b}(\vec{x}^{(i)})-y^{(i)}) $$

4. Gradient descent algorithm:
<div style="margin-left: 44px;">
Repeat until convergence:
</div>

$$ w_j=w_j-\alpha \frac{\partial J(\vec{w},b)}{\partial w_j} $$
$$ b=b-\alpha \frac{\partial J(\vec{w},b)}{\partial b} $$

*Update $w_j$ (for $j=1,\ldots,n$) and $b$ simultaneously: compute all gradients from the current parameters before changing any of them.*
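
The gradient formulas in step 3 can be checked numerically against central finite differences of the cost in step 2. The following is a minimal sketch under stated assumptions: the helper names (`cost`, `analytic_gradients`, `numeric_gradients`), the step size `h`, and the random demo data are illustrative and not part of the original notebook.

```python
import numpy as np

def cost(X, y, w, b):
    # Squared error cost: J(w, b) = 1/(2m) * sum((X.w + b - y)^2)
    m = X.shape[0]
    err = X @ w + b - y
    return np.sum(err ** 2) / (2.0 * m)

def analytic_gradients(X, y, w, b):
    # Partial derivatives of J with respect to w (shape (n,)) and b (scalar)
    m = X.shape[0]
    err = X @ w + b - y
    return X.T @ err / m, np.sum(err) / m

def numeric_gradients(X, y, w, b, h=1e-6):
    # Central finite differences, perturbing one parameter at a time
    dw = np.zeros_like(w)
    for j in range(w.size):
        step = np.zeros_like(w)
        step[j] = h
        dw[j] = (cost(X, y, w + step, b) - cost(X, y, w - step, b)) / (2 * h)
    db = (cost(X, y, w, b + h) - cost(X, y, w, b - h)) / (2 * h)
    return dw, db

# Hypothetical random data: m = 20 examples, n = 3 features
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 3))
y_demo = rng.normal(size=20)
w_demo = rng.normal(size=3)
b_demo = 0.5

dw_a, db_a = analytic_gradients(X_demo, y_demo, w_demo, b_demo)
dw_n, db_n = numeric_gradients(X_demo, y_demo, w_demo, b_demo)

# Both comparisons should report agreement if the formulas are implemented correctly
print(np.allclose(dw_a, dw_n, atol=1e-6), np.isclose(db_a, db_n, atol=1e-6))
```

Because $J$ is quadratic in the parameters, central differences should match the analytic gradients up to floating-point rounding, which makes this a convenient test when modifying the gradient code.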




**Notation**


| Notation | Description | Python |
| -- | -- | -- |
| $\mathbf{X}$ | training example matrix | `X_train` or `X` |
| $\mathbf{y}$ | training example targets | `y_train` or `y` |
| $\mathbf{x}^{(i)}$, $y^{(i)}$ | $i^{th}$ training example | `X[i]`, `y[i]` or `x`, `y` |
| $m$ | number of training examples | `m` |
| $n$ | number of features in each example | `n` |
| $\mathbf{w}$ | parameter: weights | `w` |
| $b$ | parameter: bias | `b` |
| $\alpha$ | learning rate | `alpha` |
| $f_{\mathbf{w},b}(\mathbf{x}^{(i)})$ | result of the model evaluation at $\mathbf{x}^{(i)}$, parameterized by $\mathbf{w}, b$ | `f_wb` |
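
To make the mapping between the notation and NumPy arrays concrete, here is a small sketch with hypothetical toy values (the numbers are chosen only for illustration). Note that the math indexes examples from 1, while Python indexes from 0.

```python
import numpy as np

# Hypothetical toy data: m = 3 examples, n = 2 features
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
y = np.array([10.0, 20.0, 30.0])
m, n = X.shape

w = np.array([0.5, -1.0])    # one weight per feature, shape (n,)
b = 2.0                      # scalar bias

x_i, y_i = X[1], y[1]        # second training example (zero-based index 1)
f_wb_i = np.dot(w, x_i) + b  # prediction for that single example
f_wb = X @ w + b             # predictions for all m examples, shape (m,)

print(m, n)
print(f_wb_i)
print(f_wb)
```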

In [None]:
import numpy as np

def linear_model(X, w, b):
    '''
    Computes the model for a set of training examples

    Args:
        X (ndarray): Shape (m,n) m training examples with n features
        w (ndarray): Shape (n,) n features weights
        b (scalar): bias parameter
    
    Returns:
        f_wb (ndarray): Shape (m,) predicted outputs
    
    '''
    f_wb = np.dot(X,w) + b

    return f_wb


def compute_cost(X, y, w, b):
    '''
    Computes squared error cost function for the given training set

    Args:
        X (ndarray): Shape (m,n) training examples with n features
        w (ndarray): Shape (n,) n features weights
        y (ndarray): Shape (m,) true outputs
        b (scalar): bias parameter
    
    Returns:
        J_wb (scalar): cost function value
    '''

    # Number of training examples
    m = X.shape[0]

    loss = linear_model(X, w, b) - y
    J_wb = (1. / (2. * m)) * np.sum(loss**2)

    return J_wb

def compute_gradients(X, y, w, b):
    '''
    Computes gradients for each weight and bias

    Args:
        X (ndarray): Shape (m,n) training examples with n features
        w (ndarray): Shape (n,) n features weights
        y (ndarray): Shape (m,) true outputs
        b (scalar): bias parameter
    
    Returns:
        dJ_dw (ndarray): Shape (n,) weight gradients
        dJ_db (scalar): bias gradient
    '''    
    m = X.shape[0]

    loss = linear_model(X,w,b) - y
    dJ_dw = (1./m) * np.dot(X.T, loss)
    dJ_db = (1./m) * np.sum(loss)

    return dJ_dw, dJ_db

def gradient_descent(X, y, w_init, b_init, max_iter=1000, alpha=1.e-6, epsilon=1.e-3):
    '''
    Implements the gradient descent algorithm

    Args:
        X (ndarray): Shape (m,n) training examples with n features
        y (ndarray): Shape (m,) true outputs
        w_init (ndarray): Shape (n,) initial feature weights
        b_init (scalar): initial bias parameter
        max_iter (scalar): maximum number of gradient descent steps
        alpha (scalar): learning rate
        epsilon (scalar): convergence threshold on the absolute cost difference between two consecutive iterations
    
    Returns:
        w (ndarray): Shape (n,) optimized feature weights
        b (scalar): optimized bias
        J_hist (list): cost function value after each iteration
    '''
    iter = 0

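    # Initialize parameters as float copies so the in-place updates
    # neither modify the caller's inputs nor fail on integer arrays
    w = np.array(w_init, dtype=float)
    b = float(b_init)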

    J_wb = compute_cost(X, y, w, b)  # cost at the initial parameters
    J_hist = []
    
    # Repeating gradient descent algorithm until max_iter or convergence 
    while iter < max_iter:
        iter += 1

        # Gradients
        dJ_dw, dJ_db = compute_gradients(X, y, w, b)
        
        # Update parameters
        w -= alpha * dJ_dw
        b -= alpha * dJ_db

        # Cost function with updated parameters
        J_wb_curr = compute_cost(X, y, w, b)
        J_hist.append(J_wb_curr)

        # Check if convergence achieved
        consecutive_diff = abs(J_wb_curr - J_wb)
        if consecutive_diff <= epsilon:
            print(f"Convergence achieved in {iter} iterations.")
            break
        
        J_wb = J_wb_curr
    else:
        print(f"Convergence not achieved within {max_iter} iterations.")
    
    return w, b, J_hist
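
One way to sanity-check a gradient descent fit is to compare it with the closed-form least-squares solution from `np.linalg.lstsq`. The sketch below is self-contained and rests on several assumptions: `fit_gd` is a hypothetical compact version of the loop above, the data are synthetic and noise-free, and the learning rate and tolerance are picked for features that are already on a unit scale (real data may need a much smaller `alpha` or feature scaling).

```python
import numpy as np

def fit_gd(X, y, alpha=0.1, max_iter=5000, epsilon=1e-12):
    # Compact batch gradient descent for linear regression (hypothetical helper)
    m, n = X.shape
    w, b = np.zeros(n), 0.0
    prev_cost = np.inf
    for _ in range(max_iter):
        # The error is computed once, so w and b are updated simultaneously
        err = X @ w + b - y
        w -= alpha * (X.T @ err) / m
        b -= alpha * np.sum(err) / m
        cost = np.sum((X @ w + b - y) ** 2) / (2.0 * m)
        if abs(prev_cost - cost) <= epsilon:
            break
        prev_cost = cost
    return w, b

# Synthetic, noise-free data with known parameters (assumed for illustration)
rng = np.random.default_rng(42)
X_syn = rng.normal(size=(200, 3))
w_true, b_true = np.array([2.0, -1.0, 0.5]), 4.0
y_syn = X_syn @ w_true + b_true

w_gd, b_gd = fit_gd(X_syn, y_syn)

# Closed-form reference: append a column of ones so the bias is fitted too
A = np.hstack([X_syn, np.ones((X_syn.shape[0], 1))])
theta, *_ = np.linalg.lstsq(A, y_syn, rcond=None)

print(np.round(w_gd, 3), round(float(b_gd), 3))
print(np.round(theta, 3))
```

The last entry of `theta` is the bias and the first three are the weights, so both printed results should be close to the known parameters used to generate the data.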

Author: Alexander Burgos

Date: 2025-02-10