# **Derivation of Multiple Linear Regression**

We now turn our attention to the generalization of linear regression when there are multiple predictor variables. Let us assume that we have $ p $ independent variables $ x_1, x_2, \dots, x_p $, and a dependent variable $ y $. The multiple linear regression model can be written as

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \epsilon
$$

where:
- $ \beta_0 $ is the intercept term,
- $ \beta_1, \beta_2, \dots, \beta_p $ are the coefficients corresponding to each independent variable $ x_1, x_2, \dots, x_p $,
- $ \epsilon $ is the error term, capturing the discrepancy between the observed and predicted values of $ y $.

This model can also be written succinctly using vector notation as:

$$
\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}
$$

where:
- $ \mathbf{X} $ is the $ n \times (p+1) $ matrix of observations (with a column of ones to account for the intercept),
- $ \boldsymbol{\beta} = (\beta_0, \beta_1, \dots, \beta_p)^T $ is the vector of coefficients, and
- $ \boldsymbol{\epsilon} $ is the vector of error terms.
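
To make the shapes concrete, here is a minimal NumPy sketch of how the design matrix could be assembled. The toy values in `features` and `beta` are hypothetical and chosen only for illustration. The ones column is prepended here so that it lines up with $ \beta_0 $ as the first entry of the coefficient vector.

```python
import numpy as np

# Hypothetical toy data: n = 4 observations, p = 2 predictors.
features = np.array([[1.0, 2.0],
                     [2.0, 0.5],
                     [3.0, 1.5],
                     [4.0, 3.0]])
n, p = features.shape

# Prepend a column of ones so the first coefficient acts as the intercept beta_0.
X = np.hstack([np.ones((n, 1)), features])

# Arbitrary illustrative coefficients (beta_0, beta_1, beta_2).
beta = np.array([0.5, 2.0, -1.0])

# Noise-free part of the model, X @ beta: one value per observation.
y_hat = X @ beta

print(X.shape)  # (n, p + 1)
print(y_hat)
```

Placing the ones column first or last is only a convention; what matters is that the intercept is read from the matching position of the coefficient vector.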

## **Objective**

The task is to estimate the vector $ \boldsymbol{\beta} $ by minimizing the sum of squared residuals, which is given by:

$$
S(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
$$

where $ \hat{y}_i $ denotes the predicted value of $ y $ based on the model, i.e.,

$$
\hat{y}_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_p x_{pi}
$$

In matrix form, this expression becomes:

$$
S(\boldsymbol{\beta}) = (\mathbf{y} - \mathbf{X} \boldsymbol{\beta})^T (\mathbf{y} - \mathbf{X} \boldsymbol{\beta})
$$
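
As a quick sanity check, the summation form and the matrix form can be evaluated on the same data and compared. This is a small sketch on randomly generated data; the sizes, the seed, and the variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3

# Design matrix with a leading column of ones, random response and coefficients.
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, p))])
y = rng.normal(size=n)
beta = rng.normal(size=p + 1)

# Summation form: add up the squared residual of each observation.
s_sum = sum((y[i] - X[i] @ beta) ** 2 for i in range(n))

# Matrix form: inner product of the residual vector with itself.
residual = y - X @ beta
s_mat = residual @ residual

print(np.isclose(s_sum, s_mat))
```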

## **Minimization via Ordinary Least Squares (OLS)**

To find the parameter vector $ \boldsymbol{\beta} $ that minimizes the residual sum of squares, we take the gradient of $ S(\boldsymbol{\beta}) $ with respect to $ \boldsymbol{\beta} $ and set it to zero. The gradient is given by:

$$
\frac{\partial S(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = -2 \mathbf{X}^T (\mathbf{y} - \mathbf{X} \boldsymbol{\beta})
$$
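
One way to gain confidence in this gradient is to compare it with a finite-difference approximation. The sketch below does that on random data; the helper names `rss` and `rss_gradient`, the step size `eps`, and the data itself are assumptions introduced for the illustration.

```python
import numpy as np

def rss(beta, X, y):
    """Residual sum of squares S(beta)."""
    r = y - X @ beta
    return r @ r

def rss_gradient(beta, X, y):
    """Analytic gradient of S(beta): -2 X^T (y - X beta)."""
    return -2.0 * X.T @ (y - X @ beta)

rng = np.random.default_rng(0)
n, p = 30, 2
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, p))])
y = rng.normal(size=n)
beta = rng.normal(size=p + 1)

# Central finite differences, one coordinate direction at a time.
eps = 1e-6
numeric = np.array([
    (rss(beta + eps * e, X, y) - rss(beta - eps * e, X, y)) / (2 * eps)
    for e in np.eye(p + 1)
])
analytic = rss_gradient(beta, X, y)

print(np.max(np.abs(numeric - analytic)))
```

Because $ S $ is quadratic in the coefficients, the central difference should agree with the analytic gradient up to floating-point rounding.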

Setting this equal to zero:

$$
\mathbf{X}^T (\mathbf{y} - \mathbf{X} \boldsymbol{\beta}) = \mathbf{0}
$$

Rearranging, we obtain the **Normal Equation**:

$$
\mathbf{X}^T \mathbf{X} \boldsymbol{\beta} = \mathbf{X}^T \mathbf{y}
$$
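
Read geometrically, the normal equation states that the residual vector is orthogonal to every column of the design matrix. The sketch below checks this numerically on random data. It obtains the least-squares coefficients from `np.linalg.lstsq`, which is a choice made for this example and not something the derivation depends on.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 2
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, p))])
y = rng.normal(size=n)

# Least-squares coefficients from NumPy's solver.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# X^T (y - X beta_hat) should vanish up to rounding error.
residual = y - X @ beta_hat
orthogonality = X.T @ residual

print(np.max(np.abs(orthogonality)))
```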

Assuming $ \mathbf{X}^T \mathbf{X} $ is invertible, the solution to the normal equation is:

$$
\boldsymbol{\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}
$$

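Under the Gauss-Markov assumptions (errors with zero mean, constant variance, and no correlation with one another), this is the best linear unbiased estimator (BLUE) of $ \boldsymbol{\beta} $, and it gives the estimated regression coefficients.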
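
The closed-form solution can be evaluated in more than one way. The sketch below, on synthetic data with hypothetical coefficients `true_beta`, compares the explicit inverse with `np.linalg.solve` applied to the normal equation and with `np.linalg.lstsq`. The data, seed, and noise level are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 200, 3

# Synthetic data generated from known (hypothetical) coefficients plus small noise.
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, p))])
true_beta = np.array([1.0, 2.0, -3.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=n)

# 1) Literal translation of the formula, using the explicit inverse.
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# 2) Solve the normal equation X^T X beta = X^T y without forming the inverse.
beta_solve = np.linalg.solve(X.T @ X, X.T @ y)

# 3) NumPy's least-squares routine, which works on X directly.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_inv)
print(np.allclose(beta_inv, beta_solve), np.allclose(beta_inv, beta_lstsq))
```

On well-conditioned data like this, all three should agree closely. When columns of $ \mathbf{X} $ are nearly collinear, solving the system or using a least-squares routine is generally preferred over forming the inverse explicitly.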

## **Conclusion**

The general form of multiple linear regression allows for the modeling of a linear relationship between a dependent variable and multiple independent variables. The parameter estimates are obtained by minimizing the sum of squared residuals, which leads to the normal equation and its closed-form solution:

$$
\boldsymbol{\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}
$$

This expression provides the optimal solution for the regression coefficients under the ordinary least squares criterion.


In [None]:
import numpy as np


class MyLinearRegression:
  def __init__(self):
    # Learned parameters; both are set by fit()
    self.weights = None    # coefficients beta_1, ..., beta_p
    self.constants = None  # intercept beta_0

  def fit(self, X, y):
    X = np.asarray(X, dtype=float)  # accepts NumPy arrays and pandas DataFrames
    num_of_rows = X.shape[0]  # number of observations n
    b_ = np.ones((num_of_rows, 1))  # column of ones, shape (n, 1), for the intercept
    X_b = np.hstack((X, b_))  # append the ones column as the LAST column of X
    y = np.asarray(y, dtype=float).reshape(-1, 1)  # convert y into an (n, 1) matrix

    # Closed-form OLS solution: beta = (X^T X)^{-1} X^T y
    all_params = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
    all_params = all_params.ravel()  # flatten from (p+1, 1) to a 1-D array of length p+1
    self.constants = all_params[-1]  # intercept is last because the ones column was appended last
    self.weights = all_params[:-1]

  def predict(self, X):
    return np.asarray(X, dtype=float) @ self.weights + self.constants

In [None]:
myLR = MyLinearRegression()
myLR.fit(X_train, y_train)
my_pred = myLR.predict(X_test)

In [None]:
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X_train, y_train)
sciki_pred = lr.predict(X_test)

In [None]:
round((my_pred - sciki_pred).mean(), 10)

0.0
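
A mean of signed differences can be zero even when two prediction vectors disagree, because positive and negative gaps cancel. The sketch below illustrates this with two hypothetical vectors, `pred_a` and `pred_b`, which are not taken from the notebook's data, and shows the maximum absolute difference and `np.allclose` as stricter checks.

```python
import numpy as np

# Hypothetical prediction vectors that differ by 0.5 at every position.
pred_a = np.array([1.0, 2.0, 3.0, 4.0])
pred_b = np.array([1.5, 1.5, 3.5, 3.5])

# Signed differences cancel: (-0.5 + 0.5 - 0.5 + 0.5) / 4
mean_signed_diff = (pred_a - pred_b).mean()

# The largest absolute gap exposes the disagreement.
max_abs_diff = np.abs(pred_a - pred_b).max()

print(mean_signed_diff)              # 0.0
print(max_abs_diff)                  # 0.5
print(np.allclose(pred_a, pred_b))   # False
```

Applied to the comparison above, something like `np.abs(my_pred - sciki_pred).max()` would give a tighter confirmation that the two models agree.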