# 2. Multiple Linear Regression

For **multiple linear regression** (more than one independent variable), the hypothesis function expands to:

$y = β₀ + β₁x₁ + β₂x₂ + ... + βₖxₖ$

Where:

- $ x₁, x₂, ..., xₖ $ are the **independent variables**.  
- $ β₀ $ is the **intercept**.  
- $ β₁, β₂, ..., βₖ $ are the **coefficients**, representing the influence of each respective independent variable on the predicted output.
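
As a minimal sketch of how this hypothesis function is evaluated, the snippet below computes predictions for two observations with three independent variables. The coefficient and feature values are made up purely for illustration.

```python
import numpy as np

# Hypothetical coefficients for a model with k = 3 independent variables.
beta_0 = 1.5                          # intercept
beta = np.array([2.0, -0.5, 0.25])    # beta_1 ... beta_k

# Two example observations, one row each (illustrative values only).
X = np.array([
    [1.0, 2.0, 4.0],
    [0.0, 1.0, 8.0],
])

# y = beta_0 + beta_1*x_1 + ... + beta_k*x_k, evaluated for every row at once.
y_pred = beta_0 + X @ beta
print(y_pred)  # [3.5 3. ]
```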



## Multicollinearity

Multicollinearity arises when two or more independent variables are highly correlated with each other. This can make it difficult to find the individual contribution of each variable to the dependent variable.

To detect multicollinearity we can use:

- **Correlation Matrix**: A correlation matrix helps to find relationships between independent variables. High correlations (close to 1 or -1) suggest multicollinearity.
- **VIF** (Variance Inflation Factor): VIF quantifies how much the variance of a regression coefficient is inflated because the predictors are correlated. A high VIF (typically above 10) indicates multicollinearity.
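
Both checks can be sketched with NumPy alone. The example below builds a synthetic dataset in which `x3` is almost a copy of `x1`, then computes the correlation matrix and a hand-rolled VIF using the standard definition $VIF_j = 1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the remaining predictors. The data and the helper name `vif` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + 0.05 * rng.normal(size=n)   # nearly a copy of x1 (synthetic)
X = np.column_stack([x1, x2, x3])

# Correlation matrix between the predictors (columns are variables).
corr = np.corrcoef(X, rowvar=False)

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) from regressing column j on the others."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(others)), others])  # add an intercept
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
    return 1.0 / (1.0 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
print(np.round(corr, 2))
print(np.round(vifs, 1))
```

In this setup the VIFs of `x1` and `x3` should be far above 10, while the VIF of the unrelated `x2` should stay close to 1.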

## Assumptions of Multiple Regression Model
As with simple linear regression, multiple linear regression relies on the following assumptions:

1. Linearity: Relationship between dependent and independent variables should be linear.
2. Homoscedasticity: Variance of errors should remain constant across all levels of independent variables.
3. Multivariate Normality: Residuals should follow a normal distribution.
4. No Multicollinearity: Independent variables should not be highly correlated.
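
Some of these assumptions can be probed informally from the residuals of a fitted model. The sketch below uses synthetic data that satisfies the assumptions by construction and computes two rough diagnostics: the ratio of residual variances between high and low fitted values (homoscedasticity) and the skewness and excess kurtosis of the residuals (normality). These are heuristics chosen for illustration, not formal statistical tests.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 2))
# Synthetic target: linear in X with constant-variance normal noise.
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Fit by least squares with an intercept column.
A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ beta_hat
residuals = y - fitted

# Homoscedasticity: residual variance should be similar for low and high fitted values.
order = np.argsort(fitted)
low = residuals[order[: n // 2]]
high = residuals[order[n // 2:]]
var_ratio = high.var() / low.var()          # close to 1 suggests constant variance

# Normality: standardized residuals should have skewness and excess kurtosis near 0.
z = (residuals - residuals.mean()) / residuals.std()
skewness = np.mean(z ** 3)
excess_kurtosis = np.mean(z ** 4) - 3.0

print(var_ratio, skewness, excess_kurtosis)
```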

---

## 2. The Math Way: Least-Squares Estimation

The hypothesis function is: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_k x_k$

Assuming that the independent variables of the $i$-th observation are: $\vec{x_i} = [x_1^i, x_2^i, \ldots, x_k^i]$ (the superscript $i$ is an observation index, not an exponent)

and the model's parameters are: $\vec{\beta} = [\beta_0, \beta_1, \ldots, \beta_k]$

then the model's prediction would be: $y_i \approx \beta_0 + \sum_{j=1}^{k} \beta_j \times x_j^i$

If $\vec{x_i}$ is extended with a leading 1 to $\vec{x_i} = [1, x_1^i, x_2^i, \ldots, x_k^i]$ (so that $x_0^i = 1$),

then the prediction becomes the dot product of the parameter vector and the extended feature vector: $y_i \approx \sum_{j=0}^{k} \beta_j \times x_j^i = \vec{\beta} \cdot \vec{x_i}$
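
The effect of prepending a 1 to each feature vector can be checked numerically. In this small sketch (values chosen arbitrarily), the explicit "intercept plus weighted sum" form and the single dot-product form give the same predictions.

```python
import numpy as np

# Three observations with two features each (illustrative values).
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
beta = np.array([0.5, 2.0, -1.0])   # [beta_0, beta_1, beta_2]

# Explicit form: beta_0 + sum_j beta_j * x_j.
explicit = beta[0] + X @ beta[1:]

# Extended form: prepend a column of ones so the intercept joins the dot product.
X_ext = np.column_stack([np.ones(X.shape[0]), X])
as_dot = X_ext @ beta

print(explicit)  # [0.5 2.5 4.5]
print(as_dot)    # [0.5 2.5 4.5]
```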



The goal is to **minimize the sum of squared errors** over the $n$ observations: $\hat{\vec{\beta}} = \arg\min_{\vec{\beta}} L(\vec{\beta}) = \arg\min_{\vec{\beta}} \sum_{i=1}^{n} (\vec{\beta} \cdot \vec{x_i} - y_i)^2$

Now, stacking the extended vectors $\vec{x_i}$ as the rows of a matrix $X$ and the observed values $y_i$ into a column vector $Y$, the loss function can be rewritten as:

$
\begin{aligned}
L(\vec{\beta}) &= \|X\vec{\beta} - Y\|^2 \\
&= (X\vec{\beta} - Y)^{T}(X\vec{\beta} - Y) \\
&= Y^{T}Y - Y^{T}X\vec{\beta} - \vec{\beta}^{T}X^{T}Y + \vec{\beta}^{T}X^{T}X\vec{\beta}
\end{aligned}
$
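
As a sanity check on this rewriting, the sketch below evaluates the loss in three ways on random data: as a sum over observations, as a squared norm, and as the expanded quadratic. All three should agree up to floating-point error. The data and the choice of `beta` are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Design matrix: a leading column of ones followed by two made-up features.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = rng.normal(size=n)
beta = np.array([0.5, -1.0, 2.0])

# Form 1: sum over observations of (beta . x_i - y_i)^2.
loss_sum = sum((beta @ X[i] - Y[i]) ** 2 for i in range(n))

# Form 2: squared Euclidean norm of the residual vector X beta - Y.
residual = X @ beta - Y
loss_norm = residual @ residual

# Form 3: the expanded quadratic Y'Y - Y'X beta - beta'X'Y + beta'X'X beta.
loss_expanded = Y @ Y - Y @ X @ beta - beta @ X.T @ Y + beta @ X.T @ X @ beta

print(loss_sum, loss_norm, loss_expanded)
```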

As the loss function is convex, the optimum solution lies where the gradient equals zero.  
The gradient of the loss function is:

$
\begin{aligned}
\frac{\partial L(\vec{\beta})}{\partial \vec{\beta}}
&= \frac{\partial (Y^{T}Y - Y^{T}X\vec{\beta} - \vec{\beta}^{T}X^{T}Y + \vec{\beta}^{T}X^{T}X\vec{\beta})}{\partial \vec{\beta}} \\
&= -2X^{T}Y + 2X^{T}X\vec{\beta}
\end{aligned}
$
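
One way to gain confidence in the gradient formula is to compare it with a finite-difference approximation. The following sketch does this on random data; the helper names (`loss`, `analytic_gradient`, `numerical_gradient`) are illustrative, not part of any library.

```python
import numpy as np

def loss(beta, X, Y):
    """Sum of squared errors ||X beta - Y||^2."""
    r = X @ beta - Y
    return r @ r

def analytic_gradient(beta, X, Y):
    """Gradient from the derivation: -2 X'Y + 2 X'X beta."""
    return -2 * X.T @ Y + 2 * X.T @ X @ beta

def numerical_gradient(beta, X, Y, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(beta)
    for j in range(beta.size):
        step = np.zeros_like(beta)
        step[j] = eps
        grad[j] = (loss(beta + step, X, Y) - loss(beta - step, X, Y)) / (2 * eps)
    return grad

rng = np.random.default_rng(3)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
Y = rng.normal(size=20)
beta = rng.normal(size=3)

g_analytic = analytic_gradient(beta, X, Y)
g_numeric = numerical_gradient(beta, X, Y)
print(np.max(np.abs(g_analytic - g_numeric)))  # should be very small
```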

Setting the gradient to zero produces the optimum parameter:

$
\begin{aligned}
-2X^{T}Y + 2X^{T}X\vec{\beta} &= 0 \\
\Rightarrow X^{T}X\vec{\beta} &= X^{T}Y \\
\Rightarrow \hat{\vec{\beta}} &= (X^{T}X)^{-1}X^{T}Y
\end{aligned}
$
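
The closed-form solution can be sketched in a few lines of NumPy. The example below generates synthetic data from hypothetical "true" coefficients, solves the normal equations $X^{T}X\vec{\beta} = X^{T}Y$, and cross-checks the result against `np.linalg.lstsq`. It uses `np.linalg.solve` rather than forming $(X^{T}X)^{-1}$ explicitly, which is generally the more numerically stable choice.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
features = rng.normal(size=(n, 3))
true_beta = np.array([4.0, 1.0, -2.0, 0.5])   # hypothetical [beta_0, ..., beta_3]

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), features])
Y = X @ true_beta + rng.normal(scale=0.1, size=n)

# Solve the normal equations X'X beta = X'Y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

# At the optimum the gradient -2 X'Y + 2 X'X beta should vanish.
gradient_at_optimum = -2 * X.T @ Y + 2 * X.T @ X @ beta_hat

print(beta_hat)
```

With this little noise, `beta_hat` should land close to `true_beta`.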

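**Note:**  
Because $L(\vec{\beta})$ is a convex quadratic, any stationary point is a global minimum rather than merely a local one. Differentiating once more gives the Hessian $2X^{T}X$, which is **positive definite** whenever the columns of $X$ are linearly independent; in that case $X^{T}X$ is invertible and $\hat{\vec{\beta}}$ is the unique minimizer.  
The **Gauss–Markov theorem** is a separate result: under its assumptions (errors with zero mean, constant variance, and no correlation), this least-squares estimator is the best linear unbiased estimator.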
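
The positive-definiteness check can be sketched numerically. Differentiating the gradient once more gives the Hessian $2X^{T}X$; the example below inspects its eigenvalues for a well-behaved synthetic design matrix and for one with a perfectly collinear column, where the smallest eigenvalue collapses to (numerically) zero. The data is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)

# Case 1: linearly independent columns, so the Hessian is positive definite.
X_full = np.column_stack([np.ones(n), x1, x2])
hessian_full = 2 * X_full.T @ X_full
eig_full = np.linalg.eigvalsh(hessian_full)   # eigenvalues of a symmetric matrix

# Case 2: third column is exactly 2 * x1 (perfect multicollinearity).
X_dup = np.column_stack([np.ones(n), x1, 2 * x1])
hessian_dup = 2 * X_dup.T @ X_dup
eig_dup = np.linalg.eigvalsh(hessian_dup)

print(eig_full.min())   # clearly positive
print(eig_dup.min())    # approximately zero: only positive semi-definite
```

In the second case $X^{T}X$ is singular, so the normal equations no longer have a unique solution.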


```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.datasets import fetch_california_housing
```