![image.png](attachment:image.png)

# 📘 Multiple Linear Regression (MLR)

---

## 🧠 Introduction

Multiple Linear Regression (MLR) predicts a continuous target variable $y$ from **two or more independent variables** $x_1, x_2, \dots, x_p$. It assumes that $y$ is a **linear** function of the predictors, plus random noise.

---

## 🖐️ Equation

The general equation is:

```
y = β₀ + β₁x₁ + β₂x₂ + ... + βₚxₚ + ε
```

Where:

* $y$ = target/output variable
* $x_1, x_2, \dots, x_p$ = features/predictors
* $\beta_0$ = intercept
* $\beta_1, \dots, \beta_p$ = coefficients
* $\epsilon$ = error term (unobserved noise; the residuals $y_i - \hat{y}_i$ are its estimates)
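
To make the equation concrete, the sketch below evaluates its deterministic part (everything except the error term) for one sample. The coefficient values and the helper name `predict` are assumptions chosen only for illustration.

```python
# Hypothetical coefficients, chosen only for illustration
beta_0 = 1.0          # intercept
betas = [2.0, 3.0]    # beta_1, beta_2


def predict(x, intercept, coefs):
    """Return intercept + sum_j coefs[j] * x[j] (the model without the error term)."""
    return intercept + sum(b * xj for b, xj in zip(coefs, x))


# One sample with x_1 = 4 and x_2 = 5
y_hat = predict([4.0, 5.0], beta_0, betas)
print(y_hat)  # 1 + 2*4 + 3*5 = 24.0
```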

---

## 📏 Objective

Estimate $\beta_0, \beta_1, \dots, \beta_p$ such that the **Sum of Squared Errors (SSE)** is minimized:

```
SSE = Σ(yᵢ - ŷᵢ)²
```
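
As a rough illustration of the objective, the sketch below computes the SSE for two candidate coefficient sets on a small toy dataset. The data are assumed, generated exactly from y = 1 + 2x₁ + 3x₂, and the function name `sse` is hypothetical. The candidate with the lower SSE is the better fit.

```python
def sse(rows, targets, intercept, coefs):
    """Sum of squared differences between targets and model predictions."""
    total = 0.0
    for x, y in zip(rows, targets):
        y_hat = intercept + sum(b * xj for b, xj in zip(coefs, x))
        total += (y - y_hat) ** 2
    return total


# Toy data (assumed): targets follow y = 1 + 2*x1 + 3*x2 exactly
rows = [(1, 2), (2, 1), (3, 4), (4, 3)]
targets = [9, 8, 19, 18]

sse_a = sse(rows, targets, intercept=1, coefs=(2, 3))  # the generating coefficients
sse_b = sse(rows, targets, intercept=0, coefs=(2, 3))  # intercept off by 1

print(sse_a)  # 0.0
print(sse_b)  # 4.0 (every residual equals 1)
```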

---

## 🧲 Matrix Form

MLR can be expressed as:

```
y = Xβ + ε
```

Where:

* $y$ is an $n \times 1$ vector of outputs
* $X$ is an $n \times (p+1)$ matrix (includes a column of 1s for the intercept)
* $\beta$ is a $(p+1) \times 1$ vector of coefficients
* $\epsilon$ is an $n \times 1$ vector of errors

The least-squares estimate of β is given by the **Normal Equation** (assuming XᵀX is invertible):

```
β̂ = (XᵀX)⁻¹Xᵀy
```
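
The Normal Equation can be sketched directly in NumPy. The toy data below are assumed (generated exactly from y = 1 + 2x₁ + 3x₂), so the recovered coefficients should match those values. The sketch solves the linear system instead of forming the inverse explicitly, and cross-checks the result against `np.linalg.lstsq`.

```python
import numpy as np

# Toy data (assumed): targets follow y = 1 + 2*x1 + 3*x2 exactly
features = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]])
y = np.array([9.0, 8.0, 19.0, 18.0, 29.0])

# Prepend a column of 1s so the first coefficient is the intercept
X = np.column_stack([np.ones(len(features)), features])

# Solve (X^T X) beta = X^T y; equivalent to the Normal Equation,
# but avoids computing the matrix inverse explicitly
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Least-squares solver; usually the safer choice when X^T X is ill-conditioned
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)    # approximately [1, 2, 3]
print(beta_lstsq)  # should agree with beta_hat
```

Both routes give the same estimate here because the columns of `X` are linearly independent. With collinear predictors XᵀX becomes singular and `np.linalg.solve` would fail.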

---

## 📊 Assumptions

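1. **Linearity**: The expected value of $y$ is a linear function of the predictors
2. **Independence**: Residuals are independent of one another
3. **Homoscedasticity**: Residuals have constant variance across all predictor values
4. **Normality**: Residuals are normally distributed (matters for confidence intervals and hypothesis tests, not for the coefficient estimates themselves)
5. **No multicollinearity**: Predictors are not highly correlated with one another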
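
The multicollinearity assumption can be checked numerically. One common diagnostic, brought in here as an illustration, is the variance inflation factor (VIF), which can be read off the diagonal of the inverse of the predictors' correlation matrix. The sketch below uses synthetic data (an assumption) in which `x3` is deliberately built as a near copy of `x1`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic predictors (assumed): x3 is almost identical to x1
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

# Correlation matrix of the predictors (columns are variables)
corr = np.corrcoef(X, rowvar=False)

# VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal entry of inv(corr)
vif = np.diag(np.linalg.inv(corr))

for name, value in zip(["x1", "x2", "x3"], vif):
    print(f"{name}: VIF = {value:.1f}")
```

A VIF near 1 indicates little overlap with the other predictors. Values above roughly 5 to 10 are often treated as a warning sign, although that threshold is a rule of thumb and not a hard limit.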

---

## 📊 Evaluation Metrics

**Mean Squared Error (MSE):**

```
MSE = (1/n) Σ(yᵢ - ŷᵢ)²
```

**Root MSE (RMSE):**

```
RMSE = sqrt(MSE)
```

**Mean Absolute Error (MAE):**

```
MAE = (1/n) Σ|yᵢ - ŷᵢ|
```

**R² Score:**

```
R² = 1 - [Σ(yᵢ - ŷᵢ)² / Σ(yᵢ - ȳ)²]
```

**Adjusted R²:**

```
Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)
```

Where:

* $n$ = number of samples
* $p$ = number of predictors
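
The metrics above can be computed by hand in a few lines. The sketch below uses made-up targets and predictions (assumed values, not from a fitted model) and assumes `p = 2` predictors for the adjusted R².

```python
import math

# Made-up targets and predictions, for illustration only
y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [2.5, 5.5, 7.0, 8.0, 12.0]
p = 2  # assumed number of predictors behind y_pred

n = len(y_true)
residuals = [yt - yp for yt, yp in zip(y_true, y_pred)]

sse = sum(r ** 2 for r in residuals)          # sum of squared errors
mse = sse / n
rmse = math.sqrt(mse)
mae = sum(abs(r) for r in residuals) / n

y_mean = sum(y_true) / n
sst = sum((yt - y_mean) ** 2 for yt in y_true)  # total sum of squares
r2 = 1 - sse / sst
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(mse)     # 0.5
print(mae)     # 0.6
print(r2)      # 0.9375
print(adj_r2)  # 0.875
print(round(rmse, 4))  # 0.7071
```

Adjusted R² is lower than R² here because it penalizes the two predictors relative to the small sample size.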

---

## 🌐 Coefficient Interpretation

Each coefficient $\beta_j$ is the **expected change in $y$ for a one-unit increase in $x_j$**, holding all other predictors fixed.
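
A quick way to see this interpretation is to raise one predictor by one unit while leaving the other unchanged, then compare predictions. The coefficients below are hypothetical values picked for the illustration.

```python
# Hypothetical fitted coefficients, for illustration only
beta_0 = 1.0
betas = [2.0, 3.0]  # beta_1, beta_2


def predict(x):
    """Linear prediction beta_0 + beta_1*x_1 + beta_2*x_2."""
    return beta_0 + sum(b * xj for b, xj in zip(betas, x))


base = predict([4.0, 5.0])    # x_1 = 4, x_2 = 5
bumped = predict([5.0, 5.0])  # x_1 raised by one unit, x_2 unchanged

delta = bumped - base
print(delta)  # 2.0, which equals beta_1
```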

---

## 🚀 Python Example (Scikit-learn)

```python
from sklearn.linear_model import LinearRegression
import numpy as np

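# Feature matrix: 5 samples, 2 predictors (columns are not collinear,
# so the coefficients are uniquely determined)
X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]])
# Toy targets generated exactly from y = 1 + 2*x1 + 3*x2
y = np.array([9, 8, 19, 18, 29])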

model = LinearRegression()
model.fit(X, y)

print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)
print("R^2 Score:", model.score(X, y))
```

---

## 📈 Use Cases

* Predicting house prices from size, location, etc.
* Forecasting sales from ad spend across media
* Estimating insurance cost using age, BMI, etc.

---

## ⚠️ Limitations

* **Outliers** can distort predictions
* Assumes **linear** relationships
* Sensitive to **multicollinearity**
* Risk of **overfitting** with many predictors
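
The sensitivity to outliers can be demonstrated with a small experiment. In the sketch below the toy data are assumed (generated exactly from y = 1 + 2x₁ + 3x₂) and the helper name `fit_ols` is hypothetical. Corrupting a single target value is enough to move every coefficient.

```python
import numpy as np


def fit_ols(features, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    X = np.column_stack([np.ones(len(features)), features])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Toy data (assumed): targets follow y = 1 + 2*x1 + 3*x2 exactly
features = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]])
y_clean = 1 + 2 * features[:, 0] + 3 * features[:, 1]

# Corrupt one observation to simulate an outlier
y_outlier = y_clean.copy()
y_outlier[-1] += 50.0

beta_clean = fit_ols(features, y_clean)
beta_outlier = fit_ols(features, y_outlier)

print("clean fit:   ", np.round(beta_clean, 2))
print("with outlier:", np.round(beta_outlier, 2))
```

Because the squared-error objective weights large residuals heavily, the single corrupted point pulls the fitted plane toward itself and away from the four clean points.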

---

## 🧩 Extensions

* **Ridge Regression**: Adds L2 penalty
* **Lasso Regression**: Adds L1 penalty
* **ElasticNet**: Combines L1 and L2
* **Polynomial Regression**: Adds $x^2$, $x^3$, etc.
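
As one example of these extensions, ridge regression has a closed form that differs from the Normal Equation only by a term λI added to XᵀX. The sketch below is a minimal NumPy version under stated assumptions: the names `ridge_fit` and `lam` are hypothetical, the data are a toy set, and the intercept is left unpenalized by centering the data first.

```python
import numpy as np


def ridge_fit(features, y, lam):
    """Closed-form ridge fit; the L2 penalty applies to slopes only."""
    x_mean = features.mean(axis=0)
    y_mean = y.mean()
    Xc = features - x_mean  # centering removes the intercept from the problem
    yc = y - y_mean
    p = Xc.shape[1]
    coefs = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    intercept = y_mean - x_mean @ coefs
    return intercept, coefs


# Toy data (assumed): targets follow y = 1 + 2*x1 + 3*x2 exactly
features = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]])
y = 1 + 2 * features[:, 0] + 3 * features[:, 1]

_, coefs_ols = ridge_fit(features, y, lam=0.0)     # lam = 0 reduces to plain OLS
_, coefs_ridge = ridge_fit(features, y, lam=10.0)  # penalty shrinks the slopes

print("OLS slopes:  ", np.round(coefs_ols, 3))
print("Ridge slopes:", np.round(coefs_ridge, 3))
```

The ridge slopes have a smaller norm than the OLS slopes, which is the shrinkage effect the L2 penalty is meant to produce.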
