# 📊 Multiple Linear Regression

---

## 🔹 What Is Multiple Linear Regression?

**Multiple Linear Regression (MLR)** is an extension of **Simple Linear Regression** that uses **two or more independent variables** (features) to estimate a **continuous dependent variable** (target). It models the target as a weighted sum of the features plus an intercept, i.e. a linear relationship between inputs and output.

---

## 🧮 Regression Equation

Given a dataset with multiple features:

$$
\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n
$$

- \( \hat{y} \): Predicted value  
- \( \theta_0 \): Intercept (bias term)  
- \( \theta_1, \theta_2, \ldots, \theta_n \): Coefficients for each feature  
- \( x_1, x_2, \ldots, x_n \): Feature values

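Geometrically, the fitted model is:
- A **line** when there is one feature (simple regression, plotted in 2D)
- A **plane** when there are two features (3D)
- A **hyperplane** when there are three or more features (higher dimensions)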
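Writing \( x_0 = 1 \) for a constant feature, the same equation becomes a dot product. This compact form is what the matrix-based estimation methods work with:

$$
\hat{y} = \sum_{j=0}^{n} \theta_j x_j = \theta^\top \mathbf{x}, \qquad \mathbf{x} = (1, x_1, \ldots, x_n)^\top
$$

Stacking all records as the rows of a matrix \( X \) gives every prediction at once: \( \hat{\mathbf{y}} = X\theta \).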

---

## 🧠 How MLR Works

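1. Inputs are organized into a matrix **X** with one row per record and one column per feature, plus a column of ones so that the bias term \( \theta_0 \) is handled like any other coefficient.
2. Coefficients \( \theta \) are chosen to minimize the squared prediction error, typically with:
   - **Ordinary Least Squares (OLS)**: a closed-form solution via matrix algebra, \( \theta = (X^\top X)^{-1} X^\top y \) (the normal equation)
   - **Gradient Descent**: an iterative optimization approach that approaches the same minimum step by step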
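The closed-form OLS route can be sketched in plain Python by building \( X^\top X \) and \( X^\top y \) and solving the normal equation \( X^\top X \theta = X^\top y \). This is a minimal illustration, assuming small in-memory data and no NumPy; the helper names (`add_bias_column`, `solve`, `fit_ols`) and the sample data are hypothetical. In practice a library routine such as `numpy.linalg.lstsq` is the more numerically robust choice.

```python
# Minimal OLS sketch: solve the normal equation (X^T X) theta = X^T y
# using plain Python lists and Gaussian elimination (no NumPy assumed).

def add_bias_column(rows):
    """Prepend a constant 1 to each record so theta[0] acts as the intercept."""
    return [[1.0] + list(r) for r in rows]


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b] so row operations update both sides together.
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Use the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular (e.g. perfectly collinear features)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(features, targets):
    """Return [theta_0, theta_1, ..., theta_n] minimizing the sum of squared residuals."""
    X = add_bias_column(features)
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, targets)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical, noise-free data generated from y = 125 + 6.2*x1 + 14*x2.
engine_cyl = [(2.0, 4), (2.4, 4), (3.5, 6), (5.0, 8), (1.6, 4)]
co2 = [125 + 6.2 * e + 14 * c for e, c in engine_cyl]
print(fit_ols(engine_cyl, co2))  # approximately [125.0, 6.2, 14.0], up to round-off
```

Because the sample targets contain no noise, the fitted coefficients recover the generating values; with real data they would only approximate them.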

---

## 🧮 Example Calculation

Suppose we have:

- \( \theta_0 = 125 \), \( \theta_1 = 6.2 \), \( \theta_2 = 14 \)  
- A car with:  
  - Engine size = 2.4  
  - Cylinders = 4  

Then:

$$
\hat{y} = 125 + 6.2 \cdot 2.4 + 14 \cdot 4 = 125 + 14.88 + 56 = 195.88
$$
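The same arithmetic as a small Python function. `predict` is an illustrative name, and the coefficients are the example values above, not values fitted from data.

```python
def predict(theta, x):
    """Compute y_hat = theta_0 + theta_1*x_1 + ... + theta_n*x_n for one record."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))


theta = [125, 6.2, 14]  # intercept, engine-size coefficient, cylinders coefficient
car = [2.4, 4]          # engine size, cylinders
print(round(predict(theta, car), 2))  # 195.88
```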

---

## 📉 Model Evaluation

Residual error for one record:

$$
\text{Residual}_i = y_i - \hat{y}_i
$$

The **mean squared error (MSE)** averages the squared residuals over all \( m \) records:

$$
\text{MSE} = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2
$$

Objective: find the coefficients \( \theta \) that **minimize the MSE**; these are the best-fit parameters.
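A direct transcription of the residual and MSE formulas into Python, as a minimal sketch; the function name and the observed/predicted values are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean of the squared residuals (y_i - y_hat_i)^2 over all records."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    residuals = [y - y_hat for y, y_hat in zip(y_true, y_pred)]
    return sum(r * r for r in residuals) / len(residuals)


# Hypothetical observed vs. predicted values for three records.
# Residuals are 5, -2 and 0, so MSE = (25 + 4 + 0) / 3, about 9.667.
print(mse([200.0, 210.0, 190.0], [195.0, 212.0, 190.0]))
```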

---

## 🧠 Advanced: Optimization with Gradient Descent

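- Start from initial coefficients (random values or zeros)
- Repeatedly move each coefficient a small step, set by the learning rate, in the direction opposite to the gradient of the MSE
- Useful when the closed-form OLS solution is expensive: inverting \( X^\top X \) costs roughly cubic time in the number of features, and stochastic or mini-batch variants can process very large datasets in chunks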
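A minimal batch gradient-descent sketch for the MSE objective, in plain Python. The learning rate `lr`, the epoch count, the zero initialization, and the sample data are illustrative assumptions; on real data, features usually need to be scaled to comparable ranges for a fixed learning rate to converge.

```python
def gradient_descent(features, targets, lr=0.1, epochs=2000):
    """Batch gradient descent on the MSE; returns [theta_0, theta_1, ..., theta_n]."""
    X = [[1.0] + list(row) for row in features]  # prepend the bias column of ones
    m, p = len(X), len(X[0])
    theta = [0.0] * p  # zero start; random initial values would also work
    for _ in range(epochs):
        # Prediction error (y_hat_i - y_i) for every record under the current theta.
        errors = [sum(t * x for t, x in zip(theta, row)) - y
                  for row, y in zip(X, targets)]
        # Partial derivative of the MSE w.r.t. theta_j: (2/m) * sum_i error_i * x_ij.
        grad = [2.0 / m * sum(e * row[j] for e, row in zip(errors, X))
                for j in range(p)]
        # Step against the gradient, scaled by the learning rate.
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical noise-free data generated from y = 3 + 2*x1 - 1*x2.
xs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
ys = [3 + 2 * a - b for a, b in xs]
print([round(t, 3) for t in gradient_descent(xs, ys)])  # [3.0, 2.0, -1.0]
```

Each epoch uses every record (batch gradient descent); the stochastic and mini-batch variants mentioned above would compute the gradient on one record or a small subset per step instead.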

---

## ⚖️ Simple vs. Multiple Linear Regression

| Aspect                | Simple Linear Regression | Multiple Linear Regression |
|-----------------------|--------------------------|-----------------------------|
| # of Features         | 1                        | 2 or more                   |
| Fitted shape          | Line                     | Plane or hyperplane         |
| Interpretability      | Very simple              | More complex                |
| Flexibility           | Limited                  | Higher modeling power       |

---

## ⚠️ Pitfalls of Multiple Linear Regression

- **Overfitting**: Too many features relative to the number of records let the model fit noise in the training data instead of generalizing.
- **Collinearity**: Strongly correlated features make coefficient estimates unstable and hard to interpret.
- **Impossible Scenarios in What-If Analysis**: When features are correlated, changing one while holding the others fixed can describe a combination that never occurs in practice.
- **Outliers**: Because errors are squared, a few extreme points can pull the coefficients strongly and degrade accuracy.
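One simple way to screen for the collinearity pitfall is to check pairwise Pearson correlations between features. This is a sketch under stated assumptions: the 0.9 threshold, the helper names, and the feature values are hypothetical. Pairwise checks also miss collinearity that involves three or more features, for which variance inflation factors (VIFs) are a common complement.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlated_pairs(columns, threshold=0.9):
    """List feature pairs whose absolute correlation meets or exceeds the threshold."""
    names = list(columns)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) >= threshold:
                flagged.append((first, second, round(r, 3)))
    return flagged


# Hypothetical feature columns for five cars.
columns = {
    "engine_size": [1.6, 2.0, 2.4, 3.5, 5.0],
    "cylinders": [4, 4, 4, 6, 8],
    "model_year": [2015, 2018, 2016, 2019, 2017],
}
print(correlated_pairs(columns))  # flags engine_size vs. cylinders (r close to 0.98)
```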

---

## 💡 Variable Selection Tips

To build a good MLR model:
- Avoid redundant (correlated) variables
- Use features that are **understood**, **controllable**, and **strongly correlated** with the target
- Encode categorical variables:
  - **Binary** (e.g. transmission type): 0 = Manual, 1 = Automatic
  - **Multi-class**: Use one-hot encoding, i.e. one 0/1 indicator column per category
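Both encodings can be sketched in plain Python. The function names, the `drop_first` option, and the fuel-type labels are hypothetical examples, not part of any particular library.

```python
def encode_binary(value, positive="Automatic"):
    """Map a two-level category to 0/1 (here Manual -> 0, Automatic -> 1)."""
    return 1 if value == positive else 0


def one_hot(values, drop_first=False):
    """Return (column_names, rows) with one 0/1 indicator column per category."""
    categories = sorted(set(values))  # fixed, reproducible column order
    if drop_first:
        # Optional: drop one level so the indicators cannot sum to the intercept's
        # column of ones (perfect collinearity, the "dummy variable trap").
        categories = categories[1:]
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


print([encode_binary(t) for t in ["Manual", "Automatic", "Manual"]])  # [0, 1, 0]

# Hypothetical fuel-type column with three categories.
names, rows = one_hot(["Gasoline", "Diesel", "Ethanol", "Diesel"])
print(names)  # ['Diesel', 'Ethanol', 'Gasoline']
print(rows)   # [[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 0, 0]]
```

Sorting the categories keeps the column order stable between runs; dropping one level is only needed when the model also has an intercept column.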

---

## 📦 Real-World Applications

| Domain         | Example Use |
|----------------|-------------|
| **Education**  | Predict exam scores using time spent studying, attendance, anxiety levels, etc. |
| **Healthcare** | Estimate changes in blood pressure from BMI, age, and lifestyle |
| **Environment**| Predict CO₂ emissions using engine size, cylinders, and fuel consumption |
| **Finance**    | Forecast revenue using multiple economic indicators |

---

## ✅ Summary

- **MLR** is a powerful, interpretable method for predicting a continuous variable using multiple inputs.
- It's flexible but must be used with care to avoid **overfitting** and **collinearity issues**.
- Common estimation methods: **Ordinary Least Squares** and **Gradient Descent**.

