# Multiple Linear Regression (MLR)

## 1. Definition & Goal

Extension of Simple Linear Regression that uses **two or more independent variables** to predict a dependent variable ($Y$).

### The Hypothesis (Equation)
Instead of a line, we fit a plane (with two predictors) or, more generally, a **hyperplane** to the data.

$$y = \beta_0 + \beta_1x_1 + \beta_2x_2 + ... + \beta_kx_k + \epsilon$$

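* **Goal:** Find the linear combination of inputs ($x_1, \dots, x_k$) that best predicts $Y$ by minimizing the sum of squared errors. Here $\beta_0$ is the intercept, $\beta_1, \dots, \beta_k$ are the weights, and $\epsilon$ is the error term.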
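
As a minimal sketch of the hypothesis, a prediction is the intercept plus the dot product of the feature values and their weights. The coefficient and feature values below are made up for illustration only.

```python
import numpy as np

# Hypothetical coefficients (not fitted from any data)
beta_0 = 3.0                   # intercept
beta = np.array([2.0, -1.0])   # weights for x1 and x2

# One hypothetical observation: x1 = 4, x2 = 5
x = np.array([4.0, 5.0])

# y_hat = beta_0 + beta_1 * x1 + beta_2 * x2 = 3 + 8 - 5
y_hat = beta_0 + x @ beta
print("Prediction:", y_hat)
```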



---

## 2. Calculating Parameters (The Normal Equation)

For MLR, the coefficients can be found in closed form with linear algebra (matrix operations), in one step and without iterative optimization.

### The Equation
$$\beta = (X^T X)^{-1} X^T Y$$

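* **$X$**: Design matrix of features with a leading column of ones for the intercept $\beta_0$ (dimensions: $n \times (k+1)$).
* **$Y$**: Vector of target values (dimensions: $n \times 1$).
* **$\beta$**: Vector of coefficients (weights), including the intercept (dimensions: $(k+1) \times 1$).
* **$X^T$**: Transpose of $X$.
* **$^{-1}$**: Inverse of the matrix. It exists only when the columns of $X$ are linearly independent.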
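
The formula follows from minimizing the sum of squared residuals. A short derivation, assuming the columns of $X$ are linearly independent so that $X^T X$ is invertible ($J$ is a name introduced here for the squared-error cost):

$$J(\beta) = (Y - X\beta)^T (Y - X\beta)$$

$$\nabla_\beta J = -2X^T(Y - X\beta) = 0 \;\Rightarrow\; X^T X \beta = X^T Y \;\Rightarrow\; \beta = (X^T X)^{-1} X^T Y$$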

---

## 3. Correlation vs. Regression

It is vital to distinguish between relationship strength and prediction.

| Metric | Purpose |
| :--- | :--- |
| **Correlation** ($\rho$) | Measures the **strength and direction** of a linear relationship (range: -1 to 1). Does not imply causation. |
| **Regression** | Uses the relationship to **predict** $Y$ from $X$ and quantifies the impact (e.g., "Increasing $X$ by 1 increases $Y$ by 5"). |
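
A small sketch of the difference, using NumPy and a made-up five-point dataset (an assumption for illustration). Rescaling $Y$ leaves the correlation unchanged but multiplies the slope, which suggests the two quantities answer different questions.

```python
import numpy as np

# Hypothetical single-feature data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.0, 6.0, 7.0, 10.0, 11.0])

# Correlation: unitless strength and direction of the linear relationship
r = np.corrcoef(x, y)[0, 1]

# Regression: slope is the expected change in y per unit change in x
slope, intercept = np.polyfit(x, y, 1)

# Express y in different units (multiply by 10)
y_scaled = 10 * y
r_scaled = np.corrcoef(x, y_scaled)[0, 1]
slope_scaled, _ = np.polyfit(x, y_scaled, 1)

print("r:", r, "| r after rescaling y:", r_scaled)
print("slope:", slope, "| slope after rescaling y:", slope_scaled)
```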

---

## 4. The Critical Problem: Multicollinearity

Occurs when independent variables are **highly correlated with each other** (e.g., "Age" and "Year Born").

### Why is it a problem?
1.  **Unstable Coefficients:** Small changes in data lead to wild swings in weights ($\beta$).
2.  **Loss of Interpretability:** You can't tell which feature is actually driving the prediction.
3.  **Overfitting:** The model fits the noise, not the signal.

### Detection: VIF (Variance Inflation Factor)
* **Rule of Thumb:**
    * $VIF = 1$: The feature is uncorrelated with the other features.
    * $VIF > 5$: High multicollinearity (Warning).
    * $VIF > 10$: Severe multicollinearity (Fix immediately).
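
The VIF of feature $j$ is $VIF_j = \frac{1}{1 - R_j^2}$, where $R_j^2$ is the $R^2$ obtained by regressing feature $j$ on all the other features. The sketch below computes it with plain NumPy; the helper name `vif` is hypothetical, and the small feature matrix is only for illustration.

```python
import numpy as np

def vif(X):
    """Return the variance inflation factor of every column of X."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    factors = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress feature j on the remaining features plus an intercept
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        residuals = target - A @ coef
        ss_res = residuals @ residuals
        ss_tot = np.sum((target - target.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        factors[j] = 1.0 / (1.0 - r2)
    return factors

X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]])
vif_values = vif(X)
print("VIF per feature:", vif_values)
```

With only two features, both VIFs are equal because each $R_j^2$ is simply the squared correlation between the two columns.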

### Solutions
1.  **Remove:** Drop one of the correlated features.
2.  **PCA (Principal Component Analysis):** Combine features into new uncorrelated components.
3.  **Regularization:** Use Lasso (L1) or Ridge (L2) regression to penalize large weights.
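
To illustrate the regularization option, here is a rough sketch of closed-form Ridge regression, $\beta = (X^T X + \lambda I)^{-1} X^T Y$, applied to two nearly identical features. The synthetic data, the seed, and the penalty `lam = 1.0` are arbitrary assumptions, and the data is centered so that the intercept is not penalized.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Two almost perfectly correlated features (synthetic data)
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)

X = np.column_stack([x1, x2])
Xc = X - X.mean(axis=0)   # center features
yc = y - y.mean()         # center target

def ridge_coefficients(Xc, yc, lam):
    """Solve (X^T X + lam * I) beta = X^T y; lam = 0 gives plain least squares."""
    k = Xc.shape[1]
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(k), Xc.T @ yc)

beta_ols = ridge_coefficients(Xc, yc, lam=0.0)
beta_ridge = ridge_coefficients(Xc, yc, lam=1.0)

print("OLS coefficients:  ", beta_ols)
print("Ridge coefficients:", beta_ridge)
```

The Ridge weights are never larger in norm than the least-squares weights, and here the penalty tends to split the effect roughly evenly between the two near-duplicate features.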

---

## 5. Handling Categorical Data: Dummy Variables

Regression models work on numbers, not text (e.g., "Male", "Female", "France"), so categorical values must be converted to numbers.

### Method: One-Hot Encoding
Create one binary column (0 or 1) per category.

**Example:**
* **Original:** Feature `Gender` $\rightarrow$ {Male, Female}
* **Converted:**
    * `Is_Male` = 1 (if Male), 0 (if Female)
    * `Is_Female` = 0 (if Male), 1 (if Female)

> **The Dummy Variable Trap:**
> When the model includes an intercept, you must drop **one** column to avoid perfect multicollinearity (because `Is_Male` + `Is_Female` = 1, which duplicates the intercept column).
> * **Rule:** If you have $C$ categories, use $C-1$ dummy variables.
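
The trap can be seen directly in the rank of the design matrix. The following NumPy sketch uses a made-up `gender` column: with an intercept and both dummy columns, the matrix has three columns but only rank 2, so $X^T X$ cannot be inverted.

```python
import numpy as np

# Hypothetical categorical feature
gender = np.array(["Male", "Female", "Female", "Male", "Female"])

# One-hot encode: one column per category (sorted: Female, Male)
categories = np.unique(gender)
one_hot = (gender[:, None] == categories[None, :]).astype(float)

ones = np.ones((len(gender), 1))  # intercept column

# Trap: intercept + ALL dummy columns (Is_Female + Is_Male equals the intercept)
X_trap = np.hstack([ones, one_hot])

# Fix: drop the first dummy column, keeping C - 1 = 1 dummy variable
X_ok = np.hstack([ones, one_hot[:, 1:]])

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)
print("Trap: columns =", X_trap.shape[1], "rank =", rank_trap)
print("Fix:  columns =", X_ok.shape[1], "rank =", rank_ok)
```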

---

## 6. Pre-processing: Feature Scaling

Since MLR combines features measured on very different scales (e.g., "Salary" in thousands vs. "Age" in years), scaling improves numerical stability. It matters most when the model is fitted with regularization or gradient descent.

### A. Standardization (Z-Score)
Centers data around 0 with a standard deviation of 1.
$$x' = \frac{x - \mu}{\sigma}$$
* **Best for:** When data follows a Gaussian (Bell curve) distribution.

### B. MinMax Scaling (Normalization)
Squishes data between 0 and 1.
$$x' = \frac{x - \min(x)}{\max(x) - \min(x)}$$
* **Best for:** Neural Networks or when data does not follow a normal distribution.
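
Both formulas are one-liners in NumPy. The sketch below applies them column by column to a made-up salary/age matrix. It uses the population standard deviation (`ddof=0`), which is an assumption about which $\sigma$ is meant.

```python
import numpy as np

# Hypothetical data: columns are salary and age
X = np.array([
    [30000.0, 25.0],
    [50000.0, 35.0],
    [70000.0, 45.0],
    [90000.0, 55.0],
])

# A. Standardization: subtract the column mean, divide by the column std
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_std = (X - mu) / sigma

# B. Min-max scaling: map each column onto [0, 1]
x_min = X.min(axis=0)
x_max = X.max(axis=0)
X_minmax = (X - x_min) / (x_max - x_min)

print("Standardized:\n", X_std)
print("Min-max scaled:\n", X_minmax)
```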

---

## 7. Performance Metrics (Evaluation)

### A. Adjusted $R^2$ (The Upgrade)
Standard $R^2$ **never decreases** when you add a new feature, even if that feature is junk. Adjusted $R^2$ fixes this.

$$R^2_{adj} = 1 - (1-R^2) \frac{n-1}{n-k-1}$$

* $n$: Number of samples.
* $k$: Number of predictors.
* **Logic:** It penalizes the score if you add useless variables. If adding a feature makes $R^2_{adj}$ drop, consider removing that feature.
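
A quick sketch of the formula as a function. The $R^2$ values, sample size, and feature counts are hypothetical numbers chosen to show a case where $R^2$ rises slightly after adding a feature while adjusted $R^2$ falls.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n samples and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical model with 5 predictors on 50 samples
before = adjusted_r2(r2=0.900, n=50, k=5)

# Adding a 6th predictor nudges R^2 up only slightly
after = adjusted_r2(r2=0.901, n=50, k=6)

print("Adjusted R^2 before:", round(before, 4))
print("Adjusted R^2 after: ", round(after, 4))
```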

### B. F-Statistic (Global Test)
Tests if the **entire model** is statistically significant.
* **Null Hypothesis ($H_0$):** All slope coefficients are zero ($\beta_1 = \beta_2 = \dots = \beta_k = 0$). The intercept is not part of the test.
* **Interpretation:** A high F-statistic (with low p-value) means *at least one* variable is related to $Y$.
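
For reference, the statistic can be written in terms of $R^2$ as $F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}$. The sketch below computes it with NumPy on a tiny five-row dataset used purely for illustration. The p-value, which comes from an F distribution with $(k, n-k-1)$ degrees of freedom, is not computed here.

```python
import numpy as np

X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
y = np.array([5, 6, 7, 10, 11], dtype=float)
n, k = X.shape

# Fit by least squares with an intercept column
X_b = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X_b, y, rcond=None)
y_hat = X_b @ beta

# R^2 from residual and total sums of squares
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

# F-statistic for H0: all slope coefficients are zero
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
print("R^2:", r2)
print("F-statistic:", f_stat)
```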

### C. P-Values (Individual Test)
Tests if a **specific feature** is significant.
* **$p < 0.05$:** Feature is statistically significant at the conventional 5% level (Keep it).
* **$p \geq 0.05$:** No significant evidence that the feature matters; it may be noise (Consider removing it).
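
Each p-value is derived from a t-statistic, $t_j = \beta_j / SE(\beta_j)$. The sketch below computes the standard errors and t-statistics with NumPy on a tiny five-row dataset. Turning them into p-values requires the Student t distribution with $n-k-1$ degrees of freedom (for example via `scipy.stats.t.sf`), which is left out to keep the block NumPy-only. With so few samples the statistics are illustrative at best.

```python
import numpy as np

X = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
y = np.array([5, 6, 7, 10, 11], dtype=float)
n, k = X.shape

# Design matrix with intercept column
X_b = np.column_stack([np.ones(n), X])
xtx_inv = np.linalg.inv(X_b.T @ X_b)
beta = xtx_inv @ X_b.T @ y

# Unbiased estimate of the error variance
residuals = y - X_b @ beta
sigma2 = residuals @ residuals / (n - k - 1)

# Standard error and t-statistic of each coefficient (b0, b1, b2)
std_err = np.sqrt(sigma2 * np.diag(xtx_inv))
t_stats = beta / std_err

print("Coefficients:", beta)
print("Std. errors: ", std_err)
print("t-statistics:", t_stats)
```

A large absolute t-statistic corresponds to a small p-value; here the first feature has a much larger one than the second.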

Fitting the model with scikit-learn's `LinearRegression`:

```python
from sklearn.linear_model import LinearRegression
import numpy as np

# Features (X): one row per sample, one column per predictor (x1, x2)
X = np.array([
    [1, 2],
    [2, 1],
    [3, 4],
    [4, 3],
    [5, 5]
])

# Target (Y)
Y = np.array([5, 6, 7, 10, 11])

# Create model (fits an intercept by default)
model = LinearRegression()

# Train model
model.fit(X, Y)

print("Coefficients (b1, b2):", model.coef_)  # Slopes
print("Intercept (b0):", model.intercept_)

# Predict for new values (x1=6, x2=2)
new_data = np.array([[6, 2]])
prediction = model.predict(new_data)
print("Prediction:", prediction)
```

Output:

```
Coefficients (b1, b2): [ 1.77777778 -0.22222222]
Intercept (b0): 3.1333333333333346
Prediction: [13.35555556]
```


The same fit computed directly from the normal equation with NumPy:

```python
import numpy as np

# Dataset
X = np.array([
    [1, 2],
    [2, 1],
    [3, 4],
    [4, 3],
    [5, 5]
])

Y = np.array([5, 6, 7, 10, 11]).reshape(-1, 1)

# Add bias column (intercept): a leading column of ones
X_b = np.c_[np.ones((X.shape[0], 1)), X]

# Normal Equation: beta = (X^T X)^-1 X^T Y
# (explicit inverse shown for clarity; np.linalg.solve or np.linalg.lstsq
# is numerically safer on larger or ill-conditioned problems)
beta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(Y)

print("Intercept (b0):", beta[0][0])
print("Coefficients (b1, b2):", beta[1:].flatten())

# Predict
new_data = np.array([1, 6, 2]).reshape(1, -1)  # 1 (for intercept), x1=6, x2=2
prediction = new_data.dot(beta)
print("Prediction:", prediction)
```

Output:

```
Intercept (b0): 3.133333333333321
Coefficients (b1, b2): [ 1.77777778 -0.22222222]
Prediction: [[13.35555556]]
```
