# 🚀 Regression Models Encyclopedia: Which One Should I Choose?

In Data Science, there is no "Single Best Model." You must choose the right weapon for the battle. Here is your complete guide.

---

## 1. The Basic Models (No Punishment)
These models simply minimize the **Error** (the sum of squared residuals). They put no limit on how large the coefficients can grow.

### 🔹 1. Simple Linear Regression
* **The Logic:** "One Input, One Output." Draws a straight line.
* **The Formula:** $$y = b_0 + b_1x$$
* **Real World Example:** Predicting a child's **Height** based ONLY on their **Age**.
* **When to use:** When you have a single feature and the relationship looks straight.
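
A minimal sketch of fitting $y = b_0 + b_1x$ with scikit-learn's `LinearRegression`. The ages and heights are hypothetical numbers placed exactly on the line height = 75 + 6 × age, so the fitted intercept ($b_0$) and slope ($b_1$) should come back as those two values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: ages in years, heights in cm lying exactly on height = 75 + 6 * age
age = np.array([2, 4, 6, 8, 10, 12]).reshape(-1, 1)  # scikit-learn expects a 2-D feature matrix
height = 75 + 6 * age.ravel()

model = LinearRegression().fit(age, height)
b0, b1 = model.intercept_, model.coef_[0]  # intercept (b0) and slope (b1)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print("Predicted height at age 7:", model.predict(np.array([[7]]))[0])
```

With real measurements the points scatter around the line, and the fit picks the $b_0$ and $b_1$ that minimize the squared residuals.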

### 🔹 2. Multiple Linear Regression
* **The Logic:** "Many Inputs, One Output."
* **The Formula:** $$y = b_0 + b_1x_1 + b_2x_2 + ... + b_nx_n$$
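* **Scaling Rule:** Plain least squares does not need **StandardScaler**: rescaling a feature only rescales its coefficient, so big numbers (Salary: 50,000) do not actually dominate small numbers (Age: 30), and the predictions stay the same. Scaling is still worth doing when you want to compare coefficients, and it becomes essential for the penalized models below (Ridge, Lasso), where it decides how hard each feature is punished.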
* **Real World Example:** Predicting **House Price** based on *Size, Location, Floor, and Age*.

### 🔹 3. Polynomial Regression
* **The Logic:** "The world is curved, not flat." We use powers ($x^2, x^3$) to capture curves.
* **The Math:** It is actually Multiple Regression, but with engineered features:
  $$y = b_0 + b_1x + b_2x^2 + b_3x^3$$
* **The Risk:** If you choose a high Degree (e.g., 10), the model is likely to **Overfit** (memorize the noise).
* **Real World Example:** **Temperature Prediction**. It is cold in the morning, hot at noon, and cold at night. A straight line fails here; you need an upside-down "U" curve (degree 2).
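
A minimal sketch of the temperature example, assuming a made-up daily curve that peaks at 14:00. `PolynomialFeatures` builds the $x$ and $x^2$ columns, and an ordinary `LinearRegression` fits them, which is exactly the "Multiple Regression with engineered features" idea above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical temperatures: cool at night and in the morning, warmest at 14:00
hours = np.arange(0, 24).reshape(-1, 1)
temp = 30 - 0.1 * (hours.ravel() - 14) ** 2

line = LinearRegression().fit(hours, temp)
curve = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression()).fit(hours, temp)

r2_line = r2_score(temp, line.predict(hours))
r2_curve = r2_score(temp, curve.predict(hours))
print("straight line R^2: ", round(r2_line, 3))
print("degree-2 curve R^2:", round(r2_curve, 3))
```

Because the made-up data is an exact parabola, the degree-2 curve should score an R² of essentially 1.0, while the straight line scores far lower.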

---

## 2. The Regularization Models (The Punishers)
If your model is **Overfitting** (memorizing), we add a "Penalty" on the size of the coefficients to the cost the model minimizes.

> **💡 The Philosophy:** "You can make a model, but you are NOT allowed to use huge coefficients (weights). Keep it simple."

### 🔸 4. Ridge Regression (L2 Regularization) - "The Volume Control"
* **How it works:** It shrinks the coefficients (weights) towards zero, but they never reach exactly zero.
* **The Penalty Math:** Adds $$\lambda \sum (\text{slope})^2$$ to the Error equation.
    * Because we **square** the slope, large numbers (like 10,000) get a **huge** penalty. The model is forced to pick small numbers.
* **Best For:**
    * When you want to keep ALL your features.
    * When you have **Multicollinearity** (duplicate features like "Age in Years" and "Age in Months").
* **Analogy:** In a loud meeting, you don't kick anyone out, but you turn down everyone's volume.
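
A small sketch of both Ridge claims, using hypothetical data with the duplicate pair from the bullet above: "Age in Years" and "Age in Months" (= 12 × years). After standardization the two columns are identical, so Ridge should split the weight evenly between them instead of favoring one, and both weights should shrink as `alpha` (scikit-learn's name for $\lambda$) grows. The alpha values are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
years = rng.uniform(20, 60, 100)
months = 12 * years                      # perfect duplicate of 'years'
X = StandardScaler().fit_transform(np.column_stack([years, months]))
y = 3.0 * years + rng.normal(0, 5, 100)  # hypothetical target

weights = {}
for alpha in [0.1, 10, 1000]:
    w = Ridge(alpha=alpha).fit(X, y).coef_
    weights[alpha] = w
    print(f"alpha={alpha:>6}: years={w[0]:8.3f}, months={w[1]:8.3f}")
```

Everyone stays in the meeting (no weight hits zero); the volume knob `alpha` just turns both down together.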

### 🔸 5. Lasso Regression (L1 Regularization) - "The Mute Button"
* **How it works:** It can shrink coefficients all the way to **exactly Zero (0)**, which effectively deletes those features.
* **The Penalty Math:** Adds $$\lambda \sum |\text{slope}|$$ to the Error equation. (Absolute value).
* **Best For:**
    * **Feature Selection:** When you have 1000 columns but suspect only 10 are important. Lasso can zero out most of the other 990 (how many depends on `alpha`).
* **Analogy:** In a meeting, you kick out the useless people. Only the important speakers remain.
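
A minimal sketch of Lasso as feature selection, on hypothetical data where only the first 2 of 10 columns affect the target (all columns already share a common scale). With the illustrative `alpha=0.5`, Lasso should set the 8 useless weights to exactly 0.0, while Ridge with the same alpha keeps all 10 weights non-zero, just small.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(42)
X = rng.standard_normal((200, 10))                        # 10 columns on a common scale
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.5, 200)  # only columns 0 and 1 matter

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=0.5).fit(X, y)

print("Lasso weights:", np.round(lasso.coef_, 3))
print("Ridge weights:", np.round(ridge.coef_, 3))
print("Columns kept by Lasso:", np.flatnonzero(lasso.coef_))
```

The two surviving Lasso weights are also pulled down (roughly by `alpha` each on this scale), which is the price paid for the automatic clean-up.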

---

## 3. 🏆 The Grand Decision Table

| Model | Complexity | Risk of Overfitting | Math Penalty | Handling "Trash" Columns | Best Use Case |
| :--- | :--- | :--- | :--- | :--- | :--- |
| **Simple/Multiple** | Low | Low (may underfit) | None | Uses everything | The starting point for any project. |
| **Polynomial** | High | **Very High** | None | Uses everything | When data is clearly **curved** (Non-linear). |
| **Ridge (L2)** | Medium | Low (with a tuned alpha) | **Squared ($Slope^2$)** | Shrinks them (0.01) | When data is noisy or features are correlated. |
| **Lasso (L1)** | Medium | Low (with a tuned alpha) | **Absolute ($\lvert Slope \rvert$)** | **Deletes them (0.0)** | When you have too many columns (Feature Selection). |

---

### 🐍 Python Tip: Which code to write?

* **Standard:** `LinearRegression()`
* **If Overfitting:** `Ridge(alpha=1.0)`
* **If Too Many Columns:** `Lasso(alpha=0.1)`

*(Note: `alpha` is scikit-learn's name for $\lambda$, the strength of the punishment. Higher alpha = simpler model.)*
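
Because Ridge and Lasso punish coefficients by their size, one common way to apply the tip is to bundle `StandardScaler` with each model in a pipeline, so the scaling is learned from the training data only. This is a sketch on hypothetical `make_regression` data; the alpha values are the ones from the tip, not tuned.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 20 columns, only 5 of which carry signal
X, y = make_regression(n_samples=300, n_features=20, n_informative=5, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Standard": LinearRegression(),
    "If overfitting": Ridge(alpha=1.0),
    "If too many columns": Lasso(alpha=0.1),
}
results = {}
for name, model in models.items():
    pipe = make_pipeline(StandardScaler(), model).fit(X_train, y_train)
    n_used = int((pipe[-1].coef_ != 0).sum())  # how many columns still have a weight
    score = pipe.score(X_test, y_test)          # R^2 on held-out data
    results[name] = (score, n_used)
    print(f"{name:20s} test R^2 = {score:.3f}, non-zero weights = {n_used}")
```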

# 🛡️ Regularization: Ridge (L2) vs. Lasso (L1)

In Machine Learning, our biggest fear is **Overfitting** (Memorizing).
When a model overfits, it learns the "noise" in the training data and fails on new data.
To fix this, we use **Regularization** (Punishment). We punish the model if it tries to use large coefficients (weights).

---

## 1. The "Strict Teacher" Analogy 👩‍🏫

Imagine your features (columns) are students in a classroom. They are all shouting answers to predict the target.

### 🔹 Ridge Regression (L2) - "The Volume Control"
* **Scenario:** The teacher says, *"No one is allowed to shout. Lower your voice!"*
* **Action:** If a student screams (Weight = 10,000), the teacher forces them to whisper (Weight = 5).
* **Result:** Everyone stays in the class, but they are quiet. No single student dominates the decision.
* **Technical:** It **shrinks** coefficients towards zero, but they never reach exactly zero.

### 🔸 Lasso Regression (L1) - "The Silencer"
* **Scenario:** The teacher says, *"If you are whispering useless things, GET OUT."*
* **Action:** If a student is not helpful enough to pay the "tax," the teacher kicks them out.
* **Result:** Only the smart students remain. The room is less crowded.
* **Technical:** It can set coefficients to **exactly Zero (0.0)**. It performs **Feature Selection**.

---

## 2. Ridge Regression (L2 Regularization)

**Formula:** $$\text{Cost} = \text{Error} + \lambda \sum (\text{slope})^2$$

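* **Why Square ($^2$)?** Squaring a large number makes it HUGE ($10^2 = 100$), so large weights get a massive penalty and the model is forced to pick small numbers. Squaring a small number makes it tiny ($0.1^2 = 0.01$), so the penalty's pull fades as a weight approaches zero, which is why Ridge weights rarely land at exactly zero.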
* **When to use?**
    * When you want to keep **ALL** features (e.g., all pixels in an image).
    * When you have **Multicollinearity** (many duplicate features).
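
The Ridge cost above has a closed-form minimizer. Taking Error as the sum of squared residuals and leaving out the intercept, setting the gradient of $\lVert y - Xw \rVert^2 + \lambda \lVert w \rVert^2$ to zero gives

$$w = (X^\top X + \lambda I)^{-1} X^\top y$$

The $\lambda I$ term keeps the matrix invertible for any $\lambda > 0$, even when two columns are exact duplicates, which is one way to see why Ridge copes with multicollinearity. The sketch below checks the formula against scikit-learn's `Ridge` (with `fit_intercept=False`, so both solve the same problem) on random hypothetical data.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = X @ np.array([4.0, -2.0, 0.5]) + rng.normal(0, 0.1, 50)  # hypothetical target
lam = 10.0

# Closed form: solve (X^T X + lambda * I) w = X^T y
w_formula = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# scikit-learn's Ridge minimizes ||y - Xw||^2 + alpha * ||w||^2, so alpha plays the role of lambda
w_sklearn = Ridge(alpha=lam, fit_intercept=False).fit(X, y).coef_

print("formula:", np.round(w_formula, 4))
print("sklearn:", np.round(w_sklearn, 4))
```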

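```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Toy data so the snippet runs on its own (swap in your own X_train, y_train)
X, y = make_regression(n_samples=200, n_features=5, noise=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
# alpha is the "Strictness" of the teacher.
# alpha=0: Linear Regression (No rules).
# alpha=100: Very strict (Tiny weights).
ridge_model = Ridge(alpha=1.0)
ridge_model.fit(X_train, y_train)
```

## 3. Lasso Regression (L1 Regularization)

**Formula:** $$\text{Cost} = \text{Error} + \lambda \sum |\text{slope}|$$

In the experiment below, `Price` depends strongly on `Rooms`, weakly on `Age`, and not at all on `Shoe_Size`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# 1. Create Data
# Target (Price) depends strongly on 'Rooms', weakly on 'Age', not at all on 'Shoe_Size'
np.random.seed(42)
df = pd.DataFrame({
    'Rooms': np.random.randint(1, 6, 100),
    'Age': np.random.randint(0, 50, 100),
    'Shoe_Size': np.random.randint(35, 45, 100)  # pure noise: unrelated to Price
})
# Price formula
df['Price'] = (df['Rooms'] * 10000) - (df['Age'] * 100) + np.random.normal(0, 1000, 100)

# 2. Check Correlation (How related are they?)
print("--- Correlation with Price ---")
print(df.corr()['Price'])

# 3. Run Lasso
X = df[['Rooms', 'Age', 'Shoe_Size']]
y = df['Price']

lasso = Lasso(alpha=20)  # the "tax", applied here to UNSCALED features
lasso.fit(X, y)

print("\n--- Lasso Weights ---")
print(f"Rooms: {lasso.coef_[0]:.2f}")  # High weight
print(f"Age:   {lasso.coef_[1]:.2f}")  # Small negative weight
print(f"Shoes: {lasso.coef_[2]:.2f}")  # Hoped for zero; see the output
```

```
--- Correlation with Price ---
Rooms        0.992150
Age         -0.126291
Shoe_Size    0.014847
Price        1.000000
Name: Price, dtype: float64

--- Lasso Weights ---
Rooms: 10021.85
Age:   -100.27
Shoes: 49.02
```

Notice that `Shoe_Size` did **not** reach zero (49.02), even though it is pure noise. scikit-learn's `Lasso` minimizes $\frac{1}{2n}\sum(\text{error})^2 + \alpha \sum |\text{slope}|$, and on unscaled data the tax a column actually pays on its weight is roughly `alpha` divided by that column's variance. `Shoe_Size` has a variance of only about 8, so `alpha=20` shaves roughly 2.4 off its weight, while noise in a Price measured in tens of thousands pushes that weight to around 50. Scale the features first (or raise `alpha`) to get the "kicked out" behavior.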
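
A minimal sketch of the scaling fix, assuming the same toy data (same seed, same order of random draws): standardize the columns first so that one `alpha` taxes every feature equally. The value `alpha=500` is an illustrative choice, not from the original run; on the standardized scale it is measured in Price units per one standard deviation of a feature.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Rebuild the same toy data as the unscaled experiment (same seed, same draw order)
np.random.seed(42)
df = pd.DataFrame({
    'Rooms': np.random.randint(1, 6, 100),
    'Age': np.random.randint(0, 50, 100),
    'Shoe_Size': np.random.randint(35, 45, 100),
})
df['Price'] = (df['Rooms'] * 10000) - (df['Age'] * 100) + np.random.normal(0, 1000, 100)

X = df[['Rooms', 'Age', 'Shoe_Size']]
y = df['Price']

# Every column now has mean 0 and standard deviation 1,
# so the same alpha "taxes" every feature equally.
X_scaled = StandardScaler().fit_transform(X)

# alpha=500 is an illustrative (assumed) value, not tuned
lasso_scaled = Lasso(alpha=500)
lasso_scaled.fit(X_scaled, y)

for name, w in zip(X.columns, lasso_scaled.coef_):
    print(f"{name:10s} {w:10.2f}")
```

On this scale `Shoe_Size` should be kicked out (weight exactly 0.0) while `Rooms` and `Age` keep non-zero weights. The surviving weights are per standard deviation, so they are not directly comparable with the unscaled weights above.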


| Question | Ridge (L2) | Lasso (L1) |
| :--- | :--- | :--- |
| Does it reach zero? | No (stays small, e.g. 0.0001) | Yes (0.0) |
| Feature selection? | No | Yes (deletes columns) |
| Best for? | Preventing overfitting while keeping all the data | Cleaning complex datasets with many useless columns |
| Math penalty | Squared ($x^2$) | Absolute ($\lvert x \rvert$) |