## 1. Purpose of Linear Regression
Linear regression models the relationship between a **target variable** and one or more **predictor variables**.

**Why we use it:**
- Predict a continuous value (price, demand, revenue).
- Understand how each feature influences the target.
- Establish a baseline model before using more complex methods.

**Key idea:** Fit the straight line (or, with several predictors, the hyperplane) that minimizes the sum of squared differences between predicted and observed values.

In [None]:
import numpy as np
from sklearn.linear_model import LinearRegression

# Simple example: y = 2x
# scikit-learn expects X as a 2D array of shape (n_samples, n_features),
# so each sample is written as its own one-element row
X = np.array([[1],[2],[3],[4]])
y = np.array([2,4,6,8])

# Fit linear regression
model = LinearRegression().fit(X,y)

# coef_ is slope, intercept_ is the constant term
model.coef_, model.intercept_

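The same API extends to several predictors. The sketch below uses made-up, noise-free data (the names `X_demo`, `y_demo`, and `demo_model` are illustrative, not from the notebook above) so that the recovered coefficients can be checked against the formula that generated the target.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: two predictors, target built as y = 3*x1 + 1*x2 + 5 (no noise)
X_demo = np.array([[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]], dtype=float)
y_demo = 3 * X_demo[:, 0] + 1 * X_demo[:, 1] + 5

demo_model = LinearRegression().fit(X_demo, y_demo)

# One coefficient per predictor, plus a single intercept
print(demo_model.coef_, demo_model.intercept_)

# Predict the target for a new observation (x1=6, x2=2)
pred = demo_model.predict(np.array([[6.0, 2.0]]))
print(pred)
```

Because the target was generated without noise, the fit should recover the generating coefficients up to floating-point error. With real data the estimates would only approximate any underlying relationship.
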
## 2. Collinearity
Collinearity means **two predictors are strongly linearly related**, so one can be approximated from the other.

**Examples:**
- Engine size and horsepower increase together.
- Height and weight often increase together.

Collinearity concerns a **pair of variables**. Even one strongly correlated pair can make coefficient estimates unstable; when the dependence involves several predictors at once, it is called **multicollinearity**.

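One way to quantify how closely two variables move together is the Pearson correlation coefficient. The sketch below uses synthetic values; the variable names and the relationship between them are assumptions chosen to mirror the engine example, not real car data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, illustrative data: horsepower is built as a noisy multiple of engine size
engine_size = rng.uniform(1.0, 5.0, size=200)
horsepower = 60 * engine_size + rng.normal(0, 10, size=200)

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r
r = np.corrcoef(engine_size, horsepower)[0, 1]
print(f"Pearson r = {r:.3f}")
```

A value of `r` near 1 or -1 indicates a strong linear relationship; a value near 0 indicates little linear association.
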
## 3. Multicollinearity
Multicollinearity occurs when **several predictors are highly correlated with one another**, or more generally when one predictor can be closely approximated by a linear combination of the others.

**Why it's a problem:**
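- Coefficient estimates become unstable, with large standard errors.
- Small changes in the data can shift coefficients widely, sometimes flipping their signs.
- The contribution of each individual feature becomes hard to interpret.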

**Example:**
- `engine_size`, `horsepower`, and `torque` are all strongly related.
- The model cannot cleanly separate how much each one contributes to the target.

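A common diagnostic for this situation is the variance inflation factor (VIF): regress each predictor on all the others and compute `1 / (1 - R^2)`. The helper `vif` below is a hand-rolled sketch, not a library function, and the data is synthetic. A frequently quoted rule of thumb treats values above roughly 5 to 10 as a warning sign, though the cutoff is a convention rather than a hard rule.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def vif(X):
    """Variance inflation factor for each column of X (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        # R^2 from regressing column j on all remaining columns
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)

rng = np.random.default_rng(0)
n = 300

# Synthetic predictors: the first three are built to depend on each other
engine_size = rng.normal(size=n)
horsepower = engine_size + 0.1 * rng.normal(size=n)
torque = 0.5 * engine_size + 0.5 * horsepower + 0.1 * rng.normal(size=n)
mileage = rng.normal(size=n)  # unrelated to the other three

X_cars = np.column_stack([engine_size, horsepower, torque, mileage])
vifs = vif(X_cars)

for name, value in zip(["engine_size", "horsepower", "torque", "mileage"], vifs):
    print(f"{name:12s} VIF = {value:.1f}")
```

In this setup the three related predictors should show large VIFs while the independent one stays close to 1.
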
## 4. Correlation Plots
Correlation plots help visualize relationships between variables.

**Use cases:**
- Detect collinearity and multicollinearity.
- Understand feature relationships.
- Quick EDA before modeling.

In [None]:
import seaborn as sns
import pandas as pd

# Load a simple dataset
df = sns.load_dataset('iris').drop(columns=['species'])

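# Heatmap shows the pairwise Pearson correlation between features
# Values close to 1 or -1 indicate strong linear relationships
# vmin/vmax pin the color scale so that 0 sits at the neutral midpoint
sns.heatmap(df.corr(), annot=True, cmap='coolwarm', vmin=-1, vmax=1)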

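With many features a heatmap gets hard to read, so it can help to list the strongly correlated pairs directly from the correlation matrix. The sketch below uses a small synthetic frame (`df_demo`) and an arbitrary threshold of 0.8; both are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200
a = rng.normal(size=n)

# Synthetic frame: b is nearly a copy of a, c is independent of both
df_demo = pd.DataFrame({
    "a": a,
    "b": a + 0.1 * rng.normal(size=n),
    "c": rng.normal(size=n),
})

corr = df_demo.corr().abs()

# Keep only the strict upper triangle so each pair appears once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack()

# Arbitrary cutoff for "strong" correlation; tune it for the problem at hand
strong_pairs = pairs[pairs > 0.8]
print(strong_pairs)
```

The result is a Series indexed by feature pairs, which is easier to scan or filter than a large heatmap.
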
## 5. Remedies for Multicollinearity
- Remove one of the correlated features.
- Combine features (averages, ratios).
- Use **PCA** to reduce dimensionality.
- Use **regularization** (Ridge, LASSO, Elastic Net).
- Increase dataset size if possible.

## 6. Principal Component Analysis (PCA)
PCA reduces dimensionality by transforming features into **uncorrelated components**.

**Why PCA helps:**
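- Removes multicollinearity, because the components are uncorrelated by construction.
- Can reduce noise when low-variance components are dropped.
- Can speed up training by lowering the number of inputs.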

**Key idea:** Rotate the feature space to new axes (principal components) that capture maximum variance.

In [None]:
from sklearn.decomposition import PCA

# Fit PCA on iris numeric features
# Note: PCA is scale-sensitive; features on very different scales are usually standardized first
pca = PCA(n_components=2)
pca.fit(df)

# Shows how much variance each component captures
pca.explained_variance_ratio_

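The claim that principal components are uncorrelated can be checked numerically. The sketch below builds synthetic data with two strongly correlated columns, standardizes it, and compares the correlation matrix before and after the transform. All names here (`X_corr`, `pca_demo`, `Z`) are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
base = rng.normal(size=n)

# Columns 0 and 1 share the same underlying signal; column 2 is independent
X_corr = np.column_stack([
    base + 0.2 * rng.normal(size=n),
    base + 0.2 * rng.normal(size=n),
    rng.normal(size=n),
])

# Standardize first so that no feature dominates because of its scale
X_scaled = StandardScaler().fit_transform(X_corr)

pca_demo = PCA(n_components=3)
Z = pca_demo.fit_transform(X_scaled)

corr_before = np.corrcoef(X_scaled, rowvar=False)
corr_after = np.corrcoef(Z, rowvar=False)

print(np.round(corr_before, 2))  # large off-diagonal entry for columns 0 and 1
print(np.round(corr_after, 2))   # off-diagonal entries near zero
print(pca_demo.explained_variance_ratio_)
```

Keeping all three components only rotates the data. Dimensionality is reduced when the trailing components, which carry the least variance, are dropped.
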
## 7. Bias vs Variance
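**Bias:** Error from overly simple assumptions; the model misses real structure in the data (underfitting).

**Variance:** Error from sensitivity to the particular training sample; the model fits noise along with the signal (overfitting).

**Goal:** Find the balance: a model that generalizes well.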

**High bias model:** Linear regression with too few features.

**High variance model:** Polynomial regression with high degree.

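The trade-off can be observed by fitting polynomials of increasing degree to the same noisy data and comparing training error with held-out error. The sketch below uses a synthetic sine curve; the degrees (1, 4, 12), sample size, and noise level are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Synthetic data: a sine curve plus Gaussian noise
x = rng.uniform(-1, 1, size=60)
y_noisy = np.sin(3 * x) + 0.2 * rng.normal(size=60)
X_curve = x.reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(
    X_curve, y_noisy, test_size=0.5, random_state=0
)

results = {}
for degree in (1, 4, 12):
    m = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    m.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, m.predict(X_train))
    test_mse = mean_squared_error(y_test, m.predict(X_test))
    results[degree] = (train_mse, test_mse)
    print(f"degree={degree:2d}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
```

Training error can only go down as the degree grows, since each model contains the previous one. Test error typically falls at first and then rises once the model starts fitting noise, which is the signature of high variance.
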
## 8. Regularization
Regularization reduces **overfitting** by adding a penalty on the size of the coefficients to the loss function the model minimizes.

### Why overfitting happens
- The model learns noise instead of patterns.
- Coefficients can become extremely large.
- Model performs well on training data but poorly on new data.

### What regularization does
- Shrinks coefficients toward zero.
- Reduces model complexity.
- Improves generalization.
- Helps with multicollinearity.

### Real‑world examples
**1. Car price prediction**
- Features like horsepower, engine size, and torque are correlated.
- Regularization stabilizes coefficients.

**2. Marketing budget optimization**
- Many channels overlap (Google Ads, Meta Ads, Display Ads).
- Regularization prevents over‑crediting one channel.

**3. Healthcare risk scoring**
- BMI, weight, waist size are correlated.
- Regularization avoids unstable predictions.

**Types of regularization:**
- **L1 (LASSO)** → feature selection.
- **L2 (Ridge)** → coefficient shrinkage.
- **Elastic Net** → combination of L1 + L2.

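Written as objectives, the three variants differ only in the penalty term added to the squared-error loss. The notation below is an assumption for illustration: $\beta_j$ are the coefficients, $\hat{y}_i$ the predictions, and $\lambda$, $\lambda_1$, $\lambda_2 \ge 0$ the penalty strengths. scikit-learn uses a differently scaled parametrization (`alpha` and `l1_ratio`), so the numbers do not map one-to-one.

$$
\text{LASSO (L1):}\quad \min_{\beta}\ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} |\beta_j|
$$

$$
\text{Ridge (L2):}\quad \min_{\beta}\ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
$$

$$
\text{Elastic Net:}\quad \min_{\beta}\ \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^2
$$

Setting the penalty strength to zero recovers ordinary least squares; increasing it pulls the coefficients further toward zero.
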
## 9. LASSO Regression (L1 Regularization)
LASSO adds an **L1 penalty** (the sum of the absolute values of the coefficients).

### Key properties
- Can shrink some coefficients **exactly to zero**.
- Performs **automatic feature selection**.
- Useful when you have many features.

**Marketing attribution**: Out of 50 campaign features, LASSO might keep only the 5–10 most strongly associated with conversions and set the rest to zero.

In [None]:
from sklearn.linear_model import Lasso

# LASSO applies L1 penalty
# This encourages some coefficients to become exactly zero
lasso = Lasso(alpha=0.1)
lasso.fit(X, y)

# Coefficients after L1 shrinkage
# If any value becomes 0, LASSO removed that feature
lasso.coef_

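Feature selection only becomes visible with several features. The sketch below generates synthetic data in which only the first two of six features affect the target, then checks which coefficients LASSO keeps. The data, the `alpha=0.5` setting, and the names `X_many` and `lasso_demo` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_samples, n_features = 200, 6
X_many = rng.normal(size=(n_samples, n_features))

# Only features 0 and 1 influence the target in this synthetic setup
y_many = 3.0 * X_many[:, 0] - 2.0 * X_many[:, 1] + 0.1 * rng.normal(size=n_samples)

lasso_demo = Lasso(alpha=0.5).fit(X_many, y_many)

# Indices of the features whose coefficients were not shrunk to exactly zero
selected = np.flatnonzero(lasso_demo.coef_)
print("coefficients:", np.round(lasso_demo.coef_, 3))
print("selected features:", selected)
```

The surviving coefficients are also shrunk relative to the values used to generate the data, which is the price paid for the sparsity. In practice `alpha` is usually chosen by cross-validation, for example with `LassoCV`.
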
## 10. Ridge Regression (L2 Regularization)
Ridge adds an **L2 penalty** (the sum of the squared coefficients).

### Key properties
- Shrinks coefficients but generally does **not set them exactly to zero**.
- A good fit when most features carry some signal.
- Well suited to **multicollinearity**.

**Car price prediction**: Engine size, horsepower, and torque are correlated. Ridge stabilizes the model by distributing weights more evenly.

In [None]:
from sklearn.linear_model import Ridge

# Ridge applies L2 penalty
# This reduces coefficient magnitude but keeps all features
ridge = Ridge(alpha=1.0)
ridge.fit(X, y)

ridge.coef_

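The stabilizing effect shows up most clearly on nearly duplicate predictors. The sketch below compares ordinary least squares with Ridge on two synthetic, almost identical columns; the data and the names `X_col`, `ols`, and `ridge_demo` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 100

# x2 is almost a copy of x1, so the two columns are nearly collinear
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
X_col = np.column_stack([x1, x2])
y_col = x1 + x2 + 0.5 * rng.normal(size=n)

ols = LinearRegression().fit(X_col, y_col)
ridge_demo = Ridge(alpha=1.0).fit(X_col, y_col)

print("OLS coefficients:  ", np.round(ols.coef_, 3))
print("Ridge coefficients:", np.round(ridge_demo.coef_, 3))
```

With data like this, the OLS coefficients can land far from the generating values of 1 and 1 because the data barely distinguishes the two columns. Ridge tends to split the weight almost evenly between them.
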
## 11. Elastic Net (L1 + L2 Regularization)
Elastic Net combines **L1 (LASSO)** and **L2 (Ridge)** penalties.

### Key properties
- Performs feature selection (L1).
- Stabilizes coefficients (L2).
- Works well when features are correlated.

**Customer churn prediction**: Many behavioral features overlap. Elastic Net selects important ones while keeping the model stable.

In [None]:
from sklearn.linear_model import ElasticNet

# Elastic Net blends L1 and L2 penalties
# l1_ratio controls the mix: 0 = pure Ridge, 1 = pure LASSO
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
enet.fit(X, y)

enet.coef_

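To see how `l1_ratio` changes the result, the sketch below fits Elastic Net at several mixes on synthetic data with two correlated signal features and three noise features. The data and the chosen values of `alpha` and `l1_ratio` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)

# Two correlated copies of the same signal, followed by three pure-noise features
X_mix = np.column_stack([
    signal + 0.1 * rng.normal(size=n),
    signal + 0.1 * rng.normal(size=n),
    rng.normal(size=(n, 3)),
])
y_mix = 2.0 * signal + 0.1 * rng.normal(size=n)

coefs = {}
for l1_ratio in (0.1, 0.5, 1.0):
    model_en = ElasticNet(alpha=0.1, l1_ratio=l1_ratio).fit(X_mix, y_mix)
    coefs[l1_ratio] = model_en.coef_
    print(f"l1_ratio={l1_ratio}: {np.round(model_en.coef_, 3)}")

# With l1_ratio=1.0 the penalty is pure L1, which is the LASSO objective
lasso_ref = Lasso(alpha=0.1).fit(X_mix, y_mix)
```

Comparing the printed rows gives a feel for how the mix affects both sparsity and the way weight is shared between the two correlated columns. As with LASSO, these hyperparameters are usually tuned by cross-validation, for example with `ElasticNetCV`.
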
## 12. Polynomial Regression
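Polynomial regression models **non-linear relationships** by adding polynomial terms of the predictors (x², x³, ...) as extra features. The model is still linear in its coefficients, so it is fitted the same way as ordinary linear regression.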

**Example:** Instead of fitting a straight line, fit a curve.

**Risk:** High‑degree polynomials can overfit (high variance).

In [None]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline

# Polynomial regression of degree 2
# Adds x^2 term automatically
poly_model = Pipeline([
    ('poly', PolynomialFeatures(degree=2)),
    ('lr', LinearRegression())
])

poly_model.fit(X, y)

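# Coefficients for the columns [1, x, x^2]
# The entry for the constant column is 0 because LinearRegression fits its own intercept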
poly_model.named_steps['lr'].coef_
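
The benefit of the extra term is easier to see on data that is actually curved. The sketch below fits a straight line and a degree-2 model to noise-free points generated from `y = x^2 + 1`; the data and the names `X_quad`, `linear_fit`, and `quad_fit` are illustrative assumptions. `include_bias=False` drops the constant column, since `LinearRegression` already fits an intercept.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noise-free quadratic data on a symmetric grid: y = x^2 + 1
X_quad = np.arange(-3, 4, dtype=float).reshape(-1, 1)
y_quad = X_quad.ravel() ** 2 + 1.0

linear_fit = LinearRegression().fit(X_quad, y_quad)
quad_fit = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
).fit(X_quad, y_quad)

# R^2 on the training data for each model
r2_linear = linear_fit.score(X_quad, y_quad)
r2_quad = quad_fit.score(X_quad, y_quad)

# make_pipeline names each step after its lowercased class name
quad_coefs = quad_fit.named_steps["linearregression"].coef_
quad_intercept = quad_fit.named_steps["linearregression"].intercept_

print(f"linear R^2 = {r2_linear:.3f}, quadratic R^2 = {r2_quad:.3f}")
print("coefficients for [x, x^2]:", np.round(quad_coefs, 3), "intercept:", round(quad_intercept, 3))
```

On this symmetric grid the straight line explains essentially none of the variation, while the degree-2 model recovers the generating curve. With noisy data the gap would be less extreme.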