# 🎯 **Bias-Variance Tradeoff: A Comprehensive Guide**

---

## 📖 **Table of Contents**

1. [Introduction & Intuition](#introduction--intuition)
2. [Mathematical Foundation](#mathematical-foundation)
3. [Bias-Variance Decomposition](#bias-variance-decomposition)
4. [Understanding Each Component](#understanding-each-component)
5. [The Tradeoff Mechanism](#the-tradeoff-mechanism)
6. [Model Complexity & Tradeoff](#model-complexity--tradeoff)
7. [Visual Analysis & Examples](#visual-analysis--examples)
8. [Practical Implications](#practical-implications)
9. [Strategies to Manage Tradeoff](#strategies-to-manage-tradeoff)
10. [Real-World Applications](#real-world-applications)
11. [Advanced Concepts](#advanced-concepts)
12. [Implementation & Simulation](#implementation--simulation)

---

## 🚀 **Introduction & Intuition**

### **What is the Bias-Variance Tradeoff?**

The bias-variance tradeoff is a **fundamental concept** in machine learning that describes the relationship between three sources of error in predictive models:

1. 🎯 **Bias** - Error from oversimplified assumptions
2. 📊 **Variance** - Error from sensitivity to training data variations  
3. 🔊 **Irreducible Error** - Inherent noise in the data

### **The Core Dilemma**

**For a fixed model family and dataset size, you usually cannot minimize both bias and variance at the same time.**

- 📉 **Reducing bias** → Often increases variance
- 📈 **Reducing variance** → Often increases bias
- 🎯 **Goal:** Find the optimal balance

### **Everyday Analogy: Archery Target**

Think of prediction as **shooting arrows at a target**:

🎯 **Low Bias, Low Variance:** Arrows clustered around bullseye ✅  
🎯 **Low Bias, High Variance:** Arrows scattered around bullseye ⚠️  
🎯 **High Bias, Low Variance:** Arrows clustered away from bullseye ⚠️  
🎯 **High Bias, High Variance:** Arrows scattered away from bullseye ❌

### **Machine Learning Context**

- **Target** = True function $f(x)$
- **Arrows** = Model predictions $\hat{f}(x)$
- **Bullseye** = Perfect prediction
- **Clustering** = Low variance
- **Accuracy** = Low bias

---

## 🧮 **Mathematical Foundation**

### **Problem Setup**

Given:
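- **Data-generating process:** $y = f(x) + \epsilon$, where $f$ is the true function and $\epsilon$ is zero-mean noise with variance $\sigma^2$, e.g. $\epsilon \sim \mathcal{N}(0, \sigma^2)$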
- **Training dataset:** $\mathcal{D} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$
- **Learned function:** $\hat{f}_{\mathcal{D}}(x)$ (depends on dataset $\mathcal{D}$)

### **Expected Test Error**

For a new point $(x, y)$, the **expected squared error** is:

$$\boxed{E[(y - \hat{f}(x))^2]}$$

Where the expectation is taken over:
- **Different training sets** $\mathcal{D}$
- **Noise** in the target $y$

### **Key Insight**

The error depends on:
1. **Randomness in training data** → Different $\mathcal{D}$ give different $\hat{f}$
2. **Noise in target variable** → $y$ has inherent randomness
3. **Model's assumptions** → How well model approximates truth

---

## 📈 **Bias-Variance Decomposition**

### **The Fundamental Decomposition**

The expected test error can be decomposed as:

$$\boxed{E[(y - \hat{f}(x))^2] = \text{Bias}^2[\hat{f}(x)] + \text{Var}[\hat{f}(x)] + \sigma^2}$$

### **Detailed Mathematical Derivation**

Let's derive this step by step:

**Step 1:** Expand the squared error
$$E[(y - \hat{f}(x))^2] = E[y^2] - 2E[y\hat{f}(x)] + E[\hat{f}(x)^2]$$

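**Step 2:** Use $y = f(x) + \epsilon$ where $E[\epsilon] = 0$, $\text{Var}[\epsilon] = \sigma^2$, and the noise $\epsilon$ at the new point is independent of $\hat{f}(x)$
$$E[y] = f(x), \quad E[y^2] = f(x)^2 + \sigma^2, \quad E[y\hat{f}(x)] = f(x)E[\hat{f}(x)]$$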

**Step 3:** Substitute and rearrange
$$E[(y - \hat{f}(x))^2] = f(x)^2 + \sigma^2 - 2f(x)E[\hat{f}(x)] + E[\hat{f}(x)^2]$$

**Step 4:** Add and subtract $E[\hat{f}(x)]^2$
$$= f(x)^2 - 2f(x)E[\hat{f}(x)] + E[\hat{f}(x)]^2 + E[\hat{f}(x)^2] - E[\hat{f}(x)]^2 + \sigma^2$$

**Step 5:** Recognize bias and variance terms
$$\boxed{= \underbrace{(f(x) - E[\hat{f}(x)])^2}_{\text{Bias}^2} + \underbrace{E[\hat{f}(x)^2] - E[\hat{f}(x)]^2}_{\text{Variance}} + \underbrace{\sigma^2}_{\text{Irreducible Error}}}$$

### **Component Definitions**

#### **Bias**
$$\boxed{\text{Bias}[\hat{f}(x)] = E[\hat{f}(x)] - f(x)}$$

**Measures:** How far the **average prediction** is from the **true value**

#### **Variance**
$$\boxed{\text{Var}[\hat{f}(x)] = E[(\hat{f}(x) - E[\hat{f}(x)])^2]}$$

**Measures:** How much predictions **vary** across different training sets

#### **Irreducible Error**
$$\boxed{\sigma^2 = \text{Var}[\epsilon]}$$

**Measures:** **Inherent noise** in the data that cannot be reduced
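
The decomposition can be checked numerically. The sketch below is a minimal Monte Carlo experiment under stated assumptions: the true function `f`, the noise level `sigma`, the test point `x0`, and the choice of a straight-line model are all hypothetical values picked for illustration. It estimates the expected squared error directly and compares it with the sum of the three components.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical true function, chosen only for illustration
    return np.sin(2 * x)

sigma = 0.5       # noise standard deviation (assumed)
x0 = 0.8          # fixed test point (assumed)
n_train = 30
n_sims = 10000

preds = np.empty(n_sims)
sq_errors = np.empty(n_sims)
for i in range(n_sims):
    # Draw a fresh training set D and fit a straight line to it
    x = rng.uniform(-1, 1, n_train)
    y = f(x) + rng.normal(0, sigma, n_train)
    slope, intercept = np.polyfit(x, y, deg=1)
    preds[i] = slope * x0 + intercept
    # Draw an independent noisy label at the test point
    y0 = f(x0) + rng.normal(0, sigma)
    sq_errors[i] = (y0 - preds[i]) ** 2

bias_sq = (preds.mean() - f(x0)) ** 2   # (E[f_hat(x0)] - f(x0))^2
variance = preds.var()                  # spread of f_hat(x0) across training sets
expected_error = sq_errors.mean()       # direct estimate of E[(y - f_hat(x0))^2]
decomposed = bias_sq + variance + sigma ** 2

print(f"Direct estimate:      {expected_error:.4f}")
print(f"Bias² + Var + sigma²: {decomposed:.4f}")
```

The two printed numbers should agree up to Monte Carlo error, which shrinks as `n_sims` grows.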

---

## 🔍 **Understanding Each Component**

### **1. Bias - The Systematic Error**

#### **Definition & Intuition**
- **Bias** measures **systematic deviation** from truth
- High bias = **Underfitting** = Model too simple
- Low bias = Model captures underlying pattern well

#### **Mathematical Properties**
- $\text{Bias}[\hat{f}(x)] = E[\hat{f}(x)] - f(x)$
- If unbiased: $E[\hat{f}(x)] = f(x)$ ⟹ $\text{Bias} = 0$
- Bias is **deterministic** for a given $x$

#### **Sources of Bias**
1. **Wrong model assumptions** (e.g., assuming linear when truth is nonlinear)
2. **Insufficient model complexity**
3. **Feature limitations**
4. **Algorithmic constraints**

#### **Examples of High Bias Models**
- Linear regression for nonlinear data
- Low-degree polynomials for complex curves
- Naive Bayes with strong independence assumptions
- Logistic regression for complex decision boundaries

### **2. Variance - The Sensitivity Error**

#### **Definition & Intuition**
- **Variance** measures **sensitivity** to training data changes
- High variance = **Overfitting** = Model too flexible
- Low variance = Model stable across different datasets

#### **Mathematical Properties**
- $\text{Var}[\hat{f}(x)] = E[(\hat{f}(x) - E[\hat{f}(x)])^2]$
- Equivalent form: $\text{Var}[\hat{f}(x)] = E[\hat{f}(x)^2] - (E[\hat{f}(x)])^2$
- Variance is always **non-negative**
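
The equivalence of the two variance forms follows from expanding the square. Writing $\mu = E[\hat{f}(x)]$ as shorthand (a symbol introduced here only for brevity):

$$\begin{aligned}
E[(\hat{f}(x) - \mu)^2] &= E[\hat{f}(x)^2] - 2\mu E[\hat{f}(x)] + \mu^2 \\
&= E[\hat{f}(x)^2] - 2\mu^2 + \mu^2 \\
&= E[\hat{f}(x)^2] - (E[\hat{f}(x)])^2
\end{aligned}$$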

#### **Sources of Variance**
1. **Limited training data**
2. **High model complexity**
3. **Noisy features**
4. **Random initialization effects**

#### **Examples of High Variance Models**
- High-degree polynomials
- Deep neural networks (without regularization)
- k-NN with small k
- Decision trees with no pruning

### **3. Irreducible Error - The Fundamental Limit**

#### **Definition & Intuition**
- **Irreducible error** is the **minimum possible error**
- Represents **inherent randomness** in the system
- Cannot be reduced by better models or more data

#### **Mathematical Properties**
- $\sigma^2 = \text{Var}[\epsilon] = E[\epsilon^2]$
- Independent of model choice
- Sets **lower bound** on achievable error

#### **Sources of Irreducible Error**
1. **Measurement noise**
2. **Unobserved variables**
3. **Inherent randomness** in the process

**Note:** Model misspecification is *not* irreducible. It contributes to bias and can be reduced by choosing a better model.

---

## ⚖️ **The Tradeoff Mechanism**

### **Why the Tradeoff Exists**

#### **Simple Models (High Bias, Low Variance)**
- **Make strong assumptions** → High bias
- **Consistent across datasets** → Low variance
- **Example:** Linear regression for nonlinear data

#### **Complex Models (Low Bias, High Variance)**
- **Few assumptions** → Low bias  
- **Highly sensitive** to data → High variance
- **Example:** High-degree polynomial regression

### **Mathematical Relationship**

As model complexity increases:

$$\boxed{\begin{aligned}
\text{Bias} &\searrow \text{ (decreases)} \\
\text{Variance} &\nearrow \text{ (increases)} \\
\text{Total Error} &= \text{Bias}^2 + \text{Variance} + \sigma^2
\end{aligned}}$$

### **The Sweet Spot**

**Optimal complexity** minimizes total error:

$$\boxed{\text{Complexity}^* = \arg\min_{\text{complexity}} [\text{Bias}^2 + \text{Variance} + \sigma^2]}$$

### **Visual Representation**

```
Error
  ↑
  |  \                           /
  |   \       Total Error       /
  |    \___                 ___/
  |        \_______________/
  |
  |  \  Bias²                   / Variance
  |   \___                  ___/
  |       \____        ____/
  |        ____\______/____
  |  _____/                \_____
  |
  |  ------------------------------  σ² (constant)
  |________________________________→ Model Complexity
                  ↑
             Optimal Point
```

---

## 📊 **Model Complexity & Tradeoff**

### **Linear Models**

#### **Simple Linear Regression**
$$\hat{f}(x) = \beta_0 + \beta_1 x$$

- **Bias:** High if true relationship is nonlinear
- **Variance:** Low (only 2 parameters to estimate)

#### **Polynomial Regression**
$$\hat{f}(x) = \sum_{i=0}^{d} \beta_i x^i$$

**As degree $d$ increases:**
- **Bias:** $\downarrow$ Can fit more complex curves
- **Variance:** $\uparrow$ More parameters, more sensitivity

### **k-Nearest Neighbors**

$$\hat{f}(x) = \frac{1}{k} \sum_{i \in N_k(x)} y_i$$

**As $k$ decreases:**
- **Bias:** $\downarrow$ More local, flexible predictions  
- **Variance:** $\uparrow$ More sensitive to individual points

**As $k$ increases:**
- **Bias:** $\uparrow$ More global, averaging effect
- **Variance:** $\downarrow$ Stable across datasets
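
The effect of $k$ can be seen in a small simulation. This is a sketch under assumptions: the true function `f`, the noise level, the query point `x0`, and the hand-written 1-D `knn_predict` helper are all illustrative choices, not part of any library.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical true function, chosen only for illustration
    return np.sin(3 * x)

def knn_predict(x_train, y_train, x0, k):
    # Average the labels of the k training points closest to x0
    nearest = np.argsort(np.abs(x_train - x0))[:k]
    return y_train[nearest].mean()

sigma, n_train, n_sims, x0 = 0.3, 100, 2000, 0.5   # assumed settings

results = {}
for k in (1, 5, 50):
    preds = np.empty(n_sims)
    for i in range(n_sims):
        # Fresh training set for every simulation
        x = rng.uniform(-1, 1, n_train)
        y = f(x) + rng.normal(0, sigma, n_train)
        preds[i] = knn_predict(x, y, x0, k)
    results[k] = {
        "bias_sq": (preds.mean() - f(x0)) ** 2,
        "variance": preds.var(),
    }
    print(f"k={k:2d}  Bias²={results[k]['bias_sq']:.4f}  "
          f"Variance={results[k]['variance']:.4f}")
```

With these settings, small $k$ should show near-zero bias but variance close to the noise variance, while large $k$ averages over a wide neighborhood and trades variance for bias.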

### **Decision Trees**

**Tree depth controls complexity:**

**Shallow trees (low complexity):**
- **Bias:** $\uparrow$ Cannot capture detailed patterns
- **Variance:** $\downarrow$ Consistent splits

**Deep trees (high complexity):**
- **Bias:** $\downarrow$ Can fit training data perfectly
- **Variance:** $\uparrow$ Sensitive to small data changes

### **Neural Networks**

**Network capacity affects tradeoff:**

**Few parameters:**
- **Bias:** $\uparrow$ Limited expressiveness
- **Variance:** $\downarrow$ Stable training

**Many parameters:**
- **Bias:** $\downarrow$ Universal approximation
- **Variance:** $\uparrow$ Sensitive to initialization, data

### **Ensemble Methods**

**Key insight:** Ensembles can reduce variance without increasing bias!

#### **Bagging (Bootstrap Aggregating)**
$$\hat{f}_{\text{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}_b(x)$$

- **Bias:** Approximately the same as the base model
- **Variance:** $\downarrow$ Reduced by averaging
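
How much averaging helps depends on how correlated the base models are. As a worked formula under simplifying assumptions (each $\hat{f}_b(x)$ has the same variance $s^2$ and every pair has the same correlation $\rho$; both symbols are introduced here for illustration):

$$\begin{aligned}
\text{Var}\left[\frac{1}{B}\sum_{b=1}^{B}\hat{f}_b(x)\right]
&= \frac{1}{B^2}\left[B s^2 + B(B-1)\rho s^2\right] \\
&= \rho s^2 + \frac{1-\rho}{B} s^2
\end{aligned}$$

The second term vanishes as $B$ grows, but the first does not. Models trained on bootstrap samples of the same data are positively correlated, so bagging reduces variance without driving it to zero.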

#### **Boosting**
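Fits models sequentially, each one correcting the errors of the ensemble so far. This primarily reduces bias, and can increase variance if run for too many rounds.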

---

## 📈 **Visual Analysis & Examples**

### **Polynomial Regression Example**

Consider data generated as $y = f(x) + \epsilon$ with true function $f(x) = 1.5x - 0.3x^2 + 0.1x^3$:

#### **Degree 1 (Linear): High Bias, Low Variance**
```python
# Systematic underfit - curved data, straight line model
# Predictions consistent across different datasets
# High bias: Cannot capture curvature
# Low variance: Nearly the same line regardless of training set
```

#### **Degree 3 (Optimal): Balanced**
```python
# Captures true cubic relationship well
# Moderate sensitivity to training data
# Low bias: Matches true function form
# Moderate variance: Some variation across datasets
```

#### **Degree 10 (Overfit): Low Bias, High Variance**
```python
# Fits training data very closely, including its noise
# Can differ wildly across training sets
# Low bias: Flexible enough to contain the true cubic
# High variance: Very different curves per dataset
```
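
The three cases above can be reproduced with a short simulation. This is a sketch, not a definitive experiment: the noise level, sample size, input range, and number of repetitions are assumptions chosen for illustration, and only the cubic $f$ comes from the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Cubic true function from the example above
    return 1.5 * x - 0.3 * x**2 + 0.1 * x**3

sigma, n_train, n_sims = 1.0, 30, 500      # assumed settings
x_test = np.linspace(-2.5, 2.5, 50)

summary = {}
for degree in (1, 3, 10):
    preds = np.empty((n_sims, x_test.size))
    for i in range(n_sims):
        # New noisy training set each time
        x = rng.uniform(-3, 3, n_train)
        y = f(x) + rng.normal(0, sigma, n_train)
        coeffs = np.polyfit(x, y, deg=degree)
        preds[i] = np.polyval(coeffs, x_test)
    # Average squared bias and variance over the test grid
    bias_sq = np.mean((preds.mean(axis=0) - f(x_test)) ** 2)
    variance = np.mean(preds.var(axis=0))
    summary[degree] = (bias_sq, variance)
    print(f"Degree {degree:2d}: Bias²={bias_sq:.4f}  Variance={variance:.4f}")
```

Degree 1 should show the largest bias, degree 10 the largest variance, and degree 3 a low value for both.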

### **Learning Curves Analysis**

#### **High Bias Scenario**
```
Error
  ↑
  |  Training Error ————————————————
  |                               
  |  Validation Error ——————————————
  |                             
  |________________________→ Training Set Size
  
  Both errors high and close together
  More data doesn't help much
```

#### **High Variance Scenario**
```
Error
  ↑
  |  Validation Error \
  |                    \
  |                     \______
  |  Training Error ——————————————
  |________________________→ Training Set Size
  
  Large gap between training and validation
  More data helps significantly
```

#### **Well-Balanced Model**
```
Error
  ↑
  |  Validation Error \
  |                    \___
  |  Training Error ————————\___
  |________________________→ Training Set Size
  
  Errors converge to reasonable level
  Good generalization achieved
```
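
Curves like these can be computed with scikit-learn's `learning_curve`. The sketch below assumes a synthetic dataset (a sine wave plus noise) and a deliberately too-simple linear model, so it should resemble the high-bias scenario. Scores are returned as negative MSE, hence the sign flip.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

rng = np.random.default_rng(0)

# Hypothetical data: nonlinear signal plus noise
X = rng.uniform(-1, 1, size=(400, 1))
y = np.sin(4 * X.ravel()) + rng.normal(0, 0.2, 400)

sizes, train_scores, val_scores = learning_curve(
    LinearRegression(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
    scoring="neg_mean_squared_error",
)

# Average over CV folds and convert negative MSE back to MSE
train_mse = -train_scores.mean(axis=1)
val_mse = -val_scores.mean(axis=1)

for n, tr, va in zip(sizes, train_mse, val_mse):
    print(f"n={n:3d}  train MSE={tr:.3f}  validation MSE={va:.3f}")
```

Swapping in a more flexible estimator should move the curves toward the high-variance or well-balanced pictures.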

---

## 🎯 **Practical Implications**

### **Model Selection Guidelines**

#### **When to Choose Simple Models (Accept Higher Bias)**
- ✅ **Small datasets** - Avoid overfitting
- ✅ **High noise** - Complex models amplify noise
- ✅ **Interpretability needed** - Simple = explainable
- ✅ **Fast inference required** - Simple = fast
- ✅ **Limited computational resources**

#### **When to Choose Complex Models (Accept Higher Variance)**
- ✅ **Large datasets** - Can estimate many parameters
- ✅ **Low noise** - Can afford to be sensitive
- ✅ **Complex underlying patterns** - Need flexibility
- ✅ **Performance critical** - Accuracy over interpretability

### **Diagnostic Tools**

#### **1. Training vs Validation Curves**
```python
def diagnose_bias_variance(train_errors, val_errors,
                           threshold_high=0.5, threshold_small=0.05,
                           threshold_large=0.2):
    """
    Diagnose bias-variance issues from learning curves.

    The threshold defaults are placeholders; set them relative to the
    scale of your error metric.
    """
    final_train_error = train_errors[-1]
    final_val_error = val_errors[-1]
    gap = final_val_error - final_train_error
    
    if final_val_error > threshold_high and gap < threshold_small:
        return "High Bias (Underfitting)"
    elif gap > threshold_large:
        return "High Variance (Overfitting)"
    else:
        return "Well Balanced"
```

#### **2. Cross-Validation**
- **High bias:** Both train and validation scores low
- **High variance:** Large gap between train and validation scores
- **Balanced:** Good scores with small gap

#### **3. Bootstrap Analysis**
Generate multiple models on bootstrap samples:
- **High variance:** Predictions vary significantly
- **Low variance:** Predictions consistent

### **Data Size Considerations**

#### **Small Data Regime**
- **Variance dominates** - Limited samples to estimate parameters
- **Prefer simpler models** - Linear, low-degree polynomials
- **Regularization crucial** - Prevent overfitting

#### **Large Data Regime**  
- **Bias becomes more important** - Can afford complex models
- **Complex models viable** - Neural networks, high-degree polynomials  
- **Computational efficiency matters** - Training cost scales

### **Feature Engineering Impact**

#### **Too Few Features**
- **High bias** - Cannot represent complex relationships
- **Low variance** - Few parameters to estimate

#### **Too Many Features**
- **Low bias** - Rich representation possible
- **High variance** - Many parameters, curse of dimensionality

#### **Optimal Feature Set**
- **Balanced representation** - Captures important patterns
- **Dimensionality appropriate** for data size

---

## 🛠️ **Strategies to Manage Tradeoff**

### **1. Regularization Techniques**

#### **Ridge Regression (L2)**
$$\min_{\boldsymbol{\beta}} \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|^2 + \lambda \|\boldsymbol{\beta}\|^2$$

**Effect:**
- **Bias:** $\uparrow$ Shrinks coefficients toward zero
- **Variance:** $\downarrow$ Reduces parameter sensitivity
- **Net effect:** Often improves generalization
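
The shrinkage effect can be seen directly from the closed-form ridge solution $\hat{\boldsymbol{\beta}} = (\mathbf{X}^\top\mathbf{X} + \lambda \mathbf{I})^{-1}\mathbf{X}^\top\mathbf{y}$. The sketch below assumes synthetic data with no intercept term; `ridge_fit` is a helper written for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

n, p = 50, 10
X = rng.normal(size=(n, p))
beta_true = rng.normal(size=p)            # hypothetical ground-truth coefficients
y = X @ beta_true + rng.normal(0, 1.0, n)

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam * I)^(-1) X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

lambdas = [0.0, 1.0, 10.0, 100.0]
norms = [np.linalg.norm(ridge_fit(X, y, lam)) for lam in lambdas]

for lam, norm in zip(lambdas, norms):
    print(f"lambda={lam:6.1f}  ||beta||={norm:.3f}")
```

The coefficient norm shrinks as $\lambda$ grows: the fitted coefficients are pulled away from the least-squares solution (more bias) but fluctuate less across training sets (less variance).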

#### **Lasso Regression (L1)**
$$\min_{\boldsymbol{\beta}} \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|^2 + \lambda \|\boldsymbol{\beta}\|_1$$

**Effect:**
- **Bias:** $\uparrow$ Sets some coefficients to exactly zero
- **Variance:** $\downarrow$ Automatic feature selection
- **Net effect:** Sparse, interpretable models
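
The sparsity effect is easy to observe with scikit-learn's `Lasso`. This sketch assumes synthetic data in which only two of ten features matter. Note that scikit-learn scales the squared-error term by $1/(2n)$, so its `alpha` is not numerically identical to the $\lambda$ in the formula above.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

n, p = 200, 10
X = rng.normal(size=(n, p))
# Hypothetical ground truth: only the first two features matter
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, n)

zero_counts = {}
coefs = {}
for alpha in (0.01, 0.1, 1.0):
    model = Lasso(alpha=alpha).fit(X, y)
    coefs[alpha] = model.coef_
    zero_counts[alpha] = int(np.sum(model.coef_ == 0))
    print(f"alpha={alpha:4.2f}  zero coefficients: {zero_counts[alpha]} of {p}")
```

Stronger regularization zeroes out more coefficients, and the surviving coefficients are shrunk toward zero, which is the source of the added bias.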

#### **Elastic Net**
$$\min_{\boldsymbol{\beta}} \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|^2 + \lambda_1 \|\boldsymbol{\beta}\|_1 + \lambda_2 \|\boldsymbol{\beta}\|^2$$

**Effect:** Combines L1 and L2 benefits

### **2. Cross-Validation for Model Selection**

#### **k-Fold Cross-Validation**
```python
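import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold


def select_optimal_complexity(X, y, complexities, make_model, k=5):
    """
    Select model complexity that minimizes mean k-fold CV error.

    X and y are NumPy arrays. make_model(complexity) is a caller-supplied
    factory returning an unfitted estimator with fit/predict methods.
    """
    kfold = KFold(n_splits=k, shuffle=True, random_state=0)
    cv_errors = []
    
    for complexity in complexities:
        fold_errors = []
        
        for train_idx, val_idx in kfold.split(X):
            # Fit on k-1 folds, score on the held-out fold
            model = make_model(complexity)
            model.fit(X[train_idx], y[train_idx])
            error = mean_squared_error(y[val_idx], model.predict(X[val_idx]))
            fold_errors.append(error)
        
        cv_errors.append(np.mean(fold_errors))
    
    optimal_complexity = complexities[np.argmin(cv_errors)]
    return optimal_complexity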
```

### **3. Ensemble Methods**

#### **Bagging (Reduce Variance)**
$$\hat{f}_{\text{bag}}(x) = \frac{1}{B} \sum_{b=1}^{B} \hat{f}_b(x)$$

**Mechanism:**
- Train multiple models on bootstrap samples
- Average predictions reduces variance
- **Bias roughly unchanged, Variance reduced**

#### **Random Forest Example**
```python
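import numpy as np
from sklearn.tree import DecisionTreeRegressor


class BiasVarianceRandomForest:
    def __init__(self, n_trees=100, max_depth=None, max_features="sqrt", seed=None):
        self.n_trees = n_trees
        self.max_depth = max_depth
        # Random feature subset per split: this is what distinguishes a
        # random forest from plain bagging of trees
        self.max_features = max_features
        self.rng = np.random.default_rng(seed)
        self.trees = []
    
    def fit(self, X, y):
        self.trees = []  # reset so repeated fit() calls do not accumulate trees
        for _ in range(self.n_trees):
            # Bootstrap sample: draw len(X) rows with replacement
            bootstrap_indices = self.rng.choice(len(X), size=len(X), replace=True)
            X_boot = X[bootstrap_indices]
            y_boot = y[bootstrap_indices]
            
            # Train tree
            tree = DecisionTreeRegressor(
                max_depth=self.max_depth, max_features=self.max_features
            )
            tree.fit(X_boot, y_boot)
            self.trees.append(tree)
        return self
    
    def predict(self, X):
        # Average predictions (reduces variance)
        predictions = np.array([tree.predict(X) for tree in self.trees])
        return np.mean(predictions, axis=0)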
```

#### **Boosting (Reduce Bias)**
Sequential learning to focus on mistakes:

**AdaBoost Example:**
```python
# Each new model focuses on previously misclassified examples
# Gradually reduces bias by learning complex patterns
# Can increase variance if not carefully controlled
```

### **4. Early Stopping**

For iterative algorithms (neural networks, gradient boosting):

```python
def train_with_early_stopping(model, X_train, y_train, X_val, y_val,
                              max_epochs=100, patience=10):
    # `model` is assumed to expose train_one_epoch, evaluate,
    # save_checkpoint and load_checkpoint (a hypothetical interface)
    best_val_error = float('inf')
    patience_counter = 0
    
    for epoch in range(max_epochs):
        model.train_one_epoch(X_train, y_train)
        
        val_error = model.evaluate(X_val, y_val)
        
        if val_error < best_val_error:
            best_val_error = val_error
            patience_counter = 0
            model.save_checkpoint()
        else:
            patience_counter += 1
            
        if patience_counter >= patience:
            break
    
    # Revert to best model, whether we stopped early or hit max_epochs
    model.load_checkpoint()
    return model
```

### **5. Data Augmentation**

**Increase effective training set size:**
- **Computer Vision:** Rotations, crops, flips
- **NLP:** Paraphrasing, back-translation
- **Time Series:** Jittering, time-warping

**Effect:** More data → Can handle more complex models → Better bias-variance balance

---

## 🌐 **Real-World Applications**

### **Computer Vision**

#### **Image Classification**
- **Simple models:** Linear classifiers on raw pixels
  - **High bias:** Cannot capture complex visual patterns
  - **Low variance:** Stable across datasets
  
- **Complex models:** Deep CNNs
  - **Low bias:** Can learn hierarchical features
  - **High variance:** Sensitive to training data
  
- **Solution:** Transfer learning, data augmentation, regularization

#### **Object Detection**
- **Tradeoff:** Speed vs accuracy
- **Fast models:** Higher bias, lower computational cost
- **Accurate models:** Lower bias, higher computational cost

### **Natural Language Processing**

#### **Text Classification**
- **Bag-of-words:** High bias (ignores word order)
- **Transformers:** Low bias (captures context)
- **Management:** Pre-training, fine-tuning, regularization

#### **Machine Translation**
- **Rule-based:** High bias (fixed rules)
- **Neural:** Low bias (learned patterns)
- **Ensemble:** Combine multiple approaches

### **Time Series Forecasting**

#### **Simple Models**
- **ARIMA:** Assumes linear relationships
- **High bias** for complex patterns
- **Low variance:** Stable parameters

#### **Complex Models**  
- **LSTM/GRU:** Can capture nonlinear patterns
- **Low bias** for complex sequences
- **High variance:** Many parameters

#### **Practical Approach**
```python
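import numpy as np

def ensemble_forecast(models, X):
    """
    Combine simple and complex models.

    models: list of (model, weight) pairs; each model needs a predict method.
    """
    predictions = []
    weights = []
    
    for model, weight in models:
        pred = model.predict(X)
        predictions.append(pred)
        weights.append(weight)
    
    # Weighted averaging reduces variance when the models' errors are not
    # perfectly correlated; the ensemble bias is the weighted average of
    # the members' biases
    ensemble_pred = np.average(predictions, weights=weights, axis=0)
    return ensemble_pred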
```

### **Recommendation Systems**

#### **Collaborative Filtering**
- **Memory-based:** High bias (simple similarity)
- **Matrix factorization:** Balanced bias-variance
- **Deep learning:** Low bias, potential high variance

#### **Hybrid Approaches**
Combine multiple techniques to balance bias-variance effectively.

---

## 🧠 **Advanced Concepts**

### **Bias-Variance in Different Learning Paradigms**

#### **Supervised Learning**
- **Classification:** Bias-variance tradeoff in decision boundaries
- **Regression:** Bias-variance in function approximation

#### **Unsupervised Learning**
- **Clustering:** Bias in cluster assumptions vs variance in cluster assignments
- **Dimensionality Reduction:** Bias in linear assumptions vs variance in embeddings

#### **Reinforcement Learning**
- **Value Functions:** Bias in Bellman approximation vs variance in sampling
- **Policy Learning:** Bias in policy parametrization vs variance in exploration

### **Bayesian Perspective**

#### **Prior Distributions**
- Strong priors → **Higher bias**, **Lower variance**
- Weak priors → **Lower bias**, **Higher variance**
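
A concrete case makes this visible. The sketch below assumes the simplest conjugate setting: $n$ observations from $\mathcal{N}(\theta, s^2)$ with a $\mathcal{N}(m_0, \tau^2)$ prior on $\theta$, where the posterior mean is $w\bar{x} + (1-w)m_0$ with $w = n\tau^2 / (n\tau^2 + s^2)$. All symbols and numeric values here are illustrative assumptions. Treating the posterior mean as an estimator of a fixed $\theta$, its bias and variance have closed forms.

```python
def posterior_mean_bias_variance(theta, prior_mean, prior_std, noise_std, n):
    """Frequentist bias and variance of the Gaussian posterior mean."""
    # Weight placed on the sample mean; the rest goes to the prior mean
    w = n * prior_std**2 / (n * prior_std**2 + noise_std**2)
    bias = (1 - w) * (prior_mean - theta)
    variance = w**2 * noise_std**2 / n
    return bias, variance

# Hypothetical setting: true theta = 2, prior centered at 0
results = {}
for prior_std in (0.1, 1.0, 10.0):
    results[prior_std] = posterior_mean_bias_variance(
        theta=2.0, prior_mean=0.0, prior_std=prior_std, noise_std=1.0, n=10
    )
    bias, variance = results[prior_std]
    print(f"prior_std={prior_std:5.1f}  bias={bias:+.4f}  variance={variance:.4f}")
```

A small `prior_std` (strong prior) pulls the estimate toward the prior mean, giving large bias and tiny variance. A large `prior_std` (weak prior) approaches the sample mean, with negligible bias and variance near `noise_std**2 / n`.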

#### **Posterior Inference**
$$p(\theta | \mathcal{D}) \propto p(\mathcal{D} | \theta) p(\theta)$$

**Bias-Variance decomposition in Bayesian setting:**
$$\boxed{E[(y - \hat{y})^2] = \text{Bias}^2 + \text{Variance} + \text{Noise}}$$

Where expectations are over posterior distributions.

### **Information-Theoretic View**

#### **Model Capacity**
- **VC Dimension:** Measures model complexity
- **Rademacher Complexity:** Generalization bounds

#### **Minimum Description Length (MDL)**
Balance between:
- **Model complexity** (longer description)
- **Data fit** (shorter description of residuals)

### **Online Learning**

#### **Regret Bounds**
$$\text{Regret}_T = \sum_{t=1}^{T} \ell(f_t, (x_t, y_t)) - \min_f \sum_{t=1}^{T} \ell(f, (x_t, y_t))$$

**Bias-variance emerges** in regret analysis through:
- **Approximation error** (bias)
- **Estimation error** (variance)

### **Multi-Task Learning**

#### **Shared Representations**
- **High sharing:** Lower variance, potential higher bias
- **Low sharing:** Higher variance, potentially lower bias

#### **Meta-Learning**
Learn bias-variance tradeoff across tasks:
```python
def meta_learn_complexity(tasks, complexities, evaluate_on_task, train_predictor):
    """
    Learn optimal complexity for new tasks.

    evaluate_on_task(task, complexity) and train_predictor(performances)
    are caller-supplied callables; tasks must be hashable.
    """
    task_performances = {}
    
    for task in tasks:
        for complexity in complexities:
            performance = evaluate_on_task(task, complexity)
            task_performances[(task, complexity)] = performance
    
    # Learn mapping from task features to optimal complexity
    optimal_complexity_predictor = train_predictor(task_performances)
    return optimal_complexity_predictor
```

---

## 💻 **Implementation & Simulation**

### **Bias-Variance Decomposition Simulation**

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline

def bias_variance_decomposition(model_class, X_test, y_test_true, 
                               n_simulations=100, **model_params):
    """
    Empirical bias-variance decomposition
    """
    n_test = len(X_test)
    predictions = np.zeros((n_simulations, n_test))
    
    # Generate multiple training sets and models
    for i in range(n_simulations):
        # Generate noisy training data
        X_train, y_train = generate_training_data()
        
        # Train model
        model = model_class(**model_params)
        model.fit(X_train, y_train)
        
        # Predict on test set
        predictions[i] = model.predict(X_test)
    
    # Compute bias and variance
    mean_predictions = np.mean(predictions, axis=0)
    bias_squared = np.mean((mean_predictions - y_test_true) ** 2)
    variance = np.mean(np.var(predictions, axis=0))
    
    return bias_squared, variance

def generate_training_data(n_samples=100, noise_std=0.3):
    """Generate training data from true function"""
    X = np.random.uniform(-1, 1, n_samples).reshape(-1, 1)
    y_true = true_function(X.ravel())
    y_noisy = y_true + np.random.normal(0, noise_std, n_samples)
    return X, y_noisy

def true_function(x):
    """True underlying function"""
    return 1.5 * x - 0.5 * x**2 + 0.3 * np.sin(15 * x)

# Simulation (seeded so repeated runs print the same numbers)
np.random.seed(0)
X_test = np.linspace(-1, 1, 100).reshape(-1, 1)
y_test_true = true_function(X_test.ravel())

models_to_test = {
    'Linear': (LinearRegression, {}),
    # Pipeline takes its steps via the `steps` keyword, so each params
    # entry must be a dict for the **params call below to work
    'Poly_2': (Pipeline, {'steps': [
        ('poly', PolynomialFeatures(2)),
        ('linear', LinearRegression())
    ]}),
    'Poly_5': (Pipeline, {'steps': [
        ('poly', PolynomialFeatures(5)),
        ('linear', LinearRegression())
    ]}),
    'Poly_15': (Pipeline, {'steps': [
        ('poly', PolynomialFeatures(15)),
        ('linear', LinearRegression())
    ]}),
    'RandomForest': (RandomForestRegressor, {'n_estimators': 100})
}

results = {}
for name, (model_class, params) in models_to_test.items():
    bias_sq, variance = bias_variance_decomposition(
        model_class, X_test, y_test_true, **params
    )
    total_error = bias_sq + variance + 0.3**2  # Add noise variance
    
    results[name] = {
        'bias_squared': bias_sq,
        'variance': variance,
        'total_error': total_error
    }
    
    print(f"{name}:")
    print(f"  Bias²: {bias_sq:.4f}")
    print(f"  Variance: {variance:.4f}")
    print(f"  Total Error: {total_error:.4f}")
```