# Bias–Variance Tradeoff

This notebook is a **companion to `01_bias_variance.md`**.

Purpose:
- Build intuition via simulation
- Visualize bias vs variance

---

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

np.random.seed(42)

## Synthetic Data Generation

We generate noisy samples from a **nonlinear function** (`y = sin(x) + noise`) so that:
- Simple models exhibit **high bias**
- Flexible models exhibit **high variance**
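
For reference, both terms come from the standard decomposition of expected squared error at a point $x$. The notation here is an assumption of this sketch rather than something defined in the notebook: $f$ is the true function, $\hat f$ is a model fit on a random training set, and $\varepsilon$ is zero-mean noise with variance $\sigma^2$, so that $y = f(x) + \varepsilon$ is a fresh observation independent of the training data.

$$
\mathbb{E}\big[(y - \hat f(x))^2\big]
= \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
+ \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
+ \underbrace{\sigma^2}_{\text{irreducible noise}}
$$

With the defaults used below, $f(x) = \sin(x)$ and $\sigma = 0.3$, so the irreducible term is $\sigma^2 = 0.09$; no model can push expected test MSE below that.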

---

In [None]:
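def generate_data(n=50, noise=0.3):
    """Return n evenly spaced points on [0, 10] and noisy sin(x) targets."""
    X = np.linspace(0, 10, n)
    # Gaussian noise with standard deviation `noise` around the true function sin(x)
    y = np.sin(X) + np.random.normal(0, noise, size=n)
    return X.reshape(-1, 1), y  # X as a column vector, as scikit-learn expects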

X, y = generate_data()

## Model Definitions

- **Linear Regression** → high bias
- **Deep Decision Tree** → high variance

---

In [None]:
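# A straight line cannot represent sin(x), whatever the training sample
linear_model = LinearRegression()
# max_depth=None lets the tree keep splitting until every leaf is pure
tree_model = DecisionTreeRegressor(max_depth=None)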
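
As a rough check on why an unrestricted tree is the high-variance choice, the sketch below fits one deep and one shallow tree to a single noisy sample and compares leaf counts and training error. The names `X_demo`, `y_demo`, `deep_tree`, and `shallow_tree`, the local random generator, and the `max_depth=2` comparison point are assumptions made for illustration; they are not part of the notebook's own cells.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Assumption: a local generator, so this cell does not disturb the global seed
rng = np.random.default_rng(0)
X_demo = np.linspace(0, 10, 50).reshape(-1, 1)
y_demo = np.sin(X_demo).ravel() + rng.normal(0, 0.3, size=50)

deep_tree = DecisionTreeRegressor(max_depth=None).fit(X_demo, y_demo)
shallow_tree = DecisionTreeRegressor(max_depth=2).fit(X_demo, y_demo)

deep_train_mse = mean_squared_error(y_demo, deep_tree.predict(X_demo))
shallow_train_mse = mean_squared_error(y_demo, shallow_tree.predict(X_demo))

print(f"deep:    leaves={deep_tree.get_n_leaves()}, "
      f"depth={deep_tree.get_depth()}, train MSE={deep_train_mse:.4f}")
print(f"shallow: leaves={shallow_tree.get_n_leaves()}, "
      f"depth={shallow_tree.get_depth()}, train MSE={shallow_train_mse:.4f}")
```

With distinct targets, the unrestricted tree ends up with one leaf per training point, so it reproduces the noise in the sample exactly. That memorization is what makes its predictions change from one training set to the next.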

## Single Train/Test Split Comparison

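A single split shows the symptoms of **underfitting vs overfitting**: an underfit model has high error on both the training and test sets, while an overfit model has very low training error and noticeably higher test error.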

---

In [None]:
from sklearn.model_selection import train_test_split

# Fix random_state so the split is the same even if cells are run out of order
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

linear_model.fit(X_train, y_train)
tree_model.fit(X_train, y_train)

linear_train_mse = mean_squared_error(y_train, linear_model.predict(X_train))
linear_test_mse = mean_squared_error(y_test, linear_model.predict(X_test))

tree_train_mse = mean_squared_error(y_train, tree_model.predict(X_train))
tree_test_mse = mean_squared_error(y_test, tree_model.predict(X_test))

pd.DataFrame({
    'Model': ['Linear Regression', 'Decision Tree'],
    'Train MSE': [linear_train_mse, tree_train_mse],
    'Test MSE': [linear_test_mse, tree_test_mse]
})
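
A single split reflects only one draw of the noise, so it cannot separate bias from variance. One way to estimate both is to refit each model on many freshly generated datasets and compare the predictions on a fixed grid with the true function. The sketch below does that under stated assumptions: `estimate_bias_variance`, `n_repeats`, and `X_grid` are hypothetical names introduced here, and the data-generating process is copied from `generate_data` (same `n`, same `noise`) but uses a local random generator.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


def estimate_bias_variance(make_model, n_repeats=200, n=50, noise=0.3, seed=0):
    """Monte Carlo estimate of (bias^2, variance), averaged over a fixed grid."""
    rng = np.random.default_rng(seed)
    X_fit = np.linspace(0, 10, n).reshape(-1, 1)     # same design as generate_data
    X_grid = np.linspace(0, 10, 200).reshape(-1, 1)  # evaluation points
    f_true = np.sin(X_grid).ravel()

    preds = np.empty((n_repeats, X_grid.shape[0]))
    for i in range(n_repeats):
        # Fresh noise draw each time: only the training targets change
        y_fit = np.sin(X_fit).ravel() + rng.normal(0, noise, size=n)
        preds[i] = make_model().fit(X_fit, y_fit).predict(X_grid)

    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - f_true) ** 2)  # squared gap of the average fit
    variance = np.mean(preds.var(axis=0))         # spread of fits around their mean
    return bias_sq, variance


linear_bias_sq, linear_var = estimate_bias_variance(LinearRegression)
tree_bias_sq, tree_var = estimate_bias_variance(
    lambda: DecisionTreeRegressor(max_depth=None)
)

print(f"Linear: bias^2={linear_bias_sq:.3f}, variance={linear_var:.3f}")
print(f"Tree:   bias^2={tree_bias_sq:.3f}, variance={tree_var:.3f}")
```

Under these assumptions the linear model should show the larger squared bias and the deep tree the larger variance. The exact numbers depend on `seed`, `n_repeats`, and the noise level.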

## Visualization

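Plotting both fits against the data shows the contrast directly: the straight line cannot follow the sine curve, while the tree's step function passes through every training point, noise included.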

---

In [None]:
X_plot = np.linspace(0, 10, 300).reshape(-1, 1)

plt.figure(figsize=(10, 5))
plt.scatter(X, y, alpha=0.5, label='Data')
# Noise-free target, for reference against both fits
plt.plot(X_plot, np.sin(X_plot), 'k--', label='True function sin(x)')
plt.plot(X_plot, linear_model.predict(X_plot), label='Linear (High Bias)')
plt.plot(X_plot, tree_model.predict(X_plot), label='Tree (High Variance)')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.title('Bias vs Variance Illustration')
plt.show()
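
The plot above shows one fit per model, which makes bias visible but not variance. A possible extension is to overlay the fits from many independently generated datasets: curves that stay together indicate low variance, and curves that scatter indicate high variance. The helper `plot_refits` and its parameters are hypothetical names introduced for this sketch; the data-generating process mirrors `generate_data` but uses a local random generator.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


def plot_refits(n_fits=20, n=50, noise=0.3, seed=0):
    """Overlay fits from n_fits fresh datasets, one panel per model."""
    rng = np.random.default_rng(seed)
    X_fit = np.linspace(0, 10, n).reshape(-1, 1)
    X_grid = np.linspace(0, 10, 300).reshape(-1, 1)
    models = {
        "Linear regression": LinearRegression,
        "Deep decision tree": DecisionTreeRegressor,  # max_depth=None by default
    }

    fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharey=True)
    for ax, (name, make_model) in zip(axes, models.items()):
        for _ in range(n_fits):
            # New noise draw, new fit, one faint curve per dataset
            y_fit = np.sin(X_fit).ravel() + rng.normal(0, noise, size=n)
            fitted = make_model().fit(X_fit, y_fit)
            ax.plot(X_grid, fitted.predict(X_grid), color="tab:blue", alpha=0.2)
        ax.plot(X_grid, np.sin(X_grid), "k--", label="True function")
        ax.set_title(name)
        ax.set_xlabel("x")
        ax.legend()
    axes[0].set_ylabel("y")
    return fig, axes


fig, axes = plot_refits()
```

In a notebook the figure renders at the end of the cell; in a script, call `plt.show()` afterwards. The linear panel should show a tight bundle of lines far from the dashed curve, and the tree panel a noisy band centered on it.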