# Bias-Variance Tradeoff Lab

In this lab, we explore the bias-variance tradeoff, a fundamental concept in machine learning that describes how model complexity affects a model's ability to generalize to unseen data. We fit decision trees of varying depth to a small synthetic dataset and visualize how bias and variance show up in their decision boundaries.

In [1]:
# Import necessary libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Create a synthetic dataset
X, y = make_moons(n_samples=100, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

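# Function to plot decision boundaries
def plot_decision_boundary(clf, X, y):
    """Draw the decision regions of clf and overlay the points in X, colored by y."""
    # Build a fine grid that covers the data with a margin of 1 on every side
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01), np.arange(y_min, y_max, 0.01))
    # Predict the class at every grid point, then reshape back to the grid
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='k', marker='o')
    plt.title('Decision Boundary')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    # No plt.show() here: the function draws on the current axes so that it
    # works inside plt.subplot(); the caller shows the figure once it is complete.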


## Understanding Bias and Variance

In this section, we train models of different complexity and observe how bias and variance change. We use a decision tree classifier and vary its maximum depth (`max_depth`), which controls how finely the tree can partition the feature space.


In [2]:
# Train models with different complexities
depths = [1, 3, 5, 10]
plt.figure(figsize=(15, 10))

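for i, depth in enumerate(depths):
    # Fix random_state so ties between equally good splits are broken reproducibly
    clf = DecisionTreeClassifier(max_depth=depth, random_state=42)
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    plt.subplot(2, 2, i + 1)
    # The boundary is learned from the training set and shown with the held-out test points
    plot_decision_boundary(clf, X_test, y_test)
    plt.title(f'Decision Tree Depth: {depth}, Test Accuracy: {accuracy:.2f}')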

plt.tight_layout()
plt.show()
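
The test accuracy in each panel title is only half of the picture; comparing it with training accuracy makes underfitting and overfitting easier to spot. The following is a minimal, self-contained sketch that rebuilds the same dataset and split as above. The `results` dictionary and the fixed `random_state=42` on the classifier are assumptions added for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same synthetic data and split as in the first cell
X, y = make_moons(n_samples=100, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Hypothetical container: depth -> (training accuracy, test accuracy)
results = {}
for depth in [1, 3, 5, 10]:
    clf = DecisionTreeClassifier(max_depth=depth, random_state=42)
    clf.fit(X_train, y_train)
    train_acc = clf.score(X_train, y_train)  # accuracy on data the tree has seen
    test_acc = clf.score(X_test, y_test)     # accuracy on held-out data
    results[depth] = (train_acc, test_acc)
    print(f"depth={depth:2d}  train={train_acc:.2f}  test={test_acc:.2f}")
```

A low score on both sets suggests underfitting, while a large gap between training and test accuracy suggests overfitting. With only 30 test points, each misclassified point shifts test accuracy by about 0.03, so small differences between depths should be read with caution.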


## Analysis

After running the code above, compare the decision boundaries across tree depths:
- **Low Depth (High Bias)**: A depth-1 tree makes a single axis-aligned split, so it is too simple to follow the curved boundary between the two moons (underfitting).
- **Moderate Depth**: The model captures the overall shape of the two classes without fitting individual noisy points.
- **High Depth (High Variance)**: The model is complex enough to carve out regions around individual noisy training points, so its boundary would change noticeably with a different training sample (overfitting).

This illustrates the bias-variance tradeoff: as model complexity increases, bias generally decreases while variance increases, and test error tends to be lowest where the two are balanced.
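
A single fit per depth shows one boundary, but not how much that boundary would change with different training data. One rough way to probe variance is to retrain each tree on bootstrap resamples of the training set and measure how often the resampled models disagree on the test points. The sketch below is an informal proxy, not a formal bias-variance decomposition; `prediction_disagreement`, `n_rounds`, and the seeds are hypothetical names and values chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same synthetic data and split as in the first cell
X, y = make_moons(n_samples=100, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only
n_rounds = 50                   # assumed number of bootstrap resamples


def prediction_disagreement(depth):
    """Average fraction of resampled models that vote against the majority."""
    preds = []
    for _ in range(n_rounds):
        # Draw a bootstrap resample: same size as the training set, with replacement
        idx = rng.integers(0, len(X_train), size=len(X_train))
        clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
        clf.fit(X_train[idx], y_train[idx])
        preds.append(clf.predict(X_test))
    preds = np.array(preds)          # shape: (n_rounds, n_test_points)
    frac_ones = preds.mean(axis=0)   # share of models predicting class 1 per point
    # For each test point, the minority share is the disagreement with the majority vote
    return float(np.mean(np.minimum(frac_ones, 1.0 - frac_ones)))


disagreement = {depth: prediction_disagreement(depth) for depth in [1, 3, 5, 10]}
for depth, value in disagreement.items():
    print(f"depth={depth:2d}  mean disagreement={value:.3f}")
```

Each value lies between 0 (all resampled models agree on every test point) and 0.5 (the models split evenly). A higher value for deeper trees would be consistent with higher variance, although the exact numbers depend on the seeds and on the small sample size.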

## Conclusion

In this lab, we visualized the bias-variance tradeoff using decision trees. Understanding this tradeoff is crucial for selecting the right model complexity to achieve optimal performance.