# Understanding Bias and Variance

This notebook introduces the concepts of bias and variance in machine learning and the fundamental tradeoff between them that shapes model performance.

![Dartboard showing bias and variance](images/bias_variance_dartboard.png)

## What Are Bias and Variance?

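Bias and variance are two sources of prediction error in machine learning models. A third source, noise in the data itself, cannot be removed by any model.

Bias is the error that comes from overly simplistic assumptions in the model: even averaged over many training sets, its predictions stay systematically off from the true relationship. High bias can lead to underfitting, where the model fails to capture the true pattern.

Variance is the error that comes from the model's sensitivity to small fluctuations in the training data: retraining on a different sample from the same source can change its predictions substantially. High variance can cause overfitting, where the model captures noise instead of the underlying pattern.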
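For squared-error loss this split can be written exactly. Assume the data follow $y = f(x) + \varepsilon$, where $\varepsilon$ is noise with mean zero and variance $\sigma^2$, and $\hat{f}$ is a model trained on a randomly drawn training set. The expected error at a point $x$ then decomposes as:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

The expectations are taken over both the random training set and the noise in $y$. Lowering one of the first two terms often raises the other, which is the tradeoff this notebook is about.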

### Bias Explained

- 🎯 Error due to oversimplified assumptions
- 📉 Model consistently misses the true pattern
- 🔍 High bias = underfitting
- 💡 Example: Using a linear model for curved data
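The linear-model example can be checked directly with a minimal sketch. It assumes the same curved target the notebook's code cell uses, $0.5x + 0.3\sin(15x)$, and fits a straight line to noise-free samples of it, so any remaining error is bias rather than noise. The helper names `true_fn` and `linear_error_vs_truth` are illustrative, not part of any library.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def true_fn(x):
    # Assumed "curved" target: the notebook's data generator without the noise term
    return 0.5 * x + 0.3 * np.sin(15 * x)


def linear_error_vs_truth(n_points):
    # Noise-free samples, so the remaining error can only come from bias
    X = np.linspace(0, 1, n_points).reshape(-1, 1)
    y = true_fn(X.ravel())
    model = LinearRegression().fit(X, y)
    # Mean squared gap between the fitted line and the true curve
    return np.mean((model.predict(X) - y) ** 2)


for n in (100, 10_000):
    print(f"n={n:>6}: MSE vs true function = {linear_error_vs_truth(n):.4f}")
```

If the error came from too little data, it would shrink as `n` grows. Here it stays roughly the same at both sizes, which is the signature of high bias: more data cannot fix a model that is too simple.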

### Variance Explained

- 🎲 Error due to sensitivity to small data changes
- 📊 Model predictions vary widely across different training sets
- 🔍 High variance = overfitting
- 💡 Example: A very deep decision tree
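The claim that a high-variance model's predictions vary widely across training sets can be measured. Draw many training sets from the same process, fit one model per set, and compute how much the predictions at fixed points spread out. This is a minimal sketch that assumes the notebook's data-generating process, with uniformly random rather than evenly spaced inputs. `sample_dataset`, `prediction_variance`, and `X_grid` are hypothetical names introduced here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X_grid = np.linspace(0, 1, 200).reshape(-1, 1)  # fixed points where predictions are compared


def sample_dataset(n=100, noise=0.1):
    # Draw a fresh training set from the assumed data-generating process
    X = rng.uniform(0, 1, size=(n, 1))
    y = 0.5 * X.ravel() + 0.3 * np.sin(15 * X.ravel()) + rng.normal(0, noise, n)
    return X, y


def prediction_variance(make_model, n_datasets=50):
    # Fit a fresh model on each resampled dataset and record its predictions on X_grid
    preds = np.array([make_model().fit(*sample_dataset()).predict(X_grid)
                      for _ in range(n_datasets)])
    # Variance across datasets at each grid point, averaged over the grid
    return preds.var(axis=0).mean()


var_linear = prediction_variance(LinearRegression)
var_tree = prediction_variance(lambda: DecisionTreeRegressor(max_depth=20, random_state=0))
print(f"Linear model: {var_linear:.4f}")
print(f"Deep tree:    {var_tree:.4f}")
```

Expect the deep tree's value to be several times larger than the linear model's, because each tree closely follows the noise in its own training set. The exact numbers depend on the random seed.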

## Visualizing Bias and Variance with Code

The code below demonstrates high bias (underfitting) and high variance (overfitting) by fitting two simple models to the same noisy, curved data.

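```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Generate sample data: a curved trend plus Gaussian noise
np.random.seed(42)
X = np.linspace(0, 1, 100).reshape(-1, 1)
y = 0.5 * X.ravel() + 0.3 * np.sin(15 * X.ravel()) + np.random.normal(0, 0.1, X.shape[0])

# Hold out test data: overfitting only shows up on points the model has not seen
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# High bias model (underfitting): a straight line cannot follow the curve
linear_model = LinearRegression().fit(X_train, y_train)

# High variance model (overfitting): a deep tree can memorize the training noise
tree_model = DecisionTreeRegressor(max_depth=20, random_state=0).fit(X_train, y_train)

# R^2 scores: the underfit model is poor on both sets, the overfit one is
# near-perfect on training data and noticeably worse on test data
models = [("Linear Model (High Bias)", linear_model), ("Deep Tree (High Variance)", tree_model)]
for name, model in models:
    print(f"{name}: train R^2 = {model.score(X_train, y_train):.3f}, "
          f"test R^2 = {model.score(X_test, y_test):.3f}")

# Plot both fitted curves against the data
X_plot = np.linspace(0, 1, 500).reshape(-1, 1)
plt.scatter(X_train, y_train, s=12, label="train")
plt.scatter(X_test, y_test, s=12, label="test")
plt.plot(X_plot, linear_model.predict(X_plot), label="linear (high bias)")
plt.plot(X_plot, tree_model.predict(X_plot), label="deep tree (high variance)")
plt.legend()
plt.show()
```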

## Key Takeaway

- 🎯 **The Goal:** Find the sweet spot between bias and variance: the level of model complexity at which their combined error on unseen data is lowest.
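One common way to locate that sweet spot is to sweep a complexity parameter and keep the setting with the lowest cross-validated error. The sketch below does this for the `max_depth` of a decision tree on the notebook's synthetic data. The depth range 1 to 15 and the 5-fold split are arbitrary choices for illustration, not recommended defaults.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Same synthetic data as the notebook's code cell
np.random.seed(42)
X = np.linspace(0, 1, 100).reshape(-1, 1)
y = 0.5 * X.ravel() + 0.3 * np.sin(15 * X.ravel()) + np.random.normal(0, 0.1, X.shape[0])

cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_mse = {}
for depth in range(1, 16):
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    # scikit-learn treats higher scores as better, so MSE is returned negated
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    cv_mse[depth] = -scores.mean()

# Depth with the lowest average validation error
best_depth = min(cv_mse, key=cv_mse.get)
for depth, mse in cv_mse.items():
    marker = "  <- lowest CV error" if depth == best_depth else ""
    print(f"max_depth={depth:>2}: CV MSE = {mse:.4f}{marker}")
```

Shallow trees land on the high-bias side of the sweep and very deep trees on the high-variance side. The depth with the lowest CV error usually sits between them, though its exact value depends on the data and the split.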

💭 How would you explain bias vs variance to a friend using a real-world analogy?