# Understanding Underfitting and Overfitting in Machine Learning

## 📊 Underfitting vs Overfitting

![Graphs illustrating underfitting, good fit, and overfitting](images/underfitting_overfitting.png)

*"Finding the Goldilocks zone of model complexity!"*

## 📉 Underfitting (High Bias)
- **What it is:** Model is too simple to capture underlying patterns
- **Signs:** High error on both training and test data, with little gap between the two
- **Causes:** Model complexity too low, insufficient features
- **Example:** Using linear regression for clearly non-linear data
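
The linear-regression example above can be sketched as follows. This is a minimal illustration, not a benchmark: the quadratic dataset, the noise level, and the variable names are assumptions made up for the demo. A straight line fitted to a symmetric parabola should score poorly on both splits.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic, clearly non-linear data (an assumption for illustration): y = x^2 + noise
rng = np.random.default_rng(0)
X_quad = rng.uniform(-3, 3, size=(200, 1))
y_quad = X_quad[:, 0] ** 2 + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X_quad, y_quad, test_size=0.2, random_state=42
)

# A straight line cannot follow the parabola, so it underfits
linear_model = LinearRegression().fit(X_train, y_train)

# score() returns R^2; low values on BOTH splits are the underfitting signature
underfit_train_r2 = linear_model.score(X_train, y_train)
underfit_test_r2 = linear_model.score(X_test, y_test)
print(f"Train R^2 = {underfit_train_r2:.3f}, Test R^2 = {underfit_test_r2:.3f}")
```

Both scores being low together is the point: the model is not memorizing anything, it simply lacks the capacity to represent the curve.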

## 📈 Overfitting (High Variance)
- **What it is:** Model learns training data too well, including noise
- **Signs:** Very low training error but much higher test error (a large train/test gap)
- **Causes:** Model too complex, too many features, insufficient data
- **Example:** Decision tree that memorizes every training example
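
The memorizing decision tree above can be sketched like this. The noisy sine dataset and all variable names are assumptions chosen for the demo. With no depth limit, the tree can keep splitting until each training point sits in its own leaf.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data (an assumption for illustration): sine curve + noise
rng = np.random.default_rng(0)
X_sine = rng.uniform(0, 6, size=(300, 1))
y_sine = np.sin(X_sine[:, 0]) + rng.normal(scale=0.3, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X_sine, y_sine, test_size=0.2, random_state=42
)

# max_depth=None lets the tree grow until it reproduces the training targets
deep_tree = DecisionTreeRegressor(max_depth=None, random_state=42)
deep_tree.fit(X_train, y_train)

overfit_train_mse = mean_squared_error(y_train, deep_tree.predict(X_train))
overfit_test_mse = mean_squared_error(y_test, deep_tree.predict(X_test))
print(f"Train MSE = {overfit_train_mse:.4f}, Test MSE = {overfit_test_mse:.4f}")
```

The training error collapses toward zero because the tree has also fitted the noise, while the test error stays well above it.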

## 🎯 Good Fit (Just Right)
- **What it is:** Model captures true patterns without memorizing noise
- **Signs:** Good performance on both training and test data, with only a small gap between them
- **How to get there:** Balance model complexity against the amount and noisiness of the data
- **Goal:** Generalizes well to unseen data
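
One common way to look for that balance is to let cross-validation choose the complexity setting. The sketch below uses scikit-learn's `GridSearchCV` to pick `max_depth` for a decision tree; the synthetic dataset, the candidate depths in `depth_grid`, and the 5-fold setting are all assumptions made for illustration, not a recommended recipe.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data (an assumption for illustration)
rng = np.random.default_rng(0)
X_sine = rng.uniform(0, 6, size=(300, 1))
y_sine = np.sin(X_sine[:, 0]) + rng.normal(scale=0.3, size=300)

# Candidate complexities, from very simple to very flexible (hypothetical grid)
depth_grid = [1, 2, 3, 5, 8, 12, 20]

search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid={"max_depth": depth_grid},
    cv=5,
    scoring="neg_mean_squared_error",  # scikit-learn maximizes scores, hence the sign
)
search.fit(X_sine, y_sine)

best_depth = search.best_params_["max_depth"]
best_cv_mse = -search.best_score_
print(f"Best max_depth = {best_depth}, cross-validated MSE = {best_cv_mse:.4f}")
```

Cross-validation scores each depth on data the model did not see during fitting, so the selected depth reflects generalization rather than training performance.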

## 💻 Detecting Over/Underfitting

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Stand-in data so the cell runs on its own (an assumption, not a real dataset):
# a noisy sine curve. Replace X and y with your own feature matrix and target vector.
rng = np.random.default_rng(42)
X = rng.uniform(0, 6, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=300)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Test different model complexities by varying max_depth
depths = [1, 3, 5, 10, 20]
train_errors = []
test_errors = []

for depth in depths:
    model = DecisionTreeRegressor(max_depth=depth, random_state=42)
    model.fit(X_train, y_train)

    train_pred = model.predict(X_train)
    test_pred = model.predict(X_test)

    # Mean squared error on each split; a widening gap signals overfitting
    train_errors.append(mean_squared_error(y_train, train_pred))
    test_errors.append(mean_squared_error(y_test, test_pred))

    print(f"Depth {depth}: Train Error = {train_errors[-1]:.4f}, Test Error = {test_errors[-1]:.4f}")
```
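
The manual loop above relies on a single train/test split, so its numbers can shift with the split. As a hedged alternative, scikit-learn's `validation_curve` runs the same sweep over `max_depth` and averages over cross-validation folds. The synthetic `X_demo`/`y_demo` data and the 5-fold setting below are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy data (an assumption for illustration)
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 6, size=(300, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.3, size=300)

depth_range = [1, 3, 5, 10, 20]

# Each returned array has shape (len(depth_range), n_folds)
train_scores, val_scores = validation_curve(
    DecisionTreeRegressor(random_state=42),
    X_demo,
    y_demo,
    param_name="max_depth",
    param_range=depth_range,
    cv=5,
    scoring="neg_mean_squared_error",
)

# Flip the sign to get MSE, then average across folds
cv_train_mse = -train_scores.mean(axis=1)
cv_val_mse = -val_scores.mean(axis=1)

for depth, train_mse, val_mse in zip(depth_range, cv_train_mse, cv_val_mse):
    print(f"Depth {depth}: Train MSE = {train_mse:.4f}, Validation MSE = {val_mse:.4f}")
```

Reading the output row by row: training error keeps falling as depth grows, while validation error typically bottoms out at a moderate depth and then rises again.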

[🚀 Open in Colab](https://colab.research.google.com/github/Roopesht/codeexamples/blob/main/genai/python_easy/3/overfitting.ipynb)

## 🎯 Key Takeaway
> "Monitor both training and test performance to detect overfitting early!"

### 💭 Question
- If your model has 95% accuracy on training data but only 60% on test data, what's happening?
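
To check an answer to a question like this in code, a small rule-of-thumb helper can compare the two scores. The function name `diagnose_fit` and the thresholds `min_score` and `max_gap` are hypothetical choices for illustration; sensible values depend on the task and the metric.

```python
def diagnose_fit(train_score, test_score, min_score=0.8, max_gap=0.1):
    """Rough heuristic for accuracy-like scores in [0, 1].

    min_score and max_gap are illustrative thresholds, not standard values.
    """
    gap = train_score - test_score
    if train_score < min_score:
        # The model cannot even fit the data it was trained on
        return "possible underfitting"
    if gap > max_gap:
        # Strong on training data, much weaker on unseen data
        return "possible overfitting"
    return "reasonable fit"


# Try it with the numbers from the question
verdict = diagnose_fit(train_score=0.95, test_score=0.60)
print(verdict)
```

Treat the verdict as a prompt to investigate, not a conclusion: a gap can also come from a test set that differs from the training distribution.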