# Model Complexity Tuning

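In this notebook, we'll learn how to tune the complexity of machine learning models to improve their performance on unseen data. Adjusting the hyperparameters that control complexity helps us find the balance between underfitting (a model too simple to capture the pattern) and overfitting (a model that memorizes noise in the training data).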

## 🚀 Hands-on: Model Complexity Tuning

The following diagram illustrates how changing model complexity impacts training and validation performance.

![Interactive dashboard showing model complexity adjustment with real-time performance metrics](images/model_tuning.png)

_"Time to get your hands dirty with real model tuning!"_

## 🎛️ Key Complexity Parameters

Each model family exposes hyperparameters that control its complexity. Adjusting them lets us match the model's flexibility to our data.

- **Decision Trees:** `max_depth`, `min_samples_split`
- **Random Forest:** `n_estimators`, `max_depth`
- **KNN:** `n_neighbors`
- **Neural Networks:** `hidden_layer_sizes`, `alpha`
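
As a rough illustration of where these parameters live, the sketch below sets each one on the corresponding scikit-learn estimator. The classifier variants and the specific values are assumptions chosen for the example, not recommendations from this notebook.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative values only; none of these settings come from the notebook.
models = {
    # Larger max_depth or smaller min_samples_split -> more complex tree.
    "Decision Tree": DecisionTreeClassifier(max_depth=3, min_samples_split=10),
    # max_depth limits each tree; n_estimators sets how many trees are averaged.
    "Random Forest": RandomForestClassifier(n_estimators=100, max_depth=5),
    # Smaller n_neighbors -> more flexible (more complex) decision boundary.
    "KNN": KNeighborsClassifier(n_neighbors=15),
    # Larger hidden layers -> more capacity; larger alpha -> stronger L2 penalty.
    "Neural Network": MLPClassifier(hidden_layer_sizes=(32,), alpha=1e-3),
}

# Print the complexity-related settings of each model.
tuned = {
    "Decision Tree": ["max_depth", "min_samples_split"],
    "Random Forest": ["n_estimators", "max_depth"],
    "KNN": ["n_neighbors"],
    "Neural Network": ["hidden_layer_sizes", "alpha"],
}
for name, model in models.items():
    params = model.get_params()
    print(name, {key: params[key] for key in tuned[name]})
```

Note that the parameters do not all point in the same direction: raising `max_depth` increases complexity, while raising `n_neighbors` or `alpha` reduces it.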

## 📊 Systematic Tuning Strategy

To find the best model complexity:

1. **Start Simple:** Begin with low-complexity settings, such as a shallow tree or a large `n_neighbors`.
2. **Gradually Increase:** Raise complexity one step at a time.
3. **Monitor Both:** Track training and validation performance at every step.
4. **Find Sweet Spot:** Choose the setting where validation performance is best; beyond it, training error keeps falling while validation error stops improving or rises.
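
The four steps above can be sketched as a plain loop. This is a minimal example under stated assumptions: the synthetic dataset, the single hold-out split, the depth range of 1 to 10, and the names `history` and `best_depth` are all chosen here for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data and a single hold-out split, assumed for illustration.
X, y = make_regression(n_samples=200, n_features=1, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

history = []  # rows of (max_depth, training MSE, validation MSE)
for depth in range(1, 11):  # steps 1 and 2: start shallow, then go deeper
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    # Step 3: record both training and validation error.
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    history.append((depth, train_mse, val_mse))

# Step 4: the sweet spot is the depth with the lowest validation error.
best_depth, _, best_val_mse = min(history, key=lambda row: row[2])

for depth, train_mse, val_mse in history:
    print(f"max_depth={depth:2d}  train MSE={train_mse:10.2f}  val MSE={val_mse:10.2f}")
print(f"Best max_depth on the validation split: {best_depth}")
```

A single split keeps the loop easy to read, but the chosen depth can vary with the split; cross-validation gives a more stable estimate.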

## 💻 Interactive Complexity Tuning

Let's see how changing the `max_depth` of a decision tree affects its performance using validation curves.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor

# Create a regression dataset for demonstration
X, y = make_regression(n_samples=200, n_features=1, noise=0.1, random_state=42)

# Generate range of max_depth values
param_range = np.arange(1, 21)

# Calculate training and validation scores for each max_depth
# (each result has one row per max_depth and one column per CV fold)
train_scores, val_scores = validation_curve(
    DecisionTreeRegressor(random_state=42),
    X, y,
    param_name='max_depth',
    param_range=param_range,
    cv=5,
    scoring='neg_mean_squared_error'
)

# Average over folds and flip the sign: scores are negative MSE
train_errors = -train_scores.mean(axis=1)
val_errors = -val_scores.mean(axis=1)

# Plot validation curve
plt.figure(figsize=(10, 6))
plt.plot(param_range, train_errors, 'b-', label='Training Error')
plt.plot(param_range, val_errors, 'r-', label='Validation Error')
plt.xlabel('Model Complexity (max_depth)')
plt.ylabel('Mean Squared Error')
plt.legend()
plt.title('Validation Curve: Finding Optimal Complexity')
plt.grid(True)
plt.show()

# Find optimal max_depth (lowest mean validation error)
optimal_depth = param_range[np.argmin(val_errors)]
print(f"Optimal max_depth: {optimal_depth}")
```
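
The search for the lowest validation error can also be automated. The sketch below is not part of the original notebook; it assumes `GridSearchCV` is an acceptable stand-in for the manual `argmin` step and reuses the same dataset settings, depth range, fold count, and scoring.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Same synthetic dataset settings as the validation-curve example.
X, y = make_regression(n_samples=200, n_features=1, noise=0.1, random_state=42)

# Cross-validated search over the same max_depth range (1 to 20).
search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid={"max_depth": list(range(1, 21))},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

best_depth = search.best_params_["max_depth"]
best_mse = -search.best_score_  # scores are negative MSE, so flip the sign

print(f"Best max_depth: {best_depth}")
print(f"Cross-validated MSE at that depth: {best_mse:.4f}")
```

By default `GridSearchCV` also refits the best configuration on the full dataset, so `search.best_estimator_` is ready to use for predictions.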

## 🎯 Key Insight

_"The best model complexity is found through systematic experimentation, not guesswork!"_