# Hands-on Model Tuning Challenge

In this notebook, we will explore how to tune model hyperparameters for better performance on unseen data.
We'll train three regression models on the California housing dataset and use validation curves to visualize how model complexity affects training and validation error.

Let's dive in!

## Step 1: Data Preparation

In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load housing data
housing = fetch_california_housing()
X = housing.data
y = housing.target

# Split into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

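# Standardize features: this matters for distance-based KNN, while tree-based models are insensitive to feature scale.
# Fit the scaler on the training split only, then reuse its statistics for the test split.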
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
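Using `fit_transform` on the training split and `transform` on the test split keeps test-set statistics out of preprocessing. The minimal sketch below runs on random stand-in data (an assumption, not the housing set) to make this concrete. The scaler learns `mean_` and `scale_` from the training rows only, and the test rows are shifted and divided by those same stored values.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Random stand-in data (assumption): 8 features, like the housing set, with an arbitrary offset and spread
rng = np.random.default_rng(0)
X_tr = rng.normal(loc=5.0, scale=3.0, size=(200, 8))
X_te = rng.normal(loc=5.0, scale=3.0, size=(50, 8))

scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # learns mean_ and scale_ from the training rows
X_te_scaled = scaler.transform(X_te)      # reuses the training statistics, no refitting

# Training columns are centered with unit variance...
print(X_tr_scaled.mean(axis=0).round(6))
print(X_tr_scaled.std(axis=0).round(6))
# ...while test columns are only approximately so, because they were scaled with training statistics
print(X_te_scaled.mean(axis=0).round(3))

# transform() computes (X - mean_) / scale_ using the stored training values
manual = (X_te - scaler.mean_) / scaler.scale_
print(np.allclose(manual, X_te_scaled))
```

Refitting the scaler on the test split would give it slightly different statistics, so the model would see inputs on a different scale from the ones it was trained on.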

## Step 2: Model Tuning Function

In [None]:
from sklearn.model_selection import validation_curve

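def tune_model(model, param_name, param_range, X, y):
    """Compute a 5-fold validation curve for one hyperparameter.

    Returns the mean training MSE and mean validation MSE for each value in param_range.
    """
    train_scores, val_scores = validation_curve(
        model,
        X, y,
        param_name=param_name,
        param_range=param_range,
        cv=5,
        scoring='neg_mean_squared_error',
        n_jobs=-1  # fit the CV folds in parallel on all available cores
    )
    # Scores have shape (len(param_range), n_folds). scikit-learn treats higher scores as better,
    # so MSE is reported negated: average over folds, then flip the sign to get positive MSE.
    train_scores_mean = -train_scores.mean(axis=1)
    val_scores_mean = -val_scores.mean(axis=1)
    return train_scores_mean, val_scores_mean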
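To see what `validation_curve` does under the hood, here is a rough hand-written equivalent. For every candidate value, it clones the estimator, sets the parameter, fits on each cross-validation split, and records training and validation MSE. This is a sketch for understanding, not a replacement. It assumes plain `KFold(n_splits=5)` without shuffling, which is what an integer `cv=5` resolves to for a regressor, and it uses synthetic data from `make_regression` as a stand-in for the housing set. The helper name `manual_validation_curve` is hypothetical.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, validation_curve
from sklearn.neighbors import KNeighborsRegressor


def manual_validation_curve(model, param_name, param_range, X, y, n_splits=5):
    """Hypothetical helper: mean train/validation MSE for each parameter value."""
    splitter = KFold(n_splits=n_splits)  # no shuffling, matching cv=5 for regressors
    train_mse, val_mse = [], []
    for value in param_range:
        fold_train, fold_val = [], []
        for train_idx, val_idx in splitter.split(X):
            # Fresh, unfitted copy of the model with the candidate value set
            est = clone(model).set_params(**{param_name: value})
            est.fit(X[train_idx], y[train_idx])
            fold_train.append(mean_squared_error(y[train_idx], est.predict(X[train_idx])))
            fold_val.append(mean_squared_error(y[val_idx], est.predict(X[val_idx])))
        train_mse.append(np.mean(fold_train))
        val_mse.append(np.mean(fold_val))
    return np.array(train_mse), np.array(val_mse)


# Synthetic stand-in data (assumption)
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
ks = [1, 5, 15]
manual_train, manual_val = manual_validation_curve(KNeighborsRegressor(), "n_neighbors", ks, X, y)

# Compare against scikit-learn's validation_curve (negated scores flipped back to MSE)
sk_train, sk_val = validation_curve(
    KNeighborsRegressor(), X, y, param_name="n_neighbors", param_range=ks,
    cv=5, scoring="neg_mean_squared_error",
)
print(np.allclose(manual_train, -sk_train.mean(axis=1)),
      np.allclose(manual_val, -sk_val.mean(axis=1)))
```

Note that with `n_neighbors=1`, the training error is zero because each training point is its own nearest neighbor. This is the classic signature of an overfit model.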

## Step 3: Visualization Function

In [None]:
import matplotlib.pyplot as plt

def plot_validation_curve(param_range, train_scores, val_scores, title, xlabel='Parameter Value'):
    """Plot mean training and validation MSE against the tuned parameter's values."""
    plt.figure(figsize=(8, 6))
    plt.plot(param_range, train_scores, 'o-', label='Training Error')
    plt.plot(param_range, val_scores, 'o-', label='Validation Error')
    plt.xlabel(xlabel)
    plt.ylabel('Mean Squared Error')
    plt.title(title)
    plt.legend()
    plt.grid(True)
    plt.show()
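Plotting only the fold-averaged errors hides how much they vary between folds. A variant that also shades one standard deviation across the CV folds shows how much each point can be trusted. This sketch assumes you pass in the raw `(n_params, n_folds)` score arrays returned by `validation_curve`, since `tune_model` averages them away. It returns the figure instead of calling `plt.show()`. The function name `plot_validation_curve_with_bands` is hypothetical, and `make_regression` data stands in for the housing set.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor


def plot_validation_curve_with_bands(param_range, train_scores, val_scores, title,
                                     xlabel="Parameter Value"):
    """Hypothetical helper: plot mean MSE with a +/- 1 std band across CV folds.

    train_scores and val_scores are raw arrays of shape (n_params, n_folds)
    holding negated MSE, as returned by validation_curve with
    scoring='neg_mean_squared_error'.
    """
    train_mse, val_mse = -np.asarray(train_scores), -np.asarray(val_scores)
    fig, ax = plt.subplots(figsize=(8, 6))
    for mse, label in [(train_mse, "Training Error"), (val_mse, "Validation Error")]:
        mean, std = mse.mean(axis=1), mse.std(axis=1)
        ax.plot(param_range, mean, "o-", label=label)
        # Shaded band: spread of the error across folds at each parameter value
        ax.fill_between(param_range, mean - std, mean + std, alpha=0.2)
    ax.set_xlabel(xlabel)
    ax.set_ylabel("Mean Squared Error")
    ax.set_title(title)
    ax.legend()
    ax.grid(True)
    return fig, ax


# Example on synthetic stand-in data (assumption)
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)
depths = list(range(1, 11))
train_scores, val_scores = validation_curve(
    DecisionTreeRegressor(random_state=0), X, y, param_name="max_depth",
    param_range=depths, cv=5, scoring="neg_mean_squared_error",
)
fig, ax = plot_validation_curve_with_bands(depths, train_scores, val_scores,
                                           "Decision Tree (synthetic data)", xlabel="max_depth")
```

In a notebook the returned figure is displayed automatically. In a plain script, call `plt.show()` afterwards.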

## Step 4: Model Tuning and Visualization

In [None]:
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor

# Fix random_state so the tree-based curves are reproducible across runs.
# The Random Forest sweep fits 10 values x 5 folds (5,000 trees in total), so it is by far the slowest.
models_to_tune = {
    'Decision Tree': (DecisionTreeRegressor(random_state=42), 'max_depth', range(1, 21)),
    'Random Forest': (RandomForestRegressor(random_state=42), 'n_estimators', range(10, 201, 20)),
    'KNN': (KNeighborsRegressor(), 'n_neighbors', range(1, 51, 2))
}

results = {}

for model_name, (model, param_name, param_range) in models_to_tune.items():
    print(f"Tuning {model_name}...")
    param_values = list(param_range)  # materialize the range once and reuse it
    train_scores, val_scores = tune_model(model, param_name, param_values, X_train, y_train)
    results[model_name] = {
        'param_range': param_values,
        'train_scores': train_scores,
        'val_scores': val_scores
    }
    plot_validation_curve(param_values, train_scores, val_scores, f"Validation Curve for {model_name}")
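The loop above has one caveat. The scaler was fit on all of `X_train` before cross-validation, so each validation fold was standardized with statistics that include its own rows. For `StandardScaler` this leak is usually small, but wrapping the scaler and model in a `Pipeline` refits the scaler inside every fold. The sketch below assumes unscaled input and uses `make_regression` as a stand-in for the housing data. Note the `<step>__<param>` parameter name: `make_pipeline` names each step after its lowercased class name.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import validation_curve
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Unscaled synthetic stand-in data (assumption); the pipeline scales inside each CV fold
X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)
pipe = make_pipeline(StandardScaler(), KNeighborsRegressor())

param_range = list(range(1, 21, 2))
train_scores, val_scores = validation_curve(
    pipe, X, y,
    param_name="kneighborsregressor__n_neighbors",  # step name + "__" + parameter
    param_range=param_range,
    cv=5,
    scoring="neg_mean_squared_error",
)
train_mse = -train_scores.mean(axis=1)
val_mse = -val_scores.mean(axis=1)
print(np.round(val_mse, 1))
```

The same pattern would apply to the Step 4 loop by passing `make_pipeline(StandardScaler(), model)`, the prefixed parameter name, and a training split that has not already been scaled.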

## Step 5: Analyze Results

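Based on the validation curves, identify the parameter value with the lowest validation error for each model. Look for regions of underfitting (training and validation error both high) and overfitting (training error keeps falling while validation error rises).

Keep the direction of each parameter in mind. A deeper decision tree is more complex, whereas a KNN model with more neighbors is *less* complex, so `n_neighbors=1` sits at the overfitting end of its curve. For a random forest, adding trees does not make the model overfit more. The validation error typically levels off, so `n_estimators` mainly trades compute time for more stable predictions.

This visual analysis helps in choosing the best model and parameter setting for our dataset. Once a setting is chosen, refit it on the full training split and evaluate it once on the held-out test set (`X_test`, `y_test`), which has not been used so far.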
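Reading the optimum off a plot can be made exact by taking the arg-min of the validation MSE. The sketch below assumes synthetic `make_regression` data in place of the housing set and a KNN sweep. The helper `pick_best` is hypothetical. It selects the best `n_neighbors`, refits on the whole training split, and only then scores the held-out test split.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split, validation_curve
from sklearn.neighbors import KNeighborsRegressor


def pick_best(param_range, val_scores):
    """Hypothetical helper: return the parameter value with the lowest validation MSE."""
    best_idx = int(np.argmin(val_scores))
    return param_range[best_idx], val_scores[best_idx]


# Synthetic stand-in for the housing data (assumption), split the same way as in Step 1
X, y = make_regression(n_samples=400, n_features=8, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

param_range = list(range(1, 31, 2))
train_s, val_s = validation_curve(
    KNeighborsRegressor(), X_tr, y_tr,
    param_name="n_neighbors", param_range=param_range,
    cv=5, scoring="neg_mean_squared_error",
)
val_mse = -val_s.mean(axis=1)  # positive, fold-averaged MSE, as tune_model returns
best_k, best_cv_mse = pick_best(param_range, val_mse)

# Refit once on the full training split with the chosen value, then touch the test set once
final_model = KNeighborsRegressor(n_neighbors=best_k).fit(X_tr, y_tr)
test_mse = mean_squared_error(y_te, final_model.predict(X_te))
print(f"best n_neighbors={best_k}, CV MSE={best_cv_mse:.1f}, test MSE={test_mse:.1f}")
```

In the notebook itself, the helper could be called as `pick_best(results['KNN']['param_range'], results['KNN']['val_scores'])`, because `tune_model` already returns positive, fold-averaged MSE.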

## Conclusion

Congratulations! You've just learned how to tune hyperparameters of different models and visualize their performance.
Different models reach their lowest validation error at different complexity levels, and validation curves make those trade-offs visible.
Remember, the best model choice depends on your specific dataset and problem!