# Lab 7: Model Selection & Hyperparameter Tuning
## Interactive Notebook

### Learning Objectives
1. Master cross-validation techniques
2. Implement grid search
3. Use random search for efficiency
4. Build ML pipelines
5. Compare models and select the best one

**Estimated Time:** 3-4 hours

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_wine
from sklearn.model_selection import (
    train_test_split, cross_val_score, KFold, StratifiedKFold,
    GridSearchCV, RandomizedSearchCV
)
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score
import warnings
warnings.filterwarnings('ignore')
print('✅ Ready!')

## Part 1: Load Data and Cross-Validation

In [None]:
# Load wine dataset
data = load_wine()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(f'Training: {X_train.shape}, Test: {X_test.shape}')
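
The split above passes `stratify=y`, which should keep the three wine classes in roughly the same proportions in both subsets. A minimal sketch for checking this, assuming the same split settings; the names `X_demo`, `y_demo`, and `class_proportions` are illustrative and not part of the lab:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

X_demo, y_demo = load_wine(return_X_y=True)
_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Fraction of samples per class in the full data, the training split, and the test split
class_proportions = {}
for label, part in [("full", y_demo), ("train", y_tr), ("test", y_te)]:
    class_proportions[label] = np.bincount(part, minlength=3) / len(part)
    print(f"{label:>5}: {np.round(class_proportions[label], 3)}")
```

If the three rows are close to each other, the stratified split is behaving as intended.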

### 📝 Task: Compare CV Strategies

In [None]:
# TODO: Test different CV strategies
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000)

# K-Fold
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
scores_kfold = # YOUR CODE HERE

# Stratified K-Fold
stratified_kfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores_stratified = # YOUR CODE HERE

print(f'K-Fold: {scores_kfold.mean():.4f} ± {scores_kfold.std():.4f}')
print(f'Stratified K-Fold: {scores_stratified.mean():.4f} ± {scores_stratified.std():.4f}')
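
One possible way to approach this task is to pass each splitter to `cross_val_score` through its `cv` argument. The sketch below is self-contained and runs on the full wine data; in the lab cell you would pass `X_train` and `y_train` instead. The `StandardScaler` step and the names `demo_model`, `cv_strategies`, and `demo_scores` are assumptions added for illustration, not part of the original task:

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = load_wine(return_X_y=True)

# Scaling inside the pipeline means the scaler is refit on each training fold only
demo_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

cv_strategies = {
    "K-Fold": KFold(n_splits=5, shuffle=True, random_state=42),
    "Stratified K-Fold": StratifiedKFold(n_splits=5, shuffle=True, random_state=42),
}

demo_scores = {}
for name, cv in cv_strategies.items():
    # cross_val_score returns one accuracy value per fold (five here)
    demo_scores[name] = cross_val_score(
        demo_model, X_demo, y_demo, cv=cv, scoring="accuracy"
    )
    print(f"{name}: {demo_scores[name].mean():.4f} ± {demo_scores[name].std():.4f}")
```

On a small, mildly imbalanced dataset like this one, the two strategies often score similarly; stratification mainly reduces the fold-to-fold variation in class balance.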

## Part 2: Grid Search

In [None]:
# TODO: Grid search for Random Forest
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10]
}

rf = RandomForestClassifier(random_state=42)
grid_search = # YOUR CODE HERE

# YOUR CODE: Fit grid search

print(f'Best params: {grid_search.best_params_}')
print(f'Best score: {grid_search.best_score_:.4f}')
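
As a rough sketch of the pattern, `GridSearchCV` wraps an estimator and a parameter grid, and calling `fit` cross-validates every combination. The example below uses a deliberately smaller grid than the one above so it runs quickly; `small_grid` and `demo_search` are illustrative names, and the choice of `cv=5` with accuracy scoring is an assumption rather than a requirement of the lab:

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X_demo, y_demo = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# 2 x 2 = 4 candidate settings, each evaluated with 5-fold CV (20 fits in total)
small_grid = {"n_estimators": [25, 50], "max_depth": [None, 5]}

demo_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=small_grid,
    cv=5,
    scoring="accuracy",
)
demo_search.fit(X_tr, y_tr)

print(f"Best params: {demo_search.best_params_}")
print(f"Best CV score: {demo_search.best_score_:.4f}")
# With the default refit=True, the best setting is retrained on all of X_tr,
# so the search object can score held-out data directly
print(f"Test accuracy: {demo_search.score(X_te, y_te):.4f}")
```

Note that `best_score_` is the mean cross-validated score on the training data, which is why a separate test-set score is reported as well.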

## Part 3: Random Search

In [None]:
# TODO: Random search
from scipy.stats import randint

param_distributions = {
    'n_estimators': randint(50, 500),
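    # Seed the draw so the candidate depths are reproducible; tolist() yields plain Python ints
    'max_depth': [None] + randint(5, 50).rvs(10, random_state=42).tolist(),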
    'min_samples_split': randint(2, 20)
}

random_search = RandomizedSearchCV(
    rf, param_distributions, n_iter=50, cv=5, random_state=42, n_jobs=-1
)

# YOUR CODE: Fit and display results
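
A minimal sketch of fitting a random search and inspecting its results, assuming a reduced budget (`n_iter=8`, `cv=3`, fewer trees) so that it finishes quickly. The names `demo_distributions`, `demo_random`, and `results` are illustrative only:

```python
import pandas as pd
from scipy.stats import randint
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X_demo, y_demo = load_wine(return_X_y=True)

demo_distributions = {
    "n_estimators": randint(20, 100),      # integers drawn from [20, 100)
    "max_depth": [None, 5, 10, 20],        # a list is sampled uniformly
    "min_samples_split": randint(2, 20),
}

demo_random = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    demo_distributions,
    n_iter=8,          # number of sampled settings, independent of grid size
    cv=3,
    random_state=42,
)
demo_random.fit(X_demo, y_demo)

# cv_results_ holds one row per sampled setting; rank 1 is the best mean score
results = pd.DataFrame(demo_random.cv_results_)
top = results.sort_values("rank_test_score")[
    ["params", "mean_test_score", "std_test_score"]
].head(3)
print(top.to_string(index=False))
print(f"Best params: {demo_random.best_params_}")
```

The cost of a random search is set by `n_iter` rather than by the size of the search space, which is what makes it cheaper than an exhaustive grid when many hyperparameters are involved.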

## Part 4: Pipeline Creation

In [None]:
# TODO: Create pipeline with scaling + classifier
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', SVC())
])

# Grid search on pipeline
param_grid_pipeline = {
    'classifier__C': [0.1, 1, 10],
    'classifier__kernel': ['linear', 'rbf'],
    'classifier__gamma': ['scale', 'auto']  # affects 'rbf' only; ignored by the 'linear' kernel
}

# YOUR CODE: Perform grid search on pipeline
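
The `step__parameter` naming in the grid above routes each value to the named pipeline step. A hedged sketch of searching over a whole pipeline, using a reduced grid; `demo_pipeline`, `demo_grid`, and `pipe_search` are illustrative names, and `cv=5` is an assumption:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_demo, y_demo = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

demo_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", SVC()),
])

# Keys follow '<step name>__<parameter>': 3 x 2 = 6 candidate settings
demo_grid = {
    "classifier__C": [0.1, 1, 10],
    "classifier__kernel": ["linear", "rbf"],
}

# The scaler is refit inside every CV fold, so validation folds never leak into scaling
pipe_search = GridSearchCV(demo_pipeline, demo_grid, cv=5)
pipe_search.fit(X_tr, y_tr)

print(f"Best params: {pipe_search.best_params_}")
print(f"Best CV score: {pipe_search.best_score_:.4f}")
print(f"Test accuracy: {pipe_search.score(X_te, y_te):.4f}")
```

Searching over the pipeline rather than a pre-scaled matrix is the design choice that matters here: scaling the data once up front would let test-fold statistics influence training.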

---
# 📝 Summary
✅ Compared CV strategies
✅ Implemented grid search
✅ Used random search
✅ Built ML pipelines
**Excellent! 🎉**