# Practical Exercises in Cross-Validation and Evaluation (Exercise Solutions)
In this final section, we’ll work through practical exercises in building, tuning, and evaluating models with cross-validation and performance metrics. The exercises reinforce the concepts covered in this chapter: model selection, hyperparameter tuning, and evaluating generalization. Completing them will solidify our understanding of how to assess model performance and choose a model for real-world applications.

## Exercise 1: Cross-Validating a Logistic Regression Model
We’ll evaluate a logistic regression classifier using 5-fold cross-validation and report accuracy, precision, recall, and F1, each averaged across the folds.

### Implementation Steps:

In [None]:
# Load libraries
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=2024)

# Cross-validate and collect metrics
model = LogisticRegression()
results = cross_validate(model, X, y, cv=5, scoring=["accuracy", "precision", "recall", "f1"])

for metric in ["test_accuracy", "test_precision", "test_recall", "test_f1"]:
    print(f"{metric}: {results[metric].mean():.3f}")
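The fold means above hide how much each metric varies from fold to fold. The following is a minimal sketch of one way to report that spread, assuming the same synthetic data as the exercise. The explicit `StratifiedKFold` splitter with shuffling, and the `summary` dictionary, are illustrative choices and not part of the original solution.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Same synthetic dataset as in the exercise
X, y = make_classification(n_samples=500, n_features=10, random_state=2024)

# Assumption: shuffled, stratified folds with a fixed seed for reproducibility
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2024)
scoring = ["accuracy", "precision", "recall", "f1"]
results = cross_validate(LogisticRegression(), X, y, cv=cv, scoring=scoring)

# Collect (mean, standard deviation) across the five folds for each metric
summary = {
    m: (results[f"test_{m}"].mean(), results[f"test_{m}"].std())
    for m in scoring
}
for m, (mean, std) in summary.items():
    print(f"{m}: {mean:.3f} +/- {std:.3f}")
```

A large standard deviation relative to the mean suggests the estimate is sensitive to how the data happens to be split, which is worth knowing before comparing models on small differences.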

## Exercise 2: Tuning Hyperparameters with Grid Search
We’ll tune the hyperparameters of a support vector classifier with GridSearchCV on a training split, then evaluate the best model on a held-out test set.

### Implementation Steps:

In [None]:
# Load libraries
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# Generate a synthetic dataset and hold out 30% of it for testing
X, y = make_classification(n_samples=500, n_features=10, random_state=2024)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=2024)

# Perform grid search
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)
print(f"Best parameters: {grid.best_params_}")

# Evaluate on the test set
y_pred = grid.predict(X_test)
print(classification_report(y_test, y_pred))
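Printing only `best_params_` does not show how close the other candidates came. One way to compare every combination in the grid is to read the fitted search's `cv_results_` dictionary. The sketch below assumes the same data, split, and grid as the exercise; the ranking loop and the `order` variable are illustrative additions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Same data, split, and grid as in the exercise
X, y = make_classification(n_samples=500, n_features=10, random_state=2024)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=2024)
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)

# cv_results_ holds one entry per parameter combination
results = grid.cv_results_
# Sort candidates from best (rank 1) to worst; stable sort keeps tie order
order = results["rank_test_score"].argsort(kind="stable")
for i in order:
    mean = results["mean_test_score"][i]
    std = results["std_test_score"][i]
    print(f"{results['params'][i]}: {mean:.3f} +/- {std:.3f}")
```

If several candidates score within one standard deviation of the best, the simpler or cheaper one may be a reasonable choice.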

## Exercise 3: Assessing Model Generalization with Learning and Validation Curves
We’ll use learning_curve and validation_curve to visualize how training and validation accuracy change with the training set size and with the regularization parameter C, which controls model complexity.

### Implementation Steps:

In [None]:
# Load libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve, validation_curve

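# Plot the learning curve (reuses X, y, and LogisticRegression from the cells above)
# shuffle=True is required for random_state to have any effect
train_sizes, train_scores, test_scores = learning_curve(
    LogisticRegression(), X, y, cv=5, scoring='accuracy',
    train_sizes=np.linspace(0.1, 1.0, 5), shuffle=True, random_state=2024)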

plt.plot(train_sizes, train_scores.mean(axis=1), label="Training")
plt.plot(train_sizes, test_scores.mean(axis=1), label="Validation")
plt.title("Learning Curve")
plt.xlabel("Training Size")
plt.ylabel("Accuracy")
plt.legend()
plt.grid(True)
plt.show()

# Plot the validation curve
param_range = np.logspace(-3, 2, 6)
train_scores, test_scores = validation_curve(
    LogisticRegression(), X, y, param_name="C", param_range=param_range,
    cv=5, scoring='accuracy')

plt.semilogx(param_range, train_scores.mean(axis=1), label="Training")
plt.semilogx(param_range, test_scores.mean(axis=1), label="Validation")
plt.title("Validation Curve")
plt.xlabel("C (Inverse Regularization Strength)")
plt.ylabel("Accuracy")
plt.legend()
plt.grid(True)
plt.show()
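The validation curve can also be read numerically. The sketch below, under the same data and parameter range as the exercise, tabulates the gap between training and validation accuracy for each value of C and picks the value with the highest mean validation accuracy. The names `gap` and `best_idx` are illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import validation_curve

# Same synthetic dataset and parameter range as in the exercise
X, y = make_classification(n_samples=500, n_features=10, random_state=2024)
param_range = np.logspace(-3, 2, 6)
train_scores, val_scores = validation_curve(
    LogisticRegression(), X, y, param_name="C", param_range=param_range,
    cv=5, scoring="accuracy")

# Average across folds: one value per candidate C
train_mean = train_scores.mean(axis=1)
val_mean = val_scores.mean(axis=1)
gap = train_mean - val_mean  # a larger gap hints at overfitting

for C, tr, va, g in zip(param_range, train_mean, val_mean, gap):
    print(f"C={C:g}: train={tr:.3f}, validation={va:.3f}, gap={g:.3f}")

# Candidate with the highest mean validation accuracy
best_idx = int(val_mean.argmax())
print(f"Best C by validation accuracy: {param_range[best_idx]:g}")
```

Low accuracy on both curves points to underfitting. A training score well above the validation score points to overfitting.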