# 8.0 Model Evaluation and Hyperparameter Tuning

This lesson will cover model evaluation techniques and hyperparameter tuning in machine learning. These two concepts are crucial for improving the performance of machine learning models and ensuring that they generalize well to unseen data. 

**Learning Objectives:**
By the end of this lesson, students will be able to:

* Understand the importance of model evaluation.
* Evaluate models using different metrics depending on the problem type (regression or classification).
* Implement model validation techniques like cross-validation.
* Understand the role of hyperparameters in model training.
* Use techniques such as grid search and random search to tune hyperparameters.

## 8.1. Model Evaluation

**Key Concepts:**
* **Training data:** Used to train the model.
* **Testing data:** Used to evaluate model performance.
* **Overfitting:** When a model performs well on the training data but poorly on unseen data because it has learned noise or random fluctuations.
* **Underfitting:** When a model is too simple and fails to capture underlying patterns in the data.
* **Generalization:** The model’s ability to perform well on new, unseen data.

Evaluating model performance ensures it is not just memorizing data (overfitting) or ignoring patterns (underfitting). Performance should be assessed on data separate from the training set.
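
The gap between training and test performance can be seen in a small sketch. It is illustrative only: the breast cancer dataset, the decision tree, and the chosen depths are assumptions made for this example and are not part of the lesson's own exercises.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed example data: a binary classification dataset bundled with scikit-learn
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Compare a very shallow tree, a moderate tree, and an unrestricted tree
results = {}
for depth in (1, 3, None):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    tree.fit(X_train, y_train)
    train_acc = tree.score(X_train, y_train)
    test_acc = tree.score(X_test, y_test)
    results[depth] = (train_acc, test_acc)
    print(f"max_depth={depth}: train={train_acc:.3f}, test={test_acc:.3f}")
```

A training score well above the test score points towards overfitting, while low scores on both sets point towards underfitting.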

### 8.1.1. Model Evaluation Metrics for Classification Problems
* **Accuracy:** The percentage of correct predictions out of all predictions. In imbalanced datasets (where one class is much more frequent than the other), accuracy might be misleading.
* **Precision:** The ratio of correctly predicted positive observations to all predicted positives. This metric is useful when the cost of false positives is high (e.g., predicting whether a customer will churn).
* **Recall (sensitivity):** The ratio of correctly predicted positive observations to all actual positives. This is useful when the cost of false negatives is high (e.g., medical diagnoses where failing to detect a disease could be fatal).
* **F1-Score:** The harmonic mean of precision and recall. It is useful when you need a single number that balances the two, for example on imbalanced datasets.
* **ROC Curve & AUC:** Used to evaluate binary classifiers. The ROC curve plots the true positive rate against the false positive rate at various threshold settings, and the AUC (area under the curve) summarizes it as a single number.
* **Confusion Matrix:** A table showing the true vs. predicted classifications. This allows you to calculate precision, recall, accuracy, and F1-score.
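
To make the link between the confusion matrix and the other metrics concrete, the sketch below computes them by hand and checks them against scikit-learn. The labels `y_true` and `y_hat` are made-up values chosen only for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Hypothetical labels for a binary problem (1 = positive class)
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_hat = np.array([1, 0, 0, 1, 0, 1, 1, 1, 1, 0])

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # correct positives / all predicted positives
recall = tp / (tp + fn)      # correct positives / all actual positives
f1 = 2 * precision * recall / (precision + recall)

print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("Accuracy:", accuracy)
print("Precision:", precision, "sklearn:", precision_score(y_true, y_hat))
print("Recall:", recall, "sklearn:", recall_score(y_true, y_hat))
print("F1-Score:", f1, "sklearn:", f1_score(y_true, y_hat))
```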

**Hands-on Example: Evaluating a Classification Model**

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a classifier
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate model
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average='weighted'))
print("Recall:", recall_score(y_test, y_pred, average='weighted'))
print("F1-Score:", f1_score(y_test, y_pred, average='weighted'))
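
The example above does not cover the ROC curve and AUC, which apply most directly to binary classifiers. A minimal sketch follows; the breast cancer dataset, the scaling step, and the variable names are assumptions made for this illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed example data: a binary classification dataset bundled with scikit-learn
X_bin, y_bin = load_breast_cancer(return_X_y=True)
Xb_train, Xb_test, yb_train, yb_test = train_test_split(
    X_bin, y_bin, test_size=0.3, random_state=42, stratify=y_bin
)

# Scale the features, then fit a logistic regression classifier
binary_clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
binary_clf.fit(Xb_train, yb_train)

# ROC analysis needs scores (here, the probability of the positive class), not hard labels
pos_scores = binary_clf.predict_proba(Xb_test)[:, 1]

# One (FPR, TPR) point per threshold; AUC summarizes the whole curve
fpr, tpr, thresholds = roc_curve(yb_test, pos_scores)
auc = roc_auc_score(yb_test, pos_scores)

print("Number of thresholds evaluated:", len(thresholds))
print("AUC:", auc)
```

An AUC of 0.5 corresponds to random guessing and 1.0 to a perfect ranking of positives above negatives.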

### 8.1.2. Model Evaluation Metrics for Regression Problems
* **Mean Absolute Error (MAE):** The average of the absolute differences between predicted and actual values. It reports the typical size of an error in the original unit and treats all errors equally.
* **Mean Squared Error (MSE):** The average of the squared differences between predicted and actual values. It is useful when you want to penalize larger errors more heavily.
* **Root Mean Squared Error (RMSE):** The square root of MSE, which brings the error back to the original unit of measurement.
* **R-squared (R²):** The proportion of the variance in the target variable that is explained by the model. A value of 1 means a perfect fit, and 0 means the model does no better than always predicting the mean.
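
The definitions above can be checked by hand with NumPy. The values in `y_true` and `y_hat` are made up for illustration, and the scikit-learn calls are included only to confirm the manual results.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_hat = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_hat

mae = np.mean(np.abs(errors))   # average absolute error
mse = np.mean(errors ** 2)      # average squared error
rmse = np.sqrt(mse)             # back in the original unit

# R^2 = 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print("MAE:", mae, "sklearn:", mean_absolute_error(y_true, y_hat))
print("MSE:", mse, "sklearn:", mean_squared_error(y_true, y_hat))
print("RMSE:", rmse)
print("R^2:", r2, "sklearn:", r2_score(y_true, y_hat))
```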
  
**Hands-on Example: Evaluating a Regression Model**

In [None]:
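from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Load dataset (load_boston was removed in scikit-learn 1.2, so the bundled
# diabetes regression dataset is used here instead)
X, y = load_diabetes(return_X_y=True)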

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a regression model
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Make predictions
y_pred = regressor.predict(X_test)

# Evaluate model
print("MAE:", mean_absolute_error(y_test, y_pred))
print("MSE:", mean_squared_error(y_test, y_pred))
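# The squared=False argument is deprecated in recent scikit-learn releases,
# so take the square root of the MSE directly
print("RMSE:", mean_squared_error(y_test, y_pred) ** 0.5)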
print("R^2:", r2_score(y_test, y_pred))

### 8.1.3. Cross-Validation
Cross-validation assesses how well a model generalizes by splitting the data into multiple training and testing sets. It is especially useful when the dataset is small. Averaging over several splits gives a more stable performance estimate than a single train/test split and makes overfitting easier to detect.

**K-Fold Cross-Validation**
* The dataset is split into *k* folds of (roughly) equal size.
* The model is trained *k* times, each time using *k−1* folds for training and the remaining fold for testing.
* The average performance across all *k* folds is reported.
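
The three steps above can be written out as an explicit loop. This is a sketch under assumptions: it uses the Iris data with logistic regression, and it shuffles before splitting because the Iris samples are ordered by class. The names `X_cv`, `y_cv`, and `fold_scores` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X_cv, y_cv = load_iris(return_X_y=True)

# Split into k=5 folds; shuffle so each fold contains a mix of classes
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

fold_scores = []
for fold, (train_idx, test_idx) in enumerate(kfold.split(X_cv), start=1):
    # Train on k-1 folds, test on the remaining fold
    fold_model = LogisticRegression(max_iter=200)
    fold_model.fit(X_cv[train_idx], y_cv[train_idx])
    score = fold_model.score(X_cv[test_idx], y_cv[test_idx])
    fold_scores.append(score)
    print(f"Fold {fold}: accuracy={score:.3f} ({len(test_idx)} test samples)")

# Report the average performance across all folds
print("Mean accuracy:", np.mean(fold_scores))
```

`cross_val_score` wraps this loop in a single call. For classifiers given an integer `cv`, it uses stratified folds, so its scores can differ slightly from this plain K-fold loop.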

In [None]:
from sklearn.model_selection import cross_val_score

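# Perform 5-fold cross-validation on a classifier.
# X and y were overwritten by the regression example, so reload the Iris data first.
X_iris, y_iris = load_iris(return_X_y=True)
cross_val_scores = cross_val_score(model, X_iris, y_iris, cv=5, scoring='accuracy')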

# Print the cross-validation scores
print("Cross-validation scores:", cross_val_scores)
print("Average cross-validation score:", cross_val_scores.mean())

## 8.2. Hyperparameter Tuning
Hyperparameters are settings chosen before the learning process begins (e.g., learning rate or number of trees in a forest). Tuning them is essential for optimizing model performance: good choices can significantly improve results, while poor choices can lead to overfitting, underfitting, or slow training.

### 8.2.1. Introduction to Hyperparameters

A hyperparameter is a configuration variable in machine learning algorithms that is set before training the model and governs the training process or the structure of the model itself (e.g., learning rate, number of trees in a Random Forest, number of layers in a neural network). Unlike parameters, which are learned from the data during training (such as the weights in a neural network), hyperparameters control how the model learns or the model's architecture.
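
The distinction can be seen directly on a scikit-learn estimator. In this sketch (logistic regression on Iris, chosen only for illustration), hyperparameters are available as soon as the object is created, while learned parameters exist only after `fit`.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X_demo, y_demo = load_iris(return_X_y=True)

# Hyperparameters: set by us before training
demo_clf = LogisticRegression(C=0.5, max_iter=200)
hyperparams = demo_clf.get_params()
print("C:", hyperparams["C"], "max_iter:", hyperparams["max_iter"])

# Learned parameters do not exist yet
has_coef_before = hasattr(demo_clf, "coef_")
print("Has coef_ before fit:", has_coef_before)

# Parameters: learned from the data during training
demo_clf.fit(X_demo, y_demo)
print("coef_ shape (classes x features):", demo_clf.coef_.shape)
print("intercept_ shape:", demo_clf.intercept_.shape)
```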

**Examples of Hyperparameters:**
1. **Model Architecture Hyperparameters:**
* Number of layers: In a neural network, you can decide how many hidden layers the model should have.
* Number of neurons in each layer: In a neural network, this controls how many neurons each layer contains.
* Kernel size (for Convolutional Neural Networks, CNNs): Determines the dimensions of the kernel in convolutional layers.
* Tree depth (for Decision Trees, Random Forests): Controls the maximum depth of the tree, affecting its complexity.

2. **Training Process Hyperparameters:**
* Learning rate: Controls how quickly or slowly the model adjusts its parameters during training (gradient descent). A higher learning rate means the model might converge faster, but too high a rate might cause it to overshoot the optimal solution.
* Batch size: The number of training samples used in one iteration of model training.
* Epochs: The number of times the entire dataset is passed through the model during training.
* Momentum: In gradient-based optimization methods, momentum helps smooth out the updates to the model weights.

3. **Regularization Hyperparameters:**
* L1 or L2 regularization: These regularizers help prevent overfitting by adding penalties to large weights (L2) or enforcing sparsity (L1).
* Dropout rate: In neural networks, dropout randomly disables a fraction of neurons to prevent overfitting during training.

4. **Optimization Hyperparameters:**
* Optimization algorithm: Whether to use algorithms like Stochastic Gradient Descent (SGD), Adam, or RMSprop to update model parameters.
* Beta values (for algorithms like Adam): These control the moving averages of the gradients and squared gradients during training.

Some hyperparameters are specific to a model family. For a Random Forest, commonly tuned ones include the number of trees, max features (how many features to consider when splitting a node), and max depth (which limits how deep each tree can grow, to prevent overfitting).
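
The effect of the learning rate described in the list above can be shown with a toy problem. The sketch below runs plain gradient descent on the function f(w) = w², whose minimum is at w = 0. The function, the starting point, and the two learning rates are assumptions chosen for illustration.

```python
def gradient_descent(learning_rate, start=5.0, steps=50):
    """Minimize f(w) = w**2 with plain gradient descent and return the final w."""
    w = start
    for _ in range(steps):
        gradient = 2 * w               # derivative of w**2
        w = w - learning_rate * gradient
    return w

# A moderate learning rate moves steadily towards the minimum at w = 0
w_small = gradient_descent(learning_rate=0.1)

# Too high a learning rate overshoots the minimum on every step and diverges
w_large = gradient_descent(learning_rate=1.1)

print("learning_rate=0.1 ->", w_small)
print("learning_rate=1.1 ->", w_large)
```

With a learning rate of 0.1 each step shrinks `w` by a factor of 0.8, while with 1.1 each step multiplies it by -1.2, so the iterate moves further from the minimum every time.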

### 8.2.2. Methods of hyperparameter tuning

* **Grid Search:** Exhaustively evaluates every combination in a predefined grid of hyperparameter values and selects the best one. The cost grows multiplicatively with each added hyperparameter.
* **Random Search:** Samples a fixed number of random combinations from the specified ranges or distributions. For the same budget it can explore a larger space than grid search, but it does not guarantee finding the best combination.
* **Bayesian Optimization:** Builds a probabilistic model of the objective from past evaluations and uses it to choose which hyperparameters to try next. It often needs fewer evaluations than grid search or random search to find good hyperparameters. In Python, popular libraries include `Hyperopt` and `Optuna`.
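
The cost difference between grid search and random search can be estimated before running anything. The sketch below counts the candidates with scikit-learn's `ParameterGrid` and `ParameterSampler` helpers. The grid values, the distributions, and the budget of 20 samples are assumptions chosen for illustration.

```python
from scipy.stats import randint
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Grid search: every combination is evaluated
param_grid = {
    'n_estimators': [10, 50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4],
}
n_combos = len(ParameterGrid(param_grid))   # 4 * 4 * 3 * 3
n_fits = n_combos * 5                       # one fit per combination per CV fold
print("Grid combinations:", n_combos, "-> model fits with cv=5:", n_fits)

# Random search: the budget is fixed in advance, whatever the size of the space
param_dist = {
    'n_estimators': randint(10, 200),
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': randint(2, 20),
    'min_samples_leaf': randint(1, 20),
}
samples = list(ParameterSampler(param_dist, n_iter=20, random_state=42))
print("Random candidates:", len(samples), "-> model fits with cv=5:", len(samples) * 5)
print("First sampled candidate:", samples[0])
```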

**Grid Search (Example with Random Forest):**

In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset
data = load_iris()
X = data.data
y = data.target

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create a RandomForest model
rf = RandomForestClassifier()

# Define hyperparameters grid
param_grid = {
    'n_estimators': [10, 50, 100, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Perform grid search
grid_search = GridSearchCV(estimator=rf, param_grid=param_grid, cv=5, n_jobs=-1)
grid_search.fit(X_train, y_train)

# Best parameters found
print("Best parameters:", grid_search.best_params_)
print("Best score:", grid_search.best_score_)

**Random Search (Example with Random Forest):**

In [None]:
from scipy.stats import randint
from sklearn.model_selection import RandomizedSearchCV

# Create a RandomForest model
rf = RandomForestClassifier()

# Define hyperparameters distribution
param_dist = {
    'n_estimators': randint(10, 200),
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': randint(2, 20),
    'min_samples_leaf': randint(1, 20)
}

# Perform randomized search
random_search = RandomizedSearchCV(estimator=rf, param_distributions=param_dist, n_iter=100, cv=5, n_jobs=-1)
random_search.fit(X_train, y_train)

# Best parameters found
print("Best parameters:", random_search.best_params_)
print("Best score:", random_search.best_score_)

**Bayesian Optimization (Example with Hyperopt using RandomForestClassifier):**

In [None]:
from hyperopt import fmin, tpe, hp, Trials, space_eval
from sklearn.model_selection import cross_val_score

# Define the objective function for hyperparameter tuning
def objective(params):
    rf = RandomForestClassifier(**params)
    # Score with cross-validation on the training data only, so the test set
    # stays untouched for the final evaluation
    accuracy = cross_val_score(rf, X_train, y_train, cv=5, scoring='accuracy').mean()
    return -accuracy  # Return negative accuracy for minimization

# Define the search space
space = {
    'n_estimators': hp.choice('n_estimators', [10, 50, 100, 200]),
    'max_depth': hp.choice('max_depth', [None, 10, 20, 30]),
    'min_samples_split': hp.choice('min_samples_split', [2, 5, 10]),
    'min_samples_leaf': hp.choice('min_samples_leaf', [1, 2, 4])
}

# Perform Bayesian optimization with Tree-structured Parzen Estimator (TPE)
trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=20, trials=trials)

# fmin returns the index of each hp.choice option; space_eval maps them back to values
print("Best hyperparameters:", space_eval(space, best))

**Practical**
* Step 1: Load a dataset (e.g., the Iris dataset or the diabetes dataset).
* Step 2: Split the data into training and testing sets using `train_test_split` from sklearn.
* Step 3: Train a model (e.g., Logistic Regression or Random Forest).
* Step 4: Evaluate the model using the appropriate metrics (accuracy, precision, recall, etc.). Use `cross_val_score` or `GridSearchCV` to apply cross-validation.
* Step 5: Tune hyperparameters using `GridSearchCV` or `RandomizedSearchCV`: choose a set of hyperparameters to tune (e.g., `max_depth` and `n_estimators` for a Random Forest), then run the search and evaluate performance.
* Step 6: Compare model performance before and after hyperparameter tuning.
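
One possible way to work through these steps is sketched below. It is a starting point and not a model answer: the dataset, the Random Forest, the search ranges, and the small budget of 10 candidates are all assumptions that you are encouraged to change.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, cross_val_score, train_test_split

# Steps 1-2: load the data and hold out a test set
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Steps 3-4: train a baseline model and evaluate it
baseline = RandomForestClassifier(random_state=42)
baseline_cv = cross_val_score(baseline, X_train, y_train, cv=5).mean()
baseline.fit(X_train, y_train)
baseline_test = accuracy_score(y_test, baseline.predict(X_test))

# Step 5: tune hyperparameters with cross-validation on the training data only
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions={
        'n_estimators': randint(10, 101),
        'max_depth': [None, 3, 5, 10],
        'min_samples_split': randint(2, 11),
    },
    n_iter=10,
    cv=5,
    random_state=42,
)
search.fit(X_train, y_train)
tuned_test = accuracy_score(y_test, search.best_estimator_.predict(X_test))

# Step 6: compare performance before and after tuning
print("Baseline CV accuracy:", baseline_cv, "test accuracy:", baseline_test)
print("Tuned CV accuracy:", search.best_score_, "test accuracy:", tuned_test)
print("Best parameters:", search.best_params_)
```

On a small, easy dataset such as Iris the tuned model may not beat the baseline; the comparison becomes more informative on larger or noisier data.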