# Nested cross-validation 

Nested cross-validation 🔎 is a technique used in machine learning that serves the dual purpose of estimating the model's generalization error while simultaneously searching for the best set of hyperparameters or model configurations. It's a combination of two cross-validation loops: an outer loop and an inner loop.

🔄 Outer Cross-Validation (Outer Loop): The outer loop is responsible for estimating the model's performance. It typically uses k-fold cross-validation, where the original dataset is divided into k subsets or folds. In each iteration, one fold is held out as a validation (test) set, and the remaining k-1 folds are used for training, including any hyperparameter tuning done by the inner loop. The model is trained and evaluated k times, and the performance metric (e.g., accuracy, F1-score) is averaged over these iterations. This gives us an estimate of how well the model performs on unseen data.

🔄 Inner Cross-Validation (Inner Loop): Inside each iteration of the outer loop, there's another cross-validation loop. This inner loop is used for hyperparameter tuning or model selection. It's similar to the outer loop but focuses on selecting the best set of hyperparameters or model configurations. The inner loop also uses k-fold cross-validation but is applied to the training data from the outer loop. Different hyperparameter combinations or models are evaluated, and the best-performing combination is selected.

![cross cv diagram](../.image/cross_cv.JPG)

### Key advantages of nested cross-validation

- ✔️ Robust Performance Estimation: By using nested cross-validation, we obtain a more reliable estimate of our model's performance because it considers variations in both the training and validation data.

- ✔️ Avoiding Data Leakage: Nested cross-validation helps prevent data leakage, which can occur when hyperparameter tuning or model selection is performed on the same data used for performance estimation. Because the inner loop only ever sees the outer training folds, the outer held-out fold plays no part in model selection.

- ✔️ Optimal Hyperparameter Tuning: It allows us to tune hyperparameters or model configurations for our specific dataset while checking, on data that was not used for tuning, that the chosen settings have not overfit.
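
To make the leakage point concrete, here is a minimal sketch that compares a non-nested score with a nested one. It uses scikit-learn's `GridSearchCV` rather than Hyperopt to keep the example short, and the grid (`param_grid`), the 3-fold splitters, and `n_estimators=50` are illustrative assumptions rather than recommended settings.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Illustrative (assumed) search grid and splitters
param_grid = {"max_depth": [2, 4, 8]}
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=inner_cv,
)

# Non-nested: the same folds that picked the hyperparameters also produce the reported score
search.fit(X, y)
non_nested_auc = search.best_score_

# Nested: the whole search is repeated inside each outer training split,
# and each outer test fold is only used for scoring
nested_scores = cross_val_score(search, X, y, scoring="roc_auc", cv=outer_cv)
nested_auc = nested_scores.mean()

print(f"Non-nested AUC: {non_nested_auc:.4f}")
print(f"Nested AUC:     {nested_auc:.4f}")
```

The non-nested score tends to be the more optimistic of the two, although on an easy dataset like this one the difference may be small or even reversed in a single run.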

### Example workflow in nested cross-validation:

**Outer Loop (Performance Estimation)**:

- Split the dataset into k folds.
- In each iteration:
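    - Use k-1 folds for training: run the inner loop on them to pick hyperparameters, then refit on all k-1 folds with the chosen settings.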
    - Use the remaining fold for validation.
    - Calculate a performance metric (e.g., accuracy) on the validation set.
- Average the performance metrics from all iterations to estimate the model's overall performance.
    
**Inner Loop (Hyperparameter Tuning)**:

- Inside each iteration of the outer loop:
    - Split the training data from the outer loop into k folds.
    - In each inner iteration:
        - Use k-1 folds for training within the training data.
        - Use the remaining fold for validation within the training data.
        - Try different hyperparameter settings or model configurations.  
        - Calculate a performance metric on the inner validation set.
    - Choose the hyperparameters or model configuration that performed best on average across inner iterations.
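
The steps above can be sketched as an explicit double loop. This is only a minimal illustration under assumed settings: the candidate list `candidate_depths`, the 3-fold splitters, and `n_estimators=50` are made up for the example, and a plain grid stands in for a smarter search strategy.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)
candidate_depths = [2, 4, 8]  # illustrative (assumed) hyperparameter candidates

outer_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_aucs = []
chosen_depths = []

for train_idx, test_idx in outer_cv.split(X, y):
    X_train, y_train = X[train_idx], y[train_idx]
    X_test, y_test = X[test_idx], y[test_idx]

    # Inner loop: score every candidate using only the outer training data
    inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
    inner_means = []
    for depth in candidate_depths:
        clf = RandomForestClassifier(n_estimators=50, max_depth=depth, random_state=0)
        scores = cross_val_score(clf, X_train, y_train, cv=inner_cv, scoring="roc_auc")
        inner_means.append(scores.mean())

    # Keep the candidate with the best average inner score
    best_depth = candidate_depths[int(np.argmax(inner_means))]
    chosen_depths.append(best_depth)

    # Refit on the full outer training data, then score once on the held-out fold
    model = RandomForestClassifier(n_estimators=50, max_depth=best_depth, random_state=0)
    model.fit(X_train, y_train)
    outer_aucs.append(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

print("Chosen max_depth per outer fold:", chosen_depths)
print("Outer AUC mean and std:", np.mean(outer_aucs), np.std(outer_aucs))
```

Each outer fold may select a different `max_depth`; the averaged outer score describes the whole tune-then-fit procedure rather than any single setting.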

Nested cross-validation provides a more robust and less biased way to evaluate and tune models, so the final performance estimates are more trustworthy. Here is a [great read](https://towardsdatascience.com/validating-your-machine-learning-model-25b4c8643fb7) on the topic, and below is a code demo to play with. 🏄🏻‍♀️

In [5]:
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score
from hyperopt import fmin, tpe, hp, Trials, space_eval
from sklearn.datasets import load_breast_cancer  
from sklearn.ensemble import RandomForestClassifier 

In [6]:
# Load the Breast Cancer dataset
data = load_breast_cancer()
X = data.data
y = data.target

# Define the hyperparameter space to search
space = {
    'max_depth': hp.choice('max_depth', list(range(2, 11))),
    'min_samples_split': hp.uniform('min_samples_split', 0.1, 1.0),
    'min_samples_leaf': hp.uniform('min_samples_leaf', 0.1, 0.5),
}

In [7]:
# Define the outer cross-validation loop
outer_scores = []
outer_loop_log = {}
outer_cv = 5 # Number of outer loop iterations
for i in range(outer_cv):  
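    # Split the data into training and test sets for the outer loop.
    # The seed changes with i so that each iteration draws a different split;
    # a fixed seed would reuse the identical split in every iteration.
    X_train_outer, X_test_outer, y_train_outer, y_test_outer = train_test_split(X, y, test_size=0.3, random_state=i)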
    
    def objective(params):
        # Create a Random Forest classifier with the given hyperparameters
        clf = RandomForestClassifier(**params)
        # Use cross-validation to evaluate the model
        scores = cross_val_score(clf, X_train_outer, y_train_outer, cv=5, scoring='roc_auc')
        # Return the negative mean auc
        return -np.mean(scores)
    
    # Optimize hyperparameters using Hyperopt (inner loop)
    trials = Trials()
    best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=10, trials=trials)
    best_params = space_eval(space, best)
    outer_loop_log[f'fold{i}'] = {}
    outer_loop_log[f'fold{i}']['best_params'] = best_params
    outer_loop_log[f'fold{i}']['tune_auc'] = -trials.best_trial['result']['loss']
    
    # Create a Random Forest classifier with the best hyperparameters generated from the inner loop
    clf = RandomForestClassifier(**best_params)
    # Train the final model on the training data for the outer loop
    clf.fit(X_train_outer, y_train_outer)
    # Evaluate the final model on the test set for the outer loop
    y_pred_outer = clf.predict_proba(X_test_outer)[:,1]
    test_auc = roc_auc_score(y_test_outer, y_pred_outer)
    outer_loop_log[f'fold{i}']['test_auc'] = test_auc
    print('test_auc in the outer loop', test_auc)
    outer_scores.append(test_auc)

# Calculate the mean and standard deviation of outer loop scores
mean_auc = np.mean(outer_scores)
std_auc = np.std(outer_scores)

print("Mean AUC: {:.3f}".format(mean_auc))
print("Standard Deviation: {:.4f}".format(std_auc))

100%|██████████| 10/10 [00:10<00:00,  1.07s/trial, best loss: -0.9783271874266948]
test_auc in the outer loop 0.9902263374485596
100%|██████████| 10/10 [00:11<00:00,  1.12s/trial, best loss: -0.9815660333098757]
test_auc in the outer loop 0.9966196355085245
100%|██████████| 10/10 [00:10<00:00,  1.07s/trial, best loss: -0.9816199859254047]
test_auc in the outer loop 0.9952968841857731
100%|██████████| 10/10 [00:10<00:00,  1.06s/trial, best loss: -0.9785322073657049]
test_auc in the outer loop 0.9954438565549677
100%|██████████| 10/10 [00:11<00:00,  1.11s/trial, best loss: -0.970885198217218]
test_auc in the outer loop 0.9871399176954733
Mean AUC: 0.993
Standard Deviation: 0.0036


The result above is reassuring: several different sets of hyperparameters all performed well on the held-out data. The very low standard deviation across outer iterations suggests that the model is robust and not highly sensitive to the choice of hyperparameters. Keep in mind that this spread only measures split-to-split variation when each outer iteration is evaluated on a different train/test split.

### Retrieve the Best Fold and Hyperparameters

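The function below finds the fold that performed best on the selected metric (`test_auc` or `tune_auc`), so we can easily retrieve the hyperparameters chosen in that fold. Note that picking a fold by `test_auc` uses the held-out data for selection, so that fold's score is an optimistic estimate; the mean across outer folds remains the number to report.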

In [8]:
def get_best_fold(outer_loop_log, metric):
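    """Find the fold that performed best on the selected metric.

    Returns the fold name, its score, and the standard deviation of the metric across folds.
    """
    best_fold = None
    best_score = float('-inf')  # Any real score beats this, so the first fold always registers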
    metric_values = []  # Store metric values for all folds

    for fold, data in outer_loop_log.items():
        score = data[metric]
        metric_values.append(score)

        if score > best_score:
            best_score = score
            best_fold = fold

    std_metric = np.std(metric_values)

    return best_fold, best_score, std_metric

# Find the fold with the best performance and retrieve its hyperparameters:
best_test_fold, best_test_auc, std_test_auc = get_best_fold(outer_loop_log, 'test_auc')
print("Best Test Fold:", best_test_fold)
print("Best Test AUC:", best_test_auc)
print("Std Test AUC:", std_test_auc)
print("Hyperparameters for the best fold\n", outer_loop_log[best_test_fold]['best_params'])

Best Test Fold: fold1
Best Test AUC: 0.9966196355085245
Std Test AUC: 0.003643314302245934
Hyperparameters for the best fold
 {'max_depth': 2, 'min_samples_leaf': 0.11518263089634831, 'min_samples_split': 0.17336266454096236}
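
For a side-by-side view of all folds, the log can also be loaded into a pandas DataFrame. The sketch below rebuilds a reduced copy of `outer_loop_log` from the AUC values printed in the run above (`tune_auc` is the negated best loss); the `gap` column and the variable names are illustrative additions, and `best_params` is left out because the output only shows it for one fold.

```python
import pandas as pd

# AUC values copied from the run shown above; in the notebook, the real
# outer_loop_log dictionary can be passed in directly instead
outer_loop_log = {
    "fold0": {"tune_auc": 0.9783271874266948, "test_auc": 0.9902263374485596},
    "fold1": {"tune_auc": 0.9815660333098757, "test_auc": 0.9966196355085245},
    "fold2": {"tune_auc": 0.9816199859254047, "test_auc": 0.9952968841857731},
    "fold3": {"tune_auc": 0.9785322073657049, "test_auc": 0.9954438565549677},
    "fold4": {"tune_auc": 0.970885198217218, "test_auc": 0.9871399176954733},
}

# One row per fold, one column per metric
summary = pd.DataFrame.from_dict(outer_loop_log, orient="index")

# Difference between the held-out score and the inner tuning score (illustrative column)
summary["gap"] = summary["test_auc"] - summary["tune_auc"]

best_fold = summary["test_auc"].idxmax()
mean_test_auc = summary["test_auc"].mean()
std_test_auc = summary["test_auc"].std(ddof=0)  # ddof=0 matches the np.std default

print(summary.round(4))
print("Best fold by test_auc:", best_fold)
```

Looking at the whole table rather than a single best fold makes it easier to judge whether the folds agree with each other.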
