<a href="https://colab.research.google.com/github/NDsasuke/Autocorrelation-function-Diagnostics-and-prediction/blob/main/Diagnostics%20and%20prediction/Cross-Validation/Nested_Cross_Validation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

1. **Importing the necessary libraries**: This segment imports `numpy` for numerical operations; `cross_val_score`, `GridSearchCV`, and `KFold` from `sklearn.model_selection` for cross-validation and parameter tuning; `LogisticRegression` from `sklearn.linear_model` for logistic regression; and `load_iris` from `sklearn.datasets` to load the Iris dataset. Note that `cross_val_score` is imported, but the nested loop further down is written by hand and does not call it.


In [None]:
import numpy as np
from sklearn.model_selection import cross_val_score, GridSearchCV, KFold
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris


2. **Loading the Iris dataset**: This segment loads the Iris dataset using the `load_iris` function and assigns the features to `X` and the target variable to `y`.


In [None]:
# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target


3. **Creating the outer cross-validation object**: Here, an outer cross-validation object `outer_cv` is created using `KFold` with `n_splits=5`, indicating that we will perform 5-fold cross-validation. The dataset will be shuffled before splitting, and a random seed of 42 is set for reproducibility.


In [None]:
# Create an outer cross-validation object
outer_cv = KFold(n_splits=5, shuffle=True, random_state=42)
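
To make the effect of this splitter concrete, the following sketch iterates over the outer folds and records the size of each train/test partition. It assumes the 150-sample Iris data returned by `load_iris`; the name `fold_sizes` is illustrative and not part of the original notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold

X = load_iris().data
outer_cv = KFold(n_splits=5, shuffle=True, random_state=42)

# Collect (train size, test size) for every outer fold
fold_sizes = [
    (len(train_idx), len(test_idx))
    for train_idx, test_idx in outer_cv.split(X)
]

for i, (n_train, n_test) in enumerate(fold_sizes, start=1):
    print(f"Fold {i}: {n_train} training samples, {n_test} test samples")
```

With 150 samples and five folds, each test partition holds 30 samples and each training partition holds 120. Note that `KFold` does not preserve class proportions within each fold.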



4. **Defining the parameter grid for inner cross-validation**: In this segment, the parameter grid `param_grid` is defined for the inner cross-validation. It lists the values to be tested for the `C` parameter of the logistic regression model. `C` is the inverse of the regularization strength, so smaller values mean stronger regularization.


In [None]:
# Define the parameter grid for inner cross-validation
param_grid = {'C': [0.1, 1, 10]}
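
As a rough illustration of what this grid implies, the sketch below expands it with `ParameterGrid` and counts the model fits that nested cross-validation performs. It assumes the five outer folds and three inner folds used elsewhere in this notebook and the default `refit=True` behavior of `GridSearchCV`; the names `candidates`, `fits_per_outer_fold`, and `total_fits` are illustrative.

```python
from sklearn.model_selection import ParameterGrid

param_grid = {'C': [0.1, 1, 10]}

# Expand the grid into the individual parameter settings GridSearchCV will try
candidates = list(ParameterGrid(param_grid))
print(candidates)

# Assumed fold counts: 5 outer folds, 3 inner folds
n_outer, n_inner = 5, 3

# Each outer fold fits every candidate on every inner fold,
# plus one final refit of the best candidate on the outer training set
fits_per_outer_fold = len(candidates) * n_inner + 1
total_fits = n_outer * fits_per_outer_fold
print(f"Total model fits: {total_fits}")
```

The count grows multiplicatively with the grid size and both fold counts, which is worth keeping in mind before enlarging `param_grid`.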



5. **Performing nested cross-validation**: This segment initiates the nested cross-validation process. It loops over the outer cross-validation splits and splits the data into outer train and test sets using the indices generated by `outer_cv.split(X)`.



6. **Creating the inner cross-validation object**: Here, an inner cross-validation object `inner_cv` is created using `KFold` with `n_splits=3`, so the inner loop performs 3-fold cross-validation. It is applied only to the outer training set, which is shuffled before splitting, with a random seed of 42 set for reproducibility.



7. **Performing grid search on the inner cross-validation**: In this segment, `GridSearchCV` runs a grid search over `param_grid` with logistic regression as the estimator and `inner_cv` as the splitter, fitting only on the outer training data. Because no `scoring` argument is given, the value of `C` with the highest mean accuracy across the inner folds is selected.



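8. **Evaluating the best model on the outer test set**: After the grid search completes, `GridSearchCV` refits the best parameter setting on the full outer training set (its default `refit=True` behavior). That refitted model is then evaluated on the outer test set by calling the `score` method on `X_test` and `y_test`, which returns accuracy for a classifier such as `LogisticRegression`.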


In [None]:
# Perform nested cross-validation
nested_scores = []
for train_index, test_index in outer_cv.split(X):
    # Split the data into outer train and test sets
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Create an inner cross-validation object
    inner_cv = KFold(n_splits=3, shuffle=True, random_state=42)

    # Perform grid search on the inner cross-validation
    grid_search = GridSearchCV(
        estimator=LogisticRegression(solver='liblinear', max_iter=1000),
        param_grid=param_grid,
        cv=inner_cv,
    )
    grid_search.fit(X_train, y_train)

    # Evaluate the best model on the outer test set
    nested_score = grid_search.score(X_test, y_test)
    nested_scores.append(nested_score)
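
The explicit loop above can also be written more compactly by passing the `GridSearchCV` object itself to `cross_val_score`, which then handles the outer splits. The following is a minimal sketch of that pattern, not a replacement for the notebook's code. It assumes the default `LogisticRegression` solver rather than the `solver='liblinear'` setting used above, so its scores may differ slightly; the names `tuned_model` and `compact_scores` are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Same fold configuration as the explicit loop
inner_cv = KFold(n_splits=3, shuffle=True, random_state=42)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=42)

# Assumption: default solver here, not the notebook's solver='liblinear'
tuned_model = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid={'C': [0.1, 1, 10]},
    cv=inner_cv,
)

# For each outer fold, cross_val_score clones tuned_model, runs the inner
# grid search on the outer training data, and scores on the outer test data
compact_scores = cross_val_score(tuned_model, X, y, cv=outer_cv)
print(f"Per-fold accuracy: {compact_scores}")
print(f"Mean accuracy: {compact_scores.mean()}")
```

The hand-written loop remains useful when each fitted `grid_search` needs to be inspected, for example to read `grid_search.best_params_` for every outer fold.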



9. **Calculating the average performance of the nested cross-validation**: This segment calculates the average performance of the nested cross-validation by taking the mean of the scores obtained in each iteration.


In [None]:
# Calculate the average performance of the nested cross-validation
average_score = np.mean(nested_scores)


10. **Printing the average performance**: Finally, the average performance of the nested cross-validation is printed to the console.


In [3]:
# Print the average performance
print(f"Average Performance: {average_score}")


Average Performance: 0.9666666666666668
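
A single mean hides how much the score varies between outer folds. The sketch below shows one way to report the spread alongside the mean. The helper `summarize_scores` is hypothetical and not part of the original notebook, and it assumes at least two fold scores so that the sample standard deviation is defined.

```python
import numpy as np


def summarize_scores(scores):
    """Return (mean, sample standard deviation) of a sequence of fold scores."""
    scores = np.asarray(scores, dtype=float)
    # ddof=1 gives the sample standard deviation, which needs >= 2 scores
    return float(scores.mean()), float(scores.std(ddof=1))
```

In this notebook it could be called as `mean, std = summarize_scores(nested_scores)` and printed as `f"{mean:.3f} +/- {std:.3f}"`. With only five outer folds, the standard deviation is a rough indication of variability rather than a precise estimate.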
