

1. Importing the necessary libraries:

This segment imports the required libraries for the code, including `numpy` for numerical operations, `sklearn.model_selection.StratifiedKFold` for Stratified Cross-Validation, `sklearn.linear_model.LogisticRegression` for the logistic regression model, and `sklearn.datasets.load_iris` to load the Iris dataset for demonstration purposes.


In [3]:
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris


2. Loading the Iris dataset:

This segment loads the Iris dataset using `load_iris()` from `sklearn.datasets`. It assigns the input features to `X` and the corresponding target values to `y`.


In [4]:
# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target
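Before splitting, it can help to confirm what was loaded. The following is a minimal sketch, not part of the original notebook, that prints the shape of the feature matrix and the number of samples per class. The name `class_counts` is introduced here purely for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris

data = load_iris()
X, y = data.data, data.target

# Shape of the feature matrix: (n_samples, n_features)
print(X.shape)

# Number of samples for each class label (0, 1, 2);
# `class_counts` is an illustrative name, not from the notebook
class_counts = np.bincount(y)
print(class_counts)
```

Iris is balanced, with the same number of samples in every class, which is the simplest case for stratified splitting.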



3. Creating a StratifiedKFold object:

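This segment sets the number of folds `k` to 5 and creates a `StratifiedKFold` object named `skf`. Stratification means that each fold keeps approximately the same class proportions as the full dataset. The `shuffle=True` argument shuffles the samples within each class before they are split into folds, and `random_state=42` fixes the random seed so that the splits are reproducible.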


In [5]:
# Create a StratifiedKFold object with k=5
k = 5
skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=42)
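To see what stratification does in practice, the sketch below counts the class labels in the test portion of each fold. It is an illustrative check rather than part of the original notebook, and `fold_class_counts` is a name assumed here for the example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold

X, y = load_iris(return_X_y=True)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Collect the per-class sample counts of every test fold
# (`fold_class_counts` is an illustrative name)
fold_class_counts = []
for fold, (train_index, test_index) in enumerate(skf.split(X, y), start=1):
    counts = np.bincount(y[test_index], minlength=3)
    fold_class_counts.append(counts.tolist())
    print(f"Fold {fold}: test-set class counts = {counts.tolist()}")
```

Because Iris has 50 samples per class and `k` is 5, every test fold should contain the three classes in equal numbers. A plain `KFold` split gives no such guarantee.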



4. Initializing an empty list for scores:

This segment creates an empty list named `scores` that will be used to store the evaluation metrics (accuracy scores) for each fold.


In [6]:
# Initialize a list to store the evaluation metrics for each fold
scores = []



5. Performing Stratified Cross-Validation:

This segment iterates over the training and test indices that `skf.split(X, y)` yields for each fold and uses them to slice `X` and `y` into training and test sets. A fresh `LogisticRegression` model is fitted on the training data with `max_iter=1000`, which raises the iteration limit above the default of 100 so that the solver has room to converge. The model is then evaluated on the held-out test data with `score`, which returns the mean accuracy for classifiers, and the result is appended to the `scores` list.


In [7]:
# Perform Stratified Cross-Validation
for train_index, test_index in skf.split(X, y):
    # Split the data into training and test sets for the current fold
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Train the model on the training data
    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    # Evaluate the model on the test data and store the score
    score = model.score(X_test, y_test)
    scores.append(score)
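The manual loop above can also be written more compactly with scikit-learn's `cross_val_score` helper. This is a sketch of an equivalent approach under the same settings, not a replacement for the notebook's code. The variable name `cv_scores` is chosen here to avoid clashing with the `scores` list.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# cross_val_score clones the estimator, fits it on each training split,
# and returns one score per fold as a NumPy array
cv_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=skf, scoring="accuracy"
)
print(cv_scores)
```

The explicit loop remains useful when each fold needs extra work, such as saving the fitted models or computing several metrics at once.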



6. Printing the evaluation metrics for each fold:

This segment iterates over the `scores` list using `enumerate` to retrieve both the fold index and the corresponding score. It then prints the fold index and the accuracy score for each fold.


In [8]:
# Print the evaluation metrics for each fold
for fold, score in enumerate(scores):
    print(f"Fold {fold+1}: Accuracy = {score}")


Fold 1: Accuracy = 1.0
Fold 2: Accuracy = 0.9666666666666667
Fold 3: Accuracy = 0.9333333333333333
Fold 4: Accuracy = 1.0
Fold 5: Accuracy = 0.9333333333333333



7. Calculating and printing the average score:

This segment calculates the average accuracy score across all folds using `np.mean` on the `scores` list and stores it in the variable `average_score`. It then prints the average score.

The higher `max_iter` value passed to `LogisticRegression` gives the solver more room to converge during training. This addresses the `ConvergenceWarning` that a lower iteration limit can raise, so the accuracy reported for each fold comes from a model whose optimization has converged.

In [9]:
# Calculate and print the average score across all folds
average_score = np.mean(scores)
print(f"\nAverage Accuracy: {average_score}")



Average Accuracy: 0.9666666666666668
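An average alone hides how much the accuracy varies between folds. As a small optional addition, the sketch below reports the standard deviation alongside the mean, using the per-fold accuracies copied from the output printed earlier. The names `mean_score` and `std_score` are assumptions made for this example.

```python
import numpy as np

# Per-fold accuracies copied from the output printed earlier in this notebook
scores = [1.0, 0.9666666666666667, 0.9333333333333333, 1.0, 0.9333333333333333]

# Mean and (population) standard deviation across the five folds
mean_score = np.mean(scores)
std_score = np.std(scores)
print(f"Accuracy: {mean_score:.3f} +/- {std_score:.3f}")
```

A small spread relative to the mean suggests that the model's performance is fairly stable across the different splits.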
