<a href="https://colab.research.google.com/github/NDsasuke/Autocorrelation-function-Diagnostics-and-prediction/blob/main/Diagnostics%20and%20prediction/Cross-Validation/K_Fold_Cross_Validation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

1. Importing the necessary libraries:

This segment imports everything the code needs: `numpy` for numerical operations, `sklearn.model_selection.KFold` for K-Fold Cross-Validation, and `sklearn.linear_model.LinearRegression` for the linear regression model.


In [2]:
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression


2. Generating example dataset:

This segment creates an example dataset `X` and target variable `y` for demonstration purposes. `X` is a 2-dimensional array of 8 samples with 2 input features each, and `y` is a 1-dimensional array of the 8 corresponding target values. Each target equals the second feature plus one, so a linear model can fit this data exactly.


In [3]:
# Generate example dataset
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3], [3, 3], [3, 4], [4, 4], [4, 5]])
y = np.array([2, 3, 3, 4, 4, 5, 5, 6])
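
As a quick sanity check on the example data, the sketch below verifies that every target equals the second feature plus one. The check and the name `is_exactly_linear` are not part of the original notebook; the arrays are copied from the cell above.

```python
import numpy as np

# Same example arrays as in the cell above
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3], [3, 3], [3, 4], [4, 4], [4, 5]])
y = np.array([2, 3, 3, 4, 4, 5, 5, 6])

# Compare each target with the second feature column plus one
is_exactly_linear = np.array_equal(y, X[:, 1] + 1)
print(is_exactly_linear)
```

A relationship this clean is convenient for a demonstration, but it also means the R^2 scores say little about how cross-validation behaves on noisy data.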


3. Creating a KFold object:

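This segment sets the number of folds `k` to 5 and creates a `KFold` object named `kf`. The `shuffle=True` argument shuffles the data before splitting, and `random_state=42` fixes the random seed so the split is reproducible. Because the dataset has only 8 samples, the 5 folds cannot be equal in size: three test folds contain 2 samples and two contain only 1.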


In [4]:
# Create a KFold object with k=5
k = 5
kf = KFold(n_splits=k, shuffle=True, random_state=42)
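
To see how 8 samples are distributed over 5 folds, the following minimal sketch counts the test samples in each fold. The placeholder array and the name `test_sizes` are illustrative assumptions; only the number of rows matters for the split.

```python
import numpy as np
from sklearn.model_selection import KFold

# Placeholder with the same number of rows as the example dataset
X = np.zeros((8, 2))
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# Collect the number of test samples in each fold
test_sizes = [len(test_index) for _, test_index in kf.split(X)]
print(test_sizes)
```

`KFold` gives the first `n_samples % n_splits` folds one extra sample, so the sizes here should come out as 2, 2, 2, 1, 1. Choosing a `k` that leaves at least two samples in every test fold (for example 4 or 2 for this dataset) avoids single-sample test sets.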


4. Initializing an empty list for scores:

This segment creates an empty list named `scores` that will be used to store the evaluation metrics (R^2 scores) for each fold.


In [5]:

# Initialize a list to store the evaluation metrics for each fold
scores = []


5. Performing K-Fold Cross-Validation:

This segment iterates over the training and test indices that the `split` method of the `KFold` object yields for each fold, and uses them to split the data into training and test sets. A `LinearRegression` model is then trained on the training data using `fit`, and its performance is evaluated on the test data using `score`, which returns the R^2 score for a regressor. The obtained score is appended to the `scores` list.


In [6]:
# Perform K-Fold Cross-Validation
for train_index, test_index in kf.split(X):
    # Split the data into training and test sets for the current fold
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Train the model on the training data
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Evaluate the model on the test data and store the score
    score = model.score(X_test, y_test)
    scores.append(score)
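
The manual loop above can also be expressed with scikit-learn's `cross_val_score` helper. The sketch below is one possible variant, not the notebook's own method: it swaps R^2 for mean squared error via `scoring="neg_mean_squared_error"`, a metric that stays defined even when a test fold holds a single sample. The names `neg_mse` and `mse_per_fold` are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Same example data and splitter as in the cells above
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3], [3, 3], [3, 4], [4, 4], [4, 5]])
y = np.array([2, 3, 3, 4, 4, 5, 5, 6])
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# scikit-learn scorers follow a "higher is better" convention,
# so the mean squared error is returned with its sign flipped
neg_mse = cross_val_score(
    LinearRegression(), X, y, cv=kf, scoring="neg_mean_squared_error"
)
mse_per_fold = -neg_mse
print(mse_per_fold)
```

Since the example targets are an exact linear function of the features, every fold's error should be zero up to floating-point rounding.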




6. Printing the evaluation metrics for each fold:

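This segment iterates over the `scores` list using `enumerate` to retrieve both the fold index and the corresponding score, then prints the 1-based fold number and the R^2 score for each fold. Folds 4 and 5 report `nan` because their test sets contain a single sample, and R^2 is undefined for fewer than two samples.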


In [7]:
# Print the evaluation metrics for each fold
for fold, score in enumerate(scores):
    print(f"Fold {fold+1}: R^2 Score = {score}")

Fold 1: R^2 Score = 1.0
Fold 2: R^2 Score = 1.0
Fold 3: R^2 Score = 1.0
Fold 4: R^2 Score = nan
Fold 5: R^2 Score = nan
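
The `nan` values can be reproduced in isolation. The sketch below calls `sklearn.metrics.r2_score` directly, once with a single sample and once with two; the input values are made up for illustration and are not taken from the folds above.

```python
import warnings

import numpy as np
from sklearn.metrics import r2_score

with warnings.catch_warnings():
    # scikit-learn warns that R^2 is not well-defined here; silence it for the demo
    warnings.simplefilter("ignore")
    single_sample_r2 = r2_score([5.0], [5.0])

# With two distinct targets and perfect predictions, R^2 is well-defined
two_sample_r2 = r2_score([4.0, 5.0], [4.0, 5.0])

print(single_sample_r2, two_sample_r2)
```

Even a perfect prediction yields `nan` for one sample, because R^2 compares the model's error with the variance of the targets, and a single target has no variance to compare against.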



7. Calculating and printing the average score:

This segment calculates the average R^2 score across all folds using `np.mean` on the `scores` list and stores it in the variable `average_score`. It then prints the average score. Because `np.mean` propagates `nan`, the two undefined fold scores make the average `nan` as well.

Together, these code segments implement K-Fold Cross-Validation with the scikit-learn library and walk through each step of the process.

In [8]:
# Calculate and print the average score across all folds
average_score = np.mean(scores)
print(f"\nAverage R^2 Score: {average_score}")



Average R^2 Score: nan
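
One way to still obtain a number from these scores is `np.nanmean`, which skips undefined entries. The sketch below copies the per-fold scores printed above; treat it as a workaround under that assumption rather than a recommended evaluation, since it silently averages over three folds instead of five.

```python
import numpy as np

# Per-fold R^2 scores as printed above
scores = [1.0, 1.0, 1.0, float("nan"), float("nan")]

# np.mean propagates nan, np.nanmean ignores it
plain_mean = np.mean(scores)
defined_only_mean = np.nanmean(scores)

print(f"np.mean:    {plain_mean}")
print(f"np.nanmean: {defined_only_mean}")
```

A cleaner remedy is usually to reduce the number of folds so that every test fold has at least two samples, or to report a metric that is defined for single-sample folds.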
