Cross-validation is a resampling technique used in machine learning to estimate how well a model generalizes to unseen data and to detect problems such as overfitting and selection bias. It involves partitioning the dataset into complementary subsets, repeatedly training on some subsets and validating on the rest, and averaging the results.

### Types of Cross-Validation

1. K-Fold Cross-Validation
2. Leave-One-Out Cross-Validation (LOOCV)

### 1. K-Fold Cross-Validation

In k-fold cross-validation, the dataset is divided into k folds of approximately equal size. The model is trained k times, each time using k-1 folds for training and the remaining fold for validation, so every data point is used for validation exactly once. The performance metrics are then averaged over the k iterations to obtain the final evaluation.
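To make the partitioning concrete, here is a minimal sketch that prints the index splits produced by scikit-learn's KFold. The toy array `X_toy` and the choice of 3 folds are assumptions made for illustration only; they are not part of the original example.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical toy data: 10 samples, 1 feature (the values do not matter here)
X_toy = np.arange(10).reshape(-1, 1)

# No shuffling, so each validation fold is a contiguous block of indices
kf_toy = KFold(n_splits=3)

fold_sizes = []
for fold, (train_idx, val_idx) in enumerate(kf_toy.split(X_toy), start=1):
    fold_sizes.append(len(val_idx))
    print(f"Fold {fold}: train={train_idx}, validation={val_idx}")

print("Validation fold sizes:", fold_sizes)
```

Because 10 is not divisible by 3, the validation folds have sizes 4, 3, and 3, and each sample appears in exactly one validation fold.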

In [1]:
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

# Load the iris dataset
X, y = load_iris(return_X_y=True)

# Initialize a logistic regression model
model = LogisticRegression()

# Define the number of folds
k = 5

# Initialize a k-fold cross-validation object
kf = KFold(n_splits=k, shuffle=True, random_state=42)

# Perform k-fold cross-validation
scores = cross_val_score(model, X, y, cv=kf)

# Print the cross-validation scores
print("Cross-Validation Scores:", scores)
print("Average Accuracy:", scores.mean())


Cross-Validation Scores: [1.         1.         0.93333333 0.96666667 0.96666667]
Average Accuracy: 0.9733333333333334


Note: this run also produced a scikit-learn convergence warning (STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.), which means the solver hit its iteration limit before converging. The warning suggests increasing the number of iterations (max_iter) or scaling the data, as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
For alternative solver options, refer to the documentation:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
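One way to act on the convergence warning above is to scale the features and allow more solver iterations. The sketch below is one possible setup, not the only remedy: the pipeline, the use of StandardScaler, and the value `max_iter=1000` are illustrative assumptions that do not appear in the original cell.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Same dataset and fold setup as the example above
X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

# Assumption: standardize the features and raise the iteration limit.
# Inside a pipeline, the scaler is refit on the training folds of each split,
# so no statistics from the validation fold leak into training.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scaled_scores = cross_val_score(scaled_model, X, y, cv=kf)
print("Cross-Validation Scores:", scaled_scores)
print("Average Accuracy:", scaled_scores.mean())
```

Putting the scaler in the pipeline, rather than scaling the whole dataset once up front, keeps each validation fold unseen during preprocessing.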


#### Code Explanation

1. Load Data: We load the iris dataset, which contains features (X) and target labels (y).
2. Initialize Model: We initialize a logistic regression model.
3. Define K-Fold Cross-Validation: We define the number of folds (k) and initialize a k-fold cross-validation object, with shuffling enabled and a fixed random_state so that the splits are reproducible.
4. Perform Cross-Validation: We use cross_val_score to run k-fold cross-validation on the model, passing the KFold object as cv. For a classifier, the default score is accuracy.
5. Print Results: We print the score for each fold and their mean, which is the average accuracy across all folds.
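The steps above can also be written as an explicit loop, which shows roughly what cross_val_score does for each fold. This is a sketch under assumptions rather than a description of the library's internals: `max_iter=1000` is an illustrative setting, and `manual_scores` and `helper_scores` are names introduced here.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
base_model = LogisticRegression(max_iter=1000)  # illustrative iteration limit
kf = KFold(n_splits=5, shuffle=True, random_state=42)

manual_scores = []
for train_idx, val_idx in kf.split(X):
    # Fit a fresh, unfitted copy of the model on the k-1 training folds
    fold_model = clone(base_model)
    fold_model.fit(X[train_idx], y[train_idx])

    # Score the copy on the single held-out fold
    y_pred = fold_model.predict(X[val_idx])
    manual_scores.append(accuracy_score(y[val_idx], y_pred))

manual_scores = np.array(manual_scores)

# The helper should give the same per-fold accuracies for the same splits
helper_scores = cross_val_score(base_model, X, y, cv=kf)

print("Manual scores:", manual_scores)
print("cross_val_score:", helper_scores)
```

Cloning the model for every fold ensures that nothing learned on one split carries over to the next.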

### 2. Leave-One-Out Cross-Validation (LOOCV)

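In leave-one-out cross-validation, each data point is used as the validation set exactly once, with the remaining data points used for training. This is k-fold cross-validation with k equal to the number of data points, so a dataset of n points requires n model fits. It is computationally expensive for large datasets. The resulting estimate of model performance has low bias, because each model is trained on nearly all of the data, but it can have high variance.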
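As a quick check of how the splits are formed, the following sketch compares LeaveOneOut with an unshuffled KFold whose number of folds equals the number of samples. The array `X_demo` is a hypothetical placeholder introduced for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

# Hypothetical data: 4 samples, 2 features
X_demo = np.arange(8).reshape(4, 2)

loo = LeaveOneOut()
n_splits = loo.get_n_splits(X_demo)  # one split per sample

# Collect the validation indices produced by each splitter
loo_val = [val.tolist() for _, val in loo.split(X_demo)]
kfold_val = [val.tolist() for _, val in KFold(n_splits=len(X_demo)).split(X_demo)]

print("Number of LOOCV splits:", n_splits)
print("LeaveOneOut validation indices:", loo_val)
print("KFold(n_splits=n) validation indices:", kfold_val)
```

Both splitters hold out one sample at a time, which is why the cost of LOOCV grows with the size of the dataset.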

In [2]:
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np

# Generate some sample data
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([3, 7, 11, 15])

# Initialize Leave-One-Out Cross-Validation
loo = LeaveOneOut()

# Initialize an empty list to store the mean squared errors
mse_scores = []

# Iterate over the Leave-One-Out splits
for train_index, val_index in loo.split(X):
    # Split the data into training and validation sets
    X_train, X_val = X[train_index], X[val_index]
    y_train, y_val = y[train_index], y[val_index]
    
    # Initialize and train the model
    model = LinearRegression()
    model.fit(X_train, y_train)
    
    # Make predictions on the validation set
    y_pred = model.predict(X_val)
    
    # Calculate the mean squared error
    mse = mean_squared_error(y_val, y_pred)
    
    # Store the mean squared error
    mse_scores.append(mse)

# Compute the average mean squared error
avg_mse = np.mean(mse_scores)
print("Average Mean Squared Error:", avg_mse)


Average Mean Squared Error: 3.5991778800708664e-30


#### Code Explanation

1. Generate Sample Data: We create a sample dataset with 4 data points and 2 features.
2. Initialize Leave-One-Out Cross-Validation: We initialize the LeaveOneOut object, which generates the indices for Leave-One-Out splits.
3. Iterate over Splits: We loop over each Leave-One-Out split, where each iteration provides the indices for the training and validation sets.
4. Train and Validate Model: For each split, we train a Linear Regression model on the training set and validate it on the single data point left out.
5. Calculate Mean Squared Error: We calculate the mean squared error between the actual and predicted values for the validation set.
6. Store Mean Squared Error: We store the mean squared error for each iteration.
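7. Compute Average Mean Squared Error: Finally, we compute the average mean squared error over all iterations, providing an overall evaluation of the model's performance using LOOCV. In this sample data every target equals the sum of its two features, so the linear model fits exactly and the reported error (about 3.6e-30) is zero up to floating-point rounding.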
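The same evaluation can be written more compactly by passing the LeaveOneOut splitter to cross_val_score. This is a sketch of an alternative, not a replacement for the loop above. It assumes the "neg_mean_squared_error" scorer, which returns negated errors because scikit-learn scorers follow a higher-is-better convention, so the sign is flipped before averaging.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Same sample data as in the loop above
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([3, 7, 11, 15])

# One negated squared error per left-out point
neg_mse = cross_val_score(
    LinearRegression(),
    X,
    y,
    cv=LeaveOneOut(),
    scoring="neg_mean_squared_error",
)

# Flip the sign to recover the usual (non-negative) mean squared error
avg_mse = -neg_mse.mean()
print("Average Mean Squared Error:", avg_mse)
```

The explicit loop is still useful when you need the per-split predictions or fitted models, which cross_val_score does not return.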