**Popular Cross Validation Techniques**
- Tuning and validating an ML model on a single validation set (for example, one train-test split) is easy to implement but often misleading, because the resulting estimate tends to be optimistic. A lucky random split can make the model perform exceptionally well on the validation set yet poorly on new, unseen data.

- Cross-validation (CV) repeatedly partitions the data into subsets, trains the model on some of them, and validates it on the remaining ones. It provides a more robust and less biased estimate of model performance.

**Leave-One-Out Cross-Validation**
- Leave one data point for validation.
- Train the model on the remaining data points.
- Repeat for all points.
- This is practically infeasible for large datasets, because the number of models to train equals the number of data points.
- We can extend this to Leave-p-Out Cross-Validation, where, in each iteration, p observations are reserved for validation and the rest are used for training.
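
The cost argument above can be made concrete with a minimal sketch that enumerates the splits by hand. The helper name `leave_p_out_splits` is hypothetical and written only for illustration; it uses the standard library rather than any particular ML package.

```python
from itertools import combinations
from math import comb

def leave_p_out_splits(n_samples, p):
    """Yield (train_indices, val_indices) for every way of holding out p points."""
    all_indices = range(n_samples)
    for val in combinations(all_indices, p):
        held_out = set(val)
        train = [i for i in all_indices if i not in held_out]
        yield train, list(val)

n = 6
loo_splits = list(leave_p_out_splits(n, p=1))  # leave-one-out: one model per data point
lpo_splits = list(leave_p_out_splits(n, p=2))  # leave-two-out: one model per pair of points

print("Leave-one-out splits:", len(loo_splits))
print("Leave-two-out splits:", len(lpo_splits), "== C(n, 2) ==", comb(n, 2))
```

The number of splits grows as "n choose p", which is why leave-p-out is rarely practical beyond small datasets.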

**K-Fold Cross-Validation**
- Split data into k equally-sized subsets.
- Select one subset for validation.
- Train the model on the remaining subsets.
- Repeat until every subset has served as the validation set once.
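
The steps above can be sketched at the index level as follows. This is a simplified illustration, assuming contiguous folds without shuffling; the helper `k_fold_splits` is a hypothetical name, not a library function.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, val_indices) for k contiguous folds of near-equal size."""
    indices = list(range(n_samples))
    # When n_samples is not divisible by k, the first n_samples % k folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

folds = list(k_fold_splits(n_samples=10, k=5))
for train, val in folds:
    print("train:", train, "validate:", val)
```

Every sample lands in exactly one validation fold, so each data point is used for validation once and for training k - 1 times.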

**Rolling Cross-Validation**
- Mostly used for data with temporal structure.
- Data splitting respects the temporal order, using a fixed-size training window.
- The model is evaluated on the window that immediately follows the training window.
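
A minimal sketch of the rolling scheme, assuming the window advances by one validation window per iteration (the step size is an assumption made here, not something the description above fixes). The helper `rolling_window_splits` is hypothetical; scikit-learn's `TimeSeriesSplit` with `max_train_size` offers comparable behavior.

```python
def rolling_window_splits(n_samples, train_size, val_size):
    """Yield (train_indices, val_indices) with a fixed-size training window
    that slides forward by val_size samples per iteration."""
    start = 0
    while start + train_size + val_size <= n_samples:
        train = list(range(start, start + train_size))
        val = list(range(start + train_size, start + train_size + val_size))
        yield train, val
        start += val_size

rolling = list(rolling_window_splits(n_samples=10, train_size=4, val_size=2))
for train, val in rolling:
    print("train:", train, "validate:", val)
```

Every validation index is later in time than every training index in the same split, so the model never sees the future during training.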

**Blocked Cross-Validation**
- Another common technique for time-series data.
- In contrast to rolling cross-validation, each slice of data is intentionally kept short, which is acceptable when the variance does not change appreciably from one window to the next.
- Because each model is trained on a short slice, this also saves computation compared with rolling cross-validation.
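
One common formulation of blocked cross-validation, assumed here for illustration, cuts the series into short non-overlapping blocks and splits each block by time. The helper `blocked_splits` and its parameters are hypothetical.

```python
def blocked_splits(n_samples, n_blocks, val_size):
    """Yield (train_indices, val_indices), one pair per non-overlapping block.

    Within each block the last val_size samples are held out for validation.
    Samples left over when n_samples is not divisible by n_blocks are dropped.
    """
    block_size = n_samples // n_blocks
    for b in range(n_blocks):
        start = b * block_size
        stop = start + block_size
        cut = stop - val_size
        yield list(range(start, cut)), list(range(cut, stop))

blocks = list(blocked_splits(n_samples=20, n_blocks=4, val_size=1))
for train, val in blocks:
    print("train:", train, "validate:", val)
```

Because the blocks do not overlap, no sample is reused across iterations, and each model is fitted on only a short slice of the series.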

**Stratified Cross-Validation**
- The techniques above may not work well for imbalanced datasets, because a random partition can leave some folds with few or no minority-class samples. Stratified cross-validation partitions the data so that each fold preserves the class distribution of the full dataset.
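
The idea of preserving the class distribution can be sketched by dealing the members of each class round-robin across the folds. This is a simplified illustration without shuffling; `stratified_fold_ids` is a hypothetical helper, not a library function.

```python
from collections import Counter, defaultdict

def stratified_fold_ids(labels, k):
    """Assign each sample a fold id so that every class is spread evenly over k folds."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fold_of = [None] * len(labels)
    for indices in by_class.values():
        # Deal the members of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            fold_of[idx] = position % k
    return fold_of

labels = [0] * 90 + [1] * 10  # 90% majority class, 10% minority class
fold_of = stratified_fold_ids(labels, k=5)
fold_counts = [
    Counter(label for label, f in zip(labels, fold_of) if f == fold)
    for fold in range(5)
]
for fold, counts in enumerate(fold_counts):
    print("fold", fold, dict(counts))
```

Each fold keeps the 9:1 class ratio of the full label list, which a purely random partition does not guarantee.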

In [1]:
from sklearn.model_selection import cross_val_score, KFold
from sklearn.linear_model import LinearRegression
import numpy as np

# Generate synthetic data
X = np.random.rand(100, 2)
y = 3 * X[:, 0] + 2 * X[:, 1] + 0.5 * np.random.randn(100)

# Create a linear regression model
model = LinearRegression()

# Perform 5-fold cross-validation
# (for a regressor, cross_val_score reports R^2 by default)
kf = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(model, X, y, cv=kf)

print("Cross Validation Scores:", cv_scores)
print("Mean CV Score:", np.mean(cv_scores))


Cross Validation Scores: [0.86119068 0.81955271 0.56128282 0.72947638 0.85602746]
Mean CV Score: 0.7655060094970405


#### Stratified K-Fold Cross Validation

In [2]:
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Generate synthetic imbalanced data (about 99% of samples in one class)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=2,
                           n_redundant=10, n_clusters_per_class=1, weights=[0.99],
                           flip_y=0, random_state=42)

# Create a random forest classifier
model = RandomForestClassifier()

# Perform stratified k-fold cross-validation (k=5)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(model, X, y, cv=skf)

print("Cross-Validation Scores:", cv_scores)
print("Mean CV Score:", np.mean(cv_scores))


#### Leave-One-Out Cross-Validation (LOOCV)


In [3]:
from sklearn.model_selection import cross_val_score, LeaveOneOut
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

# Load the iris dataset
iris = load_iris()
X, y = iris.data, iris.target

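# Create a logistic regression model
# (with the default max_iter the solver can stop before converging on this data,
# which produces the warnings in the output; raising max_iter or scaling the
# features are the remedies the warning itself suggests)
model = LogisticRegression()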

# Perform the LOO CV
loo = LeaveOneOut()
cv_scores = cross_val_score(model, X, y, cv=loo)

print("Cross-Validation Scores:", cv_scores)
print("Mean CV Score:", np.mean(cv_scores))

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Cross-Validation Scores: [1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1.
 1. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0.
 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
 1. 1. 1. 1. 1. 1.]
Mean CV Score: 0.9666666666666667
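
Since each leave-one-out fold contains a single sample, every score is either 0 or 1, and the mean is simply the fraction of samples classified correctly. A quick check of the arithmetic, using the counts read off the printed array above (150 scores, 5 of them zero):

```python
n_samples = 150  # number of LOOCV folds, one per iris sample
n_misses = 5     # number of 0. entries in the printed score array

loo_accuracy = (n_samples - n_misses) / n_samples
print("LOOCV accuracy:", loo_accuracy)
```

This agrees with the printed mean CV score of roughly 0.9667.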


