### Different Cross-Validation Techniques

#### Stratified Cross-Validation

Stratified cross-validation is a variation of K-fold cross-validation in which each fold keeps approximately the same class proportions as the original dataset. It matters most for imbalanced datasets, where some classes are far more frequent than others and an unstratified split can leave a fold with very few samples of a minority class.

For example, if the original dataset has 70% class A and 30% class B samples, then each fold in stratified K-fold will also contain roughly 70% class A and 30% class B.
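To make the 70/30 example concrete, here is a minimal sketch that compares the class balance of test folds from plain `KFold` and from `StratifiedKFold`. The toy arrays `X_toy` and `y_toy` and the helper `class_b_fractions` are hypothetical names introduced only for this illustration; they are not part of the notebook's dataset.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical toy labels: 70 samples of class 0 ("A") followed by 30 of class 1 ("B")
y_toy = np.array([0] * 70 + [1] * 30)
X_toy = np.zeros((len(y_toy), 1))  # feature values do not affect how the folds are built


def class_b_fractions(splitter):
    """Return the fraction of class-1 ("B") samples in each test fold."""
    return [float(y_toy[test_idx].mean()) for _, test_idx in splitter.split(X_toy, y_toy)]


plain_fractions = class_b_fractions(KFold(n_splits=5))
stratified_fractions = class_b_fractions(StratifiedKFold(n_splits=5))

print("KFold class B fraction per fold:          ", plain_fractions)
print("StratifiedKFold class B fraction per fold:", stratified_fractions)
```

Because the toy labels are sorted by class and neither splitter shuffles, the plain `KFold` test folds range from containing no class B samples to containing only class B samples, while every stratified test fold holds 30% class B.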

In [1]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier

In [2]:
# Load the dataset
data = load_breast_cancer()

X, y = data.data, data.target

In [3]:
# Define the model
rf_model = RandomForestClassifier()

In [4]:
# Use StratifiedKFold with 5 splits
stratified_kf = StratifiedKFold(n_splits=5)

In [5]:
scores = cross_val_score(rf_model, X, y, cv=stratified_kf)

print("StratifiedCV Scores: ", scores)
print("Mean Score: ", scores.mean())

StratifiedCV Scores:  [0.92105263 0.94736842 0.98245614 0.98245614 0.98230088]
Mean Score:  0.9631268436578171


#### Group K-Fold

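Group K-fold is used when samples fall into groups (for example, several records from the same patient) and you want every group to stay within a single fold, so that no group appears in both the training set and the test set of a split. Otherwise the model can be scored on samples closely related to ones it was trained on, which inflates the estimate.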
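The guarantee can be checked directly. The sketch below uses hypothetical toy data (`X_toy`, `y_toy`, and `groups_toy`, with integer group IDs standing in for something like patient IDs) and records, for each split, which groups appear in both the training and test indices.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical data: 12 samples from 4 groups, 3 samples per group
X_toy = np.arange(12).reshape(-1, 1)
y_toy = np.array([0, 1] * 6)
groups_toy = np.repeat([0, 1, 2, 3], 3)  # group ID for each sample

splitter = GroupKFold(n_splits=4)
overlaps = []
for fold, (train_idx, test_idx) in enumerate(splitter.split(X_toy, y_toy, groups=groups_toy)):
    train_groups = set(int(g) for g in groups_toy[train_idx])
    test_groups = set(int(g) for g in groups_toy[test_idx])
    # Groups present on both sides of the split; expected to be empty
    overlaps.append(train_groups & test_groups)
    print(f"Fold {fold}: test groups = {sorted(test_groups)}")

print("Any group shared between train and test:", any(overlaps))
```

Note that `n_splits` cannot exceed the number of distinct groups, since each group must land in exactly one test fold.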


In [6]:
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score
import numpy as np

In [7]:
# Load the dataset
data = load_iris()
X, y = data.data, data.target

In [8]:
# Create artificial groups
# Let's say every 10 consecutive samples belong to the same "group"
groups = np.array([i // 10 for i in range(len(y))])

In [9]:
model = RandomForestClassifier()

In [10]:
group_kfold = GroupKFold(n_splits=5)

# Perform cross-validation
scores = cross_val_score(model, X, y, groups=groups, cv=group_kfold)

# Results
print("Cross-Validation Scores: ", scores)
print("Mean Score: ", scores.mean())

Cross-Validation Scores:  [1.         0.9        0.9        0.96666667 0.96666667]
Mean Score:  0.9466666666666667
