# **Cross Validation:** Model Evaluation
Cross-validation is a technique for evaluating ML models by training several models on subsets of the available data
and evaluating each one on the complementary subset. Use cross-validation to detect overfitting, i.e., a model that fits its training data but fails to generalize to unseen data.
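
As a rough illustration of that definition, the loop below performs the train-on-subset, score-on-complement procedure by hand. It is only a sketch: it assumes scikit-learn's bundled copy of iris (`load_iris`) instead of the seaborn dataset, and the names `splitter` and `fold_scores` are made up for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

# Assumption: scikit-learn's bundled iris data stands in for the notebook's data.
X_all, y_all = load_iris(return_X_y=True)

# Shuffle before splitting because the iris rows are ordered by species.
splitter = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for train_idx, test_idx in splitter.split(X_all):
    # Train a fresh model on the training subset only
    model = KNeighborsClassifier()
    model.fit(X_all[train_idx], y_all[train_idx])
    # Score it on the complementary (held-out) subset
    fold_scores.append(model.score(X_all[test_idx], y_all[test_idx]))

print(fold_scores)           # one accuracy value per fold
print(np.mean(fold_scores))  # the cross-validated estimate
```

A large gap between training accuracy and these held-out scores would be a sign of overfitting.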

In [28]:
import numpy as np
import pandas as pd
import seaborn as sns

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

In [2]:
df = sns.load_dataset("iris")
df.head()

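|  | sepal_length | sepal_width | petal_length | petal_width | species |
| --- | --- | --- | --- | --- | --- |
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |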


In [24]:
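# Features: only the first two columns (sepal_length, sepal_width)
X = df.iloc[:, :2]
# Target: the last column (species)
y = df.iloc[:, -1]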

## Model Evaluation using Train Test Split

In [26]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
KNN = KNeighborsClassifier()
KNN.fit(X_train, y_train)

y_pred = KNN.predict(X_test)
score = accuracy_score(y_test, y_pred)
print(f"The accuracy score is: {score}")

The accuracy score is: 0.8
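
A single train/test split gives one number that depends on which rows happen to land in the test set. The sketch below repeats the split with several seeds to show that spread. It assumes scikit-learn's `load_iris` in place of the seaborn dataset and keeps only the first two features to mirror `X = df.iloc[:, :2]`; the name `split_scores` is hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: scikit-learn's bundled iris data stands in for the seaborn copy.
X_all, y_all = load_iris(return_X_y=True)
X_two = X_all[:, :2]  # first two features only, as in the notebook

split_scores = []
for seed in range(10):
    # Same 80/20 split as above, but with a different random_state each time
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_two, y_all, test_size=0.2, random_state=seed
    )
    model = KNeighborsClassifier().fit(X_tr, y_tr)
    split_scores.append(accuracy_score(y_te, model.predict(X_te)))

# The range between these two values shows how much one split can mislead
print(min(split_scores), max(split_scores))
```

Averaging over several splits, which is what cross-validation does systematically, gives a steadier estimate than any one of these values.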


---
## Model Evaluation using **Cross Validation**

### Cross Val Score uses:
* Model
* X Feature(s)
* y Target
* CV (an integer or `None` selects `StratifiedKFold` when the model is a classifier; the default is 5 folds)
* Scoring
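
To make the defaults in this list concrete, the sketch below calls `cross_val_score` once with no `cv` or `scoring` argument and once with both spelled out, and checks that the results agree for a classifier. It assumes scikit-learn's `load_iris` as a stand-in for the seaborn dataset; `default_scores` and `explicit_scores` are names invented for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: scikit-learn's bundled iris data stands in for the seaborn copy.
X_all, y_all = load_iris(return_X_y=True)
X_two = X_all[:, :2]  # first two features only, as in the notebook

# Defaults: cv=None and scoring=None
default_scores = cross_val_score(KNeighborsClassifier(), X_two, y_all)

# The same call with the classifier defaults written out explicitly
explicit_scores = cross_val_score(
    KNeighborsClassifier(),
    X_two,
    y_all,
    cv=StratifiedKFold(n_splits=5),
    scoring="accuracy",
)

print(default_scores)                                # array with one score per fold
print(np.allclose(default_scores, explicit_scores))  # the two calls should agree
```

The return value is a NumPy array of per-fold scores, so `np.mean` and `np.std` can summarize it.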

In [42]:
# One accuracy score per fold (cv=10 gives 10 scores); the mean summarizes them

from sklearn.model_selection import cross_val_score
KNN = KNeighborsClassifier()
cv = cross_val_score(KNN, X, y, cv=10, scoring="accuracy")
np.mean(cv)

0.76

---
## Model Evaluation using **K-Fold Cross Validation**

### KFold Cross Val Score uses:
* `KFold` splitter
* Model
* X Feature(s)
* y Target
* CV (set to the `KFold` instance)
* Scoring

In [43]:
# One accuracy score per fold, using plain (unshuffled) K-Fold splits

from sklearn.model_selection import cross_val_score, KFold

# Name the instance kfold so it does not shadow the KFold class
kfold = KFold(n_splits=10)
KNN = KNeighborsClassifier()
cv = cross_val_score(KNN, X, y, cv=kfold, scoring="accuracy")
np.mean(cv)

0.6866666666666666
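
Plain `KFold` takes consecutive blocks of rows, and the iris rows are ordered by species, so most test folds contain a single class that may be under-represented in training. The sketch below lists the classes in each unshuffled test fold and compares the mean accuracy with a shuffled splitter. It assumes scikit-learn's `load_iris` instead of the seaborn dataset; the `random_state=1` value and the variable names are arbitrary choices for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: scikit-learn's bundled iris data stands in for the seaborn copy.
X_all, y_all = load_iris(return_X_y=True)
X_two = X_all[:, :2]  # first two features only, as in the notebook

plain = KFold(n_splits=10)                                   # consecutive blocks
shuffled = KFold(n_splits=10, shuffle=True, random_state=1)  # rows mixed first

# Which class labels appear in each unshuffled test fold
classes_per_fold = [
    np.unique(y_all[test_idx]).tolist() for _, test_idx in plain.split(X_two)
]
print(classes_per_fold)

plain_mean = cross_val_score(
    KNeighborsClassifier(), X_two, y_all, cv=plain, scoring="accuracy"
).mean()
shuffled_mean = cross_val_score(
    KNeighborsClassifier(), X_two, y_all, cv=shuffled, scoring="accuracy"
).mean()
print(plain_mean, shuffled_mean)
```

This ordering effect is a plausible reason the unshuffled K-Fold score above is lower than the stratified scores elsewhere in this notebook.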

---
## Model Evaluation using **Repeated K-Fold Cross Validation**

### Repeated KFold Cross Val Score uses:
* `RepeatedKFold` splitter
* Model
* X Feature(s)
* y Target
* CV (set to the `RepeatedKFold` instance)
* Scoring

In [44]:
# One accuracy score per fold and repeat: 10 splits x 3 repeats = 30 scores

from sklearn.model_selection import cross_val_score, RepeatedKFold

RKFold = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
KNN = KNeighborsClassifier()
cv = cross_val_score(KNN, X, y, cv=RKFold, scoring="accuracy")
np.mean(cv)

0.7600000000000001
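
With `RepeatedKFold`, `cross_val_score` returns `n_splits * n_repeats` scores, ordered repeat by repeat. The sketch below groups them per repeat to show how much the mean moves between repetitions. It assumes scikit-learn's `load_iris` in place of the seaborn dataset; `per_repeat` is a name made up for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: scikit-learn's bundled iris data stands in for the seaborn copy.
X_all, y_all = load_iris(return_X_y=True)
X_two = X_all[:, :2]  # first two features only, as in the notebook

rkf = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)
scores = cross_val_score(
    KNeighborsClassifier(), X_two, y_all, cv=rkf, scoring="accuracy"
)

print(len(scores))  # 10 splits x 3 repeats

# Rows are repeats, columns are folds within a repeat
per_repeat = scores.reshape(3, 10).mean(axis=1)
print(per_repeat)                    # mean accuracy of each repetition
print(scores.mean(), scores.std())   # overall estimate and its spread
```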

---
## Model Evaluation using **Stratified K-Fold Cross Validation**

### Stratified KFold Cross Val Score uses:
* `StratifiedKFold` splitter
* Model
* X Feature(s)
* y Target
* CV (set to the `StratifiedKFold` instance)
* Scoring

In [45]:
# One accuracy score per stratified fold (10 scores); each fold keeps the class proportions of y

from sklearn.model_selection import cross_val_score, StratifiedKFold

SKFold = StratifiedKFold(n_splits=10)
KNN = KNeighborsClassifier()
cv = cross_val_score(KNN, X, y, cv=SKFold, scoring="accuracy")
np.mean(cv)

0.76
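
Stratification means each test fold keeps the class proportions of `y`. The sketch below counts the classes in every test fold produced by `StratifiedKFold(n_splits=10)`. It assumes scikit-learn's `load_iris` (50 samples per class) rather than the seaborn dataset; `fold_class_counts` is a hypothetical name.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold

# Assumption: scikit-learn's bundled iris data stands in for the seaborn copy.
X_all, y_all = load_iris(return_X_y=True)

skf = StratifiedKFold(n_splits=10)

# Number of samples of each class (0, 1, 2) in every test fold
fold_class_counts = [
    np.bincount(y_all[test_idx], minlength=3).tolist()
    for _, test_idx in skf.split(X_all, y_all)
]

for counts in fold_class_counts:
    print(counts)
```

With 50 samples per class and 10 folds, every test fold should hold 5 samples of each class, unlike the unshuffled `KFold` splits.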

---
## Model Evaluation using **Repeated Stratified K-Fold Cross Validation**

### Repeated Stratified KFold Cross Val Score uses:
* `RepeatedStratifiedKFold` splitter
* Model
* X Feature(s)
* y Target
* CV (set to the `RepeatedStratifiedKFold` instance)
* Scoring

In [46]:
# One accuracy score per stratified fold and repeat: 10 splits x 3 repeats = 30 scores

from sklearn.model_selection import cross_val_score, RepeatedStratifiedKFold

RSKFold = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
KNN = KNeighborsClassifier()
cv = cross_val_score(KNN, X, y, cv=RSKFold, scoring="accuracy")
np.mean(cv)

0.78

| Technique | Usefulness |
| --- | --- |
| `cross_val_score` | A function that evaluates a model with a chosen cross-validation strategy and returns one score per fold, each computed on data held out from training. |
| `KFold` | A simple strategy that splits the data into k consecutive folds; suitable when samples are independent and identically distributed. Set `shuffle=True` if the rows are ordered by class. |
| `RepeatedKFold` | Repeats k-fold with a different randomization in each repetition, which reduces the variance of the estimated score. |
| `StratifiedKFold` | Preserves the class proportions of `y` in every fold; useful for classification, especially with imbalanced classes. |
| `RepeatedStratifiedKFold` | Repeats stratified k-fold `n_repeats` times with different randomization, giving a more stable estimate of the score. |
| `StratifiedGroupKFold` | Stratified k-fold that also keeps all samples of a group in the same fold; useful when samples within a group are not independent. |
