[Leave 1 Out Cross Validation](https://www.youtube.com/watch?v=yxqcHWQKkdA)  
[Time Series Cross Validation](https://www.youtube.com/watch?v=g9iO2AwTXyI)

[GridSearchCV](https://www.youtube.com/watch?v=4Im0CT43QxY)  
[RandomizedSearchCV](https://www.youtube.com/watch?v=Q5dH5mOQ_ik)  
[Hyperparameter Tuning](https://www.youtube.com/watch?v=355u2bDqB7c)

In [1]:
import numpy as np
import pandas as pd

# Cross Validation

Cross validation is a statistical technique for assessing how well a predictive model will perform on unseen data. The basic idea is to split the available dataset into multiple subsets, or folds, train the model on some of them, evaluate it on the rest, and rotate which folds are held out.
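The splitting idea can be sketched with plain NumPy before reaching for scikit-learn. This is only an illustration under assumed values: the 10 sample indices and `k = 5` are hypothetical and are not taken from the notebook's data.

```python
import numpy as np

# Hypothetical toy data: 10 sample indices split into k = 5 folds.
indices = np.arange(10)
k = 5
folds = np.array_split(indices, k)

for i, val_idx in enumerate(folds):
    # Every fold except the i-th one is used for training.
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    print(f"fold {i}: train={train_idx} validation={val_idx}")
```

Each index lands in the validation set exactly once, which is the property the scikit-learn splitters provide as well.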

[Machine Learning Fundamentals: Cross Validation](https://youtu.be/fSytzGwwBVw?si=TSZXN41ZNLQPXm5U)

## Types of Cross Validation

1. **K-Fold Cross Validation**: The dataset is split into k folds of roughly equal size. The model is trained on k-1 folds and validated on the remaining fold. This process is repeated k times, with each fold serving as the validation set exactly once.

2. **Stratified K-Fold Cross Validation**: This technique is similar to k-fold cross-validation, but it ensures that each fold has approximately the same proportion of target classes as the original dataset. It's particularly useful for imbalanced datasets where one class is much more prevalent than the others.

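3. **Leave One Out Cross Validation**: The model is trained on all observations except one and validated on the single observation left out. This process is repeated n times, where n is the total number of observations in the dataset, so every observation is used for validation exactly once. It is computationally expensive, especially for large datasets, but it provides a less biased estimate of the model's performance, often at the cost of higher variance.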

4. **Repeated K-Fold Cross Validation**: This method involves repeating k-fold cross-validation multiple times with different random splits of the data. It helps to reduce the variability in the estimated performance of the model.

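5. **Shuffle Split Cross Validation**: The data is shuffled and then split at random into a training set and a test set of chosen sizes. This process is repeated multiple times, so unlike k-fold the test sets may overlap between iterations. It is useful when the dataset is too large to be easily divided into folds or when you want a specific number of iterations rather than a fixed number of folds.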

6. **Time Series Cross Validation**: Time series data requires a different approach to cross-validation because the order of observations matters. Techniques like forward chaining or sliding window validation are commonly used for time series data, where the model is trained on past data and validated on future data.
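The last three strategies in the list can be compared side by side with scikit-learn's `RepeatedKFold`, `ShuffleSplit`, and `TimeSeriesSplit`. The sketch below is a rough illustration: the 12-sample array `X_toy`, the split counts, and the `random_state` values are assumptions chosen only to keep the printout small.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold, ShuffleSplit, TimeSeriesSplit

# Hypothetical toy data: 12 samples whose row order stands in for time order.
X_toy = np.arange(12).reshape(-1, 1)

splitters = {
    "RepeatedKFold": RepeatedKFold(n_splits=3, n_repeats=2, random_state=0),
    "ShuffleSplit": ShuffleSplit(n_splits=4, test_size=0.25, random_state=0),
    "TimeSeriesSplit": TimeSeriesSplit(n_splits=3),
}

for name, splitter in splitters.items():
    print(name)
    for train_idx, test_idx in splitter.split(X_toy):
        print("  train:", train_idx, "test:", test_idx)
```

In the `TimeSeriesSplit` output every training index precedes every test index, which is the forward-chaining behaviour described above; the other two splitters ignore order.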

[Complete Guide to Cross Validation](https://youtu.be/-8s9KuNo5SA?si=rk1Ltp9_5GpqYiX_)

In [15]:
from sklearn.datasets import load_digits

digits = load_digits()

In [16]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(digits.data,
                                                    digits.target,
                                                    test_size=0.3,
                                                    random_state=2002)

In [17]:
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

In [21]:
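# Logistic regression with the liblinear solver in one-vs-rest mode
lr = LogisticRegression(solver='liblinear', multi_class='ovr')
lr.fit(X_train, y_train)
print(lr.score(X_test, y_test))  # accuracy on the held-out 30%

# RBF-kernel SVM; gamma='auto' sets gamma to 1 / n_features
svm = SVC(gamma='auto')
svm.fit(X_train, y_train)
print(svm.score(X_test, y_test))

# Random forest with 40 trees; no random_state is set, so the score varies between runs
rf = RandomForestClassifier(n_estimators=40)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))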

0.9648148148148148
0.4
0.9740740740740741


**K-Fold**

In [22]:
from sklearn.model_selection import KFold

In [30]:
kf = KFold(n_splits=7,shuffle=True)
kf

KFold(n_splits=7, random_state=None, shuffle=True)

In [47]:
print(' '*25+'Train Set'+' '*34+'Test Set'+' '*5)
for train_index, test_index in kf.split(np.random.randint(-50,150,25)):
    print(train_index, test_index)

                         Train Set                                  Test Set     
[ 0  1  2  4  5  6  7  8  9 11 12 13 14 15 17 18 19 20 21 22 23] [ 3 10 16 24]
[ 0  1  2  3  4  5  7  9 10 11 13 14 15 16 17 18 19 20 21 22 24] [ 6  8 12 23]
[ 1  3  5  6  7  8  9 10 11 12 13 14 16 17 18 19 20 21 22 23 24] [ 0  2  4 15]
[ 0  1  2  3  4  5  6  8  9 10 11 12 13 15 16 17 18 19 22 23 24] [ 7 14 20 21]
[ 0  1  2  3  4  5  6  7  8  9 10 12 13 14 15 16 17 19 20 21 23 24] [11 18 22]
[ 0  2  3  4  6  7  8  9 10 11 12 13 14 15 16 17 18 20 21 22 23 24] [ 1  5 19]
[ 0  1  2  3  4  5  6  7  8 10 11 12 14 15 16 18 19 20 21 22 23 24] [ 9 13 17]
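The index pairs printed above become useful once they are used to fit and score a model on each fold. A minimal sketch on the digits data follows; the `random_state` values and the names `kf_demo` and `fold_scores` are assumptions added for repeatability and illustration, not part of the original cells.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

X, y = load_digits(return_X_y=True)

# random_state is an assumption added so the folds are repeatable.
kf_demo = KFold(n_splits=7, shuffle=True, random_state=0)

fold_scores = []
for train_index, test_index in kf_demo.split(X):
    # Fit a fresh model on the training folds, score it on the held-out fold.
    model = RandomForestClassifier(n_estimators=40, random_state=0)
    model.fit(X[train_index], y[train_index])
    fold_scores.append(model.score(X[test_index], y[test_index]))

print(np.round(fold_scores, 3))
print("mean accuracy:", np.mean(fold_scores))
```

Averaging the per-fold scores gives a steadier estimate than the single train/test split used earlier.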


**Stratified K-Fold Cross Validation**

In [48]:
from sklearn.model_selection import cross_val_score

In [53]:
# With an integer cv and a classifier, cross_val_score uses StratifiedKFold internally
cross_val_score(RandomForestClassifier(n_estimators=37), X_train, y_train, cv=5)

array([0.9484127 , 0.94444444, 0.97211155, 0.98804781, 0.98007968])
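The stratification can also be made explicit by constructing a `StratifiedKFold` object and passing it as `cv`. The sketch below prints the per-class counts in each test fold to show that the class proportions are preserved; `shuffle=True` and the `random_state` values are assumptions for repeatability.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_digits(return_X_y=True)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each test fold should hold roughly one fifth of every digit class.
for _, test_index in skf.split(X, y):
    print(np.bincount(y[test_index], minlength=10))

scores = cross_val_score(
    RandomForestClassifier(n_estimators=37, random_state=0), X, y, cv=skf
)
print(scores, scores.mean())
```

Note that `split` needs `y` here, because the labels decide how samples are distributed across folds.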

**Leave One Out Cross Validation**

In [54]:
from sklearn.model_selection import LeaveOneOut

In [55]:
loo = LeaveOneOut()

In [67]:
for train_index, test_index in loo.split(np.random.randint(-50,150,10)):
    print(train_index, test_index)

[1 2 3 4 5 6 7 8 9] [0]
[0 2 3 4 5 6 7 8 9] [1]
[0 1 3 4 5 6 7 8 9] [2]
[0 1 2 4 5 6 7 8 9] [3]
[0 1 2 3 5 6 7 8 9] [4]
[0 1 2 3 4 6 7 8 9] [5]
[0 1 2 3 4 5 7 8 9] [6]
[0 1 2 3 4 5 6 8 9] [7]
[0 1 2 3 4 5 6 7 9] [8]
[0 1 2 3 4 5 6 7 8] [9]
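A `LeaveOneOut` splitter can be passed to `cross_val_score` like any other `cv` object. Because it fits one model per observation, the sketch below restricts itself to the first 100 digits; that subset size, the forest size, and the `random_state` are assumptions made only to keep the run short.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_digits(return_X_y=True)
X_small, y_small = X[:100], y[:100]  # subset chosen only to keep the run short

loo_scores = cross_val_score(
    RandomForestClassifier(n_estimators=20, random_state=0),
    X_small,
    y_small,
    cv=LeaveOneOut(),
)

print(len(loo_scores))  # one score per observation
print("LOO accuracy:", loo_scores.mean())
```

Each individual score is either 0 or 1, since every test set holds a single observation; only the mean is informative.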
