## k-Fold Cross-Validation
* Cross-validation is a resampling procedure used to evaluate machine learning models on a limited data sample.
* The procedure has a single parameter called k that refers to the number of groups that a given data sample is to be split into. As such, the procedure is often called k-fold cross-validation. When a specific value for k is chosen, it may be used in place of k in the name of the procedure, such as k=10 becoming 10-fold cross-validation.
* Cross-validation is primarily used in applied machine learning to estimate the skill of a machine learning model on unseen data. That is, a limited sample is used to estimate how the model is expected to perform in general when making predictions on data not used during the training of the model.
* It is a popular method because it is simple to understand and because it generally results in a less biased or less optimistic estimate of the model skill than other methods, such as a simple train/test split.

## The general procedure is as follows:
* Shuffle the dataset randomly.
* Split the dataset into k groups.
* For each unique group:
    * Take the group as a hold out or test data set.
    * Take the remaining groups as a training data set.
    * Fit a model on the training set and evaluate it on the test set.
    * Retain the evaluation score and discard the model.
* Summarize the skill of the model using the sample of model evaluation scores.
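
The steps above can be sketched in plain NumPy. This is only an illustration, not a reference implementation: the helper names `k_fold_indices` and `cross_validate_mean_predictor` are hypothetical, and the "model" is a stand-in that simply predicts the mean of the training targets.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k groups (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)   # shuffle the dataset randomly
    return np.array_split(indices, k)      # split into k (nearly) equal groups

def cross_validate_mean_predictor(y, k=3, seed=0):
    """Run k-fold cross-validation with a stand-in model that predicts the training mean."""
    folds = k_fold_indices(len(y), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # The remaining groups form the training set.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        prediction = y[train_idx].mean()                 # "fit" the stand-in model
        mse = np.mean((y[test_idx] - prediction) ** 2)   # evaluate on the held-out group
        scores.append(mse)                               # retain the score, discard the model
    return np.array(scores)

y = np.arange(1.0, 11.0)
fold_scores = cross_validate_mean_predictor(y, k=3)

# Summarize the skill using the sample of evaluation scores.
print("per-fold MSE:", fold_scores)
print("mean: %.3f, std: %.3f" % (fold_scores.mean(), fold_scores.std()))
```

Reporting both the mean and the spread of the per-fold scores gives a rough sense of how much the estimate depends on the particular split.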

## Configuration of k
* The k value must be chosen carefully for your data sample.
* A poorly chosen value for k may result in a misrepresentative idea of the skill of the model, such as a score with high variance (one that may change a lot based on the data used to fit the model) or high bias (such as an overestimate of the skill of the model).

### Three common tactics for choosing a value for k are as follows:

* Representative: The value for k is chosen such that each train/test group of data samples is large enough to be statistically representative of the broader dataset.
* k=10: The value for k is fixed to 10, a value that has been found through experimentation to generally result in a model skill estimate with low bias and a modest variance.
* k=n: The value for k is fixed to n, where n is the size of the dataset, to give each test sample an opportunity to be used in the hold out dataset. This approach is called leave-one-out cross-validation.

## NOTE: A value of k=10 is very common in the field of applied machine learning, and is recommended if you are struggling to choose a value for your dataset.
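
To make the difference between k=10 and k=n concrete, the following sketch compares the number of splits and the test-fold sizes for both choices. The 20-row array `X_demo` is made-up data used only for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

# Hypothetical dataset with n = 20 observations.
X_demo = np.arange(20).reshape(-1, 1)

ten_fold = KFold(n_splits=10, shuffle=True, random_state=1)  # k = 10
loo = LeaveOneOut()                                          # k = n

n_ten = ten_fold.get_n_splits(X_demo)
n_loo = loo.get_n_splits(X_demo)

test_sizes_ten = [len(test) for _, test in ten_fold.split(X_demo)]
test_sizes_loo = [len(test) for _, test in loo.split(X_demo)]

print("k=10: %d splits, test sizes %s" % (n_ten, test_sizes_ten))
print("k=n : %d splits, test sizes %s" % (n_loo, test_sizes_loo))
```

With k=n, a model is fitted once per observation, so the cost grows with the size of the dataset.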

## scikit-learn k-fold cross-validation

In [1]:
import numpy as np
from sklearn.model_selection import KFold

#### Data sample

In [2]:
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

#### Prepare for cross validation

In [3]:
# 3 folds, shuffled before splitting; random_state makes the shuffle reproducible
kfold = KFold(n_splits=3, shuffle=True, random_state=1)

#### Enumerate splits

In [5]:
# split() yields index arrays, which are used to select the rows of each fold
for train, test in kfold.split(data):
    print("train :%s, test :%s" % (data[train], data[test]))

train :[1 2 4 6 8 9], test :[ 3  5  7 10]
train :[ 3  5  6  7  8  9 10], test :[1 2 4]
train :[ 1  2  3  4  5  7 10], test :[6 8 9]
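
In practice the splits are usually passed to a helper that fits and scores a model on each fold. The sketch below uses `cross_val_score` with a `KFold` object; the synthetic dataset and the choice of `LogisticRegression` are assumptions made only for the sake of a runnable example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic classification data (illustrative only).
X_clf, y_clf = make_classification(n_samples=100, n_features=5, random_state=1)

cv = KFold(n_splits=3, shuffle=True, random_state=1)
model = LogisticRegression(max_iter=1000)

# One accuracy score per fold; a fresh clone of the model is fitted for each split.
cv_scores = cross_val_score(model, X_clf, y_clf, cv=cv)

print("fold accuracies:", cv_scores)
print("mean accuracy: %.3f (std %.3f)" % (cv_scores.mean(), cv_scores.std()))
```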


## Variations on Cross-Validation
### There are a number of variations on the k-fold cross validation procedure.

### Five commonly used variations are as follows:

* Train/Test Split: Taken to one extreme, k may be set to 2 (not 1) such that the data is divided into a single pair of train/test groups used to evaluate the model.
* LOOCV: Taken to another extreme, k may be set to the total number of observations in the dataset such that each observation is given a chance to be held out of the dataset. This is called leave-one-out cross-validation, or LOOCV for short.
* Stratified: The splitting of data into folds may be governed by criteria such as ensuring that each fold has the same proportion of observations with a given categorical value, such as the class outcome value. This is called stratified cross-validation.
* Repeated: This is where the k-fold cross-validation procedure is repeated n times, where importantly, the data sample is shuffled prior to each repetition, which results in a different split of the sample.
* Nested: This is where k-fold cross-validation is performed within each fold of cross-validation, often to perform hyperparameter tuning during model evaluation. This is called nested cross-validation or double cross-validation.
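
Several of these variations are available in scikit-learn. The sketch below is a rough illustration of the stratified, repeated, and nested variants; the toy arrays, the fold counts, and the `C` grid for `LogisticRegression` are all assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    GridSearchCV, KFold, RepeatedKFold, StratifiedKFold, cross_val_score,
)

# Stratified: 8 samples of class 0 and 4 of class 1 (toy data).
X_toy = np.arange(12).reshape(-1, 1)
y_toy = np.array([0] * 8 + [1] * 4)
skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=1)
# Each test fold should keep the 2:1 class ratio, i.e. one positive per fold.
positives_per_fold = [int(y_toy[test].sum()) for _, test in skf.split(X_toy, y_toy)]
print("positives per test fold:", positives_per_fold)

# Repeated: 4-fold cross-validation repeated 3 times with different shuffles.
rkf = RepeatedKFold(n_splits=4, n_repeats=3, random_state=1)
n_total_splits = rkf.get_n_splits(X_toy)
print("total splits:", n_total_splits)

# Nested: an inner loop tunes C, an outer loop evaluates the tuned model.
X_nest, y_nest = make_classification(n_samples=120, n_features=5, random_state=1)
inner_cv = KFold(n_splits=3, shuffle=True, random_state=1)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=2)
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},
    cv=inner_cv,
)
nested_scores = cross_val_score(search, X_nest, y_nest, cv=outer_cv)
print("nested CV accuracies:", nested_scores)
```

In the nested case the hyperparameter search is rerun inside every outer training set, so the outer test folds are never used for tuning.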