__What is Cross Validation?__
Cross Validation is a technique which involves reserving a particular sample of a dataset on which you do not train the model. Later, you test your model on this sample before finalizing it.

Here are the steps involved in cross validation:

__Step 1:__ You reserve a sample data set.

__Step 2:__ Train the model using the remaining part of the dataset.

__Step 3:__ Use the reserved sample as the test (validation) set. This helps you gauge how well your model performs on data it has not seen.

---

__Different Approaches for achieving Cross Validation__

__1. The validation set approach:__ In this approach, we reserve 50% of the dataset for validation and use the remaining 50% for model training. The major disadvantage is that, because the model is trained on only half of the data, it may miss important patterns that are present in the other half, which leads to a higher bias.
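A minimal sketch of the validation set approach, assuming scikit-learn is available. The synthetic dataset from `make_regression` and the choice of `LinearRegression` are illustrative assumptions, not part of the original text.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Reserve 50% of the rows for validation and train on the other 50%.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# Fit on the training half only, then score on the reserved half.
model = LinearRegression().fit(X_train, y_train)
val_mse = mean_squared_error(y_val, model.predict(X_val))
print(f"Validation MSE: {val_mse:.2f}")
```

Changing `random_state` changes which rows land in each half, so the reported error can move noticeably from one split to the next.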

---

__2. Leave one out cross validation (LOOCV):__ In this approach, we reserve only one data point from the available dataset, and train the model on the rest of the data. This process is repeated for each data point. This approach has its own advantages and disadvantages. Let’s look at them:

- We make use of all data points, hence the bias will be low
- We repeat the cross validation process n times (where n is the number of data points), which results in a higher execution time
- This approach leads to higher variation in the estimate of model effectiveness because each test uses a single data point. The estimate is therefore heavily influenced by that point, and if it turns out to be an outlier, the variation becomes even higher

LOOCV leaves one data point out. Similarly, you could leave p training examples out to have a validation set of size p for each iteration. This is called LPOCV (Leave P Out Cross Validation).
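Both schemes can be sketched with scikit-learn's `LeaveOneOut` and `LeavePOut` splitters. The tiny synthetic dataset and the linear model are assumptions chosen to keep the run short; mean squared error is used because some scores (such as R²) are undefined on a single test point.

```python
from math import comb

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, LeavePOut, cross_val_score

# Small synthetic dataset (assumption): LPOCV grows combinatorially with n.
X, y = make_regression(n_samples=20, n_features=3, noise=5.0, random_state=0)
model = LinearRegression()

# LOOCV: n iterations, each holding out exactly one data point.
loo_scores = cross_val_score(
    model, X, y, cv=LeaveOneOut(), scoring="neg_mean_squared_error"
)
loo_mse = -loo_scores.mean()

# LPOCV with p = 2: one iteration per pair of points, i.e. C(n, 2) iterations.
lpo_scores = cross_val_score(
    model, X, y, cv=LeavePOut(p=2), scoring="neg_mean_squared_error"
)
lpo_mse = -lpo_scores.mean()

print(f"LOOCV: {len(loo_scores)} fits, MSE = {loo_mse:.2f}")
print(f"LPOCV: {len(lpo_scores)} fits, MSE = {lpo_mse:.2f}")
```

With 20 data points, LOOCV needs 20 model fits while leave-2-out already needs `comb(20, 2)` fits, which is why these schemes are usually reserved for small datasets.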

---

__3. k-fold cross validation:__ 
From the above two validation methods, we’ve learnt:

- We should train the model on a large portion of the dataset. Otherwise we’ll fail to read and recognise the underlying trend in the data. This will eventually result in a higher bias
- We also need a good ratio of testing data points. As we have seen above, too few test data points can lead to a high-variance estimate of the model’s effectiveness
- We should iterate on the training and testing process multiple times, changing the train and test split each time. This helps in validating the model effectiveness properly

k-fold cross validation takes care of all three requirements. Here are the steps:

__Step 1:__ Randomly split your entire dataset into k "folds"

__Step 2:__ For each fold, build your model on the other k - 1 folds of the dataset. Then, test the model on the held-out fold to check its effectiveness.

__Step 3:__ Record the error you see on each of the predictions.

__Step 4:__ Repeat this until each of the k-folds has served as the test set.

__Step 5:__ The average of your k recorded errors is called the cross-validation error and will serve as your performance metric for the model.
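The five steps above can be sketched as an explicit loop over scikit-learn's `KFold` splitter. The synthetic data, the linear model, `k = 5`, and mean squared error as the recorded error are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

k = 5
# Step 1: randomly split the dataset into k folds.
kf = KFold(n_splits=k, shuffle=True, random_state=0)

fold_errors = []
# Steps 2-4: each fold serves as the test set exactly once.
for train_idx, test_idx in kf.split(X):
    # Build the model on the other k - 1 folds.
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    # Record the error on the held-out fold.
    preds = model.predict(X[test_idx])
    fold_errors.append(mean_squared_error(y[test_idx], preds))

# Step 5: the mean of the k fold errors is the cross-validation error.
cv_error = float(np.mean(fold_errors))
print(f"Per-fold MSE: {np.round(fold_errors, 2)}")
print(f"Cross-validation error: {cv_error:.2f}")
```

In practice the loop is often replaced by a single call to `cross_val_score`, but writing it out makes the correspondence with the steps explicit.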

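Remember, a lower value of k gives a more biased estimate, because each model is trained on a smaller share of the data, and is hence undesirable. On the other hand, a higher value of k is less biased, but the estimate can suffer from large variability.

It is important to know that a smaller value of k takes us towards the validation set approach (with k = 2, each model is trained on half of the data), whereas a higher value of k leads towards LOOCV (with k = n, the number of data points, it is exactly LOOCV).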

---

__4. Stratified k-fold cross validation:__ Stratification is the process of rearranging the data so as to ensure that each fold is a good representative of the whole. For example, in a binary classification problem where each class comprises 50% of the data, it is best to arrange the data such that in every fold, each class comprises about half the instances.

It is generally a better approach when dealing with both bias and variance. A randomly selected fold might not adequately represent the minority class, particularly in cases where there is a huge class imbalance.
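To illustrate the effect of stratification, the sketch below counts how many minority-class samples end up in each test fold with `StratifiedKFold` and with plain `KFold`. The 90/10 class split and the dummy feature matrix are assumptions made for the example.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Imbalanced binary labels (assumption): 90 samples of class 0, 10 of class 1.
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))  # features do not matter for the split itself

# Stratified folds preserve the class ratio in every fold.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
strat_counts = [int(y[test_idx].sum()) for _, test_idx in skf.split(X, y)]

# Plain random folds give no such guarantee.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
plain_counts = [int(y[test_idx].sum()) for _, test_idx in kf.split(X)]

print("Minority samples per fold (stratified):", strat_counts)
print("Minority samples per fold (plain):     ", plain_counts)
```

The stratified splitter places exactly two minority samples in each of the five folds, while the plain splitter distributes them however the shuffle happens to fall.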

Having said that, if the train set does not adequately represent the entire population, then using a stratified k-fold might not be the best idea. In such cases, one should use a simple k-fold cross validation with repetition.

In repeated cross-validation, the cross-validation procedure is repeated n times (here n is the number of repetitions, not the number of data points), yielding n random partitions of the original sample. The n results are again averaged (or otherwise combined) to produce a single estimate.
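A short sketch of repeated k-fold using scikit-learn's `RepeatedKFold`. The synthetic data, the linear model, and the choice of 5 folds repeated 3 times are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

# 5-fold cross validation repeated 3 times, each time with a new random partition.
rkf = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(
    LinearRegression(), X, y, cv=rkf, scoring="neg_mean_squared_error"
)

# Average all fold results into a single estimate.
repeated_cv_error = -scores.mean()
print(f"{len(scores)} fold scores, averaged MSE = {repeated_cv_error:.2f}")
```

Here `n_repeats` plays the role of n in the paragraph above, so the run produces 5 × 3 = 15 fold scores in total.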

---

__5. Adversarial Validation:__ When dealing with real datasets, there are often cases where the test and train sets are very different. As a result, the internal cross-validation techniques might give scores that are not even in the ballpark of the test score. In such cases, adversarial validation offers an interesting solution.

The general idea is to check how similar the train and test sets are in terms of feature distribution. If a classifier can easily tell train rows from test rows, we can suspect that the two sets are quite different. This intuition can be quantified by combining the train and test sets, assigning 0/1 labels (0 for train, 1 for test) and evaluating a binary classification task.

__Step 1:__ Remove the target variable from the train set

__Step 2:__ Create a new target variable which is 0 for each row in the train set, and 1 for each row in the test set

__Step 3:__ Combine the train and test datasets

__Step 4:__ Using the newly created target variable, fit a classification model and predict probabilities for each row to be in the test set

__Step 5:__ Sort the train set using the calculated probabilities in step 4 and take top n% samples/rows as the validation set (n% is the fraction of the train set you want to keep in the validation set)
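The steps above can be sketched as follows. Everything about the data is hypothetical: the train and test features are synthetic, `train_ids` stands in for an id column, and logistic regression with out-of-fold predictions is just one reasonable choice of classifier. The code labels train rows 0 and test rows 1, so that the predicted probability is the probability of belonging to the test set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)

# Hypothetical data: test features are shifted relative to train features.
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 3))
X_test = rng.normal(loc=0.7, scale=1.0, size=(200, 3))
train_ids = np.arange(len(X_train))  # stand-in for an id column

# Steps 1-3: no original target is kept; label train rows 0, test rows 1, combine.
X_all = np.vstack([X_train, X_test])
is_test = np.concatenate(
    [np.zeros(len(X_train), dtype=int), np.ones(len(X_test), dtype=int)]
)

# Step 4: out-of-fold probability that each row belongs to the test set.
probs = cross_val_predict(
    LogisticRegression(max_iter=1000), X_all, is_test, cv=5, method="predict_proba"
)[:, 1]
train_probs = probs[: len(X_train)]

# An AUC near 0.5 means train and test are hard to tell apart.
auc = roc_auc_score(is_test, probs)
print(f"Train-vs-test AUC: {auc:.2f}")

# Step 5: the 20% of train rows that look most like the test set.
n_val = int(0.2 * len(X_train))
val_set_ids = train_ids[np.argsort(train_probs)[::-1][:n_val]]
print(f"Validation set size: {len(val_set_ids)}")
```

Out-of-fold predictions are used so that no row is scored by a model that saw it during fitting; scoring the rows with a model fitted on all of them would make the probabilities overly confident.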

The ids of the rows selected in step 5 (called `val_set_ids` here) identify the part of the train set that makes up the validation set most similar to the test set. This makes your validation strategy more robust in cases where the train and test sets are highly dissimilar.

However, you must be careful while using this type of validation technique. Once the distribution of the test set changes, the validation set might no longer be a good subset to evaluate your model on.

---

__6. Cross Validation for time series:__ Splitting a time-series dataset randomly does not work because it destroys the temporal order of the observations: the model would be trained on data from the future and tested on the past. For a time series forecasting problem, we perform cross validation in the following manner.

__Step 1:__ Folds for time series cross validation are created in a forward chaining fashion.

__Step 2:__ Suppose we have a time series of yearly consumer demand for a product over a period of n years. We progressively select a new train and test set. We start with a train set that has the minimum number of observations needed for fitting the model. With each fold, we extend the train set forward in time and test on the observations that follow it.

In most cases, 1-step forecasts might not be very important. In such instances, the forecast origin can be shifted to allow multi-step errors to be used. The same forward chaining procedure carries over to a regression problem, as long as every test set lies after its train set in time.
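A minimal forward chaining sketch using scikit-learn's `TimeSeriesSplit`. The yearly demand series is synthetic (a linear trend plus noise), and regressing demand on the year index with `LinearRegression` is an assumption chosen only to keep the example small.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical yearly demand series: a linear trend plus noise.
rng = np.random.default_rng(0)
years = np.arange(30).reshape(-1, 1)
demand = 100 + 5 * years.ravel() + rng.normal(scale=3.0, size=30)

# Forward chaining: every test fold lies strictly after its train fold.
tscv = TimeSeriesSplit(n_splits=5)
ts_errors = []
for train_idx, test_idx in tscv.split(years):
    # Fit on the past only, then forecast the block of years that follows.
    model = LinearRegression().fit(years[train_idx], demand[train_idx])
    preds = model.predict(years[test_idx])
    ts_errors.append(mean_squared_error(demand[test_idx], preds))
    print(
        f"train: years 0-{train_idx[-1]}, "
        f"test: years {test_idx[0]}-{test_idx[-1]}"
    )

print(f"Mean forecast MSE: {np.mean(ts_errors):.2f}")
```

Each test block here covers several years, so the recorded errors are multi-step forecast errors. Recent scikit-learn versions also accept `gap` and `test_size` arguments on `TimeSeriesSplit` for leaving a buffer between train and test or fixing the test block length.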

---