# Resampling Methods

Resampling involves repeatedly drawing samples from a training set and refitting a model
of interest on each sample in order to obtain additional information about
the fitted model. Such an approach may allow us to
obtain information that would not be available from fitting the model only
once using the original training sample. Two of the most commonly
used resampling methods are **cross-validation** and the **bootstrap**.

## Cross-Validation

In the absence of a very large designated test set that can be used to
directly estimate the test error rate, a number of techniques can be used
to estimate this quantity using the available training data. Here, we
consider a class of methods that estimate the test error rate by holding out
a subset of the training observations from the fitting process, and then
applying the statistical learning method to those held-out observations.

### The Validation Set Approach

The validation set approach is a very simple strategy. It involves randomly
dividing the available set of observations into two parts, a training set and
a validation set (or hold-out set). The model is fit on the training set, and
the fitted model is used to predict the responses for the observations in the
validation set. The resulting validation set error provides an estimate of
the test error rate.
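
The split-fit-predict procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the simple linear model, the helper names `fit_line` and `validation_set_mse`, the 50/50 split, and the synthetic data are all assumptions made for the example.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def validation_set_mse(xs, ys, train_fraction=0.5, seed=0):
    """Randomly split the data, fit on the training part, score on the rest."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    rng.shuffle(indices)
    n_train = int(len(indices) * train_fraction)
    train, valid = indices[:n_train], indices[n_train:]
    # Fit using the training observations only.
    b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
    # Mean squared prediction error on the held-out observations.
    errors = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in valid]
    return sum(errors) / len(errors)

# Synthetic data, made up for illustration: y = 2 + 3x + Gaussian noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 10) for _ in range(100)]
ys = [2 + 3 * x + data_rng.gauss(0, 1) for x in xs]

val_mse = validation_set_mse(xs, ys)
print(f"validation MSE: {val_mse:.3f}")
```

Changing `seed` changes which observations land in the validation set, and the estimate moves with it; the procedures in the following sections are designed to reduce that dependence on a single split.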

### Leave-One-Out Cross-Validation

Like the validation set approach, leave-one-out cross-validation (LOOCV)
involves splitting the set of observations into two parts. However, instead
of creating two subsets of comparable size, a single observation $(x_1, y_1)$
is used for the validation set, and the remaining observations
$\{(x_2, y_2), \dots, (x_n, y_n)\}$ make up the training set. The statistical
learning method is fit on the $n - 1$ training observations, and a prediction
$\hat{y}_1$ is made for the excluded observation, using its value $x_1$.

Because $(x_1, y_1)$ was not used in the fitting process, the squared error
$MSE_1 = (y_1 - \hat{y}_1)^2$ is an approximately unbiased estimate of the
test error. It is nonetheless a poor estimate because it is highly variable,
since it is based upon a single observation $(x_1, y_1)$.

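We can repeat this process $n$ times, holding out a different observation
$(x_i, y_i)$ each time, to obtain $n$ squared errors
$MSE_i = (y_i - \hat{y}_i)^2$. The LOOCV estimate of the test MSE is the
average of these $n$ values: $CV_{(n)} = \frac{1}{n} \sum_{i=1}^{n} MSE_i$.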
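
As a rough sketch of LOOCV, the loop below holds out each observation in turn, refits on the remaining $n - 1$, and averages the squared errors. The simple linear model and the helper names `fit_line` and `loocv_mse` are assumptions chosen for illustration, and the three data points are made up so the arithmetic can be checked by hand.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def loocv_mse(xs, ys):
    """Average squared error over n fits, each leaving one observation out."""
    n = len(xs)
    squared_errors = []
    for i in range(n):
        # Training set: every observation except the i-th.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        b0, b1 = fit_line(train_x, train_y)
        # Squared error of the prediction for the held-out observation.
        squared_errors.append((ys[i] - (b0 + b1 * xs[i])) ** 2)
    return sum(squared_errors) / n

# Tiny made-up data set: the three held-out squared errors are 4, 1 and 4.
xs = [0, 1, 2]
ys = [0, 1, 0]
cv_n = loocv_mse(xs, ys)
print(cv_n)  # 3.0
```

Unlike the validation set approach, there is no randomness here: every observation is held out exactly once, so repeated runs give the same estimate.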

### k-Fold Cross-Validation

An alternative to LOOCV is k-fold CV. This approach involves randomly
dividing the set of observations into $k$ groups, or folds, of approximately
equal size. The first fold is treated as a validation set, and the method
is fit on the remaining $k - 1$ folds. The mean squared error, $MSE_1$, is
then computed on the observations in the held-out fold. This procedure is
repeated $k$ times; each time, a different group of observations is treated
as a validation set. This process results in $k$ estimates of the test error,
$MSE_1, MSE_2, \dots, MSE_k$. The k-fold CV estimate is computed by averaging
these values: $CV_{(k)} = \frac{1}{k} \sum_{i=1}^{k} MSE_i$.
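
The fold-by-fold procedure can be sketched as follows. This is an illustrative sketch under stated assumptions: the simple linear model, the helper names `fit_line` and `k_fold_mse`, the default `k=5`, and the synthetic data are not from the text above.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def k_fold_mse(xs, ys, k=5, seed=0):
    """Return (average of the per-fold MSEs, list of the k per-fold MSEs)."""
    n = len(xs)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of observations")
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    # Deal the shuffled indices into k folds of approximately equal size.
    folds = [indices[j::k] for j in range(k)]
    fold_mses = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        # Fit on the other k - 1 folds, then score on the held-out fold.
        b0, b1 = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (b0 + b1 * xs[i])) ** 2 for i in fold]
        fold_mses.append(sum(errors) / len(errors))
    return sum(fold_mses) / k, fold_mses

# Synthetic data, made up for illustration: y = 2 + 3x + Gaussian noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 10) for _ in range(100)]
ys = [2 + 3 * x + data_rng.gauss(0, 1) for x in xs]

cv_k, per_fold = k_fold_mse(xs, ys, k=5)
print(f"5-fold CV estimate: {cv_k:.3f}")
```

Setting `k` equal to the number of observations puts one observation in each fold, so the same function then performs LOOCV.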