In [1]:
from IPython.display import HTML
HTML("""<style>

.CodeMirror pre {
font-size: 10pt;
}

div.code_cell {
font-size: 10pt;
}

div.cell { /* Tunes the space between cells */
margin-top:0em;
margin-bottom:0em;
}

div.text_cell_render h1 { /* Main titles bigger, left-aligned */
font-size:1.7em;
line-height:1em;
text-align:left;
margin-top:-0.2em;
margin-bottom:-0.2em;
}

div.text_cell_render h2 { /*  Parts names nearer from text */
font-size:1.4em;
margin-top:-0.2em;
margin-bottom:-0.2em;
}

div.text_cell_render h3 { /*  Parts names nearer from text */
font-size:1.1em;
margin-top:-0.2em;
margin-bottom:-0.2em;
}

div.text_cell_render { /* Customize text cells */
font-size:9.5pt;
line-height:130%;
margin-top:-0.2em;
margin-bottom:-0.2em;
}
</style>""")

**Model assessment**: the process of evaluating a model's performance. <br>
**Model selection**: the process of selecting the proper level of flexibility for a model.

# Cross-validation

## The validation set approach

<img src="images/5.2.PNG" style="float:right;margin-left:20px" width="270px">

Suppose that we would like to estimate the test error associated with fitting a particular statistical learning method on a set of observations.

**Validation set approach**: The observations are randomly divided into a training set and a validation (hold-out) set. The model is fit on the training set, and the fitted model is used to predict the responses for the observations in the validation set. The resulting validation set error rate provides an estimate of the test error rate.

Potential drawbacks:
* The validation estimate of the test error rate can be highly variable, depending on precisely which observations are included in the training set and which observations are included in the validation set.
* Only a subset of the observations is used to fit the model. Since statistical methods tend to perform worse when trained on fewer observations, the validation set error rate may tend to overestimate the test error rate for the model fit on the entire data set.
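The first drawback can be seen in a small simulation. The following is a minimal sketch, not a definitive implementation: the data are simulated, and the helper name `validation_mse`, the 50/50 split, and the linear model are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated data: y depends linearly on x plus noise.
n = 200
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

def validation_mse(x, y, rng, degree=1):
    """Fit a polynomial on a random half of the data; return the MSE on the other half."""
    idx = rng.permutation(len(x))
    half = len(x) // 2
    train, val = idx[:half], idx[half:]
    coefs = np.polyfit(x[train], y[train], deg=degree)
    pred = np.polyval(coefs, x[val])
    return np.mean((y[val] - pred) ** 2)

# Repeating the random split shows how much the estimate moves with the partition.
estimates = np.array([validation_mse(x, y, rng) for _ in range(10)])
print(estimates.round(3))
```

Each entry of `estimates` comes from a different random split of the same data, so the spread of the printed values is due to the partition alone.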

## Leave-one-out cross-validation (LOOCV)

<img src="images/5.4.PNG" style="float:right;margin-left:20px" width="270px">

**LOOCV** involves:
1. Splitting the set of observations into two parts: a single observation $(x_1,y_1)$ is used for the validation set, and the remaining observations $\{(x_2,y_2),...,(x_n,y_n)\}$ make up the training set.
2. Fitting the model on the $n-1$ training observations, predicting $\hat{y}_1$ for the held-out observation, and calculating $\text{MSE}_1 = (y_1 - \hat{y}_1)^2$.
3. Repeating the previous 2 steps $n-1$ more times, each time using a different observation as the validation set.

$\Rightarrow$ This produces $n$ squared errors, $\text{MSE}_1,...,\text{MSE}_n$.

The LOOCV estimate of the test MSE is the average of these $n$ squared errors:
$\text{CV}_{(n)} = \dfrac{1}{n} \sum\limits_{i=1}^n{\text{MSE}_i}$

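Advantages: less bias than the validation set approach, no randomness in the training/validation splits, and it can be used with any kind of predictive modeling. <br>
Disadvantage: expensive to compute, since the model has to be fit $n$ times.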
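The LOOCV steps above can be sketched directly as a loop. This is a minimal illustration under stated assumptions: the data are simulated, the model is a polynomial fit with `np.polyfit`, and `loocv_mse` is a hypothetical helper name.

```python
import numpy as np

def loocv_mse(x, y, degree=1):
    """Brute-force LOOCV: refit the model n times, each time holding out one observation."""
    n = len(x)
    sq_errors = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i                      # training set: all but observation i
        coefs = np.polyfit(x[mask], y[mask], deg=degree)
        y_hat_i = np.polyval(coefs, x[i])             # prediction for the held-out point
        sq_errors[i] = (y[i] - y_hat_i) ** 2          # MSE_i
    return sq_errors.mean()                           # CV_(n)

# Hypothetical simulated data.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=50)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=50)

cv_n = loocv_mse(x, y)
print(cv_n)
```

Calling `loocv_mse(x, y)` again returns the same value, because the splits are fixed by the data rather than drawn at random. The loop also makes the cost visible: one model fit per observation.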

A shortcut makes the cost of LOOCV the same as that of a single model fit. It applies only to linear (or polynomial) regression models fit by least squares:

$$\text{CV}_{(n)} = \dfrac{1}{n} \sum\limits_{i=1}^n{\left(\dfrac{y_i-\hat{y}_i}{1 - h_i}\right)^2}$$

where:
* $\hat{y}_i$ is the $i$th fitted value from the original least squares fit;
* $h_i$ is the leverage of the $i$th observation, which reflects the amount that an observation influences its own fit. For simple linear regression, $h_i = \dfrac{1}{n} + \dfrac{(x_i - \bar{x})^2}{\sum_{i'=1}^n{(x_{i'}-\bar{x})^2}}$.
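As a sanity check, the shortcut can be compared numerically with brute-force LOOCV. The sketch below assumes simple linear regression (one predictor) and simulated data; all variable names are illustrative.

```python
import numpy as np

# Hypothetical simulated data.
rng = np.random.default_rng(2)
n = 40
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)

# One least squares fit on all n observations.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = intercept + slope * x

# Leverage of each observation for simple linear regression.
h = 1.0 / n + (x - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)

# Shortcut formula: inflate each residual by 1 / (1 - h_i).
cv_shortcut = np.mean(((y - y_hat) / (1.0 - h)) ** 2)

# Brute-force LOOCV for comparison: n separate fits.
sq_errors = []
for i in range(n):
    mask = np.arange(n) != i
    b1, b0 = np.polyfit(x[mask], y[mask], deg=1)
    sq_errors.append((y[i] - (b0 + b1 * x[i])) ** 2)
cv_brute = np.mean(sq_errors)

print(cv_shortcut, cv_brute)
```

The two values agree up to floating-point error, while the shortcut needs only a single fit.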

## $k$-fold cross validation

**$k$-fold** involves:
1. Randomly dividing the set of observations into $k$ groups, or folds, of approximately equal size.
2. Using 1 fold as the validation set and the remaining $k-1$ folds as the training set, then computing the MSE on the held-out fold.
3. Repeating the previous step $k-1$ more times, each time using a different fold as the validation set.

$\Rightarrow$ This produces $k$ estimates of the test error, $\text{MSE}_1,...,\text{MSE}_k$.

The $k$-fold CV estimate of the test MSE is the average of these values:
$\text{CV}_{(k)} = \dfrac{1}{k} \sum\limits_{i=1}^k{\text{MSE}_i}$

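Advantage: less computation time than LOOCV, since the model is fit only $k$ times instead of $n$ times (LOOCV is the special case $k=n$).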
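The $k$-fold procedure can be sketched in a few lines. This is an illustrative implementation under assumptions: simulated data, a polynomial model, and a hypothetical helper `kfold_mse` that uses `np.array_split` to form folds of approximately equal size.

```python
import numpy as np

def kfold_mse(x, y, k=10, degree=1, seed=0):
    """Return (CV_(k), per-fold MSEs) for a polynomial fit of the given degree."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))          # random assignment of observations to folds
    folds = np.array_split(idx, k)         # k folds of approximately equal size
    fold_mse = []
    for j in range(k):
        val = folds[j]
        train = np.concatenate([folds[m] for m in range(k) if m != j])
        coefs = np.polyfit(x[train], y[train], deg=degree)
        pred = np.polyval(coefs, x[val])
        fold_mse.append(np.mean((y[val] - pred) ** 2))   # MSE_j on the held-out fold
    fold_mse = np.array(fold_mse)
    return fold_mse.mean(), fold_mse

# Hypothetical simulated data.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=100)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=100)

cv_k, per_fold = kfold_mse(x, y, k=5)
print(cv_k, per_fold.round(3))
```

Unlike LOOCV, the result depends on the random fold assignment, which is why the sketch takes a `seed` argument.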

Comparing the LOOCV and 10-fold CV estimates of the test MSE, as a function of model flexibility, on 3 simulated data sets:

<img src="images/5.6.PNG" width="700px">

Sometimes we are interested only in the location of the minimum point of the estimated test MSE curve, rather than in its value, because this location indicates the approximate level of flexibility at which the statistical learning method performs best.
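Locating that minimum amounts to computing the CV estimate over a grid of flexibility levels and taking the argmin. The sketch below uses polynomial degree as the flexibility level; the simulated quadratic data, the degree grid, and the helper `kfold_cv_mse` are assumptions for illustration.

```python
import numpy as np

def kfold_cv_mse(x, y, degree, k=10, seed=0):
    """k-fold CV estimate of the test MSE for a polynomial fit of the given degree."""
    idx = np.random.default_rng(seed).permutation(len(x))
    folds = np.array_split(idx, k)
    mses = []
    for j, val in enumerate(folds):
        train = np.concatenate([f for m, f in enumerate(folds) if m != j])
        coefs = np.polyfit(x[train], y[train], deg=degree)
        mses.append(np.mean((y[val] - np.polyval(coefs, x[val])) ** 2))
    return np.mean(mses)

# Hypothetical simulated data with a quadratic true relationship.
rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, size=200)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.5, size=200)

# Estimated test MSE curve over flexibility levels (polynomial degrees 1 to 6).
degrees = np.arange(1, 7)
cv_curve = np.array([kfold_cv_mse(x, y, d) for d in degrees])
best_degree = degrees[np.argmin(cv_curve)]

print(dict(zip(degrees.tolist(), cv_curve.round(3).tolist())), best_degree)
```

The same fold assignment (fixed `seed`) is reused for every degree, so differences along the curve come from the model rather than from the split.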

### Bias-variance trade-off for $k$-fold cross validation

LOOCV:
* gives approximately unbiased estimates of the test set error, since each training set contains $n-1$ observations, which is almost as many as the number of observations in the full data set.
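* averages the outputs of $n$ fitted models trained on almost identical sets of observations, so these outputs are highly correlated with each other. Since the mean of highly correlated quantities has higher variance than the mean of less correlated ones, the LOOCV estimate tends to have higher variance.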

$k$-fold CV:
* has an intermediate level of bias, since each training set contains $(k-1)n/k$ observations.
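* averages the outputs of $k$ fitted models whose training sets overlap less, so the outputs are less correlated and the resulting estimate tends to have lower variance than the LOOCV estimate.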

$\Rightarrow$ Given these considerations, one typically performs $k$-fold cross-validation using $k=5$ or $k=10$, as these values have been shown empirically to yield test error rate estimates that suffer neither from excessively high bias nor from very high variance.

## Cross-validation on classification problems

CV can also be used for classification. Instead of using the MSE to quantify test error, we use the number of misclassified observations.

LOOCV: $\text{CV}_{(n)} = \dfrac{1}{n} \sum\limits_{i=1}^n{\text{Err}_i} \hspace{1cm}$
where $\text{Err}_i = I(y_i \neq \hat{y}_i)$.

The $k$-fold CV error rate and validation set error rates are defined analogously.
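For a concrete instance of the LOOCV error rate, the sketch below uses a nearest-centroid classifier on simulated two-class data. The classifier, the data, and the helper names `nearest_centroid_predict` and `loocv_error_rate` are assumptions chosen to keep the example dependency-free; any classifier could be substituted.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, X_new):
    """Assign each new point to the class whose training centroid is closest."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    # Distance from every new point to every centroid: shape (n_new, n_classes).
    d = np.linalg.norm(X_new[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

def loocv_error_rate(X, y):
    """CV_(n) = (1/n) * sum of Err_i, where Err_i = I(y_i != y_hat_i)."""
    n = len(y)
    err = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        y_hat_i = nearest_centroid_predict(X[mask], y[mask], X[i:i + 1])[0]
        err[i] = float(y_hat_i != y[i])
    return err.mean()

# Hypothetical simulated data: two Gaussian classes in two dimensions.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
y = np.repeat([0, 1], 30)

cv_err = loocv_error_rate(X, y)
print(cv_err)
```

The result is the fraction of held-out observations that were misclassified, so it always lies between 0 and 1.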

# The bootstrap

Suppose that we wish to invest a fixed sum of money in 2 financial assets that yield returns of $X$ and $Y$ where both are random quantities. We will invest a fraction $\alpha$ of our money in $X$ and the remaining $1-\alpha$ will go to $Y$.

We want to minimize $\text{Var}(\alpha X + (1-\alpha)Y)$.

The value that minimizes the risk is given by:
$\alpha = \dfrac{\sigma_Y^2 - \sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}}$

&emsp;where $\sigma_X^2 = \text{Var}(X)$, $\sigma_Y^2 = \text{Var}(Y)$, and $\sigma_{XY} = \text{Cov}(X,Y)$.
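A short derivation of this value, using only the definitions above: expand the variance as a function of $\alpha$ and set its derivative to zero.

$$
\begin{aligned}
\text{Var}(\alpha X + (1-\alpha)Y) &= \alpha^2\sigma_X^2 + (1-\alpha)^2\sigma_Y^2 + 2\alpha(1-\alpha)\sigma_{XY} \\
\dfrac{d}{d\alpha}\text{Var}(\alpha X + (1-\alpha)Y) &= 2\alpha\sigma_X^2 - 2(1-\alpha)\sigma_Y^2 + 2(1-2\alpha)\sigma_{XY} = 0 \\
\alpha\left(\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}\right) &= \sigma_Y^2 - \sigma_{XY}
\end{aligned}
$$

The second derivative is $2(\sigma_X^2 + \sigma_Y^2 - 2\sigma_{XY}) = 2\,\text{Var}(X - Y) \geq 0$, so this stationary point is a minimum whenever $\text{Var}(X-Y) > 0$.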

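In practice, $\sigma_X^2$, $\sigma_Y^2$, and $\sigma_{XY}$ are unknown, so we compute estimates $\hat{\sigma}_X^2$, $\hat{\sigma}_Y^2$, and $\hat{\sigma}_{XY}$ of these quantities from a data set of past observations of $X$ and $Y$, and plug them into the formula above to obtain an estimate $\hat{\alpha}$.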
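The plug-in estimate can be sketched as follows. The simulated returns, the covariance values used to generate them (chosen here so that the true $\alpha$ is 0.6), and the helper name `alpha_hat` are assumptions for illustration only.

```python
import numpy as np

def alpha_hat(x, y):
    """Plug-in estimate of alpha from sample variances and the sample covariance."""
    cov = np.cov(x, y)                     # 2x2 sample covariance matrix
    var_x, var_y, cov_xy = cov[0, 0], cov[1, 1], cov[0, 1]
    return (var_y - cov_xy) / (var_x + var_y - 2.0 * cov_xy)

# Hypothetical simulated returns; with these assumed population values,
# alpha = (1.25 - 0.5) / (1 + 1.25 - 2 * 0.5) = 0.6.
rng = np.random.default_rng(5)
true_cov = np.array([[1.0, 0.5],
                     [0.5, 1.25]])
returns = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=100)
x, y = returns[:, 0], returns[:, 1]

a_hat = alpha_hat(x, y)
print(a_hat)
```

The estimate differs from the true value because it is computed from a finite sample; a different sample of 100 returns would give a different $\hat{\alpha}$.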