## $\S$ 7.3.1. Example: Bias-Variance Tradeoff

### Example specification

FIGURE 7.3 shows the bias-variance tradeoff for two simulated examples. There are 80 observations and 20 predictors, uniformly distributed in the hypercube $[0,1]^{20}$. The situations are as follows:

* _Left panels_: $Y=\begin{cases}0 &\text{ if } X_1 \le 0.5 \\ 1 &\text{ otherwise,}\end{cases}$ and we apply kNN.
* _Right panels_: $Y=\begin{cases}1 &\text{ if } \sum_{j=1}^{10} X_j \gt 5 \\ 0 &\text{ otherwise,}\end{cases}$ and we use best subset linear regression of size $p$.

* The top row is regression with squared error loss;
* the bottom row is classification with 0-1 loss.

The figures show
* the prediction error (red),
* squared bias (green), and
* variance (blue),

all computed for a large test sample.

In the regression problem, bias and variance add to produce the prediction error curve, with minima at about $k=5$ for $k$NN, and $p\ge 10$ for the linear model.
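
The additivity can be written out explicitly. The following is the standard pointwise decomposition under squared error loss, stated here as a reminder; the symbols $f$, $\hat f$, and $\sigma_\varepsilon^2$ are notation assumed for this note rather than taken from the cells below.

$$
\mathrm{Err}(x_0) = \sigma_\varepsilon^2 + \left[\mathrm{E}\hat f(x_0) - f(x_0)\right]^2 + \mathrm{E}\left[\hat f(x_0) - \mathrm{E}\hat f(x_0)\right]^2
$$

The three terms are irreducible error, squared bias, and variance. In both simulated examples $Y$ is a deterministic function of $X$, so $\sigma_\varepsilon^2 = 0$ and the prediction error is exactly squared bias plus variance.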

In [1]:
"""FIGURE 7.3. simulation for bias-variance tradeoff"""
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

In [12]:
# Training set: 80 observations, 20 predictors, uniform on [0, 1]^20
size_training = 80
size_predictor = 20
train_x = np.random.rand(size_training, size_predictor)

# Left panels: Y = 0 if X_1 <= 0.5, else 1
train_y1 = np.where(train_x[:, 0] <= .5, 0, 1)
# Right panels: Y = 1 if X_1 + ... + X_10 > 5, else 0
train_y2 = np.where(train_x[:, :10].sum(axis=1) > 5, 1, 0)
# print(train_x[:, 0], train_y1)
# print(train_x[:, :10].sum(axis=1), train_y2)

In [13]:
# kNN simulation
# kNN regression
# kNN classification

In [14]:
# Linear model simulation
# Linear regression
# Linear classification
print('Under construction ...')

Under construction ...
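
Since the simulation cells are still stubs, here is a minimal sketch of how the left-panel $k$NN curves might be estimated. It is not the notebook's finished implementation. The function name `simulate_knn`, the test-set size, the number of replications, the seed, and the tie-breaking rule (a fitted value of exactly 0.5 is classified as 0) are all assumptions made for illustration. Squared bias and variance are estimated by refitting on many independent training sets and evaluating at a fixed test sample.

```python
import numpy as np


def simulate_knn(ks=(1, 5, 10, 20), n_train=80, n_pred=20,
                 n_test=200, n_rep=100, seed=0):
    """Monte Carlo estimate of bias, variance and error for kNN (left-panel setup)."""
    rng = np.random.default_rng(seed)
    # Fixed test sample; the target is a deterministic function of X_1.
    x_test = rng.random((n_test, n_pred))
    f_test = (x_test[:, 0] > 0.5).astype(float)

    # preds[r, i, j]: fit from training set r, with k = ks[i], at test point j
    preds = np.empty((n_rep, len(ks), n_test))
    for r in range(n_rep):
        x_train = rng.random((n_train, n_pred))
        y_train = (x_train[:, 0] > 0.5).astype(float)
        # Squared Euclidean distances, shape (n_test, n_train)
        d2 = ((x_test[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
        # Training responses ordered from nearest to farthest neighbor
        y_sorted = y_train[np.argsort(d2, axis=1)]
        for i, k in enumerate(ks):
            preds[r, i] = y_sorted[:, :k].mean(axis=1)

    mean_pred = preds.mean(axis=0)                      # (len(ks), n_test)
    sq_bias = ((mean_pred - f_test) ** 2).mean(axis=1)
    variance = preds.var(axis=0).mean(axis=1)
    sq_error = ((preds - f_test) ** 2).mean(axis=(0, 2))
    # Classify as 1 when the kNN average exceeds 0.5 (ties go to class 0)
    zero_one_error = ((preds > 0.5) != (f_test > 0.5)).mean(axis=(0, 2))
    return {"k": np.array(ks), "sq_bias": sq_bias, "variance": variance,
            "sq_error": sq_error, "zero_one_error": zero_one_error}


result = simulate_knn()
for i, k in enumerate(result["k"]):
    print(f"k={k:2d}  bias^2={result['sq_bias'][i]:.3f}  "
          f"var={result['variance'][i]:.3f}  "
          f"sq.err={result['sq_error'][i]:.3f}  "
          f"0-1 err={result['zero_one_error'][i]:.3f}")
```

Because the target is noise-free, the squared-error column equals squared bias plus variance up to floating-point rounding, while the 0-1 column does not decompose this way. The same arrays could be passed to `plt.plot` to draw curves in the style of Figure 7.3.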


### Questions

For classification loss (bottom figures), some interesting phenomena can be seen. The bias and variance curves are the same as in the top figures, and prediction error now refers to misclassification rate. We see that prediction error is no longer the sum of squared bias and variance.

1. For the $k$NN classifier, prediction error decreases or stays the same as the number of neighbors is increased to 20, despite the fact that the squared bias is rising.
2. For the linear model classifier the minimum occurs for $p\ge 10$ as in regression, but the improvement over the $p=1$ model is more dramatic.

We see that bias and variance seem to interact in determining prediction error.

### Answer for Question 1

Why does this happen? There is a simple explanation for the first phenomenon.

Suppose at a given input point, the true probability of class 1 is $0.9$ while the expected value of our estimate is $0.6$. Then the squared bias -- $(0.6 - 0.9)^2$ -- is considerable, but the prediction error is zero since we make the correct decision.

In other words, estimation errors that leave us on the right side of the decision boundary don't hurt. Exercise 7.2 demonstrates this phenomenon analytically, and also shows the interaction effect between bias and variance.
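
The numbers in this example can be checked with a short sketch. It assumes, purely as an approximation in the spirit of Exercise 7.2, that the estimate at the input point is Gaussian with mean 0.6 and some standard deviation; the helper names `normal_cdf` and `prob_wrong_side` and the standard deviations tried are hypothetical. The excess error over the best possible rule is taken to be $|2f - 1|$ times the probability that the estimate lands on the wrong side of $1/2$.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_wrong_side(f_true, mean_est, sd_est):
    """P(estimate falls on the opposite side of 1/2 from f_true).

    Assumes the estimate is Gaussian with the given mean and sd.
    """
    p_below_half = normal_cdf((0.5 - mean_est) / sd_est)
    return p_below_half if f_true > 0.5 else 1.0 - p_below_half


f_true, mean_est = 0.9, 0.6
squared_bias = (mean_est - f_true) ** 2
print(f"squared bias = {squared_bias:.2f}")

for sd in (0.01, 0.05, 0.1, 0.2):
    p_wrong = prob_wrong_side(f_true, mean_est, sd)
    excess_error = abs(2 * f_true - 1) * p_wrong
    print(f"sd={sd:.2f}  P(wrong side)={p_wrong:.4f}  "
          f"excess 0-1 error={excess_error:.4f}")
```

With a small standard deviation the estimate almost never crosses $1/2$, so the sizeable squared bias costs nothing under 0-1 loss. As the variance grows, the same bias starts to matter, which is the interaction between bias and variance noted above.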

### Summary

The overall point is that the bias-variance tradeoff behaves differently for 0-1 loss than it does for squared error loss. This in turn means that the best choices of tuning parameters may differ substantially in the two settings. One should base the choice of tuning parameter on an estimate of prediction error, as described in the following sections.