:nosearch:

# Selected Solutions

## [Predictive Modeling](predictive)

```{solution-start} CV-selection
:class: dropdown
```

Only Lisa uses cross-validation correctly. She tunes models and hyperparameters using 5-fold CV on the training data, picks the winner by the cross-validated score, then evaluates that single, refit model once on the untouched test set to obtain an out-of-sample estimate. This approach keeps the test set completely separate from model selection, so the final performance estimate is not biased by the selection process.
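Lisa's workflow can be sketched with NumPy alone. This is a minimal sketch under assumptions of our own, not part of the exercise. The data are simulated. The candidate "models" are ridge regressions with different penalties `lam`; choosing among model families works the same way. The helper names `fit_ridge`, `mse`, and `cv_mse` are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam I) w = X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def mse(X, y, w):
    """Mean squared prediction error of weights w on (X, y)."""
    return float(np.mean((y - X @ w) ** 2))

def cv_mse(X, y, lam, k=5, seed=0):
    """k-fold CV error on the training data; a fixed seed gives every candidate the same folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for i, val in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        errors.append(mse(X[val], y[val], fit_ridge(X[train], y[train], lam)))
    return float(np.mean(errors))

# Simulated data (hypothetical): 10 features, 3 of them relevant, unit noise.
rng = np.random.default_rng(42)
n, p = 200, 10
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + rng.standard_normal(n)

# 1. Hold out a test set once, before any modeling.
perm = rng.permutation(n)
test_idx, train_idx = perm[:50], perm[50:]
X_tr, y_tr = X[train_idx], y[train_idx]
X_te, y_te = X[test_idx], y[test_idx]

# 2. Select the penalty by 5-fold CV on the training data only.
candidates = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_errors = {lam: cv_mse(X_tr, y_tr, lam) for lam in candidates}
best_lam = min(cv_errors, key=cv_errors.get)

# 3. Refit the winner on all training data and evaluate it once on the test set.
w_final = fit_ridge(X_tr, y_tr, best_lam)
test_mse = mse(X_te, y_te, w_final)
print(f"chosen lam = {best_lam}, test MSE = {test_mse:.3f}")
```

The test set appears only in step 3, after the choice of `best_lam` is final, so `test_mse` is not influenced by the selection.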

Bart misuses the test set: after tuning each model via CV, he picks the winning model by its test-set score. We already know not to tune a specific model on the test set, and choosing between models (e.g., linear regression vs. a neural network) on the basis of the test set is the same mistake. It leaks information from the test set into the selection process and produces an optimistically biased estimate, because the chosen model may score well on the test set partly by chance. Once the test set has been used for selection, it is no longer a valid measure of generalization performance.
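A stylized simulation (our own illustration, not part of the exercise) shows why selecting on the test set is optimistic. Fifty hypothetical "models" guess labels at random, so each has a true accuracy of exactly 0.5. They are all scored on the same 100-example test set, and the best one is reported.

```python
import numpy as np

rng = np.random.default_rng(0)
n_test, n_models = 100, 50

# Binary test labels and predictions from 50 "models" that ignore the inputs,
# so every model's true accuracy is exactly 0.5.
y_test = rng.integers(0, 2, size=n_test)
preds = rng.integers(0, 2, size=(n_models, n_test))

test_acc = (preds == y_test).mean(axis=1)  # one test accuracy per model
best = int(np.argmax(test_acc))            # Bart-style selection on the test set

print(f"average test accuracy: {test_acc.mean():.3f}")
print(f"selected model's test accuracy: {test_acc[best]:.3f} (true accuracy: 0.500)")
```

The selected model's test score overstates its true accuracy only because it is the maximum of many noisy scores. Scoring it on fresh data would bring it back toward 0.5.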

If you do not want to hold out a single test split, a sound alternative is nested cross-validation. An inner CV loop tunes and selects the model, and an outer CV loop estimates the performance of that whole selection procedure on folds the inner loop never sees. Because the data used for evaluation never drive selection, this avoids the bias described above.
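Here is a minimal NumPy-only sketch of nested cross-validation, under assumptions of our own. The candidates are ridge penalties on simulated data, and `fit_ridge`, `kfold`, `select_lam`, and `nested_cv` are hypothetical helper names.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam I) w = X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def mse(X, y, w):
    return float(np.mean((y - X @ w) ** 2))

def kfold(n, k, rng):
    """Return a list of (train_idx, val_idx) pairs for k-fold CV."""
    folds = np.array_split(rng.permutation(n), k)
    return [(np.concatenate([f for j, f in enumerate(folds) if j != i]), val)
            for i, val in enumerate(folds)]

def select_lam(X, y, candidates, k, rng):
    """Inner loop: return the candidate penalty with the lowest k-fold CV MSE."""
    splits = kfold(len(y), k, rng)
    def cv_err(lam):
        return np.mean([mse(X[va], y[va], fit_ridge(X[tr], y[tr], lam))
                        for tr, va in splits])
    return min(candidates, key=cv_err)

def nested_cv(X, y, candidates, k_outer=5, k_inner=5, seed=0):
    """Outer loop: estimate the error of the whole tune-then-refit procedure."""
    rng = np.random.default_rng(seed)
    outer_errors, chosen = [], []
    for tr, te in kfold(len(y), k_outer, rng):
        lam = select_lam(X[tr], y[tr], candidates, k_inner, rng)  # tuning never sees te
        w = fit_ridge(X[tr], y[tr], lam)                          # refit on the outer-train part
        outer_errors.append(mse(X[te], y[te], w))                 # score once on the held-out fold
        chosen.append(lam)
    return float(np.mean(outer_errors)), chosen

# Usage on simulated data (hypothetical).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
y = 2.0 * X[:, 0] - X[:, 1] + rng.standard_normal(200)
estimate, chosen = nested_cv(X, y, [0.01, 0.1, 1.0, 10.0, 100.0])
print(f"nested-CV MSE estimate: {estimate:.3f}; penalties chosen per outer fold: {chosen}")
```

The number reported is an estimate for the selection procedure, not for any one penalty, and different outer folds may pick different penalties. To obtain a final model, one would rerun the inner selection on all the data.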

```{solution-end}
```

```{solution-start} james-stein
:class: dropdown
```
This exercise illustrates the idea behind the James-Stein estimator: shrinking $Y$ toward zero can reduce the mean squared error.

$$\text{MSE}(\alpha) = \mathbb{E}[\|\theta - (1-\alpha)Y\|^2]$$

Since $Y \sim N(\theta, I_3)$, we can write $Y = \theta + Z$ where $Z \sim N(0, I_3)$. Substituting:

$$\text{MSE}(\alpha) = \mathbb{E}[\|\theta - (1-\alpha)(\theta + Z)\|^2]$$
$$= \mathbb{E}[\|\theta - (1-\alpha)\theta - (1-\alpha)Z\|^2]$$
$$= \mathbb{E}[\|\alpha\theta - (1-\alpha)Z\|^2]$$

Expanding the squared norm:
$$\text{MSE}(\alpha) = \mathbb{E}[\|\alpha\theta\|^2 - 2\langle\alpha\theta, (1-\alpha)Z\rangle + \|(1-\alpha)Z\|^2]$$

Because $\theta$ is fixed, the expectation can be taken term by term, and the cross term vanishes because $\mathbb{E}[Z] = 0$:

$$\text{MSE}(\alpha) = \alpha^2\|\theta\|^2 - 2\alpha(1-\alpha)\theta^T\mathbb{E}[Z] + (1-\alpha)^2\mathbb{E}[\|Z\|^2]$$
$$= \alpha^2\|\theta\|^2 + (1-\alpha)^2\mathbb{E}[\|Z\|^2]$$

For $Z \sim N(0, I_3)$, we have $\mathbb{E}[\|Z\|^2] = \mathbb{E}[Z_1^2 + Z_2^2 + Z_3^2] = 1 + 1 + 1 = 3$.

Therefore $\text{MSE}(\alpha) = \alpha^2\|\theta\|^2 + 3(1-\alpha)^2$.
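A quick Monte Carlo check of this closed form compares it with the average squared error of $(1-\alpha)Y$ over simulated draws $Y \sim N(\theta, I_3)$. The value of $\theta$, the grid of $\alpha$ values, and the number of draws are arbitrary choices for illustration.

```python
import numpy as np

def mse_formula(alpha, theta):
    """Closed form derived above: alpha^2 ||theta||^2 + 3 (1 - alpha)^2."""
    return alpha**2 * float(theta @ theta) + 3 * (1 - alpha) ** 2

def mse_monte_carlo(alpha, theta, n_draws=200_000, seed=0):
    """Average of ||theta - (1 - alpha) Y||^2 over draws Y ~ N(theta, I_3)."""
    rng = np.random.default_rng(seed)
    Y = theta + rng.standard_normal((n_draws, 3))
    sq_errors = np.sum((theta - (1 - alpha) * Y) ** 2, axis=1)
    return float(sq_errors.mean())

theta = np.array([1.0, 2.0, 2.0])  # arbitrary choice with ||theta||^2 = 9
for alpha in [0.0, 0.25, 0.5, 1.0]:
    print(f"alpha={alpha:.2f}  formula={mse_formula(alpha, theta):.3f}  "
          f"simulated={mse_monte_carlo(alpha, theta):.3f}")
```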

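To find the optimal $\alpha$, set the derivative to zero:
$$\frac{d}{d\alpha}\text{MSE}(\alpha) = 2\alpha\|\theta\|^2 - 6(1-\alpha) = 0 \quad\Longrightarrow\quad \alpha^* = \frac{3}{\|\theta\|^2 + 3}.$$
Since the second derivative $2\|\theta\|^2 + 6$ is positive, this is the minimizer, and $0 < \alpha^* < 1$ whenever $\theta \neq 0$ (for $\theta = 0$, $\alpha^* = 1$). The MSE-optimal estimator is: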
$$\theta_{\alpha^*} = (1 - \alpha^*)Y = \left(1 - \frac{3}{\|\theta\|^2 + 3}\right)Y = \frac{\|\theta\|^2}{\|\theta\|^2 + 3}Y$$
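The closed-form minimizer can be checked against a brute-force grid search over $\alpha \in [0, 1]$. Substituting $\alpha^*$ back gives the minimal risk $\text{MSE}(\alpha^*) = \frac{3\|\theta\|^2}{\|\theta\|^2 + 3}$. This is below both $\text{MSE}(0) = 3$ and $\text{MSE}(1) = \|\theta\|^2$ whenever $\theta \neq 0$. The sketch checks both facts for a few arbitrary values of $\|\theta\|^2$.

```python
import numpy as np

def mse(alpha, norm_sq):
    """MSE(alpha) = alpha^2 ||theta||^2 + 3 (1 - alpha)^2, from the derivation above."""
    return alpha**2 * norm_sq + 3 * (1 - alpha) ** 2

def alpha_star(norm_sq):
    """Closed-form minimizer alpha* = 3 / (||theta||^2 + 3)."""
    return 3 / (norm_sq + 3)

alphas = np.linspace(0.0, 1.0, 100_001)  # grid with spacing 1e-5
results = {}
for norm_sq in [0.5, 3.0, 9.0, 100.0]:
    a_grid = float(alphas[np.argmin(mse(alphas, norm_sq))])
    a_closed = alpha_star(norm_sq)
    results[norm_sq] = (a_grid, a_closed, mse(a_closed, norm_sq))
    print(f"||theta||^2={norm_sq:6.1f}  grid alpha={a_grid:.4f}  "
          f"closed-form alpha*={a_closed:.4f}  min MSE={mse(a_closed, norm_sq):.4f}")
```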

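The MSE-optimal estimator is biased whenever $\theta \neq 0$: its bias is $\mathbb{E}[\theta_{\alpha^*}] - \theta = (1-\alpha^*)\theta - \theta = -\alpha^*\theta$. It also cannot be computed from data alone, because $\alpha^*$ depends on the unknown $\|\theta\|^2$. The James-Stein estimator replaces this oracle shrinkage factor with one computed from $Y$.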

```{solution-end}
```