# Chapter 8 Notes

## 8.2 The Bootstrap and Maximum Likelihood Methods

### 8.2.1 A Smoothing Example

Suppose that we have one-dimensional data $\mathbb{Z}=\lbrace z_1, \ldots, z_N\rbrace$ where $z_i=(x_i, y_i)$.
We might fit a curve to this using cubic splines with knots at quantiles of the $x$-values.
Since this is just a linear model we can generate (pointwise) confidence bands around our estimate (see §3.2 for details).

We can also generate confidence bands using bootstrapping.
We consider two different versions.

**Non-parametric bootstrap:**
Draw bootstrap samples with replacement from the $z_i$.
For each bootstrap sample, fit a cubic spline to the resampled data.
Generate a confidence interval at each $x$ by taking percentiles of bootstrap splines evaluated at the point.

*Note:* There is a decision here whether to use the original knots or generate new knots using the bootstrapped $x$-values.
This choice depends on whether we want to capture the variability from the position of the knots as well as the noise in the targets.
The flexibility of the method means that we could even use more complex methods for choosing the number and position of the knots.
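
The non-parametric procedure above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the text's implementation: the truncated power basis, the three knots at the quartiles, the synthetic data, and the names `cubic_spline_basis`, `fit_spline` and `nonparametric_bootstrap_band` are all assumptions made for the example. The `resample_knots` flag switches between the two knot choices discussed in the note.

```python
import numpy as np

def cubic_spline_basis(x, knots):
    """Truncated power basis for a cubic spline: 1, x, x^2, x^3, (x - k)_+^3."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def fit_spline(x, y, knots):
    """Least squares coefficients for the spline basis."""
    coef, *_ = np.linalg.lstsq(cubic_spline_basis(x, knots), y, rcond=None)
    return coef

def nonparametric_bootstrap_band(x, y, grid, n_boot=500, level=0.95,
                                 resample_knots=False, seed=0):
    """Pointwise percentile band from resampling the pairs (x_i, y_i)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    fixed_knots = np.quantile(x, [0.25, 0.5, 0.75])  # assumed knot placement
    curves = np.empty((n_boot, len(grid)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw N indices with replacement
        xb, yb = x[idx], y[idx]
        # Either keep the original knots or recompute them from the resample.
        knots = np.quantile(xb, [0.25, 0.5, 0.75]) if resample_knots else fixed_knots
        curves[b] = cubic_spline_basis(grid, knots) @ fit_spline(xb, yb, knots)
    alpha = 1.0 - level
    lower, upper = np.quantile(curves, [alpha / 2, 1 - alpha / 2], axis=0)
    return lower, upper

# Synthetic data, purely for illustration.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 3, size=50))
y = np.sin(2 * x) + rng.normal(scale=0.3, size=50)
grid = np.linspace(x.min(), x.max(), 25)
lower, upper = nonparametric_bootstrap_band(x, y, grid)
```

Passing `resample_knots=True` lets the band also reflect variability in the knot positions, at the cost of a slightly less stable fit in each resample.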

**Parametric bootstrap:**
Fit cubic splines to the original training data and then simulate new responses by adding Gaussian noise to the predicted values.
That is, if $\mu(x)$ is our fitted spline function, sample $y_i^* \sim \mathcal{N}(\mu(x_i), \hat{\sigma}^2)$, where $\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^N (y_i - \mu(x_i))^2$ is the variance of the residuals (not the sample variance of the $y_i$ themselves).
We then construct our bootstrapped splines by fitting to the new $(x_i, y_i^*)$, keeping the $x_i$ fixed.
Again, we can construct pointwise confidence intervals by taking percentiles of the bootstrapped splines.

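In this case, as the number of bootstrap samples grows, the confidence bands agree with the least squares bands described at the start.
In general the parametric bootstrap agrees with maximum likelihood rather than least squares; the two coincide here because the model has additive Gaussian errors.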
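
A minimal sketch of the parametric bootstrap, under the same illustrative assumptions as before (truncated power basis, knots at the quartiles, synthetic data). The helper names `basis` and `parametric_bootstrap_band` are hypothetical, and the noise scale is estimated here from the residuals of the original fit.

```python
import numpy as np

def basis(x, knots):
    """Truncated power basis for a cubic spline: 1, x, x^2, x^3, (x - k)_+^3."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def parametric_bootstrap_band(x, y, grid, n_boot=500, level=0.95, seed=0):
    """Pointwise percentile band from simulating new responses around the fit."""
    rng = np.random.default_rng(seed)
    knots = np.quantile(x, [0.25, 0.5, 0.75])  # assumed knot placement
    H, H_grid = basis(x, knots), basis(grid, knots)
    coef, *_ = np.linalg.lstsq(H, y, rcond=None)
    mu_hat = H @ coef
    # Noise scale from the residuals of the original fit (divisor N).
    sigma_hat = np.sqrt(np.mean((y - mu_hat) ** 2))
    curves = np.empty((n_boot, len(grid)))
    for b in range(n_boot):
        # The x_i stay fixed; only the responses are simulated.
        y_star = mu_hat + rng.normal(scale=sigma_hat, size=len(x))
        coef_b, *_ = np.linalg.lstsq(H, y_star, rcond=None)
        curves[b] = H_grid @ coef_b
    alpha = 1.0 - level
    lower, upper = np.quantile(curves, [alpha / 2, 1 - alpha / 2], axis=0)
    return lower, upper, H_grid @ coef

# Synthetic data, purely for illustration.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 3, size=50))
y = np.sin(2 * x) + rng.normal(scale=0.3, size=50)
grid = np.linspace(x.min(), x.max(), 25)
lower, upper, fit = parametric_bootstrap_band(x, y, grid)
```

Because the simulated responses are centred on the original fit, the resulting band is centred on the fitted curve, which is what we would expect from the least squares band as well.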

### 8.2.2 Maximum Likelihood Inference

If we specify a parametric model for $Z$ via a probability density function $z \sim g_{\theta}(z)$ then we can calculate the maximum likelihood estimator $\hat{\theta}$ for the parameters $\theta$.
The sampling distribution of the MLE has a limiting Normal distribution:

\begin{equation}
    \hat{\theta} \longrightarrow \mathcal{N}\left( \theta_0, \mathbf{i}(\theta_0)^{-1}\right),
\end{equation}

where $\mathbf{i}(\theta)$ is the Fisher information and $\theta_0$ is the true parameter value.
Substituting $\hat{\theta}$ for the unknown $\theta_0$ in the Fisher information gives approximate confidence intervals for the parameters, which we can then propagate to pointwise confidence intervals for the fitted curve.
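
As a small worked example of this recipe, consider a model that is not from the text: i.i.d. exponential data with rate $\theta$. The MLE is $\hat{\theta} = 1/\bar{z}$ and the total Fisher information is $N/\theta^2$, so plugging in $\hat{\theta}$ gives a standard error of $\hat{\theta}/\sqrt{N}$. The function name `exponential_mle_interval` and the simulated data are assumptions made for illustration.

```python
import numpy as np

def exponential_mle_interval(z, z_crit=1.96):
    """MLE of an exponential rate with a plug-in Fisher information interval."""
    z = np.asarray(z, dtype=float)
    n = z.size
    rate_hat = 1.0 / z.mean()            # maximum likelihood estimate
    fisher_info = n / rate_hat**2        # total information, evaluated at the MLE
    se = np.sqrt(1.0 / fisher_info)      # equals rate_hat / sqrt(n)
    return rate_hat, (rate_hat - z_crit * se, rate_hat + z_crit * se)

# Simulated data with true rate 2.0, purely for illustration.
rng = np.random.default_rng(0)
z = rng.exponential(scale=1 / 2.0, size=400)
rate_hat, (lo, hi) = exponential_mle_interval(z)
print(f"MLE = {rate_hat:.3f}, approximate 95% interval = ({lo:.3f}, {hi:.3f})")
```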

## 8.3 Bayesian Methods

We can alternatively generate Bayesian credible intervals by putting priors on the spline coefficients (and variance).
The text shows that if the spline coefficients are uncorrelated under the prior, then the Bayesian intervals approach those of the parametric bootstrap and maximum likelihood as the prior variance increases; that is, as the prior tends towards a non-informative one.
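
The limiting behaviour can be checked numerically. The sketch below assumes a Gaussian prior $\beta \sim \mathcal{N}(0, \tau I)$ on the spline coefficients and treats the noise variance $\sigma^2$ as known; under those assumptions the posterior mean is $(H^\top H + \tfrac{\sigma^2}{\tau} I)^{-1} H^\top y$ with covariance $\sigma^2 (H^\top H + \tfrac{\sigma^2}{\tau} I)^{-1}$. The basis, the data, the value of `sigma2` and the helper names are illustrative choices.

```python
import numpy as np

def basis(x, knots):
    """Truncated power basis for a cubic spline: 1, x, x^2, x^3, (x - k)_+^3."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def posterior(H, y, sigma2, tau):
    """Posterior mean and covariance of beta for the prior beta ~ N(0, tau * I)."""
    A = H.T @ H + (sigma2 / tau) * np.eye(H.shape[1])
    mean = np.linalg.solve(A, H.T @ y)
    cov = sigma2 * np.linalg.inv(A)
    return mean, cov

# Synthetic data, purely for illustration.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 3, size=50))
y = np.sin(2 * x) + rng.normal(scale=0.3, size=50)
H = basis(x, np.quantile(x, [0.25, 0.5, 0.75]))

# Least squares fit, which is also the maximum likelihood fit here.
beta_ls, *_ = np.linalg.lstsq(H, y, rcond=None)

# Largest gap between the posterior mean curve and the least squares curve.
taus = [1.0, 1e3, 1e6]
gaps = []
for tau in taus:
    mean, _ = posterior(H, y, sigma2=0.09, tau=tau)
    gaps.append(np.max(np.abs(H @ mean - H @ beta_ls)))

for tau, gap in zip(taus, gaps):
    print(f"tau = {tau:g}: max gap to least squares fit = {gap:.2e}")
```

The gap shrinks as `tau` grows, since the term $\sigma^2/\tau$ that distinguishes the posterior mean from the least squares solution vanishes.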

## 8.4 Relationship Between the Bootstrap and Bayesian Inference

The text shows that for a Bernoulli sample with $\alpha$ successes and $\beta$ failures, the Bayesian posterior $\text{Beta}(\alpha, \beta)$ that results from an uninformative prior is almost the same as the bootstrap distribution of the estimator for the true success probability.
This generalises to a Multinomial distribution using the Dirichlet distribution instead of the Beta distribution.

In this sense the bootstrap distribution represents an approximate non-parametric, uninformative posterior distribution for our parameter.
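
A quick simulation makes the Bernoulli comparison concrete. The counts below are hypothetical. Resampling $N$ Bernoulli outcomes with replacement and taking the proportion of successes is equivalent to drawing a $\text{Binomial}(N, \hat{p})$ count and dividing by $N$, so the bootstrap distribution can be simulated directly.

```python
import numpy as np

rng = np.random.default_rng(0)

successes, failures = 30, 70   # hypothetical counts (alpha and beta in the text)
n = successes + failures
p_hat = successes / n          # estimate of the success probability

# Draws from the Beta(alpha, beta) posterior.
posterior_draws = rng.beta(successes, failures, size=20000)

# Bootstrap distribution of the sample proportion.
bootstrap_draws = rng.binomial(n, p_hat, size=20000) / n

print("means:", posterior_draws.mean(), bootstrap_draws.mean())
print("std devs:", posterior_draws.std(), bootstrap_draws.std())
```

Both distributions have mean $\hat{p}$; their variances are $\hat{p}(1-\hat{p})/(N+1)$ for the posterior and $\hat{p}(1-\hat{p})/N$ for the bootstrap, so they differ only slightly for moderate $N$.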

