# On <i>Introduction to Bootstrap</i>

## Background
Suppose $\vec{X} = X_1,\ldots,X_n$ is an independent and identically distributed (iid) random sample from a population with mean $\mu$ and variance $\sigma^2$. Across repeated samples, the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ is itself a random variable, and its standard error is
$$\epsilon[\bar{X}] = \sqrt{\mathbb{V}[\bar{X}]}=\frac{\sigma}{\sqrt{n}}$$

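This measures the dispersion of sample means about the population mean $\mu$. Since $\sigma$ is usually unknown, substituting the sample standard deviation $s$ gives an estimate of the standard error of $\bar{X}$ from the single observed sample $\vec{X}$: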
$$\hat{\epsilon}[\bar{X}] = \frac{s}{\sqrt{n}}$$

Where the sample variance is $s^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2$. Moreover, by the central limit theorem, $\bar{X}$ is approximately normally distributed for large $n$:
$$\bar{X} \;\dot{\sim}\; \mathcal{N}\left(\mu,\frac{\sigma^2}{n}\right)$$

Under this approximation, $\bar{X}$ falls within two standard errors of the population mean, $\mu \pm 2\frac{\sigma}{\sqrt{n}}$, about 95% of the time.
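As a sanity check, the sketch below computes the estimate $s/\sqrt{n}$ and simulates how often $\bar{X}$ lands in $\mu \pm 2\sigma/\sqrt{n}$. The population (normal with $\mu = 10$, $\sigma = 3$), the sample size $n = 25$, the number of trials, and the function names `standard_error` and `coverage` are all illustrative assumptions, not part of the source.

```python
import math
import random
import statistics


def standard_error(x):
    """Estimated standard error s / sqrt(n) of the sample mean."""
    # statistics.stdev uses the n - 1 divisor, matching the definition of s^2
    return statistics.stdev(x) / math.sqrt(len(x))


def coverage(mu=10.0, sigma=3.0, n=25, trials=10_000, seed=0):
    """Share of simulated sample means falling in mu +/- 2 * sigma / sqrt(n).

    Assumes a normal population, so the normal approximation is exact here.
    """
    rng = random.Random(seed)
    half_width = 2 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        if abs(statistics.fmean(sample) - mu) <= half_width:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.gauss(10.0, 3.0) for _ in range(25)]
    print("estimated SE:", standard_error(x))  # true sigma / sqrt(n) is 0.6
    print("coverage:", coverage())  # theory: P(|Z| <= 2) is about 0.954
```

The factor 2 is the usual rounding of the normal quantile 1.96, which is why the simulated coverage sits slightly above 95%.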

## The Bootstrap Principle
### Empirical Distribution
In the absence of any other information about a distribution, the observed sample $\vec{X}$ contains all available information about the underlying distribution. Resampling from the observed sample therefore stands in for drawing new samples from the underlying distribution.

For the sample $\vec{X}$, a parameter of interest $\theta$ is estimated by $\hat{\theta}=s(\vec{X})$. Define the probability distribution
$$\hat{P}(A) = \frac{1}{n}\sum_{i=1}^n \mathbb{I}_A(X_i) \ \ \mathrm{for} \ \ A \subseteq \mathbb{R}$$

Where $\hat{P}$ is the empirical distribution of the sample $\vec{X}$, which places weight $\frac{1}{n}$ on each observation. $\hat{P}$ is the nonparametric maximum likelihood estimator of $P$ (and the KL divergence minimizer), consistent with the statement that, absent any additional information, $\hat{P}$ is the best estimate of $P$. Moreover, for each fixed $A$,
$$\lim_{n \rightarrow \infty}\hat{P}(A) = P(A) \quad \text{almost surely}$$

This follows from the strong law of large numbers, since $\hat{P}(A)$ is the average of the iid indicators $\mathbb{I}_A(X_i)$, each with mean $P(A)$.
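A minimal sketch of $\hat{P}(A)$, assuming for simplicity that $A$ is a half-open interval $(a, b]$ (the source allows any $A \subseteq \mathbb{R}$). The standard normal population and the name `empirical_prob` are illustrative assumptions; the loop shows $\hat{P}(A)$ settling toward $P(A) = 0.5$ as $n$ grows.

```python
import random


def empirical_prob(sample, lower, upper):
    """P-hat(A) for an interval A = (lower, upper]: the share of observations in A."""
    # Each observation carries weight 1/n, so P-hat(A) is a simple proportion
    return sum(lower < x <= upper for x in sample) / len(sample)


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed population: standard normal, for which P((-inf, 0]) = 0.5
    for n in (10, 100, 10_000, 100_000):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        print(n, empirical_prob(sample, float("-inf"), 0.0))
```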

### Resampling
Suppose the iid sample $X^*$ is drawn (resampled) from the empirical distribution $\hat{P}$. Because $\hat{P}$ places weight $\frac{1}{n}$ on each observation, every draw selects a given $X_i$ from the original sample $\vec{X}$ with probability $\frac{1}{n}$. Equivalently, sample $\vec{X}$ with replacement: draw indices $i_1,\ldots,i_n$ iid uniformly from $\{1,\ldots,n\}$ and index the new sample $X^*$ by $j$
$$X^*_j = X_{i_j}, \quad j = 1,\ldots,n, \quad X^* = X_1^*,\ldots,X_n^*$$
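The index-based resampling step can be sketched directly; the function name `resample` and the example data are hypothetical. Python indexes from 0, so the uniform draws are on $\{0,\ldots,n-1\}$ rather than $\{1,\ldots,n\}$.

```python
import random


def resample(sample, rng):
    """Draw one bootstrap sample X* of size n from sample, with replacement."""
    n = len(sample)
    # i_1, ..., i_n iid uniform on {0, ..., n - 1} (Python's 0-based indices)
    indices = [rng.randrange(n) for _ in range(n)]
    # X*_j = X_{i_j}
    return [sample[i] for i in indices]


if __name__ == "__main__":
    rng = random.Random(42)
    x = [2.1, 3.4, 1.9, 5.0, 4.2]
    print(resample(x, rng))  # a length-5 list of values drawn from x
```

Drawing explicit indices mirrors the notation above; `rng.choices(sample, k=len(sample))` from the standard library produces a resample with the same distribution in one call.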

## Method

For a random sample $\vec{X} = X_1,\ldots,X_n$ from distribution $P$, with parameter of interest $\theta = t(P)$ and estimator $\hat{\theta} = s(\vec{X})$, simulate the sampling process by substituting $\hat{P}$ for $P$: draw
$$X^*=X_1^*,\ldots,X_n^*$$
from $\hat{P}$, with plug-in parameter $\theta^*=t(\hat{P})$ and bootstrap replication $\hat{\theta}^* = s(X^*)$. Then, writing $\mathbb{P}^*$ for probability under resampling from $\hat{P}$ (with $\vec{X}$ held fixed), the sampling distribution of $\hat{\theta}$ is estimated by
$$\hat{\mathbb{P}}(\hat{\theta} \in A) = \mathbb{P}^*(\hat{\theta}^* \in A)$$
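For a tiny sample, $\mathbb{P}^*$ can be computed exactly by enumerating all $n^n$ equally likely index tuples $(i_1,\ldots,i_n)$. The sketch below does this for the sample mean of the hypothetical data $(1, 2, 6)$, using exact fractions; the helper names are illustrative assumptions. It recovers $\mathbb{E}^*[\bar{X}^*] = \bar{x} = 3$ and $\mathrm{Var}^*(\bar{X}^*) = \hat{\sigma}^2/n = 14/9$, where $\hat{\sigma}^2 = \frac{1}{n}\sum_i (x_i - \bar{x})^2$ is the plug-in variance $t(\hat{P})$.

```python
import itertools
from collections import Counter
from fractions import Fraction


def exact_bootstrap_distribution(sample, statistic):
    """Ideal bootstrap distribution {value: P*(theta-hat* = value)}.

    Enumerates all n**n equally likely index tuples (i_1, ..., i_n),
    so this is only feasible for very small n.
    """
    n = len(sample)
    counts = Counter()
    for idx in itertools.product(range(n), repeat=n):
        counts[statistic([sample[i] for i in idx])] += 1
    total = n ** n
    return {value: Fraction(c, total) for value, c in counts.items()}


def exact_mean(xs):
    """Sample mean as an exact Fraction, so equal means compare equal."""
    return Fraction(sum(xs), len(xs))


if __name__ == "__main__":
    x = [1, 2, 6]
    dist = exact_bootstrap_distribution(x, exact_mean)
    for value, p in sorted(dist.items()):
        print(value, p)
    m = sum(v * p for v, p in dist.items())
    var = sum((v - m) ** 2 * p for v, p in dist.items())
    print("E*:", m, "Var*:", var)  # 3 and 14/9, i.e. the plug-in variance / n
```

The $n^n$ growth (already about $10^{10}$ for $n = 10$) is what makes exact enumeration impractical beyond toy examples.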
In practice, $\mathbb{P}^*$ is rarely computed exactly (that would require enumerating all $n^n$ equally likely index sequences) and is instead approximated by Monte Carlo. Draw $B$ independent bootstrap samples $X^{*(1)},\ldots,X^{*(B)}$ from $\hat{P}$,
$$X^{*(b)} = X_1^{*(b)},\ldots,X^{*(b)}_n, \quad b = 1,\ldots,B$$
compute the replication $\hat{\theta}^{*(b)} = s(X^{*(b)})$ for each, and estimate the sampling distribution of $\hat{\theta}$ by the empirical distribution of the $B$ replications
$$\hat{\mathbb{P}}\left(\hat{\theta} \in A\right) = \frac{1}{B}\sum_{b=1}^B\mathbb{I}_A\left(\hat{\theta}^{*(b)}\right)$$
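A minimal Monte Carlo sketch, assuming the statistic is the sample mean, $B = 2000$, exponential example data, and that $A$ is an interval $(a, b]$; all names and settings are hypothetical. The standard deviation of the replications serves as a bootstrap estimate of the standard error, which for the mean can be compared with $s/\sqrt{n}$.

```python
import math
import random
import statistics


def bootstrap_replicates(sample, statistic, B=2000, seed=0):
    """theta-hat*(b) = s(X*(b)) for b = 1, ..., B, resampling with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(B)]


def bootstrap_prob(replicates, lower, upper):
    """Monte Carlo estimate of P(theta-hat in (lower, upper]) from the replications."""
    return sum(lower < t <= upper for t in replicates) / len(replicates)


if __name__ == "__main__":
    data_rng = random.Random(7)
    x = [data_rng.expovariate(1.0) for _ in range(30)]  # assumed example data
    reps = bootstrap_replicates(x, statistics.fmean)
    # Bootstrap SE: standard deviation of the B replications
    print("bootstrap SE:", statistics.stdev(reps))
    print("s / sqrt(n):", statistics.stdev(x) / math.sqrt(len(x)))
    print("P(theta-hat <= x-bar):", bootstrap_prob(reps, float("-inf"), statistics.fmean(x)))
```

For the mean, the bootstrap SE targets $\hat{\sigma}/\sqrt{n}$ (plug-in divisor $n$), so it runs slightly below $s/\sqrt{n}$ for small $n$; the Monte Carlo error shrinks as $B$ grows.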