Bootstrap methods are procedures for the empirical estimation or approximation of sampling distributions and their characteristics. Their primary use lies in estimating the accuracy of parameter estimators, and in constructing confidence sets or hypothesis tests. They are applied in circumstances where the form of the population from which the observed data was drawn is unknown.

One use of the bootstrap method is to estimate the standard error of certain statistics derived from a sample of IID random variables.

Let $X = (X_1, \dots, X_n)$ be an IID sample on some sample space $\Omega$, drawn from a distribution $F$, and let $T(X)$ be a statistic of interest. The standard error of $T$ is
\begin{equation}
    \sigma(T; F) = \sqrt{\operatorname{Var}_F T(X)}.
\end{equation}
The non-parametric bootstrap estimate of the standard error is
\begin{equation}
    \sigma(T; \hat{F}) = \sqrt{\operatorname{Var}_{\hat{F}} T(Y)},
\end{equation}
where $Y$ is an IID sample of size $n$ drawn from the empirical distribution
\begin{equation}
    \hat{F}(A) = \frac{1}{n} \sum_{i=1}^n \mathbb{1}[X_i \in A] \quad \text{for} \quad A \subset \Omega.
\end{equation}
Consider drawing a single value $Y_j$ from the distribution $\hat{F}$. Provided the sample points are distinct, the probability that the drawn value equals a given sample point $X_i$ is
\begin{equation}
    \Pr(Y_j = X_i) = \frac{1}{n}, \quad \text{for} \quad i = 1, \dots, n.
\end{equation}
Now consider drawing a single value at random from the original sample $X = (X_1, \dots, X_n)$ with replacement. There are $n$ items to choose from, each equally likely, so the probability of picking any specific element $X_i$ is also $1/n$. Because draws with replacement are independent of one another, an IID sample $Y$ of size $n$ from the empirical distribution $\hat{F}$ is equivalent to a random sample of size $n$ drawn with replacement from the original sample $X$.
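
The equivalence above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the helper name `bootstrap_sample`, the example data, and the seed are assumptions made for the example, not part of the method itself.

```python
import random

def bootstrap_sample(x, rng):
    """Draw an IID sample of size len(x) from the empirical distribution of x."""
    # rng.choices samples with replacement; with no weights given,
    # each position of x is picked with probability 1/len(x).
    return rng.choices(x, k=len(x))

rng = random.Random(0)          # fixed seed (arbitrary) for reproducibility
x = [2.1, 3.4, 1.9, 5.0, 4.2]   # hypothetical observed sample
y = bootstrap_sample(x, rng)
print(y)  # same length as x; individual values may repeat or be missing
```

Every value in `y` is one of the original observations, which is exactly what sampling from $\hat{F}$ means.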


---

The reasonableness of the bootstrap estimate $\sigma(T; \hat{F})$ relies on the plug-in principle. Since the true distribution $F$ is unknown, we use the empirical distribution $\hat{F}$, and assume that the relationship between $\hat{F}$ and $Y$ mimics the relationship between $F$ and $X$.

1.  Bias
    *   The bootstrap estimate $\sigma(T; \hat{F})$ is generally a biased estimator of the true standard error $\sigma(T; F)$, and it often slightly underestimates the true variance. For example, when $T$ is the sample mean, the bootstrap variance equals $\frac{n-1}{n}$ times the usual unbiased estimate $s^2/n$ of the variance of the mean, where $s^2$ is the unbiased sample variance.
    *   In the simulation phase, the estimator $\hat{\sigma}_B$ (defined by the simulation algorithm below) uses a factor of $\frac{1}{B-1}$. This makes $\hat{\sigma}_B^2$ an unbiased estimate of the bootstrap variance $\sigma^2(T; \hat{F})$ for the given data, but it does not remove the underlying plug-in bias relative to the true population.

2.  Variance
    *   The estimate $\hat{\sigma}_B$ is a Monte Carlo approximation of the theoretical bootstrap standard error $\sigma(T; \hat{F})$. Since we draw only a finite number $B$ of bootstrap samples, it carries additional simulation variance.
    *   By the law of large numbers, as $B \to \infty$ this simulation error goes to zero and $\hat{\sigma}_B$ converges to $\sigma(T; \hat{F})$. Choosing $B$ large enough makes this variance negligible compared to the standard error itself.

3.  Consistency
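    *   The bootstrap is reasonable because, by the Glivenko-Cantelli theorem, the empirical distribution function converges uniformly to the true distribution function, almost surely, as $n \to \infty$ (for real-valued data). Convergence of $\hat{F}$ to $F$ is not sufficient on its own, however: the bootstrap can fail if the statistic $T$ is not a smooth functional of the distribution (e.g., the maximum or minimum of a sample) or if the underlying population distribution has infinite variance (e.g., for the sample mean).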
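
The plug-in bias in the sample-mean case can be checked directly, since there the bootstrap variance has a closed form: $\operatorname{Var}_{\hat{F}} \bar{Y} = \frac{1}{n} \cdot \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})^2$. The sketch below compares this with the usual estimate $s^2/n$. The function names and the data are illustrative assumptions.

```python
import statistics

def exact_bootstrap_var_of_mean(x):
    """Variance of the mean of n IID draws from the empirical distribution of x."""
    # pvariance divides by n, which is the variance of a single draw from F-hat.
    return statistics.pvariance(x) / len(x)

def classical_var_of_mean(x):
    """Usual unbiased estimate s^2 / n of the variance of the sample mean."""
    # variance divides by n - 1.
    return statistics.variance(x) / len(x)

x = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]  # hypothetical observed sample
n = len(x)
ratio = exact_bootstrap_var_of_mean(x) / classical_var_of_mean(x)
print(ratio, (n - 1) / n)  # the two values agree up to rounding
```

The ratio depends only on $n$, not on the data, so the downward bias in this case vanishes as $n$ grows.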

Often there will be no simple expression for $\sigma(T; \hat{F})$. It is, however, simple to estimate it numerically by means of simulation. The algorithm proceeds in three steps:
1.  Draw a large number $B$ of independent bootstrap samples $Y_1, \dots, Y_B$, each an IID sample of size $n$ from $\hat{F}$.
2.  For each bootstrap sample, evaluate the statistic $T(Y_b)$.
3.  Calculate the sample standard deviation of the $T(Y_b)$ values:
\begin{equation}
    \hat\sigma_B = \sqrt{\frac{1}{B-1} \sum_{b=1}^B (T(Y_b) - \bar{T})^2}, \quad \text{where} \quad \bar{T} = \frac{1}{B} \sum_{b=1}^B T(Y_b).
\end{equation}
As $B \to \infty$, $\hat\sigma_B$ will approach $\sigma(T; \hat{F})$.
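
The three steps can be sketched as follows, using only the Python standard library. The function name `bootstrap_se`, the default $B = 2000$, the seed, and the example data are assumptions chosen for illustration; any statistic that maps a list of numbers to a number can be passed in.

```python
import random
import statistics

def bootstrap_se(x, statistic, B=2000, seed=0):
    """Monte Carlo estimate of the bootstrap standard error of statistic(x)."""
    rng = random.Random(seed)   # fixed seed (arbitrary) for reproducibility
    n = len(x)
    replicates = []
    for _ in range(B):
        # Step 1: draw a bootstrap sample of size n with replacement.
        y_b = rng.choices(x, k=n)
        # Step 2: evaluate the statistic on the bootstrap sample.
        replicates.append(statistic(y_b))
    # Step 3: sample standard deviation of the replicates (divisor B - 1).
    return statistics.stdev(replicates)

x = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]  # hypothetical observed sample
se_median = bootstrap_se(x, statistics.median)
se_mean = bootstrap_se(x, statistics.mean)
print(se_median, se_mean)
```

For the sample mean, `se_mean` should land close to the closed-form value $\sqrt{\frac{1}{n^2} \sum_{i=1}^n (X_i - \bar{X})^2}$, which gives a quick sanity check on the choice of $B$. For the median no such simple expression is available, which is the situation the simulation is meant for.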