# 4 Bootstrap and Jackknife

For an estimator, we can estimate its standard error (which is useful for constructing confidence intervals) by Bootstrap or Jackknife.

## Bootstrap

Given a sample $x_1, x_2, \cdots, x_n$, we can generate a new sample $x_1^*, x_2^*, \cdots, x_n^*$ by sampling **with replacement** from the original sample, and then calculate the estimate $\hat{\theta}^*$ from the new sample. Repeating this process $B$ times gives $B$ estimates $\hat{\theta}_1^*, \hat{\theta}_2^*, \cdots, \hat{\theta}_B^*$. The standard error of $\hat{\theta}$ is estimated by the sample standard deviation of $\hat{\theta}_1^*, \hat{\theta}_2^*, \cdots, \hat{\theta}_B^*$, where $\bar{\hat\theta}^*$ denotes their mean:

$$\widehat{\text{SE}}_B(\hat\theta) = \sqrt{\frac{1}{B-1}\sum_{i=1}^B(\hat\theta_i^*-\bar{\hat\theta}^*)^2}$$

Often $B$ does not need to be very large for estimating a standard error; for example, $B = 50$ might be enough.

<!-- <br>

For instance, with sample $(X_1,Y_1)$, $(X_2,Y_2)$, $\cdots$, $(X_n,Y_n)$, we can estimate the standard error of the correlation coefficient $\rho$ by Bootstrap. If we assume $Y = kX+b+\epsilon=W+\epsilon$, where $\epsilon$ is the noise, the estimator is given by

$$\rho = \frac{\frac 1n \sum X_i\cdot (W_i+\epsilon_i) - \frac {1}{n}\bar X_i\sum (W_i+\epsilon_i)}{\sqrt{\frac 1n \sum (X_i - \bar X_i)^2}\sqrt{\frac 1n \sum (W_i - \bar W_i+\epsilon_i)^2}}.$$ -->

In [58]:
import numpy as np
lsat = np.array([576,635,558,578,666,580,555,661,651,605,653,575,545,572,594])
gpa = np.array([3.39,3.30,2.81,3.03,3.44,3.07,3.00,3.43,3.36,3.13,3.12,2.74,2.76,2.88,2.96])

# estimator for the correlation coefficient
def corr(x,y):
    x, y = x - x.mean(), y - y.mean()
    return (x*y).mean() / np.sqrt(x.var() * y.var())

# bootstrap for standard error
def bootstrap_se(data, statistic, B = 200):
    n = data.shape[0]
    idx = np.random.randint(0, n, (B, n))
    stat = np.zeros(B)
    for i in range(B):
        stat[i] = statistic(data[idx[i]])
    return np.var(stat, ddof = 1) ** .5

data = np.stack((lsat, gpa), axis = -1)
statistic = lambda x: corr(x[...,0], x[...,1])

np.random.seed(1)
print('Estimated SE =', bootstrap_se(data, statistic))

Estimated SE = 0.12520332405358106


### Bias Correction

Bootstrap can also help estimate the bias of an estimator $\hat\theta$. The bias of an estimator is defined by $\text{bias}(\hat\theta) = \mathbb E\hat\theta - \theta$. However, the real $\theta$ is often unknown.

To estimate the bias by Bootstrap, we first compute $\hat\theta$ from the original sample as a stand-in for $\theta$. We then use the bootstrap replicates to approximate $\mathbb E\hat\theta\approx \frac 1B\sum_{i=1}^B  \hat{\theta}_i^*$. This gives the bias estimate $\widehat{\text{bias}}(\hat\theta) = \frac 1B\sum_{i=1}^B  \hat{\theta}_i^* - \hat\theta$.

The bias-corrected estimator is given by $\tilde\theta = \hat\theta - \widehat{\text{bias}}(\hat\theta)$.
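As a small worked case (an illustration added here, not taken from [1]), consider the plug-in variance $\hat\sigma^2 = \frac 1n\sum_{i=1}^n (x_i - \bar x)^2$. Writing $\mathbb E^*$ for the expectation over resampling from the observed sample, the empirical distribution has variance exactly $\hat\sigma^2$, so in the limit $B\to\infty$:

$$\mathbb E^*\left[\hat\sigma^{2*}\right] = \frac{n-1}{n}\hat\sigma^2
\quad\Longrightarrow\quad
\widehat{\text{bias}}(\hat\sigma^2) = \frac{n-1}{n}\hat\sigma^2 - \hat\sigma^2 = -\frac{\hat\sigma^2}{n}.$$

This mirrors the true bias $-\sigma^2/n$ of the plug-in variance, with $\hat\sigma^2$ in place of the unknown $\sigma^2$.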

In [57]:
# estimate bias of moment estimator
import numpy as np
mu, sigma, n = 2, 1.5, 10

def bootstrap(data, statistic, B = 200):
    n = data.shape[0]
    idx = np.random.randint(0, n, (B, n))
    stat = 0
    for i in range(B):
        stat += statistic(data[idx[i]])
    return stat / B - statistic(data)

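# moment (plug-in) estimator of the variance; its true bias is -sigma**2/n
statistic = lambda x: ((x - x.mean())**2).mean()
# NOTE: this reassignment overrides the line above, so the output below is for
# the sample variance with ddof = 1, whose true bias is 0 rather than -sigma**2/n
statistic = lambda x: x.var(ddof = 1)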

np.random.seed(0)
x = np.random.randn(n) * sigma + mu
bias_ = bootstrap(x, statistic)
print('Estimated bias =', bias_)
print('True      bias =', -sigma**2/n)
print('Estimated mean =', statistic(x))
print('Corrected mean =', statistic(x) - bias_)
print('True      mean =', sigma**2)

Estimated bias = -0.20687985833105405
True      bias = -0.225
Estimated mean = 2.338105501250709
Corrected mean = 2.544985359581763
True      mean = 2.25


### Confidence Interval

Apart from the normal-approximation interval $\hat\theta\pm z_{\alpha/2} \widehat{\text{SE}}_B(\hat\theta)$, Bootstrap yields several other confidence intervals. Assume we have already obtained $\hat\theta_1^*, \hat\theta_2^*, \cdots, \hat\theta_B^*$ by Bootstrap. We sort them as

$$\hat\theta_{(1)}^* \leqslant \hat\theta_{(2)}^* \leqslant \cdots \leqslant \hat\theta_{(B)}^*.$$

Then we can use the following confidence intervals.

#### Percentile Bootstrap Confidence Interval

Use the percentiles of the bootstrap replicates, $[\hat\theta_{((B+1)\alpha/2)}^*, \hat\theta_{((B+1)(1-\alpha/2))}^*]$, rounding the indices to integers when necessary. This interval covers a fraction $1 - \alpha$ of the $\hat\theta_i^*$.
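The order-statistic indices can be made concrete with a minimal sketch. The function name `percentile_ci` and the choice to round and clip the ranks are assumptions made for this illustration; `np.quantile`, which interpolates between order statistics, is a common alternative.

```python
import numpy as np

def percentile_ci(replicates, alpha=0.05):
    """Percentile interval read off the sorted bootstrap replicates."""
    stat = np.sort(np.asarray(replicates, dtype=float))
    B = stat.size
    # 1-based ranks (B+1)*alpha/2 and (B+1)*(1-alpha/2), rounded and clipped to [1, B]
    k_lo = int(np.clip(round((B + 1) * alpha / 2), 1, B))
    k_hi = int(np.clip(round((B + 1) * (1 - alpha / 2)), 1, B))
    return stat[k_lo - 1], stat[k_hi - 1]

# Toy "replicates" 1, 2, ..., 999 so that the selected ranks are easy to read off.
lo, hi = percentile_ci(np.arange(1, 1000), alpha=0.05)
print(lo, hi)
```

With $B = 999$ and $\alpha = 0.05$ the ranks are $25$ and $975$, so the sketch returns the 25th and 975th smallest replicates.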

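A bias-corrected and accelerated version, known as BCa [1, pp. 184-186], replaces the levels $\alpha/2$ and $1-\alpha/2$ by adjusted levels $f(\alpha/2)$ and $f(1-\alpha/2)$, giving $(\hat\theta^*_{((B+1)f(\alpha/2))}, \hat\theta^*_{((B+1)f(1-\alpha/2))})$ where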

$$\left\{\begin{aligned}
\hat a &= \frac{\frac 1B\sum_{i=1}^B (\hat \theta_i^* - \bar{\hat\theta^*})^3}{6\left(\frac 1B\sum_{i=1}^B (\hat \theta_i^* - \bar{\hat\theta^*})^2\right)^{3/2}}\\
\hat z &= \Phi^{-1}\left(\frac 1B  \sum_{i=1}^B\mathbb I_{\hat\theta_i^* <\hat\theta} \right)\\
f(w) &= \Phi\left(\hat z + \frac{\hat z + \Phi^{-1}(w)}{1 -\hat a(\hat z + \Phi^{-1}(w))}\right)
\end{aligned}\right.$$

with $\Phi$ being the CDF of the standard normal distribution. Note that [1] computes the acceleration $\hat a$ from jackknife (leave-one-out) values; the version above uses the skewness of the bootstrap replicates instead.

In SciPy, a BCa interval can be computed with `scipy.stats.bootstrap`, for example:

`scipy.stats.bootstrap((x,), lambda x, axis: np.var(x, ddof=1, axis=axis), method = 'BCa')`
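A runnable version of that call might look like the sketch below. The data `x`, the helper name `sample_var`, and the choices `n_resamples=2000` and `random_state=rng` are assumptions made for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=30)   # illustrative data, not from the text

def sample_var(sample, axis):
    # the statistic must accept an `axis` argument so SciPy can vectorize it
    return np.var(sample, ddof=1, axis=axis)

res = stats.bootstrap((x,), sample_var, confidence_level=0.95,
                      n_resamples=2000, method='BCa', random_state=rng)
low, high = res.confidence_interval
print('BCa 95% CI for the variance:', (low, high))
print('Bootstrap SE:', res.standard_error)
```

The result object also exposes `standard_error`, the standard deviation of the bootstrap distribution.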

#### Basic Bootstrap Confidence Interval

It combines bias correction with the percentile trick: $[2\hat\theta - \hat\theta_{((B+1)(1-\alpha/2))}^*, 2\hat\theta - \hat\theta_{((B+1)\alpha/2)}^*]$. Solving for $\theta$ shows that it is equivalent to

$$\mathbb P( \hat\theta_{((B+1)\alpha/2)}^* -\hat\theta\leqslant \hat\theta -\theta \leqslant \hat\theta_{((B+1)(1-\alpha/2))}^*-\hat\theta) \approx 1 - \alpha.$$
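The step behind this can be spelled out. The sketch below writes $q_{\text{lo}} = \hat\theta_{((B+1)\alpha/2)}^*$ and $q_{\text{hi}} = \hat\theta_{((B+1)(1-\alpha/2))}^*$ (shorthand introduced here) and assumes that the distribution of $\hat\theta - \theta$ is well approximated by that of $\hat\theta^* - \hat\theta$:

$$\begin{aligned}
1-\alpha &\approx \mathbb P^*\left(q_{\text{lo}} - \hat\theta \leqslant \hat\theta^* - \hat\theta \leqslant q_{\text{hi}} - \hat\theta\right)\\
&\approx \mathbb P\left(q_{\text{lo}} - \hat\theta \leqslant \hat\theta - \theta \leqslant q_{\text{hi}} - \hat\theta\right)\\
&= \mathbb P\left(2\hat\theta - q_{\text{hi}} \leqslant \theta \leqslant 2\hat\theta - q_{\text{lo}}\right).
\end{aligned}$$

The upper percentile therefore sets the lower endpoint of the interval, and vice versa.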

#### Bootstrap t Confidence Interval 

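Bootstrap t is much more complicated. Recall that each of the $B$ bootstrap samples consists of $x_1^*, \dotsc, x_n^*$ drawn with replacement. For the $b$-th bootstrap sample we compute its estimate $\hat\theta^{(b)}$ and, by running a nested Bootstrap on that bootstrap sample, an estimate $\widehat{\text{SE}}(\hat\theta^{(b)})$ of its standard error. This gives the t-statistics: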

$$t^{(b)} = (\hat\theta^{(b)} - \hat\theta) / \widehat{\text{SE}}(\hat\theta^{(b)}).$$

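Finally, we extract the percentiles $t_{\alpha/2}^*$ and $t_{1-\alpha/2}^*$ of $t^{(1)}, \dotsc, t^{(B)}$ and use the following confidence interval [1]:

$$\left[\hat\theta - t_{1-\alpha/2}^*\cdot \widehat{\text{SE}}(\hat\theta),\ \hat\theta - t_{\alpha/2}^*\cdot\widehat{\text{SE}}(\hat\theta)\right].$$

Because each of the $B$ bootstrap samples needs its own inner Bootstrap, the method requires $O(B^2)$ evaluations of the statistic.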
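The nested procedure can be sketched as follows. This is a minimal illustration under stated assumptions rather than a reference implementation: the names `bootstrap_t_ci`, `B_inner` and `seed` are made up here, the inner Bootstrap uses fewer replicates than the outer one to keep the cost down, and the endpoints use the reflected form $\hat\theta - t^*\cdot\widehat{\text{SE}}$ from [1].

```python
import numpy as np

def bootstrap_t_ci(data, statistic, alpha=0.05, B=200, B_inner=50, seed=0):
    """Bootstrap-t interval; SEs of the replicates come from an inner bootstrap."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    theta = statistic(data)

    def resample_stats(sample, reps):
        # statistic evaluated on `reps` resamples (with replacement) of `sample`
        idx = rng.integers(0, n, size=(reps, n))
        return np.array([statistic(sample[row]) for row in idx])

    outer = np.empty(B)
    t = np.empty(B)
    for b in range(B):
        star = data[rng.integers(0, n, size=n)]              # b-th bootstrap sample
        outer[b] = statistic(star)
        se_b = resample_stats(star, B_inner).std(ddof=1)     # inner bootstrap SE
        t[b] = (outer[b] - theta) / se_b
    se = outer.std(ddof=1)                                   # SE of the original estimate
    t_lo, t_hi = np.quantile(t, [alpha / 2, 1 - alpha / 2])
    # the upper t-quantile gives the lower endpoint, and vice versa
    return theta - t_hi * se, theta - t_lo * se

x = np.random.default_rng(1).normal(size=40)   # illustrative data
lo, hi = bootstrap_t_ci(x, np.mean)
print(lo, hi)
```

The total cost is `B * (1 + B_inner)` evaluations of the statistic, which is the $O(B^2)$ behaviour when `B_inner` grows with `B`.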

In [79]:
import numpy as np
from scipy.stats import norm

def bootstrap_se(data, statistic, B = 200):
    n = data.shape[0]
    idx = np.random.randint(0, n, (B, n))
    stat = np.zeros(B)
    for i in range(B):
        stat[i] = statistic(data[idx[i]])
    return np.var(stat, ddof = 1) ** .5

def bootstrap_ci(data, statistic, alpha = 0.05, B = 200, method = 'normal'):
    """
    Compute estimated CI for a statistic using bootstrap with various methods.
    Accepted methods are 'normal', 'percentile', 'BCa', 'basic', 't'.
    """
    theta = statistic(data)
    n = data.shape[0]
    idx = np.random.randint(0, n, (B, n))
    stat = np.zeros(B)
    for i in range(B):
        stat[i] = statistic(data[idx[i]])
    if method == 'normal':
        z = norm.ppf(1-alpha/2)
        se = np.std(stat, ddof = 1)
        ci = theta + z * np.array([-se, se])
    elif method == 'percentile':
        ci = np.quantile(stat, [alpha/2, 1-alpha/2])
    elif method == 'basic':
        ci = 2 * theta - np.quantile(stat, [1-alpha/2, alpha/2])
    elif method == 'BCa':
        stat2 = stat - stat.mean()
        a = (stat2 ** 3).mean() / (6 * ((stat2 ** 2).mean()) ** 1.5)
        z = norm.ppf((stat < theta).mean())
        f = lambda w: norm.cdf(z + (z + norm.ppf(w)) / (1 - a * (z + norm.ppf(w))))
        ci = np.quantile(stat, [f(alpha/2), f(1-alpha/2)])
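    elif method == 't':
        # nested bootstrap: one inner SE estimate per outer bootstrap sample
        se = [bootstrap_se(data[idx[i]], statistic, B = B) for i in range(B)]
        t = (stat - theta) / np.array(se)
        se0 = np.var(stat, ddof = 1) ** .5
        # NOTE: the bootstrap-t interval in [1] is
        # theta - np.quantile(t, [1-alpha/2, alpha/2]) * se0;
        # the line below adds the unreflected quantiles instead
        ci = theta + np.quantile(t, [alpha/2, 1-alpha/2]) * se0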
    return ci

In [183]:
np.random.seed(3)
x = np.random.randn(100)
x = x**2 # chi-squared distribution

def statistic(x, axis = None):
    # skewness
    x = x - x.mean(axis=axis, keepdims=True)
    return (x**3).mean(axis=axis) / (x**2).mean(axis=axis)**1.5

# skewness of chi-squared distribution is sqrt(8/d) where d is degree of freedom
# (here d = 1, so the true skewness is sqrt(8), about 2.83)
for method in ['normal', 'percentile', 'BCa', 'basic', 't']:
    np.random.seed(2)
    ci = bootstrap_ci(x, statistic, method = method, alpha = .05, B = 50)
    print(method.rjust(10, ' '), '(%.4f, %.4f) Length = %.4f'%(ci[0], ci[1], ci[1] - ci[0]))

    normal (1.3930, 3.1028) Length = 1.7098
percentile (1.3291, 2.8736) Length = 1.5445
       BCa (1.5176, 2.9209) Length = 1.4033
     basic (1.6223, 3.1668) Length = 1.5445
         t (0.6318, 2.7977) Length = 2.1660


## Jackknife

Suppose we have observed a sample $x_1,\dotsc,x_n$. We can discard the $i$-th observation to obtain $(x_1,\dotsc, x_{i-1}, x_{i+1},\dotsc,x_n)$ and compute the estimate $\hat\theta_i$ from the remaining $n-1$ observations. Letting $i$ run through $1,\dotsc,n$ gives $n$ leave-one-out estimates $\hat\theta_1,\dotsc,\hat\theta_n$.

The Jackknife estimate of the bias is

$$\widehat{\text{bias}}_{\text{jack}}(\hat\theta) = (n-1)(\bar{\hat\theta} - \hat\theta) = \frac{n-1}{n}\sum_{i=1}^n(\hat\theta_i - \hat\theta), \qquad \bar{\hat\theta} = \frac 1n\sum_{i=1}^n\hat\theta_i,$$

and the Jackknife estimate of the standard error is

$$\widehat{\text{SE}}_{\text{jack}}(\hat\theta) = \sqrt{\frac{n-1}{n}\sum_{i=1}^n(\hat\theta_i - \bar{\hat\theta})^2}.$$
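A quick sanity check, sketched under the usual Jackknife definitions (bias $(n-1)(\bar{\hat\theta} - \hat\theta)$, with the standard error centered at the mean $\bar{\hat\theta}$ of the leave-one-out estimates): for the sample mean, the bias estimate is $0$ and the standard error reduces to the familiar $s/\sqrt n$. The function name `jackknife_bias_se` and the toy data are assumptions for this illustration.

```python
import numpy as np

def jackknife_bias_se(data, statistic):
    """Jackknife bias and SE estimates from the n leave-one-out estimates."""
    data = np.asarray(data)
    n = data.shape[0]
    theta = statistic(data)
    # leave-one-out estimates: drop observation i along the first axis
    loo = np.array([statistic(np.delete(data, i, axis=0)) for i in range(n)])
    bias = (n - 1) * (loo.mean() - theta)
    se = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
    return bias, se

x = np.array([2.0, 4.0, 4.0, 5.0, 7.0, 9.0])   # toy data
bias, se = jackknife_bias_se(x, np.mean)
print(bias, se, x.std(ddof=1) / np.sqrt(len(x)))
```

The last two printed numbers agree up to floating-point error, since each leave-one-out mean deviates from $\bar x$ by $(\bar x - x_i)/(n-1)$.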

<br>

Jackknife might fail when the statistic is not smooth, for example the sample median.

In [213]:
import numpy as np

def jackknife(data, statistic):
    """Jackknife estimates of the bias and standard error of a statistic."""
    n = data.shape[0]
    theta = statistic(data)
    data = np.concatenate([data, data]) # cyclize
    stat = np.zeros(n)
    for i in range(n):
        # rows i+1, ..., n+i-1 of the doubled array: every observation except the i-th
        stat[i] = statistic(data[i+1:n+i])
    # bias = (n-1) * (mean of leave-one-out estimates - full-sample estimate)
    bias = (n-1) * (stat.mean() - theta)
    # SE is centered at the mean of the leave-one-out estimates
    se = ((n-1) * ((stat - stat.mean())**2).mean()) ** .5
    return bias, se

# Full data available at https://rdrr.io/cran/bootstrap/man/patch.html
patch_data = np.array([
    [    1,  9243, 17649, 16449,  8406, -1200],
    [    2,  9671, 12013, 14614,  2342,  2601],
    [    3, 11792, 19979, 17274,  8187, -2705],
    [    4, 13357, 21816, 23798,  8459,  1982],
    [    5,  9055, 13850, 12560,  4795, -1290],
    [    6,  6290,  9806, 10157,  3516,   351],
    [    7, 12412, 17208, 16570,  4796,  -638],
    [    8, 18806, 29044, 26325, 10238, -2719]
])

statistic = lambda x: x[:,-1].mean() / x[:,-2].mean()

print('Estimated (bias, SE) = (%.4f, %.4f)' % jackknife(patch_data, statistic))

Estimated (bias, SE) = (0.0080, 0.1055)


In [258]:
# A non-smooth statistic, e.g. the median, might make Jackknife fail.
np.random.seed(0)
x = np.random.randint(1, 101, 10)

print('Bootstrap Median SE =', bootstrap_se(x, np.median, B = 2000))
print('Jackknife Median SE =', jackknife(x, np.median)[1])

Bootstrap Median SE = 11.217010504690238
Jackknife Median SE = 25.5


## References

[1] B. Efron, R. Tibshirani, [An Introduction to the Bootstrap](https://www.semanticscholar.org/paper/An-Introduction-to-the-Bootstrap-Efron-Tibshirani/85a8a97f614b2b6823e035bcc9abcb0f3d27be4d), 1994