# Bootstrap methods in Stata

``2020-11-22``  
_Zhiyuan Chen, Department of Trade Economics, Renmin Business School_
* Main Reference: _A. Colin Cameron and Pravin K. Trivedi, Microeconometrics Using Stata, Second Edition, 2010, Chapter 13_


## Bootstrap methods
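Consider calculating the standard error of $\hat{\theta}$ when no analytical formula is available. Imagine that many different random samples from the population were available, say 400. We could then obtain 400 different estimates of $\hat{\theta}$, and the standard error of $\hat{\theta}$ would be the standard deviation of these 400 estimates. In practice only one sample is available, so the bootstrap treats that sample as the population and obtains the estimates by repeatedly resampling from it with replacement.

Let $\hat{\theta}_1^{*}, \cdots, \hat{\theta}_B^{*}$ denote the bootstrap estimates, where here $B=400$, and let $\bar{\hat{\theta}^*} = B^{-1}\sum_{b=1}^{B}\hat{\theta}_b^{*}$ denote their mean. Then the bootstrap estimate of the standard error of $\hat{\theta}$ is simply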

$$
\widehat{SE_{Boot}}(\hat{\theta}) = \sqrt{ \frac{1}{B-1}\sum_{b=1}^{B}(\hat{\theta}_b^{*}-\bar{\hat{\theta}^*})^2} \tag{1}
$$

* The bootstrap can be used to approximate the distribution of many different statistics, and to derive quantities such as confidence intervals and quantiles from it.
* When applying the bootstrap, you should make sure that:
    1. The observations are independent, as the bootstrap resampling scheme assumes.
    2. The variance of the estimator is well defined and exists.
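
As a minimal sketch of the first point, the block below bootstraps a sample median and then asks for several bootstrap confidence intervals. The dataset (`auto.dta`, shipped with Stata), the variable `price`, the label `med`, and the choice of 400 replications are assumptions made for illustration only.

```stata
* Illustrative sketch: auto.dta and price are arbitrary choices
sysuse auto, clear

* bootstrap the sample median, which -summarize, detail- returns in r(p50)
bootstrap med=r(p50), reps(400) seed(12345) nodots: summarize price, detail

* normal-based, percentile, and bias-corrected confidence intervals
estat bootstrap, all
```

`estat bootstrap` works from the replicates stored by the preceding `bootstrap` call, so no further resampling takes place.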


## Bootstrap pairs method to estimate the Variance-Covariance matrix
Let $\mathbf{D}_i = (y_i,\mathbf{x}_i)$ denote the data for observation $i$. We assume that $\mathbf{D}_i$ is independent over $i$. 

Stata employs the following bootstrap-pairs algorithm:
1. Repeat sub-steps 1 and 2 below $B$ times, independently:
    1. Draw a bootstrap sample of size $N$ by sampling with replacement from the original data $\mathbf{D}_1,\cdots,\mathbf{D}_N$. Denote the bootstrap sample by $\mathbf{D}_1^*,\cdots,\mathbf{D}_N^*$.
    2. Calculate an estimate, $\hat{\theta}^{*}$, of $\theta$ based on $\mathbf{D}_1^*,\cdots,\mathbf{D}_N^*$.
2. Given the $B$ bootstrap estimates $\{\hat{\theta}_b^{*}\}_{b=1}^B$, the bootstrap estimate of the variance-covariance matrix of the estimator is

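$$
VCE_{Boot}(\hat{\theta}) =  \frac{1}{B-1}\sum_{b=1}^{B}(\hat{\theta}_b^{*}-\bar{\hat{\theta}^*})(\hat{\theta}_b^{*}-\bar{\hat{\theta}^*})' \tag{2}
$$

where $\bar{\hat{\theta}^*}$ is the mean of the $B$ bootstrap estimates. For a scalar $\theta$, (2) is the square of (1).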
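
The algorithm can be sketched in Stata as follows. The simulated data, the file name `bsreps`, and the number of replications are hypothetical choices for illustration. The variable names `_b_x` and `_b_cons` are the names `bootstrap` is expected to give the saved replicates; run `describe` after loading the file to confirm them.

```stata
* Sketch under assumed, simulated data
clear
set seed 12345
set obs 200
gen x = rnormal()
gen y = 1 + 2*x + rnormal()

* resample (y, x) pairs, re-estimate, and save the B replicate estimates
bootstrap _b, reps(400) seed(12345) nodots saving(bsreps, replace): regress y x
matrix list e(V)                       // bootstrap VCE reported by Stata

* equation (2) by hand: covariance matrix of the replicate estimates
use bsreps, clear
correlate _b_x _b_cons, covariance
```

Because `correlate, covariance` divides by the number of replicates minus one, its output should agree with `e(V)` up to rounding. Note that `saving()` writes `bsreps.dta` to the current working directory.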

* __<u>Cluster bootstrap</u>__
The method can be easily adapted to the cluster bootstrap. $\mathbf{D}_i$ becomes $\mathbf{D}_c$, where $c = 1, \cdots, C$ indexes the $C$ clusters. The data should be independent over $c$, resampling is done over entire clusters, and each bootstrap resample consists of $C$ clusters.
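
A minimal sketch of a cluster bootstrap, assuming simulated data with 50 clusters of 10 observations each. The variable names (`cid`, `u_c`) and all numbers are hypothetical.

```stata
* Sketch: simulate clustered data
clear
set seed 12345
set obs 50                     // C = 50 clusters
gen cid = _n                   // cluster identifier
gen u_c = rnormal()            // shock shared by all observations in a cluster
expand 10                      // 10 observations per cluster
gen x = rnormal()
gen y = 1 + 2*x + u_c + rnormal()

* resample whole clusters rather than individual observations
regress y x, vce(bootstrap, cluster(cid) reps(400) seed(12345) nodots)
```

Each replication draws 50 clusters with replacement and keeps every observation that belongs to a drawn cluster.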


## Examples
In what follows, we show:
1. Using the `bsample` command to draw samples with replacement and calculate the relevant statistics
2. Using the `bootstrap` command to obtain bootstrap statistics

### Example 1: bootstrap using `bsample`

In [59]:
* Bootstrap SE using `bsample'
 * simulate a dataset
drop _all 
global nobs 500
global B 100
qui set obs $nobs
set seed 12345  //make the results reproducible
gen x = runiform() // use rnormal(mu, sigma) for more general cases
qui save usample.dta,replace

In [71]:
 * (continued) sample draws from the data
clear all
set seed 12345
qui postfile buffer xboot using bootmean, replace 
forvalues b = 1/$B {
    quietly {
      use usample.dta, clear   // reload the original sample for each replication
      bsample $nobs            // draw N observations with replacement
      summ x
    }
    post buffer (r(mean))    // one can also use mean x and _b[x] to store the results
}
postclose buffer
qui use bootmean,clear
qui summ xboot

In [72]:
disp("-------------------------------")
disp ("bootstrap s.e. = ") r(sd)
disp("-------------------------------")


-------------------------------

bootstrap s.e. = .01311633

-------------------------------


### Example 2: Bootstrap using `bootstrap`

In [66]:
* Bootstrap SE using `bootstrap'
  * define a program that returns the sample mean in r(xbar)
capture program drop samplemean
program samplemean, rclass
        qui summ x
        return scalar xbar = r(mean)
end

In [68]:
 * use bootstrap command
use usample.dta,clear
bootstrap r(xbar), reps($B) seed(12345) nodots nowarn: samplemean




Bootstrap results                               Number of obs     =        500
                                                Replications      =        100

      command:  samplemean
        _bs_1:  r(xbar)

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _bs_1 |   .4862399   .0131163    37.07   0.000     .4605324    .5119475
------------------------------------------------------------------------------
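
For a statistic that an existing command already returns, a wrapper program may not be needed. The sketch below, which rebuilds the simulated data so that it runs on its own, passes `r(mean)` from `summarize` directly to `bootstrap`. The label `xbar` is an arbitrary name for the statistic.

```stata
* Sketch: rebuild the simulated data so the block is self-contained
clear
set obs 500
set seed 12345
gen x = runiform()

* bootstrap r(mean) from -summarize- directly, without a wrapper program
bootstrap xbar=r(mean), reps(100) seed(12345) nodots: summarize x
```

Writing a program, as in Example 2, remains useful when the statistic requires several steps to compute.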
