## Advanced Econometrics 2 (2020/2021) - Bootstrap Methods

Computer Class 1b (Wednesday)

*Aim of this computer class*: to gain practical experience of bootstrapping regression models.

# 11-1

Consider the model $y=\alpha+\beta x+\varepsilon$, where $\alpha$, $\beta$, and $x$ are scalars and $\varepsilon \sim N(0,\sigma^2)$. A sample of size $N=20$ is generated with $\alpha=2, \beta=1, \sigma^2=1$ and $x \sim N(2,2)$. We wish to test $H_0: \beta=1$ against $H_1: \beta\neq 1$  at level 0.05 using the t-statistic $t=(\hat{\beta}-1)/SE[\hat{\beta}]$. Use $B = 999$ bootstrap replications. 

In [1]:
import numpy as np

def OLS(y,X):
    """Return OLS coefficients, their standard errors and the residuals."""
    N,k = X.shape                   # number of observations and regressors
    XXi = np.linalg.inv(X.T @ X)
    b_ols = XXi @ (X.T @ y)
    res = y-X @ b_ols
    s2 = (res @ res)/(N-k)
    SE = np.sqrt(s2*np.diag(XXi))
    return b_ols,SE,res
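
As a sanity check on the helper above, the coefficients from the normal equations can be compared with NumPy's least-squares solver. The sketch below is not part of the original class material: the function name `ols_lstsq` and the simulated data are assumptions made only for illustration.

```python
import numpy as np

def ols_lstsq(y, X):
    """OLS via np.linalg.lstsq (hypothetical helper, for cross-checking only)."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares coefficients
    res = y - X @ b                             # residuals
    s2 = (res @ res) / (n - k)                  # unbiased error-variance estimate
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return b, se, res

# Simulated data (assumed, not the class data set)
rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=20)
X = np.column_stack((np.ones(20), x))
y = 2.0 + 1.0 * x + rng.normal(size=20)

b, se, res = ols_lstsq(y, X)
b_normal_eq = np.linalg.inv(X.T @ X) @ (X.T @ y)   # same formula as in OLS(y,X)
print(np.allclose(b, b_normal_eq))
```

Both routes should agree to numerical precision for a well-conditioned design matrix; `lstsq` is generally the more robust choice when regressors are nearly collinear.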

### **(a)** Estimate the model by OLS, giving slope estimate $\hat{\beta}$.

In [2]:
import numpy as np
y  =np.array([2.463460087, 4.082339785,7.14245305 ,6.837688781, 3.188993095,4.838084255,5.354217263,5.024464493,4.278112328, 2.061616983,-0.655026946,6.637085435, 1.822475278, 3.440341802,6.294259862, 4.225766242,4.901194854, 2.293813513,3.278865984,5.515655038])
x  =np.array([0.259705633, 2.481299324,3.960540791,3.49720621,  2.133512947,1.530091473,3.265568683,3.797276605,1.184917425, 0.462349978,-2.149324397,4.470384733, 1.343208036, 1.693754991,3.869958201, 2.789750994,2.867776386, 0.393884163,1.918828592,2.983220267])
eps=np.array([0.203754454,-0.39895954, 1.181912259,1.340482571,-0.944519852,1.307992781,0.08864858,-0.772812112,1.093194903,-0.400732995,-0.505702548,0.166700702,-1.520732758,-0.253413189,0.424301661,-0.563984752,0.033418467,-0.10007065,-0.639962608,0.532434771])
N=len(y)
alpha=2
beta=1
const=np.ones(N)
X=np.vstack( (const,x) ).T
b_ols,SE,res = OLS(y,X)
print('OLS estimates:      %7.4f   %7.4f' % (b_ols[0],b_ols[1]) )
print('Standard errors:   (%7.4f) (%7.4f)' % (SE[0],SE[1]) )
print('t-stat for H_0:beta=1:       %8.3f' % ((b_ols[1]-beta)/SE[1]) )

OLS estimates:       1.7472    1.1246
Standard errors:   ( 0.2959) ( 0.1116)
t-stat for H_0:beta=1:          1.117


### **(b)** Use a paired bootstrap to compute the standard error and compare this to the original sample estimate. Use the bootstrap standard error to test $H_0$.

In [3]:
w=np.vstack( (y,x) ).T                 # make pairs
BOOTREP=9999                           # number of bootstrap replications (the exercise text asks for B=999)
betaB=np.zeros(BOOTREP)                # initialise to zero
tB=np.zeros(BOOTREP)
np.random.seed(42)
for b in range(BOOTREP):
    index=np.random.randint(N,size=N)  # select the indices  
    wB=np.copy(w[index,])              # resample from data
    yB=wB[:,0]
    XB=np.vstack( (const,wB[:,1]) ).T
    bB_ols,SEB,resB=OLS(yB,XB)         # obtain bootstrap estimates using OLS(.)-function    
    betaB[b]=(bB_ols[1])               # store bootstrapped regression coefficient
    tB[b]=(bB_ols[1]-b_ols[1])/SEB[1]  # store bootstrapped t-statistic
print('Results paired bootstrap (B=%d):' % BOOTREP)
print('  Bootstrapped SE:       %7.4f' % np.std(betaB))
print('  t-stat using SE.boot:  %7.4f' % ((b_ols[1]-beta)/np.std(betaB)))

Results paired bootstrap (B=9999):
  Bootstrapped SE:        0.0956
  t-stat using SE.boot:   1.3030


$SE_{Boot}[\hat{\beta}]=0.0956$, which is comparable to the asymptotic SE of 0.1116. The t-statistic becomes $$t_{obs}=\frac{\hat{\beta}-1}{SE_{Boot}[\hat{\beta}]}=\frac{1.1246-1}{0.0956}=1.3030.$$
The rejection region can be based on the Student-*t* distribution with $18\ (=20-2)$ degrees of freedom:
- $T\leq -t_{0.025}(18)=-2.101$ or $T\geq t_{0.025}(18)=2.101$.

$H_0: \beta=1$ is not rejected, since $|t_{obs}|=1.303<2.101$. The call `px.histogram(x=betaB)` plots the bootstrap distribution of $\hat{\beta}^*$:

In [4]:
from scipy import stats
print('2.5%% critical value Student-t distribution: %8.3f' % stats.t.ppf(0.975,N-2))
import plotly.express as px
px.histogram(x=betaB,labels={'x':'betaB'},title='Paired bootstrap')  # make histogram

2.5% critical value Student-t distribution:    2.101
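
The comparison with the critical value can also be expressed as a two-sided p-value. The short sketch below plugs in the t-statistic $1.3030$ and the 18 degrees of freedom reported above; the variable names are illustrative and not part of the original notebook.

```python
from scipy import stats

t_obs = 1.3030   # t-statistic based on the paired-bootstrap SE (from the output above)
df = 18          # N - 2 degrees of freedom

crit = stats.t.ppf(0.975, df)               # upper 2.5% critical value
p_value = 2 * stats.t.sf(abs(t_obs), df)    # two-sided p-value
reject = abs(t_obs) > crit

print(f"critical value: {crit:.3f}, p-value: {p_value:.3f}, reject H0: {reject}")
```

Rejecting when $|t_{obs}|$ exceeds the critical value is equivalent to rejecting when the p-value falls below 0.05.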


### **(c)** Use a paired bootstrap based on an asymptotic pivotal test statistic to test $H_0$.

We cannot use restricted estimates since we use the paired bootstrap. Hence, the test statistic has to be centered around $\hat{\beta}$, i.e. use the quantiles of 
$$T^*=(\hat{\beta}^*-\hat{\beta})/SE[\hat{\beta}^*].$$
The appropriate quantiles and simulated bootstrap distribution of $T^*$ are shown below.

In [5]:
px.histogram(x=tB,labels={'x':'tB'},title='Paired bootstrap with asymptotic refinement').show()
print(' 2.5%% bootstrap quantile: %8.3f' % np.quantile(tB,0.025))
print('97.5%% bootstrap quantile: %8.3f' % np.quantile(tB,0.975))

 2.5% bootstrap quantile:   -1.539
97.5% bootstrap quantile:    1.526


The critical values based on the studentized bootstrap distribution are $-1.539$ and $1.526$. The rejection region based on the bootstrap distribution is:
- $T\leq -1.539$ or $T\geq 1.526$.

We cannot reject $H_0:\beta=1$, since $t_{obs}=1.117$ lies between the two critical values.
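
Instead of comparing $t_{obs}$ with bootstrap quantiles, one can report a bootstrap p-value. The sketch below is an illustration rather than part of the class solution: the function name `bootstrap_pvalues` is hypothetical, the $(1+\text{count})/(B+1)$ convention is one common choice among several, and the demo draws are simulated stand-ins for the array `tB`.

```python
import numpy as np

def bootstrap_pvalues(t_obs, t_boot):
    """Equal-tailed and symmetric two-sided bootstrap p-values (hypothetical helper)."""
    t_boot = np.asarray(t_boot, dtype=float)
    B = t_boot.size
    upper = (1 + np.sum(t_boot >= t_obs)) / (B + 1)   # mass in the upper tail
    lower = (1 + np.sum(t_boot <= t_obs)) / (B + 1)   # mass in the lower tail
    p_equal_tailed = min(1.0, 2 * min(lower, upper))
    p_symmetric = (1 + np.sum(np.abs(t_boot) >= abs(t_obs))) / (B + 1)
    return p_equal_tailed, p_symmetric

# Stand-in draws (assumption); in the notebook one would pass tB instead.
rng = np.random.default_rng(0)
t_boot_demo = rng.standard_t(df=18, size=999)
p_et, p_sym = bootstrap_pvalues(1.117, t_boot_demo)
print(f"equal-tailed p = {p_et:.3f}, symmetric p = {p_sym:.3f}")
```

The equal-tailed version corresponds to the quantile-based rejection region used above; the symmetric version compares $|t_{obs}|$ with the distribution of $|T^*|$ and can give a different answer when the bootstrap distribution is skewed.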

### **(d)** Use a residual bootstrap to compute the standard error and compare this to the original sample estimate. Use the bootstrap standard error to test $H_0$.

In the code below, we have replaced the paired bootstrap with the residual bootstrap.

In [6]:
fit=X@b_ols
np.random.seed(42)
for b in range(BOOTREP):
    index=np.random.randint(N,size=N)  # select the indices  
    epsB=res[index]                    # resample from residuals
    yB=fit+epsB
    bB_ols,SEB,resB=OLS(yB,X)          # obtain bootstrap estimates using OLS(.)-function    
    betaB[b]=(bB_ols[1])               # store bootstrapped regression coefficient
    tB[b]=(bB_ols[1]-b_ols[1])/SEB[1]  # store bootstrapped t-statistic
print('Results residual bootstrap (B=%d):' % BOOTREP)
print('  Bootstrapped SE:      %8.4f' % np.std(betaB))
print('  t-stat using SE.boot: %8.4f' % ((b_ols[1]-beta)/np.std(betaB)))

Results residual bootstrap (B=9999):
  Bootstrapped SE:        0.1063
  t-stat using SE.boot:   1.1722


$SE_{Boot}[\hat{\beta}]=0.1063$, which is even closer to the asymptotic SE of 0.1116. The t-statistic becomes $$t_{obs}=\frac{\hat{\beta}-1}{SE_{Boot}[\hat{\beta}]}=\frac{1.1246-1}{0.1063}=1.1722.$$
The rejection region can be based on the Student-*t* distribution with $18\ (=20-2)$ degrees of freedom:
- $T\leq -t_{0.025}(18)=-2.101$ or $T\geq t_{0.025}(18)=2.101$.

$H_0: \beta=1$ is not rejected, since $|t_{obs}|=1.172<2.101$. The call `px.histogram(x=betaB)` plots the bootstrap distribution of $\hat{\beta}^*$:

In [7]:
px.histogram(x=betaB,labels={'x':'betaB'},title='Residual bootstrap') 

### **(e)** Use a residual bootstrap with asymptotic refinement to test $H_0$.

In [8]:
px.histogram(x=tB,labels={'x':'tB'},title='Residual bootstrap with asymptotic refinement').show()
print(' 2.5%% bootstrap quantile: %8.3f' % np.quantile(tB,0.025))
print('97.5%% bootstrap quantile: %8.3f' % np.quantile(tB,0.975))

 2.5% bootstrap quantile:   -2.128
97.5% bootstrap quantile:    2.050


The critical values based on the studentized bootstrap distribution are $-2.128$ and $2.050$. The rejection region based on the bootstrap critical values is:
- $T\leq -2.128$ or $T\geq 2.050$.

Note that the critical values based on the residual bootstrap are larger (in absolute value) than those based on the paired bootstrap and quite close to the values from the Student-*t* distribution. We cannot reject $H_0:\beta=1$, since $t_{obs}=1.117$ lies between the two critical values.
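
Unlike the paired bootstrap, a residual bootstrap can also generate samples from the restricted (null-imposed) model, in which case $T^*$ is centred at the hypothesised value rather than at $\hat{\beta}$. The sketch below illustrates this variant; it is not the approach used in the solution above, and the helper `slope_t`, the simulated data, and the reading of $N(2,2)$ as "variance 2" are all assumptions.

```python
import numpy as np

def slope_t(y, x, beta_null):
    """Slope estimate and t-statistic for H0: beta = beta_null in y = a + b*x + e."""
    n = y.size
    xc = x - x.mean()
    sxx = xc @ xc
    b = (xc @ y) / sxx
    a = y.mean() - b * x.mean()
    res = y - a - b * x
    s2 = (res @ res) / (n - 2)
    return b, (b - beta_null) / np.sqrt(s2 / sxx)

rng = np.random.default_rng(1)
n, beta_null, B = 20, 1.0, 999
x = rng.normal(2.0, np.sqrt(2.0), size=n)    # assumption: N(2,2) means variance 2
y = 2.0 + 1.0 * x + rng.normal(size=n)       # simulated sample, not the class data

_, t_obs = slope_t(y, x, beta_null)

# Restricted fit: beta is fixed at its null value, only the intercept is estimated.
alpha_r = np.mean(y - beta_null * x)
res_r = y - alpha_r - beta_null * x          # restricted residuals (mean zero)

t_star = np.empty(B)
for i in range(B):
    e_star = rng.choice(res_r, size=n, replace=True)   # resample restricted residuals
    y_star = alpha_r + beta_null * x + e_star          # bootstrap sample under H0
    _, t_star[i] = slope_t(y_star, x, beta_null)       # centred at the null value

lo, hi = np.quantile(t_star, [0.025, 0.975])
reject = (t_obs < lo) or (t_obs > hi)
print(f"t_obs = {t_obs:.3f}, bootstrap critical values: {lo:.3f}, {hi:.3f}, reject: {reject}")
```

Because the bootstrap samples satisfy $H_0$ by construction, the statistic is centred at $\beta=1$ instead of at $\hat{\beta}$.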

# 11-2

A sample of size 20 is generated according to the following DGP. Two regressors are generated by $x_1\sim \chi^2(4)-4$ and $x_2 \sim 3.5+\mathcal{U}[1,2]$; the error is from a mixture of normals with $u \sim N(0,25)$ with probability 0.3 and $u \sim N(0,5)$ with probability 0.7; and the dependent variable is $y=1.3\cdot x_1+0.7\cdot x_2+0.5\cdot u$.

### **(a)** Estimate by OLS the model $y=\beta_0+\beta_1\cdot x_1+\beta_2\cdot x_2+u$

In [9]:
y =np.array([-1.68394399, 1.89893235  ,5.587108425, 4.040390467,13.20263535 ,12.9103882  ,4.742519161,-0.22837419 , 0.997667496, 3.917611056,5.264028901,5.470083648,-2.736489722,-0.700599201, 2.968735541, 1.731689435,-1.626249678, 0.246495836, 3.040399679,4.966098157])
x1=np.array([-2.466038711,-0.039161303,0.746740798, 0.370547493, 5.807525562, 6.442885266,0.296510257,-3.434583623,-2.372860055,-2.95813237 ,0.499211306,1.106643338,-1.587499657,-2.36526624 ,-0.789818973,-0.553989793,-3.098652021,-1.793674939, 1.674597685,0.167786355])
x2=np.array([ 5.156698473, 4.709976091,4.963574179, 5.441221931, 5.216287314, 4.576500608,5.148811731, 5.168776938, 4.699506648, 5.411501169,4.847301769,5.211460017, 5.048981452, 4.572994183, 5.334505106, 4.877380298, 4.557628807, 5.14582404 , 4.727755237,4.95463344])
u =np.array([-4.175565193,-2.69428244 ,2.283686924,-0.50035325 , 4.002901992, 2.662173856,1.50577523 , 1.236881328, 1.585461827, 7.950264637,2.44388593 ,0.766850592,-8.41405437,-1.653698034 , 0.522693263,-1.924580086,-1.576684432,-2.047607142,-4.892011955,2.559464974])
N=len(y)
beta0=0                                   # true DGP coefficients (not used in the estimation below)
beta1=1.3
beta2=0.7
const=np.ones(N)
X=np.vstack( (const,x1,x2) ).T
k=np.shape(X)[1]
b_ols,SE,res = OLS(y,X)
print('                      C          X1         X2')
print('OLS estimates:    %7.4f   %7.4f   %7.4f' % (b_ols[0],b_ols[1],b_ols[2]) )
print('Standard errors: (%7.4f) (%7.4f) (%7.4f)' % (SE[0],SE[1],SE[2]) )
s2 = (res @ res)/(N-k)
V = s2*np.linalg.inv(X.T@X)
print('Covariance matrix')
with np.printoptions(precision=4, suppress=True):
    print(V)

                      C          X1         X2
OLS estimates:    -8.7716    1.4738    2.4641
Standard errors: ( 7.2042) ( 0.1535) ( 1.4426)
Covariance matrix
[[ 51.9007  -0.093  -10.3767]
 [ -0.093    0.0236   0.0197]
 [-10.3767   0.0197   2.081 ]]


### **(b)** Suppose we are interested in estimating the quantity $\gamma=\beta_1+\beta_2^2$  from the data. Use the least-squares estimates to estimate this quantity. Use the delta method to obtain approximate standard error for this function.

- Delta method: let $\theta=\left(\begin{array}{c}\beta_1 \\ \beta_2 \end{array}\right)$, so that $\gamma=h(\theta)=\beta_1+\beta_2^2$.
- Then we have $R(\theta)=\frac{\partial h(\theta)}{\partial \theta'}=(1,2\beta_2)$, so that $V(\hat{\gamma})\approx R(\hat{\theta})V(\hat{\theta})R(\hat{\theta})'$ (see p. 231 of the book; § 7.2.8).
- This means that $V(\hat{\gamma})\approx V(\hat{\beta}_1)+4\hat{\beta}_2^2 V(\hat{\beta}_2)+4\hat{\beta}_2 Cov(\hat{\beta}_1,\hat{\beta}_2).$

Determine the point estimate and its standard error.

In [10]:
gamma_hat=b_ols[1]+b_ols[2]**2
R=np.array([0,1,2*b_ols[2]])
se_gamma=np.sqrt(R@V@R.T)                 # using vector & matrix notation :-)
tstat=(gamma_hat-1)/se_gamma
print('gamma_hat (SE): %7.4f (%7.4f)' % (gamma_hat,se_gamma) )
print('t-stat: %7.4f' % tstat);

gamma_hat (SE):  7.5456 ( 7.1245)
t-stat:  0.9187


Manual check:
- $\hat{\gamma}=1.47377+2.46412^2=7.546$
- $SE[\hat{\gamma}]=\sqrt{0.0236+4\cdot 2.46412^2\cdot 2.0810+4\cdot 2.46412\cdot 0.0197}=\sqrt{50.760}=7.125$
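
The manual check can also be reproduced numerically, including a finite-difference check of the gradient. The sketch below uses the rounded estimates and covariance matrix printed in part (a); the helper `numerical_gradient` is a hypothetical name introduced for illustration.

```python
import numpy as np

# Rounded values copied from the OLS output in part (a)
b = np.array([-8.7716, 1.4738, 2.4641])
V = np.array([[ 51.9007, -0.0930, -10.3767],
              [ -0.0930,  0.0236,   0.0197],
              [-10.3767,  0.0197,   2.0810]])

def h(beta):
    """gamma = beta_1 + beta_2^2 (the intercept beta_0 does not enter)."""
    return beta[1] + beta[2] ** 2

def numerical_gradient(f, beta, step=1e-6):
    """Central finite-difference gradient (hypothetical helper)."""
    grad = np.zeros_like(beta)
    for j in range(beta.size):
        e = np.zeros_like(beta)
        e[j] = step
        grad[j] = (f(beta + e) - f(beta - e)) / (2 * step)
    return grad

R_analytic = np.array([0.0, 1.0, 2 * b[2]])   # (0, 1, 2*beta_2)
R_numeric = numerical_gradient(h, b)
se_gamma = np.sqrt(R_analytic @ V @ R_analytic)

print("gradients agree:", np.allclose(R_analytic, R_numeric, atol=1e-5))
print(f"gamma_hat = {h(b):.3f}, delta-method SE = {se_gamma:.3f}")
```

A numerical gradient is a convenient safeguard when $h(\cdot)$ is more complicated than in this exercise and the analytic derivative is error-prone.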
    

### **(c)** Then estimate the standard error of $\hat{\gamma}$ using a paired bootstrap. Compare this to $SE[\hat{\gamma}]$ from part (b) and explain the difference. For the bootstrap use $B=25$ and $B=200$.

In [11]:
w=np.vstack( (y,x1,x2) ).T                # make pairs
for BOOTREP in (25,200,1000):
    gammaB_hat=np.zeros(BOOTREP)              # initialise to zero
    se_gammaB=np.zeros(BOOTREP)
    tB=np.zeros(BOOTREP)
    np.random.seed(3)
    for b in range(BOOTREP):
        index=np.random.randint(N,size=N)  # select the indices  
        wB=np.copy(w[index,])              # resample from data
        yB=wB[:,0]
        XB=np.vstack( (const,wB[:,1],wB[:,2]) ).T
        bB_ols,SEB,resB=OLS(yB,XB)         # obtain bootstrap estimates using OLS(.)-function    
        gammaB_hat[b]=bB_ols[1]+bB_ols[2]**2;
        RB=np.array([0,1,2*bB_ols[2]])
        s2B=(resB @ resB)/(N-k)
        VB=s2B*np.linalg.inv(XB.T@XB)
        se_gammaB[b]=np.sqrt(RB@VB@RB.T)
        tB[b]=(gammaB_hat[b]-gamma_hat)/se_gammaB[b]
    print('Results paired bootstrap (B=%d):' % BOOTREP)
    print('  Bootstrapped SE:  %7.4f' % np.std(gammaB_hat,ddof=1))
    print('  t-stat (SE.boot): %7.4f\n' % ((gamma_hat-1)/np.std(gammaB_hat,ddof=1)) )

Results paired bootstrap (B=25):
  Bootstrapped SE:   4.6326
  t-stat (SE.boot):  1.4130

Results paired bootstrap (B=200):
  Bootstrapped SE:   6.6318
  t-stat (SE.boot):  0.9870

Results paired bootstrap (B=1000):
  Bootstrapped SE:   7.9764
  t-stat (SE.boot):  0.8206



For $B=25$ we get $SE_{Boot}[\hat{\gamma}]=4.633$, while for $B=200$ we obtain $6.632$ and for $B=1000$ we obtain $7.976$. With only 25 replications the bootstrap standard error is too noisy to be reliable. The bootstrap standard errors for $B=200$ and $B=1000$ are fairly close to the delta-method standard error of $7.125$ from part (b).
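
How noisy a bootstrap standard error is for a given $B$ can be gauged by repeating the whole bootstrap several times and looking at the spread of the resulting estimates. The sketch below does this for the slope of a simple regression on simulated data (an assumption chosen to keep the example short; it is not the $\hat{\gamma}$ of this exercise), with `paired_bootstrap_se` as a hypothetical helper.

```python
import numpy as np

def paired_bootstrap_se(y, x, B, rng):
    """Paired-bootstrap SE of the OLS slope in y = a + b*x + e (hypothetical helper)."""
    n = y.size
    idx = rng.integers(0, n, size=(B, n))           # B resamples of row indices
    xb, yb = x[idx], y[idx]                         # resample (y, x) pairs jointly
    xc = xb - xb.mean(axis=1, keepdims=True)
    slopes = (xc * yb).sum(axis=1) / (xc ** 2).sum(axis=1)
    return slopes.std(ddof=1)

# Simulated data (assumed, not the class data set)
rng = np.random.default_rng(0)
n = 20
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

spread = {}
for B in (25, 200, 1000):
    # repeat the bootstrap 50 times to see how much the SE estimate itself varies
    se_draws = np.array([paired_bootstrap_se(y, x, B, rng) for _ in range(50)])
    spread[B] = se_draws.std(ddof=1)
    print(f"B={B:5d}: mean SE = {se_draws.mean():.4f}, sd across runs = {spread[B]:.4f}")
```

The spread across runs shrinks as $B$ grows, which is why a single run with $B=25$ can land far from the value obtained with a larger number of replications.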

### **(d)** Now test $H_0:\gamma=1$ at level 0.05 using a paired bootstrap with $B=999$. Perform bootstrap tests without asymptotic refinement, i.e. using $SE_{Boot}[\hat{\gamma}]$ of (c),  and with asymptotic refinement, i.e. using $T^*=(\hat{\gamma}^*-\hat{\gamma})/SE(\hat{\gamma}^*)$.

#### Without asymptotic refinement

For $B=999$, we get $SE_{Boot}[\hat{\gamma}]=7.800$, so the t-statistic becomes $t_{obs}=(7.546-1)/7.800=0.839$. (The $B=1000$ run in part (c) gives the similar values $7.976$ and $0.821$.)

Critical values based on asymptotic N(0,1) distribution:
- $T\leq -z_{0.025}=-1.96$ or $T\geq z_{0.025}=1.96.$

There is insufficient evidence at 5% significance level to reject $H_0:\gamma=1$.

In [12]:
print(' 2.5%% quantile: %8.3f' % np.quantile(tB,0.025))
print('97.5%% quantile: %8.3f' % np.quantile(tB,0.975))

 2.5% quantile:  -11.383
97.5% quantile:    1.424


#### With asymptotic refinement

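Asymptotically pivotal test statistic: $T=(\hat{\gamma}-\gamma)/SE(\hat{\gamma})$, so we bootstrap $T^*=(\hat{\gamma}^*-\hat{\gamma})/SE(\hat{\gamma}^*)$.

From the sample, we get $t_{obs}=(7.546-1)/7.1245=0.919$.

The bootstrap critical values are $-11.524$ and $1.427$. (The cell In [12] above, which reuses the $B=1000$ draws from part (c), prints the very similar quantiles $-11.383$ and $1.424$.)

Again, the null hypothesis cannot be rejected, since $t_{obs}=0.919$ lies between the two critical values. The simulated bootstrap distribution based on $B=999$ looks like this: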

In [13]:
import plotly.express as px
px.histogram(x=tB,labels={'x':'tB'},title='Paired bootstrap with asymptotic refinement')