# Bootstrapping a Linear Regression

This notebook implements a standard residual bootstrap and a block bootstrap to obtain more robust standard errors for OLS coefficients.

## Load Packages and Extra Functions

In [1]:
using Printf, DelimitedFiles, Statistics, LinearAlgebra, Random

include("jlFiles/printmat.jl")
include("jlFiles/Ols.jl")
include("jlFiles/CovNWFn.jl")

CovNWFn

## Loading Data

The regressions used below are of the type

$
y_t = x_t'b + u_t
$

where $y_t$ is the 1-year excess return on a bond (monthly data) and $x_t$ contains a constant and forward rates lagged 12 months.

In [2]:
xx  = readdlm("Data/BondPremiaPs.csv",',',skipstart=1)
rx  = xx[:,5]                     #bond excess returns
f   = xx[:,6:end]                 #forward rates, several columns

x = [ones(size(f,1)-12) f[1:end-12,:]]   #regressors
y = rx[13:end]                           #dependent variable


(T,n) = (size(y,1),size(y,2))            #no. obs and no. test assets
K     = size(x,2)                        #no. regressors

println("T = $T, n = $n, K = $K")

T = 580, n = 1, K = 6


## Point Estimates

In [3]:
(bLS,u,yhat,Covb,) = OlsGMFn(y,x)            #OLS estimate and traditional std errors
StdbLS = sqrt.(diag(Covb))                   #traditional std errors, so Covb is taken to be Cov(bLS)

printblue("OLS estimates:\n")
rowNames = [string("x",i) for i=1:K]
printmat([bLS  StdbLS],colNames=["coeff","std (trad.)"],rowNames=rowNames,width=15)

OLS estimates:

            coeff    std (trad.)
x1         -3.306          0.824
x2         -4.209          0.712
x3         10.627          4.513
x4        -14.397         12.896
x5          7.096         15.876
x6          1.284          6.904
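
`OlsGMFn` is loaded from `jlFiles/Ols.jl` and its code is not shown in this notebook. As a rough guide to what such a function computes, here is a minimal sketch based on the classical OLS formulas. The name `OlsSketch`, the simulated data and the choice to divide the residual sum of squares by $T$ are assumptions made for illustration; the actual `OlsGMFn` may differ.

```julia
using LinearAlgebra, Random

# Hypothetical stand-in for OlsGMFn (the real function is in jlFiles/Ols.jl and may differ)
function OlsSketch(y,x)
    T    = size(x,1)
    b    = x\y                       #OLS coefficients
    u    = y - x*b                   #fitted residuals
    s2   = sum(abs2,u)/T             #residual variance, no degrees-of-freedom correction (assumption)
    Covb = s2*inv(x'x)               #classical Cov(b), relies on iid residuals
    return b, u, Covb
end

Random.seed!(1)
xDemo = [ones(200) randn(200,2)]                 #simulated regressors, not the bond data
yDemo = xDemo*[1.0,0.5,-0.5] + randn(200)        #simulated dependent variable
(bDemo,uDemo,CovbDemo) = OlsSketch(yDemo,xDemo)
StdDemo = sqrt.(diag(CovbDemo))                  #std errors of the coefficients

println("coeff: ", round.(bDemo,digits=3))
println("std:   ", round.(StdDemo,digits=3))
```

Because this covariance formula assumes iid residuals, it serves as the benchmark that the bootstrap standard errors are compared against.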



## Standard Bootstrap (I)

In each loop, a new series of residuals, $\tilde{u}_{t}$, is created by drawing $T$ values (with replacement) from the fitted residuals. Then, simulated values of the dependent variable are created as $\tilde{y}_{t}=x_{t}^{\prime}\hat{b}+\tilde{u}_{t}$, where $\hat{b}$ is the OLS point estimate (`bLS`), and we redo the estimation on ($\tilde{y}_{t},x_{t}$).

This is repeated `NSim` times.
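
In symbols, the bootstrap standard error of coefficient $k$ is the standard deviation of the estimates across the simulations, which is what `std(bBoot,dims=1)` computes in the cell that follows. The notation is introduced here for illustration only: $N$ stands for `NSim` and $\tilde{b}_{k}^{(i)}$ for the estimate of coefficient $k$ in simulation $i$.

$
\text{Std}(\tilde{b}_{k}) = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(\tilde{b}_{k}^{(i)}-\bar{b}_{k}\right)^{2}}, \quad \bar{b}_{k} = \frac{1}{N}\sum_{i=1}^{N}\tilde{b}_{k}^{(i)}
$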

In [4]:
NSim      = 2000                 #no. of simulations
Random.seed!(123)

bBoot   = fill(NaN,(NSim,K))
for i = 1:NSim                                       #loop over simulations
  #local t_i, utilde, ytilde                         #local/global is needed in script
  t_i        = rand(1:T,T)                           #T random numbers from 1:T (with replacement)
  #println(t_i)                                      #uncomment to see which rows are picked out
  utilde     = u[t_i]
  ytilde     = x*bLS + utilde[1:T]
  bBoot[i,:] = OlsGMFn(ytilde,x)[1]
end

printblue("Coefficients:")
xx = [bLS  mean(bBoot,dims=1)']
printmat(xx,colNames=["OLS","avg. bootstr"],rowNames=rowNames,width=20)

printblue("Std:")
xx = [StdbLS std(bBoot,dims=1)']
printmat(xx,colNames=["trad.","bootstrap 1"],rowNames=rowNames,width=20)

printred("looks like there is no particular need for bootstrap in this case, but...see below")

Coefficients:
                   OLS        avg. bootstr
x1              -3.306              -3.299
x2              -4.209              -4.192
x3              10.627              10.478
x4             -14.397             -14.010
x5               7.096               6.687
x6               1.284               1.438

Std:
                 trad.         bootstrap 1
x1               0.824               0.821
x2               0.712               0.715
x3               4.513               4.465
x4              12.896              12.628
x5              15.876              15.513
x6               6.904               6.750

looks like there is no particular need for bootstrap in this case, but...see below


## Block Bootstrap (II)

To handle autocorrelated residuals, we now consider a *block bootstrap*.
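
A quick way to judge whether autocorrelation is a concern is to look at the sample autocorrelations of the residuals. The sketch below uses a hypothetical helper, `AutoCorrSketch`, and simulated series rather than the bond data. Applying the same helper to the fitted residuals, for instance `AutoCorrSketch(u,12)`, would be the analogous check in this notebook.

```julia
using Statistics, Random

# Hypothetical helper: sample autocorrelations of u at lags 1 to L
function AutoCorrSketch(u,L)
    T = length(u)
    return [cor(u[1+s:T],u[1:T-s]) for s = 1:L]
end

Random.seed!(42)
T0   = 2000
eps0 = randn(T0)                     #iid series (simulated)
uAR  = zeros(T0)                     #AR(1) series with persistence 0.9 (simulated)
for t = 2:T0
    uAR[t] = 0.9*uAR[t-1] + eps0[t]
end

rhoIID = AutoCorrSketch(eps0,3)
rhoAR  = AutoCorrSketch(uAR,3)
println("iid series:   ", round.(rhoIID,digits=3))
println("AR(1) series: ", round.(rhoAR,digits=3))
```

Autocorrelations close to zero suggest that drawing residuals one at a time loses little, while clearly positive values indicate that the draws should preserve the time dependence.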


In each loop, we first draw a random starting point (row number) for each block, using the `rand()` function, and then build a vector of all rows that belong to a block. For instance, suppose the blocks are drawn to start on rows 27 and 35 and that each block contains 10 rows. The artificial sample then picks out rows 27 to 36 and 35 to 44. Clearly, some rows can appear in several blocks, and a block that runs past row $T$ wraps around to the start of the sample. Once we have $T$ rows, we use them to define a new series of residuals, $\tilde{u}_{t}$.

Then, new values of the dependent variable are created as $\tilde{y}_{t}=x_{t}^{\prime}\hat{b}+\tilde{u}_{t}$, where $\hat{b}$ is the OLS point estimate, and we redo the estimation on ($\tilde{y}_{t},x_{t}$).
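
The worked example (blocks starting on rows 27 and 35, with 10 rows each) can be reproduced with fixed rather than random starting points. `BlockRowsSketch` is a hypothetical helper written for this illustration. It uses `mod1` so that a block running past the last row continues from row 1, which is the same wrap-around rule as in `DrawBlocksFn` below.

```julia
# Hypothetical helper: expand given block starts into row indices, wrapping past T
function BlockRowsSketch(starts,BlockSize,T)
    rows = starts .+ collect(0:BlockSize-1)'     #nBlocks x BlockSize, each row is a block
    rows = mod1.(rows,T)                         #wrap around if index > T
    return vec(rows')                            #stack the blocks into one vector
end

rows     = BlockRowsSketch([27,35],10,580)       #the example from the text
rowsWrap = BlockRowsSketch([578],5,580)          #a block that runs past the last row
println(rows)
println(rowsWrap)
```

The first call returns rows 27 to 36 followed by rows 35 to 44, so rows 35 and 36 appear twice. The second call returns 578, 579, 580, 1, 2.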

In [5]:
"""
    DrawBlocksFn(T,BlockSize)

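Draw a vector of indices `t_i` that can be used to create bootstrap residuals.
The indices form `ceil(T/BlockSize)` blocks of length `BlockSize`, so the vector has at least
`T` elements (exactly `T` when `BlockSize` divides `T`). Blocks that run past `T` wrap around.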

"""
function DrawBlocksFn(T,BlockSize)
    nBlocks     = ceil(Int,T/BlockSize)                 #number of blocks, rounded up
    t_i         = rand(1:T,nBlocks,1)                   #nBlocks x 1, random starting row of blocks
    t_i         = t_i .+ collect(0:BlockSize-1)'        #nBlocks x BlockSize, each row is a block
    vv_i        = t_i .> T
    t_i[vv_i]   = t_i[vv_i] .- T                        #wrap around if index > T
    #println(t_i)                                       #uncomment to see which rows are picked out
    t_i         = vec(t_i')                             #column vector of the blocks
    return t_i
end

DrawBlocksFn

In [6]:
Random.seed!(1234567)
BlockSize = 5                   #size of blocks

printblue("illustrating how to draw 25 observations, in blocks of $BlockSize:\n")
t_i = DrawBlocksFn(25,BlockSize)

println("each column in the printout is a block")
printmat(reshape(t_i,BlockSize,:))

illustrating how to draw 25 observations, in blocks of 5:

each column in the printout is a block
    18        17         7        10         3    
    19        18         8        11         4    
    20        19         9        12         5    
    21        20        10        13         6    
    22        21        11        14         7    



In [7]:
BlockSize = 10                   #size of blocks
NSim      = 2000                 #no. of simulations
Random.seed!(123)

bBoot2  = fill(NaN,(NSim,K*n))
for i = 1:NSim                                       #loop over simulations
    #local t_i, utilde, ytilde                       #local/global is needed in script
    t_i         = DrawBlocksFn(T,BlockSize)
    utilde      = u[t_i]
    ytilde      = x*bLS + utilde[1:T]
    bBoot2[i,:] = OlsGMFn(ytilde,x)[1]
end

printblue("Std:")
xx = [StdbLS std(bBoot,dims=1)' std(bBoot2,dims=1)']
printmat(xx,colNames=["trad.","bootstrap 1","block bootstr"],rowNames=rowNames,width=20)

printred("this bootstrap handles autocorrelation, so the stds tend to be higher")

Std:
                 trad.         bootstrap 1       block bootstr
x1               0.824               0.821               2.089
x2               0.712               0.715               1.410
x3               4.513               4.465               8.304
x4              12.896              12.628              23.666
x5              15.876              15.513              29.522
x6               6.904               6.750              13.080

this bootstrap handles autocorrelation, so the stds tend to be higher
