# Bootstrapping a Linear Regression

This notebook implements both a traditional bootstrap and a block bootstrap in order to get more robust standard errors of OLS coefficients.

## Load Packages and Extra Functions

In [1]:
using Printf, DelimitedFiles, Statistics, LinearAlgebra, Random

include("jlFiles/printmat.jl")
include("jlFiles/Ols.jl");

## Loading Data

The regressions used below are of the type

$
y_t = x_t'b + u_t
$

where $y_t$ is monthly data on 1-year excess returns on a 5-year bond (so there is an 11-month overlap between two consecutive data points) and $x_t$ includes a constant and the 1-year forward rates, lagged 12 months, for investments starting (0,1,2,3,4) years ahead.

In [2]:
xx  = readdlm("Data/BondPremiaPs.csv",',',skipstart=1)
rx  = xx[:,5]                     #bond excess returns
f   = xx[:,6:end]                 #forward rates, several columns

x = [ones(size(f,1)-12) f[1:end-12,:]]   #regressors
y = rx[13:end]                           #dependent variable


(T,n) = (size(y,1),size(y,2))            #no. obs and no. dependent variables
K     = size(x,2)                        #no. regressors

println("T = $T, n = $n, K = $K")

T = 580, n = 1, K = 6


## Point Estimates

In [3]:
(bLS,u,yhat,Covb,) = OlsGMFn(y,x)            #OLS estimate and traditional std errors
StdbLS = sqrt.(diag(Covb))

printblue("OLS estimates:\n")
rowNames = [string("x",'₀'+i) for i=1:K]      #'₀'+1 to get ₁
printmat(bLS,StdbLS;colNames=["coeff","std (trad.)"],rowNames=rowNames,width=15)

OLS estimates:

            coeff    std (trad.)
x₁         -3.306          0.824
x₂         -4.209          0.712
x₃         10.627          4.513
x₄        -14.397         12.896
x₅          7.096         15.876
x₆          1.284          6.904
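
`OlsGMFn` is defined in `jlFiles/Ols.jl`, which is not shown here. As a rough guide to what such a function computes, here is a minimal sketch under classical (iid error) assumptions. The name `OlsSketch`, the demo data and the choice to divide the residual variance by `T` (rather than `T-K`) are assumptions for illustration, not taken from the included file.

```julia
using LinearAlgebra

# Hypothetical stand-in for OlsGMFn (the real function is in jlFiles/Ols.jl)
function OlsSketch(y,x)
    T     = size(x,1)
    b     = x\y                        #least-squares coefficients
    yhat  = x*b                        #fitted values
    u     = y - yhat                   #residuals
    s2    = sum(abs2,u)/T              #residual variance (assumption: divide by T)
    Covb  = s2*inv(x'*x)               #classical covariance matrix of b
    return b, u, yhat, Covb
end

xDemo = [ones(5) collect(1.0:5.0)]     #constant and one regressor (made-up data)
yDemo = [2.1, 3.9, 6.2, 7.8, 10.0]
(bDemo,uDemo,_,CovbDemo) = OlsSketch(yDemo,xDemo)
StdbDemo = sqrt.(diag(CovbDemo))       #traditional standard errors
```

The traditional standard errors are the square roots of the diagonal of `Covb`, which is what the cell above extracts with `sqrt.(diag(Covb))`.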



## Standard Bootstrap (I)

In each loop, a new series of residuals, $\tilde{u}_{t}$, is created by drawing (with replacement) values from the fitted residuals (from the estimates in earlier cells). Then, simulated values of the dependent variable are created as 

$\tilde{y}_{t}=x_{t}^{\prime}b+\tilde{u}_{t}$ (with $b$ set to the OLS point estimates, `bLS`)

and we redo the estimation on ($\tilde{y}_{t},x_{t}$). Notice that $x_t$ is the same as in the data.

This is repeated `NSim` times.
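
The steps above can also be tried without the data file or the included functions. The following is a minimal sketch on simulated data with iid errors, using `x\y` for the estimation. The helper name `ResidualBootstrapSketch` and all simulated inputs are assumptions made for illustration.

```julia
using Random, Statistics, LinearAlgebra

# Hypothetical helper: residual bootstrap of OLS coefficients, keeping x fixed
function ResidualBootstrapSketch(y,x;NSim=1000,rng=Random.default_rng())
    T, K  = size(x)
    b     = x\y                          #point estimates
    u     = y - x*b                      #fitted residuals
    bBoot = fill(NaN,NSim,K)
    for i in 1:NSim
        t_i        = rand(rng,1:T,T)     #T draws from 1:T, with replacement
        ytilde     = x*b + u[t_i]        #simulated dependent variable
        bBoot[i,:] = x\ytilde            #re-estimate on (ytilde,x)
    end
    return b, u, bBoot
end

rng   = MersenneTwister(42)
TSim  = 200
xSim  = [ones(TSim) randn(rng,TSim)]           #simulated regressors (not the bond data)
ySim  = xSim*[1.0,0.5] + randn(rng,TSim)       #iid errors by construction
(bSim,uSim,bBootSim) = ResidualBootstrapSketch(ySim,xSim;NSim=1000,rng=rng)

StdBoot = vec(std(bBootSim,dims=1))                          #bootstrap std
StdTrad = sqrt.(diag(sum(abs2,uSim)/TSim*inv(xSim'*xSim)))   #classical std
```

With iid errors, `StdBoot` and `StdTrad` should be close to each other, since this bootstrap destroys any time-series dependence in the residuals.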

In [4]:
NSim      = 2000                 #no. of simulations
Random.seed!(123)

bBoot   = fill(NaN,(NSim,K))
for i = 1:NSim                                       #loop over simulations
  #local t_i, utilde, ytilde                         #local/global is needed in script
  t_i        = rand(1:T,T)                           #T random numbers from 1:T (with replacement)
  #println(t_i)                                      #uncomment to see which rows that are picked out
  utilde     = u[t_i]
  ytilde     = x*bLS + utilde[1:T]
  bBoot[i,:] = OlsGMFn(ytilde,x)[1]
end

printblue("Coefficients:")
xx = [bLS  mean(bBoot,dims=1)']
printmat(xx;colNames=["OLS","avg. bootstr"],rowNames=rowNames,width=20)

printblue("Std:")
xx = [StdbLS std(bBoot,dims=1)']
printmat(xx;colNames=["trad.","bootstrap 1"],rowNames=rowNames,width=20)

printred("The results from this bootstrap are similar to standard OLS, but...see below")

Coefficients:
                   OLS        avg. bootstr
x₁              -3.306              -3.315
x₂              -4.209              -4.225
x₃              10.627              10.693
x₄             -14.397             -14.619
x₅               7.096               7.403
x₆               1.284               1.150

Std:
                 trad.         bootstrap 1
x₁               0.824               0.828
x₂               0.712               0.722
x₃               4.513               4.576
x₄              12.896              13.011
x₅              15.876              15.924
x₆               6.904               6.891

The results from this bootstrap are similar to standard OLS, but...see below


## Block Bootstrap (II)

To handle autocorrelated residuals, we now consider a *block bootstrap*.
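
The overlap in the dependent variable is a common source of such autocorrelation. As a rough illustration on simulated data (not the bond data), overlapping 12-period sums of iid shocks are strongly autocorrelated at short lags. The helper name `AutoCorrSketch` and the simulated series are assumptions for this sketch.

```julia
using Random, Statistics

# Hypothetical helper: sample autocorrelation of z at a given lag
AutoCorrSketch(z,lag) = cor(z[1+lag:end],z[1:end-lag])

rng   = MersenneTwister(1)
e     = randn(rng,2000)                             #iid "monthly" shocks (simulated)
zOver = [sum(e[t:t+11]) for t in 1:length(e)-11]    #overlapping 12-period sums

rho1  = AutoCorrSketch(zOver,1)     #theory: 11/12, since 11 of the 12 shocks are shared
rho12 = AutoCorrSketch(zOver,12)    #theory: 0, since no shocks are shared at lag 12
```

Drawing single residuals at random breaks this dependence, while drawing whole blocks keeps it intact within each block.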


In each loop, we first draw a random starting point (observation number) for each block (by using the `rand()` function). For instance, if the draws say that the blocks should start with observations 27 and 35, and we have decided that each block should contain 10 data points, then the artificial sample will pick out observations 27 to 36 and 35 to 44. Clearly, some observations can be in several blocks. Once we have $T$ data points, the fitted residuals at those observation numbers define a new series of residuals, $\tilde{u}_{t}$.

Then, new values of the dependent variable are created as 

$\tilde{y}_{t}=x_{t}^{\prime}b+\tilde{u}_{t}$ (with $b$ set to the OLS point estimates, `bLS`)

and we redo the estimation on ($\tilde{y}_{t},x_{t}$).
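
The example with blocks starting at observations 27 and 35 can be reproduced with a short sketch. The starting points are fixed by hand here rather than drawn randomly, and the variable names are chosen for illustration only.

```julia
starts     = [27 35]                          #starting observations (fixed, not random)
BlockSizeD = 10                               #observations per block
blocks     = starts .+ (0:BlockSizeD-1)       #10x2 matrix, one block per column
shared     = intersect(blocks[:,1],blocks[:,2])   #observations that are in both blocks
t_iD       = vec(blocks)                      #stacked index vector: 27,...,36,35,...,44
```

Here `shared` contains observations 35 and 36, which appear twice in the artificial sample.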

### A Remark on the Code
- `cld(11,5)` gives 3, the same as `ceil(Int,11/5)` (so 3 blocks would be created)
- `[1 9] .+ (0:5-1)` creates a 5x2 matrix with a block in each column
- `replace(z -> z>T ? z-T : z,v)` checks each element in `v` and subtracts `T` if the element is larger than `T` (so the index wraps around to the start of the sample)
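
The three remarks can be checked directly. The following sketch uses a small made-up sample length (`TDemo = 10`); the `mod1` line is an alternative way of wrapping indices that is not used in this notebook.

```julia
nBlocksDemo = cld(11,5)                       #3, same as ceil(Int,11/5)
blockDemo   = [1 9] .+ (0:5-1)                #5x2 matrix: columns 1:5 and 9:13
TDemo       = 10                              #made-up sample length
wrapped     = replace(z -> z>TDemo ? z-TDemo : z,vec(blockDemo))   #11,12,13 become 1,2,3
wrapped2    = mod1.(vec(blockDemo),TDemo)     #alternative wrap-around, same result here
nIndices    = nBlocksDemo*5                   #15 indices for a sample of 11: keep the first 11
```

When the sample length is not a multiple of the block size, the blocks produce more indices than needed, which is why the bootstrap loop keeps only the first `T` elements (`utilde[1:T]`).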

In [5]:
"""
    DrawBlocksFn(T,BlockSize)

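Draw a vector of indices `v` that can be used to create bootstrap residuals.
The indices form blocks of length `BlockSize`. The vector has `cld(T,BlockSize)*BlockSize`
elements (at least `T`), so use `v[1:T]` if `T` is not a multiple of `BlockSize`.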

"""
function DrawBlocksFn(T,BlockSize)
    nBlocks = cld(T,BlockSize)                 #number of blocks, rounded up
    v0      = rand(1:T,nBlocks)                #nBlocks, random starting obs of blocks
    v       = vec(v0' .+ vec(0:BlockSize-1))   #each block in a column
    v       = replace(z -> z>T ? z-T : z,v)    #wrap around if index > T
    #println(v)                                  #uncomment to see result
    return v
end

DrawBlocksFn

In [6]:
Random.seed!(1234567)
BlockSize = 10                   #size of blocks

printblue("illustrating how to draw 30 observations, in blocks of $BlockSize:\n")
t_i = DrawBlocksFn(30,BlockSize)

printmat(reshape(t_i,BlockSize,:);colNames=["block 1","block 2","block 3"])

illustrating how to draw 30 observations, in blocks of 10:

   block 1   block 2   block 3
     7        11        19    
     8        12        20    
     9        13        21    
    10        14        22    
    11        15        23    
    12        16        24    
    13        17        25    
    14        18        26    
    15        19        27    
    16        20        28    



In [7]:
BlockSize = 10                   #size of blocks
NSim      = 2000                 #no. of simulations
Random.seed!(123)

bBoot2  = fill(NaN,(NSim,K*n))
for i = 1:NSim                                       #loop over simulations
    #local t_i, utilde, ytilde                       #local/global is needed in script
    t_i         = DrawBlocksFn(T,BlockSize)
    utilde      = u[t_i]
    ytilde      = x*bLS + utilde[1:T]
    bBoot2[i,:] = OlsGMFn(ytilde,x)[1]
end

printblue("Std:")
xx = [StdbLS std(bBoot,dims=1)' std(bBoot2,dims=1)']
printmat(xx;colNames=["trad.","bootstrap 1","block bootstr"],rowNames=rowNames,width=20)

printred("The block bootstrap accounts for autocorrelation, so the stds tend to be higher (since there is indeed autocorrelation)")

Std:
                 trad.         bootstrap 1       block bootstr
x₁               0.824               0.828               2.102
x₂               0.712               0.722               1.407
x₃               4.513               4.576               8.327
x₄              12.896              13.011              23.881
x₅              15.876              15.924              29.839
x₆               6.904               6.891              13.219

The block bootstrap accounts for autocorrelation, so the stds tend to be higher (since there is indeed autocorrelation)
