# Bootstrapping a Linear Regression

## Loading Packages

```julia
using Dates, DelimitedFiles, Statistics, LinearAlgebra, Random

include("jlFiles/printmat.jl")
include("jlFiles/printTable.jl")
include("jlFiles/OlsFn.jl")
include("jlFiles/NWFn.jl")
```

## Loading Data

The regressions used below are of the type

$$
y_t = x_t'b + u_t
$$

where $y_t$ is the 1-year excess return on a bond, sampled monthly, and $x_t$ contains a constant and the forward rates lagged 12 months.
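The 12-month lag comes from pairing `rx[13:end]` with `f[1:end-12,:]`, as in the data cell below. A toy check of that alignment follows. It is a minimal sketch: the series `rx_toy` and `f_toy` are hypothetical, with values equal to the month number so the pairing is easy to read.

```julia
# Toy monthly series: each value equals its month number
rx_toy = collect(1.0:20.0)                          # hypothetical excess returns, months 1..20
f_toy  = [collect(1.0:20.0) collect(101.0:120.0)]   # two hypothetical forward-rate columns

lag   = 12
y_toy = rx_toy[lag+1:end]                            # returns from month 13 onwards
x_toy = [ones(size(f_toy,1)-lag) f_toy[1:end-lag,:]] # constant and forward rates from month t-12

println(size(x_toy), " ", length(y_toy))
println("y from month ", y_toy[1], " is paired with the forward rate from month ", x_toy[1,2])
```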

```julia
xx  = readdlm("Data/BondPremiaPs.csv",',',skipstart=1)
rx  = xx[:,5]                     #bond excess returns
f   = xx[:,6:end]                 #forward rates

x = [ones(size(f,1)-12) f[1:end-12,:]]   #regressors: constant and forward rates lagged 12 months
y = rx[13:end]                           #dependent variable

(T,n) = (size(y,1),size(y,2))            #no. obs and no. test assets
K     = size(x,2)                        #no. regressors

println("T = $T, n = $n, K = $K")
```

```
T = 580, n = 1, K = 6
```


## Point Estimates

```julia
(bLS,u,yhat,Covb,) = OlsFn(y,x)              #OLS estimate, residuals, fitted values and classical Cov(b)
StdbLS = sqrt.(diag(Covb))                   #Covb is Cov(b), so these are std errors of b

printblue("OLS estimates:")
rowNames = [string("x",i) for i=1:K]
printTable([bLS  StdbLS],["coeff","std"],rowNames)
```

```
OLS estimates:
       coeff       std
x1    -3.306     0.943
x2    -4.209     0.583
x3    10.627     4.378
x4   -14.397    13.989
x5     7.096    18.094
x6     1.284     8.058
```
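`OlsFn` is loaded from `jlFiles/OlsFn.jl`, which is not shown here. The following is a minimal sketch of what such a function typically computes, assuming classical iid errors: $\hat b = (X'X)^{-1}X'y$ and $\text{Cov}(\hat b) = s^2 (X'X)^{-1}$. The name `ols_classical` is hypothetical, and its degrees-of-freedom correction ($T-K$) may differ from the divisor the helper actually uses. Simulated data keep the notebook's `T`, `x`, `y` and `u` untouched.

```julia
using LinearAlgebra, Random

# Hypothetical helper (not the notebook's OlsFn): OLS with classical iid standard errors
function ols_classical(y::AbstractVector, x::AbstractMatrix)
    T, K = size(x)
    b    = x \ y                     # least-squares estimate, (x'x)^(-1) x'y
    u    = y - x*b                   # fitted residuals
    s2   = dot(u, u)/(T - K)         # residual variance with degrees-of-freedom correction
    Covb = s2 * inv(x'*x)            # classical Cov(b)
    return b, u, Covb
end

# Usage on simulated data with known coefficients [1, 2]
rng_b = MersenneTwister(1)
T_sim = 500
x_sim = [ones(T_sim) randn(rng_b, T_sim)]
y_sim = x_sim*[1.0, 2.0] + 0.5*randn(rng_b, T_sim)
b_sim, u_sim, Covb_sim = ols_classical(y_sim, x_sim)
println("b = ", round.(b_sim, digits=2), ", std = ", round.(sqrt.(diag(Covb_sim)), digits=3))
```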



## Bootstrap

In each loop, a new series of residuals, $\tilde{u}_{t}$, is created by drawing $T$ values, with replacement, from the fitted residuals. Then, simulated values of the dependent variable are created as $\tilde{y}_{t}=x_{t}^{\prime}\hat{b}+\tilde{u}_{t}$, where $\hat{b}$ is the OLS estimate, and we redo the estimation on ($\tilde{y}_{t},x_{t}$).

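This is repeated `NSim` times, and the standard deviation of the `NSim` bootstrap estimates of each coefficient is used as its standard error.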
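The same residual bootstrap can be written as a standalone function and checked on simulated data. This is a minimal sketch: it uses Julia's backslash operator in place of `OlsFn`, assuming both give the same point estimates, and the names `resid_bootstrap`, `x_sim` and `y_sim` are hypothetical. With iid errors, the bootstrap standard errors should be close to the classical ones.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical helper: iid residual bootstrap of OLS coefficients, regressors held fixed
function resid_bootstrap(y, x, NSim; rng=Random.default_rng())
    T, K  = size(x)
    bHat  = x \ y                              # OLS estimate (stands in for OlsFn)
    uHat  = y - x*bHat                         # fitted residuals
    bBoot = fill(NaN, NSim, K)
    for i in 1:NSim
        t_i        = rand(rng, 1:T, T)         # T row numbers drawn with replacement
        ytilde     = x*bHat + uHat[t_i]        # artificial dependent variable
        bBoot[i,:] = x \ ytilde                # re-estimate on (ytilde, x)
    end
    return bHat, bBoot
end

rng_c = MersenneTwister(42)
T_sim = 400
x_sim = [ones(T_sim) randn(rng_c, T_sim)]
y_sim = x_sim*[0.5, 1.0] + randn(rng_c, T_sim)    # iid errors

bHat, bBootSim = resid_bootstrap(y_sim, x_sim, 1000; rng=rng_c)
StdBoot = vec(std(bBootSim, dims=1))
println("bootstrap std: ", round.(StdBoot, digits=3))
```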

```julia
NSim = 2000                      #no. of simulations
Random.seed!(123)

bBoot   = fill(NaN,(NSim,K))
for i = 1:NSim                                       #loop over simulations
  #local t_i, utilde, ytilde, b_i                    #only needed in REPL/script
  t_i        = rand(1:T,T)                           #T random row numbers from 1:T (with replacement)
  #println(t_i)                                      #uncomment to see which rows are picked out
  utilde     = u[t_i]
  ytilde     = x*bLS + utilde
  b_i,       = OlsFn(ytilde,x)                       #trailing comma discards the remaining outputs
  bBoot[i,:] = b_i
end

printblue("Coefficients:")
xx = [bLS  mean(bBoot,dims=1)']
printTable(xx,["OLS","avg. bootstr"],rowNames,width=20)

printblue("Std:")
xx = [StdbLS std(bBoot,dims=1)']
printTable(xx,["OLS","bootstr"],rowNames,width=20)

printstyled("looks like no particular need for bootstrap in this case, but...see below",color=:red,bold=true)
```

```
Coefficients:
                   OLS        avg. bootstr
x1              -3.306              -3.322
x2              -4.209              -4.238
x3              10.627              10.803
x4             -14.397             -14.763
x5               7.096               7.401
x6               1.284               1.200

Std:
                   OLS             bootstr
x1               0.943               0.831
x2               0.583               0.711
x3               4.378               4.582
x4              13.989              13.108
x5              18.094              16.073
x6               8.058               6.960

looks like no particular need for bootstrap in this case, but...see below
```

## Block Bootstrap

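To handle autocorrelated residuals, we now consider a *block bootstrap*. Autocorrelation is to be expected here: the dependent variable is a 1-year return sampled monthly, so consecutive observations overlap by 11 months. The bootstrap above draws residuals one at a time and therefore destroys any such dependence.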


In each loop, we first draw a random starting row for each block (using the `rand()` function) and then create a vector of all rows that are in a block. For instance, suppose the draws say that the blocks should start on rows $27$ and $35$ (assuming only two blocks in each simulation) and that each block should contain $10$ rows. The artificial sample then picks out rows $27$ to $36$ and $35$ to $44$. Clearly, some rows can be in several blocks. A block that runs past row $T$ wraps around to the start of the sample. Since the number of blocks is $T/$`BlockSize` rounded up, the blocks together give at least $T$ rows; we keep the first $T$ of them and use them to define a new series of residuals, $\tilde{u}_{t}$.
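The example with blocks starting on rows 27 and 35 can be reproduced with the same indexing steps as the block bootstrap loop in this section, but with the starting rows fixed instead of drawn. This is a sketch only: `starts` and the short sample length `T_toy = 40` are illustrative values, not taken from the data.

```julia
BlockSize_toy = 10
starts  = [27, 35]                                  # starting rows from the example in the text
rows    = starts .+ collect(0:BlockSize_toy-1)'     # 2 x 10, each row is a block
rowsVec = vec(rows')                                # blocks stacked: 27,...,36, 35,...,44
println(rowsVec)

# Wrap-around with a short hypothetical sample of T_toy = 40 rows
T_toy   = 40
wrapped = copy(rows)
vv      = wrapped .> T_toy
wrapped[vv] = wrapped[vv] .- T_toy                  # rows 41,...,44 become 1,...,4
println(vec(wrapped'))
```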

Then, new values of the dependent variable are created as $\tilde{y}_{t}=x_{t}^{\prime}\hat{b}+\tilde{u}_{t}$ (with $\hat{b}$ the OLS estimate) and we redo the estimation on ($\tilde{y}_{t},x_{t}$).
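A self-contained version of the whole procedure might look as follows. It is a minimal sketch under stated assumptions: the helper names `block_rows` and `block_bootstrap` are hypothetical, backslash stands in for `OlsFn`, and the wrap-around uses `mod1`, which matches the "subtract $T$ if the row number exceeds $T$" step as long as `BlockSize` $\le T$. The usage example simulates AR(1) errors so that the residuals are autocorrelated by construction.

```julia
using Random, Statistics

# Hypothetical helper: row numbers for one circular block bootstrap sample
function block_rows(T, BlockSize; rng=Random.default_rng())
    nBlocks = ceil(Int, T/BlockSize)              # enough blocks to cover T rows
    starts  = rand(rng, 1:T, nBlocks)             # random starting row of each block
    rows    = starts .+ collect(0:BlockSize-1)'   # nBlocks x BlockSize, each row is a block
    rows    = mod1.(rows, T)                      # wrap around: T+1 -> 1, T+2 -> 2, ...
    return vec(rows')[1:T]                        # stack the blocks, keep the first T rows
end

# Hypothetical helper: block bootstrap of OLS coefficients, regressors held fixed
function block_bootstrap(y, x, BlockSize, NSim; rng=Random.default_rng())
    T, K  = size(x)
    bHat  = x \ y                                 # OLS estimate (stands in for OlsFn)
    uHat  = y - x*bHat                            # fitted residuals
    bBoot = fill(NaN, NSim, K)
    for i in 1:NSim
        ytilde     = x*bHat + uHat[block_rows(T, BlockSize; rng=rng)]
        bBoot[i,:] = x \ ytilde                   # re-estimate on (ytilde, x)
    end
    return bHat, bBoot
end

# Usage on simulated data with AR(1) errors
rng_e  = MersenneTwister(7)
T_ar   = 300
eps_ar = randn(rng_e, T_ar)
u_ar   = zeros(T_ar)
for t in 2:T_ar
    u_ar[t] = 0.8*u_ar[t-1] + eps_ar[t]
end
x_ar = [ones(T_ar) randn(rng_e, T_ar)]
y_ar = x_ar*[0.0, 1.0] + u_ar

bHatAR, bBootAR = block_bootstrap(y_ar, x_ar, 10, 500; rng=rng_e)
println("block bootstrap std: ", round.(vec(std(bBootAR, dims=1)), digits=3))
```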

```julia
BlockSize = 10                   #no. of rows in each block
NSim      = 2000                 #no. of simulations
Random.seed!(123)

nBlocks = ceil(Int,T/BlockSize)                      #no. of blocks, rounded up
bBoot   = fill(NaN,(NSim,K*n))                       #vec(b), [beq1 beq2..beqn]
for i = 1:NSim                                       #loop over simulations
  #local t_i, vv_i, utilde, ytilde, b_i              #only needed in REPL/script
  t_i        = rand(1:T,nBlocks,1)                   #nBlocks x 1, random starting row of each block
  t_i        = t_i .+ collect(0:BlockSize-1)'        #nBlocks x BlockSize, each row is a block
  vv_i       = t_i .> T
  t_i[vv_i]  = t_i[vv_i] .- T                        #wrap around if row number > T
  #println(t_i)                                      #uncomment to see which rows are picked out
  t_i        = vec(t_i')                             #stack the blocks into one column vector
  utilde     = u[t_i,:]
  ytilde     = x*bLS + utilde[1:T,:]                 #keep the first T rows
  b_i,       = OlsFn(ytilde,x)                       #trailing comma discards the remaining outputs
  bBoot[i,:] = b_i
end

printblue("Std:")
xx = [StdbLS std(bBoot,dims=1)']
printTable(xx,["OLS","bootstr"],rowNames,width=20)

printstyled("this bootstrap handles autocorrelation",color=:red,bold=true)
```

```
Std:
                   OLS             bootstr
x1               0.943               2.066
x2               0.583               1.398
x3               4.378               8.301
x4              13.989              23.732
x5              18.094              29.594
x6               8.058              13.115

this bootstrap handles autocorrelation
```