# Case 2: Properties of OLS and simulation methods

by Milan Van den Heuvel, Ken Bastiaensen, Gonzalo Villa
*Advanced Econometrics 2016-2017.

Strict set of GM assumptions:
* X is deterministic, x is thus fixed over repeated samples
* errors $\mu$ are normally distributed with assumed homoscedastic errors

#### Question: *Give the small sample and asymptotic properties of the OLS estimator for $\beta$ and for the estimator of the standard errors.*

Small sample properties:
* OLS is the best unbiased estimator
* The estimator is normally distributed (stems from the fact that $\hat{\beta}$ is linear function of the disturbance vector $\mu$)
* The covariance matrix $\sigma^2(X'X)^{-1}$ can be estimated with an unbiased estimator of $\sigma^2$ given by:

$$\hat{\sigma}^2 = \frac{\hat{\mu}'\hat{\mu}}{N-K} = \frac{y'My}{N-K}$$


Asymptotic properties:
* same under the GM conditions
* $\bar{x}_N$ assymptotically approaches $N(\mu,\frac{\sigma^2}{N})$

# Part 1: Properties of Monte Carlo simulations

In [1]:
# include functions from file
include("functions_lib.jl"); 

In [2]:
# Note that we can use unicode for identifiers by using latex and tab completion (e.g. \beta+<TAB>)
β₀ = 10
β₁ = 1
β  = [β₀, β₁]
σ² = 1
T  = 50 # sample size
runs = 10_000 # underscore for readability, doesn't affect number

10000

We create a function to run MC simulations:
1. specify a population = N(5,2) and draw a sample once to have a deterministic sample.
1. simulate y by simulating errors with variance $\sigma^2$ (= 1 here).
1. run ols and store results.
1. The 'true' standard errors is the standard deviation over all estimated $\hat\beta$ (True SE = $\sum_{run=0}^{runs} se(\hat\beta_{run}$)).
1. return true value and mean of estimated for $\beta$ as well as for its standard error.


In [3]:
using Distributions: Normal, TDist, ccdf, fit

In [4]:
# simple implementation
function mc_simple(β, σ², T, runs)
    K = length(β)
    
    # simulate X once, deterministically
    X  = hcat(ones(T), rand(Normal(5, 2), T, K-1)) #concatenation of column of ones for the constant terms and the randomly drawn x's for the beta terms 
    
    # variables with mc results
    β_mc    = zeros(runs, K)
    β_var_mc= zeros(runs, K)

    # pre-allocate memory to speed up value-allocation process
    Xβ = X * β
    μ_dist = Normal(0, √σ²) 
    
    for run = 1:runs
        y = Xβ + rand(μ_dist, T)
        result = ols(y, X)
        
        β_mc[run, :] = result.coefs
        β_var_mc[run, :] = diag(result.vcv)
    end
    
    return β, mean(β_mc,1), sqrt(mean(β_var_mc,1)), std(β_mc,1)
end

mc_simple (generic function with 1 method)

In [5]:
mc_simple(β, σ², T, runs)

([10,1],
[10.0006 1.00006],

[0.338415 0.0620775],

[0.341192 0.0625976])

#### Interlude: Julia speedups
You can profile the code to identify possible speedups. We see that most of the time is spent in solving OLS. Because X is deterministic and we only need to factorize it once. Changing this part of the code almost doubles the speed. See the bottom of this notebook.

### Comparison to true standard errors
We see that the mean of the estimated standard errors are close to the 'true' standard errors, even when running only 100 simulations for 25 samples:

In [6]:
for T = [25, 50, 100, 500]
    True_β, Est_β, True_σ, Est_σ = mc_simple(β, σ², T, 100)
    println("For sample size: ", T, " True_β: ", True_β," Est_β: ", Est_β, " True_σ: ", True_σ, " Est_σ: ", Est_σ)
end

For sample size: 25 True_β: [10,1] Est_β: [10.0165 0.997491] True_σ: [0.542967 0.0879833] Est_σ: [0.463739 0.0760987]
For sample size: 50 True_β: [10,1] Est_β: [9.99839 0.999259] True_σ: [0.332692 0.062161] Est_σ: [0.303678 0.0541545]
For sample size: 100 True_β: [10,1] Est_β: [9.99387 0.999491] True_σ: [0.218675 0.0418844] Est_σ: [0.192082 0.0372891]
For sample size: 500 True_β: [10,1] Est_β: [9.99949 1.00009] True_σ: [0.11618 0.0221551] Est_σ: [0.123534 0.0245455]


From these results we clearly see that under the strict assumptions of Gauss-Markov the OLS estimator for $\beta$ ($\hat{\beta}$) and the estimator for the standard deviation ($\hat{\sigma}$) of this estimator are unbiased and very close to the true values. Since the error terms are pulled from a normal distribution and $\hat{\beta}$ is a weighted sum of these, it is itself also normally distributed.

# TODO: We can test normality with , fit(Normal, β_mc[:,1]) but does this make sense? Since it is assumed that the errors are normally distributed

### t-test
We now perform a t-test for several null hypothesis for $\beta_1 = 1; 0.9; 0.8$ and this for several sample sizes, we also report the p-values.

### TODO: what is the power or size of the t-test? Maybe I (Ken) did not understand this correctly.

In [7]:
runs = 10_000
for T = [25, 50, 100, 500, 10000]
    println("## T = ",T," ##")
    for β₁_hyp = [1, 0.9, 0.7, 0.5]
        _, β_mean, _, β_se = mc_simple(β, σ², T, runs)
        K = size(β_mean)[1] #amount of estimated parameters = amount of d.o.f. lost
        ttest = (β_mean[2] - β₁_hyp) / β_se[2]
        pval  = 2 * ccdf(TDist(T-K), abs(ttest)) # what is the change that if you reject a correct null
        println("β₁=", β₁_hyp, "; T-test: ", ttest)
        println("β₁=", β₁_hyp, "; P-val: ", pval)
    end
end

## T = 25 ##
β₁=1.0; T-test: 0.0060736090013119685
β₁=1.0; P-val: 0.9952041950056328
β₁=0.9; T-test: 1.0530422551435958
β₁=0.9; P-val: 0.30280873325977387
β₁=0.7; T-test: 3.100813639376858
β₁=0.7; P-val: 0.0048779905454368255
β₁=0.5; T-test: 5.4091483070015665
β₁=0.5; P-val: 1.48041689492419e-5
## T = 50 ##
β₁=1.0; T-test: -0.0025418572173852697
β₁=1.0; P-val: 0.9979822140145709
β₁=0.9; T-test: 1.2893395023387546
β₁=0.9; P-val: 0.20333328273644719
β₁=0.7; T-test: 4.646940594556088
β₁=0.7; P-val: 2.5658434377650467e-5
β₁=0.5; T-test: 6.679799136129635
β₁=0.5; P-val: 2.0794038580488235e-8
## T = 100 ##
β₁=1.0; T-test: 0.0037406686708817277
β₁=1.0; P-val: 0.9970229125209689
β₁=0.9; T-test: 1.9510889508416305
β₁=0.9; P-val: 0.05387316616972994
β₁=0.7; T-test: 5.69193558613669
β₁=0.7; P-val: 1.2839639062642421e-7
β₁=0.5; T-test: 9.667700698816956
β₁=0.5; P-val: 5.808007395946908e-16
## T = 500 ##
β₁=1.0; T-test: -0.012285964121941842
β₁=1.0; P-val: 0.9902023758702949
β₁=0.9; T-test: 4.29909

Here we see the consistency of the monte carlo simulation. With T growing, the distribution of the estimated parameters grows more peaked around the real values.

## MC with stochastic variables

X and $\mu$ still independent

OLS estimator is still:
* unbiased 
* efficient 

but in small samples no longer necessarily normally distributed and the standard covariance matrix should be interpreted as being conditional on X. Since however X is taken from a normal distribution, the deviation from normality of the OLS estimator will not be large.

In [21]:
function mc_stoch(β, σ², T, runs)
    K = length(β)
    
    # variables with mc results
    β_mc    = zeros(runs, length(β))
    β_var_mc= zeros(runs, length(β))

    # pre-allocate
    μ_dist = Normal(0, √σ²)
    X_dist = Normal(5, 2)
    
    X = ones(T, K)
    for run = 1:runs
        # simulate inside the loop
        X[:, 2:end] = rand(X_dist, T, K-1)
        y = X*β + rand(μ_dist, T)
        result = ols(y, X)
        
        β_mc[run, :] = result.coefs
        β_var_mc[run,:] = diag(result.vcv)
    end
    
    return β, mean(β_mc,1), sqrt(mean(β_var_mc,1)), std(β_mc,1), fit(Normal, β_mc[:,1])
end



mc_stoch (generic function with 1 method)

In [27]:
mc_stoch(β, σ², 10, 5)

([10,1],
[9.32497 1.1224],

[0.923496 0.155],

[1.27586 0.199973],

Distributions.Normal{Float64}(μ=9.324970913771605, σ=1.1411617422844988))

In [25]:
runs = 10_000
for T = [25, 50, 100, 500]
    println("## T = ",T," ##")
    for β₁_hyp = [1, 0.9, 0.7, 0.5]
        _, β_mean, _, β_se = mc_stoch(β, σ², T, runs)
        K = size(β_mean)[1] #amount of estimated parameters = amount of d.o.f. lost
        ttest = (β_mean[2] - β₁_hyp) / β_se[2]
        pval  = 2 * ccdf(TDist(T-K), abs(ttest)) # what is the change that you reject a correct null
        println("β₁=", β₁_hyp, "; T-test: ", ttest)
        println("β₁=", β₁_hyp, "; P-val: ", pval)
    end
end

## T = 25 ##
β₁=1.0; T-test: -0.0029588588003249387
β₁=1.0; P-val: 0.9976636330856012
β₁=0.9; T-test: 0.9250232577225256
β₁=0.9; P-val: 0.36416206205623913
β₁=0.7; T-test: 2.823112841286666
β₁=0.7; P-val: 0.00940948971765497
β₁=0.5; T-test: 4.738214590054209
β₁=0.5; P-val: 8.079945651991213e-5
## T = 50 ##
β₁=1.0; T-test: -0.0030221903386905
β₁=1.0; P-val: 0.9976009153868829
β₁=0.9; T-test: 1.3915116684132078
β₁=0.9; P-val: 0.17035570515021242
β₁=0.7; T-test: 4.091408374438089
β₁=0.7; P-val: 0.00015941531212344947
β₁=0.5; T-test: 6.857823265695702
β₁=0.5; P-val: 1.1016171203220574e-8
## T = 100 ##
β₁=1.0; T-test: 0.01774304827518647
β₁=1.0; P-val: 0.9858795481481266
β₁=0.9; T-test: 1.9745924190704789
β₁=0.9; P-val: 0.051100047639186366
β₁=0.7; T-test: 5.9398718426254336
β₁=0.7; P-val: 4.2556725716321566e-8
β₁=0.5; T-test: 9.753685074389109
β₁=0.5; P-val: 3.7706183530730843e-16
## T = 500 ##
β₁=1.0; T-test: 0.021174110247224452
β₁=1.0; P-val: 0.9831152306276953
β₁=0.9; T-test: 4.4370843

## TODO: what's the conclusion here?

SHOULD BE: in small samples no longer normally distributed and the standard covariance matrix should be interpreted as being conditional on X

BUT OUR SIMULATIONS SEEM TO STILL GIVE VERY GOOD RESULTS WITH RELATIVELY LOW SAMPLE SIZE AND RUNS.

# Lagged Dependent Variable

Introducing lagged dependent variables makes it so that the assumption "X and $\mu$ are independent" has to be relaxed to $E[\mu_t|x_t] = 0$ or thus that the errors are contemporaneously independent with any explanatory variables.

The OLS estimator becomes:
* Biased: $E[\hat{\beta}|X] = \beta + (X'X)^{-1}X'E[\mu|X]$ => $E[\hat{\beta}] = E_X(E[\hat{\beta}|X]) \neq \beta$
* Consistent and asymptotically normally distributed: $plim\hat{\beta} = \beta + plim \frac{X'X}{T}^{-1} plim\frac{X'\mu}{T}$ = 0 because $plim\frac{X'\mu}{T} = E(x_t\mu_t) = 0$
* $\hat{\sigma}^2 = \frac{\hat{\mu}'\hat{\mu}}{T-k}$ is still a consistent estimator for $\sigma^2$

In [11]:
# AR1 MC simulation
function mc_ar1(β, σ², T, runs)
    K = length(β)
    β₀, β₁ = β
    σ = √σ² # = sqrt(σ²)
    
    # variables with mc averages
    β_mc     = zeros(runs, K)
    β_var_mc = zeros(runs, K)

    # pre-allocate
    y = zeros(T)
    X = ones(T, K) # fill second column with y_{t-1}
    y₀_dist = Normal(β₀/(1-β₁), sqrt(σ²/(1-β₁^2)))
    
    for run = 1:runs
        # simulate y
        y₀ = rand(y₀_dist) 
        y[1] = β₀ + β₁*y₀ + σ*randn() 
        for t = 2:T
            y[t] = β₀ + β₁*y[t-1] + σ*randn() 
        end
        # copy into X
        X[1,2] = y₀
        X[2:end, 2] = y[1:end-1]
        
        # ols
        result = ols(y, X)
        β_mc[run,:]    = result.coefs
        β_var_mc[run,:]= diag(result.vcv)
    end
    
    return β, mean(β_mc,1), sqrt(mean(β_var_mc,1)), std(β_mc,1)
end

mc_ar1 (generic function with 1 method)

In [12]:
β₀, β₁ = 10, 0.1
σ² = 1
T = 1000
runs = 10_000
@time mc_ar1([β₀, β₁], σ², T, runs)

  1.049828 seconds (1.16 M allocations: 1.150 GB, 24.74% gc time)


([10.0,0.1],
[10.0124 0.0988979],

[0.351253 0.0314839],

[0.348914 0.0313006])

Let's plot the bias

In [13]:
using Plots
gr();

In [14]:
Ts  = vcat(collect(10:10:90), collect(100:25:500))
β₁s = [0, 0.5, 0.9]
β̂ = [mc_ar1([β₀, β₁], σ², T, runs)[2][1] for T in Ts, β₁ in β₁s]

26×3 Array{Float64,2}:
 10.9795  14.7284  46.9781
 10.5006  12.5083  30.2191
 10.3471  11.6767  23.5593
 10.2375  11.23    20.1191
 10.1949  11.0061  18.1422
 10.1691  10.8147  16.7824
 10.1324  10.6807  15.6284
 10.118   10.6353  14.9455
 10.1127  10.5494  14.4287
 10.0969  10.4893  14.0054
 10.0764  10.4085  13.0908
 10.0658  10.3243  12.6708
 10.0599  10.2779  12.202 
 10.0507  10.2677  11.8743
 10.039   10.2317  11.6509
 10.0324  10.2044  11.538 
 10.0324  10.1936  11.428 
 10.0278  10.157   11.2659
 10.0386  10.1478  11.2317
 10.0144  10.1343  11.0474
 10.0228  10.1375  11.0029
 10.0379  10.135   10.93  
 10.0259  10.1131  10.8937
 10.0275  10.1097  10.8319
 10.0171  10.1017  10.7894
 10.0171  10.1123  10.7677

In [15]:
plot(Ts, β̂, label=string.(β₁s'))

Given a certain sample size and estimated AR(1) coefficient, you can use the matrix for $\hat\beta$ (or the graph) to estimate the bias for $\beta_1$ (note that reported values are relative to 10).

## Appendix: Julia performance profiling

In [16]:
# around 4s on my laptop
@time mc_simple(β, σ², T, 100_000);

  7.614691 seconds (11.20 M allocations: 12.241 GB, 22.79% gc time)


In [17]:
using ProfileView #run `Pkg.add("ProfileView")` if not yet installed

LoadError: ArgumentError: Module ProfileView not found in current path.
Run `Pkg.add("ProfileView")` to install the ProfileView package.

In [18]:
Profile.clear()
@profile mc_simple(β, σ², T, 1_000)
ProfileView.view() #interactive graph with mouse over,  scroll and drag

LoadError: UndefVarError: ProfileView not defined

In [19]:
# implementation that factorizes X only once
function mc_fact(β, σ², T, runs)
    
    # simulate X once, deterministically
    X = hcat(ones(T), rand(Normal(5, 2), T))
    
    # variables with mc results    
    β_mc    = zeros(runs, length(β))
    β_var_mc= zeros(runs) #only keep σ̂²T = dot(μ̂, μ̂) = σ̂²*(T-K) per run

    # pre-allocate
    Xβ     = X * β
    μ_dist = Normal(0, √σ²)
    x_fact = factorize(X)
    XtXinvd= diag(inv(X'*X))
    
    for run = 1:runs
        y = Xβ  + rand(μ_dist, T)
        β̂ = x_fact \ y #factorization already done now
        μ̂ = y - X * β̂
        σ̂²T = dot(μ̂, μ̂) #put factor /(T-K) outside of loop
        
        β_mc[run, :]    = β̂
        β_var_mc[run,:] = σ̂²T
    end
    se_true = std(β_mc, 1)
    se_mc   = sqrt(mean(β_var_mc) / (T - length(β)) * XtXinvd)
    return β, mean(β_mc, 1), se_true, se_mc
end

mc_fact (generic function with 1 method)

In [20]:
# runs in about 2s on my laptop
mc_fact(β, σ², 25, 1) # first run includes JIT compilation
@time mc_fact(β, σ², T, 100_000) 

  7.946002 seconds (6.80 M allocations: 10.332 GB, 43.29% gc time)


([10,1],
[10.0001 0.999961],

[0.0846251 0.0157348],

[0.0844292,0.0156817])