# __Monte Carlo Simulation for OLS Regression__

<br>

In [1]:
using Random
using StatsKit
using StatsPlots

### __Step 1: Design the True/Unknown Data-Generating Process__

In [2]:
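n = 1000           # sample size
X = rand(n, 2);    # n×2 matrix of Uniform(0, 1) draws

In [3]:
X[:, 1] .= 1.0;    # overwrite the first column with ones (the intercept column)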

In [4]:
X[1:10,:]

10×2 Array{Float64,2}:
 1.0  0.137567
 1.0  0.656139
 1.0  0.0721945
 1.0  0.297141
 1.0  0.804731
 1.0  0.5953
 1.0  0.124095
 1.0  0.876785
 1.0  0.210262
 1.0  0.265158

### __Step 1 - Part A__

<br>

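Select the true parameter values: an intercept $\alpha = 0.98$ and a slope $\beta = 2.34$. In the code both values are stored in the vector `β`, in the same order as the columns of `X`.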

<br>

The regression equation is then:

$$
y_{i} = \alpha + \beta X_{i} + \varepsilon_{i}
$$
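
The code works with the stacked (matrix) form of the same equation. The notation here is an assumption made for illustration: $\mathbf{X}$ is the design matrix built above (a column of ones next to the regressor) and $\boldsymbol{\theta}$ is the parameter vector that the code calls `β`.

$$
y = \mathbf{X} \boldsymbol{\theta} + \varepsilon, \qquad
\mathbf{X} = \begin{bmatrix} 1 & X_{1} \\ \vdots & \vdots \\ 1 & X_{n} \end{bmatrix}, \qquad
\boldsymbol{\theta} = \begin{bmatrix} \alpha \\ \beta \end{bmatrix}
$$

Row $i$ of this system is exactly $y_{i} = \alpha + \beta X_{i} + \varepsilon_{i}$.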

<br>

In [5]:
β = [0.98, 2.34]

2-element Array{Float64,1}:
 0.98
 2.34

In [6]:
u = 0.1 * randn(n);   # error term: n draws from a normal with mean 0 and sd 0.1
y = X * β + u;        # one simulated sample from the data-generating process

In [7]:
betahat = llsq(X, y; bias=false)   # bias=false: X already contains the column of ones

2-element Array{Float64,1}:
 0.9710240780494982
 2.3379347787034614
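
As a cross-check on the fit above, the same estimate can be computed with base Julia only. The block below is a minimal sketch, not the notebook's method: it rebuilds a comparable data set (the seed is arbitrary and the names `b_normal` and `b_qr` are introduced here) and compares the normal-equations solution $(X'X)^{-1}X'y$ with the backslash operator, which solves the least-squares problem for a tall matrix.

```julia
using Random

Random.seed!(42)                    # arbitrary seed, only so this sketch is repeatable

n = 1000
X = hcat(ones(n), rand(n))          # column of ones plus one Uniform(0, 1) regressor
β = [0.98, 2.34]                    # true intercept and slope, as in the notebook
y = X * β + 0.1 * randn(n)          # one simulated sample

b_normal = (X' * X) \ (X' * y)      # solve the normal equations X'X b = X'y
b_qr     = X \ y                    # least-squares solve on the tall matrix directly

isapprox(b_normal, b_qr; atol=1e-8) # the two routes should agree closely
```

Both routes should land near the true values, and either can stand in for `llsq(X, y; bias=false)` when MultivariateStats is not loaded.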

### __Step 2: Run a Sampling Theory Simulation__

In [8]:
m = 1_000_000

1000000

In [9]:
betahat = zeros(m);

In [10]:
for i in 1:m
    ε = 0.1 * randn(n)                # fresh error draw: the only source of randomness
    y = X * β + ε                     # new sample from the data-generating process ("population")
    betatmp = llsq(X, y; bias=false)  # OLS fit by linear least squares
    betahat[i] = betatmp[2]           # keep the slope coefficient estimate
end
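
The same experiment can be wrapped in a function, which keeps the loop variables out of global scope and makes the run repeatable. This is a sketch under stated assumptions: `simulate_slopes` is a hypothetical helper, not part of the notebook; it uses base Julia's `X \ y` in place of `llsq`; and the seed and the smaller replication count are arbitrary choices to keep it quick.

```julia
using Random, Statistics

# Hypothetical helper: repeat the experiment `m` times with a fixed design
# matrix `X` and return the slope estimate from each replication.
function simulate_slopes(rng, X, β, σ, m)
    n = size(X, 1)
    slopes = zeros(m)
    for i in 1:m
        ε = σ * randn(rng, n)       # fresh error draw for this replication
        y = X * β + ε               # simulated sample from the DGP
        slopes[i] = (X \ y)[2]      # least-squares fit; keep the slope only
    end
    return slopes
end

rng = MersenneTwister(1234)                 # arbitrary seed for repeatability
X = hcat(ones(1000), rand(rng, 1000))       # same design as above: ones plus a uniform regressor
slopes = simulate_slopes(rng, X, [0.98, 2.34], 0.1, 2_000)

(mean(slopes), std(slopes))                 # centre and spread of the sampling distribution
```

With far fewer replications than the `m = 1_000_000` used here, the mean and standard deviation will be noisier, but they should tell the same story.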

In [11]:
(round(mean(betahat); digits=3), round(std(betahat); digits=3))

(2.34, 0.011)
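
The simulated standard deviation can be compared with the textbook sampling variance of the OLS slope. This is a rough check, assuming the regressor behaves like a Uniform(0, 1) sample so that its variance is close to $1/12$; the exact figure depends on the particular `X` that was drawn. Here $\sigma = 0.1$ is the error standard deviation and $n = 1000$.

$$
\operatorname{Var}(\hat{\beta} \mid X) = \frac{\sigma^{2}}{\sum_{i=1}^{n} (X_{i} - \bar{X})^{2}} \approx \frac{\sigma^{2}}{n / 12} = \frac{0.1^{2}}{1000 / 12} = 1.2 \times 10^{-4}
$$

$$
\operatorname{sd}(\hat{\beta}) \approx \sqrt{1.2 \times 10^{-4}} \approx 0.011
$$

This agrees with the simulated value, and the simulated mean of 2.34 matches the true slope, as expected for an unbiased estimator.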

In [12]:
size(y)

(1000,)

In [13]:
typeof(y)

Array{Float64,1}