Here we will simulate data from a linear model and estimate it by OLS.

In [13]:
## The randn function generates draws from a standard normal
@show randn()
@show randn(5);

randn() = 0.9277612466058307
randn(5) = [1.23344, -1.48006, -0.652813, 1.52045, 1.0823]
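
Because randn draws from a standard normal, a draw from a normal with another mean and standard deviation can be obtained by shifting and scaling. The sketch below is only an illustration: the values of mu and sigma and the seed are arbitrary choices, not part of the simulation that follows. Random.seed! is used so that the draws are reproducible.

```julia
using Random, Statistics

Random.seed!(1234)                    # arbitrary seed, only for reproducibility

mu, sigma = 2.0, 3.0                  # illustrative mean and standard deviation
x = mu .+ sigma .* randn(100_000)     # draws from N(mu, sigma^2)

@show mean(x)                         # should be close to mu
@show std(x)                          # should be close to sigma
@show size(randn(3, 2));              # randn also accepts several dimensions
```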


Before we run the simulation with randn, I want to highlight a useful package, Distributions.jl. It is a one-stop shop for common probability distributions: computing their statistics, evaluating their PDF or CDF, and drawing samples from them.

In [14]:
using Distributions
d = Pareto(2., 5.)

Pareto{Float64}(α=2.0, θ=5.0)

Distributions is built around distribution objects like the one above. Here d is a Pareto distribution with shape α = 2 and scale θ = 5.

In [15]:
@show rand(d)
@show rand(d, 5)
@show cdf(d, 3.)
@show mean(d)
@show std(d);

rand(d) = 6.332489713646807
rand(d, 5) = [6.04932, 5.32044, 10.6168, 6.46565, 5.32643]
cdf(d, 3.0) = 0.0
mean(d) = 10.0
std(d) = Inf
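
Two of these outputs are worth a second look. cdf(d, 3.0) is zero because the support of a Pareto(α, θ) distribution starts at θ = 5, and std(d) is infinite because the variance of a Pareto distribution is finite only when α > 2. The sketch below checks this against the closed forms F(x) = 1 - (θ/x)^α and E[X] = αθ/(α - 1). These are standard results that I am assuming here; they are not stated in the text above.

```julia
using Distributions

α, θ = 2.0, 5.0
d = Pareto(α, θ)

@show minimum(d)                         # lower end of the support, equal to θ
@show cdf(d, 10.0) ≈ 1 - (θ / 10.0)^α    # closed-form CDF above θ
@show mean(d) ≈ α * θ / (α - 1)          # closed-form mean
@show isinf(var(d))                      # variance is infinite for α ≤ 2
@show std(Pareto(3.0, 5.0));             # finite once α > 2
```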


Distributions also supports multivariate distributions, like the multivariate normal with a covariance matrix of your choice. It also supports truncated distributions and mixture models. All of the above methods will work regardless of how fancy your distribution becomes. For example, here is a mixture of a truncated normal and an exponential distribution.

In [16]:
@show d = Truncated(Normal(2, 3), 0, Inf)
@show rand(d)
@show rand(d, 5)
@show cdf(d, 3.)
@show mean(d)
@show std(d);
d2 = Exponential(0.3)
@show dm = MixtureModel([d, d2], [0.3, 0.7])
# Note: the remaining calls are still applied to d, not to the mixture dm
@show rand(d)
@show rand(d, 5)
@show cdf(d, 3.)
@show mean(d)
@show std(d);

d = Truncated(Normal(2, 3), 0, Inf) = Truncated(Normal{Float64}(μ=2.0, σ=3.0), range=(0.0, Inf))
rand(d) = 0.04414177092599969
rand(d, 5) = [6.38473, 2.51138, 3.16347, 4.40306, 3.28885]
cdf(d, 3.0) = 0.5057690274162923
mean(d) = 3.2820527749944897
std(d) = 2.189117432240662
dm = MixtureModel([d, d2], [0.3, 0.7]) = MixtureModel{Distribution{Univariate,Continuous}}(K = 2)
components[1] (prior = 0.3000): Truncated(Normal{Float64}(μ=2.0, σ=3.0), range=(0.0, Inf))
components[2] (prior = 0.7000): Exponential{Float64}(θ=0.3)

rand(d) = 2.073703111896333
rand(d, 5) = [1.59587, 3.45169, 1.26747, 2.87791, 4.9364]
cdf(d, 3.0) = 0.5057690274162923
mean(d) = 3.2820527749944897
std(d) = 2.189117432240662
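
The last five outputs repeat the values for the truncated normal because those calls take d as their argument. To query the mixture itself, pass dm instead. The sketch below assumes a recent Distributions.jl, where the lowercase truncated function is the recommended constructor. The two checks at the end rely on the standard fact that the mean and CDF of a mixture are weighted averages of the component values.

```julia
using Distributions

d  = truncated(Normal(2, 3), 0.0, Inf)    # normal restricted to [0, Inf)
d2 = Exponential(0.3)
dm = MixtureModel([d, d2], [0.3, 0.7])    # weights 0.3 and 0.7

@show rand(dm)
@show rand(dm, 5)
@show cdf(dm, 3.0)
@show mean(dm)
@show std(dm)

# The mixture mean and CDF are weighted averages of the component values
@show mean(dm) ≈ 0.3 * mean(d) + 0.7 * mean(d2)
@show cdf(dm, 3.0) ≈ 0.3 * cdf(d, 3.0) + 0.7 * cdf(d2, 3.0);
```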

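The multivariate normal mentioned earlier works the same way. In this sketch the mean vector μ, the covariance matrix Σ, the sample size, and the seed are all made-up values for illustration.

```julia
using Distributions, Statistics, Random

Random.seed!(42)                    # arbitrary seed for reproducibility

μ = [0.0, 1.0]                      # illustrative mean vector
Σ = [1.0 0.5;
     0.5 2.0]                       # illustrative covariance matrix (symmetric, positive definite)
mv = MvNormal(μ, Σ)

draws = rand(mv, 50_000)            # each column is one draw, so draws is 2 × 50_000
@show size(draws)
@show mean(mv)                      # the mean vector
@show cov(mv)                       # the covariance matrix
@show cov(draws, dims=2)            # sample covariance, close to Σ
@show pdf(mv, [0.0, 1.0]);          # density evaluated at the mean
```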

Back to the econometrics simulation.

In [22]:
using Distributions
using Statistics # the var function is in the Statistics package
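N = 100
X = rand(N, 2)         # two regressors drawn from Uniform(0, 1)
X = [ones(N, 1) X]     # add a column of ones for the intercept
beta = [5; 2; 3]       # true coefficients
dist = Normal()
u = rand(dist, N)      # standard normal errors
Y = X*beta + u
betahat = (X' * X)\(X' * Y)            # OLS estimate
eps = Y - X*betahat                    # residuals
sigma_eps = var(eps)                   # residual variance
betahat_var = (X' * X)^-1 * sigma_eps  # estimated covariance matrix of betahat
# 95% confidence interval for the second coefficient
(β_2_lower = betahat[2] - 1.96*sqrt(betahat_var[2, 2]),
    β_2_upper = betahat[2] + 1.96*sqrt(betahat_var[2, 2]))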

(β_2_lower = 1.7558828057294056, β_2_upper = 3.0240712772195666)
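
The estimator above is betahat = (X'X)^-1 X'Y, with covariance estimated by the residual variance times (X'X)^-1. The same steps can be wrapped in a function. This is a sketch under assumptions that are not in the cell above: the helper name ols_ci is hypothetical, the residual variance divides by N - K (the usual unbiased estimator) rather than the N - 1 used by var, the seed is arbitrary, and 1.96 is the normal approximation to the 95% critical value.

```julia
using LinearAlgebra, Random

# Hypothetical helper: OLS estimates with standard errors and 95% confidence intervals
function ols_ci(X::AbstractMatrix, Y::AbstractVector; z = 1.96)
    N, K = size(X)
    betahat = X \ Y                          # least-squares solution
    resid = Y - X * betahat
    sigma2 = sum(abs2, resid) / (N - K)      # residual variance with N - K degrees of freedom
    se = sqrt.(diag(sigma2 * inv(X' * X)))   # standard errors
    return (betahat = betahat, se = se,
            lower = betahat .- z .* se, upper = betahat .+ z .* se)
end

Random.seed!(0)                              # arbitrary seed
N = 100
X = [ones(N) rand(N, 2)]
beta = [5, 2, 3]
Y = X * beta + randn(N)

fit = ols_ci(X, Y)
@show fit.betahat
@show (fit.lower[2], fit.upper[2]);
```

Returning a named tuple keeps the estimates, standard errors, and interval bounds together, so the interval for any coefficient can be read off by index.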

This is copied almost verbatim from the Matlab code; I only changed the () to [] for array indexing.

Here is the measurement error example, where noise is added to the second column of X before re-estimating:

In [24]:
Xnoise = copy(X) # This creates a copy of X and calls it Xnoise
Xnoise[:, 2] = X[:, 2] + rand(100)
betahat_noise = (Xnoise' * Xnoise)\(Xnoise' * Y)
betahat_noise[2]

1.3303752684925518
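
The estimate for the second coefficient falls well below its true value of 2, which is the usual attenuation effect of measurement error in a regressor. A single draw is noisy, so the sketch below repeats the experiment many times. The function name simulate_once, the number of replications, and the seed are illustrative choices, not part of the original code.

```julia
using Random, Statistics

# Hypothetical helper: simulate one data set and return the second coefficient
# estimated with the clean regressors and with the noisy ones
function simulate_once(N)
    X = [ones(N) rand(N, 2)]
    Y = X * [5, 2, 3] + randn(N)
    Xnoise = copy(X)
    Xnoise[:, 2] = X[:, 2] + rand(N)      # uniform noise added to the second column
    return (X \ Y)[2], (Xnoise \ Y)[2]
end

Random.seed!(1)                            # arbitrary seed
results = [simulate_once(100) for _ in 1:2_000]
clean = first.(results)
noisy = last.(results)

@show mean(clean)    # close to the true value 2
@show mean(noisy);   # noticeably smaller: biased toward zero
```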

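The measurement-error code is almost the same as the Matlab version, but remember to copy X. Writing Xnoise = X would only give the same array a second name, so the assignment to Xnoise[:, 2] would modify the original data.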
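
The reason copy is needed is that assignment in Julia binds a new name to the same array rather than duplicating it. A small illustration with a made-up 2×2 matrix A:

```julia
A = [1.0 2.0;
     3.0 4.0]

B = A            # B is another name for the same array
B[1, 1] = 100.0
@show A[1, 1]    # A has changed too

C = copy(A)      # C is an independent array
C[2, 2] = -1.0
@show A[2, 2]    # A is unchanged

@show B === A    # true: same object
@show C === A;   # false: different objects
```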