# Exercise 1: Simulate Data and Run OLS
**Zhentao Shi**


### Data Generation
If the data are generated from the linear model $Y = X \beta+ \epsilon$, we can estimate $\beta$ from the observable $Y$ and $X$. This notebook is a step-by-step illustration.

We first generate $X$ and $\epsilon$, and then we can generate $Y$ according to the linear model. We set the true parameter as $\beta = (0.5, 1)$.

In [1]:
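using Distributions

n = 100                       # sample size
b0 = [ 0.5; 1.0 ]             # true parameter: intercept 0.5, slope 1.0

X = 2 * rand(Normal(), n, 1)  # regressor drawn from N(0, 4)
X = hcat(ones(n, 1), X)       # prepend a column of ones for the intercept
e = rand(Normal(), n)         # standard normal error term
Y = X * b0 + e                # outcome generated by the linear model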

100-element Array{Float64,1}:
  4.25688 
  1.02422 
 -4.07049 
  0.372016
  0.603023
 -1.97952 
  0.615611
  3.46563 
  2.50096 
 -0.493374
  2.44082 
 -1.60009 
  0.251801
  ⋮       
 -3.81217 
  0.831073
  1.17887 
  1.14129 
 -0.736798
 -0.467653
 -2.85293 
  0.783885
 -1.56785 
  0.268431
  0.122983
 -1.98263 
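
The draws above change every time the cell is run, so the printed numbers will not be reproduced exactly. Below is a minimal sketch of a reproducible variant. It assumes Julia 1.x and uses the standard-library `Random` module with `randn` in place of `rand(Normal(), ...)`. The seed value 2024 is an arbitrary choice and is not part of the original notebook.

```julia
using Random                 # standard library; provides Random.seed!

Random.seed!(2024)           # arbitrary seed (assumption); any integer works

n  = 100
b0 = [0.5, 1.0]              # true (intercept, slope)

x = 2 .* randn(n)            # regressor with standard deviation 2
X = hcat(ones(n), x)         # n×2 design matrix with a constant column
e = randn(n)                 # standard normal errors
Y = X * b0 + e               # outcome generated by the linear model

size(X), length(Y)           # dimensions of the simulated data
```

Fixing the seed makes the simulated sample, and therefore every estimate computed from it, identical across runs.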

### Parameter Estimation
In the estimation step, we will try to recover $\beta$ using only $Y$ and $X$. In the real world, we have no access to $\epsilon$.
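
As a brief reminder of where the estimator used in this step comes from, here is the standard textbook argument, stated under the assumption that $X$ has full column rank. OLS chooses the coefficient vector that minimizes the sum of squared residuals:

$$\hat{\beta} = \arg\min_{b} \, (Y - Xb)'(Y - Xb).$$

Setting the derivative with respect to $b$ to zero gives the normal equations

$$-2 X'(Y - X\hat{\beta}) = 0 \quad \Longleftrightarrow \quad X'X \hat{\beta} = X'Y,$$

which can be solved for $\hat{\beta}$ whenever $X'X$ is invertible.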

We know that the OLS estimator has a closed-form solution $$\hat{\beta} = (X'X)^{-1} X'Y.$$ We translate the mathematical expression literally into code.

In [2]:
bhat = inv( X' * X) * (X' * Y )

2-element Array{Float64,1}:
 0.431186
 0.998171
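
The literal translation is convenient for teaching, but forming the inverse explicitly is usually avoided in numerical work. The sketch below simulates its own data with an arbitrary seed, which is an assumption rather than the notebook's sample, and compares the textbook formula with Julia's backslash operator, which solves the least-squares problem for a rectangular `X` through a matrix factorization.

```julia
using LinearAlgebra, Random

Random.seed!(1)                       # arbitrary seed (assumption)
n = 100
X = hcat(ones(n), 2 .* randn(n))      # constant plus one regressor
Y = X * [0.5, 1.0] + randn(n)         # same model as in the notebook

bhat_inv = inv(X' * X) * (X' * Y)     # textbook closed-form formula
bhat_ls  = X \ Y                      # least-squares solve, no explicit inverse

maximum(abs.(bhat_inv - bhat_ls))     # gap between the two solutions
```

For a small, well-conditioned design like this one, the two results should agree up to floating-point rounding. The backslash form tends to be the safer choice when the columns of `X` are nearly collinear.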

The estimate is indeed close to the true parameter. It shows that the law of large numbers is not an empty promise.
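
To see the law of large numbers at work, one can repeat the exercise for increasing sample sizes. The following is a sketch under assumptions: `simulate_ols` is a hypothetical helper introduced only for this illustration, and the seed and the sample sizes are arbitrary choices.

```julia
using Random

Random.seed!(123)                        # arbitrary seed (assumption)

# Hypothetical helper: simulate n observations from the model
# and return the OLS estimate of b0.
function simulate_ols(n, b0)
    X = hcat(ones(n), 2 .* randn(n))
    Y = X * b0 + randn(n)
    return X \ Y
end

b0 = [0.5, 1.0]
sample_sizes = [100, 10_000, 1_000_000]

# Largest absolute estimation error at each sample size
errors = [maximum(abs.(simulate_ols(n, b0) - b0)) for n in sample_sizes]
```

The entries of `errors` typically shrink at roughly the rate $1/\sqrt{n}$, although in any single run a smaller sample can happen to produce a smaller error by chance.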

Next, we calculate the t-statistic for $\beta_2$, the slope coefficient, under the null hypothesis $\beta_2 = 0$. Again, we translate
$$ T = \frac{\hat{\beta}_2}{\sqrt{ \hat{\sigma}^2 \left[ (X'X)^{-1} \right]_{22} }  } $$ into code.

In [3]:
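e_hat = Y - X * bhat                      # OLS residuals
bhat2 = bhat[2]                           # estimated slope coefficient

sigma_hat_square = sum(e_hat.^2)/(n-2)    # error variance estimate with n-2 degrees of freedom
sig_B = inv( X' * X ) * sigma_hat_square  # estimated variance matrix of bhat
t_value = bhat2 / sqrt( sig_B[2,2] )      # t-statistic for the null of a zero slope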

20.96060169175572

The result shows that the null hypothesis $\beta_2 = 0$ can be rejected at any commonly used significance level.
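
The rejection can be made explicit by comparing the statistic with a critical value and computing a p-value. The sketch below assumes the classical setting of this simulation, with normal homoskedastic errors, so that $T$ follows a Student t distribution with $n-2$ degrees of freedom under the null. It hard-codes the t-value printed above and uses `TDist`, `ccdf`, and `quantile` from the `Distributions` package.

```julia
using Distributions

n = 100
t_value = 20.96060169175572              # value reported in the output above

dist    = TDist(n - 2)                   # reference distribution under the null
p_value = 2 * ccdf(dist, abs(t_value))   # two-sided p-value
crit_5  = quantile(dist, 0.975)          # two-sided 5% critical value

abs(t_value) > crit_5                    # true means the null is rejected at the 5% level
```

Here `crit_5` is roughly 1.98, far below the observed statistic, and `p_value` is numerically indistinguishable from zero at conventional levels.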