# Example of Basic OLS

The first few cells load functions and data. See further down for the calculations.

In [1]:
import Formatting

include("jlFiles/printmat.jl")
include("jlFiles/NWFn.jl")
include("jlFiles/HDirProdFn.jl")
include("jlFiles/OlsFn.jl")
include("jlFiles/Ols2Fn.jl")
include("jlFiles/OlsDiagnosticsFn.jl")
include("jlFiles/excise.jl")
#include("jlFiles/lagnPs.jl")

using StatsBase, Distributions

In [2]:
xx   = readdlm("Data/FFmFactorsPs.csv",',',header=true)
x    = xx[1]
ym   = x[:,1]                                      #[yearmonth]
x    = x[:,2:end]/100
Rme  = x[:,1]
RSMB = x[:,2]                #small minus big firms
RHML = x[:,3]                #high minus low book-to-market ratio

388-element Array{Float64,1}:
  0.0228
  0.0117
 -0.0067
  0.0106
  0.0162
  0.0142
  0.0175
 -0.0153
 -0.0091
 -0.0187
 -0.0334
 -0.0204
  0.0178
  ⋮     
 -0.0236
 -0.0431
  0.0023
 -0.0171
 -0.0305
 -0.0221
 -0.0066
  0.0353
  0.0096
  0.0138
 -0.0131
 -0.0231

## Point Estimates

Consider the linear regression

$$
y_{t}=\beta^{\prime}x_{t}+\varepsilon_{t},
$$

where $y_{t}$ is a scalar and $x_{t}$ is $k\times1$. The OLS estimate is

$$
\begin{align*}
\hat{\beta} &  =S_{xx}^{-1}S_{xy}, \ \text{ where }\\
S_{xx}      &  =\frac{1}{T}\sum\nolimits_{t=1}^{T}x_{t}x_{t}^{\prime}
\ \text{ and } \ S_{xy}=\frac{1}{T}\sum\nolimits_{t=1}^{T}x_{t}y_{t}.
\end{align*}
$$

(The $1/T$ terms clearly cancel, but are sometimes useful to keep to preserve
numerical precision.)
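
As a brief sketch of where this formula comes from: the OLS estimate minimizes the average squared residual, and its first-order conditions are the sample analogue of the moment conditions used in the Distribution section further down.

$$
\begin{align*}
\hat{\beta} &  =\arg\min_{b}\frac{1}{T}\sum\nolimits_{t=1}^{T}(y_{t}-x_{t}^{\prime}b)^{2}\text{, with first-order conditions }\\
0_{k\times1} &  =\frac{1}{T}\sum\nolimits_{t=1}^{T}x_{t}(y_{t}-x_{t}^{\prime}\hat{\beta})=S_{xy}-S_{xx}\hat{\beta}.
\end{align*}
$$

Solving for $\hat{\beta}$ gives $\hat{\beta}=S_{xx}^{-1}S_{xy}$, provided $S_{xx}$ is invertible.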

Instead of these sums (loops over $t$), matrix multiplication can be used to
speed up the calculations. Create matrices $X_{T\times k}$ and $Y_{T\times1}$
by letting $x_{t}^{\prime}$ and $y_{t}$ be the $t^{th}$ rows

$$
X_{T\times k}=\left[
\begin{array}[c]{l}
x_{1}^{\prime}\\
\vdots\\
x_{T}^{\prime}
\end{array}
\right] \ \text{ and } \ Y_{T\times1}=\left[
\begin{array}[c]{l}
y_{1}\\
\vdots\\
y_{T}
\end{array}
\right].
$$

We can then calculate the same matrices as
$$
\begin{align*}
S_{xx}      &  =X^{\prime}X/T \ \text{ and } \ S_{xy}=X^{\prime}Y/T\text{, so }\\
\hat{\beta} &  =(X^{\prime}X)^{-1}X^{\prime}Y.
\end{align*}
$$

However, instead of inverting $S_{xx}$, we typically get much better numerical
precision by solving the (overdetermined) system of $T$ equations in $k$ unknowns

$$
X_{T\times k}b_{k\times1}=Y_{T\times1}
$$

for the $k\times1$ vector $b$ that minimizes the sum of squared errors. This
is easily done by using the command:

b = X\Y
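
One way to see why the backslash solution tends to be more precise: forming $X^{\prime}X$ roughly squares the condition number of $X$, while `X\Y` works on $X$ directly (through a QR factorization for non-square matrices). The sketch below illustrates this on synthetic, nearly collinear data. It assumes Julia 1.x, where `LinearAlgebra` and `Random` are standard libraries; all names ending in `s` and `Tsim` are made up for the illustration and are not part of the data set used in this notebook.

```julia
using LinearAlgebra, Random                  # Julia 1.x standard libraries (assumed version)

Random.seed!(123)                            # arbitrary seed, only for reproducibility
Tsim = 200
z    = randn(Tsim)
Xs   = [ones(Tsim) z (z .+ 1e-4 .* randn(Tsim))]    # synthetic, nearly collinear regressors
Ys   = Xs*[1.0, 2.0, 3.0] + 0.1 .* randn(Tsim)      # synthetic dependent variable

b_inv = inv(Xs'*Xs)*Xs'*Ys                   # normal equations with an explicit inverse
b_qr  = Xs\Ys                                # least squares without forming X'X

println("cond(X)   = ", cond(Xs))
println("cond(X'X) = ", cond(Xs'*Xs))        # roughly cond(X)^2
println("max abs difference between the estimates: ", maximum(abs.(b_inv - b_qr)))
```

With well-conditioned regressors, as in the factor data used here, the approaches agree to many digits. The difference matters mostly when the regressors are strongly correlated or badly scaled.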

In [3]:
Y = Rme
X = [ones(size(Rme,1)) RSMB RHML]

(T,K) = size(X)
S_xx = zeros(K,K)                 #start the sums at zero matrices of the right size
S_xy = zeros(K,1)
for t = 1:T
    x_t = X[t:t,:]'               #x_t is Kx1 (here 3x1)
    y_t = Y[t:t,:]                #y_t is a 1x1 matrix, helps extending
    S_xx = S_xx + x_t*x_t'/T      #KxK
    S_xy = S_xy + x_t*y_t/T       #Kx1
end
b1 = inv(S_xx)*S_xy          #OLS coeffs, version 1

b2 = inv(X'X)*X'Y            #OLS coeffs, version 2

b3 = X\Y                     #OLS coeffs, version 3

println("\nb1, b2 and b3")
printmat([b1 b2 b3])


b1, b2 and b3
     0.007     0.007     0.007
     0.217     0.217     0.217
    -0.429    -0.429    -0.429



## Distribution

To apply the GMM formulas

$$
\sqrt{T}(\hat{\beta}-\beta_{0})\overset{d}{\rightarrow}N(0,V)
\ \text{ where } \ V=\left(  D_{0}^{\prime}S_{0}^{-1}D_{0}\right)  ^{-1}
$$

to the OLS case, first define the moment conditions

$$
g_{t}=x_{t}(y_{t}-x_{t}^{\prime}\beta),
$$

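then find $S_{0}$ (the covariance matrix of $\sqrt{T}\bar{g}$, where
$\bar{g}=\sum_{t=1}^{T}g_{t}/T$) and recall that $D_{0}$ (the derivative of
$\bar{g}$ with respect to $\beta^{\prime}$) is
$D_{0}=-\sum_{t=1}^{T}x_{t}x_{t}^{\prime}/T=-S_{xx}$.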
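
Since OLS is exactly identified, $D_{0}$ is a square $k\times k$ matrix, so the GMM formula simplifies to the sandwich form $V=D_{0}^{-1}S_{0}(D_{0}^{-1})^{\prime}=S_{xx}^{-1}S_{0}S_{xx}^{-1}$. The sketch below checks this on synthetic data. The source of `NWFn` is not shown in this notebook, so `newey_west_sketch` is a hypothetical stand-in that uses the usual Bartlett weights; details such as the demeaning step are assumptions and may differ from the included file. The code assumes Julia 1.x, and all names ending in `sim` are made up for the illustration.

```julia
using LinearAlgebra, Statistics, Random      # Julia 1.x standard libraries (assumed version)

# Hypothetical stand-in for NWFn: Newey-West estimate of Cov(sqrt(T)*mean(g)),
# with Bartlett weights 1 - s/(m+1) on the autocovariances up to lag m.
function newey_west_sketch(g, m)
    Tg = size(g, 1)
    gd = g .- mean(g, dims=1)                # demean the moment conditions (an assumption)
    S  = gd'*gd/Tg                           # lag-0 covariance matrix
    for s = 1:m
        Gamma_s = gd[s+1:end, :]'*gd[1:end-s, :]/Tg   # autocovariance at lag s
        S += (1 - s/(m+1))*(Gamma_s + Gamma_s')
    end
    return S
end

Random.seed!(42)                             # arbitrary seed, only for reproducibility
Tsim = 500
Xsim = [ones(Tsim) randn(Tsim, 2)]           # synthetic regressors
Ysim = Xsim*[0.5, 1.0, -1.0] + randn(Tsim)   # synthetic dependent variable

bsim = Xsim\Ysim
usim = Ysim - Xsim*bsim                      # residuals
gsim = Xsim .* usim                          # T x k matrix, row t is x_t'*u_t
Ssim = newey_west_sketch(gsim, 1)
Dsim = -Xsim'*Xsim/Tsim

V_gmm      = inv(Dsim'*inv(Ssim)*Dsim)       # general GMM formula
V_sandwich = inv(Dsim)*Ssim*inv(Dsim)'       # sandwich form, valid since D is square
println("max abs difference: ", maximum(abs.(V_gmm - V_sandwich)))
println("std errors: ", sqrt.(diag(V_sandwich/Tsim)))
```

With `m = 0` the sketch reduces to the heteroskedasticity-robust (White) covariance matrix, since only the lag-0 term remains.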

In [4]:
b = X\Y
u = Y - X*b              #residuals
g = X.*u                 #moment conditions, row t is x_t'*u_t (u is broadcast across the K columns)
println("\navg moment conditions")
printmat(mean(g,1))

S = NWFn(g,1)            #Newey-West covariance matrix
D = -X'X/T
V = inv(D'inv(S)*D)     #Cov(sqrt(T)*b)

println("\nb and std(b)")
printmat([b sqrt.(diag(V/T))])

(b4,res,yhat_,CovbLS_,R2_,T_,CovbNW4) = Ols2Fn(Y,X,1)
println("\nOLS with NW standard errors")
printmat([b4 sqrt.(diag(CovbNW4))])


avg moment conditions
     0.000    -0.000     0.000


b and std(b)
     0.007     0.002
     0.217     0.124
    -0.429     0.108


OLS with NW standard errors

## Testing a Hypothesis

Since the estimator $\hat{\beta}_{k\times1}$ satisfies

$$
\sqrt{T}(\hat{\beta}-\beta_{0})\overset{d}{\rightarrow}N\left(  0,V_{k\times k}\right)  ,
$$

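we can easily apply various tests. To test the joint linear hypothesis that
$\gamma=0$, where

$$
\gamma_{q\times1}=R\beta-a,
$$

with $R$ a $q\times k$ matrix and $a$ a $q\times1$ vector, use the test
$$
(R\hat{\beta}-a)^{\prime}\left(  \Lambda/T\right)  ^{-1}(R\hat{\beta}
-a)\overset{d}{\rightarrow}\chi_{q}^{2}\text{, where }\Lambda=RVR^{\prime}.
$$

The limiting distribution holds under the null hypothesis, so the hypothesis is
rejected when the test statistic exceeds the chosen critical value of the
$\chi_{q}^{2}$ distribution.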
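
The test can be sketched as a small function. The helper name `wald_stat_sketch` and all inputs ending in `_ex` are made up for the illustration (they are not the estimates from the data in this notebook), and the code assumes Julia 1.x. For $q=2$ the $\chi_{2}^{2}$ distribution function is $1-\exp(-x/2)$, so the critical value and the p-value have closed forms and no extra package is needed.

```julia
using LinearAlgebra                          # Julia 1.x standard library (assumed version)

# Hypothetical helper: Wald statistic for H0: R*beta = a, given V = Cov(sqrt(T)*b)
function wald_stat_sketch(b, V, T, R, a)
    gamma_hat = R*b - a                      # q-vector, close to zero if H0 is true
    Lambda    = R*V*R'                       # q x q covariance matrix of sqrt(T)*gamma_hat
    return dot(gamma_hat, (Lambda/T)\gamma_hat)   # solve a linear system instead of inverting
end

# Made-up numbers, purely for illustration
b_ex = [0.01, 0.20, -0.40]
V_ex = [0.002 0.0 0.0;
        0.0   6.0 1.0;
        0.0   1.0 4.0]
T_ex = 400
R_ex = [0 1 0;                               # tests b[2] = 0 and b[3] = 0 jointly
        0 0 1]
a_ex = [0, 0]

W      = wald_stat_sketch(b_ex, V_ex, T_ex, R_ex, a_ex)
crit10 = -2*log(0.10)                        # 10% critical value of chi-square(2), approx 4.605
pval   = exp(-W/2)                           # p-value, closed form valid only for q = 2
println("Wald stat: ", W, "  10% critical value: ", crit10, "  p-value: ", pval)
```

For a general number of restrictions $q$, the `Distributions` package loaded in the first cell provides `quantile(Chisq(q), 0.9)` for the critical value and `ccdf(Chisq(q), W)` for the p-value.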

In [5]:
R = [0 1 0;               #testing if b[2]=0 and b[3]=0
     0 0 1]
a = [0;0]
Lambda = R*V*R'           #covariance matrix of sqrt(T)*(R*b-a)
test_stat = (R*b-a)'inv(Lambda/T)*(R*b-a)
println("\ntest-statistic and 10% critical value of chi-square(2)")
printmat([test_stat 4.61])

(AutoCorr,DW,BoxPierce,White,Regr) = OlsDiagnosticsFn(Y,X,u,2)     #diagnostics
println("\nDiagnostics, std (df)")
println("lag, autoCorr. p-val:")
printmat([1:2 AutoCorr])
println("BoxPierce: stat, p-val, df")
printmat(BoxPierce)
println("White: stat,p-val, df ")
printmat(White)
println("Test of all slopes: stat, p-val, df")
printmat(Regr)


test-statistic and 10% critical value of chi-square(2)
    26.059     4.610


Diagnostics, std (df)
lag, autoCorr. p-val:
     1.000     1.467     0.142
     2.000    -0.733     0.464

BoxPierce: stat, p-val, df
     2.689     0.261     2.000

White: stat,p-val, df 
    77.278     0.000     5.000

Test of all slopes: stat, p-val, df
    60.165     0.000     2.000

