# OLS, Testing

This notebook estimates a linear regression and tests various hypotheses using standard errors assuming (a) iid residuals (Gauss-Markov assumptions); (b) heteroskedasticity (White); (c) autocorrelation and heteroskedasticity (Newey-West).

## Load Packages and Extra Functions

In [1]:
using Printf, DelimitedFiles, Statistics, LinearAlgebra, Distributions

include("jlFiles/printmat.jl")
include("jlFiles/Ols.jl")         #functions for OLS

OlsNWFn

## Loading Data

In [2]:
x = readdlm("Data/FFmFactorsPs.csv",',',skipstart=1)

                #yearmonth, market, small minus big, high minus low
(ym,Rme,RSMB,RHML) = (x[:,1],x[:,2]/100,x[:,3]/100,x[:,4]/100) 
x = nothing

printlnPs("Sample size:",size(Rme))

Sample size:    (388,)


## OLS (assuming iid residuals)

In [3]:
Y = Rme
T = size(Y,1)
X = [ones(T) RSMB RHML]

(b,u,_,V,R2) = OlsGMFn(Y,X)
std_iid = sqrt.(diag(V))

printblue("OLS Results (assuming iid residuals)\n")
rowNames = ["c","SMB","HML"]
printmat([b std_iid],colNames=["b","std_iid"],rowNames=rowNames)

[34m[1mOLS Results (assuming iid residuals)[22m[39m

            b   std_iid
c       0.007     0.002
SMB     0.217     0.073
HML    -0.429     0.074
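
The file `jlFiles/Ols.jl` is not shown here, so the following is only a minimal sketch of what an OLS function with iid standard errors might compute. The name `OlsIidSketch`, the variance estimate without a degrees-of-freedom correction, and the synthetic data are assumptions made for illustration; they are not taken from `OlsGMFn`.

```julia
using LinearAlgebra, Random

# Hypothetical stand-in for OlsGMFn; the real function lives in jlFiles/Ols.jl
function OlsIidSketch(Y,X)
    T  = size(X,1)
    b  = X\Y                      #OLS coefficients (least squares solve)
    u  = Y - X*b                  #residuals
    s2 = sum(abs2,u)/T            #residual variance (assumed: no df correction)
    V  = inv(X'X)*s2              #Cov(b) under iid residuals
    return b, u, V
end

Random.seed!(123)                 #synthetic data, for illustration only
T = 200
X = [ones(T) randn(T,2)]
Y = X*[0.5,1.0,-2.0] + 0.1*randn(T)

(b,u,V) = OlsIidSketch(Y,X)
std_iid = sqrt.(diag(V))
println(round.([b std_iid],digits=3))
```

The residuals are orthogonal to the regressors by construction (`X'u` is numerically zero), which is a quick sanity check on any OLS implementation.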



# Testing a Joint Hypothesis

Since the $K \times 1$ estimator $\hat{\beta}$ satisfies

$
\hat{\beta}-\beta_{0} \sim N(0,V_{K\times K})  ,
$

we can easily apply various tests. Consider a joint linear hypothesis of the
form

$
H_0: R\beta=q,
$

where $R$ is a $J \times K$ matrix and $q$ is a $J$-vector. To test this, use

$
(R\hat{\beta}-q)^{\prime}(RVR^{\prime}) ^{-1}(R\hat{\beta}-q)\overset{d}{\rightarrow}\chi_{J}^{2}.
$

This approach remains valid when $V$ has been estimated by one of the other methods discussed below (allowing for heteroskedasticity and/or autocorrelation).

In [4]:
R = [0 1 0;               #testing if b(2)=0 and b(3)=0
     0 0 1]
q = [0;0]
test_stat = (R*b-q)'inv(R*V*R')*(R*b-q)    #R*V*R' is 2x2

printblue("Testing Rb = q:")
println("test-statistic and 10% critical value of chi-square(2)")
printmat([test_stat quantile(Chisq(2),0.9)])

Testing Rb = q:
test-statistic and 10% critical value of chi-square(2)
    60.010     4.605
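
The test can be wrapped in a small function, as in the sketch below. `WaldStat` is a hypothetical helper (it is not part of the notebook's `jlFiles`), and the numbers for `b` and `V` are made up. The closed-form critical value and p-value use the fact that the chi-square(2) cdf is $1-\exp(-x/2)$, so they apply only when $J=2$; for other values of $J$, use `quantile(Chisq(J),0.9)` and `ccdf(Chisq(J),stat)` from Distributions.

```julia
using LinearAlgebra

# Hypothetical helper (not part of the notebook's jlFiles)
function WaldStat(b,V,R,q)
    d = R*b - q                   #J-vector of deviations from H0
    return d'*inv(R*V*R')*d       #scalar, asymptotically chi-square(J) under H0
end

b = [0.0,1.0,2.0]                 #made-up numbers, for illustration only
V = Matrix(1.0I,3,3)
R = [0 1 0;                       #testing if b(2)=0 and b(3)=0
     0 0 1]
q = [0,0]

stat = WaldStat(b,V,R,q)          #here: 1^2 + 2^2
crit = -2*log(1-0.9)              #90% quantile of chi-square(2), J=2 only
pval = exp(-stat/2)               #p-value, J=2 only
println((stat,crit,pval))
```

The null hypothesis is rejected at the 10% level when `stat > crit`, or equivalently when `pval < 0.1`.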



# Distribution of OLS Estimates without Gauss-Markov


The distribution of the OLS estimates is (typically)

$
(\hat{\beta}-\beta_{0})\overset{d}{\rightarrow}N(0,V)
\: \text{ where } \: V=S_{xx}^{-1} S S_{xx}^{-1}
$

and where $S$ is the covariance matrix of $\sum_{t=1}^{T}u_{t}x_{t}$ and $S_{xx}=\sum_{t=1}^{T}x_{t}x_{t}^{\prime}$.

When the Gauss-Markov assumptions hold, $S$ simplifies to $S=S_{xx}\sigma^2$, where $\sigma^2$ is the variance of $u_t$. This means that $V$ simplifies to $V=S_{xx}^{-1}\sigma^2$.

In contrast, with heteroskedasticity and/or autocorrelation $S$ must be estimated differently.
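
The simplification under the Gauss-Markov assumptions can be checked numerically. The sketch below uses synthetic regressors and an assumed (known) residual variance `s2`, neither of which comes from the data set used here.

```julia
using LinearAlgebra, Random

Random.seed!(1)                   #synthetic regressors, for illustration only
T   = 100
X   = [ones(T) randn(T,2)]
s2  = 0.04                        #assumed (known) residual variance

Sxx = X'X
S   = Sxx*s2                      #S under the Gauss-Markov assumptions
V_sandwich = inv(Sxx)*S*inv(Sxx)  #general "sandwich" form
V_simple   = inv(Sxx)*s2          #simplified form

println(V_sandwich ≈ V_simple)
```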

# White's Covariance Matrix

We can calculate `S` as 
```
K = size(X,2)
S = zeros(K,K)
for t = 1:T
    S = S + X[t,:]*X[t,:]'*u[t]^2
end
```

or (more compactly) as 
```
S = (X.*u)'*(X.*u)
```
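
As a quick check that the two forms agree, the sketch below wraps each in a function and compares them on synthetic data. The function names `WhiteSLoop` and `WhiteSCompact` and the simulated heteroskedastic "residuals" are assumptions made for illustration. The loop is placed inside a function so that the code also runs as a script, outside a notebook.

```julia
using LinearAlgebra, Random

# Loop version: sum of x_t*x_t'*u_t^2 over t
function WhiteSLoop(X,u)
    (T,K) = size(X)
    S = zeros(K,K)
    for t = 1:T
        S += X[t,:]*X[t,:]'*u[t]^2
    end
    return S
end

# Compact version: row t of X.*u is x_t'*u_t
WhiteSCompact(X,u) = (X.*u)'*(X.*u)

Random.seed!(42)                    #synthetic data, for illustration only
T = 50
X = [ones(T) randn(T,2)]
u = randn(T).*(1 .+ abs.(X[:,2]))   #variance depends on the 2nd regressor

S_loop    = WhiteSLoop(X,u)
S_compact = WhiteSCompact(X,u)
println(S_loop ≈ S_compact)
```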

In [5]:
Sxx = X'X

S     = (X.*u)'*(X.*u)                #sum of x_t*x_t'*u_t^2
V     = inv(Sxx)'S*inv(Sxx)           #Cov(b), White
std_W = sqrt.(diag(V))

printblue("Standard errors from different methods:")
xx = [b std_iid std_W]
printmat(xx,colNames=["b","std_iid","std_White"],rowNames=rowNames,width=12)

[34m[1mStandard errors from different methods:[22m[39m
              b     std_iid   std_White
c         0.007       0.002       0.002
SMB       0.217       0.073       0.113
HML      -0.429       0.074       0.097



# Newey-West's Covariance Matrix

Let $g_t$ be a $K$-vector of data.

To calculate the Newey-West covariance matrix, we first need the 
autocovariance matrices (multiplied by $T$) $\Lambda_{s}=T\text{Cov}(g_{t},g_{t-s})  $, 
which are estimated as 

$ 
\Lambda_{s} = \sum_{t=s+1}^{T} (g_{t}-\bar{g})(g_{t-s}-\bar{g})^{\prime}.
$

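Then we form a linear
combination (with tent-shaped weights) of those autocovariance matrices from
lag $-m$ to $m$. Since $\Lambda_{-s}=\Lambda_{s}^{\prime}$, this can be written as

$
\text{Cov}\left(\sum_{t=1}^{T} g_{t}\right)  = 
\Lambda_{0} + \sum_{s=1}^{m}\left( 1-\frac{s}{m+1}\right)  
(\Lambda_{s}+\Lambda_{s}^{\prime}).
$

This is the covariance matrix of the sum of $g_t$, which is $T^2$ times the covariance matrix of the sample average $\bar{g}$.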
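
To see what "tent-shaped" means, the sketch below lists the weights for all lags from $-m$ to $m$. `BartlettWeights` is a hypothetical helper introduced only for this illustration.

```julia
# Hypothetical helper: tent-shaped weights for lags -m,...,m
BartlettWeights(m) = [1 - abs(s)/(m+1) for s = -m:m]

w = BartlettWeights(2)            #weights for lags -2,-1,0,1,2
println(w)
```

The weight is 1 at lag 0 and declines linearly towards zero at lags $\pm(m+1)$. With $m=0$, only $\Lambda_{0}$ is used.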

In [6]:
"""
    CovNWFn(g0,m=0)

Calculates the Newey-West covariance matrix of the column sums of `g0`
(T^2 times the covariance matrix of the sample averages).

# Input
- `g0::Matrix`: Txq matrix of q moment conditions
- `m::Int`:     scalar, number of lags to use

# Output
- `S::Matrix`: qxq covariance matrix of the column sums of `g0`

"""
function CovNWFn(g0,m=0)

    T = size(g0,1)                    #g0 is Txq
    m = min(m,T-1)                    #number of lags

    g = g0 .- mean(g0,dims=1)         #normalizing to zero means

    S = g'g                           #(qxT)*(Txq)
    for s = 1:m
        Λ_s = g[s+1:T,:]'g[1:T-s,:]   #same as Sum[g_t*g_{t-s}',t=s+1,T]
        S   = S  +  (1 - s/(m+1))*(Λ_s + Λ_s')
    end
  
    return S

end

CovNWFn

In [7]:
S      = CovNWFn(X.*u,2)         #Newey-West covariance matrix
V      = inv(Sxx)'S*inv(Sxx)     #Cov(b), Newey-West
std_NW = sqrt.(diag(V))

printblue("Standard errors from different methods:")
xx = [b std_iid std_W std_NW]
printmat(xx,colNames=["b","std_iid","std_White","std_NW"],rowNames=rowNames,width=12)

[34m[1mStandard errors from different methods:[22m[39m
              b     std_iid   std_White      std_NW
c         0.007       0.002       0.002       0.002
SMB       0.217       0.073       0.113       0.129
HML      -0.429       0.074       0.097       0.118
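
To see how the choice of standard errors affects inference, the sketch below computes t-statistics from the rounded values printed in the table above. Because the inputs are rounded to three decimals, the results are approximate. The 1.96 cutoff (a two-sided 5% test under asymptotic normality) is an assumption of this illustration.

```julia
b       = [0.007, 0.217, -0.429]   #rounded values copied from the table above
std_iid = [0.002, 0.073, 0.074]
std_W   = [0.002, 0.113, 0.097]
std_NW  = [0.002, 0.129, 0.118]

tstats = [b./std_iid b./std_W b./std_NW]   #3x3: rows are c, SMB, HML
println(round.(tstats,digits=2))
println(abs.(tstats) .> 1.96)              #true where |t| exceeds 1.96
```

With these rounded inputs, the SMB coefficient is significant at the 5% level when using iid standard errors but not when using Newey-West standard errors, while the HML coefficient is significant under all three methods.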

