# OLS Diagnostics


This notebook tests (a) the overall fit of a regression model (whether all slope coefficients are zero) and (b) properties of the residuals (heteroskedasticity and autocorrelation).

You may also consider the [HypothesisTests.jl](https://github.com/JuliaStats/HypothesisTests.jl) package (not used here).

## Load Packages and Extra Functions

In [1]:
using Printf, DelimitedFiles, Statistics, LinearAlgebra, StatsBase, Distributions

include("jlFiles/printmat.jl") 
include("jlFiles/Ols.jl")        #functions for OLS

OlsNWFn

## Loading Data

In [3]:
x = readdlm("Data/FFmFactorsPs.csv",',',skipstart=1)

                #yearmonth, market, small minus big, high minus low
(ym,Rme,RSMB,RHML) = (x[:,1],x[:,2]/100,x[:,3]/100,x[:,4]/100) 
x = nothing
println(size(Rme))

Y = Rme         #or copy(Rme) if independent copies are needed
T = size(Y,1)
X = [ones(T) RSMB RHML];

(388,)


In [4]:
(b,u,_,V,R²) = OlsGMFn(Y,X)    #do OLS
Stdb = sqrt.(diag(V))

printblue("OLS with traditional standard errors:\n")
xNames = ["c","SMB","HML"]
printmat([b Stdb],colNames=["coef","std"],rowNames=xNames)

OLS with traditional standard errors:

         coef       std
c       0.007     0.002
SMB     0.217     0.073
HML    -0.429     0.074
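For readers who do not have `jlFiles/Ols.jl` at hand, the following is a minimal sketch of what an OLS routine with traditional standard errors could look like. The name `OlsSketch` is hypothetical, the output order is only inferred from the call `(b,u,_,V,R²) = OlsGMFn(Y,X)` above, and dividing the residual sum of squares by `T` (rather than `T-k`) is an assumption. The actual `OlsGMFn` may differ. The data are simulated, not the Fama-French factors.

```julia
using LinearAlgebra, Statistics, Random

# Hypothetical stand-in for OlsGMFn; the real function in jlFiles/Ols.jl may differ.
function OlsSketch(Y, X)
    T    = size(X, 1)
    b    = X \ Y                      # OLS coefficients
    Yhat = X * b                      # fitted values
    u    = Y - Yhat                   # residuals
    σ²   = sum(abs2, u) / T           # assumption: divide by T, not T-k
    V    = inv(X' * X) * σ²           # traditional covariance matrix of b
    R²   = 1 - var(u) / var(Y)        # requires an intercept in X
    return b, u, Yhat, V, R²
end

Random.seed!(123)                     # simulated data, for illustration only
T = 200
X = [ones(T) randn(T, 2)]
Y = X * [0.5, 1.0, -2.0] + 0.1 * randn(T)

(b, u, _, V, R²) = OlsSketch(Y, X)
Stdb = sqrt.(diag(V))                 # standard errors of the coefficients
```

By construction, the residuals are orthogonal to the regressors (`X' * u` is numerically zero), which is why a regression with an intercept has zero-mean residuals.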



## Regression Diagnostics: Testing All Slope Coefficients

The function in the next cell tests all slope coefficients (or equivalently, the $R^2$) of a regression.
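As a sketch of what the function computes: under the null hypothesis that all slope coefficients are zero, the statistic is treated as asymptotically chi-square distributed,

$$
\text{RegrStat} = \frac{T R^2}{1-R^2} \overset{d}{\to} \chi^2_{df},
$$

where $T$ is the number of observations and $df$ is the number of non-constant regressors. The p-value is $1 - F_{\chi^2_{df}}(\text{RegrStat})$, with $F$ denoting the cdf. In the application below, $df = 2$ (SMB and HML).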

In [5]:
"""
    OlsR2TestFn(R²,T,df)

Test of all slope coefficients. Notice that the regression must contain an intercept for R² to be useful.

# Input
- `R²::Number`:    R² value
- `T::Int`:        number of observations
- `df::Number`:    number of (non-constant) regressors

# Output
- `RegrStat::Number`: test statistic
- `pval::Number`:     p-value

"""
function OlsR2TestFn(R²,T,df)
    RegrStat = T*R²/(1-R²)           #R\^2[TAB]
    pval     = 1 - cdf(Chisq(df),RegrStat)
    return RegrStat, pval
end

OlsR2TestFn

In [6]:
df = size(X,2) - 1
(RegrStat,pval) = OlsR2TestFn(R²,T,df)

printblue("Test of all slopes = 0:\n")
printmat([RegrStat,pval],rowNames=["stat","p-val"])

Test of all slopes = 0:

stat     60.165
p-val     0.000



## Regression Diagnostics: Heteroskedasticity

The function in the next cell performs White's test for heteroskedasticity. Again, the regression must have an intercept for this test to be useful.
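In outline, the function runs an auxiliary regression of the squared residuals on the unique cross products of the regressors. The symbols $\gamma$ and $\varepsilon_t$ are notation introduced here for illustration:

$$
u_t^2 = w_t'\gamma + \varepsilon_t, \qquad w_t = \{x_{it}x_{jt}\}_{i \le j},
$$

$$
\text{WhiteStat} = \frac{T R^2}{1-R^2} \overset{d}{\to} \chi^2_{df}, \qquad df = \text{rank}(w) - 1,
$$

where $R^2$ is taken from the auxiliary regression. With regressors $(1, \text{SMB}, \text{HML})$, $w_t$ has six elements: $1$, $\text{SMB}$, $\text{HML}$, $\text{SMB}^2$, $\text{SMB}\times\text{HML}$ and $\text{HML}^2$, so $df = 5$ provided these columns are linearly independent.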

In [7]:
"""
    OlsWhitesTestFn(u,x)

Test of heteroskedasticity. Notice that the regression must contain 
an intercept for the test to be useful.

# Input
- `u::Vector`:   T-vector, residuals
- `x::Matrix`:   Txk, regressors

# Output
- `RegrStat::Number`: test statistic
- `pval::Number`:     p-value

"""
function OlsWhitesTestFn(u,x)

    (T,k) = (size(x,1),size(x,2))

    w = zeros(T,round(Int,k*(k+1)/2))   #matrix of cross products of x
    vv = 1
    for i = 1:k, j = i:k
        w[:,vv] = x[:,i].*x[:,j]        #e.g. x1*x1, x1*x2, x2*x2
        vv      = vv + 1
    end

    R² = OlsGMFn(u.^2,w)[5]             #[5] picks out output 5
    df = rank(w) - 1                    #number of independent regressors in w

    WhiteStat = T*R²/(1-R²)
    pval      = 1 - cdf(Chisq(df),WhiteStat)

    return WhiteStat, pval

end

OlsWhitesTestFn

In [8]:
(WhiteStat,pval) = OlsWhitesTestFn(u,X)

printblue("White's test (H₀: heteroskedasticity is not correlated with regressors):\n")
printmat([WhiteStat,pval],rowNames=["stat","p-val"])

White's test (H₀: heteroskedasticity is not correlated with regressors):

stat     77.278
p-val     0.000



## Regression Diagnostics: Autocorrelation of the Residuals

The function in the next cell estimates the autocorrelations of the residuals and calculates the Durbin-Watson (DW) and Box-Pierce statistics.
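For reference, the three statistics computed by the function can be written as follows, with $\hat{\rho}_s$ the estimated autocorrelation at lag $s$ and $\bar{u}$ the sample mean of the residuals (zero when the regression includes an intercept):

$$
t_s = \sqrt{T}\,\hat{\rho}_s \overset{d}{\to} N(0,1), \qquad
\text{BP} = T\sum_{s=1}^{L}\hat{\rho}_s^2 \overset{d}{\to} \chi^2_L, \qquad
\text{DW} = \frac{\sum_{t=2}^{T}(u_t-u_{t-1})^2}{\sum_{t=1}^{T}(u_t-\bar{u})^2}.
$$

The limiting distributions hold under the null hypothesis of no autocorrelation.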

In [9]:
"""
    OlsAutoCorrFn(u,L=1)

Test the autocorrelation of OLS residuals.

# Input
- `u::Vector`:   T-vector, residuals
- `L::Int`:      scalar, number of lags in autocorrelation and Box-Pierce test

# Output
- `AutoCorr::Matrix`:   Lx3, autocorrelation, t-stat and p-value
- `BoxPierce::Matrix`:  1x2, Box-Pierce statistic and p-value
- `DW::Number`:         DW statistic

# Requires
- StatsBase, Distributions

"""
function OlsAutoCorrFn(u,L=1)

    T = size(u,1)

    Stdu = std(u)
    ρ    = autocor(u,1:L)        #\rho[TAB]
    t_ρ  = sqrt(T)*ρ

    pval      = 2*(1.0 .- cdf.(Normal(0,1),abs.(t_ρ)))
    AutoCorr  = [ρ t_ρ pval]

    BPStat    = T*sum(ρ.^2)
    pval      = 1 - cdf(Chisq(L),BPStat)
    BoxPierce = [BPStat pval]

    DWStat    = mean(diff(u).^2)/Stdu^2

    return AutoCorr, BoxPierce, DWStat

end

OlsAutoCorrFn

In [10]:
L = 3     #number of autocorrs to test

(ρStats,BoxPierce,DW) = OlsAutoCorrFn(u,L)

printmagenta("Testing autocorrelation of residuals\n")

printblue("Autocorrelations (lag 1 to $L):\n")
printmat(ρStats,colNames=["autocorr","t-stat","p-val"],rowNames=1:L,cell00="lag")

printblue("\nBoxPierce ($L lags): ")
printmat(BoxPierce',rowNames=["stat","p-val"])

printblue("DW statistic:")
printlnPs(DW)

Testing autocorrelation of residuals

Autocorrelations (lag 1 to 3):

lag  autocorr    t-stat     p-val
1       0.074     1.467     0.142
2      -0.037    -0.733     0.464
3       0.019     0.377     0.706


BoxPierce (3 lags): 
stat      2.831
p-val     0.418

DW statistic:
     1.849
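As a quick consistency check on the printed numbers (using the rounded values, so the match is only approximate): since each t-stat is $\sqrt{T}\hat{\rho}_s$, the Box-Pierce statistic equals the sum of the squared t-stats,

$$
1.467^2 + 0.733^2 + 0.377^2 \approx 2.152 + 0.537 + 0.142 = 2.831,
$$

and the DW statistic is close to the well-known approximation $2(1-\hat{\rho}_1)$,

$$
2(1 - 0.074) = 1.852 \approx 1.849.
$$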
