# OLS Diagnostics


This notebook tests (a) the overall fit of a regression model and (b) properties of its residuals (heteroskedasticity and autocorrelation).

You may also consider the [HypothesisTests.jl](https://github.com/JuliaStats/HypothesisTests.jl) package.

## Load Packages and Extra Functions

In [1]:
using Dates, DelimitedFiles, Statistics, LinearAlgebra, StatsBase, Distributions

include("jlFiles/printmat.jl")
include("jlFiles/NWFn.jl")
include("jlFiles/OlsFn.jl")

OlsFn

## Loading Data

In [2]:
x = readdlm("Data/FFmFactorsPs.csv",',',skipstart=1)

# columns: yearmonth, market, small minus big, high minus low (returns in %, hence /100)
(ym,Rme,RSMB,RHML) = (x[:,1],x[:,2]/100,x[:,3]/100,x[:,4]/100) 
x = nothing
println(size(Rme))

Y = Rme
T = size(Y,1)
X = [ones(T) RSMB RHML];

(388,)


In [3]:
(b,u,_,V,R2) = OlsGMFn(Y,X)
Stdb = sqrt.(diag(V))

printblue("OLS with traditional standard errors")
rowNames = ["c","SMB","HML"]
printTable([b Stdb],["coef","std"],rowNames)

[34m[1mOLS with traditional standard errors[22m[39m
         coef       std
c       0.007     0.002
SMB     0.217     0.073
HML    -0.429     0.074
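
`OlsGMFn` is defined in `jlFiles/OlsFn.jl`, which is not reproduced here. To make the "traditional" (Gauss-Markov) standard errors concrete, the following is a minimal hypothetical stand-in, `ols_gm_sketch`; it is not the author's implementation. It assumes iid errors and estimates the residual variance as $u'u/T$ without a degrees-of-freedom correction (the actual helper may differ). It runs on simulated data, so the notebook's own `b`, `u` and `V` are left untouched.

```julia
using LinearAlgebra, Statistics, Random

# Hypothetical stand-in for OlsGMFn: OLS with traditional (iid) standard errors
function ols_gm_sketch(Y, X)
    T  = size(X, 1)
    b  = X \ Y                  # least-squares coefficients
    u  = Y - X*b                # residuals
    σ² = sum(abs2, u)/T         # assumption: no degrees-of-freedom correction
    V  = σ² * inv(X'X)          # Cov(b) = σ²(X'X)⁻¹ under iid errors
    R2 = 1 - var(u)/var(Y)      # R² (X contains a constant)
    return b, u, V, R2
end

# Simulated data, so the notebook's variables are not overwritten
Random.seed!(1)
Tsim = 200
Xsim = [ones(Tsim) randn(Tsim, 2)]
Ysim = Xsim*[0.5, 1.0, -2.0] + 0.1*randn(Tsim)

(b_sim, u_sim, V_sim, R2_sim) = ols_gm_sketch(Ysim, Xsim)
Stdb_sim = sqrt.(diag(V_sim))   # standard errors, as in the cell above
```

With a constant among the regressors, the residuals have mean zero, which the autocorrelation diagnostics further down rely on.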



## Regression Diagnostics: Testing All Slope Coefficients

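The function in the next cell tests the null hypothesis that all slope coefficients are zero (equivalently, that the population $R^2$ is zero). The test statistic is $T R^2/(1-R^2)$, which is asymptotically $\chi^2_{k-1}$ under the null, where $k$ is the number of regressors including the constant.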
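
Why $T R^2/(1-R^2)$? When the restricted model contains only an intercept, its sum of squared residuals is $SSR_r = \sum_t (y_t-\bar{y})^2$, so $R^2 = 1 - SSR_u/SSR_r$ and the statistic equals the Wald form $T(SSR_r - SSR_u)/SSR_u$. The sketch below checks this identity on simulated data; all variable names and parameter values are hypothetical.

```julia
using Statistics, Random

# Simulated data (hypothetical): constant plus two regressors
Random.seed!(2)
Tsim = 300
Xsim = [ones(Tsim) randn(Tsim, 2)]
Ysim = Xsim*[0.1, 0.3, -0.2] + randn(Tsim)

SSR_u = sum(abs2, Ysim - Xsim*(Xsim \ Ysim))   # unrestricted: all regressors
SSR_r = sum(abs2, Ysim .- mean(Ysim))          # restricted: intercept only

R2_sim   = 1 - SSR_u/SSR_r
stat_R2  = Tsim*R2_sim/(1 - R2_sim)            # the statistic in OlsR2TestFn
stat_SSR = Tsim*(SSR_r - SSR_u)/SSR_u          # Wald statistic from the two SSRs
```

The two numbers agree up to floating-point rounding, since $R^2/(1-R^2) = (SSR_r-SSR_u)/SSR_u$ exactly.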

In [4]:
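"""
    OlsR2TestFn(R2,T,k)

Test that all slope coefficients are zero, using T*R2/(1-R2), which is asymptotically χ²(k-1).

# Input:
- `R2::Number`:   R² of the regression
- `T::Int`:       number of observations
- `k::Int`:       number of regressors (including the constant)

# Output
- `RegrStat::Number`:   test statistic
- `pval::Number`:       p-value
- `df::Int`:            degrees of freedom, k-1

"""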
function OlsR2TestFn(R2,T,k)
    df       = k-1
    RegrStat = T*R2/(1-R2)
    pval     = 1 - cdf(Chisq(k-1),RegrStat)
    return RegrStat, pval, df
end

OlsR2TestFn

In [5]:
(RegrStat,pval,df) = OlsR2TestFn(R2,T,size(X,2))

printblue("Test of all slopes = 0:")
printTable([RegrStat pval df],["stat","p-val","df"],[""])

[34m[1mTest of all slopes = 0:[22m[39m
      stat     p-val        df
    60.165     0.000     2.000



## Regression Diagnostics: Heteroskedasticity

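The function in the next cell performs White's test for heteroskedasticity. It regresses the squared residuals on all products $x_i x_j$ ($i \le j$) of the regressors, which (when a constant is included) covers the regressors themselves, their squares and their cross products. The null hypothesis is that the squared residuals are uncorrelated with these terms. Under the null, $T R^2/(1-R^2)$ from this auxiliary regression is asymptotically $\chi^2$, with degrees of freedom equal to the number of terms minus one.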
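
To see where the degrees of freedom come from, the sketch below builds the matrix of cross products for a small hypothetical regressor matrix with a constant and two regressors ($k=3$). It uses a comprehension instead of the loop in `OlsWhitesTestFn`, but visits the pairs $(i,j)$ in the same order; the names `Xdemo`, `ij_pairs` and `psi_demo` are illustrative only.

```julia
# Hypothetical 2-observation regressor matrix: constant, x2, x3 (k = 3)
Xdemo = [1.0 2.0 3.0;
         1.0 4.0 5.0]
kdemo = size(Xdemo, 2)

# All pairs (i, j) with i ≤ j, in the same order as the loop in OlsWhitesTestFn
ij_pairs = [(i, j) for i = 1:kdemo for j = i:kdemo]
psi_demo = reduce(hcat, [Xdemo[:, i] .* Xdemo[:, j] for (i, j) in ij_pairs])

df_demo = size(psi_demo, 2) - 1    # k(k+1)/2 - 1 = 5
```

The first column is the constant ($1 \cdot 1$); the other five are the two regressors, their squares and their cross product. Dropping the constant leaves 5 degrees of freedom, matching the White's test output for the three-regressor model in this notebook.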

In [6]:
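"""
    OlsWhitesTestFn(u,x)

White's test for heteroskedasticity: regress u² on all cross products of the regressors.

# Input:
- `u::Array`:   Tx1, residuals
- `x::Array`:   Txk, regressors (including the constant)

# Output
- `WhiteStat::Number`:   test statistic
- `pval::Number`:        p-value
- `df::Int`:             degrees of freedom

"""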
function OlsWhitesTestFn(u,x)

    (T,k) = (size(x,1),size(x,2))

    psi = zeros(T,round(Int,k*(k+1)/2))   #matrix of cross products of x
    vv = 0
    for i = 1:k, j = i:k
        vv        = vv + 1  
        psi[:,vv] = x[:,i].*x[:,j]
    end

    R2 = OlsGMFn(u.^2,psi)[5]             #[5] picks out output 5
    df = size(psi,2) - 1                  #rank(psi)-1 is probably a safer choice

    WhiteStat = T*R2/(1-R2)
    pval      = 1 - cdf(Chisq(df),WhiteStat)
    #White     = [WhiteStat pval df]

    return WhiteStat, pval, df

end

OlsWhitesTestFn

In [7]:
(WhiteStat,pval,df) = OlsWhitesTestFn(u,X)

printblue("White's test (H0: heteroskedasticity is uncorrelated with the regressors):")
printTable([WhiteStat pval df],["stat","p-val","df"],[""])

White's test (H0: heteroskedasticity is uncorrelated with the regressors):
      stat     p-val        df
    77.278     0.000     5.000



## Regression Diagnostics: Autocorrelation of the Residuals

The function in the next cell estimates the autocorrelations of the residuals and calculates the Durbin-Watson (DW) and Box-Pierce statistics.
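
`OlsAutoCorrFn` writes the DW statistic as `mean(diff(u).^2)/std(u)^2`. Both the mean over the $T-1$ differences and the default `std` use a $T-1$ denominator, so for residuals with mean zero (OLS with a constant) this equals the usual $\sum_{t=2}^{T}(u_t-u_{t-1})^2/\sum_{t=1}^{T}u_t^2$. Expanding the numerator also gives $DW \approx 2(1-\rho_1)$, so values near 2 indicate little first-order autocorrelation. The sketch below checks both on simulated AR(1) data; the names and the AR coefficient 0.3 are assumptions for illustration.

```julia
using Statistics, Random

# Simulated AR(1) "residuals" with coefficient 0.3 (hypothetical data)
Random.seed!(3)
Tsim    = 500
eps_sim = randn(Tsim)
u_sim   = zeros(Tsim)
u_sim[1] = eps_sim[1]
for t = 2:Tsim
    u_sim[t] = 0.3*u_sim[t-1] + eps_sim[t]
end
u_sim .-= mean(u_sim)        # OLS residuals with a constant have mean zero

dw_notebook = mean(diff(u_sim).^2)/std(u_sim)^2            # as in OlsAutoCorrFn
dw_textbook = sum(diff(u_sim).^2)/sum(u_sim.^2)            # usual DW definition
rho1 = sum(u_sim[2:end] .* u_sim[1:end-1])/sum(u_sim.^2)   # lag-1 autocorrelation
```

The gap between `dw_textbook` and `2*(1 - rho1)` is $(u_1^2+u_T^2)/\sum_t u_t^2$, which is small for long samples. For the residuals in this notebook, $2(1-0.074) \approx 1.85$, close to the reported DW of 1.849.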

In [8]:
"""
    OlsAutoCorrFn(u,L=1)

Test the autocorrelation of OLS residuals

# Input:
- `u::Array`:   Tx1, residuals
- `L::Int`:     scalar, number of lags in autocorrelation and Box-Pierce test

# Output
- `AutoCorr::Array`:    Lx3, autocorrelation, t-stat and p-value
- `BoxPierce::Array`:   1x2, Box-Pierce statistic and p-value
- `DW::Number`:         scalar, DW statistic

# Requires
- StatsBase, Distributions

"""
function OlsAutoCorrFn(u,L=1)

    T = size(u,1)

    Stdu  = std(u)
    rho   = autocor(u,1:L)
    t_rho = sqrt(T)*rho
                                     
    pval      = 2*(1.0 .- cdf.(Normal(0,1),abs.(t_rho)))
    AutoCorr  = [rho t_rho pval]

    BPStat    = T*sum(rho.^2)
    pval      = 1 - cdf(Chisq(L),BPStat)
    BoxPierce = [BPStat pval]

    dwStat    = mean(diff(u).^2)/Stdu^2

    return AutoCorr, BoxPierce, dwStat

end

OlsAutoCorrFn

In [9]:
L = 3     #number of autocorrs to test

(ρStats,BoxPierce,DW) = OlsAutoCorrFn(u,L)

printblue("Testing autocorrelation of residuals\n")

println("Autocorrelations (lag 1 to $L):")
printTable(ρStats,["autocorr","t-stat","p-val"],string.(1:L),cell00="lag")

printlnPs("DW:",DW)

println("\nBoxPierce ($L lags): ")
printTable(BoxPierce,["stat","p-val"],[""])

[34m[1mTesting autocorrelation of residuals[22m[39m

Autocorrelations (lag 1 to 3):
lag  autocorr    t-stat     p-val
1       0.074     1.467     0.142
2      -0.037    -0.733     0.464
3       0.019     0.377     0.706

       DW:     1.849

BoxPierce (3 lags): 
      stat     p-val
     2.831     0.418

