# OLS Diagnostics


This notebook tests whether all slope coefficients of a regression model are zero (equivalently, whether $R^2 = 0$) and whether the residuals are autocorrelated or heteroskedastic.

## Loading Packages

In [1]:
using Dates, DelimitedFiles, Statistics, LinearAlgebra, StatsBase, Distributions

include("jlFiles/printmat.jl")
include("jlFiles/NWFn.jl")
include("jlFiles/OlsFn.jl")

OlsFn

## Loading Data

In [2]:
x = readdlm("Data/FFmFactorsPs.csv",',',skipstart=1)

#columns: yearmonth, market, small minus big, high minus low
(ym,Rme,RSMB,RHML) = (x[:,1],x[:,2]/100,x[:,3]/100,x[:,4]/100)
x = nothing                   #release the raw data matrix
println(size(Rme))

Y = Rme
T = size(Y,1)
X = [ones(T) RSMB RHML]

println()

(388,)



In [3]:
(b4,u,_,V,R2a) = OlsFn(Y,X,1)
println("\nOLS with NW standard errors")
printmat([b4 sqrt.(diag(V))])


OLS with NW standard errors
     0.007     0.002
     0.217     0.124
    -0.429     0.108



## Regression Diagnostics: Testing All Slope Coefficients

The function in the next cell tests whether all slope coefficients of a regression are zero (or equivalently, whether $R^2 = 0$).

In [4]:
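"""
    OlsR2TestFn(R2a,T,k)

Test the hypothesis that all slope coefficients are zero (equivalently, that R2 = 0).

# Input:
- `R2a::Number`:  scalar, R2 measure returned by `OlsFn`
- `T::Int`:       scalar, number of observations
- `k::Int`:       scalar, number of regressors (including the constant)

# Output
- `Regr::Array`:  1x3, test statistic, p-value and degrees of freedom (k-1)

# Requires
- Distributions

"""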
function OlsR2TestFn(R2a,T,k)

  RegrStat = T*R2a/(1-R2a)
  pval     = 1 - cdf(Chisq(k-1),RegrStat)
  Regr     = [RegrStat pval (k-1)]

  return Regr

end

OlsR2TestFn

In [5]:
Regr = OlsR2TestFn(R2a,T,size(X,2))

println("Test of all slopes = 0 (the R2=0):")
println("    stat       p-val     df")
printmat(Regr)

Test of all slopes = 0 (the R2=0):
    stat       p-val     df
    60.165     0.000     2.000
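
As a rough illustration of where the statistic `T*R2a/(1-R2a)` comes from, the sketch below computes an $R^2$ directly from OLS residuals on a small synthetic data set and then forms the test statistic. The helper `r2_sketch` and the synthetic series are assumptions made for this example and are not part of `OlsFn.jl`. The closed-form p-value `exp(-stat/2)` is exact only for a chi-square distribution with 2 degrees of freedom, which is the case here since there are two slopes.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper (not from OlsFn.jl): R2 from an OLS fit.
# X is assumed to contain a constant, so the residuals have mean zero.
function r2_sketch(Y, X)
    b = X \ Y                  # least-squares coefficients
    u = Y - X*b                # residuals
    return 1 - var(u)/var(Y)   # 1 - SSR/SST
end

T  = 200
x1 = sin.(1:T)                 # deterministic synthetic regressors
x2 = cos.(0.3 .* (1:T))
Y  = 0.5 .+ 0.2 .* x1 .- 0.1 .* x2 .+ sin.(7.1 .* (1:T))
X  = [ones(T) x1 x2]

R2       = r2_sketch(Y, X)
stat     = T*R2/(1 - R2)       # same form as RegrStat in OlsR2TestFn
pval_df2 = exp(-stat/2)        # chi-square survival function, valid for 2 df only

println("R2 = ", R2, ", stat = ", stat, ", p-value = ", pval_df2)
```

The mapping can also be inverted: a statistic of 60.165 with `T = 388`, as in the output above, corresponds to `R2a = 60.165/(388 + 60.165)`, or roughly 0.134.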



## Regression Diagnostics: Autocorrelation of the Residuals

The function in the next cell estimates the autocorrelations of the residuals and calculates the Durbin-Watson (DW) and Box-Pierce statistics.

In [6]:
"""
    OlsAutoCorrFn(u,m=1)

Test the autocorrelation of OLS residuals

# Input:
- `u::Array`:   Tx1, residuals
- `m::Int`:     scalar, number of lags in autocorrelation and Box-Pierce test

# Output
- `AutoCorr::Array`:    mx2, autocorrelation and p-value
- `DW::Number`:         scalar, DW statistic
- `BoxPierce::Array`:   1x2, Box-Pierce statistic and p-value

# Requires
- StatsBase, Distributions

"""
function OlsAutoCorrFn(u,m=1)

  T = size(u,1)

  Stdu = std(u)
  rho  = autocor(u,1:m)
                                     #two-sided p-values, sqrt(T)*rho ~ N(0,1) under no autocorrelation
  pval      = 2*(1.0 .- cdf.(Normal(0,1),sqrt(T)*abs.(rho)))
  AutoCorr  = [rho pval]

  BPStat    = T*sum(rho.^2)
  pval      = 1 - cdf(Chisq(m),BPStat)
  BoxPierce = [BPStat pval]

  dwStat    = mean(diff(u).^2)/Stdu^2

  return AutoCorr,dwStat,BoxPierce

end

OlsAutoCorrFn

In [7]:
(AutoCorr,dwStat,BoxPierce) = OlsAutoCorrFn(u,3)

println("    lag        autoCorr  p-val:")
printmat([1:3 AutoCorr])

printlnPs("DW:",dwStat)

println("\nBoxPierce: ")
println("     stat      p-val")
printmat(BoxPierce)

    lag        autoCorr  p-val:
     1.000     0.074     0.142
     2.000    -0.037     0.464
     3.000     0.019     0.706

       DW:     1.849

BoxPierce: 
     stat      p-val
     2.831     0.418
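
The relation between the statistics above can be sketched without any packages beyond the standard library. The helper `autocorr_sketch` and the synthetic series are assumptions for illustration only. The helper is intended to mimic a demeaned sample autocorrelation, not to reproduce `autocor` from StatsBase in every detail.

```julia
using Statistics

# Hypothetical helper: lag-k sample autocorrelation of a demeaned series
function autocorr_sketch(u, k)
    d = u .- mean(u)
    return sum(d[1+k:end] .* d[1:end-k]) / sum(d.^2)
end

T = 500
u = [sin(0.9t) + 0.5*cos(2.3t) for t in 1:T]    # synthetic "residuals"

rho = [autocorr_sketch(u, k) for k in 1:3]      # autocorrelations at lags 1 to 3
BP  = T*sum(rho.^2)                             # Box-Pierce statistic, same form as BPStat
DW  = sum(diff(u).^2)/sum((u .- mean(u)).^2)    # Durbin-Watson statistic

println("rho = ", rho)
println("Box-Pierce = ", BP)
println("DW = ", DW, ", 2*(1-rho[1]) = ", 2*(1 - rho[1]))
```

In large samples the DW statistic is close to `2*(1 - rho[1])`. The output above is consistent with this: `2*(1 - 0.074) = 1.852`, compared with a reported DW of 1.849.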



## Regression Diagnostics: Heteroskedasticity

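The function in the next cell performs White's test for heteroskedasticity: it regresses the squared residuals on all squares and cross products of the regressors and tests whether that regression has any explanatory power.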

In [8]:
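"""
    OlsWhitesTestFn(u,x)

White's test for heteroskedasticity: regress the squared residuals on all
squares and cross products of the regressors and test the fit of that regression.

# Input:
- `u::Array`:   Tx1, residuals
- `x::Array`:   Txk, regressors (should include a constant)

# Output
- `White::Array`:  1x3, test statistic, p-value and degrees of freedom

# Requires
- Distributions, OlsFn

"""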
function OlsWhitesTestFn(u,x)

  (T,k) = (size(x,1),size(x,2))

  psi = zeros(T,div(k*(k+1),2))              #matrix of cross products of x
  vv = 0
  for i = 1:k, j = i:k
      vv        = vv + 1  
      psi[:,vv] = x[:,i].*x[:,j]           #all cross products, incl own
  end
    
  (_,_,_,_,R2a) = OlsFn(u.^2,psi)   #White's test for heteroskedasticity
    
  WhiteStat = T*R2a/(1-R2a)
  pval      = 1 - cdf(Chisq(size(psi,2)-1),WhiteStat)
  White     = [WhiteStat pval (size(psi,2)-1)]

  return White
   
end

OlsWhitesTestFn

In [9]:
White = OlsWhitesTestFn(u,X)

println("White's test (H0: heteroskedasticity is not correlated with regressors):")
println("    stat       p-val     df")
printmat(White)

White's test (H0: heteroskedasticity is not correlated with regressors):
    stat       p-val     df
    77.278     0.000     5.000
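
To make the degrees of freedom concrete, the following minimal sketch builds the matrix of squares and cross products for a tiny regressor matrix. The helper `crossproducts_sketch` is a hypothetical stand-in for the loop inside `OlsWhitesTestFn`, and the numbers in `x` are made up for illustration.

```julia
# Hypothetical helper: all products x[:,i].*x[:,j] for i <= j, one per column
function crossproducts_sketch(x)
    k    = size(x, 2)
    cols = [x[:, i] .* x[:, j] for i in 1:k for j in i:k]
    return reduce(hcat, cols)
end

# constant plus two made-up regressors
x   = [ones(4) [1.0, 2.0, 3.0, 4.0] [0.5, -1.0, 2.0, 0.0]]
psi = crossproducts_sketch(x)

println(size(psi))      # number of columns is k*(k+1)/2
println(psi[:, 1])      # constant times constant: still a column of ones
```

With `k = 3` regressors (including the constant) there are `3*4/2 = 6` columns, one of which is the constant, which leaves the 5 degrees of freedom reported above.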

