# OLS Diagnostics

This notebook computes standard diagnostics for an OLS regression: measures of fit, a normality test of the residuals, and checks for multicollinearity.

You may also consider the [HypothesisTests.jl](https://github.com/JuliaStats/HypothesisTests.jl) package (not used here).

## Load Packages and Extra Functions

The key functions for the diagnostic tests are from the (local) `FinEcmt_OLS` module.

In [1]:
MyModulePath = joinpath(pwd(),"src")
!in(MyModulePath,LOAD_PATH) && push!(LOAD_PATH,MyModulePath)
using FinEcmt_OLS

In [2]:
#=
include(joinpath(pwd(),"src","FinEcmt_OLS.jl"))
using .FinEcmt_OLS
=#

In [3]:
using DelimitedFiles, Statistics, LinearAlgebra

## Loading Data

In [4]:
x = readdlm("Data/FFmFactorsPs.csv",',',skipstart=1)

                #yearmonth, market, small minus big, high minus low
(ym,Rme,RSMB,RHML) = (x[:,1],x[:,2]/100,x[:,3]/100,x[:,4]/100)
x = nothing
println(size(Rme))

Y = Rme         #or copy(Rme) if independent copies are needed
T = size(Y,1)
X = [ones(T) RSMB RHML]
k = size(X,2)

(388,)


3

In [5]:
(b,u,_,V,R²) = OlsGM(Y,X)    #do OLS
Stdb = sqrt.(diag(V))

printblue("OLS with traditional standard errors:\n")
xNames = ["c","SMB","HML"]
printmat([b Stdb],colNames=["coef","std"],rowNames=xNames)

[34m[1mOLS with traditional standard errors:[22m[39m

         coef       std
c       0.007     0.002
SMB     0.217     0.073
HML    -0.429     0.074
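
For readers without the local module, the following is a minimal sketch of what an OLS routine of this kind typically computes. The function name `OlsSketch` is hypothetical, and the `T-k` divisor in the residual variance is an assumption; `OlsGM` may use a different convention and returns an additional output not reproduced here.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper for illustration; not the FinEcmt_OLS implementation.
function OlsSketch(Y, X)
    T, k = size(X)
    b  = X\Y                        # least-squares coefficients
    u  = Y - X*b                    # residuals
    σ² = sum(abs2, u)/(T - k)       # assumption: degrees-of-freedom correction T-k
    V  = σ²*inv(X'*X)               # covariance matrix of b (homoskedastic, uncorrelated errors)
    R² = 1 - sum(abs2, u)/sum(abs2, Y .- mean(Y))
    return b, u, V, R²
end
```

As in the cell above, standard errors would then follow from `sqrt.(diag(V))`.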



# Measures of Fit

Adjusted R², AIC and BIC (the latter two are discussed in more detail in another chapter).

In [6]:
@doc2 RegressionFit
#using CodeTracking
#println(@code_string RegressionFit([1],0.0,3))    #print the source code

```
RegressionFit(u,R²,k)
```

Calculate adjusted R², AIC and BIC from regression residuals.

### Input

  * `u::Vector`:      T-vector of residuals
  * `R²::Float`:      the R² value
  * `k::Int`:         number of regressors


In [7]:
(R²adj,AIC,BIC) = RegressionFit(u,R²,k)

printblue("Measures of fit")
printmat([R²,R²adj,AIC,BIC];rowNames=["R²","R²adj","AIC","BIC"])

[34m[1mMeasures of fit[22m[39m
R²        0.134
R²adj     0.130
AIC      -6.285
BIC      -6.255
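
The three measures can be sketched with common textbook definitions. This is an illustration under assumptions: `FitMeasuresSketch` is a hypothetical name, and the scaling of AIC and BIC (log of the residual variance plus a penalty per observation) is one of several conventions, so `RegressionFit` may differ in detail.

```julia
# Hypothetical helper for illustration; not the FinEcmt_OLS implementation.
function FitMeasuresSketch(u, R², k)
    T     = length(u)
    σ²    = sum(abs2, u)/T                    # residual variance without df correction (assumption)
    R²adj = 1 - (1 - R²)*(T - 1)/(T - k)      # penalises extra regressors
    AIC   = log(σ²) + 2*k/T                   # penalty 2 per parameter, scaled by T
    BIC   = log(σ²) + k*log(T)/T              # penalty log(T) per parameter, scaled by T
    return R²adj, AIC, BIC
end
```

With these definitions, BIC exceeds AIC by `k*(log(T)-2)/T`, so BIC penalises additional regressors more heavily whenever `T` is larger than about 7.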



# Test of Normality

We test whether the residuals are normally distributed by applying the Jarque-Bera test.

In [8]:
@doc2 JarqueBeraTest
#println(@code_string JarqueBeraTest([1]))    #print the source code

```
JarqueBeraTest(x)
```

Calculate the JB test for each column in a matrix. Reports `(skewness,kurtosis,JB)`.


In [9]:
(skewness,kurtosis,JB,pvals) = JarqueBeraTest(u)

printblue("Test of normality")
xut = vcat(skewness,kurtosis,JB)
printmat(xut,collect(pvals);rowNames=["skewness","kurtosis","Jarque-Bera"],colNames=["stat","p-value"])

[34m[1mTest of normality[22m[39m
                 stat   p-value
skewness       -0.746     0.000
kurtosis        5.583     0.000
Jarque-Bera   143.834     0.000
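
The Jarque-Bera statistic combines skewness and excess kurtosis, and is asymptotically χ²(2) distributed under normality. A minimal sketch for a single series follows; `JarqueBeraSketch` is a hypothetical name, and it returns only the p-value of the joint statistic, whereas the table above also reports p-values for skewness and kurtosis separately.

```julia
using Statistics

# Hypothetical helper for illustration; not the FinEcmt_OLS implementation.
function JarqueBeraSketch(x::AbstractVector)
    T  = length(x)
    μ  = mean(x)
    σ  = sqrt(mean(abs2, x .- μ))            # standard deviation with divisor T
    z  = (x .- μ)./σ                         # standardised data
    skewness = mean(z.^3)
    kurtosis = mean(z.^4)                    # equals 3 for a normal distribution
    JB   = T/6*(skewness^2 + (kurtosis - 3)^2/4)
    pval = exp(-JB/2)                        # survival function of χ²(2) in closed form
    return skewness, kurtosis, JB, pval
end
```

The closed-form p-value avoids a dependency on a distributions package; it is valid only because the reference distribution has exactly 2 degrees of freedom.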



# Multicollinearity

We check for multicollinearity by studying the correlation matrix of the regressors and the variance inflation factor (VIF). A high VIF (a common rule of thumb is a value above 5 to 10) might indicate issues with multicollinearity.

In [10]:
@doc2 VIF
#println(@code_string VIF([1]))    #print the source code

```
VIF(X)
```

Calculate the variance inflation factor

### Input

  * `x::Matrix`:    Txk matrix with regressors

### Output

  * `maxVIF::Float`:     highest VIF value
  * `allVIF::Vector`:    a k-vector of VIF values


In [11]:
printblue("Correlation matrix (checking multicollinearity)")
printmat(cor(X);colNames=xNames,rowNames=xNames)

[34m[1mCorrelation matrix (checking multicollinearity)[22m[39m
            c       SMB       HML
c       1.000       NaN       NaN
SMB       NaN     1.000    -0.320
HML       NaN    -0.320     1.000
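
The `NaN` entries in the correlation matrix arise because the constant has zero variance, so its correlation with any other column is 0/0. The sketch below reproduces this on made-up data (`Xdemo` is hypothetical) and shows one way to restrict the correlation matrix to the non-constant regressors.

```julia
using Statistics

# Made-up data for illustration: a constant and two non-constant columns
Xdemo = [ones(5) [1.0, 2.0, 3.0, 4.0, 5.0] [2.0, 1.0, 4.0, 3.0, 6.0]]

C_all    = cor(Xdemo)             # off-diagonal entries involving the constant are NaN
C_slopes = cor(Xdemo[:, 2:end])   # drop the constant (here: column 1) before calling cor
```

In the notebook's notation this would correspond to `cor(X[:,2:end])`, assuming the constant is the first column of `X`.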



In [12]:
(maxVIF,allVIF) = VIF(X)
printblue("VIF (checking multicollinearity)")
printmat(allVIF;rowNames=xNames)

[34m[1mVIF (checking multicollinearity)[22m[39m
c       1.000
SMB     1.114
HML     1.114
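
The VIF of regressor j is 1/(1 - R²ⱼ), where R²ⱼ is the R² from regressing column j on all other regressors. A minimal sketch follows. `VIFSketch` is a hypothetical name, and setting the VIF of a constant column to 1 is an assumption chosen to be consistent with the output above. The R² calculation also assumes that `X` contains a constant.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper for illustration; not the FinEcmt_OLS implementation.
function VIFSketch(X)
    k    = size(X, 2)
    vifs = ones(k)                               # constant columns keep VIF = 1 (assumption)
    for j in 1:k
        xj = X[:, j]
        all(==(xj[1]), xj) && continue           # skip constant columns
        others = X[:, setdiff(1:k, j)]           # all regressors except column j
        res  = xj - others*(others\xj)           # residuals from regressing xj on the others
        R²j  = 1 - sum(abs2, res)/sum(abs2, xj .- mean(xj))
        vifs[j] = 1/(1 - R²j)
    end
    return maximum(vifs), vifs
end
```

With a constant and two other regressors, R²ⱼ equals the squared correlation between the two, so a correlation of -0.320 gives 1/(1 - 0.320²) ≈ 1.114, in line with the table above.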



# A Convenience Function for Printing All These Tests (extra)

In [13]:
@doc2 DiagnosticsTable
#println(@code_string DiagnosticsTable([1],[1],0.0))    #print the source code

```
DiagnosticsTable(X,u,R²,nlags,xNames="")
```

Compute and print a number of regression diagnostic tests.

### Input

  * `X::Matrix`:      Txk matrix of regressors
  * `u::Vector`:      T-vector of residuals
  * `R²::Float`:      the R² value
  * `xNames::Vector`: of strings, regressor names


In [14]:
DiagnosticsTable(X,u,R²,xNames)

[34m[1mTest of all slopes = 0[22m[39m
stat     60.165
p-val     0.000

[34m[1mMeasures of fit[22m[39m
R²        0.134
R²adj     0.130
AIC      -6.285
BIC      -6.255

[34m[1mTest of normality[22m[39m
                 stat   p-value
skewness       -0.746     0.000
kurtosis        5.583     0.000
Jarque-Bera   143.834     0.000

[34m[1mCorrelation matrix (checking multicollinearity)[22m[39m
            c       SMB       HML
c       1.000       NaN       NaN
SMB       NaN     1.000    -0.320
HML       NaN    -0.320     1.000

[34m[1mVIF (checking multicollinearity)[22m[39m
c       1.000
SMB     1.114
HML     1.114

