# OLS Diagnostics


This notebook tests *(a)* the overall fit of a regression model and *(b)* properties of the residuals (heteroskedasticity and autocorrelation).

You may also consider the [HypothesisTests.jl](https://github.com/JuliaStats/HypothesisTests.jl) package (not used here).

## Load Packages and Extra Functions

The key functions for the diagnostic tests are from the (local) `FinEcmt_OLS` module.

```julia
MyModulePath = joinpath(pwd(),"jlFiles")                        #folder with the local modules
!in(MyModulePath,LOAD_PATH) && push!(LOAD_PATH,MyModulePath);   #add it to LOAD_PATH (only once)
```

```julia
using FinEcmt_OLS, DelimitedFiles, LinearAlgebra
```

## Loading Data

```julia
x = readdlm("Data/FFmFactorsPs.csv",',',skipstart=1)

#yearmonth, market, small minus big, high minus low (/100 converts % to decimals)
(ym,Rme,RSMB,RHML) = (x[:,1],x[:,2]/100,x[:,3]/100,x[:,4]/100)
x = nothing
println(size(Rme))

Y = Rme         #or copy(Rme) if independent copies are needed
T = size(Y,1)
X = [ones(T) RSMB RHML];
```

```
(388,)
```


```julia
(b,u,_,V,R²) = OlsGM(Y,X)    #do OLS
Stdb = sqrt.(diag(V))

printblue("OLS with traditional standard errors:\n")
xNames = ["c","SMB","HML"]
printmat([b Stdb],colNames=["coef","std"],rowNames=xNames)
```

[34m[1mOLS with traditional standard errors:[22m[39m

         coef       std
c       0.007     0.002
SMB     0.217     0.073
HML    -0.429     0.074
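Traditional standard errors are usually based on the classical covariance matrix $\hat{\sigma}^2(X'X)^{-1}$, which assumes iid residuals. A minimal sketch of that calculation follows. `ols_classical` is a hypothetical helper, not the `OlsGM` code (whose source is not shown here), and the divisor $T$ in $\hat{\sigma}^2$ is an assumption ($T-k$ is the common unbiased alternative). The simulated variables end in `_sim` so that running the cell does not overwrite `X`, `u` or `T`.

```julia
using LinearAlgebra, Random

# Hypothetical helper (not part of FinEcmt_OLS): OLS with the classical
# covariance matrix σ²(X'X)⁻¹, which assumes iid residuals
function ols_classical(y::AbstractVector, X::AbstractMatrix)
    T  = size(X, 1)
    b  = X \ y                     # least-squares coefficients
    u  = y - X*b                   # residuals
    σ² = sum(abs2, u) / T          # divisor T is an assumption; T - size(X,2) is the unbiased choice
    V  = σ² * inv(X'*X)            # traditional covariance matrix of b
    return b, u, V
end

# Usage on simulated data (hypothetical numbers)
Random.seed!(4)
T_sim = 388
X_sim = [ones(T_sim) randn(T_sim, 2)]
y_sim = X_sim*[0.01, 0.2, -0.4] + 0.05*randn(T_sim)
(b_sim, u_sim, V_sim) = ols_classical(y_sim, X_sim)
Stdb_sim = sqrt.(diag(V_sim))      # traditional standard errors
```

Since the regression includes a constant, the residuals sum to zero and are orthogonal to every column of `X_sim`, which is a quick sanity check on any OLS routine.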



## Regression Diagnostics: Testing All Slope Coefficients

The `OlsR2Test()` function tests whether all slope coefficients of a regression are zero (equivalently, whether the $R^2$ is zero). The regression must contain an intercept for the $R^2$ to be a meaningful measure of fit.
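To see what the test computes, here is a minimal sketch that estimates the regression, forms $R^2$, and evaluates the statistic $TR^2/(1-R^2)$ used in the `OlsR2Test` source printed in this notebook. Under the null hypothesis that all `df` slopes are zero, the statistic is asymptotically $\chi^2_{df}$. The helper `r2_test_stat` is hypothetical, and the p-value step is only described in a comment because it needs `Distributions.jl`.

```julia
using Statistics, Random

# Hypothetical helper: T*R²/(1-R²), the statistic in the OlsR2Test source,
# which is asymptotically χ²(df) when all df slope coefficients are zero
function r2_test_stat(y::AbstractVector, X::AbstractMatrix)
    T  = length(y)
    u  = y - X*(X \ y)                                  # OLS residuals (X includes a constant)
    R² = 1 - sum(abs2, u) / sum(abs2, y .- mean(y))     # fit measure, valid with an intercept
    df = size(X, 2) - 1                                 # number of slope coefficients
    return T*R²/(1 - R²), df, R²
end
# p-value, if Distributions.jl is available (not loaded here): ccdf(Chisq(df), stat)

# Usage: y is unrelated to the regressors, so H₀ is true (simulated data)
Random.seed!(5)
T_sim = 388
X_sim = [ones(T_sim) randn(T_sim, 2)]
y_sim = randn(T_sim)
(stat_sim, df_sim, R²_sim) = r2_test_stat(y_sim, X_sim)
```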

```julia
@doc2 OlsR2Test
```

```
OlsR2Test(R²,T,df)
```

Test of all slope coefficients. Notice that the regression must contain an intercept for R² to be useful.

### Input

  * `R²::Number`:    R² value
  * `T::Int`:        number of observations
  * `df::Number`:    number of (non-constant) regressors

### Output

  * `RegrStat::Number`: test statistic
  * `pval::Number`:     p-value


```julia
using CodeTracking
println(@code_string OlsR2Test(1.0,1,25))    #print the source code
```

```julia
function OlsR2Test(R²,T,df)
    RegrStat = T*R²/(1-R²)           #R\^2[TAB]
    pval     = 1 - cdf(Chisq(df),RegrStat)    #or ccdf() to get 1-cdf()
    return RegrStat, pval
end
```


```julia
df = size(X,2) - 1              #number of slope coefficients
(RegrStat,pval) = OlsR2Test(R²,T,df)

printblue("Test of all slopes = 0:\n")
printmat([RegrStat,pval],rowNames=["stat","p-val"])
```

[34m[1mTest of all slopes = 0:[22m[39m

stat     60.165
p-val     0.000



## Regression Diagnostics: Heteroskedasticity

The `OlsWhitesTest()` function performs White's test for heteroskedasticity. Again, the regression must contain an intercept for this test to be useful.
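The source of `OlsWhitesTest` is not printed in this notebook, so the following is only a sketch of the textbook version of White's test, not necessarily what the module does: regress the squared residuals on a constant, the regressors, their squares and cross products, then compare $TR^2$ from that auxiliary regression with a $\chi^2$ distribution whose degrees of freedom equal the number of non-constant auxiliary regressors. The helper `white_test_stat` and the simulated data are hypothetical, and the code assumes the first column of `X` is the constant.

```julia
using Statistics, Random

# Hypothetical helper: textbook White test, assuming column 1 of X is the constant.
# Regress u² on a constant, the regressors, their squares and cross products;
# T*R² of that regression is asymptotically χ²(df) under homoskedasticity.
function white_test_stat(u::AbstractVector, X::AbstractMatrix)
    T = length(u)
    z = X[:, 2:end]                                      # non-constant regressors
    m = size(z, 2)
    cols = [ones(T)]                                     # auxiliary constant
    append!(cols, [z[:, i] for i in 1:m])                # levels
    append!(cols, [z[:, i] .* z[:, j] for i in 1:m for j in i:m])  # squares and cross products
    W  = reduce(hcat, cols)                              # T x (1 + m + m(m+1)/2)
    e2 = u .^ 2
    v  = e2 - W*(W \ e2)                                 # auxiliary-regression residuals
    R² = 1 - sum(abs2, v) / sum(abs2, e2 .- mean(e2))
    df = size(W, 2) - 1                                  # non-constant auxiliary regressors
    return T*R², df
end

# Usage: simulated residuals whose variance depends on the first regressor
Random.seed!(6)
T_sim = 388
x_sim = randn(T_sim, 2)
X_sim = [ones(T_sim) x_sim]
u_sim = randn(T_sim) .* (1 .+ abs.(x_sim[:, 1]))
(WhiteStat_sim, df_sim) = white_test_stat(u_sim, X_sim)
```

With two non-constant regressors the auxiliary regression has five non-constant terms (two levels, two squares, one cross product), so `df_sim` is 5. Dropping the cross products is a common variant that uses fewer degrees of freedom.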

```julia
@doc2 OlsWhitesTest
```

```
OlsWhitesTest(u,x)
```

Test of heteroskedasticity. Notice that the regression must contain an intercept for the test to be useful.

### Input

  * `u::Vector`:   T-vector, residuals
  * `x::Matrix`:   Txk, regressors

### Output

  * `RegrStat::Number`: test statistic
  * `pval::Number`:     p-value


```julia
#println(@code_string OlsWhitesTest([1],[1]))    #print the source code
```

```julia
(WhiteStat,pval) = OlsWhitesTest(u,X)

printblue("White's test (H₀: heteroskedasticity is not correlated with regressors):\n")
printmat([WhiteStat,pval],rowNames=["stat","p-val"])
```

[34m[1mWhite's test (H₀: heteroskedasticity is not correlated with regressors):[22m[39m

stat     77.278
p-val     0.000



## Regression Diagnostics: Autocorrelation of the Residuals

The `OlsAutoCorr()` function estimates the autocorrelations of its input (typically the residuals) and calculates the Durbin-Watson (DW) and Box-Pierce statistics.
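The statistics can be sketched as follows, assuming the usual definitions: sample autocorrelations $\hat\rho_s$ of the demeaned series, t-statistics $\sqrt{T}\hat\rho_s$, the Box-Pierce statistic $T\sum_{s=1}^{L}\hat\rho_s^2$ (asymptotically $\chi^2_L$ under no autocorrelation), and $DW=\sum_{t=2}^{T}(u_t-u_{t-1})^2/\sum_{t=1}^{T}u_t^2$. The helper `autocorr_stats` is hypothetical and may differ from `OlsAutoCorr` in details such as demeaning.

```julia
using Statistics, Random

# Hypothetical helper: autocorrelations at lags 1..L, t-stats √T·ρ,
# Box-Pierce statistic T·Σρ² and the Durbin-Watson statistic
function autocorr_stats(u::AbstractVector, L::Int=1)
    T  = length(u)
    ud = u .- mean(u)                                   # demeaned series
    γ0 = sum(abs2, ud)
    ρ  = [sum(ud[s+1:T] .* ud[1:T-s]) / γ0 for s in 1:L]
    tstat     = sqrt(T) .* ρ                            # ≈ N(0,1) if no autocorrelation
    BoxPierce = T * sum(abs2, ρ)                        # ≈ χ²(L) if no autocorrelation
    DW        = sum(abs2, diff(u)) / sum(abs2, u)       # close to 2(1 - ρ[1])
    return ρ, tstat, BoxPierce, DW
end

# Usage: simulated AR(1) series with coefficient 0.5 (hypothetical data)
Random.seed!(7)
T_sim = 388
e_sim = randn(T_sim)
u_sim = copy(e_sim)
for t in 2:T_sim
    u_sim[t] = 0.5*u_sim[t-1] + e_sim[t]
end
(ρ_sim, tstat_sim, BP_sim, DW_sim) = autocorr_stats(u_sim, 3)
```

These formulas appear consistent with the output for this data set: with $T=388$ and $\hat\rho_1=0.074$, $\sqrt{388}\times0.074\approx1.46$ and $2(1-0.074)\approx1.85$, close to the printed t-stat (1.467) and DW (1.849) once rounding of $\hat\rho_1$ is allowed for.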

```julia
@doc2 OlsAutoCorr
```

```
OlsAutoCorr(u,L=1)
```

Test the autocorrelation of OLS residuals

### Input

  * `u::Vector`:   T-vector, residuals
  * `L::Int`:      scalar, number of lags in autocorrelation and Box-Pierce test

### Output

  * `AutoCorr::Matrix`:   Lx3, autocorrelation, t-stat and p-value
  * `BoxPierce::Matrix`:  1x2, Box-Pierce statistic and p-value
  * `DW::Number`:         DW statistic

### Requires

  * StatsBase, Distributions


```julia
#println(@code_string OlsAutoCorr([1],5))    #print the source code
```

```julia
L = 3     #number of autocorrs to test

(ρStats,BoxPierce,DW) = OlsAutoCorr(u,L)

printmagenta("Testing autocorrelation of residuals\n")

printblue("Autocorrelations (lag 1 to $L):\n")
printmat(ρStats,colNames=["autocorr","t-stat","p-val"],rowNames=1:L,cell00="lag")

printblue("\nBoxPierce ($L lags): ")
printmat(BoxPierce',rowNames=["stat","p-val"])

printblue("DW statistic:")
printlnPs(DW)
```

[35m[1mTesting autocorrelation of residuals[22m[39m

[34m[1mAutocorrelations (lag 1 to 3):[22m[39m

lag  autocorr    t-stat     p-val
1       0.074     1.467     0.142
2      -0.037    -0.733     0.464
3       0.019     0.377     0.706


[34m[1mBoxPierce (3 lags): [22m[39m
stat      2.831
p-val     0.418

[34m[1mDW statistic:[22m[39m
     1.849


## Autocorrelation of `X.*u`

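What matters most for the uncertainty about a slope coefficient is not the autocorrelation of the residual itself but that of the residual times the regressor, since the OLS estimation error is a weighted sum of the products $x_t u_t$. The loop below tests this for each regressor. For the constant, `X[:,1].*u` equals `u`, so the first set of results reproduces the residual autocorrelations above.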
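A short derivation from standard OLS algebra, treating the regressors as given (here $x_t$ is the vector of regressors and $u_t$ the error in period $t$), shows why. The estimation error is a sum of the products $x_t u_t$, so its variance involves their autocovariances:

$$
\hat{\beta}-\beta=\Big(\sum_{t=1}^{T}x_t x_t'\Big)^{-1}\sum_{t=1}^{T}x_t u_t,
\qquad
\operatorname{Var}(\hat{\beta})=\Big(\sum_{t=1}^{T}x_t x_t'\Big)^{-1}\Big[\sum_{t=1}^{T}\sum_{s=1}^{T}\operatorname{Cov}(x_t u_t,\,x_s u_s)\Big]\Big(\sum_{t=1}^{T}x_t x_t'\Big)^{-1}.
$$

The traditional standard errors keep only the $t=s$ terms and assume a constant residual variance. Autocorrelation in $x_t u_t$ makes the $t\neq s$ terms nonzero, and the traditional formula ignores them.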

```julia
for i = 1:size(X,2)         #iterate over different regressors
    ρStats, = OlsAutoCorr(X[:,i].*u,L)
    printblue("Autocorrelations of $(xNames[i])*u  (lag 1 to $L):")
    printmat(ρStats,colNames=["autocorr","t-stat","p-val"],rowNames=1:L,cell00="lag")
end
```

[34m[1mAutocorrelations of c*u  (lag 1 to 3):[22m[39m
lag  autocorr    t-stat     p-val
1       0.074     1.467     0.142
2      -0.037    -0.733     0.464
3       0.019     0.377     0.706

[34m[1mAutocorrelations of SMB*u  (lag 1 to 3):[22m[39m
lag  autocorr    t-stat     p-val
1       0.219     4.312     0.000
2      -0.014    -0.268     0.789
3       0.044     0.857     0.391

[34m[1mAutocorrelations of HML*u  (lag 1 to 3):[22m[39m
lag  autocorr    t-stat     p-val
1       0.278     5.472     0.000
2       0.131     2.582     0.010
3       0.225     4.438     0.000

