# Calendar Time Regressions vs. Panel Regressions

This notebook illustrates how calendar time regressions (form portfolios based on characteristics and then estimate a system of regressions) are related to panel regressions. 

## Load Packages and Extra Functions

In [1]:
using Printf, HDF5, Statistics, LinearAlgebra

include("jlFiles/printmat.jl")
include("jlFiles/Ols.jl")
include("jlFiles/CovNWFn.jl")
include("jlFiles/OlsSureFn.jl")
include("jlFiles/excise.jl")
include("jlFiles/PanelOls.jl")

replaceNaNinYX!

## Loading Data

The data is in an HDF5 file. This is a very common data format in science. To load it, the [HDF5.jl](https://github.com/JuliaIO/HDF5.jl) package is used.

In [2]:
fh = h5open("Data/PPM.h5","r")   #open for reading
  (ER,Factors,Investors) = read(fh,"Data/ER","Data/Factors","Data/Investors")
close(fh)

N_Changes = Investors[:,1]

(T,N) = size(ER)
D     = N_Changes .> 50                #logical dummies: [very active]
D0    = .!D                            #inactive

println("T=$(size(ER,1)) and N=$(size(ER,2))")

T=2354 and N=2637


## Individual alphas

The following code takes the matrix of individual daily
excess returns `ER` (a $T\times N$ matrix) and runs one regression for each of the $N$ individuals on
three risk factors (in `Factors`, a $T\times 3$ matrix): the excess returns on Swedish equity, Swedish bonds and international equity.

The `D` vector ($N$ elements) is: `D[i] = false` if investor $i$ is classified as inactive (no/few portfolio changes, see above), and `D[i] = true` if active (many portfolio changes).

The cell shows the average alphas for each of the two groups.
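As a rough illustration of the per-individual regressions described above, the following sketch uses simulated data (not the PPM data) and the backslash operator rather than `OlsNWFn`. All names ending in `_s`, the sizes and the true alphas are hypothetical.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(123)
T_s, N_s = 500, 6                           # hypothetical sizes (the real data is 2354×2637)
F_s  = randn(T_s,3)                         # simulated stand-in for `Factors`
D_s  = [false,false,false,true,true,true]   # hypothetical activity dummies
ER_s = 0.1 .* D_s' .+ F_s*ones(3,N_s) .+ randn(T_s,N_s)   # active investors get alpha = 0.1

x_s      = [F_s ones(T_s)]                  # constant last, so alpha is the last coefficient
alphas_s = [(x_s\ER_s[:,i])[end] for i in 1:N_s]   # one OLS regression per individual

# all regressions share the same regressors, so one call gives the same alphas
alphas_alt = (x_s\ER_s)[end,:]

println("average alpha, inactive: ", mean(alphas_s[.!D_s]))
println("average alpha, active:   ", mean(alphas_s[D_s]))
```

Because the regressors are identical across individuals, the loop is only needed when extra output per regression (for instance, standard errors) is wanted.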

In [3]:
alphaM = fill(NaN,N)                                #individual alphas
for i = 1:N
   #local b           #local/global is needed in script
   b, = OlsNWFn(ER[:,i],[Factors ones(T)],0)
   alphaM[i] = b[end]
end

printblue("\nAverage annualised alphas for each of the two groups:\n")
xx = [mean(alphaM[D0]) mean(alphaM[D])]*252
colNames = ["Inactive","Active"]
printmat(xx,colNames=colNames,rowNames=["α"])


[34m[1mAverage annualised alphas for each of the two groups:[22m[39m

   Inactive    Active
α    -0.787     6.217



## Calendar Time Portfolios

The following code creates two time series (with $T$ observations in each) of portfolio returns: one for inactive investors, the other for active investors. In both cases, the portfolios are equally weighted, so the return is the average return of those in the portfolio.

Then, it calculates the average excess returns, the Sharpe ratios and finally the alphas.

The alphas and betas are estimated with OLS, and we test the hypothesis that the two alphas are the same (using a SURE approach).

### A Remark on the Code

- The SURE approach is implemented in the function `OlsSureFn` (included in one of the first cells above).
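To make the test of equal alphas concrete, here is a minimal sketch of a system of regressions with identical regressors, on simulated data. It assumes iid residuals over time, so the covariance matrix of `vec(B_q)` is $\Sigma \otimes (X'X)^{-1}$. This is an assumption of the sketch and not necessarily what `OlsSureFn` does with its options. All names ending in `_q` are hypothetical.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(42)
T_q = 1000
X_q = [ones(T_q) randn(T_q,3)]                      # constant first, then 3 factors
B0  = [0.0 0.2; 1.0 0.8; 0.5 0.4; 0.2 0.1]          # hypothetical true coefficients, 4×2
Y_q = X_q*B0 + randn(T_q,2)                         # two portfolio returns

B_q   = X_q\Y_q                                     # OLS, equation by equation (4×2)
U_q   = Y_q - X_q*B_q                               # residuals, T_q×2
Σ_q   = (U_q'*U_q)/T_q                              # residual covariance across equations
Cov_q = kron(Σ_q, inv(X_q'*X_q))                    # covariance of vec(B_q), 8×8

R_q     = [1 0 0 0 -1 0 0 0]                        # picks α₁ - α₂ from vec(B_q)
a_diffq = (R_q*vec(B_q))[]
tstat_q = a_diffq/sqrt((R_q*Cov_q*R_q')[])
println("α₁-α₂ = ", a_diffq, ", t-stat = ", tstat_q)
```

The cross-equation covariance in `Σ_q` matters for the test: a positive correlation between the two portfolios' residuals lowers the variance of the alpha difference.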

In [4]:
printblue("group by group, annualised values:\n")

PortfER      = fill(NaN,(T,2))          #create portfolios as average across individuals
PortfER[:,1] = mean(ER[:,D0],dims=2)    #Tx1, portfolio return = average individual return
PortfER[:,2] = mean(ER[:,D],dims=2)


Avg = mean(PortfER,dims=1)*252          #average excess return on portfolios, annualised
Std = std(PortfER,dims=1)*sqrt(252)
SR  = Avg./Std
(b,res,yhat,Covb) = OlsSureFn(PortfER,[ones(T) Factors],true,0)

xx = [Avg;Std;SR;b[1:1,:]*252]
printmat(xx,colNames=colNames,rowNames=["Avg","Std","SR","α"])

[34m[1mgroup by group, annualised values:[22m[39m

     Inactive    Active
Avg    -1.262     5.534
Std    15.728    13.882
SR     -0.080     0.399
α      -0.787     6.217



In [5]:
R       = [1 0 0 0 -1 0 0 0]                       #testing if α₁ = α₂
a_diff  = (R*vec(b))[]                             #[] to make it a scalar
tstatLS = a_diff/sqrt((R*Covb*R')[])

printblue("diff of annual alphas:\n")
xx = [a_diff*252;tstatLS]
printmat(xx,rowNames=["α1-α2","t-stat"])

[34m[1mdiff of annual alphas:[22m[39m

α1-α2     -7.004
t-stat    -2.784



## Panel Regressions

Finally, a panel ($T\times N$) regression is done by simply stacking all data points, with the factors and the constant interacted with the activity dummies. The hypothesis of equal alphas is tested with both a traditional OLS approach (assuming that all data are iid) and a Driscoll-Kraay approach (which accounts for cross-sectional correlations).

The code for the panel regression is in the function `PanelOls()`. It does a straightforward LS regression and then estimates the covariance matrix in several different ways: traditional OLS, White, Driscoll-Kraay and optionally also clustered (the cluster/group membership can be supplied to the function). Also, autocorrelation can be accounted for by applying a Newey-West approach to the (White, DK, clustered) methods.
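The difference between the traditional OLS and Driscoll-Kraay covariance matrices can be sketched on simulated data as follows. This is not the `PanelOls()` code: it assumes zero lags, the same regressors for every individual, and a common shock that creates cross-sectional correlation. All names ending in `_p` are hypothetical.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(7)
T_p, N_p, K_p = 200, 50, 2
f_p      = randn(T_p)                               # one simulated factor
common_p = randn(T_p)                               # shock shared by all individuals in period t
y_p = 0.5 .* f_p .+ common_p .+ randn(T_p,N_p)      # T_p×N_p panel of returns
x_p = [ones(T_p) f_p]                               # regressors, same for all individuals

Sxx_p = zeros(K_p,K_p)                              # Σₜ Σᵢ x_it*x_it'
Sxy_p = zeros(K_p)                                  # Σₜ Σᵢ x_it*y_it
for t in 1:T_p, i in 1:N_p
    x_it = x_p[t,:]
    Sxx_p .+= x_it*x_it'
    Sxy_p .+= x_it*y_p[t,i]
end
θ_p = Sxx_p\Sxy_p                                   # pooled OLS estimates
u_p = y_p .- x_p*θ_p                                # residuals, T_p×N_p

# traditional OLS: iid residuals
CovLS_p = (sum(abs2,u_p)/(T_p*N_p))*inv(Sxx_p)

# Driscoll-Kraay (no lags): sum the moments across individuals within each period first
S_p = zeros(K_p,K_p)
for t in 1:T_p
    h_t = sum(x_p[t,:]*u_p[t,i] for i in 1:N_p)     # Σᵢ x_it*u_it
    S_p .+= h_t*h_t'
end
CovDK_p = inv(Sxx_p)*S_p*inv(Sxx_p)

println("std of intercept, LS: ", sqrt(CovLS_p[1,1]))
println("std of intercept, DK: ", sqrt(CovDK_p[1,1]))
```

With a strong common shock, the OLS standard error is much too small, since it treats the $N$ observations in each period as independent.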

In calling `PanelOls()` we use the individual returns (`ER`, which is $T\times N$) as the dependent variable and a $T\times K\times N$ array containing the regressors (interactions of `[ones(T) Factors]` with the dummies in `[D0 D]`). This approach is somewhat wasteful with memory since the dummies are (here) time-invariant. However, `PanelOls()` is set up to also handle more general cases.
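The sketch below builds such a $T\times K\times N$ array on simulated data and checks that, in a balanced panel, pooled OLS on the stacked data gives the same coefficients as regressions on the two equally weighted portfolios. It uses plain least squares rather than `PanelOls()`. All names ending in `_e` are hypothetical.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(1)
T_e, N_e = 300, 8
F_e  = randn(T_e,3)
x_e  = [ones(T_e) F_e]                                    # T_e×4, constant first
D_e  = [false,false,false,true,true,true,true,true]       # hypothetical activity dummies
D0_e = .!D_e
ER_e = 0.1 .* D_e' .+ F_e*ones(3,N_e) .+ randn(T_e,N_e)   # simulated excess returns

K1_e = size(x_e,2)
X_e  = zeros(T_e,2*K1_e,N_e)                              # T×K×N array of regressors
for i in 1:N_e
    X_e[:,:,i] = hcat(x_e.*D0_e[i], x_e.*D_e[i])          # interact with the dummies
end

# pooled OLS: stack individual after individual
y_stack = vec(ER_e)                                       # (T_e*N_e) vector
X_stack = reduce(vcat, [X_e[:,:,i] for i in 1:N_e])       # (T_e*N_e)×8 matrix
θ_e     = X_stack\y_stack

# calendar time portfolios: equally weighted average within each group
Portf_e = [mean(ER_e[:,D0_e],dims=2) mean(ER_e[:,D_e],dims=2)]
b_e     = x_e\Portf_e                                     # 4×2, one column per portfolio

println("max abs difference: ", maximum(abs, θ_e - vec(b_e)))
```

The point estimates coincide because the dummy interactions make the regressors of the two groups orthogonal and the regressors are the same for all individuals. The two approaches can still differ in how the covariance matrix is estimated.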

In [6]:
?PanelOls

search: PanelOls



```
PanelOls(y,x,m=0,clust=[],vvM=[])
```

Pooled OLS estimation.

# Input

  * `y::Matrix`:          TxN matrix with the dependent variable, y(t,i) is for period t, individual i
  * `x::3D Array`:        TxKxN matrix with K regressors
  * `m::Int`:             (optional), scalar, number of lags in covariance estimation
  * `clust::Vector{Int}`: (optional), N vector with cluster number for each individual, [ones(N)]
  * `vvM::Matrix`:        (optional), TxN with true/false where false indicates NaN/missings in observation (t,i)

# Output

  * `fnOutput::NamedTuple`:   named tuple with the following elements
      * [1] theta         (K*L)x1 vector, LS estimates of regression coefficients on kron(z,x)
      * [2] CovDK         (K*L)x(K*L) matrix, Driscoll-Kraay covariance matrix
      * [3] CovC          covariance matrix, cluster
      * [4] CovW          covariance matrix, White's
      * [5] R2            scalar, (pseudo-) R2
      * [6] yhat          TxN matrix with fitted values
      * [7] Nb            T-vector, number of obs in each period

# Notice

  * for TxNxK -> TxKxN, do `x = permutedims(z,[1,3,2])`
  * for an unbalanced panel, set row t of `(y[t,i],x[t,:,i])` to zeros if there is a NaN/missing value in that row (see vvM)

Paul.Soderlind@unisg.ch


In [7]:
printblue("panel regression:\n")

x  = [ones(T) Factors]
K1 = size(x,2)
X = fill(NaN,T,2*K1,N)                  #create TxKxN array of regressors
for i = 1:N
    X[:,:,i] = hcat(x.*D0[i],x.*D[i])
end

fnO = PanelOls(ER,X)                        #panel regression

R       = [1 0 0 0 -1 0 0 0]                #testing if α₁ = α₂
a_diff  = (R*vec(fnO.theta))[]

tstatLS = a_diff/sqrt((R*fnO.CovLS*R')[])
tstatDK = a_diff/sqrt((R*fnO.CovDK*R')[])

xx = [a_diff*252;tstatLS;tstatDK]
printmat(xx,rowNames=["α1-α2","t-stat (LS)","t-stat (DK)"])

printred("\nCompare with calendar time regressions. Also notice the difference (any?) between the two t-stats")

[34m[1mpanel regression:[22m[39m

α1-α2          -7.004
t-stat (LS)   -24.017
t-stat (DK)    -2.784


[31m[1mCompare with calendar time regressions. Also notice the difference (any?) between the two t-stats[22m[39m


# Unbalanced Panels (extra)

`PanelOls()` is coded in such a way that an unbalanced panel (NaNs/missings in `(y,x)`) can be handled by zeroing out all of `(y[t,i],x[t,:,i])` if there is a NaN/missing value there.

To do that, the `replaceNaNinYX!()` function is useful. The next cell illustrates how it works. *Warning*: the function overwrites the inputs (as indicated by the `!` in the name, following Julia conventions).
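The zeroing-out idea can be sketched as below. The function `zeroOutNaN!` is a hypothetical stand-in written for this illustration, not the code of `replaceNaNinYX!()`. Zeroed observations add nothing to the sums $\sum x_{it}x_{it}'$ and $\sum x_{it}y_{it}$, so they drop out of the LS estimates.

```julia
# Hypothetical helper: zero out (y[t,i],x[t,:,i]) when any of them is NaN.
# Returns a T×N BitMatrix with `false` marking the zeroed observations.
# Warning: overwrites y and x.
function zeroOutNaN!(y::Matrix{Float64}, x::Array{Float64,3})
    (T,N) = size(y)
    vv = trues(T,N)
    for i in 1:N, t in 1:T
        if isnan(y[t,i]) || any(isnan, view(x,t,:,i))
            vv[t,i]   = false
            y[t,i]    = 0.0
            x[t,:,i] .= 0.0
        end
    end
    return vv
end

y_z = [NaN 11.0; 2.0 12.0; 3.0 13.0]                 # NaN in y for t=1, i=1
x_z = cat(ones(3,1,2), reshape(collect(1.0:12.0),3,2,2), dims=2)   # 3×3×2 array
x_z[3,2,2] = NaN                                     # NaN in x for t=3, i=2

vv_z = zeroOutNaN!(y_z,x_z)
println("valid observations: ", sum(vv_z), " out of ", length(vv_z))
```

The returned indicator matrix plays the same role as `vvM`: it records which observations are valid, which is needed to count the observations correctly.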

In [8]:
y   = [NaN 11;2 12;3 13]                     #y has a NaN for t=1, i = 1
x   = hcat(ones(3,1,2),randn(3,2,2))
vvM = replaceNaNinYX!(y,x)

printblue("after 'zeroing out' observations with NaNs")
printmat(y)
println("x[:,:,1] ")
printmat(x[:,:,1])
println("x[:,:,2]")
printmat(x[:,:,2])

printred("Notice that (y[1,1],x[1,:,1]) are filled with zeros - and that the old (y,x) are OVERWRITTEN")

[34m[1mafter 'zeroing out' observations with NaNs[22m[39m
     0.000    11.000
     2.000    12.000
     3.000    13.000

x[:,:,1] 
     0.000     0.000     0.000
     1.000    -0.728     0.890
     1.000    -0.928     0.655

x[:,:,2]
     1.000     0.320     0.320
     1.000     0.648    -0.836
     1.000    -0.312    -0.397

[31m[1mNotice that (y[1,1],x[1,:,1]) are filled with zeros - and that the old (y,x) are OVERWRITTEN[22m[39m


In [9]:
ER[1]  = NaN                 #let's introduce some NaNs in the data
X[end] = NaN

fnO = PanelOls(ER,X)           #will just give NaN as results

(theta = [NaN; NaN; … ; NaN; NaN;;], CovDK = [NaN NaN … NaN NaN; NaN NaN … NaN NaN; … ; NaN NaN … NaN NaN; NaN NaN … NaN NaN], CovC = [NaN NaN … NaN NaN; NaN NaN … NaN NaN; … ; NaN NaN … NaN NaN; NaN NaN … NaN NaN], CovW = [NaN NaN … NaN NaN; NaN NaN … NaN NaN; … ; NaN NaN … NaN NaN; NaN NaN … NaN NaN], CovLS = [NaN NaN … NaN NaN; NaN NaN … NaN NaN; … ; NaN NaN … NaN NaN; NaN NaN … NaN NaN], R2 = NaN, yhat = [NaN NaN … NaN NaN; NaN NaN … NaN NaN; … ; NaN NaN … NaN NaN; NaN NaN … NaN NaN], Nobs = [2637; 2637; … ; 2637; 2637;;])

In [10]:
#ER_original = copy(ER)                #uncomment if you want to keep the original data
#X_original  = copy(X)                 
vvM = replaceNaNinYX!(ER,X)            #to save space, (ER,X) are overwritten

fnO     = PanelOls(ER,X,0,[],vvM)
a_diff  = (R*vec(fnO.theta))[]
tstatDK = a_diff/sqrt((R*fnO.CovDK*R')[])

xx = [a_diff*252;tstatDK]
printmat(xx,rowNames=["α₁-α₂","t-stat (DK)"])

α₁-α₂          -7.005
t-stat (DK)    -2.784

