# Calendar Time Regressions vs. Panel Regressions

This notebook illustrates how calendar time regressions (form portfolios based on characteristics and then estimate a system of regressions) are related to panel regressions. 

## Load Packages and Extra Functions

In [1]:
using Printf, HDF5, Statistics, LinearAlgebra

include("jlFiles/printmat.jl")
include("jlFiles/CovNWFn.jl")
include("jlFiles/OlsSureFn.jl")

OlsSureFn

## Loading Data

In [2]:
using HDF5                       #to read hdf5 files, a very common data format
fh = h5open("Data/PPM.h5","r")   #open for reading
  (ER,Factors,Investors) = read(fh,"Data/ER","Data/Factors","Data/Investors")
close(fh)

N_Changes = Investors[:,1]

(T,N) = size(ER)
D     = N_Changes .> 50                #logical dummies: [very active]
D0    = .!D                            #inactive

println("T=$(size(ER,1)) and N=$(size(ER,2))")

T=2354 and N=2637


## Individual alphas

The following code takes the matrix of individual daily
excess returns ($ER_{T\times N}$) and runs one regression for each individual on
three risk factors ($Factors_{T\times 3}$: excess returns on Swedish equity, Swedish
bonds and international equity).

The vector $D$ ($N$ elements) classifies the investors: `D[i]` is `false` if investor $i$ is inactive (50 or fewer portfolio changes, see `N_Changes` above) and `true` if active (more than 50 portfolio changes).

The next cell shows the average alphas for each of the two groups.
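
For reference, the regression estimated for each investor $i$ can be written as below. The symbols $f_t$ (the $3\times 1$ vector of factors in period $t$), $\beta_i$ and $\varepsilon_{it}$ are notation introduced here for illustration; they do not appear in the code.

$$
ER_{it} = \alpha_i + \beta_i' f_t + \varepsilon_{it}, \qquad t = 1,\ldots,T.
$$

The code stores the estimate of $\alpha_i$ in `alphaM[i]`, and the printed group averages are multiplied by 252 to annualise the daily values.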

In [3]:
alphaM = fill(NaN,N)                                #individual alphas
for i = 1:N
   #local b           #only needed in script
   b, = OlsSureFn(ER[:,i],[Factors ones(T)],true,0)
   alphaM[i] = b[end]
end

printblue("\nAverage annualised alphas for each of the two groups:")
xx = [mean(alphaM[D0]) mean(alphaM[D])]*252
colNames = ["Inactive","Active"]
printmat(xx,colNames=colNames,rowNames=["α"])


Average annualised alphas for each of the two groups:
   Inactive    Active
α    -0.787     6.217



## Calendar Time Portfolios

The following code creates two time series (with $T$ observations in each) of portfolio returns: one for inactive investors, the other for active investors.

Then, it calculates the average excess returns, the Sharpe ratios and finally the alphas.

The alphas and betas are estimated with OLS, and we test the hypothesis that the two alphas are the same (using a SURE approach).
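
OLS coefficients are linear in the dependent variable. An equal-weighted portfolio regressed on the same regressors therefore has coefficients equal to the average of the individual coefficients, which is why the portfolio alphas coincide with the average individual alphas from the previous section. The following is a minimal sketch on simulated data, not on the PPM data set. The sample sizes, the seed and the names `F`, `X`, `B`, `avgB` and `portfB` are assumptions made for illustration, and the backslash operator is used instead of `OlsSureFn`.

```julia
using Statistics, LinearAlgebra, Random

Random.seed!(123)
T, N, K = 200, 6, 3                        # small hypothetical sample
F  = randn(T, K)                           # simulated factors
X  = [ones(T) F]                           # constant first, then the factors
ER = 0.1 .+ F*randn(K, N) + randn(T, N)    # simulated excess returns, TxN

B      = X \ ER                            # (K+1)xN, OLS coefficients for each investor
avgB   = mean(B, dims=2)                   # average of the individual coefficients
portfB = X \ mean(ER, dims=2)              # OLS coefficients for the equal-weighted portfolio

# the two sets of coefficients should agree up to rounding error
println(maximum(abs.(avgB - portfB)))
```

The equality relies on all investors sharing the same regressors and sample period, and on equal portfolio weights.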

In [4]:
printblue("group by group:")

PortfER      = fill(NaN,(T,2))          #create portfolios as average across individuals
PortfER[:,1] = mean(ER[:,D0],dims=2)    #Tx1, portfolio return = average individual return
PortfER[:,2] = mean(ER[:,D],dims=2)


Avg = mean(PortfER,dims=1)*252          #average excess return on portfolios
Std = std(PortfER,dims=1)*sqrt(252)
SR  = Avg./Std
(b,res,yhat,Covb) = OlsSureFn(PortfER,[ones(T) Factors],true,0)

xx = [Avg;Std;SR;b[1:1,:]*252]
printmat(xx,colNames=colNames,rowNames=["Avg","Std","SR","α"])

group by group:
     Inactive    Active
Avg    -1.262     5.534
Std    15.728    13.882
SR     -0.080     0.399
α      -0.787     6.217



In [5]:
R       = [1 0 0 0 -1 0 0 0]                       #testing if alpha(1) = alpha(2)
a_diff  = (R*vec(b))[]                             #[] to make it a scalar
tstatLS = a_diff/sqrt((R*Covb*R')[])

printblue("diff of annual alphas:")
xx = [a_diff*252;tstatLS]
printmat(xx,rowNames=["α1-α2","t-stat"])

diff of annual alphas:
α1-α2     -7.004
t-stat    -2.784



## Panel Regressions

Finally, a panel ($T\times N$) regression is done by simply stacking all data
points, but with the factors (and the constant) interacted with the activity dummies. The
hypothesis of the same alphas is tested by both an OLS approach (assuming that
all data are iid) and a Driscoll-Kraay approach (which accounts for cross-sectional correlations).

The code for that panel regression is in the function `HszDkFn()`. It does a
straightforward LS regression (by a loop over $t$, to save memory space) and
then estimates the covariance matrix of the moment conditions as in
Driscoll-Kraay (allowing for cross-sectional correlations). The code makes no attempt to be fast.
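
Reading off the code in the next cell, the estimators can be summarised as follows. The notation is introduced here for illustration: $x_{it} = z_i \otimes x_t$ is the $K\times 1$ vector of interacted regressors for investor $i$ in period $t$, and $\hat e_{it} = y_{it} - x_{it}'\hat\theta$ is the fitted residual.

$$
\hat\theta = S_{xx}^{-1} S_{xy}, \qquad
S_{xx} = \frac{1}{TN}\sum_{t=1}^{T}\sum_{i=1}^{N} x_{it}x_{it}', \qquad
S_{xy} = \frac{1}{TN}\sum_{t=1}^{T}\sum_{i=1}^{N} x_{it}y_{it}
$$

$$
h_t = \frac{1}{N}\sum_{i=1}^{N} x_{it}\hat e_{it}, \qquad
\hat S = \frac{1}{T^2}\sum_{t=1}^{T} h_t h_t', \qquad
\mathrm{Cov}_{DK}(\hat\theta) = S_{xx}^{-1}\,\hat S\,S_{xx}^{-1}
$$

$$
\mathrm{Cov}_{LS}(\hat\theta) = S_{xx}^{-1}\,\frac{1}{(TN)^2}\sum_{t=1}^{T}\sum_{i=1}^{N}\hat e_{it}^2
$$

Summing over $i$ before forming the outer product $h_t h_t'$ is what lets the residuals be correlated across investors within a period. As written in the code, $\hat S$ uses only the contemporaneous term, so no autocorrelation lags are included.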

In [6]:
function HDirProdFn(x,y)
#HDirProdFn    Calculates horizontal direct product of two matrices with equal number of rows.
#              z[i,:] is the Kronecker product of x[i,:] and y[i,:]
  Kx = size(x,2)       #columns in x
  Ky = size(y,2)       #columns in y
  z  = repeat(y,1,Kx) .* kron(x,ones(Int,1,Ky))
  return z
end
#-----------------------------------------------

function HszDkFn(y,x,z)
#HszDkFn   LS and Driscoll-Kraay standard errors for panel, assuming x(t,i) = x(t) * z(i)

  (T,N) = (size(y,1),size(y,2))
  K     = size(x,2)*size(z,2)

  Sxx = zeros(K,K)
  Sxy = zeros(K,1)
  for t = 1:T                           #OLS by looping over t
    y_t  = y[t,:]                       #dependent variable, Nx1
    x0_t = repeat(x[t:t,:],N,1)         #factors, NxK, could simplify?
    x_t  = HDirProdFn(z,x0_t)           #effective regressors, z is NxKz, x_t is NxK
    Sxx  = Sxx + x_t'x_t/(T*N)          #building up Sxx and Sxy
    Sxy  = Sxy + x_t'y_t/(T*N)
  end
  theta = Sxx\Sxy

  s2     = 0.0
  omegaj = zeros(K,K)
  for t = 1:T                          #Covariance matrix by looping over t
    y_t  = y[t,:]                      #create y_t and x_t (again)
    x0_t = repeat(x[t:t,:],N,1)
    x_t  = HDirProdFn(z,x0_t)
    e_t  = y_t - x_t*theta             #residuals in t
    h_t  = (x_t'e_t)'/N                #moment conditions in t (divided by N)
    omegaj = omegaj + h_t'h_t          #building up covariance matrix
    s2     = s2 + sum(e_t.^2)/N^2
  end
  Shat = omegaj/T^2                     #estimate of S
  s2   = s2/T^2

  zx_1  = inv(Sxx)
  CovDK = zx_1 * Shat * zx_1'                     #covariance matrix, DK
  stdDK = sqrt.(diag(CovDK))                      #standard errors, DK

  CovLS = zx_1 * s2                               #covariance matrix, LS iid
  stdLS = sqrt.(diag(CovLS))                      #standard errors, LS iid

  return theta,CovDK,CovLS

end

HszDkFn (generic function with 1 method)

When calling `HszDkFn()` we use the individual returns (`ER`, which is $T\times N$) as the dependent variables, a constant and the `Factors` (together a $T\times 4$ matrix) as the `x` regressors, and `[D0 D]` as the dummies that `x` is interacted with.
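
To see what the interaction does, the sketch below builds the effective regressors for a single period and three hypothetical investors. The numbers and the names `x_t`, `z` and `X_t` are made up for illustration. Each row is the Kronecker product of the investor's dummy row and `x_t`, which is the layout the comment in `HDirProdFn` describes.

```julia
using LinearAlgebra

x_t = [1.0 0.5 -0.2 0.3]             # 1x4: constant and three factors in period t
z   = [1.0 0.0; 0.0 1.0; 0.0 1.0]    # rows: investor 1 inactive, investors 2 and 3 active

# row i is kron(z[i,:], x_t): x_t lands in the column block of investor i's group
X_t = reduce(vcat, [kron(z[i:i,:], x_t) for i in 1:size(z,1)])

println(size(X_t))                   # 3 investors, 2 groups x 4 regressors
println(X_t)
```

With this layout the coefficient vector is ordered as the constant and three betas of the inactive group followed by those of the active group, so `R = [1 0 0 0 -1 0 0 0]` picks out the difference between the two alphas.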

In [7]:
printblue("panel regression:")
(theta,CovDK,CovLS) = HszDkFn(ER,[ones(T) Factors],[D0 D] .+ 0.0)

R       = [1 0 0 0 -1 0 0 0]                #testing if alpha(1) = alpha(2)
a_diff  = (R*vec(theta))[]
tstatLS = a_diff/sqrt((R*CovLS*R')[])
tstatDK = a_diff/sqrt((R*CovDK*R')[])

xx = [a_diff*252;tstatLS;tstatDK]
printmat(xx,rowNames=["α1-α2","t-stat (LS)","t-stat (DK)"])

printred("\nCompare with calendar time regressions. Also notice the difference (any?) between the two t-stats")

panel regression:
α1-α2          -7.004
t-stat (LS)   -24.017
t-stat (DK)    -2.784


Compare with calendar time regressions. Also notice the difference (any?) between the two t-stats
