# Panel Regressions (extra)

This notebook uses functions (from the files included below) to redo some of the panel regressions from the first notebook on panels. It is essentially an example of how to use those functions. (In contrast, the first notebook explains the background to the various estimation approaches.)

The functions can calculate standard errors that account for autocorrelation and cross-sectional clustering.

## Load Packages and Extra Functions

In [2]:
using Printf, DelimitedFiles, Statistics, LinearAlgebra

include("jlFiles/printmat.jl")
include("jlFiles/PutDataInNT.jl")
include("jlFiles/excise.jl")
include("jlFiles/FixedEffects.jl")
include("jlFiles/PanelOls.jl");

## Loading Data

In [3]:
(x,header) = readdlm("Data/nls_panelEd.txt",header=true)    #classic data set from Hill et al (2008)

X = PutDataInNT(x,header)                         #NamedTuple with X.id, X.lwage, etc
println(keys(X))

NT = size(x,1)
c  = ones(NT)

T = 5                 #number of time periods
N = round(Int,NT/T)   #number of individuals

id = X.id

println("\nT=$T and N=$N")

(:id, :year, :lwage, :hours, :age, :educ, :collgrad, :msp, :nev_mar, :not_smsa, :c_city, :south, :black, :union, :exper, :exper2, :tenure, :tenure2)

T=5 and N=716


## Creating Variables for the Regressions

The next cell creates a matrix $yx$ which has the dependent variable as the first column and the regressors as the remaining columns.

We then print the first few observations of (some of) the data. Notice the structure: the first 5 observations are for individual (`id`) 1 (periods 1-5), the next 5 are for individual 2, and so on.

In [4]:
xNames = ["exper/100","exper^2/100","tenure/100","tenure^2/100","south","union"]
yx     = [X.lwage c X.exper/100 X.exper2/100 X.tenure/100 X.tenure2/100 X.south X.union]
K      = size(yx,2) - 1

printblue("The first few lines of (some of) the data:\n")
printmat(Any[id[1:11] yx[1:11,1:3]],colNames=["id","lnwage","c","exper/100"],rowNames=string.(1:11),cell00="obs")

id_uniq = unique(id)               #which id values are in data set
N       = length(id_uniq)          #number of cross-sectional units

[34m[1mThe first few lines of (some of) the data:[22m[39m

obs        id    lnwage         c exper/100
1       1.000     1.808     1.000     0.077
2       1.000     1.863     1.000     0.086
3       1.000     1.789     1.000     0.102
4       1.000     1.847     1.000     0.122
5       1.000     1.856     1.000     0.136
6       2.000     1.281     1.000     0.076
7       2.000     1.516     1.000     0.084
8       2.000     1.930     1.000     0.104
9       2.000     1.919     1.000     0.120
10      2.000     2.201     1.000     0.132
11      3.000     1.815     1.000     0.114



716

## Reshuffling the Data

To fit the convention of the `PanelOls()` function, we reshuffle the dependent variable into a $T\times N$ matrix `Y` and the regressors into a $T \times K \times N$ array `X`. (Notice that this overwrites the named tuple `X` created above.) This layout allows the `PanelOls()` function to handle autocorrelation and cross-sectional clustering.

In [5]:
Y = fill(NaN,T,N)               #reshuffling the data
X = fill(NaN,T,K,N)

for i = 1:N
    vv_i     = id .== id_uniq[i]   #rows in yx which refer to individual i
    Y[:,i]   = yx[vv_i,1]
    X[:,:,i] = yx[vv_i,2:end]
end

println("The Y matrix is now TxN, while X is TxKxN")
display(Y)

The Y matrix is now TxN, while X is TxKxN


5×716 Matrix{Float64}:
 1.80829  1.28093  1.81482  2.31254  …  1.53039  1.52823  1.46094  1.60944
 1.86342  1.51585  1.91991  2.34858     1.59881  2.4065   1.49669  1.45944
 1.78937  1.93017  1.95838  2.37349     1.60405  2.55886  1.55984  1.42712
 1.84653  1.91903  2.00707  2.3689      1.26794  2.64418  1.6536   1.49437
 1.85645  2.20097  2.08985  2.35053     1.55823  2.58664  1.61586  1.34142
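
The reshuffling loop above relies on every individual having exactly $T$ rows in `yx`. A minimal sketch of how that could be checked before reshuffling is given below. The helper name `IsBalancedSketch` and the toy vector `id_toy` are made up for this illustration and are not part of the included files.

```julia
# Hypothetical helper (not in the included files): true if every id value occurs exactly T times
function IsBalancedSketch(id,T)
    return all(count(==(i),id) == T for i in unique(id))
end

id_toy = [1,1,1,2,2,2,3,3]                   #toy data: individual 3 has only 2 observations
println(IsBalancedSketch(id_toy,3))          #unbalanced panel
println(IsBalancedSketch(id_toy[1:6],3))     #individuals 1 and 2 only: balanced
```

The check says nothing about the ordering of the periods within each individual, which the loop also takes for granted.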

We next call the function `FixedEffects()` to remove individual (and/or time) fixed effects. This does basically the same as the earlier `yxStar` loops (not repeated in this notebook), but the whole calculation is (for convenience) done inside a function.

In [6]:
(Ystar,Xstar) = FixedEffects(Y,X,:id)         #:id for individual fixed effects. :t for time fixed effects
Xstar[:,1,:] .= 1                             #put back a non-zero intercept
println()
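
To illustrate the idea behind removing individual fixed effects, the sketch below subtracts each individual's time average from a $T\times N$ matrix and a $T \times K \times N$ array. It is only an illustration under the assumption of a balanced panel without missing values: the function name `DemeanByIdSketch` and the toy data are hypothetical, and the actual `FixedEffects()` code may differ.

```julia
using Statistics

# Hypothetical sketch of the individual-effects case (not the actual FixedEffects() code):
# subtract each individual's time average from Y (TxN) and X (TxKxN)
function DemeanByIdSketch(Y,X)
    Ystar = Y .- mean(Y,dims=1)        #mean(Y,dims=1) is 1xN, broadcast over the T rows
    Xstar = X .- mean(X,dims=1)        #mean(X,dims=1) is 1xKxN
    return Ystar, Xstar
end

Y_toy = [1.0 10.0; 2.0 20.0; 3.0 30.0]       #toy data with T=3, N=2
X_toy = reshape(collect(1.0:12.0),3,2,2)     #toy data with T=3, K=2, N=2
(Ys,Xs) = DemeanByIdSketch(Y_toy,X_toy)
println(Ys)                                  #each column now has a zero mean
```

Demeaning a constant gives a column of zeros, which is why the cell above resets the first regressor with `Xstar[:,1,:] .= 1`.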




Finally, we call the `PanelOls()` function. The output is a named tuple. Use `keys(fO)` to see the entries.

In [7]:
fO = PanelOls(Ystar,Xstar)

θ      = fO.theta
StdErr = sqrt.(diag(fO.CovW))
tstat  = θ./StdErr

printblue("results from PanelOls()")
printmat(θ,tstat,colNames=["coef","t-stat"],rowNames=["c";xNames])

printred("Compare with the FE estimates above. The t-stats might differ because of lack of small-sample adjustment.")

[34m[1mresults from PanelOls()[22m[39m
                  coef    t-stat
c               -0.000    -0.000
exper/100        4.108     6.616
exper^2/100     -0.041    -1.640
tenure/100       1.391     4.445
tenure^2/100    -0.090    -4.624
south           -0.016    -0.411
union            0.064     4.675

[31m[1mCompare with the FE estimates above. The t-stats might differ because of lack of small-sample adjustment.[22m[39m


## Clustered Standard Errors

We now redo the estimation, but provide information on clustering for the standard errors. For simplicity, the clusters are defined by the value of the `south` dummy in period $t=1$, so there are just two clusters.

In [8]:
clust = convert.(Int,X[1,6,:])       #define clusters based on South/North in t=1

fO = PanelOls(Ystar,Xstar,0,clust)   #0 autocorrelations, but clustering

θ       = fO.theta
StdErrW = sqrt.(diag(fO.CovW))       #White's std
StdErrC = sqrt.(diag(fO.CovC))       #clustered std
tstatW  = θ./StdErrW
tstatC  = θ./StdErrC

printblue("results from PanelOls()")
printmat(θ,tstatW,tstatC,colNames=["coef","t-stat White","t-stat Clust"],rowNames=["c";xNames],width=15)

[34m[1mresults from PanelOls()[22m[39m
                       coef   t-stat White   t-stat Clust
c                    -0.000         -0.000         -0.000
exper/100             4.108          6.616          5.586
exper^2/100          -0.041         -1.640         -2.027
tenure/100            1.391          4.445          3.829
tenure^2/100         -0.090         -4.624         -4.880
south                -0.016         -0.411         -0.467
union                 0.064          4.675          3.631
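
One common way to construct a covariance matrix with cross-sectional clustering is to sum $x_{it}u_{it}$ across the members of a cluster within each period before forming the outer product. The sketch below does that, assuming no autocorrelation. The function `ClusterCovSketch` and the toy data are hypothetical, and it is an assumption (not verified here) that `PanelOls()` follows the same approach.

```julia
using LinearAlgebra

# Hypothetical sketch (not the PanelOls() code): covariance matrix of the OLS estimates
# when residuals may be correlated across individuals in the same cluster and period.
# u is TxN residuals, X is TxKxN regressors, clust is an N-vector of cluster labels.
function ClusterCovSketch(u,X,clust)
    (T,K,N) = size(X)
    Sxx = zeros(K,K)
    for i in 1:N, t in 1:T
        Sxx += X[t,:,i]*X[t,:,i]'
    end
    S = zeros(K,K)
    for t in 1:T, g in unique(clust)
        h = zeros(K)                       #sum of x_it*u_it over members of cluster g in period t
        for i in findall(==(g),clust)
            h += X[t,:,i]*u[t,i]
        end
        S += h*h'
    end
    return inv(Sxx)*S*inv(Sxx)
end

X_toy = zeros(2,1,4)                                #toy data with T=2, K=1, N=4
X_toy[:,1,:] = [1.0 2 3 4; 2 1 4 3]
u_toy = [0.1 -0.2 0.3 -0.1; -0.3 0.2 0.1 0.2]       #made-up residuals
CovC  = ClusterCovSketch(u_toy,X_toy,[0,0,1,1])     #two clusters
CovW  = ClusterCovSketch(u_toy,X_toy,[1,2,3,4])     #each individual is its own cluster
println(CovC," ",CovW)
```

When each individual forms its own cluster, the cross terms vanish and the result coincides with White's covariance matrix.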



## Individual and Time Fixed Effects

We redo the panel regression, but first reconstruct `(Ystar,Xstar)` to handle both individual and time fixed effects: see the `:idt` argument in the function call.

In [9]:
(Ystar,Xstar) = FixedEffects(Y,X,:idt)
Xstar[:,1,:] .= 1

fO = PanelOls(Ystar,Xstar)

θ      = fO.theta
StdErr = sqrt.(diag(fO.CovW))
tstat  = θ./StdErr

printblue("results from PanelOls()")
printmat(θ,tstat,colNames=["coef","t-stat"],rowNames=["c";xNames])

[34m[1mresults from PanelOls()[22m[39m
                  coef    t-stat
c                0.000     0.000
exper/100        6.713     4.654
exper^2/100     -0.045    -1.762
tenure/100       1.347     4.279
tenure^2/100    -0.090    -4.641
south           -0.014    -0.358
union            0.065     4.801
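
For a balanced panel, a standard way of removing both individual and time fixed effects is the two-way within transformation below. The notation ($\bar{y}_{i}$, $\bar{y}_{t}$ and $\bar{y}$) is introduced here for illustration, and it is an assumption that `FixedEffects()` with `:idt` does exactly this.

```latex
$$
y_{it}^{*} = y_{it} - \bar{y}_{i} - \bar{y}_{t} + \bar{y},
\quad\text{where}\quad
\bar{y}_{i} = \frac{1}{T}\sum_{t=1}^{T} y_{it},\quad
\bar{y}_{t} = \frac{1}{N}\sum_{i=1}^{N} y_{it},\quad
\bar{y} = \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} y_{it}.
$$
```

The same transformation is applied to each regressor. Adding back the grand mean $\bar{y}$ compensates for it having been subtracted twice, once in $\bar{y}_{i}$ and once in $\bar{y}_{t}$.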

