# Panel Regressions


## Loading Packages

In [1]:
using Dates, DelimitedFiles, Statistics, LinearAlgebra

include("jlFiles/printmat.jl")
include("jlFiles/NWFn.jl")
include("jlFiles/OlsFn.jl")
include("jlFiles/lagnFn.jl")
include("jlFiles/exciseFn.jl")

exciseFn (generic function with 1 method)

## Loading Data

In [2]:
x = readdlm("Data/nls_panelEd.txt",skipstart=1)

NT = size(x,1)
c  = ones(NT)

T = 5                 #number of time periods
N = round(Int,NT/T)   #number of individuals         

(id,year,lnwage)   = (x[:,1],x[:,2],x[:,3])
(exper,exper2)     = (x[:,15],x[:,16])
(tenure,tenure2)   = (x[:,17],x[:,18])
(south,tradeunion) = (x[:,12],x[:,14])

println("T=$T and N=$N")

T=5 and N=716


## Creating Variables for the Regressions

The next cell creates a matrix $yx$, which has the dependent variable in the first column and the regressors (including the constant) in the remaining columns.

The cell after that applies the "within transformation" by creating 

$
yx^*_{it} = yx_{it} - \bar{yx}_{i}, 
$

where $\bar{yx}_{i}$ is a row vector with the averages (over the $T$ time periods) of each column of $yx$ for individual $i$.
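
As a minimal sketch of the within transformation, the cell below applies it to a small made-up panel (2 individuals, 3 periods). The names `id_toy`, `yx_toy`, `yxbar_toy` and `yxStar_toy` are hypothetical and used only for illustration; the columns are meant to mimic a dependent variable, a constant, and one regressor.

```julia
using Statistics

# Toy panel (made-up numbers): columns are y, constant, x
id_toy = [1, 1, 1, 2, 2, 2]
yx_toy = [1.0 1.0 10.0;
          2.0 1.0 20.0;
          3.0 1.0 30.0;
          4.0 1.0 40.0;
          6.0 1.0 60.0;
          8.0 1.0 80.0]

ids        = unique(id_toy)
yxbar_toy  = fill(NaN, length(ids), size(yx_toy, 2))
yxStar_toy = fill(NaN, size(yx_toy))
for (i, id_i) in enumerate(ids)
    vv_i = id_toy .== id_i                                       # rows of individual i
    yxbar_toy[i, :]     = mean(yx_toy[vv_i, :], dims=1)          # averages for individual i
    yxStar_toy[vv_i, :] = yx_toy[vv_i, :] .- yxbar_toy[i:i, :]   # deviations from those averages
end

println(yxbar_toy)
println(yxStar_toy)
```

In this toy example the demeaned constant (column 2) is zero in every row, so a constant cannot be included in a regression on within-transformed data.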

In [3]:
xNames = ["exper/100","exper^2/100","tenure/100","tenure^2/100","south","union"]
yx = [lnwage c exper/100 exper2/100 tenure/100 tenure2/100 south tradeunion]
K  = size(yx,2) - 1

7

In [4]:
id_uniq = unique(id)
N       = length(id_uniq)

yxStar = fill(NaN,size(yx))
yxbar = fill(NaN,(N,1+K))
for i = 1:N                          #loop over individuals
    local vv_i
    vv_i          = id .== id_uniq[i]                #locate rows in yx which refer to individual i
    yxbar[i,:]    = mean(yx[vv_i,:],dims=1)          #averages for individual i
    yxStar[vv_i,:] = yx[vv_i,:] .- yxbar[i:i,:]      #i:i to keep it a row vector
end

## Pooled OLS, FE, and Between Estimations

In [5]:
(b,res,yhat,Covb,R2,) = OlsFn(yx[:,1],yx[:,2:end])           #LS
xutLS = hcat(b,b./sqrt.(diag(Covb)))
xutLS = xutLS[2:end,:]


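(b,res,yhat,Covb,R2,) = OlsFn(yxStar[:,1],yxStar[:,3:end])    #fixed effects; no constant, since it is demeaned to zero
xutFE = hcat(b,b./sqrt.(diag(Covb))*sqrt(NT-N-2)/sqrt(NT-2))  #t-stats rescaled for the N estimated individual means
s2_e  = sum(res.^2)/(NT-N-(K-1))                              #variance of the idiosyncratic residuals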


(b,res,yhat,Covb,R2,) = OlsFn(yxbar[:,1],yxbar[:,2:end])      #between estimator
xutB = hcat(b,b./sqrt.(diag(Covb)))
xutB = xutB[2:end,:]
s2_u = max(0,sum(res.^2)/(N-K) - s2_e/T)                      #variance of the individual effects, floored at 0

println("Pooled OLS (coef and t-stat)")
printmat(xutLS)
println("FE")
printmat(xutFE)
println("Between")
printmat(xutB)

Pooled OLS (coef and t-stat)
     7.837     8.954
    -0.201    -5.264
     1.206     2.346
    -0.024    -0.828
    -0.196   -13.247
     0.110     6.928

FE
     4.108     5.917
    -0.041    -1.466
     1.391     3.975
    -0.090    -4.136
    -0.016    -0.367
     0.064     4.181

Between
    10.641     4.573
    -0.317    -3.054
     1.247     0.883
    -0.016    -0.198
    -0.201    -6.519
     0.121     3.102



## First-Difference Model

To estimate the first-difference model, we first need to calculate the differences between consecutive time periods for the same individual. 

In the cell below, we call the function `lagnFn`, which lags the data by one period (the default). For the first time period of each individual, the result is NaN (as there are no earlier values). After the loop, we locate and delete all rows that contain NaNs, which leaves $T-1$ data points for each individual. Finally, we replace the zeros created by differencing the constant (column 2) with ones.
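
The steps just described can be sketched on a small made-up panel. The helpers `lag1` and `dropnanrows` are hypothetical stand-ins written for this illustration; they are not the code in `lagnFn.jl` or `exciseFn.jl`, whose implementations are not shown here.

```julia
# Hypothetical stand-ins for lagnFn and exciseFn (assumptions, not the author's code)
lag1(x)        = vcat(fill(NaN, 1, size(x, 2)), x[1:end-1, :])   # shift rows down by one, NaN in row 1
dropnanrows(x) = x[.!vec(any(isnan.(x), dims=2)), :]             # keep only rows without any NaN

# Toy panel (made-up numbers): columns are y, constant, x
id_toy = [1, 1, 1, 2, 2, 2]
yx_toy = [1.0 1.0 10.0;
          2.0 1.0 20.0;
          4.0 1.0 30.0;
          5.0 1.0 40.0;
          5.0 1.0 60.0;
          9.0 1.0 80.0]

yxΔ_toy = fill(NaN, size(yx_toy))
for id_i in unique(id_toy)
    vv_i = id_toy .== id_i                                        # rows of individual i
    yxΔ_toy[vv_i, :] = yx_toy[vv_i, :] - lag1(yx_toy[vv_i, :])    # first differences within individual i
end

yxΔ_toy = dropnanrows(yxΔ_toy)     # drops the first period of each individual
yxΔ_toy[:, 2] .= 1                 # the differenced constant is zero, so restore it to one

println(size(yxΔ_toy))
println(yxΔ_toy)
```

Differencing inside the loop over individuals matters: it prevents the last observation of one individual from being subtracted from the first observation of the next.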

In [6]:
yxStarΔ = fill(NaN,size(yx))
for i = 1:N                          #loop over individuals
    local vv_i
    vv_i            = id .== id_uniq[i]   #rows in yx which refer to individual i
    yxStarΔ[vv_i,:] = yx[vv_i,:] - lagnFn(yx[vv_i,:])
end

yxStarΔ = exciseFn(yxStarΔ)          #cut out rows with NaNs
yxStarΔ[:,2] .= 1                    #constant  

println("size of yxStarΔ: ",size(yxStarΔ))

size of yxStarΔ: (2864, 8)


In [7]:
(b,res,yhat,Covb,R2,) = OlsFn(yxStarΔ[:,1],yxStarΔ[:,2:end])
xutΔ = hcat(b,b./sqrt.(diag(Covb)))
xutΔ = xutΔ[2:end,:]

println("1-st difference estimations")
printmat(xutΔ)

1-st difference estimations
     3.548     2.277
    -0.045    -0.933
     1.293     2.527
    -0.083    -2.329
    -0.024    -0.395
     0.044     3.115



## GLS of Random Effects Model (extra)
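
Read directly off the code in the next cell, the GLS estimates come from an OLS regression on "quasi-demeaned" data. In the notation used above, and writing $\sigma^2_e$ for `s2_e` and $\sigma^2_u$ for `s2_u` (both computed in the pooled OLS, FE, and between cell), the transformation is

$
\vartheta = 1 - \frac{\sigma_e}{\sqrt{T\sigma^2_u + \sigma^2_e}}, \qquad yx^{\vartheta}_{it} = yx_{it} - \vartheta\,\bar{yx}_{i},
$

where $\bar{yx}_{i}$ holds the averages for individual $i$. Setting $\vartheta = 0$ leaves the data unchanged (pooled OLS), while $\vartheta = 1$ gives the within transformation. For $\vartheta < 1$ the transformed constant equals $1-\vartheta$ rather than zero, which is why the regression below keeps it (columns `2:end`).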

In [8]:
ϑ = 1 - sqrt(s2_e)/sqrt(T*s2_u+s2_e)                       #GLS
yxStar_ϑ = fill(NaN,size(yx))
for i = 1:N
    local vv_i
    vv_i             = id .== id_uniq[i]
    yxStar_ϑ[vv_i,:] = yx[vv_i,:] .- ϑ*yxbar[i:i,:]       #quasi-demeaning: subtract ϑ times the individual averages
end
(b,res,yhat,Covb,R2,) = OlsFn(yxStar_ϑ[:,1],yxStar_ϑ[:,2:end])
xutGLS = hcat(b,b./sqrt.(diag(Covb)))
xutGLS = xutGLS[2:end,:]

println("GLS")
printmat(xutGLS)

GLS
     4.570     7.111
    -0.063    -2.387
     1.380     4.032
    -0.074    -3.575
    -0.132    -5.255
     0.075     5.611

