# Week 2
The purpose of this and next week's exercises is to estimate two basic linear panel data models
with unobserved effects. The two models make different assumptions about the correlation
between observed and unobserved components, and it is important to understand which set
of assumptions is the most appropriate in a given empirical application. Next week's exercise goes
through an econometric test procedure (the Hausman test) that tests the assumptions of the two
models against each other. This week's exercise starts out by estimating the unobserved effects
model allowing for arbitrary contemporaneous correlation between the unobserved individual
effect and the explanatory variables. We shall use two estimators: the Fixed-Effects (FE)
estimator and the First-Difference (FD) estimator. <br>

Before we start working on some exercises we will briefly introduce two concepts in Python. First, importing and exporting data. Second, using functions. If you are already familiar
with these features, you can skip the next two sections and jump directly to the exercises.

First, import all necessary packages

In [1]:
import numpy as np
from numpy import linalg as la
import pandas as pd
from io import StringIO
from tabulate import tabulate
from matplotlib import pyplot as plt

## Importing and exporting data in Python
The easiest way to import data into a NumPy array is from a .txt file. Normally we would specify a path to the text file, but here we create a fake in-memory file to illustrate.

In [2]:
# Create a fake file for easy use.
fake_file = StringIO("0 1\n 2 3")
print(f"Fake file looks like this: \n {fake_file.getvalue()}")
print()

# Load the fake txt file into a numpy array.
data = np.loadtxt(fake_file)
print(f'Loaded into a numpy array, we get the following {type(data)}: \n {data}')

Fake file looks like this: 
 0 1
 2 3

Loaded into a numpy array, we get the following <class 'numpy.ndarray'>: 
 [[0. 1.]
 [2. 3.]]
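
Real data files often have a header row and comma separators. The following is a minimal sketch of the relevant `np.loadtxt` arguments, using a made-up in-memory file (the column names and values are purely illustrative assumptions):

```python
import numpy as np
from io import StringIO

# Hypothetical comma-separated file with one header row.
csv_file = StringIO("id,year,wage\n1,1980,1.5\n1,1981,1.7\n2,1980,2.0")

# delimiter sets the separator, skiprows drops the header line,
# and usecols (optional) keeps only the listed columns.
full = np.loadtxt(csv_file, delimiter=",", skiprows=1)
csv_file.seek(0)  # Rewind the in-memory file before reading it again.
wage_only = np.loadtxt(csv_file, delimiter=",", skiprows=1, usecols=2)

print(full.shape)
print(wage_only)
```

With a real file, the `StringIO` object would simply be replaced by the path to the file.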


Sadly, there is no direct way to load an excel sheet into numpy. The easiest solution is to use pandas as an intermediate.

In [3]:
# We save the fake file we created earlier as an excel file, so that we can illustrate
# how to import using excel.
to_export = pd.DataFrame(data)
to_export.to_excel('test_file.xlsx', header=None, index=None)

# It's important to note that pandas treats the first row as a header by default. If there
# is no header, this needs to be specified. There are also many options for loading
# specific sheets or only parts of a sheet.
df_import = pd.read_excel('test_file.xlsx', header=None)
np_array = df_import.to_numpy()
print(np_array)

[[0 1]
 [2 3]]


### Exporting Data
Saving a NumPy array as a .txt file is easy:

In [4]:
np.savetxt('real_file.txt', np_array)
print()




*If you have large NumPy arrays and want to store them efficiently, they can be saved as binary .npy files. Such files are not compatible with other programs.*
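
As a small sketch of the binary route, `np.save` and `np.load` round-trip an array without loss. The file name below is a hypothetical placeholder, and a temporary directory is used so that nothing is left on disk:

```python
import os
import tempfile
import numpy as np

arr = np.arange(6, dtype=float).reshape(3, 2)

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_array.npy")  # hypothetical file name
    np.save(path, arr)        # binary format, keeps dtype and shape
    restored = np.load(path)  # read the array back into memory

print(np.array_equal(arr, restored))
```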

## Exercise with within-groups estimation (FE)

Consider the following linear model,

$$ y_{it} = \boldsymbol{x'}_{it}\boldsymbol{\beta} + c_i + u_{it}, \tag{1} $$

where $i = 1, \dotsc, N$ indexes the cross-sectional unit that is observed (e.g., households), and $t = 1, \dotsc, T$ denotes time (e.g., weeks, years). $\boldsymbol{x}_{it}$ is a $K \times 1$ vector of regressors, $\boldsymbol{\beta}$ contains the $K$ parameters to be estimated, $c_i$ is an unobserved individual-specific component which is constant across time periods, and $u_{it}$ is an unobserved random error term.  <br>
If $c_i$ turns out to be an additional error term that is uncorrelated with the regressors in the sense of $E[c_i\boldsymbol{x}_{it}]=\boldsymbol{0}$ for all $t$, then $\boldsymbol{\beta}$ can be consistently estimated by pooled OLS (POLS) (as $N\rightarrow \infty$ for fixed $T$), albeit in an inefficient manner. To see this, consider the composite error term $v_{it} = c_i + u_{it}$, and note that, provided $E[u_{it}\boldsymbol{x}_{it}]=\boldsymbol{0}$ as well,
$$E[v_{it}\boldsymbol{x}_{it}] = E[c_{i}\boldsymbol{x}_{it}] + E[u_{it}\boldsymbol{x}_{it}] = \boldsymbol{0},$$

so that the usual OLS assumptions are satisfied. Conversely, if $c_i$ is systematically related to one or more of the observed variables in the sense of $E[c_{i}\boldsymbol{x}_{it}] \neq \boldsymbol{0}$, then the POLS estimator is _not_ consistent for $\boldsymbol{\beta}$.
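
To make the two cases concrete, here is a small simulation sketch. The design is an assumption chosen for illustration: a single regressor, true $\beta = 1$, and standard normal $c_i$ and $u_{it}$. In the second case $x_{it} = c_i + \text{noise}$, so $E[c_i x_{it}] = 1$ and $E[x_{it}^2] = 2$, and POLS converges to $1 + 1/2 = 1.5$ rather than to $1$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, t, beta = 2000, 5, 1.0

# Individual effect c_i, repeated for each of the t periods.
c = np.repeat(rng.normal(size=n), t)
u = rng.normal(size=n * t)


def pols(x, y):
    # Pooled OLS slope for a single regressor without intercept.
    return (x @ y) / (x @ x)


# Case 1: x is unrelated to c, so E[c x] = 0.
x_exog = rng.normal(size=n * t)
b_exog = pols(x_exog, beta * x_exog + c + u)

# Case 2: x depends on c, so E[c x] != 0.
x_endog = c + rng.normal(size=n * t)
b_endog = pols(x_endog, beta * x_endog + c + u)

print(f"POLS, E[cx] = 0:  {b_exog:.3f}")
print(f"POLS, E[cx] != 0: {b_endog:.3f}")
```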

### Example 1. 
When might $E[c_{i}\boldsymbol{x}_{it}] \neq \boldsymbol{0}$? Consider a model designed to investigate whether union membership affects wages. The model explains log wages as a function of experience and union membership:

$$ \ln(wage_{it}) = \beta_1 exper_{it} + \beta_2 exper^2_{it} + \beta_3 union_{it} + c_i + u_{it}, $$

where $c_i$ is an individual-specific constant that summarizes innate and unobserved characteristics. If people select into union or non-union jobs based on which sector rewards their innate characteristics best, then $E[union_{it}c_i]\neq0$. For this reason, it doesn't seem reasonable to use OLS on the pooled data. <br>
In this example, the inconsistency of OLS is caused by the presence of $c_i$. The conventional approach to dealing with this problem in linear panel data models is to transform equation (1) such that $c_i$ vanishes and $\boldsymbol{\beta}$ can be estimated by OLS on the transformed model. Because the model is linear, we may rid ourselves of $c_i$ using relatively simple linear transformations. In the following, we shall consider two such transformations: i) the _within-groups_ transformation, and ii) the _first-difference_ transformation.

## Fixed Effects and Within-Groups Transformation

The within-groups transformation subtracts from each variable
its mean within each cross-sectional unit. Consequently, all time-invariant variables disappear under this transformation. To make the within-groups transformation more explicit, take the average of equation (1) across $t$ for each $i$ to obtain

\begin{equation}
\bar{y}_{i}=\mathbf{\bar{x}}_{i}'\mathbf{\beta}+c_{i}+\bar{u}_{i}, \tag{2}
\end{equation}

where $\bar{y}_{i}=T^{-1}\sum_{t=1}^{T}y_{it}$, $\mathbf{\bar{x}}_{i}=T^{-1}\sum_{t=1}^{T}\mathbf{x}_{it}$,
$\bar{u}_{i}=T^{-1}\sum_{t=1}^{T}u_{it}$, and $T^{-1}\sum_{t=1}^{T}c_{i}=c_{i}$.
Subtract equation (2) from equation (1) to get
\begin{align}
y_{it}-\bar{y}_{i} & =\left(\mathbf{x}_{it}-\mathbf{\bar{x}}_{i}\right)'\mathbf{\beta}+(\color{red}{c_{i}-c_{i}})+\left(u_{it}-\bar{u}_{i}\right) \\
\Leftrightarrow\ddot{y}_{it} & =\ddot{\mathbf{x}}_{it}'\mathbf{\beta}+\ddot{u}_{it}. \tag{3}
\end{align}

This simple manipulation of the empirical model has eliminated
$c_{i}$ by subtracting the mean within each *i*-group. This
is called the *within transformation*, and a within-transformed
variable is denoted $\ddot{y}_{it}=y_{it}-\bar{y}_{i}$. The parameters of interest, $\boldsymbol{\beta},$ can be estimated by OLS on the transformed data, i.e.

\begin{equation}
\hat{\mathbf{\beta}}_{FE}=(\mathbf{\ddot{X}}'\mathbf{\ddot{X}})^{-1}\mathbf{\ddot{X}}'\ddot{\mathbf{y}},
\end{equation}

where $\mathbf{\ddot{X}}$ is the $NT\times K$ matrix and $\ddot{\mathbf{y}}$ the $NT\times1$ vector arising from stacking the observables of (3), i.e., $\ddot{\mathbf{x}}_{it}'$ and $\ddot{y}_{it}$, over first $t$ and then $i$.
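
The following sketch applies the within transformation and the FE formula to simulated data. Everything about the data-generating process (two regressors correlated with $c_i$, true $\boldsymbol{\beta} = (1, -0.5)'$, normal errors) is an assumption made for illustration. The data are held as an $N \times T \times K$ array so that de-meaning is a mean over the time axis.

```python
import numpy as np

rng = np.random.default_rng(1)
n, t = 1000, 4
beta = np.array([1.0, -0.5])

c = rng.normal(size=(n, 1, 1))        # individual effects
X = rng.normal(size=(n, t, 2)) + c    # regressors correlated with c
u = rng.normal(size=(n, t))
y = X @ beta + c[:, :, 0] + u         # shape (n, t)

# Within transformation: subtract each individual's time average.
X_dd = X - X.mean(axis=1, keepdims=True)
y_dd = y - y.mean(axis=1, keepdims=True)

# Stack over t first and then i, giving NT x K and NT x 1 arrays.
X_stack = X_dd.reshape(n * t, 2)
y_stack = y_dd.reshape(n * t, 1)

# OLS on the transformed data.
b_fe = np.linalg.solve(X_stack.T @ X_stack, X_stack.T @ y_stack)
print(b_fe.ravel())
```

Even though the regressors are correlated with $c_i$ by construction, the estimates should land close to the assumed true values.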

## FE Assumptions

Let $\mathbf{\ddot{X}}_{i}$ denote the $T\times K$ matrix arising
from stacking $\ddot{\mathbf{x}}_{it}'$ over $t$. (We here keep
the $i$ subscript to avoid a clash of notation.) We invoke the following assumptions

\begin{eqnarray*}
FE.1 & : & E[u_{it}|\mathbf{x}_{i1},..,\mathbf{x}_{iT},c_{i}]=0,\quad\text{ for }t=1,\dotsc,T,\\
FE.2 & : & \text{Rank }E[\mathbf{\ddot{X}}_{i}'\mathbf{\ddot{X}}_{i}]=K,\quad\text{ for }i=1,\dotsc,N\\
FE.3 & : & E[\mathbf{u}_{i}\mathbf{u}_{i}'|\mathbf{x}_{i},c_{i}]=\sigma_{u}^{2}\mathbf{I}_{T},\quad\text{ for }i=1,\dotsc,N.
\end{eqnarray*}

Under the strict exogeneity assumption ($FE.1$) and the rank condition
($FE.2$), the FE estimator, $\hat{\mathbf{\beta}}_{FE}$, consistently estimates $\mathbf{\beta}$ as $N\to\infty$ for fixed $T$. Under FE.3, $\hat{\mathbf{\beta}}_{FE}$ is also asymptotically efficient. (The latter assumption is not needed for consistency.)

In order to perform inference on the obtained parameter
estimates, we need standard errors of the parameter estimates. If
the unobservables $\{u_{it}\}_{t=1}^{T}$ of (1) satisfy
$FE.3$, then the variance-covariance matrix of the FE estimator may
be estimated by

\begin{equation}
\widehat{\mathrm{var}}(\hat{\mathbf{\beta}}_{FE})=\hat{\sigma}_{u}^{2}(\mathbf{\ddot{X}}'\mathbf{\ddot{X}})^{-1},
\end{equation}

where $\hat{\sigma}_{u}^{2}:=\widehat{\ddot{\mathbf{u}}}'\widehat{\ddot{\mathbf{u}}}/[N\left(T-1\right)-K]$
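and $\widehat{\ddot{\mathbf{u}}}:=\ddot{\mathbf{y}}-\mathbf{\ddot{X}}\hat{\mathbf{\beta}}_{FE}$ so that $\widehat{\ddot{\mathbf{u}}}'\widehat{\ddot{\mathbf{u}}}=\sum_{i=1}^{N}\sum_{t=1}^{T}\hat{\ddot{u}}_{it}^{2}$. The divisor is $N(T-1)-K$ rather than $NT-K$ because the within transformation uses up one degree of freedom per individual.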
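
One way to see where the divisor $N(T-1)-K$ comes from: the FE estimator is numerically identical to OLS on the data in levels with one dummy per individual (the least-squares dummy variable, LSDV, regression), which estimates $K+N$ parameters from $NT$ observations. The sketch below checks this on a small simulated panel; the data-generating process is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
n, t, k = 30, 4, 2

c = np.repeat(rng.normal(size=n), t).reshape(-1, 1)
X = rng.normal(size=(n * t, k)) + c
y = X @ np.array([[1.0], [2.0]]) + c + rng.normal(size=(n * t, 1))

# (i) OLS on within-transformed data with the FE degrees of freedom.
Q = np.kron(np.eye(n), np.eye(t) - np.full((t, t), 1 / t))
X_dd, y_dd = Q @ X, Q @ y
b_fe = np.linalg.solve(X_dd.T @ X_dd, X_dd.T @ y_dd)
ssr_fe = ((y_dd - X_dd @ b_fe) ** 2).sum()
sigma2_fe = ssr_fe / (n * (t - 1) - k)

# (ii) OLS on levels with one dummy per individual (LSDV).
D = np.kron(np.eye(n), np.ones((t, 1)))
Z = np.hstack([X, D])
g = np.linalg.lstsq(Z, y, rcond=None)[0]
ssr_lsdv = ((y - Z @ g) ** 2).sum()
sigma2_lsdv = ssr_lsdv / (n * t - (k + n))  # usual OLS degrees of freedom

print(np.allclose(b_fe, g[:k]), np.isclose(sigma2_fe, sigma2_lsdv))
```

Both the slope estimates and the error-variance estimates agree, since $NT-(K+N)=N(T-1)-K$.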


## Transforming data using the `perm` function

The main challenge in implementing (3) in Python lies in de-meaning the variables, i.e., constructing $\ddot{y}_{it}=y_{it}-\bar{y}_{i}$
and $\mathbf{\ddot{x}}_{it}=\mathbf{x}_{it}-\mathbf{\bar{x}}_{i}$.
On the *individual level*, this can be done by premultiplying equation (1) by a transformation matrix

\begin{equation}
\mathbf{Q}_{T}:=\mathbf{I}_{T}-\left(\begin{array}{ccc}
1/T & \ldots & 1/T\\
\vdots & \ddots & \vdots\\
1/T & \ldots & 1/T
\end{array}\right)_{T\times T}.
\end{equation}

However, even though $\mathbf{Q}_{T}\mathbf{y}_{i}=\ddot{\mathbf{y}}_{i}$, we can't simply multiply the full data vector, $\mathbf{y}=(\mathbf{y}_{1}',\dots,\mathbf{y}_{N}')'$, by $\mathbf{Q}_{T}$: the vector has $NT$ rows, whereas $\mathbf{Q}_{T}$ is $T\times T$, so the transformation must be applied to each individual separately. To this end, the Python function `perm(P, x)` picks out the rows of the input vector (here `x`) belonging to one individual at a time and premultiplies them
by the input matrix `P` (using
a `for` loop). For example,
\begin{align*}
\texttt{perm}\left(\mathbf{Q}_{T},\begin{pmatrix}\mathbf{y}_{1}\\
\vdots\\
\mathbf{y}_{N}
\end{pmatrix}\right)=\begin{pmatrix}\mathbf{Q}_{T}\mathbf{y}_{1}\\
\vdots\\
\mathbf{Q}_{T}\mathbf{y}_{N}
\end{pmatrix}=\begin{pmatrix}\ddot{\mathbf{y}}_{1}\\
\vdots\\
\ddot{\mathbf{y}}_{N}
\end{pmatrix} & =\ddot{\mathbf{y}}.
\end{align*}

The same goes for $\textbf{x}$-input. (You may want to
take a look under the hood of this function.)
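
The block-by-block logic can be checked on a toy example. In the sketch below, `transform_by_individual` is a hypothetical stand-in written for this illustration (it is not the course's `perm`), and the numbers are made up. It also verifies that the same result follows from the $NT\times NT$ block-diagonal matrix $\mathbf{I}_{N}\otimes\mathbf{Q}_{T}$, which is conceptually simpler but grows quickly with $N$.

```python
import numpy as np

t, n = 3, 2
Q_T = np.eye(t) - np.full((t, t), 1 / t)

# Two individuals stacked on top of each other, three periods each.
y = np.array([1.0, 2.0, 6.0, 10.0, 10.0, 13.0]).reshape(-1, 1)


def transform_by_individual(P, z):
    # Hypothetical stand-in for perm: apply P to each individual's block of rows.
    t = P.shape[0]
    blocks = z.reshape(-1, t, z.shape[1])  # (n, t, columns)
    return (P @ blocks).reshape(z.shape)


y_dd = transform_by_individual(Q_T, y)
print(y_dd.ravel())

# The same result from one big block-diagonal matrix, I_N kron Q_T.
y_dd_kron = np.kron(np.eye(n), Q_T) @ y
print(np.allclose(y_dd, y_dd_kron))
```

The individual means are 3 and 11, so the de-meaned values are $(-2,-1,3)$ and $(-1,-1,2)$, up to floating-point rounding.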

## Exercises with FE --- Within-Groups Estimation

The exercise takes up the union membership example from before. The data set WAGEPAN.TXT contains information about 545 men who worked every year from 1980 to 1987 in the US. The variables of interest are


| Variable | Content |
|-|-|
| nr | Variable that identifies the individual  |
| year | Year of observation |
| Black | Black |
| Hisp | Hispanic |
| Educ | Years of schooling |
| Exper | Years since left school |
| Expersq | Exper squared |
| Married | Marital status |
| Union | Union membership |
| Lwage | Natural logarithm of hourly wages |

Consider the following wage equation:

$$
\begin{align}
\ln\left(wage_{it}\right) & =\beta_{0}+\beta_{1}\textit{exper}_{it}+\beta_{2}\textit{exper}_{it}^{2}+\beta_{3}\textit{union}_{it}+\beta_{4}\textit{married}_{it}\nonumber \\
 & \quad+\beta_{5}\textit{educ}_{i}+\beta_{6}\textit{hisp}_{i}+\beta_{7}\textit{black}_{i}+c_{i}+u_{it} \tag{4}
\end{align}
$$

Note that *educ*, *hisp*, and *black* are time-invariant variables.

## FE Questions
### FE (a):
Consider for the moment the unobserved components of (4) as one (composite) error term $v_{it}=c_{i}+u_{it}$ and estimate (4) by pooled OLS. What assumptions are made about $E\left[c_{i}\mathbf{x}_{it}\right]$ and $E\left[u_{it}\mathbf{x}_{it}\right]$ when justifying this estimation approach?

In [5]:
# First, import the data into numpy. 
data = np.loadtxt('wagepan.txt', delimiter=",")
id_array = np.array(data[:, 0])

# Count how many persons we have. This returns a tuple with the unique IDs,
# and the number of times each person is observed.
unique_id = np.unique(id_array, return_counts=True)
n = unique_id[0].size
t = int(unique_id[1].mean())
year = np.array(data[:, 1], dtype=int)

# Load the rest of the data into arrays.
y = np.array(data[:, 8]).reshape(-1, 1)
X = np.array(
    [np.ones((y.shape[0])), 
    data[:, 2],
    data[:, 4],
    data[:, 6],
    data[:, 3],
    data[:, 9],
    data[:, 5],
    data[:, 7]]
).T

# Let's also make some variable names.
label_y = 'Log wage'
label_X = [
    'Constant', 
    'Black', 
    'Hispanic', 
    'Education', 
    'Experience', 
    'Experience sqr', 
    'Married', 
    'Union'
]

In [6]:
# Define the OLS estimator, and then call the function.
def estimate_ols(y, X, fe=False, n=0, t=0):
    b_hat = la.inv(X.T@X)@(X.T@y)
    residual = y - X@b_hat
    TSS = (y - np.mean(y)).T@(y - np.mean(y))
    SSR = residual.T@residual
    ESS = TSS - SSR
    R2 = np.array(ESS/TSS)

    # If we are estimating a FE model, we need to correct the degrees of freedom in sigma.
    if not fe:
        sigma = np.array(SSR/(X.shape[0] - X.shape[1]))
    else:
        sigma = np.array(SSR/(n*(t - 1) - X.shape[1]))
    b_var = sigma*la.inv(X.T@X)
    b_std = np.sqrt(b_var.diagonal()).reshape(-1, 1)
    t_values = b_hat/b_std

    return b_hat, b_std, sigma, t_values, R2
b_hat, b_std, sigma, t_values, R2 = estimate_ols(y, X)

In [7]:
# Print the table
def print_table(headers, title, label_X, label_y, b_hat, b_std, t_values):
    table = []
    for i, name in enumerate(label_X):
        table_row = [name, b_hat[i], b_std[i], t_values[i]]
        table.append(table_row)
        
    print(title)
    print(f'Dependent variable: {label_y}\n')
    print(tabulate(table, headers, floatfmt='.4f'))
    print(f'R\u00b2 = {R2[0, 0]:.3f}')
    print(f'\u03C3\u00b2 = {sigma[0, 0]:.3f}' )

headers = ['', 'Beta hat', 'Std', 'T value']
title = 'Pooled OLS'
print_table(headers, title, label_X, label_y, b_hat, b_std, t_values)

Pooled OLS
Dependent variable: Log wage

                  Beta hat     Std    T value
--------------  ----------  ------  ---------
Constant           -0.0347  0.0646    -0.5375
Black              -0.1438  0.0236    -6.1055
Hispanic            0.0157  0.0208     0.7543
Education           0.0994  0.0047    21.2476
Experience          0.0892  0.0101     8.8200
Experience sqr     -0.0028  0.0007    -4.0272
Married             0.1077  0.0157     6.8592
Union               0.1801  0.0171    10.5179
R² = 0.187
σ² = 0.231


### FE (b):
Within transform the data. What happens to *educ, hisp, and black* and $x_{it1}\equiv1$ when the data are within transformed? What is the rank of the within transformed $\mathbf{X}$ matrix? Why?

In [8]:
def demeaning_matrix(t):
    Q_T = np.eye(t) - np.tile(1/t, (t, t))
    return Q_T

In [9]:
def perm(Q_T, Z):
    '''
        Q_T is the transformation matrix.
        Z is the matrix that is to be transformed.
    '''
    # We can infer t from the shape of the transformation matrix.
    t = Q_T.shape[0]

    # Initialize the numpy array
    A = np.zeros(Z.shape)

    # Loop over the individuals, and transform each individual's block of rows.
    for i in range(int(Z.shape[0]/t)):
        A[i*t : (i + 1)*t] = Q_T@Z[i*t : (i + 1)*t]
    return A

        

In [10]:
Q_T = demeaning_matrix(t)
y_demean = perm(Q_T, y)
X_demean = perm(Q_T, X)

# Check rank of demeaned matrix, and return its eigenvalues.
def check_rank(X):
    print(f'Rank of demeaned X: {la.matrix_rank(X)}')
    lambdas, V = la.eig(X.T@X)
    np.set_printoptions(suppress=True)  # This is just to print nicely.
    print(f'Eigenvalues of within-transformed X: {lambdas.round(decimals=0)}')
check_rank(X_demean)

# We need to drop the columns that are constant over time.
# Try to first estimate the OLS without removing the 0-columns.
X_demean_non_singular = X_demean[:, 4:]
label_X_non_singular = label_X[4:]

Rank of demeaned X: 4
Eigenvalues of within-transformed X: [4248875.    1871.     365.     329.       0.       0.       0.       0.]


### FE (c):
Estimate (4) on within transformed data (make sure that the employed $\mathbf{\ddot{X}}$ has full rank - drop columns if necessary). How big is the union premium according to the estimate from the FE model? Compare this with the estimate that you calculated from the pooled OLS regression. What does this suggest about $E\left[union_{it}c_{i}\right]$?

In [11]:
# Estimate FE OLS using the demeaned variables.
b_hat, b_std, sigma, t_values, R2 = estimate_ols(
    y_demean, X_demean_non_singular, fe=True, n=n, t=t
)
title = 'FE'
print_table(headers, title, label_X_non_singular, label_y, b_hat, b_std, t_values)

FE
Dependent variable: Log wage

                  Beta hat     Std    T value
--------------  ----------  ------  ---------
Experience          0.1168  0.0084    13.8778
Experience sqr     -0.0043  0.0006    -7.1057
Married             0.0453  0.0183     2.4743
Union               0.0821  0.0193     4.2553
R² = 0.178
σ² = 0.123


## Exercises with first-difference estimation (FD)

The within transformation is one particular transformation
that enables us to get rid of $c_{i}$. An alternative is the first-difference transformation. To see how it works, lag Equation (1) one period and subtract it from (1) such that

\begin{equation}
\Delta y_{it}=\Delta\mathbf{x}_{it}'\mathbf{\beta}+\Delta u_{it},\quad t=\color{red}{2},\dotsc,T, \tag{5}
\end{equation}

where $\Delta y_{it}:=y_{it}-y_{it-1}$, $\Delta\mathbf{x}_{it}:=\mathbf{x}_{it}-\mathbf{x}_{it-1}$
and $\Delta u_{it}:=u_{it}-u_{it-1}$. As was the case for the within
transformation, first differencing eliminates the time invariant component
$c_{i}$. Note, however, that one time period is lost when differencing.

### FD Assumptions

\begin{eqnarray*}
FD.1 & : & E[u_{it}|\mathbf{x}_{i1},..,\mathbf{x}_{iT},c_{i}]=0 \; \; \; \; t=1,\dots,T, \; \;  \text{(as in }FE.1\text{)} \\
FD.2 & : & \text{Rank }E[\Delta\mathbf{x}_{i}\Delta\mathbf{x}_{i}']=K,\quad\text{ for }i=1,\dots,N,\\
FD.3 & : & E[\mathbf{e}_{i}\mathbf{e}_{i}'|\mathbf{x}_{i},c_{i}]=\sigma_{e}^{2}\mathbf{I}_{T-1} \; \text{ with }\mathbf{e}_{i}:=\Delta\mathbf{u}_{i},\quad\text{ for }i=1,\dots,N.
\end{eqnarray*}
Under the strict exogeneity assumption ($FD.1$) and the rank condition ($FD.2$), the FD estimator

\begin{equation}
\hat{\mathbf{\beta}}_{FD}=\left(\Delta\mathbf{X}^{\prime}\Delta\mathbf{X}\right)^{-1}\Delta\mathbf{X}^{\prime}\Delta\mathbf{y}
\end{equation}

consistently estimates $\boldsymbol{\beta}$ (as $N\to\infty$ for
fixed $T$). If also FD.3 holds, then $\hat{\boldsymbol{\beta}}_{FD}$
is asymptotically efficient. (Again, the latter assumption is not
needed for consistency.)

Under $FD.3$, $u_{it}=u_{it-1}+e_{it}$ follows a random walk. This
is the opposite extreme relative to assumption $FE.3$, where the
$u_{it}$ are assumed to be serially uncorrelated. In many cases,
the truth is likely to lie somewhere in between. The variance-covariance
matrix of the FD estimator may be estimated by

\begin{equation}
\widehat{\text{var}}(\hat{\mathbf{\beta}}_{FD})=\hat{\sigma}_{e}^{2}\left(\Delta\mathbf{X}'\Delta\mathbf{X}\right)^{-1}
\end{equation}

where $\hat{\sigma}_{e}^{2}:=\hat{\mathbf{e}}^{\prime}\hat{\mathbf{e}}/[N\left(T-1\right)-K]$
and $\hat{e}_{it}:=\Delta y_{it}-\Delta\mathbf{x}_{it}'\hat{\mathbf{\beta}}_{FD}$.

Notice how we, both in the case of FE and FD, manipulate the model
in a way that allows the standard OLS assumptions to hold on the *transformed* data, and then simply treat the transformed model as if it were our model of interest. Under strict exogeneity ($FE.1$/$FD.1$), the choice between the first-difference and the within estimator is a matter of efficiency, and it hinges on the assumptions made about the serial correlation of the errors ($FE.3$/$FD.3$).
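
With more than two periods, the FE and FD estimates generally differ in finite samples. In the special case $T=2$, however, they coincide exactly, because the within-transformed values are just $\pm\tfrac{1}{2}$ times the first difference. The sketch below checks this on simulated data; the single-regressor design with true slope $0.7$ is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, t = 200, 2

c = rng.normal(size=(n, 1))
x = rng.normal(size=(n, t)) + c  # regressor correlated with c
y = 0.7 * x + c + rng.normal(size=(n, t))

# Within transformation (FE).
x_dd = (x - x.mean(axis=1, keepdims=True)).ravel()
y_dd = (y - y.mean(axis=1, keepdims=True)).ravel()
b_fe = (x_dd @ y_dd) / (x_dd @ x_dd)

# First differences (FD): one observation per individual when T = 2.
dx = x[:, 1] - x[:, 0]
dy = y[:, 1] - y[:, 0]
b_fd = (dx @ dy) / (dx @ dx)

print(b_fe, b_fd)
```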

To estimate the coefficients in (5) in Python, we must first
difference all the variables, i.e., construct $\Delta y_{it}=y_{it}-y_{it-1}$ and $\Delta\mathbf{x}_{it}=\mathbf{x}_{it}-\mathbf{x}_{it-1}$. This can be done by premultiplying the variables in levels ($\mathbf{y}_{i}$ and $\mathbf{X}_{i}$) by the transformation matrix $\mathbf{D}$ given by

\begin{equation}
\mathbf{D}:=\left(\begin{array}{cccccc}
0 & 0 & 0 & 0 & 0 & 0\\
-1 & 1 & 0 & \ldots & 0 & 0\\
0 & -1 & 1 &  & 0 & 0\\
\vdots &  &  & \ddots &  & \vdots\\
0 & 0 & 0 & \ldots & -1 & 1
\end{array}\right)_{T\times T}.
\end{equation}

(Can you see why $\mathbf{D}$ gets the job done?)
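
A quick numerical check on one individual's block can help here. The sketch uses $T=4$ and made-up values: row $s$ of $\mathbf{D}$ computes $y_{s}-y_{s-1}$, the first row is zero because no lag exists, and any time-invariant component is differenced away.

```python
import numpy as np

t = 4
D = np.eye(t) - np.eye(t, k=-1)
D[0, :] = 0  # no lag exists for the first period

# One individual's observations over four periods (toy numbers).
y_i = np.array([3.0, 5.0, 4.0, 8.0])
print(D @ y_i)

# A time-invariant component such as c_i is differenced away.
c_i = np.full(t, 2.5)
print(D @ c_i)
```

Here `D @ y_i` gives $(0, 2, -1, 4)$ and `D @ c_i` is a vector of zeros.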

## FD Questions
### FD (a):
Construct $\mathbf{D}$ and use the procedure `perm` $(\mathbf{D},\mathbf{x})$ to compute first differences of the elements of $\mathbf{y}$ and $\mathbf{x}$. What happens to *educ, hisp* and *black* and $x_{it1}\equiv1$ when the data are transformed into first differences? What is the rank of the first differenced $\mathbf{x}$-matrix? Why?

In [12]:
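def fd_matrix(t):
    # First-difference matrix: row s computes y_s - y_{s-1}. The first row is
    # zero because the first period has no lag.
    D_T = np.eye(t) - np.eye(t, k=-1)
    D_T[0, :] = 0
    return D_T

In [13]:
# Transform the data.
D_T = fd_matrix(t)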
y_diff = perm(D_T, y)
X_diff = perm(D_T, X)

# Again, check rank condition.
check_rank(X_diff)

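# Remove the first observation for each person (year 1980), which is a row of zeros
# after differencing. The zero rows do not change the coefficient estimates, but
# keeping them would inflate the degrees of freedom used for the variance estimate.
y_diff = y_diff[year != 1980]
X_diff = X_diff[year != 1980]

# Remember to remove linearly dependent columns.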
X_diff_non_singular = X_diff[:, 4:]

Rank of demeaned X: 4
Eigenvalues of within-transformed X: [753711.    356.    545.    508.      0.      0.      0.      0.]


### FD (b):
Estimate (4) in first differences. How big is the union premium according to the estimate from this model? Compare the FD estimate with the estimate that you calculated from the FE regression. Is there a difference? If yes, what (if anything) can we conclude based on this finding?

In [14]:
b_hat, b_std, sigma, t_values, R2 = estimate_ols(y_diff, X_diff_non_singular)
title = 'FD'
print_table(headers, title, label_X_non_singular, label_y, b_hat, b_std, t_values)

FD
Dependent variable: Log wage

                  Beta hat     Std    T value
--------------  ----------  ------  ---------
Experience          0.1158  0.0196     5.9096
Experience sqr     -0.0039  0.0014    -2.8005
Married             0.0381  0.0229     1.6633
Union               0.0428  0.0197     2.1767
R² = 0.004
σ² = 0.196


## Exercise comparing FE and FD
### Question FE v. FD (a):
Test for serial correlation in the errors using an auxiliary AR(1) model, to test assumption FD.3, under which the errors $e_{it} = \Delta u_{it}$ should be serially uncorrelated.

We can easily test this assumption given the OLS residuals from equation (5). Run the regression (note that you will lose data for
the *two* first periods)
\begin{equation}
\hat{e}_{it}=\rho\hat{e}_{it-1}+error_{it},\quad t=\color{red}{3},\dotsc,T,\quad i=1,\dotsc,N
\end{equation}

Do you find any evidence for serial correlation? Does FD.3 seem appropriate? And why don't we include an intercept in this auxiliary equation?

*Note:* Under FE.3, the idiosyncratic errors $u_{it}$
are serially uncorrelated. However, FE.3 implies that the $e_{it}$'s are autocorrelated. In fact, if the $u_{it}$'s are serially uncorrelated to begin with, corr$\left(e_{it},e_{it-1}\right)=-0.5$. (Check!) This test is of course only valid if the explanatory variables are strictly exogenous!
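
The value $-0.5$ follows from $\text{Cov}(e_{it},e_{it-1})=\text{Cov}(u_{it}-u_{it-1},u_{it-1}-u_{it-2})=-\sigma_{u}^{2}$ and $\text{Var}(e_{it})=2\sigma_{u}^{2}$. A simulation sketch of the same check, assuming standard normal $u_{it}$ and an arbitrary panel size:

```python
import numpy as np

rng = np.random.default_rng(4)
n, t = 5000, 8

u = rng.normal(size=(n, t))  # serially uncorrelated errors (FE.3)
e = np.diff(u, axis=1)       # e_it = u_it - u_it-1, shape (n, t - 1)

# Pair each e_it with its own lag e_it-1 within individuals.
e_now = e[:, 1:].ravel()
e_lag = e[:, :-1].ravel()
rho = np.corrcoef(e_now, e_lag)[0, 1]
print(f"{rho:.3f}")
```

The sample correlation should be close to $-0.5$, which is why the test in this exercise compares the estimated $\rho$ with both $0$ (FD.3) and $-0.5$ (FE.3).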

*Hint:* You can use the `perm` function to lag
the error term variable. Consider the following; 
\begin{align*}
\underset{T\times T}{\begin{pmatrix}0 & 0 & 0 & \cdots & 0 & 0\\
1 & 0 & 0 & \cdots & 0 & 0\\
0 & 1 & 0 & \cdots & 0 & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots\\
0 & 0 & 0 & \cdots & 1 & 0
\end{pmatrix}}\underset{T\times 1}{\begin{pmatrix}y_{1}\\
y_{2}\\
\vdots\\
y_{T}
\end{pmatrix}}=\underset{T\times 1}{\begin{pmatrix}0\\
y_{1}\\
\vdots\\
y_{T-1}
\end{pmatrix}}
\end{align*}
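
The lag matrix from the hint has ones on the first subdiagonal and can be built with `np.eye(T, k=-1)`. A small check with $T=4$ and toy values, showing that element $s$ of the product is $y_{s-1}$ and that the first element is zero because no lag exists:

```python
import numpy as np

t = 4
L = np.eye(t, k=-1)  # ones on the first subdiagonal
y_i = np.array([1.0, 2.0, 3.0, 4.0])

lagged = L @ y_i
print(lagged)
```

The product is $(0, 1, 2, 3)$. The leading zero is a placeholder rather than a real observation, so that row must be dropped before running the auxiliary regression.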

In [17]:
def serial_correlation(y, X, year):
    # We often use _ for a variable that we are not interested in,
    # but that is returned anyway.
    b_hat, _, _, _, _ = estimate_ols(y, X)
    e = y - X@b_hat

    # The FD data start in 1981, so each person has T - 1 residuals.
    reduced_year = year[year != 1980]
    t_fd = np.unique(reduced_year).size

    # Lag the residuals within each person using a (T-1)x(T-1) lag matrix.
    L_T = np.eye(t_fd, k=-1)
    e_l = perm(L_T, e)

    # The first residual of each person (1981) has no lag, so drop it.
    e = e[reduced_year != 1981]
    e_l = e_l[reduced_year != 1981]

    return estimate_ols(e, e_l)

In [18]:
b_hat, b_std, sigma, t_values, R2 = serial_correlation(y_diff, X_diff_non_singular, year)
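# Replace the t-value: instead of H0: rho = 0, test H0: rho = -0.5,
# the value implied by FE.3 (serially uncorrelated u).
t_values[0] = (b_hat[0] + 0.5)/b_std[0]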

label_y = 'OLS residual, e\u1d62\u209c'
label_e = ['e\u1d62\u209c\u208B\u2081']
title = 'Serial Correlation'
print_table(headers, title, label_e, label_y, b_hat, b_std, t_values)

Serial Correlation
Dependent variable: OLS residual, eᵢₜ

         Beta hat     Std    T value
-----  ----------  ------  ---------
eᵢₜ₋₁     -0.3961  0.0147     7.0843
R² = 0.182
σ² = 0.143


### Question FE v FD (b):
Add interactions of the form $d_{81}\cdot educ, d_{82}\cdot educ, ..., d_{87}\cdot educ$ and estimate the model with fixed effects. Has the return to education increased over time?

*Hint:* Remember that $educ_{i}$ doesn't vary over
time! Therefore we didn't use $educ$ in levels in the FE estimation.
However, if we suppose that the structural equation (4) contains a term $\sum_{s=2}^{T}\delta_{s}d_{s}educ_{i}$, it is perfectly fine to within-transform these interactions since they vary over time (although in a highly structured manner: each interaction equals $educ_{i}$ in one time period and zero in all others). Note that the interaction for one
period is dropped: the interactions for all $T$ periods sum to $educ_{i}$, so their within-transformed versions would be linearly dependent. The
levels term, $\beta_{5}educ_{i}$, is dropped because it is constant over time and would become a
column of zeros.

*Programming hint:* You want to append the dataset with a dummy matrix, that would look something like this:

$$
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
14 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 14 & 0 & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & 0 & 0 & 0 & 14 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 9 & 0 & 0 & 0 & 0 & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\
\end{bmatrix}
$$

This example shows the first two individuals, who have 14 and 9 years of education, respectively. This matrix can be created as a product of two matrices. What would they look like? Why is the first row only zeros?

In [30]:

# This dummy block has a zero row, as we need to exclude one
# year in order not to end up in the dummy trap.
dummy_block = np.eye(t, k=-1)[:, :-1]

# Expand this dummy block to all persons.
dummy_matrix = np.tile(dummy_block, (n, 1))

# Create an (n*t) x (t-1) matrix holding each person's education in every column.
expanded_educ = np.transpose([X[:, 3]] * (t-1))

# We can now multiply the year dummies by a person's education.
educ_dummies = dummy_matrix*expanded_educ
educ_demean = perm(Q_T, educ_dummies)
X_demean_dummies = np.hstack([X_demean_non_singular, educ_demean])

label_x_interactions = label_X_non_singular + ['E81', 'E82', 'E83', 'E84', 'E85', 'E86', 'E87']

In [31]:
b_hat, b_std, sigma, t_values, R2 = estimate_ols(y_demean, X_demean_dummies, fe=True, n=n, t=t)
title = 'FE with year interactions'
# label_y was overwritten in the serial-correlation cell, so reset it here.
label_y = 'Log wage'
print_table(headers, title, label_x_interactions, label_y, b_hat, b_std, t_values)

FE with year interactions
Dependent variable: Log wage

                  Beta hat     Std    T value
--------------  ----------  ------  ---------
Experience          0.1705  0.0273     6.2462
Experience sqr     -0.0060  0.0009    -6.9581
Married             0.0475  0.0183     2.5925
Union               0.0794  0.0193     4.1138
E81                -0.0010  0.0026    -0.4009
E82                -0.0062  0.0041    -1.5224
E83                -0.0114  0.0057    -2.0006
E84                -0.0136  0.0072    -1.8787
E85                -0.0162  0.0087    -1.8578
E86                -0.0170  0.0101    -1.6804
E87                -0.0167  0.0115    -1.4619
R² = 0.181
σ² = 0.123
