In [41]:
import numpy as np
from numpy import linalg as la
from scipy.stats import chi2
from tabulate import tabulate
import LinearModelsWeek3 as lm
%load_ext autoreload
%autoreload 2



In [42]:
y, x, n, t, year, label_y, label_x = lm.load_example_data()

Basic Linear Unobserved Effects Panel Data (continued)
======================================================

The purpose of this week's exercises is to familiarize ourselves with
the *Random-Effects (RE) model* and *estimator*. Among other conditions,
the RE approach relies on the assumption that the unobserved component
$c_{i}$ is *mean independent* of $\mathbf{x}_{i}$, i.e.,
$E\left[c_{i}|\mathbf{x}_{i}\right]=E\left[c_{i}\right]$. This
assumption is similar to the one justifying a pooled-OLS approach.
However, as will become clear below, even when both approaches to
estimation are justified, the RE estimator has a smaller (asymptotic)
variance because it takes into account the panel structure of the data
(i.e., the availability of repeated observations concerning the same
individuals). The RE estimator is therefore interesting from an
efficiency perspective. Towards the end of this exercise, you will be
asked to test whether the assumptions of the RE model are reasonable
compared to the assumptions underlying the Fixed-Effects (FE) model from
last week's exercises. The specific procedure is often referred to as a
*Hausman test*.

The Random-Effects Estimator
----------------------------

Consider again the linear panel-data model

$$y_{it}=\mathbf{x}_{it}\mathbf{\beta}+c_{i}+u_{it},\label{Model} \tag{1}$$ 

where
$i=1,\dotsc,N$ indexes cross-sectional units (e.g., households), and
$t=1,\dotsc,T$ time (e.g., weeks, years). Again $\mathbf{x}_{it}$ is a
$1\times K$ vector of observed variables, $\mathbf{\beta}$ are the $K$
parameters of interest, $c_{i}$ is an unobserved component that is
specific to individual $i$, and $u_{it}$ is an unobserved random error
term. Rewriting (1) in terms of the composite error
term $v_{it}:=c_{i}+u_{it}$ gives

$$y_{it}=\mathbf{x}_{it}\mathbf{\beta}+v_{it}\label{Model Err} \tag{2}$$ 

For the RE model we assume 

$$\begin{aligned}
RE.1\left(a\right) & :  E[u_{it}|\mathbf{x}_{i1},\dotsc,\mathbf{x}_{iT},c_{i}]=0, \quad t=1,\dotsc,T\\
RE.1\left(b\right) & :  E[c_{i}|\mathbf{x}_{i1},\dotsc,\mathbf{x}_{iT}]=E[c_{i}]=0\\
RE.2 & :  \text{Rank }E[\mathbf{X}_{i}'\boldsymbol{\mathbf{\Omega}}^{-1}\mathbf{X}_{i}]=K\\
RE.3\left(a\right) & :  E[\mathbf{u}_{i}\mathbf{u}_{i}^{\prime}|\mathbf{x}_{i},c_{i}]=\sigma_{u}^{2}\mathbf{I}_{T}\\
RE.3\left(b\right) & :  E[c_{i}^{2}|\mathbf{x}_{i}]=\sigma_{c}^{2},\end{aligned}$$

where $\mathbf{X}_{i}$ is the $T\times K$ matrix arising from stacking
$\mathbf{x}_{it}$ over $t=1,\dotsc,T$, $\boldsymbol{\Omega}$ is the
$T\times T$ variance-covariance matrix of the composite error vector
$\mathbf{v}_{i}=c_{i}\mathbf{j}_{T}+\mathbf{u}_{i}$ (elaborated on
below), and $\mathbf{j}_{T}$ is a $T\times1$ vector of ones. Note that
$\boldsymbol{\Omega}$ does not depend on $i$.

Under the assumptions $RE.1(a)$ and $RE.1(b)$, the explanatory variables
included in $\mathbf{X}_{i}$ are strictly exogenous. If, in addition,
the rank condition $\text{Rank }E[\mathbf{X}_{i}'\mathbf{X}_{i}]=K$
holds, then the parameters of (2) may be
estimated consistently by pooled OLS. However, under
$RE.3\left(a\right)$ and $RE.3\left(b\right)$, the covariance matrix of the composite error vector $\mathbf{v}_{i}$ for
individual $i$ is given by the $T\times T$ matrix 

$$\begin{aligned}
\mathbf{\Omega} & = E\left[\mathbf{v}_{i}\mathbf{v}_{i}^{\prime}\right]=\left[\begin{array}{cccc}
\sigma_{c}^{2}+\sigma_{u}^{2} & \sigma_{c}^{2} & \cdots & \sigma_{c}^{2}\\
\sigma_{c}^{2} & \sigma_{c}^{2}+\sigma_{u}^{2} &  & \vdots\\
\vdots &  & \ddots & \sigma_{c}^{2}\\
\sigma_{c}^{2} & \cdots & \sigma_{c}^{2} & \sigma_{c}^{2}+\sigma_{u}^{2}
\end{array}\right]\nonumber \\
 & = \sigma_{c}^{2}\mathbf{j}_{T}\mathbf{j}_{T}^{\prime}+\sigma_{u}^{2}\mathbf{I}_{T}.  \label{CovMatrix}\end{aligned} \tag{3} $$

If no individual effect existed, we would have $v_{it}=u_{it}$, and the
covariance matrix $\mathbf{\Omega}$ would be proportional to the identity
matrix. Because the pooled OLS estimator ignores the
variance-covariance structure in (3), it
is generally inefficient (asymptotically). An efficient alternative is
*Generalized Least Squares (GLS)* \[see Wooldridge (2010, Section 7); we do not cover the details here\]. Assuming that estimates of $\sigma_{c}^{2}$ and
$\sigma_{u}^{2}$ are available, the covariance matrix can be estimated
by
$\hat{\mathbf{\Omega}}=\hat{\sigma}_{c}^{2}\mathbf{j}_{T}\mathbf{j}_{T}^{\prime}+\hat{\sigma}_{u}^{2}\mathbf{I}_{T}$.
The (feasible) *GLS estimator* is then

$$\hat{\mathbf{\beta}}_{RE}=\Big(\sum_{i=1}^{N}\mathbf{X}_{i}'\hat{\mathbf{\Omega}}^{-1}\mathbf{X}_{i}\Big)^{-1}\sum_{i=1}^{N}\mathbf{X}_{i}'\hat{\mathbf{\Omega}}^{-1}\mathbf{y}_{i}.\label{GLS} \tag{4}$$

Again, when there is no individual-specific time-constant unobservable
$(c_{i}\equiv0)$, the conditional mean of its square $(\sigma_{c}^{2})$
is zero. In this case, $\mathbf{\Omega}$ reduces to
$\sigma_{u}^{2}\mathbf{I}_{T}$, and the estimator in
(4) is numerically identical to the pooled OLS estimator. (You should check this.)
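
One way to carry out the suggested check is with a small simulation. The sketch below is only an illustration on made-up data (the dimensions, coefficients and variance values are assumptions, not taken from the exercise data): it builds $\mathbf{\Omega}$ as in (3), computes (4) with a known $\mathbf{\Omega}$, and compares the result with pooled OLS.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, K = 50, 4, 2                      # hypothetical panel dimensions

# Simulated data, stored as one (T x K) block per individual.
X = rng.normal(size=(N, T, K))
y = X @ np.array([1.0, -0.5]) + rng.normal(size=(N, T))

def omega(sigma2_c, sigma2_u, T):
    """Covariance matrix (3): sigma_c^2 * j j' + sigma_u^2 * I_T."""
    j = np.ones((T, 1))
    return sigma2_c * (j @ j.T) + sigma2_u * np.eye(T)

def gls(X, y, Omega):
    """GLS estimator (4) with a given Omega, summing over individuals."""
    Omega_inv = np.linalg.inv(Omega)
    A = sum(Xi.T @ Omega_inv @ Xi for Xi in X)
    b = sum(Xi.T @ Omega_inv @ yi for Xi, yi in zip(X, y))
    return np.linalg.solve(A, b)

# Pooled OLS on the stacked (NT x K) data.
b_pols, *_ = np.linalg.lstsq(X.reshape(N * T, K), y.reshape(N * T), rcond=None)

b_gls_no_c = gls(X, y, omega(0.0, 2.0, T))    # sigma_c^2 = 0
b_gls_with_c = gls(X, y, omega(1.0, 2.0, T))  # sigma_c^2 > 0

print("GLS equals pooled OLS when sigma_c^2 = 0:", np.allclose(b_gls_no_c, b_pols))
print("GLS equals pooled OLS when sigma_c^2 > 0:", np.allclose(b_gls_with_c, b_pols))
```

With $\sigma_{c}^{2}=0$ the scalar $\sigma_{u}^{2}$ cancels in (4), which is why its value does not matter for the comparison.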

The RE estimator can be obtained by transforming the data in a
particular way and then computing the OLS estimates. Specifically,
$\mathbf{X}_{i}$ and $\mathbf{y}_{i}$ can be transformed by
premultiplying with an appropriate matrix, so that 

$$\begin{aligned}
\check{\mathbf{X}}_{i}:=\mathbf{C}_{T}\mathbf{X}_{i}, & \quad\check{\mathbf{y}}_{i}:=\mathbf{C}_{T}\mathbf{y}_{i},\\
\mathrm{where}\quad\mathbf{C}_{T}:=\mathbf{I}_{T}-\lambda\mathbf{P}_{T},\quad\mathbf{P}_{T} & :=\mathbf{I}_{T}-\mathbf{Q}_{T}=T^{-1}\mathbf{j}_{T}\mathbf{j}_{T}',\\
\text{and}\quad\lambda & :=1-\sqrt{\frac{\sigma_{u}^{2}}{\sigma_{u}^{2}+T\sigma_{c}^{2}}}.\end{aligned}$$

(The $\mathbf{Q}_{T}=\mathbf{I}_{T}-T^{-1}\mathbf{j}_{T}\mathbf{j}_{T}'$
matrix is the same as in last week's problem set.) The infeasible RE
estimator $\boldsymbol{\tilde{\beta}}_{RE}$ (infeasible
because it relies on knowledge of $\boldsymbol{\Omega}$) may then be
calculated as the pooled OLS estimator using the transformed sample,
$$\boldsymbol{\tilde{\beta}}_{RE}=\left(\mathbf{\check{X}}'\mathbf{\check{X}}\right)^{-1}\mathbf{\check{X}}'\check{\mathbf{y}},\label{re} \tag{5}$$
where $\mathbf{\check{X}}$ is now the $NT\times K$ matrix and
$\check{\mathbf{y}}$ the $NT\times1$ vector arising from stacking
the transformed variables over $t$ and then $i$.
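
The reason OLS on the quasi-demeaned data reproduces GLS is that $\mathbf{C}_{T}$ "whitens" the composite error: $\mathbf{C}_{T}\mathbf{\Omega}\mathbf{C}_{T}'=\sigma_{u}^{2}\mathbf{I}_{T}$, or equivalently $\mathbf{C}_{T}'\mathbf{C}_{T}=\sigma_{u}^{2}\mathbf{\Omega}^{-1}$. A minimal numerical check, using arbitrary (assumed) values for $T$, $\sigma_{c}^{2}$ and $\sigma_{u}^{2}$:

```python
import numpy as np

T = 5
sigma2_c, sigma2_u = 0.8, 0.3          # hypothetical variance components

j = np.ones((T, 1))
P_T = j @ j.T / T
Omega = sigma2_c * (j @ j.T) + sigma2_u * np.eye(T)

lam = 1 - np.sqrt(sigma2_u / (sigma2_u + T * sigma2_c))
C_T = np.eye(T) - lam * P_T

# Quasi-demeaning turns Omega into a scalar multiple of the identity.
whitened = C_T @ Omega @ C_T.T
print(np.allclose(whitened, sigma2_u * np.eye(T)))

# Equivalently, C_T' C_T is proportional to the inverse of Omega.
print(np.allclose(C_T.T @ C_T, sigma2_u * np.linalg.inv(Omega)))
```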

To make the procedure (5) feasible, we need estimates of $\sigma_{c}^{2}$ and
$\sigma_{u}^{2},$ such that we can construct
$\hat{\lambda}=1-\sqrt{\widehat{\sigma}_{u}^{2}/(\widehat{\sigma}_{u}^{2}+T\widehat{\sigma}_{c}^{2})}$.
Within- and between-group residuals can be used to obtain estimates of
$\sigma_{u}^{2}$ and $\sigma_{c}^{2}$, respectively: 

$$\begin{aligned}
\hat{\sigma}_{u}^{2} & = \frac{1}{NT-N-K}\left(\ddot{\mathbf{y}}-\mathbf{\ddot{X}}\hat{\mathbf{\beta}}_{FE}\right)^{\prime}\left(\ddot{\mathbf{\mathbf{y}}}-\mathbf{\ddot{X}}\hat{\mathbf{\beta}}_{FE}\right)=\frac{\widehat{\ddot{\mathbf{u}}}^{\prime}\widehat{\ddot{\mathbf{u}}}}{NT-N-K}\label{su} & (6) \\
\hat{\sigma}_{w}^{2} & = \frac{1}{T}\frac{1}{N-K}\left(\bar{\mathbf{y}}-\mathbf{\bar{X}}\hat{\mathbf{\beta}}_{BE}\right)^{\prime}\left(\bar{\mathbf{y}}-\mathbf{\bar{X}}\hat{\mathbf{\beta}}_{BE}\right) & (7) \\
\hat{\sigma}_{c}^{2} & = \hat{\sigma}_{w}^{2}-\frac{1}{T}\hat{\sigma}_{u}^{2},\label{sc} & (8) \end{aligned}$$

where $\mathbf{\hat{\mathbf{\beta}}}_{FE}$ is the FE estimator and
$\ddot{\mathbf{y}}$ and $\mathbf{\ddot{X}}$ are within-transformed
counterparts of $\mathbf{y}$ and $\mathbf{X}$, respectively. (See last
week's problem set.) Correspondingly, $\bar{\mathbf{y}}$ and
$\mathbf{\bar{X}}$ are between-group transformed variables and
$\hat{\mathbf{\beta}}_{BE}=\left(\mathbf{\bar{X}}^{\prime}\mathbf{\bar{X}}\right)^{-1}\mathbf{\bar{X}}^{\prime}\bar{\mathbf{y}}$ is
the between-groups estimator. The between-groups estimator is not something we have introduced before, but is attained by regressing the time-averaged outcomes $\overline{y}_i$ on the time-averaged regressors $\overline{\mathbf{x}}_i,i=1,2,\dotsc,N$.
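
The steps in (6)-(8) can be sketched on simulated data where the true variance components are known. Everything below (dimensions, coefficients, the helper `ols_ssr`) is made up for illustration and is not part of the course module. The between regression here uses one row per individual, so the $1/T$ factor in (7) is not applied; this assumes that factor is there to offset each average appearing $T$ times in the stacked data.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, K = 2000, 5, 2
sigma2_c, sigma2_u = 0.5, 1.0           # true values used in the simulation
beta = np.array([1.0, -0.5])

X = rng.normal(size=(N, T, K))
c = rng.normal(scale=np.sqrt(sigma2_c), size=(N, 1))
u = rng.normal(scale=np.sqrt(sigma2_u), size=(N, T))
y = X @ beta + c + u

def ols_ssr(Z, w):
    """Sum of squared OLS residuals from regressing w on Z."""
    b, *_ = np.linalg.lstsq(Z, w, rcond=None)
    resid = w - Z @ b
    return resid @ resid

# Within transformation: subtract individual means, then stack to NT rows.
X_dd = (X - X.mean(axis=1, keepdims=True)).reshape(N * T, K)
y_dd = (y - y.mean(axis=1, keepdims=True)).reshape(N * T)
s2_u = ols_ssr(X_dd, y_dd) / (N * T - N - K)        # equation (6)

# Between transformation: one row of time averages per individual.
X_bar = X.mean(axis=1)
y_bar = y.mean(axis=1)
s2_w = ols_ssr(X_bar, y_bar) / (N - K)              # equation (7), N rows

s2_c = s2_w - s2_u / T                              # equation (8)
lam_hat = 1 - np.sqrt(s2_u / (s2_u + T * s2_c))

print(f"sigma2_u: {s2_u:.3f}, sigma2_c: {s2_c:.3f}, lambda: {lam_hat:.3f}")
```

In small samples $\hat{\sigma}_{c}^{2}$ from (8) can turn out negative, in which case $\hat{\lambda}$ is not well defined; the sketch does not guard against that.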

An estimate of the variance-covariance matrix for
$\hat{\mathbf{\beta}}_{RE}$ may then be obtained as

$$\widehat{\text{var}}(\hat{\mathbf{\beta}}_{RE})=\hat{\sigma}_{\check{v}}^{2}\left(\mathbf{\check{X}}^{\prime}\mathbf{\check{X}}\right)^{-1},\label{VarRE} \tag{9}$$
where

$$\hat{\sigma}_{\check{v}}^{2}:=\frac{1}{NT-K}\sum_{i=1}^{N}\sum_{t=1}^{T}\check{v}_{it}^{2},$$

and
$\check{v}_{it}=\check{y}_{it}-\mathbf{\check{x}}_{it}\boldsymbol{\hat{\beta}}_{RE}$
are the *quasi-demeaned residuals* obtained from the pooled regression
of $\check{y}_{it}$ on $\check{\mathbf{x}}_{it}$ (with $\lambda$
replaced by its estimate $\widehat{\lambda}$).
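
Putting (5) and (9) together, a compact sketch of the RE computation for a given value of $\lambda$ might look as follows. The function `re_fit` and the simulated data are hypothetical; in practice $\lambda$ would be replaced by $\widehat{\lambda}$, whereas here the true value from the simulation is plugged in to keep the example short.

```python
import numpy as np

def re_fit(X, y, lam):
    """Quasi-demean with lam, run pooled OLS, and return (b, cov, se).

    X has shape (N, T, K), y has shape (N, T); the panel is balanced.
    """
    N, T, K = X.shape
    X_chk = (X - lam * X.mean(axis=1, keepdims=True)).reshape(N * T, K)
    y_chk = (y - lam * y.mean(axis=1, keepdims=True)).reshape(N * T)
    XtX_inv = np.linalg.inv(X_chk.T @ X_chk)
    b = XtX_inv @ X_chk.T @ y_chk                   # equation (5)
    resid = y_chk - X_chk @ b                       # quasi-demeaned residuals
    sigma2_v = resid @ resid / (N * T - K)
    cov = sigma2_v * XtX_inv                        # equation (9)
    return b, cov, np.sqrt(np.diag(cov))

rng = np.random.default_rng(2)
N, T = 500, 6
# A constant plus one time-varying regressor (made-up design).
X = np.concatenate([np.ones((N, T, 1)), rng.normal(size=(N, T, 1))], axis=2)
c = rng.normal(scale=0.7, size=(N, 1))              # sigma_c^2 = 0.49
y = X @ np.array([0.5, 1.0]) + c + rng.normal(size=(N, T))  # sigma_u^2 = 1

lam = 1 - np.sqrt(1.0 / (1.0 + T * 0.49))           # true lambda in this design
b, cov, se = re_fit(X, y, lam)
print("b:", b)
print("se:", se)
print("t-values:", b / se)
```

Note that the quasi-demeaned constant equals $1-\lambda$ rather than zero, so the intercept remains identified as long as $\lambda<1$.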

The Hausman Test
----------------

We now turn to the test for whether the assumptions of the RE model are
plausible when tested against the assumptions of the FE model. The
crucial assumption for consistency of the RE estimator is that
$E\left[c_{i}\mathbf{x}_{it}\right]=\mathbf{0}$ for all $t$, as opposed
to the FE estimator which allows
$E\left[c_{i}\mathbf{x}_{it}\right]\neq\mathbf{0}$.

If $E\left[c_{i}\mathbf{x}_{it}\right]=\mathbf{0}$ is true, then the RE
and FE estimators are both consistent. However, the FE estimator is
inefficient as it does not exploit the structure of the error variance.
On the other hand, if
$E\left[c_{i}\mathbf{x}_{it}\right]\neq\boldsymbol{0}$ for some $t$,
then the FE estimator is consistent, but the RE estimator is not. This
can be used to form a $\chi^{2}$-statistic under the null hypothesis
that $E\left[c_{i}\mathbf{x}_{it}\right]=\mathbf{0}$. The *Hausman test
statistic* and its asymptotic distribution are given by

$$H:=(\hat{\mathbf{\beta}}_{FE}-\hat{\mathbf{\beta}}_{RE})'[\widehat{\mathrm{avar}}(\hat{\mathbf{\beta}}_{FE})-\widehat{\mathrm{avar}}(\hat{\mathbf{\beta}}_{RE})]^{-1}(\hat{\mathbf{\beta}}_{FE}-\hat{\mathbf{\beta}}_{RE})\overset{d}{\to}\chi_{M}^{2},\label{eq:HausmanTestStatistic} \tag{10} $$

where $\hat{\mathbf{\beta}}_{RE}$ is now understood as the $M\times1$ vector
of RE estimates, *excluding the estimates of parameters of
time-invariant variables* (for such variables no FE estimates are available, since the within transformation removes them). If the null hypothesis is correct, then
$(\hat{\mathbf{\beta}}_{FE}-\hat{\mathbf{\beta}}_{RE})$ should be close to the
zero vector (by consistency). Moreover, the asymptotic efficiency of RE
ensures that *the asymptotic variance of their difference is the
difference in asymptotic variances*. This observation suggests the test
statistic (10) and constitutes Hausman's major
insight. (One of many, really.) While the variance estimates in (10) could be arbitrary (subject to
consistency), it is best to use the same estimate of $\sigma_{u}^{2}$ in
constructing both matrices; otherwise their difference need not even be
positive semi-definite. Even if the difference is invertible, a lack of positive
semi-definiteness may result in a negative value for $H$, which fits
poorly with its interpretation as a measure of distance \[and is in
direct conflict with the distributional approximation in (10)\].

Note that both models rely on the assumption of strict exogeneity
conditional on $c_{i}$, and that the Hausman statistic is only valid
under the (conditional homoskedasticity) assumption
$E\left[c_{i}^{2}|\mathbf{x}_{i}\right] = \sigma_{c}^{2}$.
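
As a rough sketch, the statistic in (10) and the positive semi-definiteness concern can be wrapped in a small helper. The function name `hausman` and the numbers in the example are invented for illustration; they are not the estimates from the exercise.

```python
import numpy as np
from scipy.stats import chi2

def hausman(b_fe, cov_fe, b_re, cov_re):
    """Return (H, p-value, is_psd) for the statistic in (10).

    b_re and cov_re must already be restricted to the M coefficients
    that also appear in the FE model.
    """
    diff = np.asarray(b_fe, dtype=float).ravel() - np.asarray(b_re, dtype=float).ravel()
    V = np.asarray(cov_fe, dtype=float) - np.asarray(cov_re, dtype=float)
    # Eigenvalues of the (symmetrized) difference; negative values signal trouble.
    is_psd = bool(np.all(np.linalg.eigvalsh((V + V.T) / 2) >= -1e-12))
    H = float(diff @ np.linalg.solve(V, diff))
    p_value = float(chi2.sf(H, diff.size))          # upper tail, M degrees of freedom
    return H, p_value, is_psd

# Made-up inputs with M = 2.
b_fe = np.array([0.10, 0.05])
b_re = np.array([0.12, 0.02])
cov_fe = np.diag([4e-4, 9e-4])
cov_re = np.diag([3e-4, 5e-4])

H, p_value, is_psd = hausman(b_fe, cov_fe, b_re, cov_re)
print(f"H = {H:.2f}, p-value = {p_value:.4f}, PSD difference: {is_psd}")
```

Using `np.linalg.solve` avoids forming the inverse explicitly; if `is_psd` is `False`, the resulting $H$ should be treated with suspicion for the reasons given above.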

Exercises
=========

The exercise takes up the union membership example from last week. Use
the data and the program from last week. First, read in the data and
construct the $\mathbf{y}$ and the $\mathbf{X}$ matrices using the
provided script. Then answer the following:

### 1. Construct the transformation matrices $Q_T$ and $P_T$

In [43]:
def mean_matrix(t):
    return np.tile(1/t, (t, t))

def demeaning_matrix(t):
    return np.eye(t) - mean_matrix(t)

In [44]:
Q_T = demeaning_matrix(t)
P_T = mean_matrix(t)
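
A quick sanity check of the two matrices can be useful before moving on. The sketch below redefines the helpers so that it runs on its own (using `np.full`, which should give the same matrix as the `np.tile` call above) and verifies the projection properties for an arbitrary small $T$.

```python
import numpy as np

def mean_matrix(t):
    # (t x t) matrix with every entry equal to 1/t.
    return np.full((t, t), 1 / t)

def demeaning_matrix(t):
    return np.eye(t) - mean_matrix(t)

T = 4                                   # arbitrary small example
P, Q = mean_matrix(T), demeaning_matrix(T)
v = np.array([1.0, 2.0, 3.0, 6.0])

checks = {
    "P idempotent": np.allclose(P @ P, P),
    "Q idempotent": np.allclose(Q @ Q, Q),
    "P and Q orthogonal": np.allclose(P @ Q, 0),
    "P + Q = I": np.allclose(P + Q, np.eye(T)),
    "P v is the mean of v": np.allclose(P @ v, v.mean()),
    "Q v sums to zero": bool(np.isclose((Q @ v).sum(), 0)),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```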

### 2. Estimate the FE model from last week’s problem set.
* Within transform the data using the procedure `perm(Q_T, X)`.
* Estimate the FE model using OLS on the transformed data $\mathbf{\ddot{y}}$ and $\mathbf{\ddot{X}}$.
* Store residuals.
* Compute $\hat{\sigma}^2_u$ according to (6).
* Compute the covariance matrix estimate, $\widehat{Var}(\hat{\beta}_{FE}) = \hat{\sigma}^2_u (\mathbf{\ddot{X}}'\mathbf{\ddot{X}})^{-1}$.
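
The implementation of `perm` lives in the course module and is not shown here. As a guess at the behaviour it needs to have, the following hypothetical `perm_sketch` applies a $T\times T$ matrix to each individual's block of rows, assuming a balanced panel sorted by individual and then time. It is equivalent to premultiplying the stacked data by $\mathbf{I}_{N}\otimes\mathbf{Q}_{T}$, without building that $NT\times NT$ matrix.

```python
import numpy as np

def perm_sketch(M, A):
    """Apply the (T x T) matrix M to each individual's T rows of A.

    A is (NT x K), sorted by individual and then time (balanced panel).
    This is an illustrative stand-in, not the course module's `perm`.
    """
    T = M.shape[1]
    K = A.shape[1]
    blocks = A.reshape(-1, T, K)         # one (T x K) block per individual
    return (M @ blocks).reshape(-1, K)   # matmul broadcasts over individuals

# Tiny example: N = 2 individuals, T = 3 periods, K = 1 variable.
T = 3
Q = np.eye(T) - np.full((T, T), 1 / T)
A = np.arange(6, dtype=float).reshape(6, 1)

A_demeaned = perm_sketch(Q, A)
print(A_demeaned.ravel())

# Same result via the explicit Kronecker product.
print(np.allclose(A_demeaned, np.kron(np.eye(2), Q) @ A))
```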

In [45]:
y_demeaned = lm.perm(Q_T, y)
x_demeaned = lm.perm(Q_T, x)

x_demeaned = x_demeaned[:, 4:]
label_x_demeaned = label_x[4:]

In [60]:
results_fe = lm.estimate(
    y_demeaned, x_demeaned, transform='fe', n=n, t=t
)
lm.print_table(
    labels=(label_y, label_x_demeaned), results=results_fe, 
    title='FE',
    floatfmt=['', '.3f', '.4f', '.2f']
)

FE
Dependent variable: Log wage

                  Beta      Se    t-values
--------------  ------  ------  ----------
Experience       0.117  0.0084       13.88
Experience sqr  -0.004  0.0006       -7.11
Married          0.045  0.0183        2.47
Union            0.082  0.0193        4.26
R² = 0.178
σ² = 0.123


You should get a table that looks like this:

FE <br>
Dependent variable: Log wage

|                |   Beta |     Se |   t-values |
|----------------|--------|--------|------------|
| Experience     |  0.117 | 0.0084 |      13.88 |
| Experience sqr | -0.004 | 0.0006 |      -7.11 |
| Married        |  0.045 | 0.0183 |       2.47 |
| Union          |  0.082 | 0.0193 |       4.26 |
R² = 0.178 <br>
σ² = 0.123

### 3. Estimate the BE model.
* Between transform the data using the procedure `perm(P_T, X)`.
* Estimate the BE model using OLS on the transformed data $\bar{y}$ and $\mathbf{\bar{X}}$.
* Store residuals.
* Compute $\hat{\sigma}^2_w$ according to (7), and then $\hat{\sigma}^2_c$ according to (8).
* Compute the covariance matrix estimate, $\widehat{Var}(\hat{\beta}_{BE}) = \hat{\sigma}^2_w (\mathbf{\bar{X}}'\mathbf{\bar{X}})^{-1}$.

In [47]:
y_mean = lm.perm(P_T, y)
x_mean = lm.perm(P_T, x)

In [62]:
results_be = lm.estimate(
    y_mean, x_mean, transform='be')
lm.print_table(
    labels=(label_y, label_x), results=results_be, 
    title='BE',
    floatfmt=['', '.3f', '.4f', '.2f']
)

BE
Dependent variable: Log wage

                  Beta      Se    t-values
--------------  ------  ------  ----------
Constant         0.492  0.0776        6.34
Black           -0.139  0.0172       -8.09
Hispanic         0.005  0.0150        0.32
Education        0.095  0.0038       24.70
Experience      -0.050  0.0177       -2.85
Experience sqr   0.005  0.0011        4.54
Married          0.144  0.0145        9.93
Union            0.271  0.0164       16.55
R² = 0.219
σ² = 0.119


You should get a table that looks like this:

BE <br>
Dependent variable: Log wage

|                |   Beta |     Se |   t-values |
|----------------|--------|--------|------------|
| Constant       |  0.492 | 0.0776 |       6.34 |
| Black          | -0.139 | 0.0172 |      -8.09 |
| Hispanic       |  0.005 | 0.0150 |       0.32 |
| Education      |  0.095 | 0.0038 |      24.70 |
| Experience     | -0.050 | 0.0177 |      -2.85 |
| Experience sqr |  0.005 | 0.0011 |       4.54 |
| Married        |  0.144 | 0.0145 |       9.93 |
| Union          |  0.271 | 0.0164 |      16.55 |
R² = 0.219 <br>
σ² = 0.119

In [49]:
# Equation (8): the BE residual variance estimates sigma_c^2 + sigma_u^2 / T.
sigma_c = results_be['sigma'] - results_fe['sigma']/t

### 4. Calculate the RE estimator (5)
* Calculate the scalar estimate $\hat{\lambda}$.
* Quasi-demean the data using $\hat{\lambda}$, i.e. compute $\mathbf{\check{y}}_i = \mathbf{y}_i - \hat{\lambda}\mathbf{\bar{y}}_i$ and $\mathbf{\check{X}}_i = \mathbf{X}_i - \hat{\lambda}\mathbf{\bar{X}}_i$
* Estimate the RE model using OLS on the transformed data $\mathbf{\check{y}}_i$ and $\mathbf{\check{X}}_i$, $i = 0, ..., N-1$, $t=0, ...,T-1$
* Compute the variance-matrix estimate (9), extract standard errors of the estimates, and compute t-values.
* Note: The $\mathbf{X}$-matrix should contain a constant term as well as all time invariant variables (why?).

In [50]:
# `lambda` is a reserved keyword in Python (anonymous functions), hence the underscore.
sigma_u = results_fe['sigma']
_lambda = 1 - np.sqrt(sigma_u/(sigma_u + t*sigma_c))

In [51]:
C_t = np.eye(t) - _lambda*mean_matrix(t)
x_re = lm.perm(C_t, x)
y_re = lm.perm(C_t, y)

In [64]:
results_re = lm.estimate(
    y_re, x_re, transform='re', n=n, t=t
)
lm.print_table(
    labels=(label_y, label_x), results=results_re, _lambda=_lambda,
    title='RE',
    floatfmt=['', '.3f', '.4f', '.2f']
)

RE
Dependent variable: Log wage

                  Beta      Se    t-values
--------------  ------  ------  ----------
Constant        -0.107  0.1109       -0.97
Black           -0.144  0.0477       -3.02
Hispanic         0.020  0.0426        0.47
Education        0.101  0.0089       11.34
Experience       0.112  0.0083       13.47
Experience sqr  -0.004  0.0006       -6.82
Married          0.063  0.0169        3.73
Union            0.108  0.0179        6.00
R² = 0.178
σ² = 0.126
λ = 0.640


The table should look like this:

RE <br>
Dependent variable: Log wage

|                |   Beta |     Se |   t-values |
|----------------|--------|--------|------------|
| Constant       | -0.107 | 0.1109 |      -0.97 |
| Black          | -0.144 | 0.0477 |      -3.02 |
| Hispanic       |  0.020 | 0.0426 |       0.47 |
| Education      |  0.101 | 0.0089 |      11.34 |
| Experience     |  0.112 | 0.0083 |      13.47 |
| Experience sqr | -0.004 | 0.0006 |      -6.82 |
| Married        |  0.063 | 0.0169 |       3.73 |
| Union          |  0.108 | 0.0179 |       6.00 |
R² = 0.178 <br>
σ² = 0.126 <br>
λ = 0.640

### 5. Calculate the Hausman statistic and the p-value associated with the test of the crucial RE.1(b) assumption. Is this assumption satisfied? Which model should be preferred?

In [53]:
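# Difference in coefficients, and the inverse of the difference in covariance matrices.
# The first four RE regressors (constant and time-invariant variables) have no
# FE counterpart, so they are dropped from the RE estimates.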
hat_diff = results_fe['b_hat'] - results_re['b_hat'][4:]
cov_diff = la.inv(results_fe['cov'] - results_re['cov'][4:, 4:])

In [54]:
H = hat_diff.T@(cov_diff@hat_diff)
# chi2.sf(H, df) gives the upper-tail probability; df is the number of compared coefficients.
p_val = chi2.sf(H, hat_diff.size)

In [55]:
table = [results_fe['b_hat'], results_re['b_hat'][4:], hat_diff]

In [56]:
table = []
for i in range(len(hat_diff)):
    row = [
        results_fe['b_hat'][i], results_re['b_hat'][4:][i], hat_diff[i]
    ]
    table.append(row)

In [66]:
print(tabulate(
    table, headers=['b_fe', 'b_re', 'b_diff'], floatfmt='.4f'
    ))
print(f'The Hausman test statistic is: {H[0, 0]:.2f}, with p-value: {p_val[0, 0]:.2f}.')

   b_fe     b_re    b_diff
-------  -------  --------
 0.1168   0.1121    0.0048
-0.0043  -0.0041   -0.0002
 0.0453   0.0630   -0.0177
 0.0821   0.1077   -0.0256
The Hausman test statistic is: 195.73, with p-value: 0.00.


You should get a table that looks like this:

|    b_fe |    b_re |   b_diff |
|---------|---------|----------|
|  0.1168 |  0.1121 |   0.0048 |
| -0.0043 | -0.0041 |  -0.0002 |
|  0.0453 |  0.0630 |  -0.0177 |
|  0.0821 |  0.1077 |  -0.0256 |

The Hausman test statistic is: 195.73, with p-value: 0.00.