Alex Kappes <br>
Problem Set 3 <br>
EconS 512

**Problem 1**

The data set used for this problem consists of 72 observations on average expenditures ($Y_i$), the age of individual $i$ ($age_i$), and individual $i$'s income ($inc_i$).

For $\mathbf{X}_1 = [1, age_i]$ and $\mathbf{X}_2 = [inc_i]\ \forall\ i \in I$, the estimable model is specified as

$$Y_i = X_{1i}'\beta_1 + X_{2i}'\beta_2 + u_i.$$

**(a)** OLS estimation produces the following parameter estimates.

In [1]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

data1 = pd.read_csv('/home/akappes/WSU/512_MetricsII/PS3_1_511PS6data.csv', header=None)
data1 = data1.rename(columns={0: 'avexp',
                              1: 'age',
                              2: 'ho_bin',
                              3: 'inc',
                              4: 'incsq'})

y = data1['avexp']
X1 = sm.add_constant(data1['age'])
X2 = sm.add_constant(data1['inc'])
X = pd.concat([X1, X2.loc[:, X2.columns != 'const']], axis=1)
mod_params = round(sm.OLS(y, X).fit().params, 3)
print(mod_params)

const   -19.440
age      -0.119
inc      83.125
dtype: float64


The estimated coefficient on age in $\hat{\beta}_1$ is -0.119, as shown above.

**(b)** The two-part regression first forms the residuals $\hat{e}_{Y_i} = Y_i - X_{2i}'\hat{\beta}$ and $\hat{e}_{X_{1i}} = X_{1i} - X_{2i}'\hat{\beta}$, then estimates $\hat{e}_{Y_i} = \hat{e}_{X_{1i}}\delta+ v_i$, which produces the following $\delta$ estimate.

In [2]:
res_y_x2 = pd.DataFrame(sm.OLS(y, X2).fit().resid)
res_x1_x2 = pd.DataFrame(sm.add_constant(sm.OLS(X1['age'], X2).fit().resid)).rename(columns={0: 'delta'})
resy_resx = round(sm.OLS(res_y_x2, res_x1_x2).fit().params, 3)
print(resy_resx)

const   -0.000
delta   -0.119
dtype: float64


The result is $\hat{\beta}_1 = \hat{\delta} = -0.119$.

**(c)** The model is now specified as $Y_i = \hat{e}_{X_{1i}}\delta + \upsilon_i$. The test for parameter estimate equality is provided below.

In [3]:
def diff(a, b):
    return np.abs(a - b)

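# Select the slope by label; positional params[1] on a labelled Series is deprecated in pandas
t1 = round(sm.OLS(res_y_x2, res_x1_x2).fit().params['delta'], 3)
t2 = round(sm.OLS(y, res_x1_x2).fit().params['delta'], 3)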

if diff(t1, t2) > 0:
    print('The difference in estimators is', diff(t1,t2))
else:
    print('The estimators are equal')

The estimators are equal


The parameter estimates $\hat{\delta}$ produced above for $\hat{e}_{Y_i} = \hat{e}_{X_{1i}}\delta + v_i$ and $Y_i = \hat{e}_{X_{1i}}\delta + \upsilon_i$ are shown to be the same to three decimal places.

Expanding $\hat{e}_{Y_i} = \hat{e}_{X_{1i}}\delta + v_i$ we see that
\begin{equation}
\hat{e}_{Y_i} = \hat{e}_{X_{1i}}\delta + v_i \Longleftrightarrow (Y_i - X_{2i}'\hat{\beta}) = (X_{1i} - X_{2i}'\hat{\beta})\delta + v_i,
\tag{1}
\end{equation}

and expanding $Y_i = \hat{e}_{X_{1i}}\delta + \upsilon_i$ we see that
\begin{equation}
Y_i = \hat{e}_{X_{1i}}\delta + \upsilon_i \Longleftrightarrow Y_i = (X_{1i} - X_{2i}'\hat{\beta})\delta + \upsilon_i.
\tag{2}
\end{equation}

The OLS estimator for equation (1) is 
\begin{equation}
\hat{\delta} = \left[(X_{1i} - X_{2i}'\hat{\beta})'(X_{1i} - X_{2i}'\hat{\beta})\right]^{-1}(X_{1i} - X_{2i}'\hat{\beta})'(Y_i - X_{2i}'\hat{\beta}),
\tag{3}
\end{equation}

and the OLS estimator for equation (2) is
\begin{equation}
\hat{\delta} = \left[(X_{1i} - X_{2i}'\hat{\beta})'(X_{1i} - X_{2i}'\hat{\beta})\right]^{-1}(X_{1i} - X_{2i}'\hat{\beta})'Y_i.
\tag{4}
\end{equation}

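Concentrating on the post-multiplication in equation (3), note that the residual $X_{1i} - X_{2i}'\hat{\beta}$ is orthogonal to $X_{2i}$ by construction, so its product with the fitted values $X_{2i}'\hat{\beta}$ subtracted from $Y_i$ is zero. Expanding the parentheses therefore leaves the term

\begin{equation}
(X_{1i}' - X_{1i}'X_{2i}(X_{2i}'X_{2i})^{-1}X_{2i}')Y_i.
\tag{5}
\end{equation}

Now looking at the post-multiplication in equation (4), expanding out $\hat{\beta}$ provides

\begin{equation}
(X_{1i} - X_{2i}(X_{2i}'X_{2i})^{-1}X_{2i}'X_{1i})'Y_i.
\tag{6}
\end{equation}

Taking the transpose inside the parentheses of equation (6) gives exactly equation (5), so the two terms are algebraically identical. Because the pre-multiplied inverse is the same in both cases, equations (3) and (4) yield the same estimate; any difference in computed values is due only to floating-point rounding.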
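
The equality of equations (3) and (4) can also be checked numerically. The following is a minimal sketch on synthetic data, not the problem-set data: the variable names, coefficients, and random seed are assumptions made for illustration, and the partition matches the code in part (b), where the constant and income are partialled out of age.

```python
import numpy as np

# Synthetic data (hypothetical values, not the problem-set data set).
rng = np.random.default_rng(0)
n = 72
age = rng.uniform(20, 60, size=n)
inc = rng.uniform(1, 10, size=n)
y = 5.0 - 0.1 * age + 80.0 * inc + rng.normal(size=n)

X2 = np.column_stack([np.ones(n), inc])  # regressors to partial out
x1 = age                                 # regressor of interest

# Residual-maker matrix M2 = I - X2 (X2'X2)^{-1} X2'
M2 = np.eye(n) - X2 @ np.linalg.solve(X2.T @ X2, X2.T)

e_x1 = M2 @ x1  # residual of age on [1, inc]
e_y = M2 @ y    # residual of y on [1, inc]

# Equation (3): residual on residual. Equation (4): y on residual.
delta_resid_on_resid = (e_x1 @ e_y) / (e_x1 @ e_x1)
delta_y_on_resid = (e_x1 @ y) / (e_x1 @ e_x1)

# Coefficient on age from the full regression of y on [1, age, inc]
X = np.column_stack([np.ones(n), age, inc])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

print(delta_resid_on_resid, delta_y_on_resid, beta_full[1])
```

The three printed values should agree up to floating-point error, since $M_2$ is symmetric and idempotent and therefore $(M_2X_1)'(M_2Y) = (M_2X_1)'Y$.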

**Problem 2**

The selected data set includes observations on consumption $(c_i)$, income, and investment $(I_i)$. True income $(m_i^*)$ is not perfectly observable; the observed value is assumed to be $m_i = m_i^* + u_i$ for $u_i \sim N(0, \sigma_u^2)$. Investment is observable. The specified consumption model takes the simple form $c_i = \beta_0 + \beta_1m_i^* + e_i = \beta_0 + \beta_1m_i + (e_i - \beta_1u_i) = \beta_0 + \beta_1m_i + \phi_i$. It is further assumed that $Cov(I_i, m_i) \neq 0$ and $Cov(I_i, \phi_i) = 0$.

**(a)** The OLS moment condition in the GMM framework assumes $E[m_i\phi_i] = 0$, but the specification above results in $E[m_i\phi_i] \neq 0$ because $m_i$ and $\phi_i$ both contain $u_i$. Mismeasured income is therefore instrumented with investment, given that the orthogonality condition $E[I_i\phi_i] = E[\mathbf{Z}'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})] = 0$ is satisfied for $\mathbf{Z} = [\mathbf{1}, \mathbf{I}]$, $\mathbf{y} = [\mathbf{c}]$, $\mathbf{X} = [\mathbf{1}, \mathbf{m}]$, and $\boldsymbol{\beta} = [\beta_0, \beta_1]'$. The orthogonality condition produces the instrumental variable estimator $\hat{\boldsymbol{\beta}}_{IV} = (\mathbf{Z}'\mathbf{X})^{-1}\mathbf{Z}'\mathbf{y}$.
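
As a sketch of why the moment condition on $m_i$ fails, assume additionally (this is not stated in the problem) that $e_i$ and $u_i$ have mean zero and that $m_i^*$, $e_i$, and $u_i$ are mutually uncorrelated. Then

\begin{equation*}
E[m_i\phi_i] = Cov(m_i^* + u_i,\ e_i - \beta_1u_i) = -\beta_1Var(u_i) = -\beta_1\sigma_u^2,
\end{equation*}

which is nonzero whenever $\beta_1 \neq 0$ and $\sigma_u^2 > 0$.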

In [4]:
# Just identified
data2 = pd.read_csv('/home/akappes/WSU/512_MetricsII/PS3_2_525PS10IVdata.csv').rename(columns= {'y': 'm'})

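# Column vector of consumption and regressor matrix X = [1, m]
y = np.array(data2['c'])[np.newaxis].T
X = np.array([np.repeat(1, len(data2.index)), data2['m']]).T
# Z is stored already transposed (2 x n), so np.dot(Z, X) computes Z'X
Z = np.array([np.repeat(1, len(data2.index)), data2['i']])

# IV estimator: (Z'X)^{-1} Z'y
b_iv = np.linalg.multi_dot([np.linalg.inv(np.dot(Z, X)), Z, y])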
print('The IV estimates are', np.round(b_iv.T, 3))

The IV estimates are [[2.8   0.774]]


**(b)** The two-stage least squares objective is $\underset{\boldsymbol{\beta}}{\text{min}}(\mathbf{y}-\hat{\mathbf{X}}\boldsymbol{\beta})'(\mathbf{y}-\hat{\mathbf{X}}\boldsymbol{\beta})$, where $\hat{\mathbf{X}}=\mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{X}$.

In [5]:
# 2SLS
y = data2['c']
x_fit = pd.DataFrame(sm.OLS(data2['m'], sm.add_constant(data2['i'])).fit().predict())

stage2_params = np.array([sm.OLS(y, sm.add_constant(x_fit)).fit().params])
print('The 2SLS estimates are', np.round(stage2_params, 3))

The 2SLS estimates are [[2.8   0.774]]
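
The agreement between parts (a) and (b) is not specific to this data set. The following is a minimal sketch on synthetic data (all names, coefficients, and the seed are assumptions for illustration) showing that the matrix form of 2SLS, using $\hat{\mathbf{X}}=\mathbf{Z}(\mathbf{Z}'\mathbf{Z})^{-1}\mathbf{Z}'\mathbf{X}$, coincides with $(\mathbf{Z}'\mathbf{X})^{-1}\mathbf{Z}'\mathbf{y}$ when there is one instrument per endogenous regressor.

```python
import numpy as np

# Synthetic measurement-error setup (hypothetical values).
rng = np.random.default_rng(1)
n = 500
invest = rng.normal(size=n)                       # instrument
m_true = 2.0 + 1.5 * invest + rng.normal(size=n)  # true income
m_obs = m_true + rng.normal(scale=0.5, size=n)    # income observed with error
c = 3.0 + 0.8 * m_true + rng.normal(scale=0.3, size=n)

Z = np.column_stack([np.ones(n), invest])  # instruments [1, I]
X = np.column_stack([np.ones(n), m_obs])   # regressors [1, m]

# Just-identified IV estimator: (Z'X)^{-1} Z'y
b_iv = np.linalg.solve(Z.T @ X, Z.T @ c)

# 2SLS: project X onto the column space of Z, then regress c on the fitted X
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b_2sls = np.linalg.solve(X_hat.T @ X_hat, X_hat.T @ c)

print(b_iv, b_2sls)
```

The two coefficient vectors should match up to floating-point error because $\mathbf{Z}'\mathbf{X}$ is square and invertible in the just-identified case, so the 2SLS formula collapses to the IV formula.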


**(c)** Assuming that the true income value is known and switching to a simultaneous equation environment, the system becomes

\begin{equation*}
c_i = \alpha_0 + \alpha_1m_i + e_i \\[7pt]
m_i = \rho_0 + \rho_1c_i + \delta I_i + u_i,
\end{equation*}

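where the ratio of the reduced-form coefficients on investment in the consumption and income equations, $\delta_1^*$ and $\delta_2^*$, recovers the structural income coefficient: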

\begin{equation*}
\frac{\delta_1^*}{\delta_2^*} = \left(\frac{\alpha_1\delta}{1-\alpha_1\rho_1}\right)\left(\frac{1-\alpha_1\rho_1}{\delta}\right) = \alpha_1.
\end{equation*}

In [6]:
# Simultaneous eqs
y_params = sm.OLS(y, sm.add_constant(data2['m'])).fit().params
m = data2['m']
m_params = sm.OLS(m, sm.add_constant(data2[['c', 'i']])).fit().params

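# Use .iloc for positional access; params is a Series labelled by regressor name
delta1 = (y_params.iloc[1] * m_params.iloc[2])/(1 - m_params.iloc[1] * y_params.iloc[1])
delta2 = m_params.iloc[2]/(1 - m_params.iloc[1] * y_params.iloc[1])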
alpha = delta1/delta2
print('Endogenous income variable reduced form estimate is', round(alpha, 3))

Endogenous income variable reduced form estimate is 0.775
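
The ratio $\delta_1^*/\delta_2^*$ can also be formed by estimating each reduced-form equation directly, regressing consumption and income separately on investment. The following is a minimal sketch on synthetic data generated from the two-equation system; the structural parameter values, variable names, and seed are assumptions for illustration and are unrelated to the problem-set data.

```python
import numpy as np

# Hypothetical structural parameters for the simultaneous system.
rng = np.random.default_rng(2)
n = 5000
alpha0, alpha1 = 3.0, 0.8
rho0, rho1, delta = 1.0, 0.2, 1.0

invest = rng.normal(size=n)
e = rng.normal(scale=0.5, size=n)
u = rng.normal(scale=0.5, size=n)

# Solve the structural equations for m (reduced form), then recover c
m = (rho0 + rho1 * alpha0 + delta * invest + rho1 * e + u) / (1 - alpha1 * rho1)
c = alpha0 + alpha1 * m + e

# Reduced-form regressions of c and m on [1, I]
Z = np.column_stack([np.ones(n), invest])
pi_c = np.linalg.lstsq(Z, c, rcond=None)[0]
pi_m = np.linalg.lstsq(Z, m, rcond=None)[0]

# Indirect least squares: ratio of the reduced-form slopes on investment
alpha1_ils = pi_c[1] / pi_m[1]

# Covariance-ratio IV estimate for comparison
cov_ratio = np.cov(invest, c)[0, 1] / np.cov(invest, m)[0, 1]

print(alpha1_ils, cov_ratio)
```

Both reduced-form slopes share the same denominator, the sample variance of investment, so their ratio equals the covariance ratio $Cov(I, c)/Cov(I, m)$ and, in large samples, should be close to the assumed $\alpha_1$.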


**(d)** The instrumental variable covariance ratio estimate is specified as

\begin{equation*}
\beta_{IV} = \frac{Cov(\mathbf{Z}, \mathbf{Y})}{Cov(\mathbf{Z}, \mathbf{X})}
\end{equation*}

In [7]:
# Element-wise ratio of the two 2x2 covariance matrices; entry [0, 1] is Cov(i, c) / Cov(i, m)
cov_ratio = np.divide(np.cov(data2['i'], data2['c']), np.cov(data2['i'], data2['m']))
print('The covariance ratio IV estimate is', round(cov_ratio[0, 1], 3))

The covariance ratio IV estimate is 0.774


The results from parts **(a)**-**(d)** show that the different instrumental variable estimation strategies produce essentially the same estimate of the income coefficient (0.774, with 0.775 in part **(c)**) in the just-identified setting.