Alex Kappes <br>
Problem Set 5 <br>
EconS 512

**Problem 1**. The data set for this problem records pear consumption over time $t$ for three individuals $i$. Observations include pear consumption ($Q_{it}$), pear and apple prices ($pear\_p_t$, $apple\_p_t$), income ($income_{it}$), the consumer price index ($cpi_t$), and the month ($month_t$). Time $t$ is measured monthly over the year 2001. For the preliminary estimates, seasonal effects are captured by the binary indicator

\begin{equation*}
bin\_season = \left\{
\begin{array}{l l}
1 & \text{if month} \in \{9, 10 , 11, 12, 1\} \\
0 & \text{otherwise}
\end{array}
\right..
\end{equation*}

For all $i \in I$ and $t \in T$, let $\mathbf{y} = [\mathbf{Q}]$ and $\mathbf{X} = [\mathbf{1}, \mathbf{pear\_p}, \mathbf{apple\_p}, \mathbf{income}, \mathbf{bin\_season}]$, which gives the specification $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\xi}$ for estimable parameters $\boldsymbol{\beta}$. In the code below, prices and income enter in real terms, i.e. divided by $cpi_t$.

**(a)** Slope estimates from $\hat{\boldsymbol{\beta}}$ (the intercept is not printed) are presented below.

In [68]:
import numpy as np
import pandas as pd

df = pd.read_csv('/home/akappes/WSU/512_MetricsII/fruit_panel.csv')

# Data mgmt: seasonal indicator equal to 1 for September through January
bin_season = pd.DataFrame({'bin_season': df['month'].isin([9, 10, 11, 12, 1]).astype(int)},
                          index=df.index)

# Deflate prices and income by the CPI to obtain real values
reals = pd.DataFrame({'pear_rp': df['pear_p'] / df['cpi'],
                      'apple_rp': df['apple_p'] / df['cpi'],
                      'r_income': df['income'] / df['cpi']})

df = pd.concat([df['Q'], reals, bin_season], axis=1)

# Parameter estimation and White's HC estimation
n = len(df.index)
y = np.array(df['Q']).reshape(n, 1)
# Regressors: intercept, pear_rp, apple_rp, r_income, bin_season
X = np.column_stack((np.ones(n), np.array(df.loc[:, df.columns != 'Q'])))
k = X.shape[1]

XtX_inv = np.linalg.inv(np.dot(X.T, X))
b = np.linalg.multi_dot([XtX_inv, X.T, y])

e_hat = y - np.dot(X, b)
e_sq = np.power(e_hat, 2)

# White's "meat" matrix: sum over ALL observations of e_i^2 * x_i x_i' (a k x k outer product)
w = np.zeros((k, k))
for i in range(n):
    w = w + e_sq[i, 0] * np.outer(X[i, :], X[i, :])

# Sandwich (X'X)^-1 W (X'X)^-1 and the implied robust standard errors
w_hce = np.linalg.multi_dot([XtX_inv, w, XtX_inv])
w_se = np.sqrt(np.diag(w_hce))

print('Parameters are estimated as', np.round(b[1:5].T, 3))

Parameters are estimated as [[-33.644  36.786   0.218  -8.404]]
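
White's heteroskedasticity-consistent (HC0) covariance computed in the cell above is the sandwich $(X'X)^{-1}\left(\sum_i \hat e_i^2 x_i x_i'\right)(X'X)^{-1}$. The minimal sketch below runs on simulated data (the data-generating process and the helper name `hc0_se` are illustrative assumptions, not part of the assignment) and checks that the observation-by-observation loop and the vectorized form $X'\operatorname{diag}(\hat e^2)X$ give the same middle matrix:

```python
import numpy as np


def hc0_se(X, y):
    """Return OLS coefficients, HC0 standard errors, the meat matrix, and residuals (y is 1-D)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    # Meat matrix sum_i e_i^2 x_i x_i', computed without an explicit loop
    meat = (X * (e ** 2)[:, None]).T @ X
    cov = XtX_inv @ meat @ XtX_inv  # sandwich (X'X)^-1 meat (X'X)^-1
    return b, np.sqrt(np.diag(cov)), meat, e


# Hypothetical data with heteroskedastic errors (not the fruit panel)
rng = np.random.default_rng(0)
n = 200
X = np.column_stack((np.ones(n), rng.normal(size=(n, 2))))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n) * (1 + np.abs(X[:, 1]))

b, se, meat, e = hc0_se(X, y)

# The same meat matrix accumulated observation by observation with outer products
meat_loop = np.zeros((X.shape[1], X.shape[1]))
for i in range(n):
    meat_loop += e[i] ** 2 * np.outer(X[i], X[i])

print(np.allclose(meat, meat_loop))  # True
print(np.round(se, 3))
```

For 1-D rows, `np.dot(x, x)` returns the scalar inner product, so the outer product $x_i x_i'$ needs `np.outer`; the loop must also start at index 0 to include every observation.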


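**(b)** Parameter estimates for $(y_{it} - \bar{y}) = (X_{it} - \bar{X})\boldsymbol{\beta} + \xi_{it}$, where $\bar{y}$ and $\bar{X}$ are pooled sample means, are presented below. In the code, only the three continuous regressors are demeaned; the intercept column and $bin\_season$ are left unchanged.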

In [69]:
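y_diff = y - y.mean()
# Keep the intercept (column 0) and bin_season (column 4) as is;
# subtract pooled means from real pear price, real apple price, and real income (columns 1-3)
X_diff = X.copy()
X_diff[:, 1:4] = X[:, 1:4] - X[:, 1:4].mean(axis=0)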

b_diff = np.linalg.multi_dot([np.linalg.inv(np.dot(X_diff.T, X_diff)), X_diff.T, y_diff])

print('Parameter estimates for the above specification are', np.round(b_diff[1:5].T, 3))

Parameter estimates for the above specification are [[-33.644  36.786   0.218  -8.404]]


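Estimation results show no change in the slope estimates $\hat{\boldsymbol\beta}_{-0}$ (all coefficients except the intercept). This is expected: because the model includes an intercept, subtracting pooled sample means from $\mathbf{y}$ and from columns of $\mathbf{X}$ changes only the intercept estimate.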
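
The invariance can be checked directly: demeaning a regressor subtracts a multiple of the intercept column, which is a linear reparametrization that only moves the intercept. A small sketch on simulated data (hypothetical values, not the fruit panel) that mirrors part (b) by demeaning three continuous columns and leaving a binary column untouched:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 120
# Hypothetical regressors: intercept, three continuous columns, one binary column
X = np.column_stack((np.ones(n), rng.normal(size=(n, 3)), rng.integers(0, 2, size=n)))
y = X @ np.array([2.0, -1.0, 0.5, 0.3, 1.5]) + rng.normal(size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Demean y and the continuous regressors (columns 1-3) by their pooled means
X_c = X.copy()
X_c[:, 1:4] -= X[:, 1:4].mean(axis=0)
b_c, *_ = np.linalg.lstsq(X_c, y - y.mean(), rcond=None)

print(np.allclose(b[1:], b_c[1:]))  # True: slopes unchanged
print(b[0], b_c[0])  # only the intercept moves
```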

**(c)** Parameter estimates for the specification $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\gamma}\boldsymbol{\alpha} + \boldsymbol{\xi}$, where $\boldsymbol{\gamma}$ is an $n\times 2$ matrix of indicator variables for individuals $i = 1, 2$ (individual 3 is the omitted category) and $\boldsymbol{\alpha}$ holds the individual effects, are presented below.

In [70]:
# Individual identifiers; assumes the rows cycle through individuals 1, 2, 3 in order
individuals = pd.DataFrame({'inds': np.resize([1, 2, 3], n)}, index=df.index)

# Indicators for individuals 1 and 2 (individual 3 is the omitted category)
bin_i_1 = (individuals['inds'] == 1).astype(float).to_frame('bin_i_1')
bin_i_2 = (individuals['inds'] == 2).astype(float).to_frame('bin_i_2')

X_bin_i = np.concatenate((X, bin_i_1, bin_i_2), axis=1)
b_bin_i = np.linalg.multi_dot([np.linalg.inv(np.dot(X_bin_i.T, X_bin_i)), X_bin_i.T, y])

print('Parameter estimates for the above specification are', np.round(b_bin_i[1:5].T, 3))

Parameter estimates for the above specification are [[-33.484  36.876   0.219  -8.406]]


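The estimates from parts **(a)** and **(b)** are identical, and the individual-indicator (fixed effects) estimates in **(c)** differ only slightly (for example, $-33.484$ versus $-33.644$ on the real pear price). Adding individual effects therefore changes the slope estimates very little in this sample.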
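
A common alternative to the indicator regression in (c) is the within transformation, which subtracts each individual's own means rather than the pooled means used in (b). By the Frisch-Waugh-Lovell theorem, it reproduces the slope estimates of the indicator regression exactly. A sketch on simulated data, assuming three individuals with time-invariant effects (all names and values are hypothetical):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
T, ids = 12, np.array([1, 2, 3])
panel = pd.DataFrame({'ind': np.tile(ids, T),
                      'x1': rng.normal(size=3 * T),
                      'x2': rng.normal(size=3 * T)})
alpha = panel['ind'].map({1: 5.0, 2: -1.0, 3: 2.0})  # hypothetical individual effects
panel['y'] = alpha + 1.5 * panel['x1'] - 0.7 * panel['x2'] + rng.normal(size=3 * T)

# LSDV: intercept, regressors, and indicators for individuals 1 and 2
D = pd.get_dummies(panel['ind'], dtype=float)[[1, 2]].to_numpy()
X_lsdv = np.column_stack((np.ones(3 * T), panel[['x1', 'x2']].to_numpy(), D))
b_lsdv, *_ = np.linalg.lstsq(X_lsdv, panel['y'].to_numpy(), rcond=None)

# Within: subtract individual-specific means, then regress without an intercept
cols = ['y', 'x1', 'x2']
demeaned = panel[cols] - panel.groupby('ind')[cols].transform('mean')
b_within, *_ = np.linalg.lstsq(demeaned[['x1', 'x2']].to_numpy(),
                               demeaned['y'].to_numpy(), rcond=None)

print(np.allclose(b_lsdv[1:3], b_within))  # True
```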

**Problem 2**. See problem set for variable information and different model specifications.

**(a)**

In [71]:
df = pd.read_csv('/home/akappes/WSU/512_MetricsII/jtrain1.csv')
df = df.dropna(subset=['hrsemp'])  # keep only rows with hrsemp observed
df_sub = df.loc[df['year'].isin([1987, 1988])].reset_index()

y_1988 = pd.DataFrame({'y_1988': np.where(df_sub['year'] == 1988, 1, 0)})

**(a).i**

In [72]:
cont_bef_mean = df_sub[(df_sub['year'] == 1987) & (df_sub['grant'] == 0)]['hrsemp'].mean()
cont_af_mean = df_sub[(df_sub['year'] == 1988) & (df_sub['grant'] == 0)]['hrsemp'].mean()
treat_bef_mean = df_sub[(df_sub['year'] == 1987) & (df_sub['grant'] == 1)]['hrsemp'].mean()
treat_af_mean = df_sub[(df_sub['year'] == 1988) & (df_sub['grant'] == 1)]['hrsemp'].mean()

print('Control, before mean:', cont_bef_mean)
print('Control, after mean:', cont_af_mean)
print('Treatment, before mean:', treat_bef_mean)
print('Treatment, after mean:', treat_af_mean)

Control, before mean: 8.886857601792313
Control, after mean: 9.671083345252557
Treatment, before mean: nan
Treatment, after mean: 35.97834186015592


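There are no observations for $(hrsemp\ \rvert\ grant = 1, year = 1987)$ because no grants were awarded in 1987, so the treatment group's before mean printed above is nan. The pre-treatment mean for the treatment group is instead taken from the 1987 $hrsemp$ values of the firms that received grants in 1988, matched on $fcode$, and the difference-in-differences estimate $\hat{\alpha}$ follows. Because every firm has $grant = 0$ in 1987, the control group's before mean averages over all firms observed in 1987, including those treated in 1988.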

In [73]:
# Rows for firms that received a grant in 1988
con_list = df_sub[(df_sub['year'] == 1988) & (df_sub['grant'] == 1)].index
# Attach each treated firm's 1987 hrsemp by matching on fcode
join87 = df_sub.loc[df_sub['year'] == 1987, ['fcode', 'hrsemp']]
join88 = pd.DataFrame(df_sub.loc[con_list, 'fcode'])
treat_bef_dat = pd.merge(join88, join87, how='left', on='fcode')
treat_bef_mean = treat_bef_dat['hrsemp'].mean()

# alpha = (treated after - treated before) - (control after - control before)
alpha = (treat_af_mean - treat_bef_mean) - (cont_af_mean - cont_bef_mean)

print('The diff-in-diff estimator, alpha is:', round(alpha, 3))

The diff-in-diff estimator, alpha is: 27.603
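
Another way to organize the same $2\times 2$ calculation defines the groups at the firm level (a firm is treated if it ever receives a grant), so the control group's 1987 mean excludes firms treated in 1988, unlike the control mean in (a).i. A sketch on a small invented panel (the column names mirror `jtrain1`, but the values are hypothetical):

```python
import pandas as pd

# Hypothetical two-year panel: firms 1-2 receive a grant in 1988, firms 3-4 never do
panel = pd.DataFrame({
    'fcode': [1, 1, 2, 2, 3, 3, 4, 4],
    'year': [1987, 1988] * 4,
    'grant': [0, 1, 0, 1, 0, 0, 0, 0],
    'hrsemp': [5.0, 30.0, 7.0, 40.0, 10.0, 12.0, 8.0, 9.0],
})

# Firm-level treatment label: 1 if the firm receives a grant in any year
panel['treated'] = panel.groupby('fcode')['grant'].transform('max')

# Group-by-year means, then the difference in differences
means = panel.groupby(['treated', 'year'])['hrsemp'].mean()
alpha = ((means[(1, 1988)] - means[(1, 1987)])
         - (means[(0, 1988)] - means[(0, 1987)]))
print(alpha)  # 27.5
```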


**(a).ii**

In [74]:
n = len(df_sub)
y = np.array(df_sub['hrsemp']).reshape(n, 1)

# E_i = 1 in the 1987 rows of firms that receive a grant in 1988, 0 otherwise
treated_firms = df_sub.loc[df_sub['grant'] == 1, 'fcode']
E_dat = pd.DataFrame({'E_i': ((df_sub['year'] == 1987)
                              & df_sub['fcode'].isin(treated_firms)).astype(int)},
                     index=df_sub.index)

# Regressors: intercept, grant, 1988 year indicator, E_i
X = np.column_stack((np.ones(n), df_sub['grant'], y_1988['y_1988'], E_dat['E_i']))

b = np.linalg.multi_dot([np.linalg.inv(np.dot(X.T, X)), X.T, y])

print('Parameter estimates for the specified equation are', np.round(b.T, 3))

Parameter estimates for the specified equation are [[ 9.297 26.307  0.374 -1.706]]


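The coefficient on $grant$ is shown below. Because $E_i$ equals 1 only in the 1987 rows of firms that later receive a grant, the model fits each of the four group-year means exactly, so this coefficient equals the 1988 gap between treated and control means ($35.978 - 9.671 \approx 26.307$). Under this coding, the difference-in-differences estimate corresponds to $\hat{\beta}_{grant} - \hat{\beta}_{E} \approx 28.01$.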

In [75]:
print(b[1])

[26.30725851]
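
In the textbook regression form of difference-in-differences, the treatment-group indicator equals 1 in both years for treated firms, and the treatment effect is the coefficient on its interaction with the post-period indicator (which is the $grant$ variable here). With two groups and two years the regression is saturated, so that coefficient reproduces the cell-mean calculation. A sketch on a small invented panel (hypothetical values, not `jtrain1`):

```python
import numpy as np
import pandas as pd

# Hypothetical panel: firms 1-2 are treated in 1988, firms 3-4 are controls
panel = pd.DataFrame({
    'fcode': [1, 1, 2, 2, 3, 3, 4, 4],
    'year': [1987, 1988] * 4,
    'hrsemp': [5.0, 30.0, 7.0, 40.0, 10.0, 12.0, 8.0, 9.0],
})
panel['d88'] = (panel['year'] == 1988).astype(float)         # post-period indicator
panel['treat'] = panel['fcode'].isin([1, 2]).astype(float)   # group indicator, both years
panel['grant'] = panel['d88'] * panel['treat']               # interaction = grant received

X = np.column_stack((np.ones(len(panel)), panel[['d88', 'treat', 'grant']].to_numpy()))
b, *_ = np.linalg.lstsq(X, panel['hrsemp'].to_numpy(), rcond=None)

# Order: intercept, d88, treat, grant; the grant coefficient is the DiD estimate
print(np.round(b, 3))
```

Here the grant coefficient equals (35 - 6) - (10.5 - 9) = 27.5, the same value the group-mean formula gives for these data.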


**(a).iii**

In [76]:
import statsmodels.formula.api as smf

# Firm fixed effects enter through the categorical fcode (one dummy per firm, one omitted)
X_iii = pd.concat([df_sub[['fcode', 'grant']], y_1988], axis=1)
X_iii['fcode'] = pd.Categorical(X_iii['fcode'])
y = pd.DataFrame(y)

mod_dat = pd.concat([y, X_iii], axis=1)
mod_dat = mod_dat.rename(columns={0: 'y'})

mod_form = 'y ~ fcode + grant + y_1988'
mod_params = smf.ols(mod_form, data=mod_dat).fit().params

print('Estimated parameters are:' '\n', np.array(round(mod_params, 3)))

Estimated parameters are:
 [  7.272   4.473  21.223  -7.527  -7.527  -7.527  -6.642  23.907  11.885
  -7.527  -7.227  15.384  -7.527  -5.977  -7.527  -6.902  -7.165  -7.527
  -7.527  -7.527  -6.304  -7.527  -5.256  -7.527  -3.161  -7.527  -7.527
  17.473  -6.052  31.299  -6.769  -7.527   3.473  24.723  -7.527  -4.848
  -7.527   4.473  -5.622  -7.521  -6.508  38.347  -5.115  20.448  -6.098
 -11.992 -13.466 -17.166 -14.466  88.937 -18.389  -2.466 -12.266 -20.207
  -1.527  -9.466   6.034 -20.543 -11.988  -5.272   2.096  -7.272  22.728
 -14.456 -19.909  -1.466  25.034  88.951  -7.403 -12.466  -1.313   5.034
  18.534   2.473  -1.466  32.473  -8.466  -6.072  -1.966 -16.466  -7.527
 -18.466  -1.181  -7.527  -7.527  32.417  -3.291  -0.894  -7.527  -1.384
  19.696  -7.527  15.882  -7.527  -7.244  -5.027  -7.527  -7.39    9.973
  37.473  -7.527  -3.753  -7.527  -3.677  -7.527  54.973  -6.815  19.178
  -7.527  -7.527  -5.384  -7.527  -6.443  63.307   6.346  10.807  20.678
  -7.527   5.236  -7.527

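Setting aside the firm fixed-effect coefficients, the estimates are close to the earlier results. The last two entries, printed below, are the coefficients on $grant$ (27.878, versus 27.603 in (a).i and 26.307 in (a).ii) and $y\_1988$ (0.509, versus 0.374 in (a).ii).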

In [77]:
print('Estimated parameters are:' '\n', np.array(round(mod_params, 3))[-2:])

Estimated parameters are:
 [27.878  0.509]
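
With exactly two periods and a balanced panel, the firm fixed-effects regression gives the same $grant$ coefficient as regressing each firm's 1988-minus-1987 change in $hrsemp$ on its change in $grant$ with an intercept (the intercept matches the year effect). Because dropping missing $hrsemp$ values may leave the `jtrain1` subsample unbalanced, the equivalence need not hold exactly there. A sketch on simulated balanced data (hypothetical values):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
N = 50
firm_effect = rng.normal(scale=5, size=N)
treated = np.arange(N) < 20  # hypothetical: firms 0-19 receive a grant in 1988

# Rows ordered (firm, 1987), (firm, 1988) for each firm
panel = pd.DataFrame({'fcode': np.repeat(np.arange(N), 2),
                      'd88': np.tile([0.0, 1.0], N)})
panel['grant'] = panel['d88'] * np.repeat(treated, 2)
panel['hrsemp'] = (np.repeat(firm_effect, 2) + 0.5 * panel['d88']
                   + 20.0 * panel['grant'] + rng.normal(size=2 * N))

# Fixed effects via firm dummies (one dummy per firm, no separate intercept)
D = pd.get_dummies(panel['fcode'], dtype=float).to_numpy()
X_fe = np.column_stack((D, panel[['grant', 'd88']].to_numpy()))
b_fe, *_ = np.linalg.lstsq(X_fe, panel['hrsemp'].to_numpy(), rcond=None)

# First differences: 1988 value minus 1987 value for each firm
y = panel['hrsemp'].to_numpy()
g = panel['grant'].to_numpy()
dy = y[1::2] - y[0::2]
dg = g[1::2] - g[0::2]
b_fd, *_ = np.linalg.lstsq(np.column_stack((np.ones(N), dg)), dy, rcond=None)

print(np.isclose(b_fe[-2], b_fd[1]), np.isclose(b_fe[-1], b_fd[0]))  # True True
```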
