Alex Kappes <br>
Problem Set 8 <br>
EconS 512

**Problem 1** 

The generated data for this problem represent the quantity demanded of good $g$ by the $i$th individual, where $i=1,...,100$. Quantity demanded $\mathbf{y}$, own-price $\mathbf{x}_1$, cross-price $\mathbf{x}_2$ for good $j$, and income $\mathbf{I}$ are observed. However, demand is assumed to also depend on an endogenous, unobserved preference-shifting variable $\gamma_i$ that reflects social influence. A strong and valid instrument $\phi_i$ is assumed to exist and is characterized as the popularity of good $g$ within individual $i$'s network. The numerical values of this instrument can be thought of as the positive, neutral, or negative mention rate for good $g$ within the network.

In [33]:
import numpy as np
import pandas as pd
from scipy import stats


### data generation ###

# real parameter values
b0 = 3      # intercept
b_unobs = 1.5   # unobserved preference effect
b1 = -1.25  # own price effect
b2 = 1.75   # cross price effect
b3 = 2      # income effect

n = 100

# think of the endogenous treatment as some preference-shifting variable
x_unobs = np.concatenate((np.zeros(int(n/2)), stats.norm.rvs(size=int(n/2), loc=1, scale=.1)))

# think of the IV as some mention effect - i.e. positive, neutral, negative network mention
x_t_iv = np.concatenate((stats.norm.rvs(size=int(n/2), loc=0, scale=0.2),
                        stats.norm.rvs(size=int(n/2), loc=1, scale=0.2)))

# binary treatment specification
x_t = np.concatenate((np.zeros(int(n/2)), np.ones(int(n/2))))

x1 = stats.norm.rvs(size=n, loc=10, scale =0.5) # own price
x2 = stats.norm.rvs(size=n, loc=12, scale=0.5)  # cross price
x3 = stats.norm.rvs(size=n, loc=500, scale=10)  # income

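# NOTE: stats.norm.rvs(n) passes n as `loc` and returns a single draw near n,
# so the error term here is one constant shared by every observation;
# stats.norm.rvs(size=n) would draw n independent errors instead.
y = b0 + b1*x1 + b2*x2 + b3*x3 + (b_unobs * x_unobs + stats.norm.rvs(n)) # quantity demanded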
y = pd.DataFrame({'y': y})

X_iv = pd.DataFrame({'ones': np.ones(n),
                     'iv': x_t_iv,
                     'x1': x1,
                     'x2': x2,
                     'x3': x3})

X_endog = pd.DataFrame({'ones': np.ones(n),
                        'endog': x_unobs,
                        'x1': x1,
                        'x2': x2,
                        'x3': x3})

def est_b(y, X):
    X = np.matrix(X)
    y = np.matrix(y).reshape(n, 1)
    return np.linalg.inv(X.T * X) * X.T * y

def ehat(y, X):
    X = np.matrix(X)
    y = np.matrix(y).reshape(n, 1)
    e = y - X * est_b(y, X)
    return pd.DataFrame(e).rename(columns={0: 'e'})

def pred(y, X):
    X = np.matrix(X)
    y = np.matrix(y).reshape(n, 1)
    predict = X * est_b(y, X)
    return pd.DataFrame(predict).rename(columns={0:'predict'})

# bias estimation
iv_tilde = ehat(X_iv['iv'], X_iv[['ones', 'x1', 'x2', 'x3']]).rename(columns={'e': 'val'})

lam = (np.cov(iv_tilde['val'], y['y']) / np.var(iv_tilde['val']))[0, 1]

iv_hat = pred(X_iv['iv'], X_iv[['ones', 'x1', 'x2', 'x3']]).rename(columns={'predict': 'val'})
y_hat = pred(y, X_iv[['ones', 'x1', 'x2', 'x3']]).rename(columns={'predict': 'val'})

# residual variance from the model that includes the endogenous variable
resid_var = np.var(ehat(y, X_endog)['e'])

bias = ((np.var(X_iv['iv']) / (lam * np.var(iv_tilde['val']))) * resid_var *
        np.cov(iv_hat['val'], y_hat['val']) / np.var(y_hat['val']))[0, 1]

**(a)**

Bias is specified as $\text{plim}(\hat{\delta} - \delta) = \frac{Cov(\overset{\sim}{\boldsymbol{\phi}}, \boldsymbol{\varepsilon})}{\lambda var(\overset{\sim}{\boldsymbol{\phi}})}$, where
\begin{align*}
\overset{\sim}{\boldsymbol{\phi}} &= \boldsymbol{\phi} - \mathbf{X}\hat{\boldsymbol{\beta}} \\[7pt]
\boldsymbol{\varepsilon} &= \mathbf{y} - \mathbf{X}_{endog}\hat{\boldsymbol{\beta}} \\[7pt]
\lambda &= \frac{Cov(\boldsymbol{\phi}, \mathbf{y})}{var(\boldsymbol{\phi})}.
\end{align*}

Here $\mathbf{X}$ contains a constant and the exogenous regressors $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3$ (the instrument itself is excluded, as in the code above), $\mathbf{X}_{endog}$ is the model matrix that also includes the endogenous variable, and $\hat{\boldsymbol{\beta}}$ denotes the OLS coefficients of the corresponding regression.

Extending the bias estimation from $\frac{Cov(\overset{\sim}{\boldsymbol{\phi}}, \boldsymbol{\varepsilon})}{\lambda var(\overset{\sim}{\boldsymbol{\phi}})}$, we can alternatively specify
\begin{equation*}
\frac{Cov(\overset{\sim}{\boldsymbol{\phi}}, \boldsymbol{\varepsilon})}{var(\overset{\sim}{\boldsymbol{\phi}})} = \frac{Cov(\hat{\boldsymbol{\phi}}, \hat{\mathbf{y}}_{-\gamma})}{var(\hat{\mathbf{y}}_{-\gamma})},
\end{equation*}

where $\hat{\boldsymbol{\phi}} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\phi}$ and $\hat{\mathbf{y}}_{-\gamma} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$, with $\mathbf{X} = [\mathbf{1}, \mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3]$ (the code includes the column of ones). Extending the notation further, the bias can be represented as

\begin{equation*}
\text{plim}(\hat{\delta} - \delta) = \frac{Cov(\overset{\sim}{\boldsymbol{\phi}}, \boldsymbol{\varepsilon})}{\lambda var(\overset{\sim}{\boldsymbol{\phi}})} = \frac{var(\boldsymbol{\phi})}{\lambda var(\overset{\sim}{\boldsymbol{\phi}})}var(\boldsymbol{\varepsilon})\left[\frac{Cov(\hat{\boldsymbol{\phi}}, \hat{\mathbf{y}}_{-\gamma})}{var(\hat{\mathbf{y}}_{-\gamma})}\right].
\end{equation*}

Moving from a binary-treatment IV to a continuous IV changes the bias expression in one place: the $Cov(\cdot)$ term uses the fitted values $\hat{\boldsymbol{\phi}}$ and $\hat{\mathbf{y}}_{-\gamma}$ instead of the difference in conditional binary-treatment expectations $\text{E}[\mathbf{X}\boldsymbol{\beta}\ \vert\ \phi_i = 1] - \text{E}[\mathbf{X}\boldsymbol{\beta}\ \vert\ \phi_i = 0]$.
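
The residual $\overset{\sim}{\boldsymbol{\phi}}$ and the fitted value $\hat{\boldsymbol{\phi}}$ used in part (a) come from the same least-squares projection. The following is a minimal sketch of that decomposition, not the notebook's own code: the data are synthetic stand-ins, and the helper name `project` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 100

# Synthetic stand-ins for the exogenous regressors (assumption, not the problem-set data)
X = np.column_stack([
    np.ones(n),              # constant
    rng.normal(10, 0.5, n),  # own price
    rng.normal(12, 0.5, n),  # cross price
    rng.normal(500, 10, n),  # income
])
phi = rng.normal(0.5, 0.5, n)  # stand-in instrument

def project(v, X):
    """Return (fitted, residual) from a least-squares regression of v on X."""
    coef, *_ = np.linalg.lstsq(X, v, rcond=None)
    fitted = X @ coef
    return fitted, v - fitted

phi_hat, phi_tilde = project(phi, X)

# The two pieces add back up to phi and are orthogonal to each other
print(np.allclose(phi_hat + phi_tilde, phi))
print(abs(phi_hat @ phi_tilde) < 1e-6)
```

Because `X` includes a constant, the variance of `phi` also splits into the variance of `phi_hat` plus the variance of `phi_tilde`, which is why the ratio $var(\boldsymbol{\phi}) / var(\overset{\sim}{\boldsymbol{\phi}})$ in the bias expression is at least one.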

**(b)** IV estimates are produced below for the specification $y_i = \beta_0 + \alpha \phi_i + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \varepsilon_i$.

In [34]:
print('The IV estimates are:', np.round(est_b(y, X_iv), 3).tolist())

The IV estimates are: [[103.805], [1.238], [-1.222], [1.741], [1.995]]
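
The cell above regresses $\mathbf{y}$ directly on the instrument and the exogenous regressors. For comparison, a two-stage least squares estimate of the coefficient on an endogenous regressor could be sketched as below. This is an illustration under stated assumptions rather than the method used in this problem set: the data-generating process is synthetic, and the helper names `ols` and `two_stage_ls` are hypothetical.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients from regressing y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def two_stage_ls(y, d, z, X_exog):
    """2SLS coefficient on the endogenous regressor d, using instrument z."""
    # First stage: project d on the exogenous regressors and the instrument
    Z = np.column_stack([X_exog, z])
    d_hat = Z @ ols(d, Z)
    # Second stage: replace d with its first-stage fitted values
    coef = ols(y, np.column_stack([X_exog, d_hat]))
    return coef[-1]

# Synthetic data (assumption): u drives both d and y, which makes d endogenous
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(10, 0.5, n)
X_exog = np.column_stack([np.ones(n), x1])
z = rng.normal(0, 1, n)                   # instrument, independent of u
u = rng.normal(0, 1, n)                   # unobserved confounder
d = 0.8 * z + u + rng.normal(0, 0.5, n)   # endogenous regressor
y = 3 + 1.5 * d - 1.25 * x1 + u + rng.normal(0, 0.5, n)

delta_ols = ols(y, np.column_stack([X_exog, d]))[-1]
delta_2sls = two_stage_ls(y, d, z, X_exog)
print('OLS estimate: ', round(delta_ols, 3))
print('2SLS estimate:', round(delta_2sls, 3))
```

In this synthetic setup the true coefficient on `d` is 1.5. The OLS estimate should be pulled upward by the confounder, while the 2SLS estimate should land close to the true value.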


**(c)** The estimated bias for the endogenous parameter estimate is shown below.

In [35]:
print('The estimated bias is', bias)

The estimated bias is 1.0239752087392823e-25


Below, the sum of the estimated bias and the endogenous parameter estimate $\hat{\delta}$ is shown to approximately equal the true parameter value $\delta = 1.5$.

In [36]:
est_b(y, X_endog)[1, 0] + bias

1.4999999999983302
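
In `scipy.stats`, the first positional argument of `norm.rvs` is `loc`, not the sample size. If the error term in the data-generation cell was meant to be $n$ independent draws, this may explain why the intercept estimate in part (b) is near 103.8 rather than 3 and why the estimated bias in part (c) is essentially zero. A minimal sketch of the difference (the variable names `single` and `many` are illustrative only):

```python
import numpy as np
from scipy import stats

n = 100

# Positional n is interpreted as `loc`: one draw from a normal centred on n
single = stats.norm.rvs(n, random_state=0)

# Keyword size=n: n independent standard-normal draws
many = stats.norm.rvs(size=n, random_state=0)

print('number of values from rvs(n):     ', np.size(single))
print('shape of values from rvs(size=n): ', np.shape(many))
```

Adding `single` to a length-$n$ vector broadcasts the same constant to every observation, so it shifts the intercept without adding any observation-level noise.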