### Problem Set 2

Erick Ore Matos

In [3]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [4]:
from numpy.linalg import inv

In [5]:
df = pd.read_csv("../data/nls.csv")

In [6]:
df['const'] = 1

In [7]:
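def get_regression(df, endog, exog):
    """Return the OLS coefficients of `endog` on `exog`, keyed by regressor name."""
    Y = df[endog].to_numpy()
    X = df[exog].to_numpy()

    # Normal equations: b = (X'X)^{-1} X'Y
    b = inv(X.transpose() @ X) @ (X.transpose() @ Y)

    return dict(zip(exog, b))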

##### (a)

In [8]:
b_omit = get_regression(df, ['luwe'], ['exper', 'const'])
b = get_regression(df, ['luwe'], ['exper', 'educ', 'const'])


We estimated the following model:

$$luwe = \alpha + \beta \times exper + \rho \times educ + \epsilon$$

In which $E(\epsilon|exper, educ) = 0$.

The estimated omitted variable model is:

$$luwe = \alpha + \beta_o \times exper + \tilde \epsilon$$

In this second specification $E(\tilde \epsilon|exper) \ne 0$, because $educ$ is now part of the error term and is correlated with $exper$.

The difference between these two estimates is explained by the omitted variable bias formula:

$$\hat \beta_o \to \beta + \rho \frac{Cov(exper, educ)}{Var(exper)}$$
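
A brief sketch of where this limit comes from, assuming the error of the long regression is uncorrelated with $exper$: substitute the long model into the probability limit of the short-regression slope.

$$\hat \beta_o \to \frac{Cov(exper, luwe)}{Var(exper)} = \beta + \rho \frac{Cov(exper, educ)}{Var(exper)} + \frac{Cov(exper, \epsilon)}{Var(exper)}$$

The last term vanishes under the stated assumption, which leaves the expression above.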

We compute the three terms needed to evaluate the bias:

##### Var(exper)

In [9]:
var = df[['exper', 'educ']].cov()['exper']['exper']
var

14.667985134182105

##### Cov(exper, educ)

In [10]:
cov = df[['exper', 'educ']].cov()['exper']['educ']
cov

-4.933275490887497

##### $\rho$

In [11]:
rho = b['educ']
rho

array([0.09111207])

In [12]:
bias = cov*rho/var
bias

array([-0.03064367])

The implied bias coincides with the difference between the short-regression and long-regression estimates of the coefficient on $exper$:



In [13]:
dif = b_omit['exper'] - b['exper']
dif

array([-0.03064367])
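
The match above is not a coincidence: with sample moments and the estimated $\rho$, the decomposition is an exact algebraic identity. The following is a minimal sketch on synthetic data (the variable values and the helper `ols` are hypothetical, not taken from the NLS sample) that checks it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# Synthetic regressors (hypothetical data, not the NLS sample)
exper = rng.normal(10.0, 4.0, size=n)
educ = 14.0 - 0.3 * exper + rng.normal(0.0, 2.0, size=n)
luwe = 4.0 + 0.02 * exper + 0.09 * educ + rng.normal(0.0, 0.4, size=n)


def ols(y, *cols):
    """OLS coefficients of y on the given columns plus a constant (last)."""
    X = np.column_stack([*cols, np.ones_like(y)])
    return np.linalg.lstsq(X, y, rcond=None)[0]


b_short = ols(luwe, exper)        # [beta_o, const]
b_long = ols(luwe, exper, educ)   # [beta, rho, const]

# Sample covariance matrix; the same ddof appears in both terms of the ratio
c = np.cov(exper, educ)
implied_bias = b_long[1] * c[0, 1] / c[0, 0]
difference = b_short[0] - b_long[0]

print(implied_bias, difference)
```

The two printed numbers should agree up to floating-point error, whatever the seed.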

##### (b)

In [14]:
Y = df[['luwe']].to_numpy()

In [15]:
X = df[['educ', 'exper', 'const']].to_numpy()

In [16]:
n, k = X.shape

##### Matrix version

The model we try to estimate is:

$$y_i = x_i'\beta + \epsilon_i $$

We can stack the values $y_i$ in the vector $Y$, and the row vectors $x_i'$ in the matrix $X$.

We obtain the OLS estimates from $$\hat \beta = (X'X)^{-1} (X'Y)$$

In [17]:
b = inv(X.transpose() @ X) @ (X.transpose() @ Y)

In [18]:
print(f"Coef educ:{b[0]} ")
print(f"Coef exper:{b[1]} ")
print(f"Coef const:{b[2]} ")

Coef educ:[0.09111207] 
Coef exper:[0.02352861] 
Coef const:[4.39731821] 
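
As an aside, the explicit inverse is fine for a 3-column design, but the same estimates can be obtained without forming $(X'X)^{-1}$. The sketch below uses a hypothetical synthetic design matrix (not the NLS data) to compare three equivalent routes.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Hypothetical design matrix: two regressors and a constant column
X = np.column_stack([rng.normal(size=n), rng.normal(size=n), np.ones(n)])
Y = X @ np.array([[0.09], [0.02], [4.4]]) + rng.normal(scale=0.4, size=(n, 1))

# Explicit inverse, as in the formula above
b_inv = np.linalg.inv(X.T @ X) @ (X.T @ Y)

# Solve the normal equations X'X b = X'Y without forming the inverse
b_solve = np.linalg.solve(X.T @ X, X.T @ Y)

# Least-squares solver that works on X directly
b_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(np.allclose(b_inv, b_solve), np.allclose(b_inv, b_lstsq))
```

`np.linalg.lstsq` is usually the more numerically robust choice when the regressors are close to collinear.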


We can define the annihilator matrix to obtain the residuals:
$$M_X = I - X(X'X)^{-1}X'$$


In [19]:
M_X = (np.eye(n) - X @inv(X.transpose() @ X) @ X.transpose())

In [20]:
e = M_X @ Y
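
The annihilator matrix is $n \times n$, so building it is only practical for moderate sample sizes. A small sketch on hypothetical synthetic data illustrates its two defining properties and shows that the same residuals follow from $Y - X \hat \beta$ without the large matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Hypothetical data: two regressors plus a constant, and an arbitrary outcome
X = np.column_stack([rng.normal(size=n), rng.normal(size=n), np.ones(n)])
Y = rng.normal(size=(n, 1))

XtX_inv = np.linalg.inv(X.T @ X)
M = np.eye(n) - X @ XtX_inv @ X.T          # n x n annihilator matrix

e_via_M = M @ Y
e_direct = Y - X @ (XtX_inv @ (X.T @ Y))   # same residuals, no n x n matrix

print(np.allclose(M @ M, M))               # idempotent
print(np.allclose(M @ X, 0))               # annihilates the columns of X
print(np.allclose(e_via_M, e_direct))
```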

Under homoskedasticity, we can estimate the variance of $\epsilon$ using:

$$\hat Var(\epsilon) = e'e/n$$

This is the biased version (it divides by $n$ rather than $n - k$), but the difference vanishes asymptotically.

In [21]:
var_e = e.transpose() @ e / n
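
To get a sense of how much the degrees-of-freedom correction matters here, the following arithmetic sketch uses $n = 929$ observations and $k = 3$ regressors, the dimensions of this sample.

```python
import numpy as np

n, k = 929, 3  # sample size and number of regressors in this notebook

# Ratio between the unbiased (divide by n - k) and biased (divide by n) estimates
ratio = n / (n - k)

# Standard errors scale by the square root of that ratio
se_scale = np.sqrt(ratio)

print(ratio, se_scale)
```

The variance estimate changes by roughly 0.3%, so the standard errors are nearly identical under either convention.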

The variance of the estimator $\hat \beta$ will be obtained from:

$$\hat Var(\hat \beta) = \hat \sigma^2_{\epsilon} (X'X)^{-1}$$

Using the definition given in the problem set, we have:

$$\hat H = (1/n)(\sum x_i x_i') = (1/n)X'X$$

In [22]:
H = (1/n) * X.transpose() @ X

Using this definition, the estimated variance can be written as:

$$\hat Var(\hat \beta) = (1/n) \hat H^{-1} \hat \sigma^2_{\epsilon}$$

In [24]:
var_beta = (1/n) * inv(H) * var_e

In [25]:
print(f"Var Coef educ:{var_beta[0,0]} ")
print(f"Var Coef exper:{var_beta[1,1]} ")
print(f"Var Coef const:{var_beta[2,2]} ")

Var Coef educ:5.722686640515582e-05 
Var Coef exper:1.8883294753342663e-05 
Var Coef const:0.02113999536931594 


In [26]:
print(f"SE Coef educ:{var_beta[0,0]**0.5} ")
print(f"SE Coef exper:{var_beta[1,1]**0.5} ")
print(f"SE Coef const:{var_beta[2,2]**0.5} ")

SE Coef educ:0.0075648441097722445 
SE Coef exper:0.004345491313228305 
SE Coef const:0.14539599502502104 


##### (c)

In [27]:
var_beta

array([[ 5.72268664e-05,  1.92470810e-05, -1.03322041e-03],
       [ 1.92470810e-05,  1.88832948e-05, -5.16579469e-04],
       [-1.03322041e-03, -5.16579469e-04,  2.11399954e-02]])

##### The robust version of the variance is:

$$\hat Var(\hat \beta) = (X'X)^{-1} (\sum_i x_i x_i' \hat e_i^2)  (X'X)^{-1}$$
$$\hat Var(\hat \beta) = (1/n)\hat H^{-1} \hat J \hat H^{-1}$$
$$\hat H = (1/n)(\sum x_i x_i') = (1/n)X'X$$
$$\hat J = (1/n)(\sum \hat e_i^2x_i x_i') = (1/n)X'diag(\hat e \hat e')X$$

In [28]:
J = (1/n) * np.transpose(X) @ (np.diag(e @ np.transpose(e) ) * np.eye(n)) @ X

In [30]:
var_b_robust_ = (inv(np.transpose(X) @ X)
                @ np.transpose(X) @ (np.diag(e @ np.transpose(e) ) * np.eye(n)) @ X
                @ inv(np.transpose(X) @ X))

In [31]:
var_b_robust = (1/n) * (inv(H)
                @ J
                @ inv(H))
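
The expressions above materialise an $n \times n$ matrix to hold the squared residuals. A possible alternative, sketched here on hypothetical heteroskedastic synthetic data, scales each row of $X$ by $\hat e_i^2$ through broadcasting and gives the same $\hat J$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300

# Hypothetical data with noise whose scale depends on the first regressor
X = np.column_stack([rng.normal(size=n), rng.normal(size=n), np.ones(n)])
noise = rng.normal(size=(n, 1)) * (1 + np.abs(X[:, [0]]))
Y = X @ np.array([[0.5], [-0.2], [1.0]]) + noise

b = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ b                                   # residuals, shape (n, 1)

H = X.T @ X / n
# Dense version: builds an n x n diagonal matrix of squared residuals
J_dense = X.T @ np.diag((e ** 2).ravel()) @ X / n
# Broadcast version: multiplies row i of X by e_i^2, no n x n matrix needed
J_fast = (X * e ** 2).T @ X / n

H_inv = np.linalg.inv(H)
var_robust = H_inv @ J_fast @ H_inv / n         # HC0 sandwich estimator

print(np.allclose(J_dense, J_fast))
print(np.sqrt(np.diag(var_robust)))             # robust standard errors
```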

In [33]:
print(f"Robust Var Coef educ:{var_b_robust[0,0]} ")
print(f"Robust Var Coef exper:{var_b_robust[1,1]} ")
print(f"Robust Var Coef const:{var_b_robust[2,2]} ")

Robust Var Coef educ:5.9373757475233656e-05 
Robust Var Coef exper:1.9104888911235196e-05 
Robust Var Coef const:0.02181470473918859 


In [34]:
print(f"Robust SE Coef educ:{var_b_robust[0,0]**0.5} ")
print(f"Robust SE Coef exper:{var_b_robust[1,1]**0.5} ")
print(f"Robust SE Coef const:{var_b_robust[2,2]**0.5} ")

Robust SE Coef educ:0.007705436877636054 
Robust SE Coef exper:0.004370913967494121 
Robust SE Coef const:0.14769801873819632 


##### Checking results

In [33]:
from statsmodels.api import OLS

In [40]:
results = OLS(Y, X, hasconst=True).fit()

In [41]:
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.138
Model:                            OLS   Adj. R-squared:                  0.136
Method:                 Least Squares   F-statistic:                     74.33
Date:                Sun, 14 Jan 2024   Prob (F-statistic):           1.15e-30
Time:                        01:45:01   Log-Likelihood:                -492.17
No. Observations:                 929   AIC:                             990.3
Df Residuals:                     926   BIC:                             1005.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.0911      0.008     12.025      0.0

In [42]:
results = OLS(Y, X, hasconst=True).fit(cov_type="HC0")

In [43]:
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.138
Model:                            OLS   Adj. R-squared:                  0.136
Method:                 Least Squares   F-statistic:                     72.03
Date:                Sun, 14 Jan 2024   Prob (F-statistic):           8.38e-30
Time:                        01:45:01   Log-Likelihood:                -492.17
No. Observations:                 929   AIC:                             990.3
Df Residuals:                     926   BIC:                             1005.
Df Model:                           2                                         
Covariance Type:                  HC0                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.0911      0.008     11.824      0.0