### Problem Set 1

Erick Ore Matos

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
from numpy.linalg import inv

In [3]:
df = pd.read_csv("../data/nls.csv")

In [4]:
df['const'] = 1

In [6]:
Y = df[['luwe']].to_numpy()

In [7]:
X = df[['educ', 'exper', 'const']].to_numpy()

In [8]:
n, k = X.shape

##### Matrix version

The model we want to estimate is:

$$y_i = x_i'\beta + \epsilon_i $$

We can stack the values $y_i$ in the column vector $Y$, and the row vectors $x_i'$ in the matrix $X$.

We obtain the OLS estimates from $$\hat \beta = (X'X)^{-1} (X'Y)$$

In [9]:
b = inv(X.transpose() @ X) @ (X.transpose() @ Y)

In [10]:
print(f"Coef educ:{b[0]} ")
print(f"Coef exper:{b[1]} ")
print(f"Coef const:{b[2]} ")

Coef educ:[0.09111207] 
Coef exper:[0.02352861] 
Coef const:[4.39731821] 
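
As a side check on the formula $\hat \beta = (X'X)^{-1}(X'Y)$, the sketch below compares the explicit inverse with two alternatives that avoid forming $(X'X)^{-1}$. It runs on synthetic data, not on `nls.csv`; the names `X_demo`, `Y_demo` and `beta_true` and the parameter values are assumptions made only for illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration; not the nls.csv sample)
rng = np.random.default_rng(0)
n_obs = 200
X_demo = np.column_stack([
    rng.normal(12, 2, n_obs),   # stand-in for educ
    rng.normal(10, 3, n_obs),   # stand-in for exper
    np.ones(n_obs),             # constant
])
beta_true = np.array([[0.09], [0.02], [4.4]])
Y_demo = X_demo @ beta_true + rng.normal(0, 0.4, (n_obs, 1))

# Textbook formula with an explicit inverse, as in the cell above
b_inv = np.linalg.inv(X_demo.T @ X_demo) @ (X_demo.T @ Y_demo)

# Same normal equations, solved as a linear system
b_solve = np.linalg.solve(X_demo.T @ X_demo, X_demo.T @ Y_demo)

# Least-squares solver working directly on X
b_lstsq, *_ = np.linalg.lstsq(X_demo, Y_demo, rcond=None)

print(np.allclose(b_inv, b_solve), np.allclose(b_inv, b_lstsq))
```

All three routes solve the same problem; `np.linalg.solve` and `np.linalg.lstsq` are usually preferred when $X'X$ is close to singular.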


We can define the annihilator matrix $M_X$, which maps $Y$ into the OLS residuals $e = M_X Y$:
$$M_X = I - X(X'X)^{-1}X'$$


In [11]:
M_X = np.eye(n) - X @ inv(X.transpose() @ X) @ X.transpose()

In [12]:
e = M_X @ Y
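
The annihilator matrix is $n \times n$, so it can be useful to confirm that it gives the same residuals as $Y - X\hat\beta$ and has the properties the later formulas rely on. The following is a minimal sketch on synthetic data; `X_demo`, `Y_demo` and the sizes are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_obs, k_reg = 50, 3
X_demo = np.column_stack([rng.normal(size=(n_obs, k_reg - 1)), np.ones(n_obs)])
Y_demo = rng.normal(size=(n_obs, 1))

XtX_inv = np.linalg.inv(X_demo.T @ X_demo)
M_demo = np.eye(n_obs) - X_demo @ XtX_inv @ X_demo.T

# Residuals through the annihilator matrix
e_annihilator = M_demo @ Y_demo

# Residuals as Y - X b, which avoids building the n x n matrix
b_demo = XtX_inv @ (X_demo.T @ Y_demo)
e_direct = Y_demo - X_demo @ b_demo

print(np.allclose(e_annihilator, e_direct))          # same residuals
print(np.allclose(M_demo @ X_demo, 0))               # M_X annihilates X
print(np.allclose(M_demo @ M_demo, M_demo))          # M_X is idempotent
print(np.isclose(np.trace(M_demo), n_obs - k_reg))   # trace equals n - k
```

The trace result is the reason the variance estimator divides by $n-k$ rather than $n$.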

Under homoskedasticity, we can estimate the variance of $\epsilon$ using:

$$\hat \sigma^2_{\epsilon} = \hat Var(\epsilon) = e'e/(n-k)$$

In [13]:
var_e = e.transpose() @ e / (n-k)

The variance of the estimator $\hat \beta$ will be obtained from:

$$\hat Var(\hat \beta) = \hat \sigma^2_{\epsilon} (X'X)^{-1}$$

In [14]:
var_beta = inv(X.transpose() @ X) * var_e

In [15]:
print(f"Var Coef educ:{var_beta[0,0]} ")
print(f"Var Coef exper:{var_beta[1,1]} ")
print(f"Var Coef const:{var_beta[2,2]} ")

Var Coef educ:5.741226662029314e-05 
Var Coef exper:1.8944471734185264e-05 
Var Coef const:0.021208483475264644 


In [16]:
print(f"SE Coef educ:{var_beta[0,0]**0.5} ")
print(f"SE Coef exper:{var_beta[1,1]**0.5} ")
print(f"SE Coef const:{var_beta[2,2]**0.5} ")

SE Coef educ:0.007577088267949183 
SE Coef exper:0.0043525247540002875 
SE Coef const:0.145631327245427 


In [17]:
var_beta

array([[ 5.74122666e-05,  1.93094365e-05, -1.03656778e-03],
       [ 1.93094365e-05,  1.89444717e-05, -5.18253053e-04],
       [-1.03656778e-03, -5.18253053e-04,  2.12084835e-02]])
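
The classical variance steps can also be collected in one helper that returns the coefficients, the covariance matrix and the standard errors together. This is only a sketch: the function name `ols_classical` and the synthetic data are hypothetical and not part of the original notebook.

```python
import numpy as np

def ols_classical(X, Y):
    """OLS coefficients, classical covariance matrix and standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ (X.T @ Y)
    e = Y - X @ b                          # residuals
    sigma2 = (e.T @ e).item() / (n - k)    # scalar e'e / (n - k)
    var_b = sigma2 * XtX_inv               # homoskedastic covariance matrix
    se = np.sqrt(np.diag(var_b))           # standard errors from the diagonal
    return b, var_b, se

# Synthetic data, assumed only for illustration
rng = np.random.default_rng(2)
X_demo = np.column_stack([rng.normal(size=(100, 2)), np.ones(100)])
Y_demo = X_demo @ np.array([[0.5], [-1.0], [2.0]]) + rng.normal(size=(100, 1))

b_demo, var_demo, se_demo = ols_classical(X_demo, Y_demo)
print(se_demo)
```

Converting $e'e$ to a scalar with `.item()` keeps the covariance matrix a plain scalar multiple of $(X'X)^{-1}$ instead of relying on broadcasting with a $1 \times 1$ array.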

##### Robust version of the variance

If we drop the homoskedasticity assumption and only assume uncorrelated observations, we can use the Huber-White (heteroskedasticity-robust) variance estimator, where $e_i$ is the residual of observation $i$:

$$\hat Var(\hat \beta) = (X'X)^{-1} \left(\sum_i x_i x_i' e_i^2\right) (X'X)^{-1}$$

In [18]:
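# np.diag(e @ e') extracts the squared residuals e_i^2; multiplying by np.eye(n)
# places them on the diagonal, so the middle term equals sum_i x_i x_i' e_i^2.
var_b_robust = (inv(np.transpose(X) @ X)
                @ np.transpose(X) @ (np.diag(e @ np.transpose(e) ) * np.eye(n)) @ X
                @ inv(np.transpose(X) @ X))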

In [19]:
print(f"Robust Var Coef educ:{var_b_robust[0,0]} ")
print(f"Robust Var Coef exper:{var_b_robust[1,1]} ")
print(f"Robust Var Coef const:{var_b_robust[2,2]} ")

Robust Var Coef educ:5.93737574752357e-05 
Robust Var Coef exper:1.9104888911235413e-05 
Robust Var Coef const:0.021814704739189415 


In [20]:
print(f"Robust SE Coef educ:{var_b_robust[0,0]**0.5} ")
print(f"Robust SE Coef exper:{var_b_robust[1,1]**0.5} ")
print(f"Robust SE Coef const:{var_b_robust[2,2]**0.5} ")

Robust SE Coef educ:0.007705436877636186 
Robust SE Coef exper:0.004370913967494145 
Robust SE Coef const:0.1476980187381991 
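
The middle term $\sum_i x_i x_i' e_i^2$ can also be computed without any $n \times n$ matrix by scaling each row of $X$ by its squared residual. The sketch below checks that this gives the same result as the diagonal-matrix construction used above. It uses synthetic heteroskedastic data; all names ending in `_demo` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n_obs = 60
X_demo = np.column_stack([rng.normal(size=(n_obs, 2)), np.ones(n_obs)])
# Heteroskedastic noise: its scale depends on the first regressor
noise = rng.normal(size=(n_obs, 1)) * (1 + np.abs(X_demo[:, [0]]))
Y_demo = X_demo @ np.array([[1.0], [0.5], [2.0]]) + noise

XtX_inv = np.linalg.inv(X_demo.T @ X_demo)
e_demo = Y_demo - X_demo @ (XtX_inv @ (X_demo.T @ Y_demo))

# Construction used in the notebook cell: diagonal of e e' placed on eye(n)
meat_notebook = X_demo.T @ (np.diag(e_demo @ e_demo.T) * np.eye(n_obs)) @ X_demo

# Explicit diagonal matrix of squared residuals
meat_diag = X_demo.T @ np.diag(e_demo.ravel() ** 2) @ X_demo

# Broadcasting: each row x_i is scaled by e_i^2, no n x n matrix needed
meat_fast = X_demo.T @ (X_demo * e_demo ** 2)

var_robust_demo = XtX_inv @ meat_fast @ XtX_inv
print(np.allclose(meat_notebook, meat_diag), np.allclose(meat_diag, meat_fast))
```

The broadcasting form needs memory proportional to $n \times k$ rather than $n \times n$, which matters for larger samples.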


##### Checking results

In [33]:
from statsmodels.api import OLS

In [40]:
results = OLS(Y, X, hasconst=True).fit()

In [41]:
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.138
Model:                            OLS   Adj. R-squared:                  0.136
Method:                 Least Squares   F-statistic:                     74.33
Date:                Sun, 14 Jan 2024   Prob (F-statistic):           1.15e-30
Time:                        01:45:01   Log-Likelihood:                -492.17
No. Observations:                 929   AIC:                             990.3
Df Residuals:                     926   BIC:                             1005.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.0911      0.008     12.025      0.0

In [42]:
results = OLS(Y, X, hasconst=True).fit(cov_type="HC0")

In [43]:
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.138
Model:                            OLS   Adj. R-squared:                  0.136
Method:                 Least Squares   F-statistic:                     72.03
Date:                Sun, 14 Jan 2024   Prob (F-statistic):           8.38e-30
Time:                        01:45:01   Log-Likelihood:                -492.17
No. Observations:                 929   AIC:                             990.3
Df Residuals:                     926   BIC:                             1005.
Df Model:                           2                                         
Covariance Type:                  HC0                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.0911      0.008     11.824      0.0