### Problem Set 3

Erick Ore Matos

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
from numpy.linalg import inv

In [3]:
df = pd.read_csv("../data/firms.csv")

In [4]:
df = df.set_index(['firm','year'])

In [5]:
df['const'] = 1

I define a regression function to report the point estimates.

In [6]:
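def get_regression(df, endog, exog):
    """Return OLS point estimates b = (X'X)^{-1} X'Y, keyed by regressor name."""
    # endog and exog are lists of column names, so Y and X are 2-D arrays
    Y = df[endog].to_numpy()
    X = df[exog].to_numpy()

    # Closed-form solution of the normal equations
    b = inv(X.transpose() @ X) @ (X.transpose() @ Y)

    # One coefficient (a length-1 array) per regressor, in the order of exog
    return dict(zip(exog, b))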
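
As a quick sanity check of the normal-equations formula, here is a minimal sketch on synthetic data. The data, the seed, and the helper name `ols_normal_equations` are illustrative assumptions, not part of the problem set. It compares the closed-form solution with `np.linalg.lstsq`, which solves the same least-squares problem without forming an explicit inverse.

```python
import numpy as np
from numpy.linalg import inv

def ols_normal_equations(X, Y):
    # Same formula as get_regression, applied to plain arrays
    return inv(X.T @ X) @ (X.T @ Y)

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([x, np.ones(n)])          # regressor plus constant
Y = (2.0 * x + 1.0 + rng.normal(scale=0.5, size=n)).reshape(-1, 1)

b_closed = ols_normal_equations(X, Y)
b_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Both approaches should agree up to floating-point error
print(b_closed.ravel(), b_lstsq.ravel())
```

For well-conditioned problems like this one the two agree; with nearly collinear regressors, `lstsq` (or `np.linalg.solve`) is usually the numerically safer choice.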

##### (a)

We are running pooled OLS:

$$y_{i,t} = \alpha + \beta x_{i,t} + v_i + e_{i,t}$$

The estimator will be biased (and inconsistent) if $E(v_i|x_{i,t}) \ne 0$, that is, if the firm effect is correlated with the regressor:

$$\hat \beta_{OLS} \to \beta + \frac{Cov(v_i, x_{i,t})}{Var(x_{i,t})}$$
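
To illustrate the bias formula, the following sketch simulates a panel in which the firm effect is negatively correlated with the regressor and compares the pooled OLS slope with $\beta + Cov(v_i, x_{i,t})/Var(x_{i,t})$. All parameter values (`beta`, the `-0.8` loading, sample sizes) are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n_firms, n_years = 2000, 5
beta = 0.5

v = rng.normal(size=n_firms)                  # firm effect v_i
v_it = np.repeat(v, n_years)                  # repeated for each year
x = -0.8 * v_it + rng.normal(size=n_firms * n_years)   # Cov(v, x) < 0
e = rng.normal(scale=0.5, size=n_firms * n_years)
y = beta * x + v_it + e

# Pooled OLS slope with a constant: Cov(x, y) / Var(x)
C = np.cov(x, y)
b_ols = C[0, 1] / C[0, 0]

# Bias term from the formula, using sample moments
bias_formula = np.cov(v_it, x)[0, 1] / np.var(x, ddof=1)

print(b_ols, beta + bias_formula)
```

The two printed numbers should be close, and both lie well below the true `beta`, because the omitted firm effect pulls the slope down.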

In [7]:
b = get_regression(df, ['logsalespercap'], ['logemployment', 'const'])

In [8]:
b

{'logemployment': array([-0.02765847]), 'const': array([-6.71417417e-07])}

##### (b)

We are running a FE model:

$$y_{i,t} - \bar y_i = \beta (x_{i,t} - \bar x_i) + e_{i,t} - \bar e_i$$

The estimator will be unbiased as long as $E(e_{i,t} - \bar e_i|x_{i,t} - \bar x_i) = 0$:

$$\hat \beta_{FE} \to \beta $$

In this estimation, we remove the potential bias from omitted variables that are constant within each firm, because demeaning eliminates $v_i$.

In the actual setup, we are trying to estimate the relationship between sales per unit of capital stock and employment. We are omitting a crucial variable that is fixed within each firm: productivity. If we use the OVB formula:

$$\hat \beta_{OLS} \to \beta + \frac{Cov(v_i, x_{i,t})}{Var(x_{i,t})} = \beta + \frac{Cov(productivity_{i}, labor_{i,t})}{Var(labor_{i,t})}$$

If more productive firms require less labor, which seems reasonable, then $Cov(productivity_{i}, labor_{i,t}) < 0$ and the estimate we obtained in (a) is biased downwards.

However, once we control for the firm's overall productivity by estimating the model with fixed effects, we obtain a positive relation. This could be read as evidence that labor complements capital: holding the firm's overall productivity fixed, more labor raises the average productivity of capital (sales per unit of capital).
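
The sign reversal described here can be reproduced in a small simulation. This is a sketch under assumed parameters (a true `beta` of 0.1 and a productivity loading of `-0.7` on labor), not an analysis of the firms data: pooled OLS comes out negative while the within (FE) estimator recovers the positive coefficient.

```python
import numpy as np

rng = np.random.default_rng(2)
n_firms, n_years = 1000, 6
beta = 0.1

# Rows are firms, columns are years (balanced panel, an assumption)
productivity = rng.normal(size=(n_firms, 1))
labor = -0.7 * productivity + rng.normal(size=(n_firms, n_years))
e = rng.normal(scale=0.3, size=(n_firms, n_years))
y = beta * labor + productivity + e

def slope(x, y):
    # Simple-regression slope with a constant
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / (xc ** 2).sum()

# Pooled OLS ignores the firm effect
b_pooled = slope(labor.ravel(), y.ravel())

# Within estimator: subtract each firm's own mean first
labor_w = labor - labor.mean(axis=1, keepdims=True)
y_w = y - y.mean(axis=1, keepdims=True)
b_within = (labor_w * y_w).sum() / (labor_w ** 2).sum()

print(b_pooled, b_within)
```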



Demeaning variables

In [9]:
df_means = df.groupby(['firm']).mean()

In [10]:
df_demeaned = df - df_means
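
The subtraction above appears to rely on pandas aligning the `firm` level of the MultiIndex with the index of `df_means`. A more explicit way to write the same within transformation, sketched here on a small made-up panel (the `toy` frame and its values are hypothetical), is to broadcast the firm means back to the original rows with `groupby(...).transform("mean")`.

```python
import pandas as pd

# Hypothetical toy panel with the same index structure as df
toy = pd.DataFrame({
    "firm": ["A", "A", "B", "B", "B"],
    "year": [2000, 2001, 2000, 2001, 2002],
    "logemployment": [1.0, 3.0, 2.0, 4.0, 6.0],
    "logsalespercap": [0.5, 1.5, 1.0, 2.0, 3.0],
}).set_index(["firm", "year"])

# transform returns a frame with the same shape and index as toy,
# where each row holds the mean of its own firm
firm_means = toy.groupby(level="firm").transform("mean")
toy_demeaned = toy - firm_means

print(toy_demeaned)
```

Because `firm_means` shares the index of `toy`, the subtraction is row by row and does not depend on index-level alignment rules.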

In [11]:
b_fe = get_regression(df_demeaned, ['logsalespercap'], ['logemployment'])

In [12]:
b_fe

{'logemployment': array([0.11743465])}
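
The demeaned regression gives the same slope as a regression that includes one dummy per firm (the least-squares dummy variable form of the FE model). The sketch below checks this equivalence on simulated data; the data-generating values and the helper `demean_by` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n_firms, n_years = 30, 4
firm = np.repeat(np.arange(n_firms), n_years)     # firm id for each row

v = rng.normal(size=n_firms)[firm]                # firm effect per row
x = 0.5 * v + rng.normal(size=firm.size)
y = 0.2 * x + v + rng.normal(scale=0.1, size=firm.size)

def demean_by(a, g):
    # Subtract the group mean of a within each group id in g
    means = np.bincount(g, weights=a) / np.bincount(g)
    return a - means[g]

# Within estimator on demeaned data (no constant needed)
xw, yw = demean_by(x, firm), demean_by(y, firm)
b_within = (xw @ yw) / (xw @ xw)

# Dummy-variable regression: x plus one indicator column per firm
D = (firm[:, None] == np.arange(n_firms)).astype(float)
coef, *_ = np.linalg.lstsq(np.column_stack([x, D]), y, rcond=None)
b_lsdv = coef[0]

print(b_within, b_lsdv)
```

The point estimates coincide. The standard errors do not automatically: a plain OLS on demeaned data counts residual degrees of freedom as if the firm means had not been estimated, so its nonrobust standard errors are somewhat too small.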

##### (c)

I expect the coefficient to be negative, as in the OLS case: when we run the regression on firm means, the fixed effect is not removed, so the estimator for $\beta$ can still be biased by the unobserved firm-level component.

We are running a regression on averages:

$$\bar y_{i} = \alpha + \beta \bar x_{i} + v_i + \bar e_{i}$$

The estimator will be biased if $E(v_i|\bar x_{i}) \ne 0$:

$$\hat \beta_{means} \to \beta + \frac{Cov(v_i, \bar x_{i})}{Var(\bar x_{i})}$$

In [13]:
b_means = get_regression(df_means, ['logsalespercap'], ['logemployment', 'const'])

In [14]:
b_means

{'logemployment': array([-0.03037954]), 'const': array([-6.72869035e-07])}
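
One way to relate the three estimates: in a balanced panel, the pooled OLS slope is a weighted average of the within (FE) and between (means) slopes, with weights given by the within and between variation of $x$. The sketch below verifies this identity on simulated data. The balanced design and all parameter values are assumptions; the identity holds exactly only in the balanced case, so for the firms data it is at best an approximation.

```python
import numpy as np

rng = np.random.default_rng(4)
n_firms, n_years = 200, 5

v = rng.normal(size=(n_firms, 1))
x = -0.7 * v + rng.normal(size=(n_firms, n_years))
y = 0.1 * x + v + rng.normal(scale=0.3, size=(n_firms, n_years))

# Within variation: deviations from each firm's own mean
x_bar = x.mean(axis=1, keepdims=True)
y_bar = y.mean(axis=1, keepdims=True)
xw, yw = x - x_bar, y - y_bar

# Between variation: firm means around the grand mean
xb, yb = x_bar - x.mean(), y_bar - y.mean()

sxx_w = (xw ** 2).sum()
sxx_b = n_years * (xb ** 2).sum()

b_within = (xw * yw).sum() / sxx_w
b_between = (xb * yb).sum() / (xb ** 2).sum()

xc, yc = x - x.mean(), y - y.mean()
b_pooled = (xc * yc).sum() / (xc ** 2).sum()

# Weight on the within estimator
w = sxx_w / (sxx_w + sxx_b)
print(b_pooled, w * b_within + (1 - w) * b_between)
```

This is consistent with the pattern in the results: the pooled estimate from (a) lies between the FE estimate from (b) and the means estimate from (c).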

##### Checking results

In [15]:
from statsmodels.api import OLS

In [16]:
results = OLS(df[['logsalespercap']], df[['logemployment', 'const']], hasconst=True).fit()

In [17]:
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:         logsalespercap   R-squared:                       0.004
Model:                            OLS   Adj. R-squared:                  0.004
Method:                 Least Squares   F-statistic:                     22.22
Date:                Sun, 04 Feb 2024   Prob (F-statistic):           2.48e-06
Time:                        23:50:32   Log-Likelihood:                -4450.6
No. Observations:                5733   AIC:                             8905.
Df Residuals:                    5731   BIC:                             8919.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
logemployment    -0.0277      0.006     -4.714

In [18]:
results = OLS(df_demeaned[['logsalespercap']], df_demeaned[['logemployment']], hasconst=False).fit()

In [19]:
print(results.summary())

                                 OLS Regression Results                                
Dep. Variable:         logsalespercap   R-squared (uncentered):                   0.010
Model:                            OLS   Adj. R-squared (uncentered):              0.010
Method:                 Least Squares   F-statistic:                              60.78
Date:                Sun, 04 Feb 2024   Prob (F-statistic):                    7.54e-15
Time:                        23:50:32   Log-Likelihood:                          1594.5
No. Observations:                5733   AIC:                                     -3187.
Df Residuals:                    5732   BIC:                                     -3180.
Df Model:                           1                                                  
Covariance Type:            nonrobust                                                  
                    coef    std err          t      P>|t|      [0.025      0.975]
--------------------------------------

In [20]:
results = OLS(df_means[['logsalespercap']], df_means[['logemployment', 'const']], hasconst=True).fit()

In [21]:
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:         logsalespercap   R-squared:                       0.005
Model:                            OLS   Adj. R-squared:                  0.003
Method:                 Least Squares   F-statistic:                     2.300
Date:                Sun, 04 Feb 2024   Prob (F-statistic):              0.130
Time:                        23:50:32   Log-Likelihood:                -313.32
No. Observations:                 441   AIC:                             630.6
Df Residuals:                     439   BIC:                             638.8
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
logemployment    -0.0304      0.020     -1.516