Benjamin Bierlein 7641866 Econ 125 Spring 2025 HW 6

1)

In [None]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Load and prepare data
mroz_df = pd.read_csv("/content/mroz.csv")
mroz_df.rename(columns={
    'educ': 'EDU', 'exper': 'EXPER', 'mothereduc': 'MEDU',
    'fathereduc': 'FEDU', 'wage': 'WAGE'
}, inplace=True)

# Filter for valid log(wage)
mroz_valid = mroz_df[mroz_df['WAGE'] > 0].copy()
mroz_valid['EXPER2'] = mroz_valid['EXPER'] ** 2

# Part 1: OLS
ols_model = smf.ols('np.log(WAGE) ~ EXPER + EXPER2 + EDU', data=mroz_valid).fit()
print("OLS Results:\n", ols_model.summary())

# Part 2: First-stage regression
first_stage = smf.ols('EDU ~ EXPER + EXPER2 + MEDU + FEDU', data=mroz_valid).fit()
mroz_valid['EDU_hat'] = first_stage.fittedvalues

# Second-stage regression (TSLS)
second_stage = smf.ols('np.log(WAGE) ~ EXPER + EXPER2 + EDU_hat', data=mroz_valid).fit()
print("TSLS Results:\n", second_stage.summary())

# Part 3: Manual TSLS standard errors
X = sm.add_constant(mroz_valid[['EXPER', 'EXPER2', 'EDU']])
y = np.log(mroz_valid['WAGE'])
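# TSLS residuals: apply the second-stage (TSLS) coefficients to the actual EDU,
# not a fresh OLS fit of y on X (which would just reproduce the OLS residuals)
residuals = y - X.values @ second_stage.params.values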
sigma_sq = (residuals @ residuals) / (X.shape[0] - X.shape[1])
X_hat = sm.add_constant(mroz_valid[['EXPER', 'EXPER2', 'EDU_hat']])
var_beta_hat = sigma_sq * np.linalg.inv(X_hat.T @ X_hat)
manual_tsls_se = np.sqrt(np.diag(var_beta_hat))

print("Manual TSLS Standard Errors:")
for name, se in zip(second_stage.params.index, manual_tsls_se):
    print(f"{name}: {se:.5f}")


OLS Results:
                             OLS Regression Results                            
Dep. Variable:           np.log(WAGE)   R-squared:                       0.157
Model:                            OLS   Adj. R-squared:                  0.151
Method:                 Least Squares   F-statistic:                     26.29
Date:                Sat, 24 May 2025   Prob (F-statistic):           1.30e-15
Time:                        03:09:22   Log-Likelihood:                -431.60
No. Observations:                 428   AIC:                             871.2
Df Residuals:                     424   BIC:                             887.4
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.5220      0.199     -

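The OLS estimate for EDU is 0.108 and highly significant: each additional year of education is associated with roughly 10.8% higher wages. Because EDU may be endogenous (for example, correlated with unobserved ability), the OLS estimate could be biased. The TSLS estimate, using mother's and father's education as instruments, drops to 0.061 and is only marginally significant. This suggests OLS may overstate the return to education. The standard errors printed by the second-stage OLS are not valid TSLS standard errors, because they come from residuals computed with EDU_hat; Part 3 recomputes them from residuals that use the actual EDU. Both models show the expected concave effect of experience on wages: the coefficient on EXPER is positive and the coefficient on EXPER² is negative.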
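The two-stage procedure used in this exercise, with standard errors built from residuals that use the actual regressors, can be sketched with NumPy alone. This is a minimal illustration on synthetic data, not the mroz results: the function name `tsls`, the variable names, and the data-generating process (an unobserved `ability` term that drives both education and wages) are all assumptions made for the example.

```python
import numpy as np

def tsls(y, X, Z):
    """Two-stage least squares (hypothetical helper for illustration).

    y: (n,) outcome
    X: (n, k) regressors, including the constant and the endogenous column
    Z: (n, m) instruments, including the constant and exogenous regressors
    """
    # First stage: fitted values of every column of X from a regression on Z
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Second stage: regress y on the fitted regressors
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    # Residuals use the actual X, not X_hat
    resid = y - X @ beta
    n, k = X.shape
    sigma_sq = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma_sq * np.linalg.inv(X_hat.T @ X_hat)))
    return beta, se

# Synthetic data (assumed): ability raises both education and wages
rng = np.random.default_rng(0)
n = 500
ability = rng.normal(size=n)
z1, z2 = rng.normal(size=n), rng.normal(size=n)   # stand-ins for parental education
exper = rng.normal(size=n)
edu = 0.5 * z1 + 0.5 * z2 + 0.8 * ability + rng.normal(size=n)
y = 1.0 + 0.1 * edu + 0.05 * exper + ability + rng.normal(size=n)

const = np.ones(n)
X = np.column_stack([const, exper, edu])
Z = np.column_stack([const, exper, z1, z2])

beta_iv, se_iv = tsls(y, X, Z)
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("OLS  coefficient on edu:", beta_ols[2])
print("TSLS coefficient on edu:", beta_iv[2], "with SE", se_iv[2])
```

In this setup the true coefficient on `edu` is 0.1, and the OLS estimate is pushed upward because `ability` is omitted, which mirrors the direction of the gap between the OLS and TSLS estimates discussed above.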

2)

In [None]:
# Load and prepare data
liquor_df = pd.read_csv("/content/liquor5.csv")
liquor_df['id'] = np.repeat(np.arange(1, 41), 3)
liquor_df['year'] = list(range(1, 4)) * 40
liquor_df.sort_values(by=['id', 'year'], inplace=True)

# First-difference transformation
liquor_df['D_liquor'] = liquor_df.groupby('id')['liquor'].diff()
liquor_df['D_income'] = liquor_df.groupby('id')['income'].diff()
fd_df = liquor_df.dropna(subset=['D_liquor', 'D_income'])

fd_model = smf.ols('D_liquor ~ D_income', data=fd_df).fit(
    cov_type='cluster', cov_kwds={'groups': fd_df['id']}
)
print("First-Difference Regression:\n", fd_model.summary())

# Mean-differencing transformation
mean_df = liquor_df.groupby('id')[['liquor', 'income']].transform(lambda x: x - x.mean())
mean_df['id'] = liquor_df['id']
mean_model = smf.ols('liquor ~ income', data=mean_df).fit(
    cov_type='cluster', cov_kwds={'groups': mean_df['id']}
)
print("Mean-Difference Regression:\n", mean_model.summary())

First-Difference Regression:
                             OLS Regression Results                            
Dep. Variable:               D_liquor   R-squared:                       0.022
Model:                            OLS   Adj. R-squared:                  0.009
Method:                 Least Squares   F-statistic:                     2.475
Date:                Sat, 24 May 2025   Prob (F-statistic):              0.124
Time:                        03:10:30   Log-Likelihood:                -140.41
No. Observations:                  80   AIC:                             284.8
Df Residuals:                      78   BIC:                             289.6
Df Model:                           1                                         
Covariance Type:              cluster                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.4029 

Both transformations remove fixed household traits. The FD estimate is 0.098 and the mean-differenced estimate is 0.021, but neither is statistically significant, so there is only weak evidence that income changes affect liquor spending. The FD result is not obviously more reliable, since both estimators use only within-household variation; with three years of data they weight that variation differently, which is why the two estimates do not coincide. The low R² in both models shows that income alone explains little of the variation in liquor spending.
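The two transformations can be compared side by side on a small synthetic panel. The sketch below is illustrative only: the data-generating process (a household effect `alpha` that raises both income and liquor spending, with a true slope of 0.5) and the helper `slope` are assumptions, not the liquor5 data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_households, n_years = 200, 3
ids = np.repeat(np.arange(n_households), n_years)

# Assumed data: a fixed household effect that is correlated with income
alpha = np.repeat(rng.normal(size=n_households), n_years)
income = 2.0 * alpha + rng.normal(size=ids.size)
liquor = 0.5 * income + alpha + 0.5 * rng.normal(size=ids.size)
panel = pd.DataFrame({"id": ids, "income": income, "liquor": liquor})

def slope(x, y):
    """Simple-regression slope of y on x (with intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x - x.mean()
    y = y - y.mean()
    return float((x @ y) / (x @ x))

cols = ["income", "liquor"]

# Pooled OLS ignores the household effect
pooled = slope(panel["income"], panel["liquor"])

# First differences within each household (first year of each household drops out)
fd = panel.groupby("id")[cols].diff().dropna()
fd_slope = slope(fd["income"], fd["liquor"])

# Within (mean-differencing) transformation
within = panel[cols] - panel.groupby("id")[cols].transform("mean")
within_slope = slope(within["income"], within["liquor"])

print("pooled:", pooled)
print("first difference:", fd_slope)
print("within:", within_slope)
```

Both transformed regressions remove `alpha` and land near the true slope, while pooled OLS is biased upward; the first-difference and within slopes are close but not identical because there are three periods rather than two.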

3)

In [None]:
# Load and prepare data
mexican_df = pd.read_csv("/content/mexican.csv")

mexican_df.rename(columns={
    'lnprice': 'LNPRICE',
    'nocondom': 'NOCONDOM',
    'rich': 'RICH',
    'regular': 'REGULAR',
    'alcohol': 'ALCOHOL'
}, inplace=True)
mexican_df['ID'] = mexican_df['id'].astype('category')

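# OLS regression
# Caution: if bar, street and othersite are exhaustive site dummies, they sum to one
# and are perfectly collinear with the intercept (dummy-variable trap)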
ols_model = smf.ols('LNPRICE ~ RICH + REGULAR + ALCOHOL + NOCONDOM + bar + street + othersite', data=mexican_df).fit()

# Fixed Effects (omit sex worker characteristics, use ID dummies)
fe_model = smf.ols('LNPRICE ~ RICH + REGULAR + ALCOHOL + NOCONDOM + bar + street + othersite + C(ID)', data=mexican_df).fit()

# 95% CI for NOCONDOM in FE
risk_ci = fe_model.conf_int().loc['NOCONDOM']

# Output for Exercise 3
print("\nExercise 3 - OLS and Fixed Effects Coefficients:")
print("OLS Coefficients:\n", ols_model.params)
print("\nFixed Effects Coefficients:\n", fe_model.params.drop(labels=[x for x in fe_model.params.index if x.startswith("C(ID)")]))
print(f"\n95% CI for Risk Premium (NOCONDOM, FE): {risk_ci.values}")



Exercise 3 - OLS and Fixed Effects Coefficients:
OLS Coefficients:
 Intercept   -1.164438e+12
RICH         3.727328e-01
REGULAR     -1.325601e-01
ALCOHOL      2.595080e-01
NOCONDOM    -7.558661e-02
bar          1.164438e+12
street       1.164438e+12
othersite    1.164438e+12
dtype: float64

Fixed Effects Coefficients:
 Intercept    3.913135
RICH         0.082636
REGULAR      0.037219
ALCOHOL     -0.056856
NOCONDOM     0.170282
bar          1.351629
street       1.508333
othersite    1.053173
dtype: float64

95% CI for Risk Premium (NOCONDOM, FE): [0.11965418 0.22090984]


3 part 1

The OLS model shows that rich clients pay about 37% more and that not using a condom lowers the price by about 7.6% (coefficients are in log points, read here as approximate percentages). These estimates may be confounded by unobserved sex worker traits. After adding fixed effects, the rich-client premium drops to about 8.3% and the premium for unprotected sex rises to about 17%, which is statistically significant because the 95% confidence interval excludes zero. This suggests OLS understates the risk premium, plausibly because workers who charge higher prices also use condoms more often.
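The OLS output above reports an intercept of about -1.16e12 and coefficients of about +1.16e12 on bar, street and othersite. That offsetting pattern is what one would expect if the three site dummies sum to one and are therefore collinear with the intercept; this is an inference from the printed numbers, not something verified against mexican.csv. The sketch below reproduces the situation on synthetic data (all variables and values are assumptions) and shows that dropping one dummy restores a full-rank design without changing the coefficient on the regressor of interest.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300

# Assumed data: three mutually exclusive, exhaustive transaction sites
site = rng.integers(0, 3, size=n)
bar = (site == 0).astype(float)
street = (site == 1).astype(float)
othersite = (site == 2).astype(float)
nocondom = rng.integers(0, 2, size=n).astype(float)
lnprice = 4.0 + 0.2 * nocondom + 0.3 * bar - 0.1 * street + rng.normal(scale=0.5, size=n)

const = np.ones(n)

# Intercept plus all three site dummies: the columns are linearly dependent
X_trap = np.column_stack([const, nocondom, bar, street, othersite])
# Dropping one dummy (othersite becomes the base category) removes the dependence
X_ok = np.column_stack([const, nocondom, bar, street])

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)
print("columns / rank with all dummies:", X_trap.shape[1], rank_trap)
print("columns / rank with one dropped:", X_ok.shape[1], rank_ok)

# The nocondom coefficient is identified in both designs
beta_trap = np.linalg.lstsq(X_trap, lnprice, rcond=None)[0]
beta_ok = np.linalg.lstsq(X_ok, lnprice, rcond=None)[0]
print("nocondom coefficient:", beta_trap[1], beta_ok[1])
```

In the rank-deficient design the intercept and site coefficients are not separately identified, so their printed values should not be interpreted; the slopes on the other regressors are still identified in principle.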

3 part 2

In the fixed effects model, we control for unobserved, time-invariant sex worker traits by including a dummy for each worker ID. The price premium for rich clients drops to about 8.3%, and unprotected sex now increases the price by about 17%. This shift shows that once we control for individual differences, the risk premium for unprotected sex becomes larger and statistically significant.
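Including one dummy per worker gives the same slope as demeaning each variable within worker, which is why the ID dummies act as fixed effects. The following sketch checks this on synthetic data; the variable `quality` (an unobserved worker trait that raises price and lowers the chance of unprotected sex) and the true premium of 0.17 are assumptions chosen to mimic the sign pattern in this exercise.

```python
import numpy as np

rng = np.random.default_rng(3)
n_workers, n_tx = 50, 4
worker = np.repeat(np.arange(n_workers), n_tx)

# Assumed data: higher-quality workers charge more and use condoms more often
quality = rng.normal(size=n_workers)[worker]
nocondom = (rng.normal(size=worker.size) - quality > 0).astype(float)
lnprice = 4.0 + 0.17 * nocondom + quality + 0.3 * rng.normal(size=worker.size)

# Pooled OLS slope (ignores worker heterogeneity)
x_c = nocondom - nocondom.mean()
beta_pooled = (x_c @ (lnprice - lnprice.mean())) / (x_c @ x_c)

# Least-squares dummy variables: one dummy per worker, no separate intercept
D = (worker[:, None] == np.arange(n_workers)).astype(float)
X_lsdv = np.column_stack([nocondom, D])
beta_lsdv = np.linalg.lstsq(X_lsdv, lnprice, rcond=None)[0][0]

# Within estimator: subtract each worker's mean from both variables
def demean(v):
    means = np.bincount(worker, weights=v) / np.bincount(worker)
    return v - means[worker]

x_w, y_w = demean(nocondom), demean(lnprice)
beta_within = (x_w @ y_w) / (x_w @ x_w)

print("pooled:", beta_pooled)
print("LSDV:  ", beta_lsdv)
print("within:", beta_within)
```

The LSDV and within slopes agree up to floating-point error, while the pooled slope is pulled downward by the omitted worker trait.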



3 part 3

The 95% confidence interval for the risk premium (NOCONDOM) in the fixed effects model is [0.120, 0.221], meaning unprotected sex adds roughly 12% to 22% to the price. The entire interval lies above the OLS estimate of -0.076, which supports the idea that failing to control for sex worker characteristics leads to underestimating the true risk premium.
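Because the dependent variable is in logs, reading a coefficient b as "b times 100 percent" is an approximation; the exact percentage effect of a dummy is 100 * (exp(b) - 1). The short calculation below applies that formula to the coefficients printed above. The helper name `pct_effect` is an illustrative choice.

```python
import math

def pct_effect(b):
    """Exact percentage change in price implied by a log-point coefficient b."""
    return 100.0 * (math.exp(b) - 1.0)

# Values copied from the printed output of this exercise
fe_nocondom = 0.170282
ci_low, ci_high = 0.11965418, 0.22090984
ols_rich = 0.3727328

print(f"FE risk premium: {pct_effect(fe_nocondom):.1f}%")
print(f"95% CI: {pct_effect(ci_low):.1f}% to {pct_effect(ci_high):.1f}%")
print(f"OLS rich-client premium: {pct_effect(ols_rich):.1f}%")
```

The approximation is close for small coefficients and drifts for larger ones, so the gap is most noticeable for the OLS coefficient on RICH.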