Benjamin Bierlein 76418644 Econ 125 HW 8 Spring 2025

In [None]:
#Problem 1

import pandas as pd
import numpy as np
import statsmodels.api as sm

# Load rwm88.csv
rwm = pd.read_csv("/content/rwm88.csv")
rwm = rwm[rwm['hhninc2'] > 0].copy().iloc[:3000]
rwm['linc'] = np.log(rwm['hhninc2'])
rwm['age2'] = rwm['age'] ** 2
rwm['post'] = ((rwm['fachhs'] == 1) | (rwm['univ'] == 1)).astype(int)

# Define response and regressors
y_rwm = rwm['docvis']
X_rwm = rwm[['female', 'age', 'age2', 'self', 'linc', 'post', 'public']]
X_rwm = sm.add_constant(X_rwm)

# Fit Poisson model
poisson_model = sm.GLM(y_rwm, X_rwm, family=sm.families.Poisson()).fit()
coef_dict = poisson_model.params

# Binary variable percent change
perc_change_bin = pd.DataFrame({
    "Variable": ["Female", "Self-employed", "Post-secondary", "Public Insurance"],
    "Percent Change in E[Visits]": (np.exp(coef_dict[['female', 'self', 'post', 'public']]) - 1).round(4) * 100
})

# Age marginal effects
b_age = coef_dict['age']
b_age2 = coef_dict['age2']
ages = np.array([30, 50, 70])
age_effects = pd.DataFrame({
    "Age": ages,
    "Marginal Effect on log(E[Visits]) (%)": (100 * (b_age + 2 * b_age2 * ages)).round(2)
})

# LINC coefficient interpretation: income enters in logs and the Poisson
# model has a log link, so the coefficient is already an elasticity
# (a 1% rise in income changes expected visits by about coef %, no factor of 100)
linc_effect = round(coef_dict['linc'], 4)

(perc_change_bin, age_effects, f"A 1% increase in income changes expected visits by {linc_effect}%")


(                Variable  Percent Change in E[Visits]
 female            Female                        16.44
 self       Self-employed                       -44.82
 post      Post-secondary                       -18.42
 public  Public Insurance                        18.06,
    Age  Marginal Effect on log(E[Visits]) (%)
 0   30                                  -0.57
 1   50                                   2.84
 2   70                                   6.26,
 'A 1% increase in income changes expected visits by -0.0769%')

1)

FEMALE: The coefficient indicates that being female increases the expected number of doctor visits by 16.44%, holding all else constant. This aligns with expectations, as women often utilize healthcare services more frequently.

SELF (self-employed): The coefficient is negative. Being self-employed is associated with a 44.82% decrease in the expected number of doctor visits, holding all else constant. This may reflect less access to employer-provided health insurance or less time availability.

POST (post-secondary education): Having a post-secondary degree is associated with an 18.42% decrease in doctor visits. This could suggest that more educated individuals maintain better preventive care or are healthier overall.

PUBLIC: Being covered by public health insurance increases expected visits by 18.06%, possibly due to easier access to care or different utilization behavior compared to those with private insurance.
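
The percent changes above come from exponentiating the Poisson coefficients, 100 × (exp(b) − 1). As a minimal sketch, the snippet below backs the SELF coefficient out of the reported −44.82% and compares the exact conversion with the shortcut of reading 100 × b as a percent change. The helper names are illustrative and not part of the notebook, and the recovered coefficient is only approximate because the reported percentage is rounded.

```python
import math

def pct_change_from_coef(b):
    """Exact percent change in E[visits] when a 0/1 regressor switches on."""
    return 100.0 * (math.exp(b) - 1.0)

def coef_from_pct_change(pct):
    """Inverse of the conversion above: recover the coefficient."""
    return math.log(1.0 + pct / 100.0)

# Reported percent change for SELF (rounded, so the coefficient is approximate)
b_self = coef_from_pct_change(-44.82)

exact = pct_change_from_coef(b_self)   # reproduces -44.82 by construction
naive = 100.0 * b_self                 # shortcut; overstates large negative effects

print(f"coefficient ~ {b_self:.4f}")
print(f"exact: {exact:.2f}%, naive 100*b: {naive:.2f}%")
```

The gap between the two numbers is why the binary-variable effects are reported with the exponential transformation rather than as raw coefficients.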

2)

At age 30: One additional year of age changes expected doctor visits by about −0.57%, computed as 100 × (b_age + 2 × b_age2 × age).

At age 50: The marginal effect becomes positive, at about +2.84% per additional year.

At age 70: The effect grows further, to about +6.26% per additional year, so expected visits are U-shaped in age.
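
Because the marginal effect is linear in age, two of the reported values are enough to locate the age at which it changes sign. The sketch below works only from the rounded numbers in the output table, so the turning point is approximate. The variable names are illustrative.

```python
# Reported marginal effects (percent per year of age) from the output table; rounded
effect_30, effect_50 = -0.57, 2.84

# The effect is linear in age: effect(age) = 100 * (b_age + 2 * b_age2 * age)
slope = (effect_50 - effect_30) / (50 - 30)   # equals 100 * 2 * b_age2
intercept = effect_30 - slope * 30            # equals 100 * b_age

turning_age = -intercept / slope              # age where the marginal effect is zero
effect_70 = intercept + slope * 70            # consistency check against the reported 6.26

print(f"turning point ~ age {turning_age:.1f}")
print(f"implied effect at 70: {effect_70:.2f}%")
```

The implied effect at age 70 can differ from the table in the last digit because the inputs were already rounded to two decimals.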

3)

The coefficient on log income is −0.0769. Because income enters in logs and the Poisson model uses a log link, this coefficient is an elasticity: a 1% increase in income is associated with a 0.077% decrease in expected doctor visits.

The small negative elasticity suggests that higher-income individuals visit doctors slightly less often, possibly because of better overall health or different utilization patterns.
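
To make the elasticity reading concrete, the sketch below evaluates the exact change in expected visits for a few income increases, using the coefficient value quoted above. The function name and the chosen income changes are illustrative assumptions.

```python
import math

b_linc = -0.0769  # coefficient on log income, as reported in the text

def pct_change_visits(income_pct_increase, elasticity=b_linc):
    """Exact percent change in E[visits] for a given percent rise in income."""
    ratio = 1.0 + income_pct_increase / 100.0
    return 100.0 * (ratio ** elasticity - 1.0)

for pct in (1, 10, 100):
    print(f"{pct:>3}% more income -> {pct_change_visits(pct):.3f}% change in visits")
```

For small income changes the result is close to elasticity × percent change. For large changes, such as a doubling of income, the exact formula gives a smaller response than the linear approximation.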



In [None]:
#Problem 2

from scipy.stats import norm
from scipy.optimize import minimize

# Load and prepare data
cex = pd.read_csv("/content/cex5.csv")
X = cex[['income', 'smsa', 'advanced', 'college', 'older']]
X = sm.add_constant(X)
y = cex['appar']

# Linear regression (OLS)
ols_model = sm.OLS(y, X).fit()
ols_results = pd.DataFrame({
    "Variable": X.columns,
    "Coefficient": ols_model.params.round(4)
})

# Probit model
cex['shop'] = (cex['appar'] > 0).astype(int)
probit_model = sm.Probit(cex['shop'], X).fit()
probit_results = pd.DataFrame({
    "Variable": X.columns,
    "Coefficient": probit_model.params.round(4)
})

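# Tobit model log-likelihood (left-censored at zero)
def tobit_loglike(params):
    beta = params[:-1]
    sigma = np.exp(params[-1])  # optimize over log(sigma) so sigma stays positive
    xb = np.dot(X, beta)
    # norm.logpdf / norm.logcdf avoid log(0) when a density or probability underflows
    ll = np.where(y > 0,
                  norm.logpdf((y - xb) / sigma) - np.log(sigma),
                  norm.logcdf(-xb / sigma))
    return -np.sum(ll)

init_params = np.append(np.zeros(X.shape[1]), np.log(np.std(y)))
opt = minimize(tobit_loglike, init_params, method='BFGS')
b_hat = opt.x
# opt.hess_inv is BFGS's running approximation to the inverse Hessian,
# so these standard errors are approximate
se = np.sqrt(np.diag(opt.hess_inv))
sigma_hat = np.exp(b_hat[-1])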

# Tobit coefficient table
tobit_results = pd.DataFrame({
    "Variable": X.columns,
    "Coefficient": b_hat[:-1].round(4),
    "Std. Error": se[:-1].round(4)
})

# Prediction using Tobit
# Regressor order: const, income (in $100 units), smsa, advanced, college, older
x1 = np.array([1, 65000 / 100, 1, 1, 0, 0])
x2 = np.array([1, 125000 / 100, 1, 1, 0, 0])
mu1 = np.dot(x1, b_hat[:-1])
mu2 = np.dot(x2, b_hat[:-1])
# Unconditional mean: E[y] = P(y > 0) * E[y | y > 0]
Ey1 = norm.cdf(mu1 / sigma_hat) * (mu1 + sigma_hat * norm.pdf(mu1 / sigma_hat) / norm.cdf(mu1 / sigma_hat))
Ey2 = norm.cdf(mu2 / sigma_hat) * (mu2 + sigma_hat * norm.pdf(mu2 / sigma_hat) / norm.cdf(mu2 / sigma_hat))

predictions = pd.DataFrame({
    "Scenario": ["$65k income, urban, advanced degree", "$125k income, urban, advanced degree"],
    "Predicted Monthly Apparel Spending": [round(Ey1, 2), round(Ey2, 2)]
})

(ols_results, probit_results, tobit_results, predictions)


Optimization terminated successfully.
         Current function value: 0.619399
         Iterations 5


(          Variable  Coefficient
 const        const       2.8156
 income      income       0.1656
 smsa          smsa       6.3788
 advanced  advanced      12.9188
 college    college       2.6626
 older        older      -5.7041,
           Variable  Coefficient
 const        const      -0.0429
 income      income       0.0022
 smsa          smsa       0.1781
 advanced  advanced       0.4554
 college    college       0.3171
 older        older      -0.3200,
    Variable  Coefficient  Std. Error
 0     const     -24.7875      4.2716
 1    income       0.2206      0.0329
 2      smsa      11.1638      4.0074
 3  advanced      22.5224      3.4200
 4   college       9.7091      2.9620
 5     older     -14.0623      0.1843,
                                Scenario  Predicted Monthly Apparel Spending
 0   $65k income, urban, advanced degree                              152.40
 1  $125k income, urban, advanced degree                              284.65)

1)
INCOME: Each additional $100 of household income increases apparel spending by $0.17 per person, on average.

SMSA: Living in an urban area is associated with $6.38 more in apparel spending, likely reflecting access to more shopping or fashion-oriented culture.

ADVANCED: Households where someone holds an advanced degree spend about $12.92 more per person.

COLLEGE: Households with a college degree holder spend about $2.66 more, a much smaller difference than the one for advanced degrees.

OLDER: Households with someone 65+ spend about $5.70 less, possibly due to different spending priorities or fixed incomes.

2)
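INCOME, SMSA, ADVANCED, and COLLEGE all have positive coefficients, so each raises the probability of spending on apparel. The magnitudes are on the probit index scale and are not themselves changes in probability.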

OLDER has a negative effect: households with elderly members are less likely to purchase apparel.

These signs are consistent with the OLS model, supporting the same relationships.
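
Since probit coefficients act on the index inside the normal CDF, their effect on the probability depends on the other regressors. The sketch below plugs the reported probit coefficients into the normal CDF for one hypothetical household and shows the change in the probability of any apparel spending when OLDER switches from 0 to 1. The household profile and helper names are assumptions chosen for illustration; the income value follows the notebook's own scaling of 65000 / 100.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Probit coefficients copied from the output table
coef = {"const": -0.0429, "income": 0.0022, "smsa": 0.1781,
        "advanced": 0.4554, "college": 0.3171, "older": -0.3200}

def prob_shop(household):
    """P(apparel spending > 0) for a dict of regressor values."""
    index = coef["const"] + sum(coef[k] * v for k, v in household.items())
    return normal_cdf(index)

# Hypothetical profile: income = 650 (the notebook's 65000 / 100 scaling), urban
base = {"income": 650, "smsa": 1, "advanced": 0, "college": 0, "older": 0}
older = dict(base, older=1)

p_base, p_older = prob_shop(base), prob_shop(older)
print(f"P(shop) without 65+ member: {p_base:.3f}")
print(f"P(shop) with 65+ member:    {p_older:.3f}")
print(f"change in probability:      {p_older - p_base:.3f}")
```

The change in probability is much smaller in magnitude than the OLDER coefficient of −0.32, because this household already sits far into the upper tail of the normal distribution.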

3)
Interpretation of Key Coefficients:
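Income: Positive effect on apparel spending; the coefficient of 0.2206 means each additional $100 of income raises latent (desired) apparel spending by about $0.22.

SMSA and ADVANCED: Both show large positive effects (11.16 and 22.52), each more than twice its reported standard error, suggesting urban and highly educated households are bigger apparel spenders.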

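OLDER: Negative (−14.06) and significant, aligning with earlier models. Its reported standard error (0.18) is far smaller than the others and comes from the BFGS approximation to the inverse Hessian, so its precision should be treated with caution.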

Predictions for Apparel Spending:
A household living in an urban area, with an income of $65,000, an advanced degree holder, and no member aged 65 or above, is predicted to spend about $152.40/month.

The same household with an income of $125,000 is predicted to spend about $284.65/month.
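
These predictions use the Tobit unconditional mean, E[y] = Φ(μ/σ)·μ + σ·φ(μ/σ), where μ is the latent index x'β. The sketch below rebuilds μ for the $65k household from the reported coefficients and evaluates the formula. The notebook output does not print sigma_hat, so the value of sigma used here is a placeholder assumption, and the helper names are illustrative.

```python
import math

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_expected(mu, sigma):
    """Unconditional mean E[y] for a Tobit model censored at zero."""
    z = mu / sigma
    return normal_cdf(z) * mu + sigma * normal_pdf(z)

# Tobit coefficients from the table: const, income, smsa, advanced, college, older
beta = [-24.7875, 0.2206, 11.1638, 22.5224, 9.7091, -14.0623]
x_65k = [1, 650, 1, 1, 0, 0]   # same regressor values as x1 in the notebook
mu = sum(b * x for b, x in zip(beta, x_65k))   # latent index x'beta

sigma = 60.0   # placeholder assumption: sigma_hat is not shown in the output
print(f"latent index: {mu:.2f}")
print(f"E[y] with assumed sigma={sigma}: {tobit_expected(mu, sigma):.2f}")
```

The expected value always lies above the latent index, since censoring at zero removes the negative part of the latent distribution. When μ is large relative to σ, as it is for this household, the two are nearly equal.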

