# Anton Melnychuk ECON 3385 - Problem Set 2

January 26th, 2026

In [17]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

whale_data = pd.read_csv('whales_demand.csv')
whale_data.head()

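|   | year | price_sperm | price_oil | price_sperm_real | price_oil_real | sperm | oil | shipwrecks |
|---|------|-------------|-----------|------------------|----------------|-------|-----|------------|
| 0 | 1804 | 1.4 | 0.5 | 1.11 | 0.4 | 8.636 | 114.1065 | 0.064167 |
| 1 | 1805 | 0.96 | 0.5 | 0.68 | 0.35 | 6.39 | 42.17 | 0.06609 |
| 2 | 1806 | 0.8 | 0.5 | 0.58 | 0.37 | 5.313 | 86.737 | 0.067864 |
| 3 | 1807 | 1.0 | 0.5 | 0.77 | 0.38 | 0.27 | 65.12 | 0.06441 |
| 4 | 1808 | 0.8 | 0.44 | 0.7 | 0.38 | 8.8 | 83.75 | 0.064172 |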


## Question 1

**a)** For OLS to recover a consistent estimate of $\beta_1$ when regressing equilibrium quantity on price, price must be uncorrelated with the error terms:
- $Cov(\epsilon_y^D, P_y^{oil}) = 0$ (demand shocks uncorrelated with price)
- $Cov(\epsilon_y^S, P_y^{oil}) = 0$ (supply shocks uncorrelated with price)

For the demand equation, the first condition is the one that matters: if price moves independently of demand shocks, the observed co-movement of price and quantity traces out the demand curve. Under that condition (and with mean-zero errors), the same regression also recovers a consistent estimate of $\beta_0$.

**b)** Solving for equilibrium:

$$Q_{y}^{oil,D} = Q_{y}^{oil,S}$$

$$\beta_{0} + \beta_{1}P_{y}^{oil} + \epsilon_{y}^{D} = \gamma_{0} + \gamma_{1}P_{y}^{oil} + \epsilon_{y}^{S}$$

Rearranging:

$$(\beta_{1} - \gamma_{1})P_{y}^{oil} = \gamma_{0} - \beta_{0} + \epsilon_{y}^{S} - \epsilon_{y}^{D}$$

$$P_{y}^{oil} = \frac{\gamma_{0} - \beta_{0} + \epsilon_{y}^{S} - \epsilon_{y}^{D}}{\beta_{1} - \gamma_{1}}$$

Since $P_{y}^{oil}$ is a function of both $\epsilon_{y}^{S}$ and $\epsilon_{y}^{D}$, it is correlated with both error terms, so the conditions in (a) fail: price is endogenous.
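
To make the endogeneity concrete, here is a minimal simulation sketch. All parameter values are hypothetical (they are not estimates from the whale data), and the shocks are assumed independent and normal. It draws demand and supply shocks, computes the equilibrium price from the formula above, and compares the OLS slope with the true $\beta_1$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical structural parameters (assumptions, not whale-data estimates).
beta0, beta1 = 500.0, -600.0   # demand: Q = beta0 + beta1 * P + eps_d
gamma0, gamma1 = 50.0, 400.0   # supply: Q = gamma0 + gamma1 * P + eps_s

eps_d = rng.normal(0.0, 20.0, n)  # demand shocks
eps_s = rng.normal(0.0, 20.0, n)  # supply shocks

# Equilibrium price from the derivation above, then quantity from demand.
price = (gamma0 - beta0 + eps_s - eps_d) / (beta1 - gamma1)
quantity = beta0 + beta1 * price + eps_d

# OLS slope of quantity on price is Cov(Q, P) / Var(P).
ols_slope = np.cov(quantity, price)[0, 1] / np.var(price, ddof=1)
cov_price_demand_shock = np.cov(price, eps_d)[0, 1]

print(f"true beta1 = {beta1}, OLS slope = {ols_slope:.1f}")
print(f"Cov(P, eps_d) = {cov_price_demand_shock:.3f}")
```

Under these assumed values the OLS slope lands far from $\beta_1$, because price is positively correlated with the demand shock; OLS picks up a mix of the demand and supply slopes.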

## Question 2

Now we include shipwrecks ($Z_y$) in the supply equation:

**Demand:** $Q_{y}^{oil,D} = \beta_{0} + \beta_{1}P_{y}^{oil} + \epsilon_{y}^{D}$

**Supply:** $Q_{y}^{oil,S} = \gamma_{0} + \gamma_{1}P_{y}^{oil} + \gamma_{2}Z_{y} + \epsilon_{y}^{S}$

where $Z_{y}$ represents shipwrecks and $\epsilon_{y}^{S}$ captures other supply shifters.

Solving for equilibrium price:

$$P_{y}^{oil} = \frac{\gamma_{0} + \gamma_{2}Z_{y} - \beta_{0} + \epsilon_{y}^{S} - \epsilon_{y}^{D}}{\beta_{1} - \gamma_{1}}$$

Now price depends on shipwrecks $Z_y$ in addition to the error terms.
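
As a side derivation (not required by the question), substituting the equilibrium price back into the demand equation gives the reduced form for quantity:

$$Q_{y}^{oil} = \frac{\beta_{1}\gamma_{0} - \beta_{0}\gamma_{1} + \beta_{1}\gamma_{2}Z_{y} + \beta_{1}\epsilon_{y}^{S} - \gamma_{1}\epsilon_{y}^{D}}{\beta_{1} - \gamma_{1}}$$

The coefficient on $Z_y$ is $\frac{\beta_1 \gamma_2}{\beta_1 - \gamma_1}$ in the quantity equation and $\frac{\gamma_2}{\beta_1 - \gamma_1}$ in the price equation, so their ratio is exactly $\beta_1$, provided $\gamma_2 \neq 0$. This is why variation in shipwrecks can be used to recover the demand slope.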

## Question 3

For 2SLS to recover consistent estimates, two conditions must hold:

1. Exclusion condition: $Cov(\epsilon_{y}^{D}, Z_{y}) = 0$
   - Shipwrecks must not affect demand directly
   - Shipwrecks shift only supply

2. Relevance condition: $Cov(P_{y}^{oil}, Z_{y}) \neq 0$
   - Shipwrecks must affect price
   - From the equilibrium price equation, $Z_y$ enters through $\gamma_2$

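Why shipwrecks satisfy these:

$Z_{y}$ does not appear in the demand equation, so shipwrecks affect quantity demanded only through price. $Cov(\epsilon_{y}^{D}, Z_{y}) = 0$ then holds as long as shipwrecks are unrelated to unobserved demand shocks, which is an assumption of the model rather than something the equations prove. And since $P_{y}^{oil}$ depends on $Z_{y}$ through the $\gamma_2 Z_y$ term in the equilibrium price equation, shipwrecks affect price whenever $\gamma_2 \neq 0$.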
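
The two conditions can be illustrated with a small simulation sketch. The parameter values are hypothetical, and the instrument is drawn independently of the demand shock by construction, which is exactly the exclusion assumption. With a single instrument, the IV estimate of the slope is $Cov(Q, Z) / Cov(P, Z)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical parameters (assumptions for illustration only).
beta0, beta1 = 500.0, -600.0                    # demand
gamma0, gamma1, gamma2 = 240.0, 400.0, -2500.0  # supply; shipwrecks cut supply

z = rng.uniform(0.03, 0.12, n)    # instrument, independent of eps_d by design
eps_d = rng.normal(0.0, 20.0, n)
eps_s = rng.normal(0.0, 20.0, n)

# Equilibrium price with the instrument in the supply equation.
price = (gamma0 + gamma2 * z - beta0 + eps_s - eps_d) / (beta1 - gamma1)
quantity = beta0 + beta1 * price + eps_d

relevance = np.cov(price, z)[0, 1]   # should be clearly non-zero
exclusion = np.cov(eps_d, z)[0, 1]   # should be near zero (unobservable in real data)
iv_slope = np.cov(quantity, z)[0, 1] / relevance

print(f"Cov(P, Z) = {relevance:.5f}, Cov(eps_d, Z) = {exclusion:.5f}")
print(f"true beta1 = {beta1}, IV slope = {iv_slope:.1f}")
```

In real data only the relevance condition can be checked directly (via the first stage); the exclusion condition involves the unobserved demand shock and has to be argued.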

## Question 4

First stage: regress the real price on shipwrecks to obtain the predicted price $\hat{P}_{y}^{oil}$:

In [23]:
stage_1 = smf.ols('price_oil_real ~ shipwrecks', data=whale_data).fit()
print(stage_1.summary())

                            OLS Regression Results                            
Dep. Variable:         price_oil_real   R-squared:                       0.266
Model:                            OLS   Adj. R-squared:                  0.259
Method:                 Least Squares   F-statistic:                     39.08
Date:                Mon, 26 Jan 2026   Prob (F-statistic):           8.30e-09
Time:                        12:35:59   Log-Likelihood:                 92.431
No. Observations:                 110   AIC:                            -180.9
Df Residuals:                     108   BIC:                            -175.5
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.2619      0.031      8.402      0.0

**Interpretation:**

- **Intercept**: 0.2619 is the expected real price when shipwrecks = 0.
- **Shipwrecks coefficient**: 2.524 is the change in real price per 1-unit increase in shipwrecks. Because shipwrecks takes values between about 0.03 and 0.12 in the sample, a 1-unit increase would mean 100% of ships lost, which is far outside the data; a more realistic 0.01 increase raises the predicted real price by about 0.025.
- **Positive coefficient** makes sense: more shipwrecks -> fewer ships -> less supply -> higher prices.
- **F-statistic**: 39.08, well above the rule-of-thumb threshold of 10 for a strong instrument -> good.
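
As a quick sanity check on the instrument-strength numbers, the overall F-statistic of a regression with an intercept can be recomputed from $R^2$ and the sample size. The helper name below is hypothetical; the inputs are the rounded values printed in the regression summaries, so the result only matches the reported F up to rounding.

```python
def f_stat_from_r2(r_squared: float, n_obs: int, n_regressors: int = 1) -> float:
    """Overall F-statistic of an OLS regression that includes an intercept."""
    df_resid = n_obs - n_regressors - 1
    return (r_squared / n_regressors) / ((1.0 - r_squared) / df_resid)

# Rounded R-squared values taken from the first-stage summaries.
first_stage_f_oil = f_stat_from_r2(0.266, 110)    # reported F: 39.08
first_stage_f_sperm = f_stat_from_r2(0.169, 110)  # reported F: 22.00

print(f"oil: {first_stage_f_oil:.2f}, sperm: {first_stage_f_sperm:.2f}")
```

Both values are close to the reported F-statistics and above 10.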

## Question 5

Second stage: we regress quantity on the predicted price from the first stage.

In [19]:
whale_data['p_hat_oil'] = stage_1.predict(whale_data)
stage_2 = smf.ols('oil ~ p_hat_oil', data=whale_data).fit()
print(stage_2.summary())

                            OLS Regression Results                            
Dep. Variable:                    oil   R-squared:                       0.055
Model:                            OLS   Adj. R-squared:                  0.046
Method:                 Least Squares   F-statistic:                     6.302
Date:                Mon, 26 Jan 2026   Prob (F-statistic):             0.0135
Time:                        12:13:36   Log-Likelihood:                -718.68
No. Observations:                 110   AIC:                             1441.
Df Residuals:                     108   BIC:                             1447.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    504.6173    114.904      4.392      0.0

In [24]:
whale_data.describe()

|       | year | price_sperm | price_oil | price_sperm_real | price_oil_real | sperm | oil | shipwrecks | p_hat_oil | p_hat_sperm |
|-------|------|-------------|-----------|------------------|----------------|-------|-----|------------|-----------|-------------|
| count | 110.0 | 110.0 | 110.0 | 110.0 | 110.0 | 110.0 | 110.0 | 110.0 | 110.0 | 110.0 |
| mean | 1858.5 | 0.96532 | 0.480067 | 0.905273 | 0.446364 | 60.928918 | 218.971159 | 0.073098 | 0.446364 | 0.905273 |
| std | 31.898276 | 0.426035 | 0.225719 | 0.295013 | 0.122427 | 50.578117 | 171.985297 | 0.025007 | 0.063107 | 0.121369 |
| min | 1804.0 | 0.4 | 0.235 | 0.42 | 0.25 | 0.27 | 12.28 | 0.032195 | 0.343142 | 0.706753 |
| 25% | 1831.25 | 0.65125 | 0.338125 | 0.68 | 0.35 | 17.794 | 80.01325 | 0.051755 | 0.392504 | 0.801688 |
| 50% | 1858.5 | 0.85 | 0.3995 | 0.84 | 0.43 | 45.0365 | 144.2465 | 0.067925 | 0.433309 | 0.880166 |
| 75% | 1885.75 | 1.245 | 0.53875 | 1.1 | 0.51 | 94.8825 | 349.8365 | 0.095575 | 0.503086 | 1.014363 |
| max | 1913.0 | 2.55 | 1.45 | 1.61 | 0.78 | 186.219 | 594.675 | 0.11621 | 0.555157 | 1.114509 |


## Question 6

Short summary:

| Parameter | Coefficient | Standard Error |
|-----------|-------------|----------------|
| $\beta_0$ (Intercept) | 504.62 | 114.90 |
| $\beta_1$ (Price) | -639.94 | 254.91 |


- $\beta_1 = -639.94$ is negative, which is the correct sign for a demand curve: higher prices lead to lower quantities demanded, consistent with the law of demand.

- A 1-unit increase in the real price (in the units of the price variable) reduces quantity demanded by approximately 640 units. Since the real price only ranges from 0.25 to 0.78 in the sample, a more realistic 0.1 increase reduces quantity demanded by about 64 units, a substantial drop relative to the mean quantity of about 219.

- $\beta_1$ is statistically significant at the 5% level ($p \approx 0.014$), so we can reject the null hypothesis that price has no effect on quantity. This is evidence that the negative relationship between price and quantity is not due to random chance. One caveat: these standard errors come from a manually run second stage, which treats the predicted price as if it were observed data, so they are not the correct 2SLS standard errors.

- The low $R^2 = 0.055$ (the instrumented price explains 5.5% of the variation in quantity) is expected, since the focus is on causal identification rather than model fit. Provided the exclusion and relevance conditions hold, the 2SLS estimate identifies the slope of the demand curve.
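
One point worth illustrating: when the second stage is run by hand, the reported standard errors are based on residuals computed with the predicted price, whereas 2SLS standard errors use residuals computed with the actual price. The sketch below is a minimal NumPy implementation under homoskedasticity, run on synthetic data with hypothetical parameters (not the whale data). The function name and the degrees-of-freedom convention are assumptions; packaged IV routines may use slightly different conventions.

```python
import numpy as np

def two_stage_least_squares(y, x_endog, z):
    """2SLS with one endogenous regressor, one instrument, and an intercept.

    Returns (coefficients, naive_se, corrected_se).
    """
    n = len(y)
    X = np.column_stack([np.ones(n), x_endog])  # structural regressors
    Z = np.column_stack([np.ones(n), z])        # instruments

    # First stage: project the regressors onto the instruments.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]

    # Second stage: regress y on the fitted regressors.
    coef = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    xtx_inv = np.linalg.inv(X_hat.T @ X_hat)
    dof = n - X.shape[1]

    # Naive residuals use the fitted price; corrected ones use the actual price.
    naive_resid = y - X_hat @ coef
    correct_resid = y - X @ coef
    naive_se = np.sqrt(np.diag(xtx_inv) * (naive_resid @ naive_resid) / dof)
    corrected_se = np.sqrt(np.diag(xtx_inv) * (correct_resid @ correct_resid) / dof)
    return coef, naive_se, corrected_se

# Synthetic data with hypothetical parameters (assumptions for illustration).
rng = np.random.default_rng(2)
n = 5_000
beta0, beta1 = 500.0, -600.0
gamma0, gamma1, gamma2 = 240.0, 400.0, -2500.0
z = rng.uniform(0.03, 0.12, n)
eps_d = rng.normal(0.0, 20.0, n)
eps_s = rng.normal(0.0, 20.0, n)
price = (gamma0 + gamma2 * z - beta0 + eps_s - eps_d) / (beta1 - gamma1)
quantity = beta0 + beta1 * price + eps_d

coef, naive_se, corrected_se = two_stage_least_squares(quantity, price, z)
print("slope:", coef[1])
print("naive SE:", naive_se[1], "corrected SE:", corrected_se[1])
```

The point estimates are identical to the manual two-step procedure; only the standard errors change, so the coefficients in the table above are unaffected while the significance statements depend on which standard errors are used.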

## Question 7

Now we run 2SLS on sperm whale oil.

In [25]:
# Repeat Q4-Q6 for sperm whale oil
first_stage_sperm = smf.ols('price_sperm_real ~ shipwrecks', data=whale_data).fit()

whale_data['p_hat_sperm'] = first_stage_sperm.predict(whale_data)

second_stage_sperm = smf.ols('sperm ~ p_hat_sperm', data=whale_data).fit()

print("First Stage (Sperm):")
print(first_stage_sperm.summary())
print("\nSecond Stage (Sperm):")
print(second_stage_sperm.summary())

First Stage (Sperm):
                            OLS Regression Results                            
Dep. Variable:       price_sperm_real   R-squared:                       0.169
Model:                            OLS   Adj. R-squared:                  0.162
Method:                 Least Squares   F-statistic:                     22.00
Date:                Mon, 26 Jan 2026   Prob (F-statistic):           8.01e-06
Time:                        12:56:52   Log-Likelihood:                -11.101
No. Observations:                 110   AIC:                             26.20
Df Residuals:                     108   BIC:                             31.60
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.5505      0.08

Short summary:

| Parameter | Coefficient | Standard Error |
|-----------|-------------|----------------|
| $\beta_0$ (Intercept) | 182.04 | 34.68 |
| $\beta_1$ (Price) | -133.79 | 37.98 |

- $\beta_1 = -133.79$ is negative and significant at the 1% level ($p < 0.01$), consistent with the law of demand: higher prices lead to lower quantities demanded.

- A 1-unit increase in the real price of sperm whale oil reduces quantity demanded by approximately 134 units.

- Evaluated at the sample means ($\beta_1 \bar{P} / \bar{Q}$), the implied price elasticity is about -2.0, larger in magnitude than that of regular oil (about -1.3), suggesting that sperm oil consumers are more responsive to price changes.

- The $R^2 = 0.103$ indicates that roughly 10% of the variation in sperm whale oil quantity is accounted for by the instrumented price, nearly double the $R^2$ for regular oil (5.5%).


The coefficient is smaller in magnitude than for regular oil, but the quantities are on different scales (mean quantity ≈ 219 for oil vs ≈ 61 for sperm oil), so the slopes are not directly comparable. In elasticity terms, sperm oil demand is the more price-sensitive of the two. This would make sense if sperm oil is a premium good with more substitutes available.
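
The elasticity comparison can be reproduced from the numbers already reported: the 2SLS slopes from the summary tables and the sample means from `whale_data.describe()`. The helper name is hypothetical, and evaluating the elasticity at the sample means is one common convention among several.

```python
def elasticity_at_means(slope: float, mean_price: float, mean_quantity: float) -> float:
    """Point elasticity of a linear demand curve evaluated at the sample means."""
    return slope * mean_price / mean_quantity

# Slopes from the second-stage tables; means from the describe() output.
oil_elasticity = elasticity_at_means(-639.94, 0.446364, 218.971159)
sperm_elasticity = elasticity_at_means(-133.79, 0.905273, 60.928918)

print(f"oil: {oil_elasticity:.2f}, sperm: {sperm_elasticity:.2f}")
# oil: -1.30, sperm: -1.99
```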

If non-sperm and sperm oil are substitutes, an increase in sperm price should increase demand for regular oil (positive cross-price effect). To model this, we could extend the demand equation:

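$$Q^{oil,D}_{y} = \beta_{0} + \beta_{1}P^{oil}_{y} + \beta_{2}P^{sperm}_{y} + \epsilon^{D}_{y}$$

Because both prices are endogenous, this would require instrumenting both of them, which takes at least two supply shifters that move the two prices differently; shipwrecks alone would not be enough.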
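
A minimal sketch of what that extended estimation could look like, on synthetic data only. The two instruments `z1` and `z2` are hypothetical stand-ins for two supply shifters (the dataset here contains only shipwrecks), and all coefficients are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Two hypothetical supply shifters and an unobserved demand shock.
z1 = rng.normal(size=n)
z2 = rng.normal(size=n)
demand_shock = rng.normal(size=n)

# Both prices respond to both instruments and to the demand shock (endogeneity).
p_oil = 1.0 * z1 + 0.3 * z2 + 0.5 * demand_shock + rng.normal(size=n)
p_sperm = 0.2 * z1 + 1.0 * z2 + 0.5 * demand_shock + rng.normal(size=n)

# Assumed true demand parameters: intercept, own-price, cross-price.
beta = np.array([10.0, -2.0, 1.0])
q_oil = beta[0] + beta[1] * p_oil + beta[2] * p_sperm + demand_shock

X = np.column_stack([np.ones(n), p_oil, p_sperm])  # structural regressors
Z = np.column_stack([np.ones(n), z1, z2])          # instruments

# First stage for both prices at once, then second stage on the fitted values.
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
beta_2sls = np.linalg.lstsq(X_hat, q_oil, rcond=None)[0]
beta_ols = np.linalg.lstsq(X, q_oil, rcond=None)[0]

print("2SLS:", np.round(beta_2sls, 2))
print("OLS: ", np.round(beta_ols, 2))
```

A positive estimate of $\beta_2$ would be the evidence of substitution described above. The sketch only works because the two instruments shift the two prices in different proportions; with a single instrument the two price effects could not be separated.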