# Open Midterm 2

## FINM 36700 - 2025

### UChicago Financial Mathematics

* Mark Hendricks
* hendricks@uchicago.edu

***

Jacopo Michelacci

12570213

## Scoring

| Problem | Points |
|---------|--------|
| 1       | 30     |
| 2       | 20     |
| 3       | 20     |
| 4       | 30     |

**Numbered problems are worth 5pts unless specified otherwise.**

## Submission

You should submit a **single** Jupyter notebook (`.ipynb`) file containing all of your code and answers to Canvas. 

Note: If any other files are required to run your notebook, please include them **and only them** in a single `.zip` file.

## AI DISCLOSURE

I used AI as a support tool throughout this exam to assist with coding, organization, and language fluency. I provided the data, ran the analyses, and verified all results, while AI helped refine the structure of the notebook and suggest code implementations. The overall reasoning and conclusions were developed by me, with AI serving mainly as a technical and explanatory aid.

## Data

**All data files are found at the course web-book.**

https://markhendricks.github.io/finm-portfolio/.

The exam uses the data found in `commodity_factors.xlsx`
* sheet `factors`
* sheet `returns`

Both tabs contain **daily** returns for a set of commodity futures from `January 2010` to `October 2025`.
* approximate 252 observations per year for purposes of annualization

Factors are
* **LVL**: A level factor of commodity data
* **HMS**: Hard Minus Soft commodities
* **IMO**: Input Minus Output commodities

Returns are
* various commodity futures across energy, metals, livestock, and agriculture

In [164]:
import pandas as pd
import numpy as np

In [165]:
# Local copy of the course data file; both sheets are read from the same workbook
DATA_PATH = '/Users/jacopomichelacci/FINM_32500/data/commodity_factors.xlsx'
factors = pd.read_excel(DATA_PATH, sheet_name='factors', index_col=0, parse_dates=True)
returns = pd.read_excel(DATA_PATH, sheet_name='returns', index_col=0, parse_dates=True)

In [166]:
factors

Date,LVL,HMS,IMO
2010-01-05,0.001741,-0.003471,0.006597
2010-01-06,0.012889,0.016399,-0.020703
2010-01-07,-0.011484,0.011919,-0.001153
2010-01-08,0.000436,0.007519,0.000570
2010-01-11,-0.007377,0.006306,-0.000623
...,...,...,...
2025-10-27,-0.001392,-0.004590,0.007534
2025-10-28,-0.002979,-0.012895,0.002998
2025-10-29,0.008762,0.006220,0.006162
2025-10-30,0.019519,0.027967,-0.011074


In [167]:
returns

Date,CL,GC,HO,LE,NG,PL,RB,SB,SI,ZC,ZL,ZM,ZS
2010-01-05,0.003190,0.000358,0.001643,0.011127,-0.041978,0.008897,0.009789,0.000724,0.019553,0.000597,-0.004646,0.010759,0.002620
2010-01-06,0.017244,0.015920,0.004148,-0.004344,0.065993,0.013980,0.005459,0.027858,0.021484,0.007164,-0.000983,-0.004696,-0.001663
2010-01-07,-0.006251,-0.002465,-0.008896,-0.000291,-0.033783,0.000515,-0.000796,-0.014432,0.009360,-0.010077,-0.016720,-0.034287,-0.031176
2010-01-08,0.001089,0.004501,0.007648,-0.001164,-0.009817,0.007469,0.009555,-0.016786,0.006818,0.013174,-0.011503,-0.000652,-0.004667
2010-01-11,-0.002779,0.010982,-0.009181,-0.009030,-0.051313,0.015148,-0.005846,-0.028333,0.012190,-0.001182,-0.008601,-0.006845,-0.011106
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2025-10-27,-0.003089,-0.028288,0.013732,-0.021070,0.041768,-0.009725,-0.001196,-0.034068,-0.037518,0.012995,0.009946,0.013941,0.024478
2025-10-28,-0.018920,-0.008921,-0.020073,-0.005790,-0.028181,-0.000887,0.002499,-0.006224,0.012091,0.007580,-0.010045,0.027834,0.010307
2025-10-29,0.005486,0.004412,0.015541,0.017143,0.009268,0.009068,0.025192,0.003479,0.012647,0.004630,-0.001990,0.007178,0.001855
2025-10-30,0.001488,0.004418,0.014726,0.016746,0.171801,0.010683,0.015048,-0.009709,0.014815,-0.008641,-0.010167,0.022352,0.010183


# 1. Commodity Returns and Factors

### 1.1 Factor Summary Statistics

For each of the three factors, report only the following summary statistics (rounded to at least 6 decimal places):
- Annualized Mean Return
- Annualized Volatility
- Annualized Sharpe Ratio

In [168]:
# Annualization constants
trading_days = 252

# Summary stats
summary = pd.DataFrame({
    'Ann. Mean': factors.mean() * trading_days,
    'Ann. Vol': factors.std() * np.sqrt(trading_days),
})
summary['Ann. Sharpe'] = summary['Ann. Mean'] / summary['Ann. Vol']

# Round to 6 decimals
summary = summary.round(6)
print(summary)

     Ann. Mean  Ann. Vol  Ann. Sharpe
LVL   0.064439  0.154613     0.416775
HMS   0.008842  0.226259     0.039078
IMO   0.007281  0.153281     0.047500


### 1.2 Factor Correlations

Calculate and report the correlation matrix between the three factors (rounded to at least 6 decimal places).

In [169]:
corr_matrix = factors.corr().round(6)
print(corr_matrix)

          LVL       HMS       IMO
LVL  1.000000  0.400202  0.010439
HMS  0.400202  1.000000 -0.071747
IMO  0.010439 -0.071747  1.000000


### 1.3 Interpretation

Does the factor construction make sense given the correlations you observe?


Yes, the correlations make sense. LVL and HMS are moderately correlated (0.40), suggesting some shared exposure (e.g., common risk drivers such as the economic cycle). IMO is almost uncorrelated with both (0.01 with LVL, -0.07 with HMS), implying it captures a distinct source of variation.

### 1.4 Tangency Portfolio Weights

Build a tangency portfolio using the three factors as assets.

Report the weights of each factor in the tangency portfolio, rounded to at least 6 decimal places. You may assume that the factors use *excess* return.

In [170]:
mu = factors.mean() * 252                     # annualized mean (excess returns)
Sigma = factors.cov() * 252                   # annualized covariance matrix

# Tangency portfolio (no risk-free rate)
w_tan = np.linalg.inv(Sigma) @ mu
w_tan = w_tan / w_tan.sum()                   # normalize weights to sum to 1

w_tan = pd.Series(w_tan, index=factors.columns).round(6)
print(w_tan)

LVL    1.171917
HMS   -0.250927
IMO    0.079011
dtype: float64


### 1.5 Interpretation

What do the tangency portfolio weights suggest about the relative importance of each factor?


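The tangency portfolio assigns the largest weight to LVL (1.17), making it the dominant contributor to risk-adjusted returns; this is consistent with LVL having by far the highest stand-alone Sharpe ratio (0.42). HMS receives a negative weight (-0.25): its own Sharpe ratio is close to zero (0.04) and it is positively correlated with LVL (0.40), so shorting it mainly hedges the LVL position. IMO (0.08) earns a small positive weight as a nearly uncorrelated diversifier.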
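As a cross-check, the weights above can be recovered from the rounded statistics reported in 1.1 and 1.2 alone, since the covariance matrix is the correlation matrix scaled by the volatilities. The sketch below hard-codes those reported numbers, so it should agree with the full-sample result only up to rounding. It also recomputes the weights with the LVL-HMS correlation set to zero, a hypothetical scenario used purely to illustrate where the short HMS position comes from. The helper `tangency_weights` is a name introduced here for illustration.

```python
import numpy as np

# Rounded inputs copied from the outputs of 1.1 and 1.2 (order: LVL, HMS, IMO)
mu = np.array([0.064439, 0.008842, 0.007281])    # annualized means
vol = np.array([0.154613, 0.226259, 0.153281])   # annualized volatilities
corr = np.array([
    [1.000000,  0.400202,  0.010439],
    [0.400202,  1.000000, -0.071747],
    [0.010439, -0.071747,  1.000000],
])

def tangency_weights(mu, vol, corr):
    """Return Sigma^{-1} mu rescaled to sum to one."""
    sigma = corr * np.outer(vol, vol)     # covariance = D R D
    raw = np.linalg.solve(sigma, mu)      # avoids forming the explicit inverse
    return raw / raw.sum()

w = tangency_weights(mu, vol, corr)

# Hypothetical scenario: same means and vols, but LVL and HMS uncorrelated
corr_no_link = corr.copy()
corr_no_link[0, 1] = corr_no_link[1, 0] = 0.0
w_no_link = tangency_weights(mu, vol, corr_no_link)

print(np.round(w, 4))          # compare with the weights reported in 1.4
print(np.round(w_no_link, 4))  # HMS weight under the zero-correlation scenario
```

With the correlation removed, HMS receives a small positive weight, which suggests the short position in 1.4 is driven mainly by its co-movement with LVL rather than by a view against HMS itself.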

### 1.6. 

Estimate an autoregression of the factor `LVL`...

$$r_t = \gamma + \rho\, r_{t-1} + \epsilon_t$$

Only report $\rho$ (rounded to at least 6 decimal places).

Does the `LVL` factor exhibit momentum?

In [171]:
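lvl = factors['LVL'].dropna()
# Lag-1 sample correlation, used here as the estimate of rho: the OLS slope equals
# this correlation times std(r_t) / std(r_{t-1}), a ratio that is essentially 1 here
rho = np.corrcoef(lvl[1:], lvl[:-1])[0,1]
print(round(rho, 6))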


0.017041


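No. The estimate $\rho \approx 0.017$ is very close to zero, so the `LVL` factor does not exhibit meaningful momentum: yesterday's return says almost nothing about today's.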
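The cell for 1.6 reports the lag-1 correlation rather than a fitted regression slope. The two are linked by slope = correlation × std($r_t$) / std($r_{t-1}$), so they differ only when the two standard deviations differ. The sketch below illustrates this on a simulated AR(1) series; the series, `rho_true`, and `n` are assumptions chosen for illustration and are not the exam data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated AR(1) series; rho_true and n are illustrative assumptions
rho_true, n = 0.3, 4000
r = np.zeros(n)
for t in range(1, n):
    r[t] = rho_true * r[t - 1] + rng.normal(scale=0.01)

y, x = r[1:], r[:-1]                      # r_t and r_{t-1}

# OLS of r_t on a constant and r_{t-1}
X = np.column_stack((np.ones(len(x)), x))
gamma_hat, rho_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Lag-1 sample correlation, as used in the cell for 1.6
corr_lag1 = np.corrcoef(y, x)[0, 1]

# Exact link between the two: slope = corr * std(y) / std(x)
rho_from_corr = corr_lag1 * y.std() / x.std()

print(round(rho_hat, 6), round(corr_lag1, 6))
```

For a long daily series the two lagged samples share all but one observation, so the standard-deviation ratio is close to one and the shortcut is a reasonable approximation.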

# 2. Single Factor Model

### 2.1 LVL Factor

We want to test the hypothesis that:

$$
\mathbb{E}[r_{i}] = \beta_{i,LVL} \cdot \mathbb{E}[r_{LVL}]
$$

Regress each commodity's returns against the LVL factor, and report the mean absolute alpha, $\bar{\alpha}$ and $r^2$ across all commodities. 

Annualize the alpha.

Output *exactly* the following two numbers rounded to 6 decimal places:
- Mean Absolute Annualized Alpha across all commodities
- Mean R-squared across all commodities

In [172]:
results = []

for col in returns.columns:
    y = returns[col].dropna()
    x = factors.loc[y.index, 'LVL']
    X = np.column_stack((np.ones(len(x)), x))
    beta = np.linalg.inv(X.T @ X) @ X.T @ y
    y_pred = X @ beta
    residuals = y - y_pred
    r2 = 1 - (residuals.var() / y.var())
    results.append((abs(beta[0]) * 252, r2))  # annualized alpha

results_df = pd.DataFrame(results, columns=['alpha_ann', 'r2'])
mean_abs_alpha = results_df['alpha_ann'].mean().round(6)
mean_r2 = results_df['r2'].mean().round(6)

print(mean_abs_alpha, mean_r2)


0.034876 0.25989


### 2.2. Interpretation

If our hypothesis were true, what would you expect the values of $\bar{\alpha}$ and $\bar{r^2}$ to be?

If the hypothesis were true, we would expect $\bar{\alpha} \approx 0$, meaning LVL fully explains expected returns.
The hypothesis makes no prediction about $\bar{r^2}$: R² only measures how much of each commodity's return variation LVL explains, not pricing accuracy, so it can take any value.
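To make the distinction between $\bar{\alpha}$ and $\bar{r^2}$ concrete, the sketch below builds two synthetic assets that both satisfy $\mathbb{E}[r_i] = \beta_i \cdot \mathbb{E}[r_f]$ exactly in sample and differ only in how noisy they are. All series and parameter values are made up for illustration, and `ts_regression` is a hypothetical helper, not part of the exam code.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
f = rng.normal(0.0003, 0.01, n)                 # synthetic factor excess returns

# Build noise that is exactly mean-zero and orthogonal to the factor in sample
X = np.column_stack((np.ones(n), f))
raw = rng.normal(size=n)
eps = raw - X @ np.linalg.lstsq(X, raw, rcond=None)[0]

def ts_regression(y, X):
    """OLS of y on X (first column is the constant); returns alpha, beta, R-squared."""
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ coef
    return coef[0], coef[1], 1 - resid.var() / y.var()

# Two assets with zero alpha by construction, but different noise levels
quiet = 1.2 * f + 0.002 * eps
noisy = 1.2 * f + 0.05 * eps

alpha_q, beta_q, r2_q = ts_regression(quiet, X)
alpha_n, beta_n, r2_n = ts_regression(noisy, X)
print(f"quiet: alpha={alpha_q:.2e}, R2={r2_q:.3f}")
print(f"noisy: alpha={alpha_n:.2e}, R2={r2_n:.3f}")
```

Both regressions return an alpha of zero up to floating-point error, while the R² values are far apart, so a low $\bar{r^2}$ on its own is not evidence against the pricing model.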

### 2.3 Cross-Sectional Test

Let's test the one-factor `LVL` model directly. From `2.1`, we already have what we need:

- The dependent variable, (y): mean excess returns from each of the commodities.
- The regressor, (x): the `LVL` beta for each commodity from the time-series regressions.

Then we can estimate the following equation:

$$
\underbrace{\mathbb{E}\left[\tilde{r}^{i}\right]}_{n\times 1\text{ data}} = \textcolor{ForestGreen}{\underbrace{\eta}_{\text{regression intercept}}} + \underbrace{\beta^{i,\text{LVL}}}_{n\times 1\text{ data}}~ \textcolor{ForestGreen}{\underbrace{\lambda_{\text{LVL}}}_{\text{regression estimate}}} + \textcolor{ForestGreen}{\underbrace{\upsilon}_{n\times 1\text{ residuals}}}
$$

Report exactly the following 3 numbers (rounded to at least 6 decimal places):
* The R-squared of this regression
* The intercept estimate, $\hat{\eta}$
* The regression coefficient $\lambda_{LVL}$

In [173]:
# Cross-sectional test
mean_returns = returns.mean() * 252          # annualized mean excess returns
betas = []                                   # from time-series in 2.1

for col in returns.columns:
    y = returns[col].dropna()
    x = factors.loc[y.index, 'LVL']
    X = np.column_stack((np.ones(len(x)), x))
    beta = np.linalg.inv(X.T @ X) @ X.T @ y
    betas.append(beta[1])

betas = pd.Series(betas, index=returns.columns)

# Cross-sectional regression
X_cs = np.column_stack((np.ones(len(betas)), betas))
y_cs = mean_returns.loc[betas.index]
b_cs = np.linalg.inv(X_cs.T @ X_cs) @ X_cs.T @ y_cs
y_pred = X_cs @ b_cs
r2_cs = 1 - np.var(y_cs - y_pred) / np.var(y_cs)

print(round(r2_cs,6), round(b_cs[0],6), round(b_cs[1],6))


0.093357 0.041137 0.023302


### 2.4.

Does your time-series or cross-sectional estimate give a higher premium to the `LVL` factor?

The time-series estimate gives the higher premium. In the time-series approach the premium is simply the factor's sample mean (0.064439 annualized, from 1.1), whereas the cross-sectional slope is only 0.023302.

# 3. Trading the Model

#### 3.1 Beta Estimation

For each commodity, report their `LVL` beta rounded to at least 6 decimal places. 

Display a table of these betas, sorted from lowest to highest.

#### Remember
We estimated the betas in the time-series regression in `2.1`.

#### Hint
Use `df.sort_values(by=<YOUR_DF>, ascending=True)` to sort your results.

In [174]:
beta_df = pd.DataFrame({'Commodity': returns.columns, 'LVL_beta': betas.round(6)})
beta_df = beta_df.sort_values(by='LVL_beta', ascending=True).reset_index(drop=True)
print(beta_df)


   Commodity  LVL_beta
0         LE  0.243876
1         GC  0.441604
2         ZM  0.698967
3         SB  0.742221
4         ZS  0.770218
5         ZC  0.823626
6         PL  0.834771
7         ZL  0.850391
8         SI  1.038445
9         NG  1.407854
10        HO  1.533688
11        RB  1.695943
12        CL  1.918395


#### 3.2 Portfolio Formation

Regardless of your answer to `3.1`, allocate your portfolio as follows:

- Go long `GC`, `LE`, and `ZM`
- Go short `CL`, `HO`, and `RB`

Go long `1` and short `0.25`.

That is, your portfolio should be:
$$
r_{port} =  1 \cdot \left(
    \frac{1}{3} \cdot r_{GC} + \frac{1}{3} \cdot r_{LE} + \frac{1}{3} \cdot r_{ZM}
\right) - 0.25 \cdot \left(\frac{1}{3} \cdot r_{CL} + \frac{1}{3} \cdot r_{HO} + \frac{1}{3} \cdot r_{RB}\right)
$$
Report the last 5 daily returns of your betting against beta portfolio, rounded to at least 6 decimal places.

In [175]:
# Portfolio construction
long_leg = returns[['GC', 'LE', 'ZM']].mean(axis=1)
short_leg = returns[['CL', 'HO', 'RB']].mean(axis=1)

r_port = 1 * long_leg - 0.25 * short_leg
last5 = r_port.tail(5).round(6)
print(last5)


Date
2025-10-27   -0.012593
2025-10-28    0.007415
2025-10-29    0.005726
2025-10-30    0.011900
2025-10-31    0.007463
dtype: float64


#### 3.3 Performance Evaluation

For your portfolio, report the following performance statistics (rounded to at least 6 decimal places):

- Annualized Return
- Annualized Volatility
- Annualized Sharpe Ratio

In [176]:
ann_ret = r_port.mean() * 252
ann_vol = r_port.std() * np.sqrt(252)
ann_sharpe = ann_ret / ann_vol

print(f"Annualized Return:   {ann_ret:.6f}")
print(f"Annualized Volatility: {ann_vol:.6f}")
print(f"Annualized Sharpe Ratio: {ann_sharpe:.6f}")



Annualized Return:   0.053383
Annualized Volatility: 0.144502
Annualized Sharpe Ratio: 0.369425


#### 3.4 

For your portfolio, test the hypothesis that its premium can be explained by the `LVL` factor alone:

$$
\mathbb{E}[r_{port}] = \beta_{port,LVL} \cdot \mathbb{E}[r_{LVL}]
$$

Report exactly the following 3 numbers (rounded to at least 6 decimal places):
- Annualized Alpha
- `LVL` Beta
- R-squared for the regression

In [177]:
x = factors['LVL'].loc[r_port.index]
X = np.column_stack((np.ones(len(x)), x))
beta = np.linalg.inv(X.T @ X) @ X.T @ r_port
y_pred = X @ beta
resid = r_port - y_pred
r2 = 1 - resid.var() / r_port.var()

alpha_ann = beta[0] * 252
beta_lvl = beta[1]

print(round(alpha_ann,6), round(beta_lvl,6), round(r2,6))


0.05129 0.03248 0.001208
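Because OLS is linear in the dependent variable, the portfolio's `LVL` beta should equal the same weighted combination of the constituent betas, provided all series cover the same dates. The sketch below checks this using only the rounded numbers already reported in 1.1, 3.1, and 3.3, so agreement with 3.4 is expected only up to rounding.

```python
# LVL betas copied from the table in 3.1 (rounded to 6 decimals)
betas = {'GC': 0.441604, 'LE': 0.243876, 'ZM': 0.698967,
         'CL': 1.918395, 'HO': 1.533688, 'RB': 1.695943}

# Portfolio weights from 3.2: long 1, short 0.25, equal weights within each leg
weights = {'GC': 1 / 3, 'LE': 1 / 3, 'ZM': 1 / 3,
           'CL': -0.25 / 3, 'HO': -0.25 / 3, 'RB': -0.25 / 3}

# OLS is linear in the dependent variable, so betas aggregate with the weights
beta_port = sum(weights[k] * betas[k] for k in weights)

# Alpha follows from the sample means: alpha = mean(r_port) - beta * mean(LVL)
mean_port = 0.053383   # annualized portfolio return from 3.3
mean_lvl = 0.064439    # annualized LVL mean from 1.1
alpha_port = mean_port - beta_port * mean_lvl

print(round(beta_port, 6), round(alpha_port, 6))
```

The near-zero beta shows that the 0.25 scaling on the high-beta short leg almost exactly offsets the long leg's `LVL` exposure, which is why nearly all of the portfolio's mean return appears as alpha and the R² is close to zero.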


# 4. Multi-Factor Model

We now want to test a multi-factor model using `LVL` and `HMS` and `IMO` as factors:

$$
\mathbb{E}[r_{i}] = \beta_{i,LVL} \cdot \mathbb{E}[r_{LVL}] + \beta_{i,HMS} \cdot \mathbb{E}[r_{HMS}] + \beta_{i,IMO} \cdot \mathbb{E}[r_{IMO}]
$$

#### 4.1 Time Series Test
Estimate the *time series* test of this pricing model. Regress each commodity's returns against the three factors, and report the following for each commodity (rounded to at least 6 decimal places):
- Annualized Alpha
- `LVL`, `HMS`, and `IMO` Betas
- R-squared

In [178]:
results = []

for col in returns.columns:
    y = returns[col].dropna()
    X = factors.loc[y.index, ['LVL','HMS','IMO']]
    X = np.column_stack((np.ones(len(X)), X))
    beta = np.linalg.inv(X.T @ X) @ X.T @ y
    y_pred = X @ beta
    resid = y - y_pred
    r2 = 1 - resid.var() / y.var()
    # Note: abs() drops the sign, so Alpha_ann reports the magnitude of the annualized alpha
    results.append([abs(beta[0])*252, beta[1], beta[2], beta[3], r2])

results_df = pd.DataFrame(results, columns=['Alpha_ann','Beta_LVL','Beta_HMS','Beta_IMO','R2']).round(6)
results_df.index = returns.columns

print(results_df)



    Alpha_ann  Beta_LVL  Beta_HMS  Beta_IMO        R2
CL   0.035599  1.527030  0.656709  0.653682  0.637664
GC   0.071595  0.373286  0.123614 -0.393933  0.348973
HO   0.023058  1.205122  0.543109  1.014183  0.759600
LE   0.059421  0.328136 -0.148049  0.236226  0.119500
NG   0.080357  1.035345  0.662592 -1.501411  0.373050
PL   0.009842  0.711887  0.217593 -0.439602  0.364752
RB   0.015995  1.243184  0.749484  1.335847  0.786582
SB   0.053905  1.065017 -0.542152 -0.510662  0.321713
SI   0.056488  0.901595  0.246072 -0.701687  0.434255
ZC   0.031637  1.246405 -0.717257 -0.262703  0.514722
ZL   0.026659  1.055894 -0.356494  0.316702  0.436060
ZM   0.028435  1.159886 -0.789761  0.154944  0.511819
ZS   0.042733  1.147212 -0.645459  0.098414  0.707169


#### 4.2 

Report the annualized Sharpe ratio (rounded to at least 6 decimal places) of the tangency portfolio formed from 
* the individual commodities.
* the three factors

What should be true of the Sharpe ratios if the factor pricing model is accurate?

In [179]:
# Tangency from commodities
mu_c = returns.mean() * 252
cov_c = returns.cov() * 252
w_c = np.linalg.inv(cov_c) @ mu_c
w_c = w_c / w_c.sum()
SR_c = np.sqrt(mu_c.T @ np.linalg.inv(cov_c) @ mu_c)

# Tangency from factors
mu_f = factors.mean() * 252
cov_f = factors.cov() * 252
SR_f = np.sqrt(mu_f.T @ np.linalg.inv(cov_f) @ mu_f)

print(f"Sharpe ratio (commodities tangency): {SR_c:.6f}")
print(f"Sharpe ratio (factors tangency):     {SR_f:.6f}")



Sharpe ratio (commodities tangency): 0.852239
Sharpe ratio (factors tangency):     0.440601


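If the factor pricing model is accurate, the factors span the mean-variance frontier, so the two Sharpe ratios should be equal (or very close): the commodities should offer no risk-return opportunity beyond what the factors already provide. Here the commodity tangency Sharpe ratio (0.852239) is roughly twice the factor one (0.440601), which is evidence against the model.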
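The spanning argument can be illustrated with population moments. In the sketch below, assets are generated as $r = \alpha + B f + e$ from two factors; every number is made up for illustration, and `max_sharpe` and `asset_moments` are hypothetical helpers. With $\alpha = 0$ the assets cannot beat the factors' maximum Sharpe ratio, and adding them to the factors leaves it unchanged; with nonzero $\alpha$ the combined Sharpe ratio rises.

```python
import numpy as np

def max_sharpe(mu, sigma):
    """Sharpe ratio of the tangency portfolio: sqrt(mu' Sigma^{-1} mu)."""
    return float(np.sqrt(mu @ np.linalg.solve(sigma, mu)))

# Illustrative (made-up) population moments for two factors and three assets
mu_f = np.array([0.06, 0.01])
sigma_f = np.array([[0.0240, 0.0100],
                    [0.0100, 0.0500]])
B = np.array([[1.5, 0.6],
              [0.4, 0.1],
              [1.1, -0.7]])                 # asset betas on the factors
D = np.diag([0.02, 0.01, 0.03])             # idiosyncratic variances

def asset_moments(alpha):
    """Moments of r = alpha + B f + e, plus the stacked [assets, factors] system."""
    mu_a = alpha + B @ mu_f
    sigma_a = B @ sigma_f @ B.T + D
    cross = B @ sigma_f                     # cov(assets, factors)
    mu_all = np.concatenate((mu_a, mu_f))
    sigma_all = np.block([[sigma_a, cross], [cross.T, sigma_f]])
    return mu_a, sigma_a, mu_all, sigma_all

sr_f = max_sharpe(mu_f, sigma_f)

# Case 1: the factor model prices the assets exactly (alpha = 0)
mu_a, sigma_a, mu_all, sigma_all = asset_moments(np.zeros(3))
sr_assets_0, sr_all_0 = max_sharpe(mu_a, sigma_a), max_sharpe(mu_all, sigma_all)

# Case 2: pricing errors (alpha != 0) open up a higher Sharpe ratio
mu_a, sigma_a, mu_all, sigma_all = asset_moments(np.array([0.03, -0.02, 0.04]))
sr_all_1 = max_sharpe(mu_all, sigma_all)

print(f"factors: {sr_f:.4f}  assets (alpha=0): {sr_assets_0:.4f}  "
      f"assets+factors (alpha=0): {sr_all_0:.4f}  assets+factors (alpha!=0): {sr_all_1:.4f}")
```

One caveat when reading the exam numbers against this benchmark: an in-sample tangency Sharpe ratio built from 13 assets tends to be biased upward, so part of the observed gap may reflect estimation error rather than mispricing.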

#### 4.3 Cross-Sectional Test

Run the cross-sectional test of this multi-factor model:

$$
\mathbb{E}\left[\tilde{r}^{i}\right] = \lambda_{0} + \lambda_{LVL} \cdot \beta_{i,LVL} + \lambda_{HMS} \cdot \beta_{i,HMS} + \lambda_{IMO} \cdot \beta_{i,IMO} + \nu_{i}
$$

Report exactly the following numbers (rounded to at least 6 decimal places):
- $\lambda_{0}$
- $\lambda_{LVL}$
- $\lambda_{HMS}$
- $\lambda_{IMO}$
- $r^2$ of the cross-sectional regression
- MAE of the cross-sectional regression

Annualize the MAE.

In [180]:
mean_ret = returns.mean() * 252

X = results_df[['Beta_LVL','Beta_HMS','Beta_IMO']]
X = np.column_stack((np.ones(len(X)), X))
y = mean_ret

b_cs = np.linalg.inv(X.T @ X) @ X.T @ y
y_pred = X @ b_cs
resid = y - y_pred
r2 = 1 - resid.var() / y.var()
mae_ann = abs(resid).mean()                  # mean_ret is already annualized, so the residuals are too

print(round(b_cs[0],6), round(b_cs[1],6), round(b_cs[2],6), round(b_cs[3],6),
      round(r2,6), round(mae_ann,6))


0.080852 -0.016413 0.050369 -0.016984 0.646543 0.015664


#### 4.4 Interpretation

Do the results of the cross-sectional test support the multi-factor model?

Yes, the results support the multi-factor model reasonably well, with one caveat.
The R² of 0.6465 shows that the betas explain most of the cross-sectional variation in mean returns, and the annualized MAE of about 0.0157 (1.6% per year) is moderate.
The $\lambda$ estimates differ in sign and size across factors, suggesting that all three contribute to explaining cross-sectional returns rather than LVL alone. The caveat is the intercept: $\lambda_0 = 0.0809$ is far from the zero that the model implies.

#### 4.5 Risk Premia

Compare the risk premia ($\lambda$'s) from the cross-sectional test to the average returns of each factor. Report exactly the following table (rounded to at least 6 decimal places):

- Average return of each factor
- Risk premia from the cross-sectional test

Annualize the estimates.

In [181]:
avg_factors = (factors.mean() * 252).round(6)
risk_premia = pd.Series({
    'LVL': b_cs[1],
    'HMS': b_cs[2],
    'IMO': b_cs[3]
}).round(6)

comparison = pd.DataFrame({
    'Average Return': avg_factors,
    'Risk Premium (λ)': risk_premia
})
print(comparison)


     Average Return  Risk Premium (λ)
LVL        0.064439         -0.016413
HMS        0.008842          0.050369
IMO        0.007281         -0.016984


#### 4.6 Interpretation

What do you observe from the comparison of average returns and risk premia? 

Theoretically, what could cause the risk premium of a factor to deviate from its average return?

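The comparison shows clear mismatches. For instance, LVL has a positive average return (0.064439) but a negative estimated premium (-0.016413), while HMS has an average return near zero (0.008842) but the largest estimated premium (0.050369). Taken at face value, the cross-section treats LVL exposure as a hedge rather than a source of reward.

Differences like this usually come from sampling noise (only 13 test assets), omitted factors, or imprecise beta estimates that distort how risk and return line up in the cross-section. The free intercept matters too: with $\lambda_0$ estimated at 0.080852, part of the average return is absorbed by the intercept instead of the factor premia.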
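A single-factor toy example shows one mechanism behind such a gap. When pricing errors are correlated with beta, the cross-sectional regression assigns them to the intercept and the slope, pulling $\lambda$ away from the factor's mean. All numbers below are made up for illustration, and `cross_section` is a hypothetical helper.

```python
import numpy as np

factor_mean = 0.06                                   # assumed factor premium
betas = np.array([0.2, 0.5, 0.8, 1.1, 1.4, 1.7])     # made-up asset betas

def cross_section(mean_ret, betas):
    """OLS of mean returns on a constant and betas; returns (intercept, lambda)."""
    X = np.column_stack((np.ones(len(betas)), betas))
    intercept, lam = np.linalg.lstsq(X, mean_ret, rcond=None)[0]
    return intercept, lam

# Case 1: no pricing errors, so mean returns line up exactly with beta
eta_0, lam_0 = cross_section(betas * factor_mean, betas)

# Case 2: pricing errors that shrink as beta rises (one way an omitted factor could show up)
alphas = 0.05 - 0.03 * betas
eta_1, lam_1 = cross_section(alphas + betas * factor_mean, betas)

print(f"no alphas:   intercept={eta_0:.4f}, lambda={lam_0:.4f}")
print(f"with alphas: intercept={eta_1:.4f}, lambda={lam_1:.4f}")
```

The second case reproduces the qualitative pattern seen for `LVL` in 2.3, where a positive intercept (0.041137) comes with a slope (0.023302) well below the factor's average return (0.064439).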