In [None]:
pip install otter-grader

In [None]:
pip install openpyxl

In [1]:
# Initialize Otter
# If you need to install Otter, run the install cells above first
import otter
grader = otter.Notebook("ps5.ipynb")

In [None]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
from linearmodels.panel import PanelOLS
from linearmodels.panel import PooledOLS
from linearmodels.iv import IV2SLS
from scipy.stats import f
from scipy.stats import chi2
from scipy.stats import t

# Econ 144 – Problem Set 5

In this problem set, you will empirically test the conditional CAPM and the conditional and unconditional APT (based on the Fama-French 3-Factor Model), and you will construct and evaluate portfolio risk forecasts using both CAPM and APT.

Throughout the entire problem set, please feel free to add code and markdown cells as you need them.


## Problem 1. Testing the Unconditional Fama-French APT

In this problem, you will conduct an empirical test of the (unconditional) Fama-French version of APT. The test is based on monthly return data for twenty-five portfolios over the period 1974 to 2023 (i.e., 50 years of monthly data). Stocks listed on the New York Stock Exchange (NYSE), the American Stock Exchange (ASE), and the NASDAQ are allocated to the portfolios based on market capitalization and book-to-market ratio, and are value-weighted within each portfolio. The CRSP value-weighted index (**vwretd**) is used as a proxy for the market portfolio, and the 30-day T-Bill return (**t30ret**) is used for the risk-free return.

You will conduct tests for the overall period, two twenty-five-year subperiods, and five ten-year subperiods.

In [None]:
idxdata = pd.read_excel('monthlyindex1974-2023.xlsx')
idxdata.head()

In [None]:
ffmdata = pd.read_excel('ffdata1969-2023.xlsx')
ffmdata = ffmdata.loc[ffmdata['caldt'] >= 19740101]
ffmdata.head()

In [None]:
rawdata = pd.read_excel('ffm_25port1969-2023.xlsx')
rawdata = rawdata.loc[rawdata['OBSDATE']>=19740101]
rawdata.head()

In [None]:
begdt = np.array([19740101,19840101,19940101,20040101,20140101,19740101,19990101,19740101])
enddt = np.array([19831231,19931231,20031231,20131231,20231231,19981231,20231231,20231231])
pdnum = np.arange(1, len(begdt)+1, 1, dtype=int)

testpd = pd.DataFrame({'pdnum' : pdnum,
                       'begdt' : begdt,
                       'enddt' : enddt
                      })
npd = len(testpd)
npd
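The `testpd` table is meant to drive the subperiod loops in the questions below. As a minimal sketch (the helper `slice_period`, the synthetic `demo` frame, and the two-row `demo_pd` table are illustrative assumptions, not part of the provided data files), rows can be selected by comparing integer YYYYMMDD dates against `begdt` and `enddt`:

```python
import pandas as pd

def slice_period(df, datecol, beg, end):
    """Return the rows whose integer YYYYMMDD date falls in [beg, end]."""
    return df.loc[(df[datecol] >= beg) & (df[datecol] <= end)]

# Synthetic monthly dates (day fixed at 28), purely for illustration.
demo = pd.DataFrame({
    "caldt": [y * 10000 + m * 100 + 28
              for y in range(1974, 2024) for m in range(1, 13)]
})

# Hypothetical period table in the same layout as testpd.
demo_pd = pd.DataFrame({"pdnum": [1, 2],
                        "begdt": [19740101, 19740101],
                        "enddt": [19831231, 19981231]})

# Number of monthly observations in each period.
nobs = {int(p.pdnum): len(slice_period(demo, "caldt", p.begdt, p.enddt))
        for p in demo_pd.itertuples(index=False)}
print(nobs)  # {1: 120, 2: 300}
```

The same pattern should carry over to the `OBSDATE` column of the portfolio file, since it is filtered with integer dates in the cell above.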

<!-- BEGIN QUESTION -->

**Question 1.a.**
Using the entire 50-year sample, regress excess returns of each portfolio on the excess (value-weighted) market return and perform tests (at the 1% level of significance) that the intercept is zero. For each portfolio, report the point estimates (of the intercept), $t$-statistics, and whether or not you reject the Fama-French APT.
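A minimal sketch of the intercept test on synthetic data follows. The function name `alpha_test` and the simulated series are assumptions for illustration only; with the course data, `factors` would hold whichever factor excess returns the model calls for, and `statsmodels` OLS reports the same estimates and $t$-statistics.

```python
import numpy as np
from scipy.stats import t as t_dist

def alpha_test(excess_ret, factors, level=0.01):
    """OLS of one portfolio's excess return on factor returns; two-sided test of alpha = 0."""
    y = np.asarray(excess_ret, dtype=float)
    F = np.asarray(factors, dtype=float)
    if F.ndim == 1:
        F = F[:, None]
    X = np.column_stack([np.ones(len(y)), F])      # constant in column 0
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = len(y) - X.shape[1]
    cov = resid @ resid / dof * np.linalg.inv(X.T @ X)  # classical OLS covariance
    alpha = coef[0]
    t_stat = alpha / np.sqrt(cov[0, 0])
    p_val = 2 * t_dist.sf(abs(t_stat), dof)
    return alpha, t_stat, p_val, p_val < level

# Synthetic example: 600 months, true alpha equal to zero by construction.
rng = np.random.default_rng(0)
mkt = rng.normal(0.005, 0.04, 600)
ret = 1.1 * mkt + rng.normal(0, 0.02, 600)
alpha, t_stat, p_val, reject = alpha_test(ret, mkt)
print(f"alpha = {alpha:.5f}, t = {t_stat:.2f}, p = {p_val:.3f}, reject = {reject}")
```

Looping this over the portfolio columns gives one row of results per portfolio.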



<!--
BEGIN QUESTION
name: q1_a
manual: true
-->

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 1.b.**
For each of the two 25-year subperiods, regress excess returns of each portfolio on the excess (value-weighted) market return and perform tests (at the 1% level of significance) that the intercept is zero. For each portfolio, report the point estimates (of the intercept), $t$-statistics, and whether or not you reject the Fama-French APT in each subperiod.



<!--
BEGIN QUESTION
name: q1_b
manual: true
-->

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 1.c.**
For the entire 50-year period, each of the 25-year subperiods, and each of the 10-year subperiods, jointly test (at the 1% level of significance) that the intercepts for all twenty-five portfolios are zero using the $F$-test statistic $J_1$. For the overall period and each subperiod, report the $F$-test statistic $J_1$, its $p$-value, and whether or not you reject the Fama-French APT.
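The sketch below assumes that $J_1$ denotes the finite-sample statistic of Gibbons, Ross and Shanken, $J_1 = \frac{T-N-K}{N}\,[1+\hat\mu'\hat\Omega^{-1}\hat\mu]^{-1}\,\hat\alpha'\hat\Sigma^{-1}\hat\alpha \sim F_{N,\,T-N-K}$, with maximum-likelihood (divide-by-$T$) covariance estimates. Please check this against the definition given in class; the function name `j1_test` and the simulated data are illustrative assumptions.

```python
import numpy as np
from scipy.stats import f as f_dist

def j1_test(R, F):
    """R: (T, N) excess returns, F: (T, K) factor returns. Returns (J1, p-value)."""
    R = np.asarray(R, dtype=float)
    F = np.asarray(F, dtype=float)
    if F.ndim == 1:
        F = F[:, None]
    T, N = R.shape
    K = F.shape[1]
    X = np.column_stack([np.ones(T), F])
    B, *_ = np.linalg.lstsq(X, R, rcond=None)   # (K+1, N): row 0 holds the alphas
    alpha = B[0]
    E = R - X @ B
    Sigma = E.T @ E / T                          # ML residual covariance
    mu = F.mean(axis=0)
    Fc = F - mu
    Omega = Fc.T @ Fc / T                        # ML factor covariance
    scale = 1.0 + mu @ np.linalg.solve(Omega, mu)
    quad = alpha @ np.linalg.solve(Sigma, alpha)
    J1 = (T - N - K) / N * quad / scale
    return J1, f_dist.sf(J1, N, T - N - K)

# Synthetic example with zero alphas by construction.
rng = np.random.default_rng(1)
T, N, K = 600, 25, 3
factors = rng.normal(0.005, 0.04, size=(T, K))
betas = rng.uniform(0.5, 1.5, size=(K, N))
returns = factors @ betas + rng.normal(0, 0.02, size=(T, N))
J1, pval = j1_test(returns, factors)
print(f"J1 = {J1:.3f}, p-value = {pval:.3f}")
```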

<!--
BEGIN QUESTION
name: q1_c
manual: true
-->

## Problem 2. Testing the Unconditional CAPM and APT in the Cross-Section

In this problem, you will use cross-sectional regressions to conduct empirical tests of both the (unconditional) CAPM and the Fama-French version of APT. The test is based on the same portfolio data and market data as in problem 1.

Using cross-sectional regressions allows testing of two additional implications of either CAPM or APT (beyond "the intercept is zero"):
1. **CAPM**: $\beta$ completely captures the cross-sectional variation of expected returns, **APT**: the factor loadings completely capture the cross-sectional variation of expected returns.
2. **CAPM**: the market risk premium is positive, **APT**: the factor risk premia are positive.

This testing approach involves two steps:
1. In any given subperiod, use a time-series regression to find the $\beta$'s (CAPM), or factor loadings (APT), and the variance of the residuals for each portfolio.
2. In each period within the subperiod, run a cross-sectional regression of portfolio returns on the estimated $\beta$'s (CAPM), or factor loadings (APT), and the residual variances.

Using the results of the cross-sectional regressions, you can calculate point estimates, estimate $t$-statistics, and conduct inference as outlined in class.
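The two steps can be sketched as follows on synthetic data. This assumes the usual Fama-MacBeth aggregation (time-series mean of the period-by-period coefficients, with a standard error equal to their standard deviation divided by $\sqrt{T}$); the recipe outlined in class may differ in details, and the name `fama_macbeth` is an illustrative assumption.

```python
import numpy as np

def fama_macbeth(R, F):
    """R: (T, N) excess returns, F: (T, K) factors.
    Returns mean coefficients and t-stats for [intercept, K premia, residual variance]."""
    R = np.asarray(R, dtype=float)
    F = np.asarray(F, dtype=float)
    if F.ndim == 1:
        F = F[:, None]
    T, N = R.shape
    # Step 1: one time-series regression per portfolio.
    X = np.column_stack([np.ones(T), F])
    B, *_ = np.linalg.lstsq(X, R, rcond=None)
    loadings = B[1:].T                                   # (N, K)
    resvar = (R - X @ B).var(axis=0, ddof=X.shape[1])    # (N,)
    # Step 2: one cross-sectional regression per period (same regressors each period).
    Z = np.column_stack([np.ones(N), loadings, resvar])  # (N, K+2)
    G, *_ = np.linalg.lstsq(Z, R.T, rcond=None)          # (K+2, T): column t = period t
    gammas = G.T
    gbar = gammas.mean(axis=0)
    se = gammas.std(axis=0, ddof=1) / np.sqrt(T)
    return gbar, gbar / se

# Synthetic single-factor example with heterogeneous idiosyncratic risk.
rng = np.random.default_rng(2)
T, N = 600, 25
mkt = rng.normal(0.006, 0.045, T)
true_beta = rng.uniform(0.5, 1.5, N)
idio_sd = rng.uniform(0.01, 0.03, N)
R = np.outer(mkt, true_beta) + rng.normal(size=(T, N)) * idio_sd
gbar, tstat = fama_macbeth(R, mkt)
print("mean coefficients:", gbar)
print("t-statistics:     ", tstat)
```

Passing a three-column factor array instead of `mkt` gives the Fama-French variant with the same code.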


In [None]:
idxdata = pd.read_excel('monthlyindex1974-2023.xlsx')
idxdata.head()

In [None]:
ffmdata = pd.read_excel('ffdata1969-2023.xlsx')
ffmdata = ffmdata.loc[ffmdata['caldt'] >= 19740101]
ffmdata.head()

In [None]:
rawdata = pd.read_excel('ffm_25port1969-2023.xlsx')
rawdata = rawdata.loc[rawdata['OBSDATE']>=19740101]
rawdata.head()

In [None]:
begdt = np.array([19740101,19840101,19940101,20040101,20140101,19740101,19990101,19740101])
enddt = np.array([19831231,19931231,20031231,20131231,20231231,19981231,20231231,20231231])
pdnum = np.arange(1, len(begdt)+1, 1, dtype=int)

testpd = pd.DataFrame({'pdnum' : pdnum,
                       'begdt' : begdt,
                       'enddt' : enddt
                      })
npd = len(testpd)
npd

<!-- BEGIN QUESTION -->

**Question 2.a.**
For the overall period (1974-2023), the two twenty-five-year subperiods, and the five ten-year subperiods, use the approach just outlined to test the following three implications of the CAPM:
1. The intercept is zero.
2. The market risk premium is positive.
3. $\beta$ completely captures the cross-sectional variation of expected returns. In this exercise, you can do this by conducting an individual hypothesis test on the coefficient of residual (i.e., idiosyncratic) variance.

What do you conclude?

<!--
BEGIN QUESTION
name: q2_a
manual: true
-->

<!-- END QUESTION -->
    
<!-- BEGIN QUESTION -->

**Question 2.b.**
For the overall period (1974-2023), the two twenty-five-year subperiods, and the five ten-year subperiods, use the approach just outlined to test the following three implications of the Fama-French APT:
1. The intercept is zero.
2. The factor risk premia are positive.
3. The factor loadings completely capture the cross-sectional variation of expected returns. In this exercise, you can do this by conducting an individual hypothesis test on the coefficient of residual (i.e., idiosyncratic) variance.

What do you conclude?

<!--
BEGIN QUESTION
name: q2_b
manual: true
-->

## Panel Data Approach

For the next two parts of the problem, instead of running period-by-period cross-sectional regressions (in the second step), you will use all of the data (within any given subperiod) in a single panel data regression with time fixed effects and standard errors clustered within entity (i.e., portfolio).
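To make the mechanics concrete, here is a hand-rolled sketch of a regression with time fixed effects and entity-clustered standard errors, using only `numpy` and `pandas` on synthetic data. The function `time_fe_clustered` and the column names are assumptions for illustration. `PanelOLS` from `linearmodels` (imported above) is the natural tool for the actual exercise, and it applies its own degrees-of-freedom adjustments, so its standard errors need not match this sketch exactly.

```python
import numpy as np
import pandas as pd

def time_fe_clustered(df, ycol, xcols, entity="port", time="date"):
    """Time-demeaned OLS with standard errors clustered by entity (no small-sample correction)."""
    cols = [ycol] + list(xcols)
    # Subtracting period means is equivalent to including a dummy for every period.
    dm = df[cols] - df.groupby(time)[cols].transform("mean")
    y = dm[ycol].to_numpy()
    X = dm[list(xcols)].to_numpy()
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    u = y - X @ coef
    # Sum the scores x_it * u_it within each entity, then form the sandwich "meat".
    scores = pd.DataFrame(X * u[:, None]).groupby(df[entity].to_numpy()).sum().to_numpy()
    cov = XtX_inv @ (scores.T @ scores) @ XtX_inv
    se = np.sqrt(np.diag(cov))
    return pd.DataFrame({"coef": coef, "se": se, "t": coef / se}, index=list(xcols))

# Synthetic long-format panel: 25 portfolios, 120 months.
rng = np.random.default_rng(3)
N, T = 25, 120
beta = rng.uniform(0.5, 1.5, N)
resvar = rng.uniform(1e-4, 9e-4, N)
mkt = rng.normal(0.006, 0.045, T)
panel = pd.DataFrame({
    "port": np.repeat(np.arange(N), T),
    "date": np.tile(np.arange(T), N),
    "beta": np.repeat(beta, T),
    "resvar": np.repeat(resvar, T),
})
panel["exret"] = panel["beta"] * mkt[panel["date"].to_numpy()] + rng.normal(0, 0.02, N * T)
res = time_fe_clustered(panel, "exret", ["beta", "resvar"])
print(res)
```

Note that in this hand-rolled version the time dummies absorb the common intercept, so only the slope coefficients are reported.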

<!-- END QUESTION -->
    
<!-- BEGIN QUESTION -->

**Question 2.c.**
For the overall period (1974-2023), the two twenty-five-year subperiods, and the five ten-year subperiods, use the approach just outlined to test the following three implications of the CAPM:
1. The intercept is zero.
2. The market risk premium is positive.
3. $\beta$ completely captures the cross-sectional variation of expected returns. In this exercise, you can do this by conducting an individual hypothesis test on the coefficient of residual (i.e., idiosyncratic) variance.

What do you conclude? Which approach do you prefer (i.e., period-by-period estimations or panel data)? Why?

<!--
BEGIN QUESTION
name: q2_c
manual: true
-->

<!-- END QUESTION -->
    
<!-- BEGIN QUESTION -->

**Question 2.d.**
For the overall period (1974-2023), the two twenty-five-year subperiods, and the five ten-year subperiods, use the approach just outlined to test the following three implications of the Fama-French APT:
1. The intercept is zero.
2. The factor risk premia are positive.
3. The factor loadings completely capture the cross-sectional variation of expected returns. In this exercise, you can do this by conducting an individual hypothesis test on the coefficient of residual (i.e., idiosyncratic) variance.

What do you conclude? Which approach do you prefer (i.e., period-by-period estimations or panel data)? Why?

<!--
BEGIN QUESTION
name: q2_d
manual: true
-->

## Problem 3. Testing the Conditional CAPM and APT in the Cross-Section

In this problem, you will use cross-sectional regressions to conduct empirical tests of both the **conditional** CAPM and the Fama-French version of APT. The test is based on monthly return data for 100 portfolios over the period 1974 to 2023 (i.e., 50 years of monthly data). Stocks listed on the New York Stock Exchange (NYSE), the American Stock Exchange (ASE), and the NASDAQ are allocated to the portfolios based on market capitalization and book-to-market ratio, and are value-weighted within each portfolio. The CRSP value-weighted index (**vwretd**) is used as a proxy for the market portfolio, and the 30-day T-Bill return (**t30ret**) is used for the risk-free return.

Using cross-sectional regressions allows testing of two additional implications of either CAPM or APT (beyond "the intercept is zero"):
1. **CAPM**: $\beta$ completely captures the cross-sectional variation of expected returns, **APT**: the factor loadings completely capture the cross-sectional variation of expected returns.
2. **CAPM**: the market risk premium is positive, **APT**: the factor risk premia are positive.

This testing approach involves two steps:
1. In any given period $t$, use a time-series regression to find the $\beta$'s (CAPM), or factor loadings (APT), and the variance of the residuals for each portfolio. The time-series regression is done using a 60-month rolling window from time $t-60$ to time $t-1$. **This step has already been done for you.**
2. In each period within a subperiod, run a cross-sectional regression of portfolio returns on the estimated $\beta$'s (CAPM), or factor loadings (APT), and the residual variances.

Using the results of the cross-sectional regressions, you can calculate point estimates, estimate $t$-statistics, and conduct inference as outlined in class.
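Although the rolling estimates are already provided in the data files, the following sketch shows what the first step amounts to for a single portfolio and a single factor. It is an illustration on synthetic data, not the code used to build the files; the name `rolling_beta` is an assumption.

```python
import numpy as np

def rolling_beta(ret, mkt, window=60):
    """Beta and residual variance at each t, estimated from observations t-window, ..., t-1."""
    ret = np.asarray(ret, dtype=float)
    mkt = np.asarray(mkt, dtype=float)
    T = len(ret)
    betas = np.full(T, np.nan)      # undefined until a full window is available
    resvars = np.full(T, np.nan)
    for t in range(window, T):
        y = ret[t - window:t]       # the slice excludes time t itself (no look-ahead)
        X = np.column_stack([np.ones(window), mkt[t - window:t]])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
        betas[t] = coef[1]
        resvars[t] = resid @ resid / (window - 2)
    return betas, resvars

# Synthetic example: 240 months with a true beta of 1.2.
rng = np.random.default_rng(5)
mkt = rng.normal(0.006, 0.045, 240)
ret = 1.2 * mkt + rng.normal(0, 0.02, 240)
betas, resvars = rolling_beta(ret, mkt)
print("first usable estimate at index 60:", betas[60])
```

Because the estimate dated $t$ uses only data through $t-1$, the cross-sectional regression at $t$ is run on regressors that were known before the return was realized.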

In [None]:
capmdata = pd.read_csv('monthly_ffm100port_beta1974-2023.csv')
capmdata = capmdata.drop('Unnamed: 0', axis=1)
capmdata.head()

In [None]:
ffmdata = pd.read_csv('monthly_ffm100port_ffm1974-2023.csv')
ffmdata = ffmdata.drop('Unnamed: 0', axis=1)
ffmdata.head()

<!-- BEGIN QUESTION -->

**Question 3.a.**
For the overall period (1974-2023), the two twenty-five-year subperiods, and the five ten-year subperiods, use the approach just outlined to test the following three implications of the CAPM:
1. The intercept is zero.
2. The market risk premium is positive.
3. $\beta$ completely captures the cross-sectional variation of expected returns. In this exercise, you can do this by conducting an individual hypothesis test on the coefficient of residual (i.e., idiosyncratic) variance.

What do you conclude?

<!--
BEGIN QUESTION
name: q3_a
manual: true
-->

<!-- END QUESTION -->
    
<!-- BEGIN QUESTION -->

**Question 3.b.**
For the overall period (1974-2023), the two twenty-five-year subperiods, and the five ten-year subperiods, use the approach just outlined to test the following three implications of the Fama-French APT:
1. The intercept is zero.
2. The factor risk premia are positive.
3. The factor loadings completely capture the cross-sectional variation of expected returns. In this exercise, you can do this by conducting an individual hypothesis test on the coefficient of residual (i.e., idiosyncratic) variance.

What do you conclude?

<!--
BEGIN QUESTION
name: q3_b
manual: true
-->

## Panel Data Approach

For the next two parts of the problem, instead of running period-by-period cross-sectional regressions (in the second step), you will use all of the data (within any given subperiod) in a single panel data regression with time fixed effects and standard errors clustered within entity (i.e., portfolio).

<!-- END QUESTION -->
    
<!-- BEGIN QUESTION -->

**Question 3.c.**
For the overall period (1974-2023), the two twenty-five-year subperiods, and the five ten-year subperiods, use the approach just outlined to test the following three implications of the CAPM:
1. The intercept is zero.
2. The market risk premium is positive.
3. $\beta$ completely captures the cross-sectional variation of expected returns. In this exercise, you can do this by conducting an individual hypothesis test on the coefficient of residual (i.e., idiosyncratic) variance.

What do you conclude? Which approach do you prefer (i.e., period-by-period estimations or panel data)? Why?

<!--
BEGIN QUESTION
name: q3_c
manual: true
-->

<!-- END QUESTION -->
    
<!-- BEGIN QUESTION -->

**Question 3.d.**
For the overall period (1974-2023), the two twenty-five-year subperiods, and the five ten-year subperiods, use the approach just outlined to test the following three implications of the Fama-French APT:
1. The intercept is zero.
2. The factor risk premia are positive.
3. The factor loadings completely capture the cross-sectional variation of expected returns. In this exercise, you can do this by conducting an individual hypothesis test on the coefficient of residual (i.e., idiosyncratic) variance.

What do you conclude? Which approach do you prefer (i.e., period-by-period estimations or panel data)? Why?

<!--
BEGIN QUESTION
name: q3_d
manual: true
-->

## Problem 4. Constructing and Evaluating Portfolio Risk Forecasts

In this problem, you will construct and evaluate portfolio risk forecasts using both the CAPM and the Fama-French version of APT as the underlying factor models.


In [None]:
idxdata = pd.read_excel('ffdata1969-2023.xlsx')
idxdata.head(25)

In [None]:
ffmodel = False
numport = 100 # 25 or 100

if ffmodel:
    rawdata = pd.read_csv('monthly_ffm' + str(numport) + 'port_ffm1974-2023.csv')
else:
    rawdata = pd.read_csv('monthly_ffm' + str(numport) + 'port_beta1974-2023.csv')
rawdata = rawdata.drop('Unnamed: 0', axis=1)
rawdata.info()

In [None]:
begdt = np.array([19740101,19840101,19940101,20040101,20140101,19740101,19990101,19740101])
enddt = np.array([19831231,19931231,20031231,20131231,20231231,19981231,20231231,20231231])
pdnum = np.arange(1, len(begdt)+1, 1, dtype=int)

testpd = pd.DataFrame({'pdnum' : pdnum,
                       'begdt' : begdt,
                       'enddt' : enddt
                      })
npd = len(testpd)
npd

<!-- BEGIN QUESTION -->

**Question 4.a.**
We first work with CAPM as the underlying factor model. Hence, in this case we have a single-factor model. Use the half-life approach (with $h$ = 3, 6, 12, $\infty$) outlined in class to calculate weighted means and covariance matrices.

For each date from 19740131 through 20231231:
1. Use **all** of the index data **strictly prior** to that date and a given half-life to calculate the weighted mean and variance of the factor returns (i.e., the market excess return **vwrexc**).
2. Form an equal-weighted portfolio of the (25 or 100) portfolios in the data set and, using your results from step 1, estimate its variance and standard deviation (volatility). This is your portfolio risk forecast for the date (constructed using data observed **strictly prior** to that date).
3. Calculate the realized portfolio return and standardized return (realized return divided by forecast volatility). 

For the overall period (1974-2023), the two twenty-five-year subperiods, and the five ten-year subperiods, estimate the bias statistic (standard deviation of the standardized realized returns) and compare it to 1. Recall that a bias statistic of 1 reflects an overall **directionally unbiased** risk forecast. A bias statistic greater than 1 means you are systematically underestimating the portfolio risk, and a bias statistic less than 1 means you are systematically overestimating it. Based on your results, which half-life gives the "best" risk forecasts overall? Explain.

For the overall period (1974-2023), and for each half-life, plot the bias statistics, along with their approximate 95% confidence intervals. Comment on the plots.
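The building blocks of this question can be sketched as below. Several choices here are assumptions rather than the in-class recipe: the half-life `h` is measured in observation periods, the weights are $w_s \propto 0.5^{\text{age}_s/h}$ normalized to sum to one, residual returns are treated as uncorrelated across portfolios, and the 95% band is the common approximation $1 \pm \sqrt{2/T}$. All function names are illustrative.

```python
import numpy as np

def halflife_weights(n, h):
    """Weights for n observations ordered oldest to newest; h = np.inf gives equal weights."""
    age = np.arange(n - 1, -1, -1)               # the newest observation has age 0
    w = np.ones(n) if np.isinf(h) else 0.5 ** (age / h)
    return w / w.sum()

def weighted_mean_var(x, h):
    """Half-life weighted mean and variance of a 1-D series."""
    x = np.asarray(x, dtype=float)
    w = halflife_weights(len(x), h)
    mean = w @ x
    return mean, w @ (x - mean) ** 2

def portfolio_variance(weights, loadings, factor_cov, resvar):
    """Factor-model variance b'Ωb + Σ w_i² σ²_i, assuming uncorrelated residuals."""
    b = np.asarray(weights) @ np.asarray(loadings)          # portfolio factor exposures
    return b @ np.asarray(factor_cov) @ b + (np.asarray(weights) ** 2) @ np.asarray(resvar)

def bias_statistic(realized, forecast_vol):
    """Std. dev. of standardized returns and an approximate 95% band around 1."""
    z = np.asarray(realized) / np.asarray(forecast_vol)
    half_width = np.sqrt(2.0 / len(z))
    return z.std(ddof=1), (1 - half_width, 1 + half_width)

# Synthetic check: a portfolio with unit exposure to one factor and no residual risk.
rng = np.random.default_rng(4)
f = rng.normal(0.005, 0.04, 600)
h, start = 12, 60
# The forecast for date t uses only f[:t], i.e. data strictly prior to t.
fvol = np.array([np.sqrt(weighted_mean_var(f[:t], h)[1]) for t in range(start, len(f))])
b, band = bias_statistic(f[start:], fvol)
print(f"bias statistic = {b:.3f}, approximate 95% band = ({band[0]:.3f}, {band[1]:.3f})")
```

Repeating the last few lines for each half-life and each test period gives the table of bias statistics to compare against 1.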

<!--
BEGIN QUESTION
name: q4_a
manual: true
-->

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 4.b.**
We can evaluate risk forecasts for **accuracy** via a linear regression method suggested by Mincer and Zarnowitz (1969, "The Evaluation of Economic Forecasts"). It is posited that realized portfolio return variance can be described by the following model: $$ \tilde{\sigma}_t^2 = a + b \hat{\sigma}_t^2 + \epsilon_t $$ where $\hat{\sigma}_t^2$ (right-hand side) is the forecast return variance and $\tilde{\sigma}_t^2$ (left-hand side) is the realized return variance. However, realized return variance is not observable, even **ex post**. Therefore, realized return variance is **proxied** by the squared (demeaned) realized return at time $t$. This is an unbiased, but noisy, proxy for the "true" variance at time $t$. Assuming this model of realized return variance is correct, the joint null hypothesis (that our forecast is accurate) is $H_0: a = 0, b = 1$.

For the overall period (1974-2023), the two twenty-five-year subperiods, and the five ten-year subperiods, formally test this null hypothesis (at the 5% level). Based on your results, which half-life gives the "best" risk forecasts overall?

For the overall period (1974-2023), and for each half-life, plot the slope coefficient estimates, along with their approximate 95% confidence intervals. Comment on the plots.
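A sketch of the Mincer-Zarnowitz regression and the joint test of $H_0: a = 0, b = 1$ on synthetic data is given below. It assumes the classical (homoskedastic) OLS covariance and an $F(2, T-2)$ reference distribution; since squared returns are heavy-tailed, a robust covariance may be preferable in practice. The name `mincer_zarnowitz` is an illustrative assumption.

```python
import numpy as np
from scipy.stats import f as f_dist

def mincer_zarnowitz(realized_ret, forecast_var):
    """Regress squared demeaned returns on forecast variance; jointly test a = 0, b = 1."""
    r = np.asarray(realized_ret, dtype=float)
    fv = np.asarray(forecast_var, dtype=float)
    proxy = (r - r.mean()) ** 2                     # noisy proxy for realized variance
    X = np.column_stack([np.ones(len(fv)), fv])
    coef, *_ = np.linalg.lstsq(X, proxy, rcond=None)
    resid = proxy - X @ coef
    dof = len(proxy) - 2
    cov = resid @ resid / dof * np.linalg.inv(X.T @ X)
    diff = coef - np.array([0.0, 1.0])              # distance from the null values
    F_stat = diff @ np.linalg.solve(cov, diff) / 2  # Wald statistic divided by 2 restrictions
    p_val = f_dist.sf(F_stat, 2, dof)
    return coef, np.sqrt(cov[1, 1]), F_stat, p_val

# Synthetic example: time-varying true variance, forecast equal to the truth.
rng = np.random.default_rng(6)
true_var = rng.uniform(0.0004, 0.0036, 600)
ret = np.sqrt(true_var) * rng.normal(size=600)
coef, se_b, F_stat, p_val = mincer_zarnowitz(ret, true_var)
print(f"a = {coef[0]:.5f}, b = {coef[1]:.3f} (se {se_b:.3f}), F = {F_stat:.2f}, p = {p_val:.3f}")
```

The returned slope and its standard error give an approximate 95% interval of `coef[1] ± 1.96 * se_b` for the plots requested above.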

<!--
BEGIN QUESTION
name: q4_b
manual: true
-->

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 4.c.**
Run the same analysis as in **4.a**, using the Fama-French version of APT as the underlying factor model.

Based on these results, which model (CAPM or Fama-French) do you prefer for risk forecasting? Why?

<!--
BEGIN QUESTION
name: q4_c
manual: true
-->

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 4.d.**
Run the same analysis as in **4.b**, using the Fama-French version of APT as the underlying factor model.

Based on these results, which model (CAPM or Fama-French) do you prefer for risk forecasting? Why?

<!--
BEGIN QUESTION
name: q4_d
manual: true
-->

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a PDF file for you to submit. **Please save before exporting!**

In [3]:
# Save your notebook first, then run this cell to export your submission.
grader.to_pdf(pagebreaks=False, display_link=True)