In [None]:
pip install otter-grader

In [None]:
pip install openpyxl

In [1]:
# Initialize Otter
# If Otter is not installed, run the pip install cells above first
import otter
grader = otter.Notebook("ps4.ipynb")

# Econ 144 – Problem Set 4

In this problem set, you will conduct an event study analysis, construct mean-variance efficient frontiers, and empirically test the CAPM. 

Throughout the entire problem set, please feel free to add code and markdown cells as you need them.

In [None]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
from scipy.stats import f
from scipy.stats import chi2 

## Problem 1. Quarterly Earnings Announcements

In this problem, we conduct an event study using quarterly earnings announcements. Our goal is to analyze the information content of quarterly earnings announcements for companies in the DJIA from September 2018 to August 2023.

The dataset contains quarterly earnings announcement data for 29 of the 30 DJIA stocks over the 20 quarters in the sample period, yielding a total of 580 announcements.

If earnings announcements convey information to investors, we expect the impact on the market's valuation of a company to depend on the magnitude of the **unexpected** component of the announcement. Hence, we need to measure the deviation of actual earnings from the market's expectation of earnings.

As a proxy for the market's expectation of earnings, we will use earnings forecasts from the Institutional Brokers' Estimate System (IBES). IBES compiles earnings forecasts from analysts for a large number of companies.
Our proxy will be the **average** of all of the analyst forecasts for a particular company in a given quarter.

The IBES data is in the file `IBES_EPS.xlsx`. The variables we will use in the analysis are the following:

1. **oftic** -- the ticker symbol for the companies in the dataset.
2. **ACTDATS_ACT** -- the date the actual earnings were recorded. This can differ from **ANNDATS_ACT** (the announcement date for actual earnings): if **ANNDATS_ACT** falls on a weekend or holiday, then **ACTDATS_ACT** is the next day the market is open. We will use **ACTDATS_ACT** as our announcement date.
3. **VALUE** -- the earnings forecast made by an analyst (individual analysts are identified by their **ESTIMATOR** number). Earnings **forecasts** are announced on **ANNDATS** and recorded on **ACTDATS**.
4. **ACTUAL** -- actual earnings (announced on **ANNDATS_ACT** and recorded on **ACTDATS_ACT**).

If you are interested in more information on the IBES data, please see the file `IBES_RUI.pptx`.

In [None]:
rawdata = pd.read_excel('IBES_EPS.xlsx')
rawdata = rawdata.dropna(subset=['ACTDATS_ACT']).reset_index(drop=True)  # drop rows with no recorded actual-earnings date
rawdata.head()

In [None]:
rawdata.info()

In [None]:
rawdjia = pd.read_excel('djia_dlyreturns.xlsx')
rawdjia.head()

<!-- BEGIN QUESTION -->

**Question 1.a.**
Construct a dataframe with 4 columns: **TICKER**, **ANNDAT**, **FORECAST**, **ACTUAL**. There should be a row for each unique combination of **oftic** (which becomes **TICKER** in the new dataframe) and **ACTDATS_ACT** (which becomes **ANNDAT** in the new dataframe) in `rawdata`. **FORECAST** should be set equal to the **average** forecast for that **TICKER** and **ANNDAT**, **ACTUAL** should be set equal to the actual earnings for that **TICKER** and **ANNDAT**. Your new dataframe should contain 580 rows (29 stocks x 20 quarters).



<!--
BEGIN QUESTION
name: q1_a
manual: true
-->

<!-- END QUESTION -->
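One way to build this table is a pandas `groupby` with named aggregation. The sketch below runs on a small made-up frame, `toy_ibes`. This is a hypothetical stand-in for `rawdata`, used so the notebook's data is not overwritten. It assumes **ACTUAL** is constant within each ticker/date group, which is why `'first'` is used for it.

```python
import pandas as pd

# Made-up stand-in for `rawdata`: several analyst forecasts per ticker/date
toy_ibes = pd.DataFrame({
    'oftic':       ['AAPL', 'AAPL', 'AAPL', 'MSFT', 'MSFT'],
    'ACTDATS_ACT': pd.to_datetime(['2019-01-29', '2019-01-29', '2019-04-30',
                                   '2019-01-30', '2019-01-30']),
    'VALUE':       [1.00, 1.10, 0.60, 1.05, 1.15],
    'ACTUAL':      [1.05, 1.05, 0.62, 1.10, 1.10],
})

# One row per (ticker, recording date): average the forecasts and keep the
# actual value (assumed identical within each group, so 'first' suffices)
earnings = (
    toy_ibes.groupby(['oftic', 'ACTDATS_ACT'], as_index=False)
            .agg(FORECAST=('VALUE', 'mean'), ACTUAL=('ACTUAL', 'first'))
            .rename(columns={'oftic': 'TICKER', 'ACTDATS_ACT': 'ANNDAT'})
)
print(earnings)
```

On the real data, `len(earnings)` should come out to 580.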

<!-- BEGIN QUESTION -->

**Question 1.b.**
Construct a dataframe with 5 columns: **ANNDAT**, **ESTWIN1**, **ESTWIN2**, **EVWIN1**, **EVWIN2**. There should be a row for each unique **ACTDATS_ACT** (which becomes **ANNDAT** in the new dataframe) in `rawdata`. Relative to **ANNDAT**, **ESTWIN1** is the first date in the estimation window, **ESTWIN2** is the last date in the estimation window, **EVWIN1** is the first date in the event window, and **EVWIN2** is the last date in the event window. Hint: you may want to use the dates in the daily returns file to help you here, since the estimation and event windows are defined in terms of **trading** days. Your new dataframe should contain 344 rows (there are 344 unique announcement dates in `rawdata`).

<!--
BEGIN QUESTION
name: q1_b
manual: true
-->

<!-- END QUESTION -->
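Below is a minimal sketch of the window lookup. It assumes the trading calendar is a sorted `DatetimeIndex`.

- The event window of -20 to +20 trading days follows from the 41-day event window in **1.d**.
- The estimation-window offsets `EST_START` and `EST_END` are hypothetical placeholders. Substitute the values from the class notes.
- The toy calendar from `pd.bdate_range` ignores exchange holidays. In the notebook, use the dates in the daily returns file instead.
- `window_dates` is a hypothetical helper.

```python
import pandas as pd

# Trading-day offsets relative to the announcement (day 0).
# Event window -20..+20 (41 days) matches Question 1.d; the estimation
# window bounds are HYPOTHETICAL placeholders.
EST_START, EST_END = -270, -21
EV_START, EV_END = -20, 20

# Toy calendar (weekdays only, no holidays); use the returns-file dates instead
trading_days = pd.bdate_range('2018-01-01', '2020-12-31')

def window_dates(anndat, days=trading_days):
    """Return the four window boundary dates for one announcement date."""
    i = days.searchsorted(anndat)      # position of the first trading day >= anndat
    if i + EST_START < 0 or i + EV_END >= len(days):
        raise ValueError(f'not enough trading days around {anndat}')
    return {
        'ANNDAT': anndat,
        'ESTWIN1': days[i + EST_START],
        'ESTWIN2': days[i + EST_END],
        'EVWIN1': days[i + EV_START],
        'EVWIN2': days[i + EV_END],
    }

anndats = pd.to_datetime(['2019-07-30', '2019-10-29'])
windows = pd.DataFrame([window_dates(d) for d in anndats])
print(windows)
```

Working with integer positions in the calendar avoids counting weekends and holidays by hand. The guard matters because a negative index would otherwise silently wrap around to the end of the calendar.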

<!-- BEGIN QUESTION -->

**Question 1.c.**
Augment your dataframe from **1.a** with estimates of $\alpha$, $\beta$ (market model), and $\mu$ (constant mean return model). For each **TICKER** and **ANNDAT** use the return data in the estimation window for your estimates. Further, categorize each of the announcements as good news (**GOODNEWS**), bad news (**BADNEWS**), or no news (**NONEWS**) -- see class notes.

<!--
BEGIN QUESTION
name: q1_c
manual: true
-->


<!-- END QUESTION -->
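The market model parameters are ordinary least squares estimates, and $\mu$ is a sample mean. Both are computed over the estimation window. The sketch below uses simulated returns. `sm.OLS(y, sm.add_constant(x)).fit().params` from the already-imported `statsmodels` would give the same $\alpha$ and $\beta$.

The news classification rule in the hypothetical helper `classify` treats a relative surprise beyond a 2.5% cutoff as news. This rule is a placeholder only; replace it with the definition in the class notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated estimation-window returns (made up): 250 trading days
T = 250
r_m = rng.normal(0.0005, 0.01, T)                    # market return
r_i = 0.0002 + 1.2 * r_m + rng.normal(0, 0.005, T)   # stock return

# Market model by OLS: r_i = alpha + beta * r_m + e
X = np.column_stack([np.ones(T), r_m])
alpha, beta = np.linalg.lstsq(X, r_i, rcond=None)[0]

# Constant mean return model: mu is the estimation-window sample mean
mu = r_i.mean()

def classify(actual, forecast, cutoff=0.025):
    """HYPOTHETICAL rule: relative surprise above +cutoff is good news,
    below -cutoff is bad news, otherwise no news (undefined if forecast == 0)."""
    surprise = (actual - forecast) / abs(forecast)
    if surprise > cutoff:
        return 'GOODNEWS'
    if surprise < -cutoff:
        return 'BADNEWS'
    return 'NONEWS'

print(alpha, beta, mu, classify(1.10, 1.00))
```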

<!-- BEGIN QUESTION -->

**Question 1.d.**
Create a new dataframe that contains the normal and abnormal return estimates (under each of our three normal return models) for each **TICKER/ANNDAT** in your dataframe from **1.c** and for each day in the event window. This dataframe should have 23,780 rows (580 **TICKER/ANNDAT**'s x 41 days in the event window).

<!--
BEGIN QUESTION
name: q1_d
manual: true
-->

<!-- END QUESTION -->
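Given the estimation-window parameters, each model's normal return is a simple function of the event-window returns. The abnormal return is the actual return minus the normal return. Below is a sketch for a single announcement. The returns and parameter values are made up, and the dictionary keys are hypothetical labels.

```python
import numpy as np

# Made-up event-window data for one TICKER/ANNDAT: 41 days, t = -20..+20
t = np.arange(-20, 21)
rng = np.random.default_rng(1)
r_m = rng.normal(0.0005, 0.01, t.size)    # market return
r_i = rng.normal(0.0010, 0.02, t.size)    # stock return

# Placeholder estimation-window parameters (would come from Question 1.c)
mu, alpha, beta = 0.0006, 0.0001, 1.1

# Normal returns under the three models; abnormal = actual - normal
normal = {
    'CONSTMEAN': np.full(t.size, mu),     # constant mean return model
    'MARKET':    alpha + beta * r_m,      # market model
    'MKTADJ':    r_m,                     # market return as the proxy
}
abnormal = {k: r_i - v for k, v in normal.items()}
print({k: v[:3] for k, v in abnormal.items()})
```

Stacking one 41-row block per announcement over all 580 announcements gives the 23,780-row dataframe.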

<!-- BEGIN QUESTION -->

**Question 1.e.**
For each normal return model and each category of announcement, run a sharp regression discontinuity (RD) design with the **cumulative abnormal return (CAR)** over the event window as the dependent variable. Allow for a different **linear** trend before and after the announcement date. You will need to run nine separate regressions, one for each combination of normal return model (constant mean, market model, market return as proxy) and announcement category (good news, bad news, and no news). Your results should match (or very closely resemble) the results in `event_study_figs.pdf` (see class notes folder on bCourses).

<!--
BEGIN QUESTION
name: q1_e
manual: true
-->

<!-- END QUESTION -->
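Below is a minimal sketch of the sharp RD regression

$$CAR_t = b_0 + b_1 t + b_2 D_t + b_3 D_t t + \varepsilon_t.$$

It uses made-up average abnormal returns with a built-in 2% jump on day 0.

- It assumes the treatment indicator $D_t$ switches on at event day 0.
- It assumes the CAR is the running sum of the average abnormal returns starting at day -20.

Check both choices against the class notes. The coefficient $b_2$ measures the jump in the CAR at the announcement.

```python
import numpy as np

t = np.arange(-20, 21)                     # event time, 41 days
D = (t >= 0).astype(float)                 # ASSUMED treatment: day 0 onward

# Made-up average abnormal returns with a 2% jump on day 0, then cumulate
aar = np.full(t.size, 0.0005)
aar[t == 0] += 0.02
car = np.cumsum(aar)

# Separate linear trends before and after the cutoff, plus a level shift
X = np.column_stack([np.ones(t.size), t, D, D * t])
b, *_ = np.linalg.lstsq(X, car, rcond=None)
print('jump at announcement:', b[2])
```

For standard errors, the same design matrix `X` can be passed to `sm.OLS(car, X).fit()`.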

<!-- BEGIN QUESTION -->

**Question 1.f.**
Reproduce the result tables (one for each of the normal return models) in `event_study_figs.pdf`. You do not need to reproduce the table exactly as presented on the slide, but you should display the numerical results in some readable fashion. In your submission, you must include the code you used to do this.

<!--
BEGIN QUESTION
name: q1_f
manual: true
-->

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 1.g.**
Finally, reproduce the plots (one for each of the normal return models) in `event_study_figs.pdf` and briefly summarize your overall findings.

<!--
BEGIN QUESTION
name: q1_g
manual: true
-->

## Problem 2. Mean-Variance Efficient Frontiers

Based on the notes in `efficientset_mathematics.pdf` and the weekly return data for 26 stocks in the DJIA (over the period 1988 to 2023), follow the series of steps outlined below to reproduce the following plot:

<img src="mvfront.png" width="400"/>

The plot is based on annualized returns, expressed in percent. Since this is weekly data, the annualization factor is 52 for mean returns and variances (and $\sqrt{52}$ for standard deviations). The horizontal axis is standard deviation and the vertical axis is mean return.


In [None]:
rawdjia = pd.read_excel('djia_weeklyret1988-2023.xlsx')
rawdjia.tail()

<!-- BEGIN QUESTION -->

**Question 2.a.**
Recall that for any target portfolio return $\mu_p$, we can write the variance-minimizing portfolio weights as $$w_p = g + h \mu_p$$ Determine $g$ and $h$.



<!--
BEGIN QUESTION
name: q2_a
manual: true
-->

<!-- END QUESTION -->
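This sketch uses the notation commonly used for efficient-set mathematics; check it against `efficientset_mathematics.pdf`. Let $A = \mathbf{1}'V^{-1}\mu$, $B = \mu'V^{-1}\mu$, $C = \mathbf{1}'V^{-1}\mathbf{1}$ and $D = BC - A^2$. Then
$$g = \frac{B\,V^{-1}\mathbf{1} - A\,V^{-1}\mu}{D}, \qquad h = \frac{C\,V^{-1}\mu - A\,V^{-1}\mathbf{1}}{D}.$$
The code below evaluates these formulas on made-up weekly inputs for three assets. It then checks that $w_p$ sums to one and has mean $\mu_p$.

```python
import numpy as np

# Made-up weekly inputs for 3 assets (the notebook would use the 26 DJIA stocks)
mu = np.array([0.0020, 0.0030, 0.0025])          # mean weekly returns
V = np.array([[0.0010, 0.0002, 0.0001],
              [0.0002, 0.0015, 0.0003],
              [0.0001, 0.0003, 0.0012]])         # weekly covariance matrix
ones = np.ones(len(mu))

Vinv = np.linalg.inv(V)
A = ones @ Vinv @ mu
B = mu @ Vinv @ mu
C = ones @ Vinv @ ones
D = B * C - A**2

g = (B * (Vinv @ ones) - A * (Vinv @ mu)) / D
h = (C * (Vinv @ mu) - A * (Vinv @ ones)) / D

# Frontier weights for one target weekly mean
mu_p = 0.0028
w_p = g + h * mu_p
print(w_p, w_p.sum(), w_p @ mu)
```

By construction, $g$ sums to one and $h$ sums to zero. This is why every $w_p$ is fully invested.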
    
<!-- BEGIN QUESTION -->

**Question 2.b.**
Determine the portfolio weights for the global minimum variance portfolio. What is the standard deviation and mean return of the global minimum variance portfolio?



<!--
BEGIN QUESTION
name: q2_b
manual: true
-->

<!-- END QUESTION -->
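In the same notation, the global minimum variance portfolio is $w_g = V^{-1}\mathbf{1}/C$. Its mean is $A/C$ and its variance is $1/C$. The sketch below uses made-up three-asset inputs. It applies the annualization described above: mean × 52, standard deviation × $\sqrt{52}$, then × 100.

```python
import numpy as np

# Made-up weekly inputs for 3 assets
mu = np.array([0.0020, 0.0030, 0.0025])
V = np.array([[0.0010, 0.0002, 0.0001],
              [0.0002, 0.0015, 0.0003],
              [0.0001, 0.0003, 0.0012]])
ones = np.ones(len(mu))

Vinv_1 = np.linalg.solve(V, ones)        # V^{-1} 1, without forming the inverse
C = ones @ Vinv_1
w_gmv = Vinv_1 / C                        # global minimum variance weights
mean_gmv = w_gmv @ mu                     # equals A / C
sd_gmv = np.sqrt(w_gmv @ V @ w_gmv)       # equals sqrt(1 / C)

# Annualize (52 weeks) and convert to percent, as in the plot
print(100 * 52 * mean_gmv, 100 * np.sqrt(52) * sd_gmv)
```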
    
<!-- BEGIN QUESTION -->

**Question 2.c.**
Using your vectors $g$ and $h$ from **2.a**, let $\mu_p$ range from 0 to 0.01 in increments of 0.00001, and plot the resulting minimum variance frontier of risky assets in mean-standard deviation space.

**Note**: do the calculations using weekly returns, then annualize and multiply by 100 for plotting.

<!--
BEGIN QUESTION
name: q2_c
manual: true
-->

<!-- END QUESTION -->
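Below is a vectorized sketch of the frontier on made-up inputs. Each row of `W` holds the weights $g + h\mu_p$ for one target mean. The resulting variances can be checked against the closed form $\sigma_p^2 = (C\mu_p^2 - 2A\mu_p + B)/D$. The last two lines produce the annualized, percent-scale coordinates that would be passed to `plt.plot`.

```python
import numpy as np

# Made-up weekly inputs for 3 assets
mu = np.array([0.0020, 0.0030, 0.0025])
V = np.array([[0.0010, 0.0002, 0.0001],
              [0.0002, 0.0015, 0.0003],
              [0.0001, 0.0003, 0.0012]])
ones = np.ones(len(mu))

Vinv = np.linalg.inv(V)
A, B, C = ones @ Vinv @ mu, mu @ Vinv @ mu, ones @ Vinv @ ones
D = B * C - A**2
g = (B * (Vinv @ ones) - A * (Vinv @ mu)) / D
h = (C * (Vinv @ mu) - A * (Vinv @ ones)) / D

# Target weekly means 0, 0.00001, ..., 0.01 (1001 points)
mu_grid = np.linspace(0.0, 0.01, 1001)
W = g + np.outer(mu_grid, h)                           # one row of weights per target
sd_grid = np.sqrt(np.einsum('ij,jk,ik->i', W, V, W))   # sqrt(w' V w) row by row

# Annualized coordinates in percent; pass these to plt.plot(x, y)
x = 100 * np.sqrt(52) * sd_grid
y = 100 * 52 * mu_grid
```

`np.linspace` is used instead of `np.arange` so that the endpoint 0.01 is included exactly.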
    
<!-- BEGIN QUESTION -->

**Question 2.d.**
Now introduce a risk-free asset with net (annual) simple return of $R_f = 0.04$ (i.e., net (weekly) simple return of $R_f = 0.04/52$). Find the portfolio weights for the tangency portfolio. What is the standard deviation and mean return of the tangency portfolio? Identify the tangency portfolio on your plot from **2.c**.

<!--
BEGIN QUESTION
name: q2_d
manual: true
-->

<!-- END QUESTION -->
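With a risk-free rate $R_f$, the tangency portfolio weights are proportional to $V^{-1}(\mu - R_f\mathbf{1})$, rescaled to sum to one. This requires $R_f < A/C$, the mean of the global minimum variance portfolio. Here is a sketch on made-up inputs:

```python
import numpy as np

# Made-up weekly inputs for 3 assets
mu = np.array([0.0020, 0.0030, 0.0025])
V = np.array([[0.0010, 0.0002, 0.0001],
              [0.0002, 0.0015, 0.0003],
              [0.0001, 0.0003, 0.0012]])
rf = 0.04 / 52                              # weekly risk-free rate

z = np.linalg.solve(V, mu - rf)             # V^{-1}(mu - rf 1)
w_tan = z / z.sum()                         # normalize so weights sum to one
mu_tan = w_tan @ mu
sd_tan = np.sqrt(w_tan @ V @ w_tan)

# Annualized mean and standard deviation, in percent
print(w_tan, 100 * 52 * mu_tan, 100 * np.sqrt(52) * sd_tan)
```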
    
<!-- BEGIN QUESTION -->

**Question 2.e.**
Let $\mu_p$ range from 0.04/52 to 0.01, in increments of 0.000001 and plot the resulting global efficient frontier on your plot from **2.c**.

<!--
BEGIN QUESTION
name: q2_e
manual: true
-->

<!-- END QUESTION -->
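Once the risk-free asset is available, efficient portfolios combine it with the tangency portfolio. The global efficient frontier is therefore a straight line from $(0, R_f)$ through the tangency portfolio.

- Let $H = (\mu - R_f\mathbf{1})'V^{-1}(\mu - R_f\mathbf{1})$.
- For a target mean $\mu_p$, the risky weights are $w_p = V^{-1}(\mu - R_f\mathbf{1})\,(\mu_p - R_f)/H$.
- The remainder of the portfolio is held in the risk-free asset.
- The standard deviation is $\sigma_p = (\mu_p - R_f)/\sqrt{H}$.

Here is a sketch on made-up inputs:

```python
import numpy as np

# Made-up weekly inputs for 3 assets
mu = np.array([0.0020, 0.0030, 0.0025])
V = np.array([[0.0010, 0.0002, 0.0001],
              [0.0002, 0.0015, 0.0003],
              [0.0001, 0.0003, 0.0012]])
rf = 0.04 / 52                                  # weekly risk-free rate

excess = mu - rf
z = np.linalg.solve(V, excess)                  # V^{-1}(mu - rf 1)
H = excess @ z                                  # squared maximum (weekly) Sharpe ratio

# Target weekly means from rf to 0.01 in steps of 0.000001
mu_grid = np.arange(rf, 0.01 + 1e-12, 0.000001)
W = np.outer((mu_grid - rf) / H, z)             # risky-asset weights, one row per target
w_rf = 1 - W.sum(axis=1)                        # remainder in the risk-free asset
sd_grid = (mu_grid - rf) / np.sqrt(H)           # the frontier is a straight line

x = 100 * np.sqrt(52) * sd_grid                 # annualized, in percent
y = 100 * 52 * mu_grid
```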
    
<!-- BEGIN QUESTION -->

**Question 2.f.**
Determine the zero-covariance portfolio weights for the tangency portfolio (i.e., the weights of the portfolio that has zero-covariance with the tangency portfolio). What is the standard deviation and mean return of this portfolio? Identify this portfolio on your plot from **2.c**. Verify that the covariance of this portfolio and the tangency portfolio is indeed zero.

<!--
BEGIN QUESTION
name: q2_f
manual: true
-->

<!-- END QUESTION -->
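Take a frontier portfolio $p$ other than the global minimum variance portfolio. Its zero-covariance frontier portfolio has mean $\mu_{zc} = A/C - (D/C^2)/(\mu_p - A/C)$ and weights $g + h\mu_{zc}$. For the tangency portfolio, this mean should equal $R_f$. The sketch below checks that numerically on made-up inputs, along with the zero covariance.

```python
import numpy as np

# Made-up weekly inputs for 3 assets
mu = np.array([0.0020, 0.0030, 0.0025])
V = np.array([[0.0010, 0.0002, 0.0001],
              [0.0002, 0.0015, 0.0003],
              [0.0001, 0.0003, 0.0012]])
ones = np.ones(len(mu))
rf = 0.04 / 52

Vinv = np.linalg.inv(V)
A, B, C = ones @ Vinv @ mu, mu @ Vinv @ mu, ones @ Vinv @ ones
D = B * C - A**2
g = (B * (Vinv @ ones) - A * (Vinv @ mu)) / D
h = (C * (Vinv @ mu) - A * (Vinv @ ones)) / D

# Tangency portfolio
z = Vinv @ (mu - rf)
w_tan = z / z.sum()
mu_tan = w_tan @ mu

# Zero-covariance frontier portfolio of the tangency portfolio
mu_zc = A / C - (D / C**2) / (mu_tan - A / C)
w_zc = g + h * mu_zc
sd_zc = np.sqrt(w_zc @ V @ w_zc)
cov_zc_tan = w_zc @ V @ w_tan                   # should be (numerically) zero
print(mu_zc, rf, sd_zc, cov_zc_tan)
```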
    
<!-- BEGIN QUESTION -->

**Question 2.g.**
What is the Sharpe ratio of the tangency portfolio? Calculate the Sharpe ratios for all of the portfolios from **2.c**. Verify that they are all less than or equal to the Sharpe ratio of the tangency portfolio.

<!--
BEGIN QUESTION
name: q2_g
manual: true
-->
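The Sharpe ratio is $(\mu_p - R_f)/\sigma_p$. With weekly inputs, multiplying it by $\sqrt{52}$ gives the annualized ratio, because $52\mu_p - 0.04 = 52(\mu_p - R_f)$ when the weekly rate is $R_f = 0.04/52$. The sketch below uses made-up inputs. It computes the ratio along the frontier from **2.c** using the closed-form frontier variance.

```python
import numpy as np

# Made-up weekly inputs for 3 assets
mu = np.array([0.0020, 0.0030, 0.0025])
V = np.array([[0.0010, 0.0002, 0.0001],
              [0.0002, 0.0015, 0.0003],
              [0.0001, 0.0003, 0.0012]])
ones = np.ones(len(mu))
rf = 0.04 / 52

Vinv = np.linalg.inv(V)
A, B, C = ones @ Vinv @ mu, mu @ Vinv @ mu, ones @ Vinv @ ones
D = B * C - A**2

# Sharpe ratios along the 2.c frontier, using sigma_p^2 = (C mu^2 - 2 A mu + B) / D
mu_grid = np.linspace(0.0, 0.01, 1001)
sd_grid = np.sqrt((C * mu_grid**2 - 2 * A * mu_grid + B) / D)
sr_grid = (mu_grid - rf) / sd_grid

# Sharpe ratio of the tangency portfolio
z = Vinv @ (mu - rf)
w_tan = z / z.sum()
sr_tan = (w_tan @ mu - rf) / np.sqrt(w_tan @ V @ w_tan)

print('annualized tangency Sharpe ratio:', np.sqrt(52) * sr_tan)
# Small tolerance allows for floating-point rounding
print('all frontier ratios <= tangency:', bool((sr_grid <= sr_tan + 1e-12).all()))
```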

## Problem 3. Testing the CAPM

In this problem, you will conduct an empirical test of the Sharpe-Lintner version of the CAPM. The test is based on monthly return data for ten portfolios over the period 1974 to 2023 (i.e., 50 years of monthly data). Stocks listed on the New York Stock Exchange (NYSE), American Stock Exchange (ASE), and NASDAQ are allocated to the portfolios based on market capitalization and are value-weighted within each portfolio. The CRSP value-weighted index (**vwretd**) is used as a proxy for the market portfolio, and the 30-day T-Bill return (**t30ret**) is used for the risk-free return.

You will conduct tests for the overall period, two 25-year subperiods, and five 10-year subperiods.


In [None]:
rwidata = pd.read_excel('monthlyindex1974-2023.xlsx')
rwidata.head()

In [None]:
rawdata = pd.read_excel('monthly10port1974-2023.xlsx')
rawdata.head()

In [None]:
# set up the beginning and ending dates (YYYYMMDD) for the study periods
# periods 1-5: 10-year subperiods; 6-7: 25-year subperiods; 8: full 50-year sample
begdt = np.array([19740101,19840101,19940101,20040101,20140101,19740101,19990101,19740101])
enddt = np.array([19831231,19931231,20031231,20131231,20231231,19981231,20231231,20231231])
pdnum = np.arange(1, len(begdt)+1, 1, dtype=int)

testpd = pd.DataFrame({'pdnum' : pdnum,
                       'begdt' : begdt,
                       'enddt' : enddt
                      })
npd = len(testpd)
npd

<!-- BEGIN QUESTION -->

**Question 3.a.**
Using the entire 50-year sample, regress excess returns of each portfolio on the excess (value-weighted) market return and perform tests (at the 1% level of significance) that the intercept is zero. For each portfolio, report the point estimates (of the intercept), $t$-statistics, and whether or not you reject the CAPM.



<!--
BEGIN QUESTION
name: q3_a
manual: true
-->

<!-- END QUESTION -->
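Below is a minimal sketch of the per-portfolio time-series regressions. It uses classical (homoskedastic) OLS standard errors on simulated excess returns, where only the third toy portfolio has a nonzero intercept. The critical value comes from the $t$ distribution with $T - 2$ degrees of freedom, for a two-sided test at the 1% level. If the class notes call for a different standard error, adjust `se_alpha` accordingly.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Simulated monthly excess returns (made up): T months, N portfolios
T, N = 600, 3
mkt_ex = rng.normal(0.006, 0.045, T)                  # excess market return
true_alpha = np.array([0.0, 0.0, 0.004])              # only portfolio 3 violates the CAPM
true_beta = np.array([0.9, 1.0, 1.2])
port_ex = true_alpha + np.outer(mkt_ex, true_beta) + rng.normal(0, 0.01, (T, N))

# OLS of each portfolio's excess return on the excess market return
X = np.column_stack([np.ones(T), mkt_ex])
coef, *_ = np.linalg.lstsq(X, port_ex, rcond=None)    # row 0: intercepts, row 1: betas
resid = port_ex - X @ coef
s2 = (resid**2).sum(axis=0) / (T - 2)                 # residual variance per portfolio
se_alpha = np.sqrt(s2 * np.linalg.inv(X.T @ X)[0, 0])
t_alpha = coef[0] / se_alpha

# Two-sided test of alpha = 0 at the 1% level
t_crit = stats.t.ppf(1 - 0.01 / 2, df=T - 2)
reject = np.abs(t_alpha) > t_crit
print(coef[0], t_alpha, reject)
```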

<!-- BEGIN QUESTION -->

**Question 3.b.**
For each of the two 25-year subperiods, regress excess returns of each portfolio on the excess (value-weighted) market return and perform tests (at the 1% level of significance) that the intercept is zero. For each portfolio, report the point estimates (of the intercept), $t$-statistics, and whether or not you reject the CAPM in each subperiod.



<!--
BEGIN QUESTION
name: q3_b
manual: true
-->

<!-- END QUESTION -->
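The subperiod regressions only require restricting the sample to each date range before rerunning the **3.a** code. This sketch assumes the monthly data carry a YYYYMMDD integer date column, in the same convention as `begdt` and `enddt`. The column is called `date` here, which is a hypothetical name. The frame itself is made up.

```python
import numpy as np
import pandas as pd

# Made-up monthly frame: 50 years x 12 months, dates stored as YYYYMMDD integers
years = np.repeat(np.arange(1974, 2024), 12)
months = np.tile(np.arange(1, 13), 50)
monthly = pd.DataFrame({'date': years * 10000 + months * 100 + 28,
                        'ret': np.zeros(years.size)})

# The two 25-year subperiods (pdnum 6 and 7 in `testpd`)
subperiods = [(19740101, 19981231), (19990101, 20231231)]
counts = []
for beg, end in subperiods:
    sub = monthly[(monthly['date'] >= beg) & (monthly['date'] <= end)]
    counts.append(len(sub))
    # ... rerun the 3.a regressions on `sub` here ...
print(counts)
```

In the notebook, the same loop can run over every study period with `for row in testpd.itertuples():`, using `row.begdt` and `row.enddt` as the bounds.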

<!-- BEGIN QUESTION -->

**Question 3.c.**
For the entire 50-year period, each of the 25-year subperiods, and each of the 10-year subperiods, jointly test (at the 1% level of significance) that the intercepts for all ten portfolios are zero using the $F$-test statistic $J_1$. For the overall period and each subperiod, report $J_1$, its $p$-value, and whether or not you reject the CAPM.

<!--
BEGIN QUESTION
name: q3_c
manual: true
-->
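This sketch computes the $J_1$ statistic in the form given by Campbell, Lo, and MacKinlay (*The Econometrics of Financial Markets*, ch. 5). It uses maximum likelihood (divide-by-$T$) estimates of the residual covariance matrix and the market variance:
$$J_1 = \frac{T - N - 1}{N}\left[1 + \frac{\hat\mu_m^2}{\hat\sigma_m^2}\right]^{-1}\hat\alpha'\hat\Sigma^{-1}\hat\alpha \sim F(N,\, T - N - 1).$$
Verify that this matches the definition in the class notes. The data below are simulated with zero true intercepts, and `j1_test` is a hypothetical helper.

```python
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(3)

# Simulated monthly excess returns (made up): 10 portfolios, zero true alphas
T, N = 600, 10
mkt = rng.normal(0.006, 0.045, T)
betas = np.linspace(0.8, 1.3, N)
Z = np.outer(mkt, betas) + rng.normal(0, 0.02, (T, N))

def j1_test(Z, mkt):
    """J1 F-test that all N intercepts are zero (MLE moment estimates)."""
    T, N = Z.shape
    X = np.column_stack([np.ones(T), mkt])
    coef, *_ = np.linalg.lstsq(X, Z, rcond=None)
    alpha = coef[0]
    resid = Z - X @ coef
    Sigma = resid.T @ resid / T                 # MLE residual covariance (N x N)
    mu_m = mkt.mean()
    var_m = ((mkt - mu_m) ** 2).mean()          # MLE market variance
    quad = alpha @ np.linalg.solve(Sigma, alpha)
    J1 = (T - N - 1) / N * quad / (1 + mu_m**2 / var_m)
    pval = f.sf(J1, N, T - N - 1)               # upper-tail F probability
    return J1, pval

J1, pval = j1_test(Z, mkt)
print(J1, pval, 'reject' if pval < 0.01 else 'do not reject')
```

Adding a constant to every portfolio's excess return shifts each intercept by that constant. `j1_test(Z + 0.01, mkt)` should therefore reject decisively.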

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a pdf file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.to_pdf(pagebreaks=False, display_link=True)