**Fin 585R**  
**Diether**  
**Problem Set**  
**Dispersion Portfolios**  

**Purpose/Goal**

The purpose of this problem set is to give you a portfolio formation task that takes you through all five steps of our portfolio formation framework.

1. Data Preparation.<br><br>

2. Create portfolio formation or criterion variable.<br><br>

3. Bin the data based on the formation variable.<br><br>

4. Portfolio creation using the bins.<br><br>

5. Testing the historical performance.<br><br>

You should be able to adapt a lot of code we've used before and apply it to this situation.

**Overview**

In this problem set you reproduce your second seminal empirical result in academic finance. Specifically, you reproduce the **dispersion effect** (or the analyst disagreement effect) of Diether, Malloy, and Scherbina (2002). This empirical result spawned a large literature in academic finance, and certainly some quant funds have tried to trade on this effect.

Dispersion (or analyst disagreement) portfolios are formed based on the standard deviation of analyst eps (earnings per share) forecasts over a given period. Here the standard deviation of analyst eps forecasts is the standard deviation across analysts for a given stock and month. Diether, Malloy, and Scherbina don't use raw standard deviation. Instead, they scale standard deviation of analyst forecasts by the absolute value of the mean forecast. Therefore for a given month ($t$), dispersion for stock $i$ is defined as the following:

$$
disp_{it} = \frac{stdev_{it}}{|mean_{it}|}
$$

DMS form dispersion portfolios using $disp_{i,t-1}$; in other words, they lag dispersion one month. In this homework you will do the same. Additionally, you will form dispersion portfolios based on lagging dispersion three months.
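
As a quick illustration of the definition and the two lags, here is a minimal sketch on a made-up panel. The numbers and the `toy` dataframe are hypothetical. Setting dispersion to missing when the mean forecast is zero is an assumption made only for this sketch, not necessarily how DMS treat those observations.

```python
import numpy as np
import pandas as pd

# Hypothetical toy panel: two stocks, four months each (all values are made up).
toy = pd.DataFrame({
    'permno':  [1, 1, 1, 1, 2, 2, 2, 2],
    'mdt':     pd.to_datetime(['2000-01-01', '2000-02-01',
                               '2000-03-01', '2000-04-01'] * 2),
    'meanest': [2.0, -1.0, 0.0, 4.0, 0.5, 0.5, 1.0, 2.0],
    'stdev':   [0.5, 0.25, 0.1, 1.0, 0.1, 0.2, 0.1, 0.5],
})

# disp = stdev / |mean forecast|
toy['disp'] = toy['stdev'] / toy['meanest'].abs()

# A mean forecast of zero produces inf; this sketch treats it as missing
# (an assumption for illustration only).
toy['disp'] = toy['disp'].replace([np.inf, -np.inf], np.nan)

# Lag within each stock. shift() moves rows, not calendar months, so the
# data must be sorted by date within stock and have no missing months.
toy = toy.sort_values(['permno', 'mdt'])
toy['displag'] = toy.groupby('permno')['disp'].shift(1)
toy['displag3'] = toy.groupby('permno')['disp'].shift(3)
```

Note that the first month of each stock has no one-month lag and the first three months have no three-month lag, which is why the lagged columns start with missing values.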

There are two datasets for this problem set. The first is the CRSP data (security prices and returns) during the period from July of 1982 to December of 2000. The second is the analyst earnings per share data from IBES. It also covers the period of July of 1982 to December of 2000. The frequency for both datasets is monthly. The stock level identifier in the IBES data is called a CUSIP. Consequently, I also included CUSIPs in the CRSP data. The CUSIP and the calendar month uniquely identify the analyst earnings per share observations.

You can download the CRSP data directly using the following link: [the CRSP data](http://diether.org/prephd/08-mstk_82-00.csv). There is also a link on *Learning Suite*. The data contain the following variables:

|Variable | Description                                              |
|---------|----------------------------------------------------------|
|permno   | stock identifier                                         |
|cusip    | stock identifier also in IBES data                       |
|caldt    | calendar date (the day is not truncated to 1)            |
|ret      | monthly return                                           |
|prc      | stock price (not lagged, contemporaneous with returns)   |   


You can download the IBES data directly using the following link: [the IBES data](http://diether.org/prephd/08-ibes_eps_analyst.csv). There is also a link on *Learning Suite*. The data contain the following variables:

|Variable | Description                                          |
|---------|------------------------------------------------------|
|cusip    | stock identifier (also in the CRSP data)             |
|caldt    | calendar date (the day is not truncated to 1)        |
|meanest  | average analyst forecast for that month/stock        |
|stdev    | standard deviation of forecasts for that month/stock |


**Tasks**

1. Form quintile based equal-weight dispersion portfolios where dispersion is lagged one month. Report summary statistics (including a t-test of whether the average return is statistically different from zero for each portfolio). Note, you should exclude low price stocks from your portfolios (price below $5). <br><br>

2. Add a spread portfolio to your dataframe of dispersion portfolios. Report summary statistics (including a t-test of whether the average return is statistically different from zero for each portfolio).<br><br>

3. Compute the average number of stocks that are in each portfolio.<br><br>

4. Form quintile based equal-weight dispersion portfolios where dispersion is lagged three months instead of one.  Report summary statistics (including a t-test of whether the average return is statistically different from zero for each portfolio). Note, you should exclude low price stocks from your portfolios (price below $5).<br><br>

5. Compare the results from (1) and (4). What do either the differences or similarities in the average return pattern tell you about the nature of this dispersion effect?

In [None]:
import pandas as pd
import numpy as np

In [None]:
stk = pd.read_csv('08-mstk_82-00.csv',parse_dates=['caldt'])
stk.head(5)

In [None]:
ibes = pd.read_csv("08-ibes_eps_analyst.csv",parse_dates=['caldt'])
ibes.head(5)

**Hint About Merging the two Datasets**

In the datasets I've included the full calendar dates of the observations. Even though the frequency for both is monthly, the timing is not the same. The CRSP data is from the last trading day in the month, and the IBES data tends to be around the middle of the month. Therefore, to merge these dataframes you need to create a new date variable that only preserves uniqueness at the year-month level. Here is a shortcut way to accomplish that:

In [None]:
stk['mdt'] = stk['caldt'].values.astype('datetime64[M]')
stk.head(5)

In [None]:
ibes['mdt'] = ibes['caldt'].values.astype('datetime64[M]')
ibes.head(5)

What is the code above doing? By default, pandas stores all dates with precision to the nanosecond. But numpy (the library pandas uses for its date functionality) also includes date types for varying levels of precision (including monthly). The code above converts the original nanosecond date type to a monthly date type, which discards all information finer than the month. When pandas automatically converts the result back to its own date type, the day is set equal to one for all observations.
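
To see the truncation in isolation, here is a small sketch on three made-up dates. It also shows a pandas-native alternative (`dt.to_period('M')` followed by `dt.to_timestamp()`), which is not what the cells above use but should give the same first-of-month dates.

```python
import pandas as pd

# Made-up dates: a month-end, a mid-month, and another month-end.
dates = pd.Series(pd.to_datetime(['1982-07-30', '1982-07-15', '1982-08-31']))

# The numpy shortcut from above: cast to monthly precision, then let
# pandas convert back, which puts every date on the first of its month.
m1 = pd.Series(dates.values.astype('datetime64[M]'))

# Alternative: convert to monthly periods, then to the timestamp at the
# start of each period.
m2 = dates.dt.to_period('M').dt.to_timestamp()

print(m1.dt.strftime('%Y-%m-%d').tolist())
print(m2.dt.strftime('%Y-%m-%d').tolist())
```

The first two dates fall in the same month, so they map to the same value even though the original days differ. That is exactly what makes the year-month merge work.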

Now you should be able to merge the two datasets. I suggest you merge the ibes dataframe into the stk dataframe using a left merge.

In [None]:
ibes = ibes.drop(columns=['caldt'])

stk = stk.merge(ibes,on=['cusip','mdt'],how='left')
stk.head(5)
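
It can be worth checking that a left merge like this one behaves as expected. The sketch below uses two tiny made-up dataframes (`left` and `right` are hypothetical stand-ins for `stk` and `ibes`) and two optional arguments of `merge`: `validate='m:1'`, which raises an error if the key is not unique in the right dataframe, and `indicator=True`, which adds a `_merge` column showing where each row came from.

```python
import pandas as pd

# Hypothetical stand-in for the CRSP data.
left = pd.DataFrame({
    'cusip': ['A', 'A', 'B'],
    'mdt': pd.to_datetime(['2000-01-01', '2000-02-01', '2000-01-01']),
    'ret': [0.01, -0.02, 0.03],
})

# Hypothetical stand-in for the IBES data (no forecast for A in February).
right = pd.DataFrame({
    'cusip': ['A', 'B'],
    'mdt': pd.to_datetime(['2000-01-01', '2000-01-01']),
    'meanest': [1.5, 0.8],
    'stdev': [0.2, 0.1],
})

# validate='m:1' raises MergeError if (cusip, mdt) is duplicated in `right`.
merged = left.merge(right, on=['cusip', 'mdt'], how='left',
                    validate='m:1', indicator=True)

# A left merge on a unique right-hand key should not change the row count.
n_before, n_after = len(left), len(merged)

# 'both' = matched to a forecast, 'left_only' = no forecast that month.
match_counts = merged['_merge'].value_counts()
print(n_before, n_after)
print(match_counts)
```

Rows marked `left_only` keep their return data but have missing `meanest` and `stdev`, so they drop out later when you filter on non-missing lagged dispersion.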

**Create Dispersion Variable and Lagged Variables**


In [None]:
stk['disp'] = stk['stdev'] / np.abs(stk['meanest'])

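# shift() lags by rows within each permno, so sort by date first.
# This assumes each stock has no missing months.
stk = stk.sort_values(['permno','caldt']).reset_index(drop=True)

stk['displag'] = stk.groupby('permno')['disp'].shift()
stk['displag3'] = stk.groupby('permno')['disp'].shift(3)
stk['prclag'] = stk.groupby('permno')['prc'].shift()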

stk.head()

**Task 1 and 2**

+ Form equal-weight portfolios based on lagged dispersion.<br><br>

+ Add a spread portfolio. Report summary statistics.<br><br>


**Working off a new copy of the data**

I'm going to work off a sub-selection of the data so that I still have the unfiltered data to go back to when I form the portfolios based on dispersion lagged three months.

In [None]:
# "displag == displag" is False for NaN, so it keeps only non-missing lagged dispersion.
df = stk.query("displag == displag and prclag >= 5").reset_index(drop=True)
df.head()

In [None]:
df['bins'] = df.groupby('caldt')['displag'].transform(pd.qcut,5,labels=False)

ew = (df.groupby(['caldt','bins'])['ret'].mean().unstack(level='bins')
      .rename('p{:.0f}'.format,axis='columns')*100)
ew.head()
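
To make the binning step concrete, here is a minimal sketch on made-up data (the `toy` dataframe and its values are hypothetical). It shows that quintiles are assigned separately within each date, so the same raw value can land in different bins in different months.

```python
import pandas as pd

# Ten made-up stocks per month; February values are on a different scale.
toy = pd.DataFrame({
    'caldt': pd.to_datetime(['2000-01-31'] * 10 + ['2000-02-29'] * 10),
    'displag': list(range(1, 11)) + [x * 10 for x in range(10, 0, -1)],
})

# Quintile labels 0 (lowest dispersion) to 4 (highest), computed per date.
toy['bins'] = toy.groupby('caldt')['displag'].transform(
    lambda x: pd.qcut(x, 5, labels=False))

# Number of stocks in each bin for each month.
counts = toy.groupby(['caldt', 'bins'])['displag'].count().unstack(level='bins')
print(counts)
```

In January a value of 10 is the highest in the cross-section and lands in bin 4, while in February a value of 10 is the lowest and lands in bin 0. If the real data have many tied values, `pd.qcut` can raise an error about duplicate bin edges; its `duplicates='drop'` argument is one way around that, at the cost of fewer bins in those months.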

In [None]:
from finance_byu.summarize import summary

ew['spr'] = ew['p0'] - ew['p4']
summary(ew).loc[['count','mean','std','tstat','pval'],].round(3)
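
For reference, the usual one-sample t-statistic for testing whether an average return differs from zero is $t = \bar{r} / (s / \sqrt{T})$, where $\bar{r}$ is the sample mean, $s$ the sample standard deviation, and $T$ the number of months. The sketch below computes it by hand on made-up numbers. That `summary` computes its `tstat` in exactly this way is an assumption, so treat this as a cross-check rather than a description of the library.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly spread returns in percent (made-up numbers).
spr = pd.Series([1.2, -0.4, 0.9, 0.3, -0.1, 0.8, 0.5, -0.2])

n = spr.count()        # number of non-missing months
mean = spr.mean()      # average monthly return
std = spr.std()        # sample standard deviation (ddof=1 by default)

# t-statistic for H0: the average return equals zero.
tstat = mean / (std / np.sqrt(n))
print(round(mean, 3), round(tstat, 3))
```

With roughly 220 months of data, a t-statistic above about 2 in absolute value corresponds to significance at the conventional 5% level.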

**Task 3**

+ Compute the average number of stocks in the portfolios.<br><br>

In [None]:
# Count stocks per date and bin, then average the counts over time.
df.groupby(['caldt','bins'])['ret'].count().unstack(level='bins').mean()

**Task 4**

+ Form equal-weight portfolios based on three month lagged dispersion.<br><br>

In [None]:
df = stk.query("displag3 == displag3 and prclag >= 5").reset_index(drop=True)

df['bins3'] = df.groupby('caldt')['displag3'].transform(pd.qcut,5,labels=False)

ew = (df.groupby(['caldt','bins3'])['ret'].mean().unstack(level='bins3')
      .rename('p{:.0f}'.format,axis='columns')*100 )

In [None]:
ew['spr'] = ew['p0'] - ew['p4']
summary(ew).loc[['count','mean','std','tstat','pval'],].round(3)