**Fin 585R**  
**Diether**  
**Problem Set**  
**Momentum Portfolios**  

**Overview**

In this problem set you reproduce your second seminal empirical result in academic finance. Specifically, you reproduce and extend (the original sample was about 1963 to 1990) **the momentum effect** of Jegadeesh and Titman (1993) (see "Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency"). This empirical result spawned a huge literature in academic finance, and has been a critical core strategy for quant hedge funds (and others) for the last 30 years. You will find out in the next couple of weeks that models like the CAPM can't explain this portfolio return pattern at all. 

Momentum portfolios are formed based on past returns. Specifically, momentum portfolios are most commonly formed based on the cumulative return from months $t-12$ to $t-2$ (you should use this past return window for your portfolios):

$$
r_{i,t-12:t-2} \approx \sum_{x=2}^{12} \log(1+r_{i,t-x})
$$

Note, it's common practice to cumulate (or compound) the returns using the log approximation (as above). You certainly can do the following if you want (well, not for this problem set ... use log returns for the problem set):

$$
r_{i,t-12:t-2} = \left[ \prod_{x=2}^{12} \bigl(1+r_{i,t-x} \bigr) \right]  - 1
$$

The log approximation is traditionally used in this situation because it's less computationally intensive. 
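
To make the relationship between the two formulas concrete, here is a minimal sketch using made-up monthly returns (the numbers are purely illustrative, not from the data set). Because the sum of log returns is the log of the compounded gross return, sorting stocks on either measure gives the same ranking.

```python
import numpy as np

# Hypothetical monthly returns for months t-12 through t-2 (11 observations).
rets = np.array([0.02, -0.01, 0.03, 0.015, -0.02, 0.01,
                 0.04, -0.005, 0.02, 0.01, -0.015])

log_cum = np.log1p(rets).sum()      # sum of log(1 + r)
compound = np.prod(1 + rets) - 1    # compounded return

# exp(log_cum) - 1 recovers the compounded return.
recovered = np.expm1(log_cum)
print(log_cum, compound, recovered)
```

The mapping from `log_cum` to `compound` is strictly increasing, so quintile assignments based on one match those based on the other.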

The data for this problem set are monthly observations for all stocks on the NYSE, AMEX, and Nasdaq from July of 1962 to September of 2022. You can download the data directly using the following link: [the data](https://diether.org/prephd/06-mstk_62-22.csv). There is also a link on *Learning Suite*. The data contain the following variables that you will need for the assignment (they also contain some additional variables):

|Variable | Description                                              |
|---------|----------------------------------------------------------|
|permno   | stock identifier                                         |
|caldt    | calendar date                                            |
|ticker   | ticker symbol                                            |
|prc      | stock price (not lagged, contemporaneous with returns)   |
|me       | market equity (not lagged, contemporaneous with returns) |
|ret      | monthly return                                           |
|shr      | shares outstanding in 1000s                              |


**Tasks**

1. Form quintile based equal-weight momentum portfolios and report summary statistics (including a t-test of whether the average return is statistically different from zero for each portfolio). Note, you should exclude low price stocks from your portfolios (price below $5). We will discuss the code for creating the portfolio formation variable in the class before the assignment. <br><br>

2. Compute the average number of stocks that are in each portfolio.<br><br>

3. Add a spread portfolio (long portfolio 4 and short portfolio 0 $\leftarrow$ it's a zero cost L/S portfolio) to your dataframe of equal-weight momentum portfolios and then compute the summary statistics.<br><br>

4. Form quintile based value-weight momentum portfolios and report summary statistics (including a t-test of whether the average return is statistically different from zero for each portfolio). You should once again have five portfolios (note, the only difference between your equal-weight and value-weight portfolios will be the weights). Note, a value-weight portfolio is defined as the following ($me$ refers to the market value of equity): <br><br>
$$
r_{pt} = \sum_{i=1}^{n} \omega_{i}r_{it} = \sum_{i=1}^{n} \left(\frac{me_{i,t-1}}{\sum_{j=1}^{n} me_{j,t-1}} \right) r_{it}
$$<br><br>
Hint: think about splitting the formula into the following parts delineated by the parentheses:<br><br>
\begin{align*}
r_{pt} &= \left( \frac{1}{\sum_{j=1}^{n} me_{j,t-1}} \right) \left( \sum_{i=1}^{n} me_{i,t-1} r_{it}
          \right)
 \end{align*}<br><br>
And then compute each part as a separate groupby. Finally, just multiply the resulting dataframes together and you will have computed the value-weight portfolio returns. <br><br>

In [None]:
import pandas as pd
import numpy as np

In [None]:
df = pd.read_csv('06-mstk_62-22.csv',parse_dates=['caldt'])
df

**Portfolio Formation Variable: $r_{i,t-12:t-2}$**

In [None]:
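df['logret'] = np.log(1 + df['ret'])
# 11-month rolling sum of log returns within each stock. Dropping only the permno
# level keeps the original row index, so the result aligns even if df is unsorted.
df['mom'] = (df.groupby('permno')['logret'].rolling(11,11).sum()
               .reset_index(level=0,drop=True))
# Shift by two months so the window runs from t-12 to t-2 (skipping t-1).
df['mom'] = df.groupby('permno')['mom'].shift(2)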
df.head(15)
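
One way to convince yourself that the rolling sum plus a two-month shift covers months $t-12$ to $t-2$ is to run the same steps on a toy series. The sketch below assumes a single hypothetical stock whose monthly log returns are 1, 2, ..., 14, so the sums are easy to check by hand.

```python
import numpy as np
import pandas as pd

# Toy panel: one hypothetical stock, log returns 1, 2, ..., 14 (months 1 to 14).
toy = pd.DataFrame({'permno': 1, 'logret': np.arange(1.0, 15.0)})

# 11-month rolling sum ending in the current month, aligned on the original index.
toy['mom'] = (toy.groupby('permno')['logret']
                 .rolling(11, min_periods=11).sum()
                 .reset_index(level=0, drop=True))
# Shift by two months so the window ends at t-2.
toy['mom'] = toy.groupby('permno')['mom'].shift(2)

# Month 13 (row 12) should hold the sum of months 1..11, which is 66.
print(toy.tail(3))
```

The first twelve rows are missing because a stock needs twelve months of history before the formation variable is defined.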

**Lag Variables and Remove Low Priced Stocks**

In [None]:
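# Lag price and market equity one month so they are known at portfolio formation
df['prclag'] = df.groupby('permno')['prc'].shift(1)
df['melag'] = df.groupby('permno')['me'].shift(1)

# mom == mom is False for NaN, so this drops rows with missing momentum
df = df.query("mom == mom and prclag >= 5").reset_index(drop=True)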
df

**Create Portfolio Bins**

In [None]:
df['bins'] = df.groupby('caldt')['mom'].transform(pd.qcut,5,labels=False)
df
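
To see what `pd.qcut` with `labels=False` returns for a single month, here is a small sketch on ten made-up past returns (hypothetical values, chosen only for illustration).

```python
import pandas as pd

# Hypothetical past returns for ten stocks in a single month.
mom = pd.Series([-0.30, -0.10, 0.00, 0.05, 0.08, 0.12, 0.20, 0.35, 0.50, 0.90])

# labels=False returns integer bin codes: 0 (lowest quintile) through 4 (highest).
bins = pd.qcut(mom, 5, labels=False)
print(bins.tolist())  # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

Bin 0 holds the past losers and bin 4 the past winners, which is why the spread portfolio later in the notebook is long `p4` and short `p0`.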

**Task 1**

Form quintile based equal-weight momentum portfolios and report summary statistics (including a t-test of whether the average return is statistically different from zero for each portfolio). <br><br>


In [None]:
ew = df.groupby(['caldt','bins'])['ret'].mean().unstack(level='bins')*100
ew

In [None]:
from finance_byu.summarize import summary
summary(ew).loc[['count','mean','std','tstat','pval']].round(3)
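
The `tstat` row is presumably the standard one-sample t-statistic for the null that the average return is zero. That is an assumption about `summary` from `finance_byu`, not something verified here. Under that assumption, the calculation can be sketched by hand on made-up returns:

```python
import numpy as np
import pandas as pd

# Made-up monthly portfolio returns in percent (illustrative only).
r = pd.Series([1.2, -0.5, 0.8, 2.1, -1.0, 0.4, 1.5, 0.3])

n = r.count()
mean = r.mean()
std = r.std()                       # sample standard deviation (ddof=1)
tstat = mean / (std / np.sqrt(n))   # test of H0: average return equals zero
print(round(tstat, 3))
```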

**Renaming the Columns**

+ You can do it compactly with a string formatter or an f-string<br>

In [None]:
ew = (df.groupby(['caldt','bins'])['ret'].mean().unstack(level='bins')
      .rename('p{:.0f}'.format,axis='columns')*100)

ew

In [None]:
ew = (df.groupby(['caldt','bins'])['ret'].mean().unstack(level='bins')
      .rename(lambda x: f'p{x:.0f}',axis='columns')*100)    
ew

In [None]:
summary(ew).loc[['count','mean','std','tstat','pval']].round(3)

**Task 2**

Compute the average number of stocks that are in each portfolio.<br><br>

In [None]:
(df.groupby(['caldt','bins'])['ret'].count().unstack(level='bins')
      .rename('p{:.0f}'.format,axis='columns'))


In [None]:
(df.groupby(['caldt','bins'])['ret'].count().unstack(level='bins')
      .rename('p{:.0f}'.format,axis='columns')).mean()

**Task 3**

Add a spread portfolio (long portfolio 4 and short portfolio 0 $\leftarrow$ it's a zero cost L/S portfolio) to your dataframe of equal-weight momentum portfolios and then compute the summary statistics.<br><br>

In [None]:
ew['spread'] = ew['p4'] - ew['p0']

summary(ew).loc[['count','mean','std','tstat','pval']].round(3)

**Task 4**

Create value-weight momentum portfolios.<br><br>
\begin{align*}
r_{pt} &= \left( \frac{1}{\sum_{j=1}^{n} me_{j,t-1}} \right) \left( \sum_{i=1}^{n} me_{i,t-1} r_{it}
          \right)
 \end{align*}<br><br>

In [None]:
mcapsum = df.groupby(['caldt','bins'])['melag'].sum()

df['rme'] = df['ret']*df['melag']
vw = df.groupby(['caldt','bins'])['rme'].sum() / mcapsum
vw

In [None]:
vw = vw.unstack(level='bins').rename('p{:.0f}'.format,axis='columns')*100
vw

In [None]:
vw['spread'] = vw['p4'] - vw['p0']

summary(vw).loc[['count','mean','std','tstat','pval']].round(3)

<br>

**Alternate Value-Weighting Method**

+ Conceptually Straightforward, but Relatively Slow Execution.<br><br>

+ I think the conceptually easiest way to create these value-weight portfolios is to compute the weights, multiply the weights by the returns, and then sum everything up in one function as part of a `groupby` call. The function in this case corresponds almost directly to the mathematical formula:
<br><br>
\begin{align*}
r_{pt} &= \sum_{i=1}^{n} \left(\frac{me_{i,t-1}}{\sum_{j=1}^{n} me_{j,t-1}} \right) r_{it}
\end{align*}<br><br>

+ We will use `apply` instead of `transform` or `aggregate` because we are going to send in an Nx2 dataframe (returns and lagged market-cap) and then for each date/port group we are going to return a scalar portfolio return (so the dimensionality of the return is 1x1).<br><br>

In [None]:
from tqdm.auto import tqdm
tqdm.pandas()

def vw_port(x):
    wtotal = x['melag'].sum()
    return ((x['melag']/wtotal)*x['ret']).sum()

vw = df.groupby(['caldt','bins'])[['ret','melag']].progress_apply(vw_port)
vw = vw.unstack(level='bins')*100
vw

In [None]:
summary(vw).loc[['count','mean','std','tstat','pval']].round(3)
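
As a sanity check, the two value-weighting approaches should produce identical portfolio returns. The sketch below verifies this on a tiny made-up data set (the dates, returns, and market caps are hypothetical); `vw_port` is repeated so the block runs on its own.

```python
import numpy as np
import pandas as pd

# Toy data: two dates, two bins, hypothetical returns and lagged market caps.
toy = pd.DataFrame({
    'caldt': ['2020-01-31'] * 4 + ['2020-02-29'] * 4,
    'bins':  [0, 0, 1, 1] * 2,
    'ret':   [0.01, 0.03, -0.02, 0.04, 0.00, 0.02, 0.05, -0.01],
    'melag': [100.0, 300.0, 50.0, 150.0, 120.0, 280.0, 60.0, 140.0],
})

# Approach 1: two native groupby sums, then divide.
toy['rme'] = toy['ret'] * toy['melag']
g = toy.groupby(['caldt', 'bins'])
vw_fast = g['rme'].sum() / g['melag'].sum()

# Approach 2: custom function mirroring the formula, applied group by group.
def vw_port(x):
    return ((x['melag'] / x['melag'].sum()) * x['ret']).sum()

vw_slow = toy.groupby(['caldt', 'bins'])[['ret', 'melag']].apply(vw_port)

print(np.allclose(vw_fast, vw_slow))
```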

**A Little Speed Testing**

Jupyter notebooks (really the underlying IPython kernel) make it easy to test the speed of different approaches. IPython has a magic function named `%timeit` that tests the speed of a function or single line of code. We can use `%timeit` to compare the speed of the two approaches for creating value-weight portfolios.

In [None]:
def vw_one():
    df['rme'] = df['ret']*df['melag']
    vw = df.groupby(['caldt','bins'])['rme'].sum() / df.groupby(['caldt','bins'])['melag'].sum()
    
%timeit -n 1 -r 3 vw_one()

In [None]:
def vw_two():
    vw = df.groupby(['caldt','bins'])[['ret','melag']].apply(vw_port)

%timeit -n 1 -r 3 vw_two()
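
`%timeit` only works inside IPython or Jupyter. In a plain Python script, the standard-library `timeit` module offers a similar measurement. The sketch below uses a stand-in function named `work` (hypothetical) in place of `vw_one` or `vw_two`.

```python
import timeit

def work():
    # Stand-in workload (hypothetical); replace with vw_one or vw_two.
    return sum(i * i for i in range(10_000))

# repeat=3, number=1 mirrors `%timeit -n 1 -r 3`: three runs of one call each.
times = timeit.repeat(work, repeat=3, number=1)
print(min(times))
```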

**Why is the first approach faster?**

The first approach is much faster because it relies only on native `pandas` methods. In general, relying as much as you can on the methods that the `pandas` groupby object provides natively will result in faster code. Most `pandas` methods and functions are written in a language called `Cython`, which is translated or "compiled" into `C`, and `C` is very fast. When you supply a custom function to `pandas` (even a simple one like `vw_port`), `pandas` usually has to execute much more of the process in `python` rather than `C`.

Also, any operation you can compute on a whole column at once, rather than inside a function called by a `groupby`, will be faster (we did that above when computing `rme` as the product of `ret` and `melag`).
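
The same idea also works for the weights themselves. As a rough sketch on made-up data (the column names `w` and `wret` are illustrative, not part of the assignment), `transform('sum')` broadcasts each group total back to every row, so the weight becomes an ordinary column operation:

```python
import pandas as pd

# Hypothetical returns and lagged market caps for one date and two bins.
toy = pd.DataFrame({
    'caldt': ['2020-01-31'] * 4,
    'bins':  [0, 0, 1, 1],
    'ret':   [0.01, 0.03, -0.02, 0.04],
    'melag': [100.0, 300.0, 50.0, 150.0],
})

keys = ['caldt', 'bins']
# Group totals are broadcast to each row, then divided column-wise.
toy['w'] = toy['melag'] / toy.groupby(keys)['melag'].transform('sum')
toy['wret'] = toy['w'] * toy['ret']
vw = toy.groupby(keys)['wret'].sum()
print(vw)
```

This keeps the explicit weights available for inspection while still avoiding a custom function inside `apply`.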