**Fin 585**  
**Diether**  
**Problem Set**  
**Momentum Portfolios**  

**1 Overview**

In this problem set you will reproduce your second important empirical result in academic finance. Specifically, you will reproduce and extend **the momentum effect** of Jegadeesh and Titman (1993) (see "Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency"); the original sample ran from about 1963 to 1990. This empirical result spawned a huge literature in academic finance and has been a core strategy for quant hedge funds (and others) for the last 30 years. You will find out in the next couple of weeks that models like the CAPM can't explain this portfolio return pattern at all. 

Momentum portfolios are formed based on past returns. Specifically, momentum portfolios are most commonly formed based on the cumulative return from months $t-12$ to $t-2$ (you should use this past return window for your portfolios):
$$
r_{i,t-12:t-2} \approx \sum_{x=2}^{12} \log(1+r_{i,t-x})
$$
Note, it's common practice to cumulate (or compound) the returns using the log approximation (as above). You certainly can do the following if you want (well, not for this problem set ... use log returns for the problem set):
$$
r_{i,t-12:t-2} = \left[ \prod_{x=2}^{12} \bigl(1+r_{i,t-x} \bigr) \right]  - 1
$$
The log approximation was initially used because it was less computationally intensive and a little easier to program. That's not an issue now, but the convention stuck.  
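
To make the link between the two formulas concrete, here is a minimal sketch using made-up monthly returns (the numbers are purely illustrative and do not come from the problem set data). The sum of log returns equals $\log(1+r)$ of the compounded return, so one can be recovered from the other, and sorting stocks on either measure gives the same ranking.

```python
import numpy as np

# Hypothetical returns for months t-12 through t-2 (11 made-up values)
rets = np.array([0.02, -0.01, 0.03, 0.015, -0.02, 0.01,
                 0.04, -0.03, 0.005, 0.02, 0.01])

log_cum = np.log(1 + rets).sum()        # log version (first formula)
simple_cum = np.prod(1 + rets) - 1      # compounded version (second formula)

# Sum of log returns = log(1 + compounded return), so exponentiating recovers it
recovered = np.exp(log_cum) - 1

print(f"log cumulative return: {log_cum:.6f}")
print(f"compounded return:     {simple_cum:.6f}")
print(f"exp(log_cum) - 1:      {recovered:.6f}")
```

Because the log is a monotonic transformation, the choice between the two only changes the scale of the sorting variable, not the order of the stocks.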

Data for this problem set are monthly observations for all stocks on the NYSE, AMEX, and Nasdaq from July of 1962 to September of 2024. You can download the data directly using the following link: [data](https://diether.org/prephd/06-mstk_62-24.csv). There is also a link on *Learning Suite*. The data contain the following variables that you will need for the assignment:

|Variable | Description                                              |
|---------|----------------------------------------------------------|
|permno   | stock identifier                                         |
|caldt    | calendar date                                            |
|ticker   | ticker symbol                                            |
|prc      | stock price (not lagged, contemporaneous with returns)   |
|me       | market equity (not lagged, contemporaneous with returns) |
|ret      | monthly return                                           |
|shr      | shares outstanding in 1000s                              |


**2 Tasks**

1. Form quintile-based equal-weight momentum portfolios and report summary statistics (including a t-test of whether the average return is statistically different from zero for each portfolio). Note that you should exclude low-priced stocks from your portfolios (price below $5). We will discuss the code for creating the portfolio formation variable in class before the assignment.

2. Compute the average number of stocks that are in each portfolio.

3. Add a spread portfolio (100% long portfolio 4 and 100% short portfolio 0 $\leftarrow$ it's a zero cost long/short (L/S) portfolio) to your dataframe of equal-weight momentum portfolios and then compute the summary statistics.

4. Form quintile based value-weight momentum portfolios and report summary statistics (including a t-test of whether the average return is statistically different from zero for each portfolio). You should once again have five portfolios. The only difference between your equal-weight and value-weight portfolios will be the weights. A value weight portfolio is defined as the following ($me$ refers to the market value of equity):
$$
r_{pt} = \sum_{i=1}^{n} \omega_{i}r_{it} = \sum_{i=1}^{n} \left(\frac{me_{i,t-1}}{\sum_{j=1}^{n} me_{j,t-1}} \right) r_{it}
$$
Hint: think about splitting the formula into the following parts delineated by the parentheses:
\begin{align*}
r_{pt} &= \left( \frac{1}{\sum_{j=1}^{n} me_{j,t-1}} \right) \left( \sum_{i=1}^{n} me_{i,t-1} r_{it}
          \right)
 \end{align*}
Then compute each part as a separate groupby. Finally, just multiply the resulting dataframes together and you will have computed the value-weight portfolio returns. 

In [None]:
import pandas as pd
import numpy as np

In [None]:
df = pd.read_csv('06-mstk_62-24.csv',parse_dates=['caldt'])
df

<br>

**3 Portfolio Formation Variable: $r_{t-12,t-2}$**

In [None]:
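# Sort by stock and date so rolling windows and shifts run in time order
df = df.sort_values(['permno','caldt']).reset_index(drop=True)

df['logret'] = np.log(1 + df['ret'])
# 11-month rolling sum of log returns within each stock; dropping only the
# permno level keeps the original row index so the assignment aligns correctly
df['mom'] = df.groupby('permno')['logret'].rolling(11,11).sum().reset_index(level=0,drop=True)
# Shift by 2 so that at month t the window covers t-12 through t-2
df['mom'] = df.groupby('permno')['mom'].shift(2)

# Lagged price (for the $5 filter) and lagged market equity (for value weights)
df['prclag'] = df.groupby('permno')['prc'].shift(1)
df['melag'] = df.groupby('permno')['me'].shift(1)

# mom == mom is False for NaN, so this drops missing mom and lagged prices below $5
df = df.query("mom == mom and prclag >= 5").reset_index(drop=True)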
df

<br>
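
To see which months end up in the formation window, here is a small sketch on a toy panel. The two stocks, the integer `month` counter, and the returns are all made up for illustration. It checks that an 11-month rolling sum shifted by 2 gives, at month 12, the sum of log returns over months 0 through 10 (that is, $t-12$ to $t-2$), and that nothing leaks from one stock into the next.

```python
import numpy as np
import pandas as pd

# Toy panel: two hypothetical stocks with 14 months of made-up returns each
toy = pd.DataFrame({
    'permno': np.repeat([1, 2], 14),
    'month': np.tile(np.arange(14), 2),
    'ret': np.tile(np.linspace(-0.05, 0.08, 14), 2),
})
toy['logret'] = np.log(1 + toy['ret'])

# Rolling 11-month sum within each stock, aligned back on the original row index
roll = toy.groupby('permno')['logret'].rolling(11, min_periods=11).sum()
toy['mom'] = roll.reset_index(level=0, drop=True)
# Skip the most recent month: at month t the window now ends at t-2
toy['mom'] = toy.groupby('permno')['mom'].shift(2)

first = toy[toy['permno'] == 1].reset_index(drop=True)
second = toy[toy['permno'] == 2].reset_index(drop=True)

# Direct sum over months 0..10, which is t-12..t-2 for t = 12
expected = first.loc[0:10, 'logret'].sum()

print(first[['month', 'ret', 'mom']].tail(4))
print(f"direct sum over months 0-10: {expected:.6f}")
```

The first twelve months of each stock have a missing `mom`, which is why the notebook drops rows where `mom` is missing before forming portfolios.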

**4 Create Portfolio Bins**

In [None]:
df['bins'] = df.groupby('caldt')['mom'].transform(pd.qcut,5,labels=False)
df

<br>
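
The binning step can be checked on a toy cross-section. In this sketch the dates, stock identifiers, and `mom` values are invented. With ten stocks per date, each quintile should hold two stocks, bin 0 should hold the lowest past returns, and bin 4 the highest.

```python
import numpy as np
import pandas as pd

# Toy cross-sections: 10 hypothetical stocks on each of two dates (made-up values)
toy = pd.DataFrame({
    'caldt': ['2020-01-31'] * 10 + ['2020-02-29'] * 10,
    'permno': list(range(10)) * 2,
    'mom': np.concatenate([np.arange(10) / 10, np.arange(10)[::-1] / 5]),
})

# Same call pattern as the notebook: breakpoints are recomputed on every date
toy['bins'] = toy.groupby('caldt')['mom'].transform(pd.qcut, 5, labels=False)

# Number of stocks in each bin on each date
counts = toy.groupby(['caldt', 'bins'])['permno'].count().unstack(level='bins')
print(counts)

# Bin assigned to the lowest and highest mom stock on each date
lowest = toy.loc[toy.groupby('caldt')['mom'].idxmin(), 'bins']
highest = toy.loc[toy.groupby('caldt')['mom'].idxmax(), 'bins']
print(lowest.tolist(), highest.tolist())
```

Grouping by `caldt` matters: the quintile breakpoints are specific to each month's cross-section rather than pooled over the whole sample.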

**5 Finish Portfolio Construction and do Tasks**

**Task 1**

Form quintile-based equal-weight momentum portfolios and report summary statistics (including a t-test on the mean). 

In [None]:
ew = df.groupby(['caldt','bins'])['ret'].mean().unstack(level='bins')*100
ew

In [None]:
from finance_byu.summarize import summary
summary(ew).loc[['count','mean','std','tstat','pval']].round(3)

+ **Extra: renaming the portfolio columns**

+ You can do it compactly with `str.format` or an f-string.

+ Numeric column names can be annoying to work with; I change them to p0, p1, etc.

In [None]:
ew = (df.groupby(['caldt','bins'])['ret'].mean().unstack(level='bins')
      .rename('p{:.0f}'.format,axis='columns')*100)
ew

In [None]:
ew = (df.groupby(['caldt','bins'])['ret'].mean().unstack(level='bins')
      .rename(lambda x: f'p{x:.0f}',axis='columns')*100)    
ew

In [None]:
summary(ew).loc[['count','mean','std','tstat','pval']].round(3)

<br>
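
For reference, the usual one-sample t-statistic for testing whether a mean is zero is $t = \bar{r} / (s/\sqrt{T})$, where $s$ is the sample standard deviation and $T$ is the number of months. The sketch below computes it by hand on made-up returns. The helper `tstat_zero_mean` is hypothetical, and I'm assuming (not asserting) that the `tstat` row from `summary` is this standard statistic; the `finance_byu` documentation is the authority on that.

```python
import numpy as np
import pandas as pd

def tstat_zero_mean(returns: pd.DataFrame) -> pd.Series:
    """One-sample t-statistic for H0: mean = 0, column by column (hypothetical helper)."""
    n = returns.count()                      # non-missing observations per column
    se = returns.std(ddof=1) / np.sqrt(n)    # standard error of the mean
    return returns.mean() / se

# Made-up monthly returns (in percent) for two hypothetical portfolios
toy = pd.DataFrame({'p0': [1.0, 2.0, 3.0, 4.0, 5.0],
                    'p4': [2.0, -1.0, 0.5, 1.5, -3.0]})

tstats = tstat_zero_mean(toy)
print(tstats.round(3))
```

In this toy example `p4` averages exactly zero, so its t-statistic is zero, while `p0` has a mean of 3 and a standard error of about 0.707.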

**Task 2**

Compute the average number of stocks that are in each portfolio.

In [None]:
(df.groupby(['caldt','bins'])['ret'].count().unstack(level='bins')
      .rename('p{:.0f}'.format,axis='columns'))

In [None]:
(df.groupby(['caldt','bins'])['ret'].count().unstack(level='bins')
      .rename('p{:.0f}'.format,axis='columns')).mean().round(1)

<br>

**Task 3**

Add a spread portfolio that goes long portfolio 4 and short portfolio 0 $\leftarrow$ it's a zero cost L/S portfolio.

In [None]:
ew['spread'] = ew['p4'] - ew['p0']

summary(ew).loc[['count','mean','std','tstat','pval']].round(3)

<br>

**Task 4**

Create value-weight momentum portfolios.
\begin{align*}
r_{pt} &= \sum_{i=1}^{n} \left(\frac{me_{i,t-1}}{\sum_{j=1}^{n} me_{j,t-1}} \right) r_{it} \\[1.05ex]
       &= \left( \frac{1}{\sum_{j=1}^{n} me_{j,t-1}} \right) \left( \sum_{i=1}^{n} me_{i,t-1} r_{it}
          \right)
\end{align*}

In [None]:
mcapsum = df.groupby(['caldt','bins'])['melag'].sum()

df['rme'] = df['ret']*df['melag']
vw = df.groupby(['caldt','bins'])['rme'].sum() / mcapsum
vw

In [None]:
vw = vw.unstack(level='bins').rename('p{:.0f}'.format,axis='columns')*100
vw

In [None]:
vw['spread'] = vw['p4'] - vw['p0']
summary(vw).loc[['count','mean','std','tstat','pval']].round(3)

<br>

**6 Alternate Value-Weighting Method**

+ Conceptually straightforward, but relatively slow execution.

+ Write a function using the logic of the way the value-weight formula is normally written.
\begin{align*}
r_{pt} &= \sum_{i=1}^{n} \left(\frac{me_{i,t-1}}{\sum_{j=1}^{n} me_{j,t-1}} \right) r_{it}
\end{align*}

+ Logic $\rightarrow$ compute the weights, multiply the weights and returns together, and then sum everything up. $\leftarrow$ this function is called by the `groupby`.

+ The function in this case will have a close correspondence with the way we most naturally write the portfolio return formula.

+ Coding note: need to use `apply` with the `groupby` instead of `transform` or `aggregate`.

  - Why? Sending in an Nx2 dataframe (need returns and lagged market-cap) and then for each date/bin group we are going to return a scalar portfolio return (1x1).

  - It's a type of aggregation but the `aggregate` method expects an Nx1 vector and returns a 1x1.

In [None]:
from tqdm.auto import tqdm
tqdm.pandas()

def vw_port(x):
    wtotal = x['melag'].sum()
    return ((x['melag']/wtotal)*x['ret']).sum()

vw = df.groupby(['caldt','bins'])[['ret','melag']].progress_apply(vw_port)
vw = vw.unstack(level='bins')*100
vw

In [None]:
summary(vw).loc[['count','mean','std','tstat','pval']].round(3)

<br>
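
As a sanity check, the two value-weighting approaches should give identical portfolio returns. The sketch below confirms this on a tiny made-up data set (one date, two bins, invented returns and lagged market caps) where the answer is easy to compute by hand: bin 0 is $(100 \times 0.10 + 300 \times 0.02)/400 = 0.04$.

```python
import pandas as pd

# Toy data: one date, two bins, made-up returns and lagged market caps
toy = pd.DataFrame({
    'caldt': ['2020-01-31'] * 4,
    'bins':  [0, 0, 1, 1],
    'ret':   [0.10, 0.02, -0.05, 0.03],
    'melag': [100.0, 300.0, 50.0, 150.0],
})

# Approach 1: two vectorized group sums, then divide
toy['rme'] = toy['ret'] * toy['melag']
g = toy.groupby(['caldt', 'bins'])
vw_fast = g['rme'].sum() / g['melag'].sum()

# Approach 2: custom function called once per date/bin group
def vw_port(x):
    wtotal = x['melag'].sum()
    return ((x['melag'] / wtotal) * x['ret']).sum()

vw_slow = toy.groupby(['caldt', 'bins'])[['ret', 'melag']].apply(vw_port)

print(pd.concat({'two_sums': vw_fast, 'apply': vw_slow}, axis=1))
```

Both columns should match to floating-point precision, so the choice between the approaches is about speed and readability, not results.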

**7 A Little Speed Testing**

+ The Jupyter notebook environment contains a special "magic" command for testing the speed of code $\rightarrow$ `%timeit`

+ In a normal Python environment, you can use the `time` module.

+ Given that we're using a notebook, let's use `%timeit` to compare the speed of the two approaches for creating value-weight portfolios.

In [None]:
def vw_one():
    df['rme'] = df['ret']*df['melag']
    vw = df.groupby(['caldt','bins'])['rme'].sum() / df.groupby(['caldt','bins'])['melag'].sum()
    
%timeit -n 1 -r 3 vw_one()

In [None]:
def vw_two():
    vw = df.groupby(['caldt','bins'])[['ret','melag']].apply(vw_port)

%timeit -n 1 -r 3 vw_two()

<br>

**7.1 Why is the First Approach Faster?**

+ The first approach is more efficient (faster) because it relies solely on native `pandas` methods.

+ Relying as much as you can on built-in `pandas` methods will generally result in faster code.

+ The speed advantage comes from the built-in methods and functions being written in the `C` programming language.

+ Technically, many of the performance-critical `pandas` functions and methods are written in a language called `Cython`, which compiles to `C`.

+ When you supply a custom function to `pandas` (even a simple one like `vw_port`), `pandas` usually has to execute more of the process in `Python` rather than `C`.

+ Also, any operations you can compute on a whole column at once rather than as part of a function called by a `groupby` will be faster (we did that with one operation above).<br><br>


**7.2 Extra: Using the Time Module**

In [None]:
import time

start_time = time.time()
for r in range(3): vw_one()
end_time = time.time()

print(f"Average execution: {(end_time-start_time)/3:.3f} seconds\n")
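
A small variation on the cell above: the standard library also provides `time.perf_counter`, a high-resolution clock intended for measuring short durations. The sketch below wraps the same averaging logic in a helper. The names `average_runtime` and `toy_task` are hypothetical, and the workload is only a stand-in; in the notebook you would pass `vw_one` or `vw_two` instead.

```python
import time

def average_runtime(func, repeats=3):
    """Average wall-clock seconds per call of func (hypothetical helper)."""
    start = time.perf_counter()
    for _ in range(repeats):
        func()
    return (time.perf_counter() - start) / repeats

# Stand-in workload so the example runs on its own
def toy_task():
    return sum(i * i for i in range(100_000))

avg = average_runtime(toy_task)
print(f"Average execution: {avg:.4f} seconds")
```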