**Fin 585**  
**Diether**  
**Problem Set**  
**Momentum Portfolios**  

**1 Overview**

In this problem set you reproduce your second important empirical result in academic finance. Specifically, you reproduce and extend (the original sample was about 1963 to 1990) **the momentum effect** of Jegadeesh and Titman (1993) (see "Returns to Buying Winners and Selling Losers: Implications for Stock Market Efficiency"). This empirical result spawned a huge literature in academic finance, and has been a critical core strategy for quant hedge funds (and others) for the last 30 years. You will find out in the next couple of weeks that models like the CAPM can't explain this portfolio return pattern at all. 

Momentum portfolios are formed based on past returns. Specifically, momentum portfolios are most commonly formed based on the cumulative return from months $t-12$ to $t-2$ (you should use this past return window for your portfolios):
$$
r_{i,t-12:t-2} \approx \sum_{x=2}^{12} \log(1+r_{i,t-x})
$$
Note, it's common practice to cumulate (or compound) the returns using the log approximation (as above). You certainly can do the following if you want (well, not for this problem set ... use log returns for the problem set):
$$
r_{i,t-12:t-2} = \left[ \prod_{x=2}^{12} \bigl(1+r_{i,t-x} \bigr) \right]  - 1
$$
The log approximation was initially used because it was less computational intensive, and a little easier to program. It's not an issue now, but the convention stuck.  

Data for this problem set are monthly observations for all stocks on the NYSE, AMEX, and Nasdaq from July of 1962 to September of 2024. You can download the data directly using the following link: [data ](https://diether.org/prephd/06-mstk_62-24.csv). There is also a link on *Learning Suite*. The data contain the following variables that you will need for the assignment:

|Variable | Description                                              |
|---------|----------------------------------------------------------|
|permno   | stock identifier                                         |
|caldt    | calendar date                                            |
|ticker   | ticker symbol                                            |
|prc      | stock price (not lagged, contemporaneous with returns)   |
|me       | market equity (not lagged, contemporaneous with returns) |
|ret      | monthly return                                           |
|shr      | shares outstanding in 1000s                              |


**2 Tasks**

1. Form quintile based equal-weight momentum portfolios and report summary statistics (including a t-test of whether the average return is statistically different from zero for each portfolio). Note, you should exclude low price stocks from your portfolios (price below $5). We will discuss the code for creating the portfolio formation variable in the class before the assignment.

2. Compute the average number of stocks that are in each portfolio.

3. Add a spread portfolio (100% long portfolio 4 and 100% short portfolio 0 $\leftarrow$ it's a zero cost long/short (L/S) portfolio) to your dataframe of equal-weight momentum portfolios and then compute the summary statistics.

4. Form quintile based value-weight momentum portfolios and report summary statistics (including a t-test of whether the average return is statistically different from zero for each portfolio). You should once again have five portfolios. The only difference between your equal-weight and value-weight portfolios will be the weights. A value weight portfolio is defined as the following ($me$ refers to the market value of equity):
$$
r_{pt} = \sum_{i=1}^{n} \omega_{i}r_{it} = \sum_{i=1}^{n} \left(\frac{me_{i,t-1}}{\sum_{j=1}^{n} me_{j,t-1}} \right) r_{it}
$$
Hint: think about splitting the formula into the following parts delineated by the parentheses:
\begin{align*}
r_{pt} &= \left( \frac{1}{\sum_{j=1}^{n} me_{j,t-1}} \right) \left( \sum_{i=1}^{n} me_{i,t-1} r_{it}
          \right)
 \end{align*}
And then compute each part as a separate groupby. Finally, just multiple the resulting dataframes together and you will have computed the value-weight portfolio returns. 

In [1]:
import pandas as pd
import numpy as np

<br>

**3 Creating the Portfolio Formation Variable**

**3.1 Using the `rolling` Method**  

+ To create the portfolio formation variable, We need to sum log returns for a past return window.

+ Pandas' built in rolling functions are not the fastest (as I've shown before), but it's really not an issue for dataset the size of the monthly stock data (about 3.5 million observations).

+ Let's take a look at how `rolling().sum()` works in `Pandas`.

In [2]:
df = pd.DataFrame({'id':['a','b','c','d','e','f','g'],
                   'val':range(1,8)})
df

Unnamed: 0,id,val
0,a,1
1,b,2
2,c,3
3,d,4
4,e,5
5,f,6
6,g,7


In [3]:
df['rsum'] = df['val'].rolling(3).sum()
df

Unnamed: 0,id,val,rsum
0,a,1,
1,b,2,
2,c,3,6.0
3,d,4,9.0
4,e,5,12.0
5,f,6,15.0
6,g,7,18.0


+ Note the timing of the `rolling().sum` above. The current observation is included in each sum.

+ so it's val(t-2) + val(t-1) + val(t) for rolling(3).sum()

+ We'll have to take this timing into account to compute return windows from t-12 to t-2 correctly. <br><br>


**3.2 Rolling Windows and GroupBy**

+ Rolling Sum is also built into groupby objects $\rightarrow$

In [4]:
df = pd.DataFrame({'id':['a','b','c','d','e','f','g','h'],
                   'g':['1','1','1','1','2','2','2','2'],
                   'val':range(1,9)})
df

Unnamed: 0,id,g,val
0,a,1,1
1,b,1,2
2,c,1,3
3,d,1,4
4,e,2,5
5,f,2,6
6,g,2,7
7,h,2,8


In [5]:
df.groupby('g')['val'].rolling(2).sum()

g   
1  0     NaN
   1     3.0
   2     5.0
   3     7.0
2  4     NaN
   5    11.0
   6    13.0
   7    15.0
Name: val, dtype: float64

+ **Using rolling sum and shift with a groupby**

  - Can't just add shift at the end of our code statement.

  - Appending `shift` to the end of the statement shifts the whole dataframe.

  - Need to shift/lag within groups

  - Right way $\rightarrow$ use two separate groupbys.

1. The wrong way, just shifts the resulting dataframe $\rightarrow$

In [6]:
df.groupby('g')['val'].rolling(2).sum().shift(1)

g   
1  0     NaN
   1     NaN
   2     3.0
   3     5.0
2  4     7.0
   5     NaN
   6    11.0
   7    13.0
Name: val, dtype: float64

2. The right way, the rolling.sum and shift in separate groupbys $\rightarrow$

In [7]:
df['roll'] = df.groupby('g')['val'].rolling(2).sum().reset_index(drop=True)
df['roll_lag'] = df.groupby('g')['roll'].shift()
df

Unnamed: 0,id,g,val,roll,roll_lag
0,a,1,1,,
1,b,1,2,3.0,
2,c,1,3,5.0,3.0
3,d,1,4,7.0,5.0
4,e,2,5,,
5,f,2,6,11.0,
6,g,2,7,13.0,11.0
7,h,2,8,15.0,13.0


<br>

**3.3 Reminder: Portfolio Formation Framework**  

1. Data Preparation. $\leftarrow$ I gave you clean data. Nothing to do here.

2. Create portfolio formation variable.

3. Bin the data

4. Create the portfolios based on bins and weighting scheme.

5. Test a model or benchmark performance.<br><br>


**3.4 Formation Variable: Cumulative Past Returns**

1. Create log returns. $\leftarrow$ use `numpy.log`

2. Create 12 period cumulative log return windows: t-11 to t-0. $\leftarrow$ needs to be done stock by stock in a `groupby` using a rolling sum.

3. Lag/shift two periods $\leftarrow$ also needs a `groupby`.

In [8]:
df = pd.read_csv('06-mstk_62-24.csv',parse_dates=['caldt'])
df

Unnamed: 0,permno,caldt,ticker,prc,me,ret,shr
0,10000,1986-01-31,OMFGA,4.37500,16.1000,,3680.0
1,10000,1986-02-28,OMFGA,3.25000,11.9600,-0.257143,3680.0
2,10000,1986-03-31,OMFGA,4.43750,16.3300,0.365385,3680.0
3,10000,1986-04-30,OMFGA,4.00000,15.1720,-0.098592,3793.0
4,10000,1986-05-30,OMFGA,3.10938,11.7939,-0.222656,3793.0
...,...,...,...,...,...,...,...
3406370,93436,2024-05-31,TSLA,178.08000,567932.0000,-0.028372,3189200.0
3406371,93436,2024-06-28,TSLA,197.88000,632155.0000,0.111186,3194640.0
3406372,93436,2024-07-31,TSLA,232.07000,741380.0000,0.172781,3194640.0
3406373,93436,2024-08-30,TSLA,214.11000,684004.0000,-0.077391,3194640.0


In [9]:
df['logret'] = np.log(1 + df['ret'])
df['mom'] = df.groupby('permno')['logret'].rolling(11,11).sum().reset_index(drop=True)
df['mom'] = df.groupby('permno')['mom'].shift(2)

df.head(15)

Unnamed: 0,permno,caldt,ticker,prc,me,ret,shr,logret,mom
0,10000,1986-01-31,OMFGA,4.375,16.1,,3680.0,,
1,10000,1986-02-28,OMFGA,3.25,11.96,-0.257143,3680.0,-0.297252,
2,10000,1986-03-31,OMFGA,4.4375,16.33,0.365385,3680.0,0.311436,
3,10000,1986-04-30,OMFGA,4.0,15.172,-0.098592,3793.0,-0.103797,
4,10000,1986-05-30,OMFGA,3.10938,11.7939,-0.222656,3793.0,-0.251872,
5,10000,1986-06-30,OMFGA,3.09375,11.7346,-0.005025,3793.0,-0.005038,
6,10000,1986-07-31,OMFGA,2.84375,10.7863,-0.080808,3793.0,-0.08426,
7,10000,1986-08-29,OMFGA,1.09375,4.14859,-0.615385,3793.0,-0.955512,
8,10000,1986-09-30,OMFGA,1.03125,3.91153,-0.057143,3793.0,-0.058841,
9,10000,1986-10-31,OMFGA,0.78125,3.00234,-0.242424,3843.0,-0.277631,


<br>

**3.5 Lag Variables Before Removing Any Observations**

+ Need to remove missing `mom` observations before binning.

+ Lag both price and market-cap before removing any observations.

+ Can also remove low priced stocks at the same time.

  + Remember, you must always impose this restriction using **lagged price**.

  + Otherwise, you will create a look ahead bias in your portfolio formation.

In [10]:
df['prclag'] = df.groupby('permno')['prc'].shift()
df['melag'] = df.groupby('permno')['me'].shift(1)

df = df.query("mom == mom and prclag >= 5").reset_index(drop=True)
df.head(10)

Unnamed: 0,permno,caldt,ticker,prc,me,ret,shr,logret,mom,prclag,melag
0,10001,1987-02-27,GFGC,6.25,6.19375,-0.074074,991.0,-0.076961,0.196692,6.75,6.68925
1,10001,1987-03-31,GFGC,6.375,6.31763,0.0368,991.0,0.036139,0.140122,6.25,6.19375
2,10001,1987-04-30,GFGC,6.125,6.06987,-0.039216,991.0,-0.040006,0.038273,6.375,6.31763
3,10001,1987-05-29,GFGC,5.6875,5.63631,-0.071429,991.0,-0.074108,0.06456,6.125,6.06987
4,10001,1987-06-30,GFGC,5.875,5.82212,0.051429,991.0,0.05015,0.034407,5.6875,5.63631
5,10001,1987-07-31,GFGC,6.0,5.946,0.021277,991.0,0.021054,-0.026546,5.875,5.82212
6,10001,1987-08-31,GFGC,6.5,6.4415,0.083333,991.0,0.080042,0.03386,6.0,5.946
7,10001,1987-09-30,GFGC,6.25,6.2,-0.022308,992.0,-0.022561,-0.014766,6.5,6.4415
8,10001,1987-10-30,GFGC,6.375,6.324,0.02,992.0,0.019803,0.068358,6.25,6.2
9,10001,1987-11-30,GFGC,6.1875,6.138,-0.029412,992.0,-0.029853,0.007331,6.375,6.324


<br>

**4. Bin the Data/Create Portfolio Breakpoints**

+ For the short selling loan fee portfolios we used `cut` to create bins.

+ Use `qcut` here $\leftarrow$ because it creates bins based on the quintiles of the `mom` variable.

+ Need to use `groupby` by date with `qcut` $\leftarrow$ to allow quintiles to change month by month based on the distribution of `mom`.

+ Methods and functions needed $\rightarrow$ **groupby, transform, qcut** 

+ Use `transform` with a groupby when mapping a Nx1 variable (`mom`) into a new Nx1 variable (`bins`).

In [11]:
df.groupby('caldt')['mom'].transform(pd.qcut,5,labels=False)

0          3
1          2
2          1
3          2
4          2
          ..
2291984    1
2291985    1
2291986    0
2291987    0
2291988    0
Name: mom, Length: 2291989, dtype: int64

In [12]:
df['bins'] = df.groupby('caldt')['mom'].transform(pd.qcut,5,labels=False)
df

Unnamed: 0,permno,caldt,ticker,prc,me,ret,shr,logret,mom,prclag,melag,bins
0,10001,1987-02-27,GFGC,6.2500,6.19375,-0.074074,991.0,-0.076961,0.196692,6.7500,6.68925,3
1,10001,1987-03-31,GFGC,6.3750,6.31763,0.036800,991.0,0.036139,0.140122,6.2500,6.19375,2
2,10001,1987-04-30,GFGC,6.1250,6.06987,-0.039216,991.0,-0.040006,0.038273,6.3750,6.31763,1
3,10001,1987-05-29,GFGC,5.6875,5.63631,-0.071429,991.0,-0.074108,0.064560,6.1250,6.06987,2
4,10001,1987-06-30,GFGC,5.8750,5.82212,0.051429,991.0,0.050150,0.034407,5.6875,5.63631,2
...,...,...,...,...,...,...,...,...,...,...,...,...
2291984,93436,2024-05-31,TSLA,178.0800,567932.00000,-0.028372,3189200.0,-0.028782,0.067537,183.2800,584516.00000,1
2291985,93436,2024-06-28,TSLA,197.8800,632155.00000,0.111186,3194640.0,0.105428,-0.106760,178.0800,567932.00000,1
2291986,93436,2024-07-31,TSLA,232.0700,741380.00000,0.172781,3194640.0,0.159378,-0.385232,197.8800,632155.00000,0
2291987,93436,2024-08-30,TSLA,214.1100,684004.00000,-0.077391,3194640.0,-0.080550,-0.301196,232.0700,741380.00000,0


In [13]:
df['bins'].describe().round(2)

count    2291989.00
mean           2.00
std            1.41
min            0.00
25%            1.00
50%            2.00
75%            3.00
max            4.00
Name: bins, dtype: float64

<br>

**Finish portfolio construction and homework tasks**

1. Form quintile based equal-weight momentum portfolios and report summary statistics (including a t-test of whether the average return is statistically different from zero for each portfolio). Note, you should exclude low price stocks from your portfolios (price below $5). We will discuss the code for creating the portfolio formation variable in the class before the assignment.


In [33]:
port = df.groupby(['caldt', 'bins'], observed=True).ret.mean()*100

port = port.unstack(level='bins')

In [15]:
from finance_byu.summarize import summary
summary(port)

bins,0,1,2,3,4
count,735.0,735.0,735.0,735.0,735.0
mean,0.421442,0.967692,1.149184,1.3007,1.609396
std,6.797588,5.166622,4.673402,4.805009,6.294645
tstat,1.680842,5.077783,6.666535,7.338823,6.931631
pval,0.093219,4.846059e-07,5.148489e-11,5.72574e-13,9.120726e-12
min,-27.982687,-23.90457,-25.30421,-28.5142,-31.34644
25%,-3.218192,-1.659693,-1.373365,-1.374265,-1.639486
50%,0.604792,1.241191,1.631933,1.740596,1.9951
75%,3.991941,3.734872,3.86976,4.388075,5.393881
max,31.709633,23.42758,20.21075,17.76558,31.6033


2. Compute the average number of stocks that are in each portfolio.

avg number of stocks in each portfolio = 735 stocks.

3. Add a spread portfolio (100% long portfolio 4 and 100% short portfolio 0 $\leftarrow$ it's a zero cost long/short (L/S) portfolio) to your dataframe of equal-weight momentum portfolios and then compute the summary statistics.

In [28]:
port['Spread Weight Port'] = port[4] - port[0]
summary(port)

bins,0,1,2,3,4,Spread Weight Port
count,735.0,735.0,735.0,735.0,735.0,735.0
mean,0.421442,0.967692,1.149184,1.3007,1.609396,1.187954
std,6.797588,5.166622,4.673402,4.805009,6.294645,4.533096
tstat,1.680842,5.077783,6.666535,7.338823,6.931631,7.104743
pval,0.093219,4.846059e-07,5.148489e-11,5.72574e-13,9.120726e-12,2.857137e-12
min,-27.982687,-23.90457,-25.30421,-28.5142,-31.34644,-27.13165
25%,-3.218192,-1.659693,-1.373365,-1.374265,-1.639486,-0.6229662
50%,0.604792,1.241191,1.631933,1.740596,1.9951,1.515765
75%,3.991941,3.734872,3.86976,4.388075,5.393881,3.344216
max,31.709633,23.42758,20.21075,17.76558,31.6033,29.43874


4. Form quintile based value-weight momentum portfolios and report summary statistics (including a t-test of whether the average return is statistically different from zero for each portfolio). You should once again have five portfolios. The only difference between your equal-weight and value-weight portfolios will be the weights. A value weight portfolio is defined as the following ($me$ refers to the market value of equity):
$$

In [38]:
df['total me'] = df.groupby(['caldt', 'bins'])['me'].transform('sum')
df['weight'] = df.me / df['total me']
df['w_ret'] = df.ret * df.weight

weightedPortfolio = df.groupby(['caldt', 'bins'], observed = True).w_ret.sum() * 100
weightedPortfolio = weightedPortfolio.unstack('bins')

In [39]:
summary(weightedPortfolio)

bins,0,1,2,3,4
count,735.0,735.0,735.0,735.0,735.0
mean,1.488933,1.393963,1.321409,1.506524,2.149854
std,6.683458,4.798755,4.319391,4.430793,5.761128
tstat,6.039732,7.875287,8.293892,9.218033,10.11685
pval,2.451999e-09,1.223885e-14,5.226464e-16,3.1381159999999997e-19,1.302835e-22
min,-21.76323,-18.43129,-19.62553,-21.45252,-25.68543
25%,-2.342522,-1.310175,-1.298867,-1.117009,-1.060366
50%,1.174441,1.463762,1.477646,1.770453,2.250111
75%,4.847298,3.940033,4.019669,4.16071,5.6292
max,32.63112,19.49274,15.7495,20.2411,31.59625
