In [2]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
%config InlineBackend.figure_formats = ['svg']
%matplotlib inline

## Homework 4 - Replication of Cosemans and Frehen (2021) 

#### Aman Krishna, Tim Taylor, Yazmin Ramirez Delgado

*Note: for questions 2-3, it is possible you will not obtain the exact numbers in the paper, which is okay as long as you are able to describe the ways in which you might have deviated from the authors (in question 4).*

***
1. **In your own words, describe what the authors mean by “salience theory” and how it affects investor’s portfolio choice decisions.**

    From the paper: *"BGS (2012) argue that because of these cognitive limitations, decision-makers’ attention is drawn to the most unusual attributes of the options they face. These salient attributes are consequently overweighted in their decisions, and nonsalient attributes are neglected."*

In effect, the authors argue that because investors have limited capacity to fully consider all of the relevant facts that factor into a robust and fair valuation of an equity or other investment, investors instead make decisions by replacing complex analysis with heuristically selecting investments that have characteristics that are *comparatively* more or less impressive than its peers. For instance, with salience theory Tesla's future stock price after a huge rally in price, outpacing competitors, would lead many investors to purchase the stock, as it performed unusually well recently, implicitly assuming that "past returns *are* indicitive of future results," which those in the financial industry warn against. In the wider context of the market, this means that firms that perform obviously better or worse, or have a more prominent position in a investors mind due to, for instance, overall prominence in pop culture, will experience valuations inconsistent with what are suggested by a more methodical, fundamental approach.




***

2. **Following Section 3 of the paper, download the relevant variables from CRSP and Compustat (both available through WRDS). Use this data to replicated Table 2.**

Data consists of daily and monthly return, book and market value of equity, and trading volume for firms listed on the NYSE, Amex, and Nasdaq. 
- Sample period January 1926 to December 2015
- Stocks with a closing price less than $5 at the end of the previous month are excluded
- Stocks with less than 15 observations in a month are excluded
- Firms with negative book equity are excluded
- Investment and profitability factors used from FF

Compustat data:
- Book value of equity as of December the previous year
- Book equity data is retrieved from FF's website earlier

FF 3-factor data is used to obtain some of the table values.

The columns in Table 2 are the following, and calculated using the following metrics:

- ST        - $S$ = $\sigma(r_{is}, {r_{bar}}) = \frac{|r_{is} - r_{bar}|}{|r_{is}|+|r_{bar}|+\theta}$ using daily returns, where $r_{bar}$ is the equally-weighted return.
- PRICE     - equal-weighted average of prices
- ME        - equal-weighted average of log market equity
- BM        - equal-weighted average of book-to-market ratio
- MOM       - cumulative return over an 11-month period ending two months prior to the current month.
- ILLIQ     - absolute daily return divided by the daily dollar trading volume, averaged over all trading days in a month
- BETA      - a regression of daily excess stock returns on the daily excess market return over a one-month window.
- IVOL      - standard deviation of residuals from the beta regression
- REV       - last month's returns
- MAX       - a stock's maximum return in a month
- MIN       - a stock's minimum return in a month
- TK        - "prospect theory" value using 5 years of returns
- SKEW      - skewness of daily stock returns
- COSKEW    - coskewness of daily stock returns with market returns, over a 1-year window
- ISKEW     - skewness of residuals in a 3-factor FF model
- DBETA     - regression of stock and market returns when market return was below average


In [2]:
""" 
    Creating the dataset
    - memory efficiency is paramount considering these datasets are so large
"""
# importing the data, changing the types, and dropping unneeded columns for memory efficiency.
# Changing columns from int64 / float 64 to int32 / float32 / uint8

link_data_types = {
    'gvkey': 'int32',
    'LPERMNO': 'int64',
    'LPERMCO': 'int32',
}


data_types = {
    'PERMNO': 'int64',
    'EXCHCD': 'uint8',
    'PERMCO': 'int32',
    'PRC': 'float32',
    'VOL': 'float32',
#    'RET': 'float32', # the RET column has additional characters which make it difficult to parse
    'SHROUT': 'float32'
}


data = pd.read_csv('data/CRSP Daily Stock Data.csv.gz', compression='gzip', parse_dates=['date']) #, converters={'RET': lambda x: float('0') if not x or not any(c.isnumeric() or c == '.' for c in x) else float(x)})
data = data.drop(columns=['RETX'])
data = data.dropna().astype(data_types)

link_data = pd.read_csv('data/Compustat CRSP Link.csv.gz', compression='gzip', parse_dates=['LINKDT','LINKENDDT'], dtype=link_data_types)
link_data = link_data.drop(columns=['LINKPRIM', 'LIID', 'LINKTYPE'])



In [4]:
fundamental_data_post_1950_types = {
    'gvkey': 'int32',
    'bkvlps': 'float32',
    'csho': 'float32'
}

fundamental_data_post_1950 = pd.read_csv('data/Compustat Fundamental Data.csv', parse_dates=['datadate'],dtype = fundamental_data_post_1950_types)
fundamental_data_post_1950 = fundamental_data_post_1950.drop(columns=['indfmt', 'consol', 'popsrc', 'datafmt', 'curcd', 'costat'])

fundamental_data_pre_1950 = pd.read_table('data/DFF_BE_With_Nonindust.txt', header=None, sep=r'\s+')

In [3]:
data.info() # cut down from 5 GB

<class 'pandas.core.frame.DataFrame'>
Index: 80821474 entries, 1 to 88978269
Data columns (total 8 columns):
 #   Column  Dtype         
---  ------  -----         
 0   PERMNO  int64         
 1   date    datetime64[ns]
 2   EXCHCD  uint8         
 3   PERMCO  int32         
 4   PRC     float32       
 5   VOL     float32       
 6   RET     object        
 7   SHROUT  float32       
dtypes: datetime64[ns](1), float32(3), int32(1), int64(1), object(1), uint8(1)
memory usage: 3.7+ GB


In [4]:
display(data.head(4), data.shape)
display(link_data.head(4), link_data.shape)
display(fundamental_data_post_1950.head(4), fundamental_data_post_1950.shape)
display(fundamental_data_pre_1950.head(4), fundamental_data_pre_1950.shape)

Unnamed: 0,PERMNO,date,EXCHCD,PERMCO,PRC,VOL,RET,SHROUT
1,10000,1986-01-07,3,7952,-2.5625,1000.0,C,3680.0
2,10000,1986-01-08,3,7952,-2.5,12800.0,-0.024390,3680.0
3,10000,1986-01-09,3,7952,-2.5,1400.0,0.000000,3680.0
4,10000,1986-01-10,3,7952,-2.5,8500.0,0.000000,3680.0


(80821474, 8)

Unnamed: 0,gvkey,LPERMNO,LPERMCO,LINKDT,LINKENDDT
0,1000,25881,23369,1970-11-13,1978-06-30
1,1001,10015,6398,1983-09-20,1986-07-31
2,1002,10023,22159,1972-12-14,1973-06-05
3,1003,10031,6672,1983-12-07,1989-08-16


(31710, 5)

Unnamed: 0,gvkey,datadate,fyear,bkvlps,csho
0,1000,1961-12-31,1961.0,2.4342,0.152
1,1000,1962-12-31,1962.0,3.0497,0.181
2,1000,1963-12-31,1963.0,2.9731,0.186
3,1000,1964-12-31,1964.0,3.0969,0.196


(477656, 5)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,69,70,71,72,73,74,75,76,77,78
0,10006,1926,1953,67.743,71.245,70.139,70.139,70.139,70.139,69.445,...,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99
1,10014,1926,1961,13.005,12.787,12.63,13.871,14.896,15.705,16.282,...,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99
2,10022,1926,1960,13.567,13.996,14.326,14.552,14.025,14.081,13.314,...,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99
3,10030,1926,1966,15.924,17.487,18.771,20.508,20.488,21.1,18.499,...,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99,-99.99


(1794, 79)

In [5]:
# Applying the filters BEFORE merging the datasets
# keeping only stocks that were traded on exchanges 1, 2, or 3
data = data[data.EXCHCD.isin([1, 2, 3])]

# remove any stock in a month with fewer than 15 observations -- will this cause survivor bias?
data["month"] = data['date'].dt.to_period("M")
data['monthly_observations'] = data.groupby(['PERMNO', data['month']])['PERMNO'].transform('count').astype('int32')
data = data[(data['monthly_observations'] >= 15)]

In [7]:
last_close = data.groupby(['PERMNO', 'month'])['PRC'].last().shift().bfill()
last_close = last_close.reset_index()

In [9]:
# applying the filter
data = data.merge(
    last_close[last_close.PRC >= 5][['PERMNO', 'month']], # remove any stock whose price is less than $5 at the end of the previous month
    on=['PERMNO', 'month'],
    how='inner'
)

# freeing up memory explicitly
del last_close 

display(data.head(4), data.info(), data.shape)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56043490 entries, 0 to 56043489
Data columns (total 10 columns):
 #   Column                Dtype         
---  ------                -----         
 0   PERMNO                int64         
 1   date                  datetime64[ns]
 2   EXCHCD                uint8         
 3   PERMCO                int32         
 4   PRC                   float32       
 5   VOL                   float32       
 6   RET                   object        
 7   SHROUT                float32       
 8   month                 period[M]     
 9   monthly_observations  int32         
dtypes: datetime64[ns](1), float32(3), int32(2), int64(1), object(1), period[M](1), uint8(1)
memory usage: 2.8+ GB


Unnamed: 0,PERMNO,date,EXCHCD,PERMCO,PRC,VOL,RET,SHROUT,month,monthly_observations
0,10001,1986-10-01,3,7953,6.75,1600.0,0.058824,991.0,1986-10,23
1,10001,1986-10-02,3,7953,6.375,5437.0,-0.055556,991.0,1986-10,23
2,10001,1986-10-03,3,7953,6.75,750.0,0.058824,991.0,1986-10,23
3,10001,1986-10-06,3,7953,6.75,181.0,0.0,991.0,1986-10,23


None

(56043490, 10)

In [None]:
# converting the return to a float and force removing the strings, not the most robust method
data['RET'] = pd.to_numeric(data['RET'], errors='coerce').fillna(0.00).astype('float32')

In [10]:
# reducing the link data by including only the firms in the filtered data to eliminate excess rows
link = link_data[(link_data['LPERMNO'].isin(data['PERMNO'])) | (link_data['LPERMCO'].isin(data['PERMCO']))][["gvkey","LPERMNO","LPERMCO"]].drop_duplicates(subset=['LPERMNO']) # , 'LPERMCO', "gvkey"

In [12]:
# Merging the daily stock data together and the linking dataset
data = data.merge(link, left_on=['PERMNO', 'PERMCO'], right_on=['LPERMNO', 'LPERMCO'], how='left')

In [5]:
last_bve = fundamental_data_post_1950.dropna(subset=["bkvlps", "csho"]).groupby(["gvkey", fundamental_data_post_1950["datadate"].dt.to_period("Y")])["bkvlps"].last().shift().bfill().reset_index()
last_bve = last_bve.astype({'datadate' : 'int64'})
last_bve['datadate'] = last_bve['datadate'] + 1970 # turning the int into a year value
last_bve = last_bve.rename(columns={'datadate': 'year', 'bkvlps': 'last_bkvlps'})

# dropping duplicate data points and adding in a shifted bve column
fundamental_data_post_1950 = fundamental_data_post_1950.dropna(subset=["bkvlps", "csho"]).merge(last_bve,
                                 left_on = ["gvkey", 'fyear'], right_on = ["gvkey", 'year'], how = 'left').drop(columns=['fyear', 'bkvlps'])

In [None]:
# removing redundant columns
data = data.drop(columns = ['LPERMNO', 'LPERMCO', 'datadate'])

In [44]:
# adding index data of the equal-weighted returns across exchanges 1, 2, and 3
index_data = pd.read_csv('data/CRSP Index.csv.gz', compression='gzip', parse_dates=['caldt'])
index_data.head(4)

Unnamed: 0,caldt,ewretd
0,1926-01-02,0.009516
1,1926-01-04,0.00578
2,1926-01-05,-0.001927
3,1926-01-06,0.001182


In [63]:
theta = 0.1
delta = 0.7
count = 0

for permno in data['PERMNO'].unique():

    sample = data[data['PERMNO'] == permno].copy()
    sample = sample.merge(index_data, left_on = 'date', right_on = 'caldt', how = 'left')
    sample['salience'] = abs(sample['RET'] - sample['ewretd']) / (
                abs(sample['ewretd']) + abs(sample['RET']) + theta)
    
    for name, group in sample.groupby(['month']):

        group['salience_rank'] = group['salience'].rank(ascending=False)
        group['salience_weight'] = delta / (group['salience_rank'] * delta * (1 / len(group)))
        sample.loc[group.index, 'salience_weight'] = group['salience_weight']

        # Calculating Salience Theory value ST
        cov_matrix = np.cov(group['RET'], group['salience_weight'])
        sample.loc[group.index, 'ST'] = cov_matrix[0][1]

    # Making the index of the sample same as data['PERMNO'] == permno
    sample.set_index(data[data['PERMNO'] == permno].index, inplace=True)
    data.loc[data[data['PERMNO'] == permno].index, 'ST'] = sample['ST']

    # deleting the last sample
    del sample

    count += 1
    if count % 1000 == 0:
        print("Processed PERMNO: ", count)

Processed PERMNO:  1000
Processed PERMNO:  2000
Processed PERMNO:  3000
Processed PERMNO:  4000
Processed PERMNO:  5000
Processed PERMNO:  6000
Processed PERMNO:  7000
Processed PERMNO:  8000
Processed PERMNO:  9000
Processed PERMNO:  10000
Processed PERMNO:  11000
Processed PERMNO:  12000
Processed PERMNO:  13000
Processed PERMNO:  14000
Processed PERMNO:  15000
Processed PERMNO:  16000
Processed PERMNO:  17000
Processed PERMNO:  18000
Processed PERMNO:  19000
Processed PERMNO:  20000
Processed PERMNO:  21000
Processed PERMNO:  22000
Processed PERMNO:  23000
Processed PERMNO:  24000
Processed PERMNO:  25000
Processed PERMNO:  26000


In [64]:
data['decile'] = pd.qcut(data['ST'], q=[0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
                       labels=['Low', '2', '3', '4', '5', '6', '7', '8', '9', 'High'])

In [65]:
#  only month end dates and group by decile and take average of ST
data[data['date'].dt.is_month_end].groupby(['decile'])['ST'].mean()*24

  data[data['date'].dt.is_month_end].groupby(['decile'])['ST'].mean()*24


decile
Low    -2.349450
2      -1.063092
3      -0.659209
4      -0.357794
5       0.019891
6       0.415580
7       0.733201
8       1.101721
9       1.651833
High    3.364712
Name: ST, dtype: float64

In [66]:
# saving the data so far. The code takes nearly 3 hours to run.
data.to_csv('data_checkpoint.csv.gz', index=False, compression='gzip')

***

In [1]:
#_______________________________________________________________
# loading the checkpoint data because VSCode crashed
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
%config InlineBackend.figure_formats = ['svg']
%matplotlib inline

types = {
    'PERMNO' : 'int32', 
    'EXCHCD' : 'uint8', 
    'PERMCO' : 'int32', 
    'PRC' : 'float32', 
    'VOL' : 'float32', 
    'RET' : 'float32', 
    'SHROUT' : 'float32',
    'monthly_observations' : 'int32', 
    'gvkey' : 'int32', 
    'csho' : 'float32', 
    'year' : 'int32', 
    'last_bkvlps' : 'float32', 
    'ST' : 'float32',
}
data = pd.read_csv('data_checkpoint.csv.gz', compression='gzip', parse_dates=['date', 'month'],)
data.head()

Unnamed: 0,PERMNO,date,EXCHCD,PERMCO,PRC,VOL,RET,SHROUT,month,monthly_observations,gvkey,csho,year,last_bkvlps,ST,decile
0,10001,1986-10-01,3,7953,6.75,1600.0,0.058824,991.0,1986-10-01,23,12994.0,1.001,1989.0,5.5565,0.057377,9
1,10001,1986-10-02,3,7953,6.375,5437.0,-0.055556,991.0,1986-10-01,23,12994.0,1.001,1989.0,5.5565,0.057377,9
2,10001,1986-10-03,3,7953,6.75,750.0,0.058824,991.0,1986-10-01,23,12994.0,1.001,1989.0,5.5565,0.057377,9
3,10001,1986-10-06,3,7953,6.75,181.0,0.0,991.0,1986-10-01,23,12994.0,1.001,1989.0,5.5565,0.057377,9
4,10001,1986-10-07,3,7953,6.75,400.0,0.0,991.0,1986-10-01,23,12994.0,1.001,1989.0,5.5565,0.057377,9


In [2]:
fundamental_data_post_1950_types = {
    'gvkey': 'int32',
    'bkvlps': 'float32',
    'csho': 'float32'
}

fundamental_data_post_1950 = pd.read_csv('data/Compustat Fundamental Data.csv', parse_dates=['datadate'],dtype = fundamental_data_post_1950_types)
fundamental_data_post_1950 = fundamental_data_post_1950.drop(columns=['indfmt', 'consol', 'popsrc', 'datafmt', 'curcd', 'costat'])

fundamental_data_pre_1950 = pd.read_table('data/DFF_BE_With_Nonindust.txt', header=None, sep=r'\s+')

In [4]:
# narrowing the data to only that missing in the original dataframe
fundamental_data_pre_1950 = fundamental_data_pre_1950[fundamental_data_pre_1950[0].isin(data[data.last_bkvlps.isna()].PERMNO)]
# replacing useless observations with NaN
fundamental_data_pre_1950.replace(-99.99, np.nan, inplace=True)
fundamental_data_pre_1950.head(4)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,69,70,71,72,73,74,75,76,77,78
0,10006,1926,1953,67.743,71.245,70.139,70.139,70.139,70.139,69.445,...,,,,,,,,,,
1,10014,1926,1961,13.005,12.787,12.63,13.871,14.896,15.705,16.282,...,,,,,,,,,,
2,10022,1926,1960,13.567,13.996,14.326,14.552,14.025,14.081,13.314,...,,,,,,,,,,
3,10030,1926,1966,15.924,17.487,18.771,20.508,20.488,21.1,18.499,...,,,,,,,,,,


In [3]:
# adding in the post 1950 data
data = data.merge(fundamental_data_post_1950, left_on=['gvkey', 'date'], right_on  = ['gvkey', 'datadate'], how = 'left')

In [7]:
data['BVE'] = data.last_bkvlps * data.csho_y
data['MVE'] = np.log(data.PRC * data.VOL) # taking the log
# book to equity
data['BE'] = (data.last_bkvlps * data.csho_y) / (data.PRC * data.VOL)

In [22]:
# Momentum
# cumulative return of each month portfolio
# equally weighted daily return portfolios by month
daily_rets_decile = data.groupby(['date', 'decile'])['RET'].mean().reset_index()

# making each column their own thing
daily_rets_decile = daily_rets_decile.pivot_table(index='date', columns='decile', values='RET')

momentum_decile = pd.DataFrame(index=daily_rets_decile.resample('M').mean().index)

# Loop through each decile column and calculate the 11-month rolling cumulative returns
for decile_col in daily_rets_decile.columns:
    momentum_decile[f'{decile_col}'] = ((1 + daily_rets_decile[decile_col])
                                        .resample('M')
                                        .apply(lambda x: x.prod())
                                        .rolling(window=11)
                                        .apply(lambda x: x.prod()) - 1).shift(-2)
momentum_decile

Unnamed: 0_level_0,2,3,4,5,6,7,8,9,High,Low
date,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
1926-01-31,,,,,,,,,,
1926-02-28,,,,,,,,,,
1926-03-31,,,,,,,,,,
1926-04-30,,,,,,,,,,
1926-05-31,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...
2015-08-31,-0.469064,-0.296359,-0.144025,0.032498,0.153052,0.275574,0.518847,0.995247,4.366150,-0.802161
2015-09-30,-0.465698,-0.292776,-0.144748,0.027121,0.158892,0.292977,0.534376,1.031393,4.630452,-0.802924
2015-10-31,-0.465706,-0.296996,-0.150835,0.015697,0.143832,0.289012,0.511953,0.970964,4.245638,-0.803431
2015-11-30,,,,,,,,,,


In [24]:
momentum_means = momentum_decile.mean(axis=0)
momentum = pd.DataFrame({'MOM': momentum_means})
momentum

Unnamed: 0,MOM
2,-0.366715
3,-0.197156
4,-0.047037
5,0.088183
6,0.179972
7,0.319201
8,0.581261
9,1.100271
High,3.649733
Low,-0.651768


In [26]:
data.columns

Index(['PERMNO', 'date', 'EXCHCD', 'PERMCO', 'PRC', 'VOL', 'RET', 'SHROUT',
       'month', 'monthly_observations', 'gvkey', 'csho_x', 'year',
       'last_bkvlps', 'ST', 'decile', 'datadate', 'fyear', 'bkvlps', 'csho_y',
       'BVE', 'MVE'],
      dtype='object')

In [None]:
daily_rets_decile = data.groupby(['date', 'decile'])['RET'].mean().reset_index()

In [39]:
# Illiq
# absolute return over dollar trading vol, averaged by trading month
data['liquidity'] = (np.abs(data.RET) / (data.VOL * data.PRC)) * 1_000_000

# getting the monthly illiquidity measure by decile
illiq_decile = data.groupby(['decile', pd.Grouper(key='date', freq='M')])['liquidity'].mean().reset_index()
illiq_decile = illiq_decile.pivot_table(index='date', columns='decile', values='liquidity')
illiq = pd.DataFrame({'ILLIQ': illiq_decile.mean(axis=0)})

del illiq_decile

illiq

MemoryError: Unable to allocate 5.43 GiB for an array with shape (13, 56043490) and data type float64

In [None]:
# REV
# last month's return
# equally weighted daily return portfolios by month
daily_rets_decile = data.groupby(['date', 'decile'])['RET'].mean().reset_index()

# making each column their own thing
daily_rets_decile = daily_rets_decile.pivot_table(index='date', columns='decile', values='RET')

reversal_decile = pd.DataFrame(index=daily_rets_decile.resample('M').mean().index)

# Loop through each decile column and calculate the 11-month rolling cumulative returns
for decile_col in daily_rets_decile.columns:
    reversal_decile[f'{decile_col}'] = ((1 + daily_rets_decile[decile_col])
                                        .resample('M')
                                        .apply(lambda x: x.prod())
                                        .apply(lambda x: x.prod()) - 1).shift(-1)

reversal = pd.DataFrame({'REV': reversal_decile.mean(axis=0)})

del daily_rets_decile
del reversal_decile

reversal

In [8]:
# ST, Price, MVE, BE
data[data['date'].dt.is_month_end].groupby(['decile'])[['PRC','MVE', 'BE']].mean()

Unnamed: 0_level_0,ST
decile,Unnamed: 1_level_1
2,-1.063092
3,-0.659209
4,-0.357794
5,0.019891
6,0.41558
7,0.733201
8,1.101721
9,1.651833
High,3.364712
Low,-2.34945


In [None]:
# beta, ivol, dbeta
def calc_beta(df):
    x = df.PRC
    y = df.ewretd
    return (x.T @ x)**(-1) * (x.T@y)

In [None]:
# coskew

In [None]:
# max, min
# maximum or minimum daily return in a month

In [None]:
# skew, iskew
# iskew is using ff 3-factor model

*text*

***
3. **From Tables 3-10, choose two other tables and replicate them**

#### Possible tables: 5, 9?

***
4. **If the numbers you obtain in questions 2 and 3 deviate from those in the paper, why do you think this is? What parts of the data construction and replication were difficult? Was there any additional information the authors could have given you to make this process simpler?**

Assumptions made in constructing the dataset:
- backfilling book value and market value data
- companies were aggressively removed based on their price and the number of observations in a month, this could have been different for different datasets
- sample bias leading to skewed results

***
5. **In your view, what are the key takeaways of this paper? How did the results in the tables you replicated contribute to the paper as a whole?**

*text*

***