# FINM 35000 Problem Set 3: Equity Valuation Stress Testing

<span style="color:blue">Aman Krishna </span> <br>
<br>
<span style="color:#406A5F">Tim Taylor </span> <br>
<br>
<span style="color:purple">Yazmin Ramirez Delgado </span>

In [14]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import math as m 
import scipy.stats as stats
from statsmodels.regression.rolling import RollingOLS
import seaborn as sns
import warnings
from scipy.stats import norm
pd.set_option("display.precision", 2)
pd.set_option('display.float_format', '{:.3f}'.format)
warnings.filterwarnings("ignore")

# 1. Replication of Cosemans and Frehen (2021) (100 points)

Note: for questions 2-3, it is possible you will not obtain the exact numbers in the paper, which is okay as long as you are able to describe the ways in which you might have deviated from the authors (in question 4).

### 1. 
In your own words, describe what the authors mean by “salience theory” and how it affects investors’ portfolio choice decisions.

<span style="color:purple">"Salience theory," as discussed in the paper, refers to the idea that investors give disproportionate attention and weight to the most prominent or striking features of an investment, particularly its past returns. The theory is grounded in the broader literature on how cognitive biases influence decision-making (behavioural economics). In the context of stock market investments, salience theory suggests that investors' attention is drawn to stocks that have had notably high or low returns in the past, because these returns are more "salient," or noticeable. Well-known stocks such as Apple and Tesla come to mind when thinking about this theory from a US stock market perspective. </span> <br>
<br>

<span style="color:blue"> According to salience theory, investors do not evaluate potential investments in a completely rational or comprehensive manner. Rather than weighing all available information, they focus on the most memorable or striking aspects of an asset's history, especially its past performance. For example, if a stock has recently experienced a significant upsurge in value, this positive performance becomes a salient feature that attracts investors and leads them to overvalue the stock. Because its current price is inflated by demand based on salient past performance, the stock is likely to earn lower future returns.</span> <br>
<br>

<span style="color:#406A5F"> Conversely, stocks with notably poor past performance can become undervalued, because investors avoid them on account of their salient negative returns. According to the theory, these undervalued stocks are likely to yield higher future returns, since their current low prices do not reflect their potential value. </span> <br>
<br>

<span style="color:#406A5F"> Overall, we saw a lot of similarity between the salience effect and the fourth of Fama and French's five factors, profitability (RMW, Robust Minus Weak). This factor captures the historical outperformance of profitable companies relative to less profitable ones: it measures the return difference between a portfolio of high-profitability companies and a portfolio of low-profitability companies. </span> <br>
<br>
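
To make the mechanism described above concrete, here is a minimal sketch of a salience-theory (ST) measure for a single stock-month. It assumes the construction as we understand it from the paper: a salience function that compares each daily stock return with the market return on the same day, salience weights that decay geometrically in the salience rank, and ST defined as the covariance between those weights and the daily returns. The function names and the defaults `theta=0.1` and `delta=0.7` are assumptions on our part, not something defined in this notebook, and should be checked against the paper before use.

```python
import numpy as np

def salience_weights(stock_ret, market_ret, theta=0.1, delta=0.7):
    """Salience weights for the daily returns of one stock in one month.

    theta and delta are assumed parameter values (hypothetical defaults).
    """
    r = np.asarray(stock_ret, dtype=float)
    m = np.asarray(market_ret, dtype=float)

    # Salience of each day: distance from the market return, scaled by magnitude.
    sigma = np.abs(r - m) / (np.abs(r) + np.abs(m) + theta)

    # Rank days from most salient (rank 1) to least salient (rank N).
    order = np.argsort(-sigma, kind="stable")
    ranks = np.empty(len(r), dtype=int)
    ranks[order] = np.arange(1, len(r) + 1)

    # Geometric decay in rank, normalised so the weights average to one
    # when every day has equal objective probability 1/N.
    raw = delta ** ranks
    return raw / raw.mean()

def st_measure(stock_ret, market_ret, theta=0.1, delta=0.7):
    """Covariance between salience weights and returns (equal probabilities)."""
    r = np.asarray(stock_ret, dtype=float)
    w = salience_weights(r, market_ret, theta, delta)
    # cov(w, r) = E[w * r] - E[w] * E[r], with E[w] = 1 by construction.
    return np.mean(w * r) - np.mean(r)

# Toy month: one large positive day versus a flat market.
st_up = st_measure([0.10, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])
# Mirror image: one large negative day.
st_down = st_measure([-0.10, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0])
```

In this toy example the stock with a salient upside gets a positive ST value and the stock with a salient downside gets a negative one, which matches the intuition that salient gains make a stock look attractive and salient losses make it look unattractive.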

### 2. 
Following Section 3 of the paper, download the relevant variables from CRSP and Compustat (both available through WRDS). Use this data to replicate Table 2.

Load CRSP Daily, Monthly, and Compustat Fundamentals Data

In [15]:
# Read "C:\Users\Aman\Downloads\Compressed\crsp_us_equity.csv"
crsp_daily = pd.read_csv("C:/Users/Aman/Downloads/Compressed/crsp_us_equity.csv")

In [16]:
# Convert date to datetime format
crsp_daily['date'] = pd.to_datetime(crsp_daily['date'])

# Sort the DataFrame by 'TICKER' and 'date'
crsp_daily = crsp_daily.sort_values(['TICKER', 'date'])

# Remove all rows with missing TICKER or RET
crsp_daily.dropna(subset=['TICKER', 'RET'], inplace=True)

# Drop the COMNAM column (PERMNO is kept for the merge below)
crsp_daily.drop(columns=['COMNAM'], inplace=True)

In [17]:
# Create a month and year column like 2005-01
crsp_daily['month'] = crsp_daily['date'].dt.strftime('%Y-%m')

In [18]:
# Use groupby and transform to count the daily return observations in each ticker-month
crsp_daily['days_in_month'] = crsp_daily.groupby(['TICKER', 'month'])['RET'].transform('count')

# # Set date and ticker as index
# crsp_daily = crsp_daily.set_index(['date', 'TICKER'])

# # Remove all dates before 2005-01-01
# crsp_daily = crsp_daily[crsp_daily['date'] >= '2005-01-01']

In [19]:
crsp_daily

| | PERMNO | date | TICKER | VOL | RET | month | days_in_month |
|---|---|---|---|---|---|---|---|
| 1709955 | 10495 | 1962-07-02 | A | 2600.000 | 0.021739 | 1962-07 | 21 |
| 1709956 | 10495 | 1962-07-03 | A | 2100.000 | 0.006079 | 1962-07 | 21 |
| 1709957 | 10495 | 1962-07-05 | A | 3600.000 | -0.003021 | 1962-07 | 21 |
| 1709958 | 10495 | 1962-07-06 | A | 2600.000 | -0.018182 | 1962-07 | 21 |
| 1709959 | 10495 | 1962-07-09 | A | 4000.000 | 0.006173 | 1962-07 | 21 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 85532128 | 91205 | 2013-03-11 | ZZ | 407000.000 | 0.000000 | 2013-03 | 11 |
| 85532129 | 91205 | 2013-03-12 | ZZ | 159900.000 | 0.004545 | 2013-03 | 11 |
| 85532130 | 91205 | 2013-03-13 | ZZ | 308900.000 | 0.000000 | 2013-03 | 11 |
| 85532131 | 91205 | 2013-03-14 | ZZ | 274900.000 | 0.000000 | 2013-03 | 11 |


In [20]:
# Read "C:\Users\Aman\Downloads\Compressed\crsp_us_equity_monthly.csv"
crsp_monthly = pd.read_csv("C:/Users/Aman/Downloads/Compressed/crsp_us_equity_monthly.csv")

In [21]:
# Convert 'date' column to datetime
crsp_monthly['date'] = pd.to_datetime(crsp_monthly['date'])

# Sort the DataFrame by 'TICKER' and 'date' columns
crsp_monthly.sort_values(by=['TICKER', 'date'], inplace=True)

# Remove all rows with missing TICKER
crsp_monthly.dropna(subset=['TICKER'], inplace=True)

# Convert negative PRC values to positive
crsp_monthly['PRC'] = crsp_monthly['PRC'].abs()

# Fill missing PRC values with 0
crsp_monthly['PRC'] = crsp_monthly['PRC'].fillna(0)

# Lag PRC by one month within each ticker, so each row carries the previous month's price
# (VOL and RET are left unshifted)
crsp_monthly['PRC'] = crsp_monthly.groupby(['TICKER'])['PRC'].shift(1)

# # Remove all dates before 2005-01-01
# crsp_monthly = crsp_monthly[crsp_monthly['date'] >= '2005-01-01']

# Backfill the missing PRC values with the next available PRC value of the same ticker
# (grouping by TICKER keeps one ticker's price from leaking into another's)
crsp_monthly['PRC'] = crsp_monthly.groupby(['TICKER'])['PRC'].bfill()

# Drop the COMNAM column (PERMNO is kept for the merge below)
crsp_monthly.drop(columns=['COMNAM'], inplace=True)

# # Set date and ticker as index
# crsp_monthly = crsp_monthly.set_index(['date', 'TICKER'])

In [22]:
crsp_monthly

| | PERMNO | date | TICKER | PRC | VOL | RET |
|---|---|---|---|---|---|---|
| 80206 | 10495 | 1962-07-31 | A | 40.375 | 852.000 | 0.003106 |
| 80207 | 10495 | 1962-08-31 | A | 40.375 | 967.000 | 0.024768 |
| 80208 | 10495 | 1962-09-28 | A | 40.875 | 1525.000 | -0.094801 |
| 80209 | 10495 | 1962-10-31 | A | 37.000 | 1396.000 | 0.033784 |
| 80210 | 10495 | 1962-11-30 | A | 38.250 | 1895.000 | 0.117647 |
| ... | ... | ... | ... | ... | ... | ... |
| 4083922 | 91205 | 2012-11-30 | ZZ | 2.230 | 111189.000 | -0.026906 |
| 4083923 | 91205 | 2012-12-31 | ZZ | 2.170 | 116706.000 | 0.000000 |
| 4083924 | 91205 | 2013-01-31 | ZZ | 2.170 | 71494.000 | -0.004608 |
| 4083925 | 91205 | 2013-02-28 | ZZ | 2.160 | 97674.000 | 0.009259 |


In [23]:
# Read "C:\Users\Aman\Downloads\Compressed\compustat_us_equity.csv"
compustat_yearly = pd.read_csv("C:/Users/Aman/Downloads/Compressed/compustat_us_equity.csv")

In [24]:
compustat_yearly

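| | gvkey | datadate | fyear | indfmt | consol | popsrc | datafmt | tic | conm | curcd | bkvlps | csho | costat | mkvalt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1000 | 1961-12-31 | 1961.000 | INDL | C | D | STD | AE.2 | A & E PLASTIK PAK INC | USD | 2.434 | 0.152 | I | NaN |
| 1 | 1000 | 1962-12-31 | 1962.000 | INDL | C | D | STD | AE.2 | A & E PLASTIK PAK INC | USD | 3.050 | 0.181 | I | NaN |
| 2 | 1000 | 1963-12-31 | 1963.000 | INDL | C | D | STD | AE.2 | A & E PLASTIK PAK INC | USD | 2.973 | 0.186 | I | NaN |
| 3 | 1000 | 1964-12-31 | 1964.000 | INDL | C | D | STD | AE.2 | A & E PLASTIK PAK INC | USD | 3.097 | 0.196 | I | NaN |
| 4 | 1000 | 1965-12-31 | 1965.000 | INDL | C | D | STD | AE.2 | A & E PLASTIK PAK INC | USD | 2.384 | 0.206 | I | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 515983 | 328795 | 2013-12-31 | 2013.000 | INDL | C | D | STD | ACA | ARCOSA INC | USD | NaN | NaN | A | NaN |
| 515984 | 328795 | 2014-12-31 | 2014.000 | INDL | C | D | STD | ACA | ARCOSA INC | USD | NaN | NaN | A | NaN |
| 515985 | 328795 | 2015-12-31 | 2015.000 | INDL | C | D | STD | ACA | ARCOSA INC | USD | NaN | NaN | A | NaN |
| 515986 | 335466 | 2015-12-31 | 2015.000 | INDL | C | D | STD | HOFSQ | HERMITAGE OFFSHORE SERVICES | USD | NaN | NaN | I | NaN |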


In [25]:
# Merge crsp_daily and crsp_monthly on TICKER, date, PERMNO
crsp = pd.merge(crsp_daily, crsp_monthly, on=['TICKER','PERMNO', 'date'], how='outer')

In [26]:
del crsp_daily, crsp_monthly

In [27]:
# Group by TICKER and backfill PRC so that every daily row carries the lagged price
# stored on that month's month-end row
crsp['PRC_x'] = crsp.groupby(['TICKER'])['PRC'].bfill()
# Keep only rows where PRC_x >= 5 and days_in_month > 15
crsp = crsp[(crsp['PRC_x'] >= 5) & (crsp['days_in_month'] > 15)]

In [28]:
crsp

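| | PERMNO | date | TICKER | VOL_x | RET_x | month | days_in_month | PRC | VOL_y | RET_y | PRC_x |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10495 | 1962-07-02 | A | 2600.000 | 0.021739 | 1962-07 | 21.000 | NaN | NaN | NaN | 40.375 |
| 1 | 10495 | 1962-07-03 | A | 2100.000 | 0.006079 | 1962-07 | 21.000 | NaN | NaN | NaN | 40.375 |
| 2 | 10495 | 1962-07-05 | A | 3600.000 | -0.003021 | 1962-07 | 21.000 | NaN | NaN | NaN | 40.375 |
| 3 | 10495 | 1962-07-06 | A | 2600.000 | -0.018182 | 1962-07 | 21.000 | NaN | NaN | NaN | 40.375 |
| 4 | 10495 | 1962-07-09 | A | 4000.000 | 0.006173 | 1962-07 | 21.000 | NaN | NaN | NaN | 40.375 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 78112972 | 91205 | 2008-10-27 | ZZ | 443900.000 | -0.028674 | 2008-10 | 23.000 | NaN | NaN | NaN | 6.460 |
| 78112973 | 91205 | 2008-10-28 | ZZ | 369000.000 | 0.014760 | 2008-10 | 23.000 | NaN | NaN | NaN | 6.460 |
| 78112974 | 91205 | 2008-10-29 | ZZ | 610400.000 | 0.090909 | 2008-10 | 23.000 | NaN | NaN | NaN | 6.460 |
| 78112975 | 91205 | 2008-10-30 | ZZ | 718200.000 | 0.100000 | 2008-10 | 23.000 | NaN | NaN | NaN | 6.460 |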


In [29]:
# # Create an empty DataFrame to store the filtered data
# filtered_crsp = pd.DataFrame(columns=crsp.columns)

# for ticker, group in crsp.groupby('TICKER'):
#     # Iterate over each unique month in the ticker's data
#     prc_temp = 10
#     for month, month_group in group.groupby(group['date'].dt.to_period("M")):
#         # Skip the month if it is the first month
#         if month == group['date'].dt.to_period("M").iloc[0]:
#             prc_temp = month_group['PRC'].values[-1]
#             filtered_crsp = pd.concat([filtered_crsp, month_group])
#             continue      
        
#         last_day_of_prev_month = month_group['date'].iloc[-1]
        
#         # Check if PRC >= 5 for the last day of the previous month
#         if prc_temp < 5:
#             #skip the month
#             prc_temp = month_group['PRC'].values[-1]
#             continue
#         else:
#             prc_temp = month_group['PRC'].values[-1]
#             filtered_crsp = pd.concat([filtered_crsp, month_group])

### 3. 
From Tables 3-10, choose two other tables and replicate them.

### 4. 
If the numbers you obtain in questions 2 and 3 deviate from those in the paper, why do you think this is? What parts of the data construction and replication were difficult? Was there any additional information the authors could have given you to make this process simpler?

- We removed a stock's daily observations from the CRSP sample for any month in which its previous month-end closing price (or bid-ask average, depending on data availability) was below $5, to mitigate market microstructure effects. We also kept only stock-months with more than 15 daily return observations.
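
The screens described in the bullet above can also be written as a single function. The following is a minimal sketch rather than the code we actually ran: the function name, the column name `prc_lag`, and the toy data are hypothetical. It also assumes one design change relative to the notebook, namely keying on `PERMNO` and a year-month string instead of `TICKER` and an outer merge on `date`. CRSP tickers can be reused by different companies over time, whereas `PERMNO` is a permanent identifier, so this choice may itself be one source of differences from the paper's numbers.

```python
import pandas as pd

def apply_sample_filters(daily, monthly, min_price=5.0, min_days=15):
    """Keep daily rows that pass the lagged-price and observation-count screens.

    daily   : DataFrame with columns PERMNO, date, RET
    monthly : DataFrame with columns PERMNO, date, PRC
    """
    daily = daily.copy()
    monthly = monthly.sort_values(["PERMNO", "date"]).copy()

    daily["month"] = daily["date"].dt.strftime("%Y-%m")
    monthly["month"] = monthly["date"].dt.strftime("%Y-%m")

    # Previous month-end price, lagged within each security.
    # Assumes consecutive monthly rows (no missing months) per PERMNO.
    monthly["prc_lag"] = monthly["PRC"].abs().groupby(monthly["PERMNO"]).shift(1)

    # Number of non-missing daily returns in each security-month.
    daily["n_obs"] = daily.groupby(["PERMNO", "month"])["RET"].transform("count")

    merged = daily.merge(
        monthly[["PERMNO", "month", "prc_lag"]],
        on=["PERMNO", "month"],
        how="left",
    )
    # Same inequalities as in the notebook: price >= 5 and more than 15 days.
    keep = (merged["prc_lag"] >= min_price) & (merged["n_obs"] > min_days)
    return merged[keep].reset_index(drop=True)

# Hypothetical toy data: month-end prices of 10, 4 and 6 for one security.
toy_monthly = pd.DataFrame({
    "PERMNO": [1, 1, 1],
    "date": pd.to_datetime(["2005-01-31", "2005-02-28", "2005-03-31"]),
    "PRC": [10.0, -4.0, 6.0],
})
toy_daily = pd.DataFrame({
    "PERMNO": [1] * 6,
    "date": pd.to_datetime([
        "2005-01-03", "2005-01-04", "2005-02-01",
        "2005-02-02", "2005-03-01", "2005-03-02",
    ]),
    "RET": [0.01, 0.00, 0.02, -0.01, 0.03, 0.01],
})

# min_days is lowered to 1 only so that the two-day toy months can pass.
filtered = apply_sample_filters(toy_daily, toy_monthly, min_days=1)
```

In the toy data only February survives: January has no lagged price, and March is dropped because the February month-end price of 4 is below the threshold.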

### 5. 
In your view, what are the key takeaways of this paper? How did the results in the tables you replicated contribute to the paper as a whole?