# FINM 35000 Problem Set 3: Equity Valuation Stress Testing

<span style="color:blue">Aman Krishna </span> <br>
<br>
<span style="color:#406A5F">Tim Taylor </span> <br>
<br>
<span style="color:purple">Yazmin Ramirez Delgado </span>

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import math as m 
import scipy.stats as stats
import datetime as dt
from statsmodels.regression.rolling import RollingOLS
import seaborn as sns
import warnings
from scipy.stats import norm
pd.set_option("display.precision", 2)
pd.set_option('display.float_format', '{:.3f}'.format)
warnings.filterwarnings("ignore")



# 1. Replication of Cosemans and Frehen (2021) (100 points)

Note: for questions 2-3, it is possible you will not obtain the exact numbers in the paper, which is okay as long as you are able to describe the ways in which you might have deviated from the authors (in question 4).

### 1. 
In your own words, describe what the authors mean by “salience theory” and how it affects investors’ portfolio choice decisions.

<span style="color:purple">"Salience theory," as discussed in the paper, refers to the idea that investors give disproportionate attention and weight to the most prominent or striking features of an investment, particularly its past returns. The theory is grounded in the broader behavioral-economics literature on how cognitive biases influence decision-making. In the context of stock market investing, salience theory suggests that investors focus on stocks whose past returns stand out as notably high or low, because these returns are more "salient," or noticeable. Famous stocks like Apple and Tesla come to mind when thinking about this theory from a US stock market perspective. </span> <br>
<br>

<span style="color:blue"> According to salience theory, investors do not evaluate potential investments in a completely rational or comprehensive manner. That is, investors are unsophisticated in their decision-making: they are more likely to focus on the most memorable or striking aspects of an asset's history, especially its past performance. For example, if a stock has experienced a significant upsurge in value in the recent past, this positive performance becomes a salient feature that attracts investors, leading them to overvalue the stock. This overvaluation, in turn, means that the stock is likely to earn lower future returns, because its current price is inflated by demand based on salient past performance.</span> <br>
<br>

<span style="color:#406A5F"> Going the other way, stocks with notably poor past performance can become undervalued, as investors overlook them due to their salient negative returns. These undervalued stocks, according to the theory, are likely to yield higher future returns as their current lower prices do not reflect their potential value. </span> <br>
<br>

<span style="color:#406A5F"> Overall, we saw a lot of similarity between the salience effect and the fourth of Fama and French's five factors, Profitability (RMW, Robust Minus Weak). This factor captures the historical outperformance of profitable companies compared to less profitable ones. It measures the return difference between a portfolio of companies with high profitability and a portfolio of companies with low profitability. </span> <br>
<br>
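
To make the idea concrete, here is a minimal sketch of a salience function of the kind used later in this notebook: a day's return is more salient the further it sits from the market return, scaled by the magnitudes of both returns. The function name `salience` and the toy numbers are illustrative assumptions; `theta = 0.1` matches the value used in the replication code.

```python
def salience(stock_ret, market_ret, theta=0.1):
    """Salience of a stock return relative to the market return on the same day."""
    return abs(stock_ret - market_ret) / (abs(stock_ret) + abs(market_ret) + theta)

# Toy example: on a flat market day (0%), a 10% jump stands out far more than a 1% move
big_move = salience(0.10, 0.0)
small_move = salience(0.01, 0.0)
print(round(big_move, 3), round(small_move, 3))  # 0.5 0.091
```

A return equal to the market return has zero salience under this definition, which is why the measure captures how much a stock stands out rather than how large its return is.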

### 2. 
Following Section 3 of the paper, download the relevant variables from CRSP and Compustat (both available through WRDS). Use this data to replicate Table 2.

Load CRSP Daily, Monthly, and Compustat Fundamentals Data

In [15]:
# Read "C:\Users\Aman\Downloads\Compressed\crsp_us_equity.csv"
crsp_daily = pd.read_csv("C:/Users/Aman/Downloads/Compressed/crsp_us_equity.csv")

In [16]:
# Convert date to datetime format
crsp_daily['date'] = pd.to_datetime(crsp_daily['date'])

# Sort the DataFrame by 'TICKER' and 'date'
crsp_daily = crsp_daily.sort_values(['TICKER', 'date'])

# Remove all rows with missing TICKER or RET
crsp_daily.dropna(subset=['TICKER', 'RET'], inplace=True)

# Drop the COMNAM column (PERMNO is kept for the later merge)
crsp_daily.drop(columns=['COMNAM'], inplace=True)

In [17]:
# Create a month and year column like 2005-01
crsp_daily['month'] = crsp_daily['date'].dt.strftime('%Y-%m')

In [18]:
# Use groupby and transform to count the daily return observations for each ticker in each month
crsp_daily['days_in_month'] = crsp_daily.groupby(['TICKER', 'month'])['RET'].transform('count')

# # Set date and ticker as index
# crsp_daily = crsp_daily.set_index(['date', 'TICKER'])

# # Remove all dates before 2005-01-01
# crsp_daily = crsp_daily[crsp_daily['date'] >= '2005-01-01']

In [19]:
crsp_daily

Unnamed: 0,PERMNO,date,TICKER,VOL,RET,month,days_in_month
1709955,10495,1962-07-02,A,2600.000,0.021739,1962-07,21
1709956,10495,1962-07-03,A,2100.000,0.006079,1962-07,21
1709957,10495,1962-07-05,A,3600.000,-0.003021,1962-07,21
1709958,10495,1962-07-06,A,2600.000,-0.018182,1962-07,21
1709959,10495,1962-07-09,A,4000.000,0.006173,1962-07,21
...,...,...,...,...,...,...,...
85532128,91205,2013-03-11,ZZ,407000.000,0.000000,2013-03,11
85532129,91205,2013-03-12,ZZ,159900.000,0.004545,2013-03,11
85532130,91205,2013-03-13,ZZ,308900.000,0.000000,2013-03,11
85532131,91205,2013-03-14,ZZ,274900.000,0.000000,2013-03,11


In [20]:
# Read "C:\Users\Aman\Downloads\Compressed\crsp_us_equity_monthly.csv"
crsp_monthly = pd.read_csv("C:/Users/Aman/Downloads/Compressed/crsp_us_equity_monthly.csv")

In [21]:
# Convert 'date' column to datetime
crsp_monthly['date'] = pd.to_datetime(crsp_monthly['date'])

# Sort the DataFrame by 'TICKER' and 'date' columns
crsp_monthly.sort_values(by=['TICKER', 'date'], inplace=True)

# Remove all rows with missing TICKER
crsp_monthly.dropna(subset=['TICKER'], inplace=True)

# Convert negative PRC values to positive
crsp_monthly['PRC'] = crsp_monthly['PRC'].abs()

# Fill missing PRC values with 0
crsp_monthly['PRC'] = crsp_monthly['PRC'].fillna(0)

# Lag PRC by one month within each ticker so that each row carries the previous month's price
crsp_monthly['PRC'] = crsp_monthly.groupby(['TICKER'])['PRC'].shift(1)

# # Remove all dates before 2005-01-01
# crsp_monthly = crsp_monthly[crsp_monthly['date'] >= '2005-01-01']

# Backfill the missing PRC values (the first month of each ticker) with the next available PRC value
crsp_monthly['PRC'] = crsp_monthly['PRC'].bfill()

# Drop the COMNAM column (PERMNO is kept for the merge with the daily data)
crsp_monthly.drop(columns=['COMNAM'], inplace=True)

# # Set date and ticker as index
# crsp_monthly = crsp_monthly.set_index(['date', 'TICKER'])

In [22]:
crsp_monthly

Unnamed: 0,PERMNO,date,TICKER,PRC,VOL,RET
80206,10495,1962-07-31,A,40.375,852.000,0.003106
80207,10495,1962-08-31,A,40.375,967.000,0.024768
80208,10495,1962-09-28,A,40.875,1525.000,-0.094801
80209,10495,1962-10-31,A,37.000,1396.000,0.033784
80210,10495,1962-11-30,A,38.250,1895.000,0.117647
...,...,...,...,...,...,...
4083922,91205,2012-11-30,ZZ,2.230,111189.000,-0.026906
4083923,91205,2012-12-31,ZZ,2.170,116706.000,0.000000
4083924,91205,2013-01-31,ZZ,2.170,71494.000,-0.004608
4083925,91205,2013-02-28,ZZ,2.160,97674.000,0.009259


In [2]:
# Read "C:\Users\Aman\Downloads\Compressed\compustat_us_equity.csv"
compustat_yearly = pd.read_csv("C:/Users/Aman/Downloads/Compressed/compustat_us_equity.csv")

In [3]:
# Drop the indfmt, consol, popsrc, datafmt, conm, curcd and costat columns
compustat_yearly.drop(columns=['indfmt', 'consol', 'popsrc', 'datafmt', 'conm', 'curcd', 'costat'], inplace=True)

In [4]:
#Rename datadate to date and convert it to datetime
compustat_yearly.rename(columns={'datadate':'date'}, inplace=True)
compustat_yearly['date'] = pd.to_datetime(compustat_yearly['date'])

#Rename bkvlps to book_value_per_share, csho to shares_outstanding, mkvalt to market_value
compustat_yearly.rename(columns={'bkvlps':'book_value_per_share', 'csho':'shares_outstanding', 'mkvalt':'market_value'}, inplace=True)

compustat_yearly

Unnamed: 0,gvkey,date,fyear,tic,book_value_per_share,shares_outstanding,market_value
0,1000,1961-12-31,1961.000,AE.2,2.434,0.152,
1,1000,1962-12-31,1962.000,AE.2,3.050,0.181,
2,1000,1963-12-31,1963.000,AE.2,2.973,0.186,
3,1000,1964-12-31,1964.000,AE.2,3.097,0.196,
4,1000,1965-12-31,1965.000,AE.2,2.384,0.206,
...,...,...,...,...,...,...,...
515983,328795,2013-12-31,2013.000,ACA,,,
515984,328795,2014-12-31,2014.000,ACA,,,
515985,328795,2015-12-31,2015.000,ACA,,,
515986,335466,2015-12-31,2015.000,HOFSQ,,,


#### Loading the Key to Connect CRSP and Compustat (Permno to GVKEY)

In [5]:
permno_gvkey = pd.read_csv("C:/Users/Aman/Downloads/Compressed/permno_gvkey.csv")

In [6]:
permno_gvkey["LINKDT"] = pd.to_datetime(permno_gvkey["LINKDT"])

# Replace the open-ended marker "E" in LINKENDDT with a fixed end date (2016-01-01)
permno_gvkey["LINKENDDT"] = permno_gvkey["LINKENDDT"].replace({"E": '2016-01-01'})

permno_gvkey["LINKENDDT"] = pd.to_datetime(permno_gvkey["LINKENDDT"])

permno_gvkey

Unnamed: 0,GVKEY,LINKTYPE,LPERMNO,LPERMCO,LINKDT,LINKENDDT,CONM
0,1000,LU,25881,23369,1970-11-13,1978-06-30,A & E PLASTIK PAK INC
1,1001,LU,10015,6398,1983-09-20,1986-07-31,A & M FOOD SERVICES INC
2,1002,LC,10023,22159,1972-12-14,1973-06-05,AAI CORP
3,1003,LU,10031,6672,1983-12-07,1989-08-16,A.A. IMPORTING CO INC
4,1004,LU,54594,20000,1972-04-24,2016-01-01,AAR CORP
...,...,...,...,...,...,...,...
32928,349994,LC,23514,59438,2022-11-15,2016-01-01,CLEARMIND MEDICINE INC
32929,350681,LC,22205,58855,2021-10-22,2023-03-31,GETNET ADQUIRENCIA E
32930,351038,LC,16161,55612,2021-10-29,2016-01-01,QUOIN PHARMACEUTICALS LTD
32931,352262,LC,23773,59507,2023-03-17,2016-01-01,COOL COMPANY LTD
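
Each row of the link table is only valid between its `LINKDT` and `LINKENDDT`, so a merge on PERMNO alone could attach the wrong GVKEY to a return. A minimal sketch of a date-aware merge, on toy data, might look like the following. The helper name `attach_gvkey` and the toy frames are assumptions for illustration; the link column names match the table above.

```python
import pandas as pd

def attach_gvkey(returns, links):
    """Attach GVKEY to each (PERMNO, date) row whose date falls inside a link window.

    Hypothetical helper: rows with no valid link on that date are dropped.
    """
    merged = returns.merge(links, left_on="PERMNO", right_on="LPERMNO", how="left")
    in_window = (merged["date"] >= merged["LINKDT"]) & (merged["date"] <= merged["LINKENDDT"])
    return merged.loc[in_window, ["PERMNO", "date", "RET", "GVKEY"]].reset_index(drop=True)

# Toy data: PERMNO 1 is linked to GVKEY 100 until end-1999 and to GVKEY 200 afterwards
links = pd.DataFrame({
    "GVKEY": [100, 200],
    "LPERMNO": [1, 1],
    "LINKDT": pd.to_datetime(["1990-01-01", "2000-01-01"]),
    "LINKENDDT": pd.to_datetime(["1999-12-31", "2016-01-01"]),
})
returns = pd.DataFrame({
    "PERMNO": [1, 1],
    "date": pd.to_datetime(["1995-06-30", "2005-06-30"]),
    "RET": [0.01, -0.02],
})
linked = attach_gvkey(returns, links)
print(linked)
```

Merging first and filtering afterwards is simple but memory-hungry on the full daily file; it is shown here only because it makes the window logic explicit.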


In [25]:
# Merge crsp_daily and crsp_monthly on TICKER, date, PERMNO
crsp = pd.merge(crsp_daily, crsp_monthly, on=['TICKER','PERMNO', 'date'], how='outer')

In [26]:
del crsp_daily, crsp_monthly

#### Keeping only stocks with a previous-month price of at least $5 and more than 15 daily returns in the month

In [27]:
# Group by TICKER and backfill PRC so that every daily row carries the lagged monthly price
crsp['PRC_x'] = crsp.groupby(['TICKER'])['PRC'].bfill()
# Keep rows where PRC_x is >= 5 and days_in_month > 15
crsp = crsp[(crsp['PRC_x'] >= 5) & (crsp['days_in_month'] > 15)]

In [107]:
# Convert all RET_x to float, if not possible, convert to NaN
crsp['RET_x'] = pd.to_numeric(crsp['RET_x'], errors='coerce')
#Drop all NaN values in RET_x
crsp.dropna(subset=['RET_x'], inplace=True)

#### Loading the CRSP Index Data

In [7]:
crsp_index = pd.read_csv("C:/Users/Aman/Downloads/Compressed/crsp_index.csv")

In [8]:
crsp_index['caldt'] = pd.to_datetime(crsp_index['caldt'])
crsp_index.rename(columns={'caldt':'date'}, inplace=True)

In [9]:
crsp_index

Unnamed: 0,date,ewretd
0,1926-01-02,0.010
1,1926-01-04,0.006
2,1926-01-05,-0.002
3,1926-01-06,0.001
4,1926-01-07,0.008
...,...,...
23781,2015-12-24,0.002
23782,2015-12-28,-0.008
23783,2015-12-29,0.006
23784,2015-12-30,-0.007


In [203]:
theta = 0.1
delta = 0.7
count=0
for permno in crsp['PERMNO'].unique():
    crsp_sample = crsp[crsp['PERMNO'] == permno].copy()
    crsp_sample = pd.merge(crsp_sample, crsp_index, on='date', how='left')
    crsp_sample['salience'] = abs(crsp_sample['RET_x'] - crsp_sample['ewretd']) / (
                abs(crsp_sample['ewretd']) + abs(crsp_sample['RET_x']) + theta)
    
    # Group by ticker and month and iterate over each group
    for name, group in crsp_sample.groupby(['TICKER', 'month']):
        # Rank the salience values
        group['salience_rank'] = group['salience'].rank(ascending=False)
        # Calculate the salience weight
        group['salience_weight'] = delta / (group['salience_rank'] * delta * (1 / len(group)))
        # Add the salience weight to the dataframe
        crsp_sample.loc[group.index, 'salience_weight'] = group['salience_weight']

        # Calculate Salience theory value ST
        cov_matrix = np.cov(group['RET_x'], group['salience_weight'])
        crsp_sample.loc[group.index, 'ST'] = cov_matrix[0][1]
    #Make the index of crsp_sample same as crsp['PERMNO'] == permno
    crsp_sample.set_index(crsp[crsp['PERMNO'] == permno].index, inplace=True)
    
    # Add the 'ST' column to the original DataFrame
    crsp.loc[crsp[crsp['PERMNO'] == permno].index, 'ST'] = crsp_sample['ST']
    count+=1
    if count%100 == 0:
        print("Processed PERMNO: ", count)

Processed PERMNO:  100
Processed PERMNO:  200
Processed PERMNO:  300
Processed PERMNO:  400
Processed PERMNO:  500
Processed PERMNO:  600
Processed PERMNO:  700
Processed PERMNO:  800
Processed PERMNO:  900
Processed PERMNO:  1000
Processed PERMNO:  1100
Processed PERMNO:  1200
Processed PERMNO:  1300
Processed PERMNO:  1400
Processed PERMNO:  1500
Processed PERMNO:  1600
Processed PERMNO:  1700
Processed PERMNO:  1800
Processed PERMNO:  1900
Processed PERMNO:  2000
Processed PERMNO:  2100
Processed PERMNO:  2200
Processed PERMNO:  2300
Processed PERMNO:  2400
Processed PERMNO:  2500
Processed PERMNO:  2600
Processed PERMNO:  2700
Processed PERMNO:  2800
Processed PERMNO:  2900
Processed PERMNO:  3000
Processed PERMNO:  3100
Processed PERMNO:  3200
Processed PERMNO:  3300
Processed PERMNO:  3400
Processed PERMNO:  3500
Processed PERMNO:  3600
Processed PERMNO:  3700
Processed PERMNO:  3800
Processed PERMNO:  3900
Processed PERMNO:  4000
Processed PERMNO:  4100
Processed PERMNO:  4200

In [10]:
crsp = pd.read_csv("C:/Users/Aman/Downloads/Compressed/crsp_filtered_merged.csv")

In [11]:
crsp

Unnamed: 0,PERMNO,date,TICKER,VOL_x,RET_x,month,days_in_month,PRC,VOL_y,RET_y,PRC_x,ST
0,10495,1962-07-02,A,2600.000,0.022,1962-07,21.000,,,,40.375,-0.019
1,10495,1962-07-03,A,2100.000,0.006,1962-07,21.000,,,,40.375,-0.019
2,10495,1962-07-05,A,3600.000,-0.003,1962-07,21.000,,,,40.375,-0.019
3,10495,1962-07-06,A,2600.000,-0.018,1962-07,21.000,,,,40.375,-0.019
4,10495,1962-07-09,A,4000.000,0.006,1962-07,21.000,,,,40.375,-0.019
...,...,...,...,...,...,...,...,...,...,...,...,...
59173521,91205,2008-10-27,ZZ,443900.000,-0.029,2008-10,23.000,,,,6.460,-0.147
59173522,91205,2008-10-28,ZZ,369000.000,0.015,2008-10,23.000,,,,6.460,-0.147
59173523,91205,2008-10-29,ZZ,610400.000,0.091,2008-10,23.000,,,,6.460,-0.147
59173524,91205,2008-10-30,ZZ,718200.000,0.100,2008-10,23.000,,,,6.460,-0.147
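
The `ST` column above is constant within each stock-month because it is a single number per month: the covariance between salience weights and daily returns. A minimal sketch of that calculation for one stock-month is below, following our reading of the paper's definition (weights proportional to `delta ** rank`, normalized to average one, with each day given equal probability). The helper name `st_for_month` and the toy returns are hypothetical, and using the population covariance rather than the sample covariance is an assumption.

```python
import pandas as pd

def st_for_month(stock_ret, market_ret, theta=0.1, delta=0.7):
    """Salience theory value for one stock-month of daily returns (hypothetical helper)."""
    stock_ret = pd.Series(stock_ret, dtype=float)
    market_ret = pd.Series(market_ret, dtype=float)
    salience = (stock_ret - market_ret).abs() / (stock_ret.abs() + market_ret.abs() + theta)
    rank = salience.rank(ascending=False)   # rank 1 = most salient day
    discount = delta ** rank
    weight = discount / discount.mean()     # same as dividing by sum(discount * 1/N)
    # Population covariance between weights and returns (each day has probability 1/N)
    return float(((weight - weight.mean()) * (stock_ret - stock_ret.mean())).mean())

# Toy month: the most salient day is a large gain, so ST should be positive
stock = [0.08, 0.01, -0.01, 0.00]
market = [0.00, 0.00, 0.00, 0.00]
st_up = st_for_month(stock, market)
# Flipping the sign of every return flips the sign of ST
st_down = st_for_month([-r for r in stock], market)
print(st_up > 0, st_down < 0)
```

A positive ST means the most salient days were gains, which under the theory makes the stock attractive to salient thinkers; a negative ST means the salient days were losses.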


In [12]:
# Create deciles based on the 'ST' column
crsp['Decile'] = pd.qcut(crsp['ST'], q=[0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
                       labels=['Low', '2', '3', '4', '5', '6', '7', '8', '9', 'High'])
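
The `pd.qcut` call above forms deciles over the pooled sample. If breakpoints are instead meant to be recomputed every month, as is usual for portfolio sorts, one option is to apply `qcut` within each month. The sketch below does this on simulated data; the frame `monthly_st`, the helper `assign_deciles`, and the random values are illustrative assumptions, not the notebook's data.

```python
import numpy as np
import pandas as pd

def assign_deciles(df, value_col="ST", group_col="month"):
    """Assign decile labels 1-10 to value_col within each group_col (hypothetical helper)."""
    return df.groupby(group_col)[value_col].transform(
        lambda s: pd.qcut(s, q=10, labels=False) + 1
    )

# Simulated data: 50 hypothetical stocks in each of two months, one ST value per stock-month
rng = np.random.default_rng(0)
monthly_st = pd.DataFrame({
    "month": ["2005-01"] * 50 + ["2005-02"] * 50,
    "ST": rng.normal(size=100),
})
monthly_st["Decile"] = assign_deciles(monthly_st)

# Each month should contribute the same number of stocks to every decile
print(monthly_st.groupby(["month", "Decile"]).size().unstack())
```

This assumes one row per stock-month; on the daily frame used here, the `ST` values would first need to be collapsed to one observation per stock and month.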


### 3. 
From Tables 3-10, choose two other tables and replicate them.

### 4. 
If the numbers you obtain in questions 2 and 3 deviate from those in the paper, why do you think this is? What parts of the data construction and replication were difficult? Was there any additional information the authors could have given you to make this process simpler?

- We removed stocks from the daily CRSP dataset if the previous month's closing price (or bid-ask average, depending on data availability) was below $5. This was done to mitigate market microstructure effects.
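
The screen described above can be illustrated on toy data. This is a minimal sketch, assuming a frame with one row per stock that already carries the previous month's price and the number of daily returns in the month; the helper `apply_screens`, the column name `prev_price`, and the toy tickers are illustrative assumptions. The thresholds mirror the `>= 5` and `> 15` conditions used in the filtering cell.

```python
import pandas as pd

def apply_screens(df, min_price=5.0, min_days=15):
    """Keep rows with previous-month price >= min_price and more than min_days daily returns."""
    keep = (df["prev_price"] >= min_price) & (df["days_in_month"] > min_days)
    return df.loc[keep]

# Toy data: BBB fails the price screen, CCC fails the day-count screen
toy = pd.DataFrame({
    "TICKER": ["AAA", "BBB", "CCC"],
    "prev_price": [12.0, 3.5, 8.0],
    "days_in_month": [21, 20, 11],
})
screened = apply_screens(toy)
print(screened["TICKER"].tolist())  # ['AAA']
```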

### 5. 
In your view, what are the key takeaways of this paper? How did the results in the tables you replicated contribute to the paper as a whole?