# Final Project - Evaporating Liquidity

In this project, we replicate tables from the paper "_Evaporating Liquidity_" by Stefan Nagel using the principles of Reproducible Analytical Pipelines (RAPs) learned in class.

Our replication is automated end to end using Pydoit and follows the project template (blank_project) provided by Professor Bejarano, which is based on the Cookiecutter Data Science template.

In [1]:
import pandas as pd

import config

import load_CRSP_stock
import load_FF_industry
import load_vix

import clean_CRSP_stock

DATA_DIR = config.DATA_DIR

## Data Collection

### 1. Pull and load CRSP data from WRDS

Using `load_CRSP_stock`, we pull and save the CRSP daily stock data and index data from WRDS (Wharton Research Data Services).

The CRSP daily stock data are needed to construct individual portfolios based on the reversal strategy. The CRSP daily index data are needed to evaluate the performance of the reversal strategy portfolios.

Specifically, we:
- query stocks with share code 10 or 11 that are listed on NYSE, AMEX, or Nasdaq
- pull one extra month of daily stock data for later data cleaning and processing
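
The selection rules above can be sketched in pandas on a toy table. This is only an illustration, not the project's actual WRDS query: the function name `filter_universe`, the `extra_months` parameter, and the toy data are hypothetical, and the sketch assumes the extra month is pulled before the sample start and that the raw data carry a `shrcd` column (the CRSP share code). Exchange codes 1, 2, and 3 are the CRSP codes for NYSE, AMEX, and Nasdaq.

```python
import pandas as pd

# CRSP conventions: share codes 10 and 11 are ordinary common shares;
# exchange codes 1, 2, 3 are NYSE, AMEX, and Nasdaq.
COMMON_SHARE_CODES = (10, 11)
MAJOR_EXCHANGE_CODES = (1, 2, 3)


def filter_universe(df, start_date, extra_months=1):
    """Hypothetical helper: keep common stocks on the three exchanges,
    starting `extra_months` before `start_date` (an assumed padding rule)."""
    padded_start = pd.Timestamp(start_date) - pd.DateOffset(months=extra_months)
    mask = (
        df["shrcd"].isin(COMMON_SHARE_CODES)
        & df["exchcd"].isin(MAJOR_EXCHANGE_CODES)
        & (df["date"] >= padded_start)
    )
    return df.loc[mask].reset_index(drop=True)


# Toy data, made up for illustration only.
toy = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["1997-11-28", "1997-12-15", "1998-01-02", "1998-01-02"]
        ),
        "permno": [1, 1, 2, 3],
        "shrcd": [10, 11, 12, 10],
        "exchcd": [1, 3, 1, 4],
    }
)

filtered = filter_universe(toy, "1998-01-01")
print(filtered)
```

Only the 1997-12-15 row survives: the first row falls before the padded start date, the third has a share code outside 10/11, and the fourth has an exchange code outside 1 to 3.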

#### CRSP daily stock data

In [21]:
df_dsf = load_CRSP_stock.load_CRSP_daily_file(data_dir=DATA_DIR)
df_dsf.info()

<class 'pandas.core.frame.DataFrame'>
Index: 17445472 entries, 0 to 445471
Data columns (total 12 columns):
 #   Column   Dtype         
---  ------   -----         
 0   date     datetime64[ns]
 1   permno   int64         
 2   permco   int64         
 3   exchcd   int64         
 4   prc      float64       
 5   bid      float64       
 6   ask      float64       
 7   shrout   float64       
 8   cfacpr   float64       
 9   cfacshr  float64       
 10  ret      float64       
 11  retx     float64       
dtypes: datetime64[ns](1), float64(8), int64(3)
memory usage: 1.7 GB


#### CRSP daily indexes

In [20]:
df_msix = load_CRSP_stock.load_CRSP_index_files(data_dir=DATA_DIR)
df_msix.columns

Index(['caldt', 'vwretd', 'vwindd', 'vwretx', 'vwindx', 'ewretd', 'ewindd',
       'ewretx', 'ewindx', 'sprtrn', 'spindx', 'decret1', 'decind1', 'decret2',
       'decind2', 'decret3', 'decind3', 'decret4', 'decind4', 'decret5',
       'decind5', 'decret6', 'decind6', 'decret7', 'decind7', 'decret8',
       'decind8', 'decret9', 'decind9', 'decret10', 'decind10', 'totval',
       'totcnt', 'usdval', 'usdcnt'],
      dtype='object')

### 2. Pull and load data from the Fama-French Data Library



Using `load_FF_industry`, we pull and save the daily returns of the 48 industry portfolios from the Fama/French Data Library.

The industry portfolios are constructed by classifying stocks into 48 industries as in Fama and French (1997). The daily industry portfolio returns are needed to construct the industry portfolios based on the reversal strategy.

In [3]:
ff = load_FF_industry.load_FF_industry_portfolio_daily(data_dir=DATA_DIR)

#### Average Value Weighted Daily Returns

In [10]:
ff[0].tail()

| Date | Agric | Food | Soda | Beer | Smoke | Toys | Fun | Books | Hshld | Clths | ... | Boxes | Trans | Whlsl | Rtail | Meals | Banks | Insur | RlEst | Fin | Other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010-12-27 | 0.12 | -0.35 | -0.22 | -0.61 | -0.26 | 0.06 | -0.48 | 0.29 | -0.55 | -0.08 | ... | 0.24 | 0.21 | 0.12 | -0.13 | -0.61 | 1.17 | 0.24 | -0.18 | 0.83 | 0.52 |
| 2010-12-28 | 0.37 | 0.03 | -0.39 | 0.12 | -0.05 | -1.45 | -0.63 | -0.5 | 0.14 | -0.37 | ... | -0.02 | -0.08 | -0.16 | -0.06 | -0.23 | 0.15 | -0.21 | -0.39 | -0.1 | 0.17 |
| 2010-12-29 | 2.22 | -0.03 | -0.32 | 0.09 | 0.23 | 0.55 | 0.22 | 0.7 | -0.38 | -0.53 | ... | 0.02 | 0.21 | 0.23 | 0.51 | 0.49 | -0.25 | -0.09 | 0.7 | -0.5 | -0.1 |
| 2010-12-30 | 1.0 | 0.07 | 0.0 | -0.11 | -0.42 | -0.02 | -0.6 | -0.78 | -0.15 | 0.54 | ... | -0.04 | -0.02 | -0.03 | 0.13 | -0.39 | -0.35 | -0.29 | -0.47 | -0.25 | -0.44 |
| 2010-12-31 | -0.13 | 0.01 | -0.81 | 0.34 | 0.05 | -1.12 | 1.14 | -0.18 | -0.06 | -0.93 | ... | -1.02 | 0.0 | -0.54 | -0.48 | -0.4 | 0.09 | 0.19 | -0.09 | 0.07 | 0.6 |


#### Average Equal Weighted Daily Returns

In [11]:
ff[1].tail()

| Date | Agric | Food | Soda | Beer | Smoke | Toys | Fun | Books | Hshld | Clths | ... | Boxes | Trans | Whlsl | Rtail | Meals | Banks | Insur | RlEst | Fin | Other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010-12-27 | -0.03 | 0.13 | 0.98 | -0.2 | 0.24 | -0.95 | -0.01 | 1.26 | 0.6 | 0.11 | ... | 0.23 | -0.02 | 0.21 | 0.28 | -0.11 | 0.78 | 0.53 | 0.3 | 0.35 | 0.36 |
| 2010-12-28 | 0.14 | -0.25 | -0.34 | 0.59 | -0.2 | -1.53 | 0.1 | -1.39 | 0.25 | -0.48 | ... | -0.04 | -0.35 | -0.17 | -0.34 | -0.59 | 0.04 | -0.37 | -0.15 | 0.06 | -0.26 |
| 2010-12-29 | 0.16 | -0.18 | 0.26 | -0.02 | -0.32 | 0.01 | 0.25 | 0.34 | 0.31 | 0.18 | ... | 0.06 | 0.17 | 0.12 | 0.54 | 0.81 | 0.3 | 0.23 | 0.53 | -0.25 | -0.03 |
| 2010-12-30 | 1.2 | 0.49 | -0.25 | 0.12 | -1.2 | 0.84 | -0.2 | -0.73 | -0.37 | 0.62 | ... | -0.02 | 0.62 | 0.23 | 0.13 | 0.04 | 0.18 | -0.18 | -0.04 | -0.35 | -0.2 |
| 2010-12-31 | 1.06 | -0.4 | -1.37 | 0.44 | 0.49 | 0.39 | -0.02 | -0.2 | -0.5 | -0.91 | ... | -0.99 | -0.07 | -0.47 | -0.68 | -0.56 | 0.15 | 0.11 | 1.09 | -0.39 | -0.29 |


### 3. Pull and load VIX from FRED



Using `load_vix`, we pull and save the CBOE Volatility Index (VIX) data from FRED. The data are used later in table replication.

In [4]:
vix = load_vix.load_vix_from_fred(data_dir=DATA_DIR)

## Data Cleaning and Processing

### Select the desired subsample of the data

> Reversal strategy returns based on transaction prices are calculated from daily closing prices, and the reversal strategy returns based on quote-midpoints are calculated from averages of closing bid and ask quotes, as reported in the CRSP daily returns file (for Nasdaq stocks only), adjusted for stock splits and dividends using the CRSP adjustment factors and dividend information.


> To enter into the sample, stocks must have a closing price of at least $1 on the last trading day of the previous calendar month.
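
The price filter quoted above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `clean_CRSP_stock`: the helper `flag_price_filter` and the toy data are hypothetical, each stock's last observation within a month is assumed to stand in for the last trading day of that month, and `prc` is taken in absolute value because CRSP stores a negative price when it reports a bid/ask average. The rule needs a price from the month before each observation, which is presumably why one extra month of data is pulled.

```python
import pandas as pd


def flag_price_filter(df, min_price=1.0):
    """Hypothetical helper: flag rows whose stock closed at or above
    `min_price` at the end of the previous calendar month."""
    df = df.sort_values(["permno", "date"]).copy()
    # Integer month index, so "previous month" is simply ym - 1.
    df["ym"] = df["date"].dt.year * 12 + df["date"].dt.month

    # Last observation per stock-month (assumed to be the last trading day).
    month_end = df.groupby(["permno", "ym"]).tail(1)[["permno", "ym", "prc"]].copy()
    month_end["prev_month_end_prc"] = month_end["prc"].abs()
    # Shift forward one month so it lines up with the following month's rows.
    month_end["ym"] += 1

    out = df.merge(
        month_end[["permno", "ym", "prev_month_end_prc"]],
        on=["permno", "ym"],
        how="left",
    )
    # Rows with no previous-month price compare as False and are excluded.
    out["eligible"] = out["prev_month_end_prc"] >= min_price
    return out


# Toy data, made up for illustration only.
toy = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["1997-12-30", "1997-12-31", "1998-01-02", "1997-12-31", "1998-01-02"]
        ),
        "permno": [1, 1, 1, 2, 2],
        "prc": [0.9, 1.5, 2.0, -0.75, 1.2],
    }
)

flagged = flag_price_filter(toy)
print(flagged[["date", "permno", "prc", "prev_month_end_prc", "eligible"]])
```

In the toy data, stock 1 is eligible in January 1998 because it closed December at 1.5, while stock 2 is not because its December close was 0.75 in absolute value. The December rows are all ineligible because no November data are available.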



> To screen out data recording errors of bid and ask data for Nasdaq stocks: require that the ratio of bid to quote-midpoint is not smaller than 0.5, and the one-day return based on quote-midpoints minus the return based on closing prices is not less than -50% and not higher than 100%. If a closing transaction price is not available, the quote-midpoint is used to calculate transaction-price returns.
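
The two quote screens described above might look like this in pandas. It is a sketch under assumptions rather than the project's cleaning code: the helper `screen_quotes` and the toy data are hypothetical, and the sketch takes the midpoint-based daily return as a precomputed column named `ret_mid` (an assumed name), since computing it requires the split and dividend adjustments mentioned earlier.

```python
import pandas as pd


def screen_quotes(df):
    """Hypothetical helper: flag rows that pass both bid/ask screens."""
    df = df.copy()
    # Quote midpoint is the average of the closing bid and ask.
    df["quote_midpoint"] = (df["bid"] + df["ask"]) / 2

    # Screen 1: the bid must be at least half of the quote midpoint.
    bid_ok = df["bid"] / df["quote_midpoint"] >= 0.5

    # Screen 2: midpoint return minus closing-price return must lie
    # in [-50%, +100%]; `between` is inclusive on both ends by default.
    gap_ok = (df["ret_mid"] - df["ret"]).between(-0.5, 1.0)

    df["valid_quote"] = bid_ok & gap_ok
    return df


# Toy data, made up for illustration only.
toy = pd.DataFrame(
    {
        "bid": [9.9, 1.0, 10.0],
        "ask": [10.1, 9.0, 10.2],
        "ret": [0.012, 0.0, 0.05],
        "ret_mid": [0.01, 0.0, 1.6],
    }
)

screened = screen_quotes(toy)
print(screened[["bid", "ask", "quote_midpoint", "valid_quote"]])
```

The first row passes both screens. The second fails the bid-to-midpoint screen (1.0 / 5.0 = 0.2), and the third fails the return-gap screen (1.6 - 0.05 = 1.55, above 100%).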


### Load cleaned data

In [24]:
dfcp = clean_CRSP_stock.load_CRSP_closing_price(data_dir=DATA_DIR)
dfcp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12096467 entries, 0 to 12096466
Data columns (total 13 columns):
 #   Column   Dtype         
---  ------   -----         
 0   index    int64         
 1   date     datetime64[ns]
 2   permno   int64         
 3   permco   int64         
 4   exchcd   int64         
 5   prc      float64       
 6   bid      float64       
 7   ask      float64       
 8   shrout   float64       
 9   cfacpr   float64       
 10  cfacshr  float64       
 11  ret      float64       
 12  retx     float64       
dtypes: datetime64[ns](1), float64(8), int64(4)
memory usage: 1.2 GB


In [23]:
dfmid = clean_CRSP_stock.load_CRSP_midpoint(data_dir=DATA_DIR)
dfmid.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7014287 entries, 0 to 7014286
Data columns (total 14 columns):
 #   Column          Dtype         
---  ------          -----         
 0   index           int64         
 1   date            datetime64[ns]
 2   permno          int64         
 3   permco          int64         
 4   exchcd          int64         
 5   prc             float64       
 6   bid             float64       
 7   ask             float64       
 8   shrout          float64       
 9   cfacpr          float64       
 10  cfacshr         float64       
 11  ret             float64       
 12  retx            float64       
 13  quote_midpoint  float64       
dtypes: datetime64[ns](1), float64(9), int64(4)
memory usage: 749.2 MB


## Reversal Strategy 

## Table Replication

We replicate tables with data from January 1998 to December 2010.

## Table Reproduction

Here, we reproduce tables with updated data.

## Analysis outside of replication