# Get Stock List
This file generates a list of stocks, identified by their unique CRSP identifier (PERMNO), that have complete CRSP data from January 1995 to December 2024. The list is then used to query the WRDS databases for all of the other datasets used. The input is the CRSP monthly stock data, which includes all stocks.

In [1]:
#Import packages
suppressPackageStartupMessages({
    if(!require(tidyverse)){install.packages("tidyverse")}
    
    library(tidyverse)
})

In [2]:
#Import data
stock_data <- read_csv("Data/Downloaded/crsp_all.csv", show_col_types = FALSE)

head(stock_data)

PERMNO,date,TICKER,PERMCO,PRC,RET,SHROUT
<dbl>,<date>,<chr>,<dbl>,<dbl>,<chr>,<dbl>
10001,1995-01-31,EWST,7953,-7.75,-0.03125,2224
10001,1995-02-28,EWST,7953,7.54688,-0.02621,2224
10001,1995-03-31,EWST,7953,7.5,0.006377,2244
10001,1995-04-28,EWST,7953,7.5,0.0,2244
10001,1995-05-31,EWST,7953,-7.875,0.05,2244
10001,1995-06-30,EWST,7953,8.25,0.060317,2254
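
The `RET` column above is read as character rather than numeric, which happens when at least one entry is not a number. As a minimal sketch of how such entries can be located, the toy vector below (hypothetical values, not taken from the CRSP file beyond the first rows shown) is converted with base R's `as.numeric()`, and the entries that fail to parse are listed. The variable names are illustrative only.

```r
# Toy RET column as it might arrive from a CSV: numeric strings plus a letter code.
ret_raw <- c("-0.03125", "0.006377", "C", "0.05")

# as.numeric() returns NA (with a warning) for anything that is not a number,
# so the warning is suppressed deliberately here.
ret_num <- suppressWarnings(as.numeric(ret_raw))

# Entries that were present in the raw data but could not be parsed as numbers.
non_numeric <- is.na(ret_num) & !is.na(ret_raw)

print(ret_num)
print(unique(ret_raw[non_numeric]))
```

Listing the unparsed values this way shows which codes occur in the data before deciding how to filter them.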


Now, we select stocks that have a full set of data over the period.

In [3]:
#Select stocks
no_na_data <- drop_na(stock_data) %>% # Remove rows with an NA in any column
    filter(RET != "C")                # Drop rows where RET holds the non-numeric code "C"

stock_days <- no_na_data %>%          # Number of monthly observations per stock
    group_by(PERMNO) %>% summarize(nb = n())
max_days <- max(stock_days$nb)        # Largest observation count, used as the full-history benchmark

# Stocks with an observation in every period. PERMNO and nb come from the same
# table, so the selection does not rely on a separately sorted ID vector.
full_stocks <- stock_days$PERMNO[stock_days$nb == max_days]

full_data <- filter(stock_data, PERMNO %in% full_stocks) # All rows for the selected stocks

length(unique(stock_data$PERMNO))
length(unique(full_data$PERMNO))

We have 1159 stocks out of 27868 that have full data over the entire period. Selecting only these stocks introduces survivorship bias into our dataset, but it makes our analysis far easier and greatly speeds up computations, which is important since some factors require daily data.
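
The selection rule can be illustrated on a small toy panel. The sketch below uses base R only and made-up data (three hypothetical PERMNOs over four months); it counts usable observations per stock and keeps those whose count equals the expected number of periods. Comparing against an explicit expected count is an assumption of this sketch: the cell above uses the maximum observed count as the benchmark, which gives the same result only if at least one stock has a complete history. For the real sample, January 1995 through December 2024 spans 30 × 12 = 360 months.

```r
# Hypothetical toy panel: three stocks observed over (at most) four months.
months <- as.Date(c("1995-01-31", "1995-02-28", "1995-03-31", "1995-04-28"))
toy <- data.frame(
  PERMNO = c(rep(10001, 4), rep(10002, 3), rep(10003, 4)),
  date   = c(months, months[c(1, 2, 4)], months),
  RET    = c(0.01, -0.02, 0.03, 0.00,   # 10001: complete history
             0.02, 0.01, -0.01,         # 10002: one month absent
             0.05, NA, 0.02, 0.01)      # 10003: one missing return
)

# Expected number of monthly observations for a complete history.
expected_months <- length(months)

# Count usable (non-missing) observations per stock.
usable <- toy[!is.na(toy$RET), ]
counts <- table(usable$PERMNO)

# Keep the identifiers whose count matches the expected number of months.
complete_ids <- as.numeric(names(counts)[as.vector(counts) == expected_months])
print(complete_ids)
```

Only the first stock survives: the second has no row for one month, and the third has a row with a missing return.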

Now, we can output our list of stock IDs to "stock_ids.csv", which is used to pull data from CRSP and other databases.

In [4]:
#Output stock ID list
write.table(as.numeric(full_stocks), "Data/Generated/stock_ids.csv", row.names = FALSE, col.names = FALSE, sep = ",")
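
As a quick check of the output format, the sketch below writes a few hypothetical identifiers to a temporary file with the same `write.table()` options and reads them back with `scan()`. The identifiers and the temporary path are placeholders, not the real `full_stocks` or `Data/Generated/stock_ids.csv`.

```r
# Hypothetical identifiers standing in for full_stocks.
ids <- c(10001, 10002, 10003)

# Write to a temporary file using the same options as the cell above.
path <- tempfile(fileext = ".csv")
write.table(ids, path, row.names = FALSE, col.names = FALSE, sep = ",")

# The file holds one identifier per line, with no header and no quotes.
print(readLines(path))

# Read the identifiers back as numbers and confirm the round trip.
ids_back <- scan(path, what = numeric(), quiet = TRUE)
stopifnot(identical(ids_back, ids))
```

A headerless file with one ID per line is simple to paste or upload as an identifier list, which is why no column name is written.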