# WRDS database

1. [Package](#Package)
2. [Connection](#Connection)
3. [Database](#Database)
4. [Tables](#Tables)
5. [Stock Data](#Stock_Data)


## Package
1. Required package from [WRDS](https://wrds-www.wharton.upenn.edu/)

In [1]:
import wrds

## Connection

2. Make a connection to the WRDS database via the API

In the connection function, ``wrds_username`` is the login username for the WRDS website. The password is prompted for on the first execution. After running `conn.create_pgpass_file()`, the password no longer needs to be entered on later connections.

In [2]:
conn = wrds.Connection(wrds_username='zhiyucao')

Loading library list...
Done
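The connection steps described above can be wrapped in a small helper. This is a minimal sketch, not part of the original notebook: the function names `pgpass_path` and `connect_wrds` are hypothetical, and the file locations are the standard PostgreSQL password-file defaults, which is where the password is assumed to be stored.

```python
import os
from pathlib import Path


def pgpass_path():
    """Default location where PostgreSQL clients look for the password file."""
    if os.name == "nt":
        # Windows default: %APPDATA%\postgresql\pgpass.conf
        return Path(os.environ.get("APPDATA", "")) / "postgresql" / "pgpass.conf"
    # Unix default: ~/.pgpass
    return Path.home() / ".pgpass"


def connect_wrds(username):
    """Open a WRDS connection and store the password if no password file exists yet."""
    import wrds  # imported here so pgpass_path() can be used without the package

    conn = wrds.Connection(wrds_username=username)
    if not pgpass_path().exists():
        conn.create_pgpass_file()
    return conn
```

With this helper, `conn = connect_wrds('your_username')` prompts for the password only when no password file is present.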


## Database

3. Check all available [libraries](https://wrds-www.wharton.upenn.edu/pages/about/data-vendors/) in WRDS

In this tutorial we consider [CRSP (Center for Research in Security Prices)](https://wrds-www.wharton.upenn.edu/pages/get-data/center-research-security-prices-crsp/), whose libraries include:

|Product	|Description|
| ------ | -------|
|crsp_a_ccm	|CRSP/Compustat Merged (Annual)|
|crsp_a_indexes	|CRSP Indexes (Annual)|
|crsp_a_stock	|CRSP Stock (Annual)|
|crsp_a_stock10	|CRSP10 is a variation of the CRSP Stock database and holds 10 years of monthly history.|
|crsp_a_stock62	|CRSP Stock 1962 (Annual)|
|crsp_a_treasuries	|CRSP Treasuries (Annual)|
|crsp_a_ziman	|CRSP Ziman Real Estates (Annual)|

In [9]:
library_list = conn.list_libraries()
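`conn.list_libraries()` returns the library names as a list of strings, so the CRSP entries can be picked out with a simple filter. The sketch below uses a short stand-in list (CRSP names copied from the table above, plus two other names added purely for illustration) instead of a live connection; `crsp_libraries` is a hypothetical helper name.

```python
def crsp_libraries(library_list):
    """Return the sorted library names that start with 'crsp'."""
    return sorted(lib for lib in library_list if lib.startswith("crsp"))


# Stand-in for the result of conn.list_libraries(); "comp" and "ff" are illustrative extras
sample_library_list = [
    "comp",
    "crsp_a_stock",
    "crsp_a_ccm",
    "ff",
    "crsp_a_indexes",
    "crsp_a_treasuries",
]
print(crsp_libraries(sample_library_list))
```

With a live connection the same helper would be called as `crsp_libraries(conn.list_libraries())`.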

## Tables

4. Check available tables within CRSP
   
A sample of the available tables and their descriptions is listed below. More details can be found on the [CRSP website](https://wrds-www.wharton.upenn.edu/pages/about/data-vendors/center-for-research-in-security-prices-crsp/).


|Table	|Description|
| ----- | -----|
|dse	|Daily Stock - Events	|
|dseall	|None	|
|dsedelist	|CRSP Daily Stock Event - Delisting	|
|dsedist	|CRSP Daily Stock Event - Distribution	|
|dseexchdates	|None	|
|dsenames	|CRSP Daily Stock Event - Name History	|
|dsenasdin	|CRSP Daily Stock Event - NASDAQ Information	|
|dseshares	|CRSP Daily Stock Event - Shares Outstanding	|
|dsf	|Daily Stock - Securities	|
|dsfhdr	|Daily Stock - Header Information with Date Ranges	|
|dsi	|Stock - Market Indexes Daily NYSE/AMEX/NASDAQ/ARCA|



In [3]:
data = conn.get_table(library='crsp', table='dsenames', obs=10)
data

Unnamed: 0,permno,namedt,nameendt,shrcd,exchcd,siccd,ncusip,ticker,comnam,shrcls,...,naics,primexch,trdstat,secstat,permco,compno,issuno,hexcd,hsiccd,cusip
0,10000.0,1986-01-07,1986-12-03,10.0,3.0,3990.0,68391610,OMFGA,OPTIMUM MANUFACTURING INC,A,...,,Q,A,R,7952.0,60007905.0,10396.0,3.0,3990.0,68391610
1,10000.0,1986-12-04,1987-03-09,10.0,3.0,3990.0,68391610,OMFGA,OPTIMUM MANUFACTURING INC,A,...,,Q,A,R,7952.0,60007905.0,10396.0,3.0,3990.0,68391610
2,10000.0,1987-03-10,1987-06-11,10.0,3.0,3990.0,68391610,OMFGA,OPTIMUM MANUFACTURING INC,A,...,,Q,A,R,7952.0,60007905.0,10396.0,3.0,3990.0,68391610
3,10001.0,1986-01-09,1993-11-21,11.0,3.0,4920.0,39040610,GFGC,GREAT FALLS GAS CO,,...,,Q,A,R,7953.0,60007906.0,10398.0,2.0,4925.0,36720410
4,10001.0,1993-11-22,2004-06-09,11.0,3.0,4920.0,29274A10,EWST,ENERGY WEST INC,,...,,Q,A,R,7953.0,60007906.0,10398.0,2.0,4925.0,36720410
5,10001.0,2004-06-10,2004-10-18,11.0,3.0,4920.0,29274A10,EWST,ENERGY WEST INC,,...,221210.0,Q,A,R,7953.0,60007906.0,10398.0,2.0,4925.0,36720410
6,10001.0,2004-10-19,2004-12-26,11.0,3.0,4920.0,29274A10,EWST,ENERGY WEST INC,,...,221210.0,Q,A,R,7953.0,60007906.0,10398.0,2.0,4925.0,36720410
7,10001.0,2004-12-27,2008-02-04,11.0,3.0,4920.0,29274A10,EWST,ENERGY WEST INC,,...,221210.0,Q,A,R,7953.0,60007906.0,10398.0,2.0,4925.0,36720410
8,10001.0,2008-02-05,2008-03-04,11.0,3.0,4920.0,29274A20,EWST,ENERGY WEST INC,,...,221210.0,Q,A,R,7953.0,60007906.0,10398.0,2.0,4925.0,36720410
9,10001.0,2008-03-05,2009-08-03,11.0,3.0,4920.0,29274A20,EWST,ENERGY WEST INC,,...,221210.0,Q,A,R,7953.0,60007906.0,10398.0,2.0,4925.0,36720410


In [49]:
data.columns

Index(['permno', 'namedt', 'nameendt', 'shrcd', 'exchcd', 'siccd', 'ncusip',
       'ticker', 'comnam', 'shrcls', 'tsymbol', 'naics', 'primexch', 'trdstat',
       'secstat', 'permco', 'compno', 'issuno', 'hexcd', 'hsiccd', 'cusip'],
      dtype='object')
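Since `dsenames` has 21 columns, it can be convenient to fetch only a few of them. The sketch below assumes that `get_table` accepts a `columns` argument alongside the `obs` argument used above; `preview_columns` is a hypothetical wrapper, and the column names come from the listing above.

```python
def preview_columns(conn, library, table, columns, n_rows=5):
    """Fetch only the selected columns for the first n_rows rows of a WRDS table."""
    return conn.get_table(library=library, table=table, columns=columns, obs=n_rows)


# Usage with an open connection (not executed here):
# names = preview_columns(conn, 'crsp', 'dsenames',
#                         ['permno', 'ticker', 'comnam', 'namedt', 'nameendt'])
```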

In [13]:
table_list = conn.list_tables(library='crsp_a_stock')

## Stock Data

5. Daily stock data

More details are available in the [CRSP data dictionary](https://wrds-www.wharton.upenn.edu/data-dictionary/crsp_a_stock/dsf/).

|Variable Name	|Type|	Length|	Description|
| ---- | ----- | ---- | ------ |
|ask	|double	|53|	Ask|
|askhi	|double|	53|	Ask or High Price|
|bid	|double	|53|	Bid|
|bidlo	|double|	53|	Bid or Low Price|
|cfacpr	|double|	53|	Cumulative Factor to Adjust Prices|
|cfacshr	|double|	53|	Cumulative Factor to Adjust Shares/Vol|
|cusip	|string|	8|	CUSIP Header|
|date	|date|	|	Date of Observation|
|hexcd	|double|	53|	Exchange Code Header|
|hsiccd	|double|	53|	Standard Industrial Classification Code|
|issuno	|double|	53|	Nasdaq Issue Number|
|numtrd	|double|	53|	Number of Trades|
|openprc	|double|	53|	Price Alternate|
|permco	|double|	53|	PERMCO|
|permno	|double|	53|	PERMNO|
|prc	|double|	53|	Price or Bid/Ask Average|
|ret	|double|	53|	Returns|
|retx	|double|	53|	Returns without Dividends|
|shrout	|double|	53|	Shares Outstanding|
|vol	|double|	53|	Volume|

In [28]:
data = conn.get_table(library='crsp', table='dse', obs=10)
data

Unnamed: 0,event,date,hsicmg,hsicig,comnam,cusip,dclrdt,dlamt,dlpdt,dlstcd,...,dlretx,dlprc,dlret,shrout,shrenddt,trtscd,trtsendt,nmsind,mmcnt,nsdinx
0,NASDIN,1986-01-06,39.0,399.0,,68391610,,,,,...,,,,,,4.0,1986-01-06,1.0,0.0,1.0
1,NAMES,1986-01-07,39.0,399.0,OPTIMUM MANUFACTURING INC,68391610,,,,,...,,,,,,,,,,
2,NASDIN,1986-01-07,39.0,399.0,,68391610,,,,,...,,,,,,1.0,1986-01-09,1.0,9.0,2.0
3,SHARES,1986-01-07,39.0,399.0,,68391610,,,,,...,,,,3680.0,1986-01-30,,,,,
4,NASDIN,1986-01-10,39.0,399.0,,68391610,,,,,...,,,,,,1.0,1986-01-22,1.0,10.0,2.0
5,NASDIN,1986-01-23,39.0,399.0,,68391610,,,,,...,,,,,,1.0,1986-01-23,1.0,11.0,2.0
6,NASDIN,1986-01-24,39.0,399.0,,68391610,,,,,...,,,,,,1.0,1986-01-26,1.0,10.0,2.0
7,NASDIN,1986-01-27,39.0,399.0,,68391610,,,,,...,,,,,,1.0,1986-01-30,1.0,11.0,2.0
8,NASDIN,1986-01-31,39.0,399.0,,68391610,,,,,...,,,,,,1.0,1986-03-02,1.0,13.0,2.0
9,SHARES,1986-01-31,39.0,399.0,,68391610,,,,,...,,,,3680.0,1986-04-29,,,,,


In [4]:
query = "\
SELECT * \
FROM crsp.dsenames \
WHERE ticker = 'AMZN'\
"

# Get data from CRSP
ticker_history = conn.raw_sql(query)
ticker_history

Unnamed: 0,permno,namedt,nameendt,shrcd,exchcd,siccd,ncusip,ticker,comnam,shrcls,...,naics,primexch,trdstat,secstat,permco,compno,issuno,hexcd,hsiccd,cusip
0,84788.0,1997-05-15,2004-06-09,11.0,3.0,7370.0,2313510,AMZN,AMAZON COM INC,,...,,Q,A,R,15473.0,60015310.0,20733.0,3.0,7370.0,2313510
1,84788.0,2004-06-10,2014-01-14,11.0,3.0,7370.0,2313510,AMZN,AMAZON COM INC,,...,454110.0,Q,A,R,15473.0,60015310.0,20733.0,3.0,7370.0,2313510
2,84788.0,2014-01-15,2020-03-18,11.0,3.0,7370.0,2313510,AMZN,AMAZON COM INC,,...,454113.0,Q,A,R,15473.0,60015310.0,20733.0,3.0,7370.0,2313510
3,84788.0,2020-03-19,2020-12-31,11.0,3.0,7370.0,2313510,AMZN,AMAZON COM INC,,...,454110.0,Q,A,R,15473.0,60015310.0,20733.0,3.0,7370.0,2313510
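The ticker can also be passed as a query parameter instead of being written into the SQL string. This is a sketch under the assumption that the installed `wrds` version accepts a `params` dictionary with `%(name)s` placeholders in `raw_sql`, as in the package's documented examples; `ticker_history_query` is a hypothetical helper.

```python
def ticker_history_query(ticker):
    """Return (sql, params) for a name-history lookup by ticker."""
    sql = (
        "SELECT permno, ticker, comnam, namedt, nameendt "
        "FROM crsp.dsenames "
        "WHERE ticker = %(ticker)s "
        "ORDER BY namedt"
    )
    return sql, {"ticker": ticker}


sql, params = ticker_history_query("AMZN")
# With an open connection (not executed here):
# ticker_history = conn.raw_sql(sql, params=params)
```

Keeping the value out of the SQL text avoids quoting mistakes and makes the same query reusable for other tickers.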


In [6]:
# PERMNO 14593 is Apple (CUSIP 03783310 in the output below)
query = "\
SELECT * \
FROM crsp.dsf \
WHERE permno = 14593 \
AND date BETWEEN '1996-01-01' AND '2020-01-01'"

# Get data from CRSP
crsp = conn.raw_sql(query)

In [7]:
crsp

Unnamed: 0,cusip,permno,permco,issuno,hexcd,hsiccd,date,bidlo,askhi,prc,vol,ret,bid,ask,shrout,cfacpr,cfacshr,openprc,numtrd,retx
0,03783310,14593.0,7.0,8.0,3.0,3571.0,1996-01-02,31.750000,32.250000,32.125000,1249047.0,0.007843,32.000000,32.125000,123118.0,112.0,112.0,32.250000,1423.0,0.007843
1,03783310,14593.0,7.0,8.0,3.0,3571.0,1996-01-03,31.875000,32.875000,32.125000,3848106.0,0.000000,32.125000,32.250000,123118.0,112.0,112.0,32.000000,2281.0,0.000000
2,03783310,14593.0,7.0,8.0,3.0,3571.0,1996-01-04,31.375000,32.375000,31.562500,2686841.0,-0.017510,31.500000,31.625000,123118.0,112.0,112.0,32.375000,2208.0,-0.017510
3,03783310,14593.0,7.0,8.0,3.0,3571.0,1996-01-05,31.375000,34.250000,34.250000,3989514.0,0.085149,34.125000,34.250000,123118.0,112.0,112.0,31.625000,3408.0,0.085149
4,03783310,14593.0,7.0,8.0,3.0,3571.0,1996-01-08,34.000000,35.500000,34.625000,1086787.0,0.010949,34.375000,34.625000,123118.0,112.0,112.0,34.500000,1162.0,0.010949
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6037,03783310,14593.0,7.0,8.0,3.0,3571.0,2019-12-24,282.919708,284.890015,284.269989,12105327.0,0.000951,284.269989,284.279999,4443265.0,4.0,4.0,284.690002,84112.0,0.000951
6038,03783310,14593.0,7.0,8.0,3.0,3571.0,2019-12-26,284.700012,289.980011,289.910004,23320594.0,0.019840,289.899994,289.910004,4443265.0,4.0,4.0,284.820007,171905.0,0.019840
6039,03783310,14593.0,7.0,8.0,3.0,3571.0,2019-12-27,288.119995,293.970001,289.799988,36547580.0,-0.000379,289.820007,289.829987,4384959.0,4.0,4.0,291.119995,265444.0,-0.000379
6040,03783310,14593.0,7.0,8.0,3.0,3571.0,2019-12-30,285.220001,292.690002,291.519989,36075285.0,0.005935,291.640015,291.709991,4384959.0,4.0,4.0,289.459991,266347.0,0.005935
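The variable table lists `cfacpr` as the cumulative factor to adjust prices. A minimal sketch of how it is commonly applied: divide the price by the factor to obtain a split-adjusted series. It relies on the CRSP convention that a negative `prc` marks a bid/ask average, so the absolute value is taken first. The two rows are copied from the output above; `sample_dsf` and `adj_prc` are names introduced here for illustration.

```python
import pandas as pd

# Two rows copied from the crsp.dsf output above (PERMNO 14593)
sample_dsf = pd.DataFrame(
    {
        "date": pd.to_datetime(["1996-01-02", "2019-12-24"]),
        "prc": [32.125, 284.269989],
        "cfacpr": [112.0, 4.0],
    }
)

# abs() removes the sign CRSP uses to flag bid/ask averages;
# dividing by cfacpr puts all prices on a common split-adjusted basis
sample_dsf["adj_prc"] = sample_dsf["prc"].abs() / sample_dsf["cfacpr"]
print(sample_dsf)
```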


## Example

Download daily stock data for the Dow 30 constituents as an example of using the WRDS API.

1. Full list of Dow 30
2. Ticker to PERMNO map
3. Panel data
4. Long table to wide table
5. Merge with existing return data

In [3]:
import pandas as pd
import yahoo_fin.stock_info as si

### 1. Full list of Dow 30

Download the Dow 30 ticker names from the Wikipedia page using the package [`yahoo_fin`](https://github.com/atreadw1492/yahoo_fin). Note that this is a different package from the better-known [`yfinance`](https://github.com/ranaroussi/yfinance). The documentation of `yahoo_fin` can be found [here](http://theautomatic.net/yahoo_fin-documentation/#tickers_dow).

In [54]:
dow_tickers_df = si.tickers_dow(True)
dow_tickers_list = si.tickers_dow()
dow_tickers_df

Unnamed: 0,Company,Exchange,Symbol,Industry,Date added,Notes,Index weighting
0,3M,NYSE,MMM,Conglomerate,1976-08-09,As Minnesota Mining and Manufacturing,3.38%
1,American Express,NYSE,AXP,Financial services,1982-08-30,,3.29%
2,Amgen,NASDAQ,AMGN,Biopharmaceutical,2020-08-31,,3.84%
3,Apple,NASDAQ,AAPL,Information technology,2015-03-19,,2.76%
4,Boeing,NYSE,BA,Aerospace and defense,1987-03-12,,4.01%
5,Caterpillar,NYSE,CAT,Construction and Mining,1991-05-06,,3.73%
6,Chevron,NYSE,CVX,Petroleum industry,2008-02-19,Also 1930-07-18 to 1999-11-01,2.07%
7,Cisco Systems,NASDAQ,CSCO,Information technology,2009-06-08,,1.03%
8,Coca-Cola,NYSE,KO,Soft Drink,1987-03-12,Also 1932-05-26 to 1935-11-20,1.01%
9,Disney,NYSE,DIS,Broadcasting and entertainment,1991-05-06,,3.18%


### 2. Ticker to PERMNO map

In [5]:
dow_tickers_tuple_str = str(tuple(dow_tickers_list))

query = "\
SELECT * \
FROM crsp.dsenames \
WHERE ticker in {} \
AND nameendt = '2020-12-31' \
".format(dow_tickers_tuple_str)

ticker_history = conn.raw_sql(query)
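Formatting the list with `str(tuple(...))` works for the 30 Dow tickers, but a one-element tuple is printed with a trailing comma, which is not valid SQL. The hypothetical helper below is one way to build the `IN` list explicitly; it treats numbers as integer identifiers such as PERMNOs.

```python
def sql_in_list(values):
    """Format a non-empty sequence of strings or integer-like numbers as a SQL IN list."""
    values = list(values)
    if not values:
        raise ValueError("IN list must not be empty")
    parts = []
    for v in values:
        if isinstance(v, str):
            # Quote strings and escape embedded single quotes
            parts.append("'" + v.replace("'", "''") + "'")
        else:
            # Numbers are treated as integer identifiers (e.g. PERMNOs)
            parts.append(str(int(v)))
    return "(" + ", ".join(parts) + ")"


print(str(tuple(["AAPL"])))             # ('AAPL',)  trailing comma, invalid SQL
print(sql_in_list(["AAPL"]))            # ('AAPL')
print(sql_in_list([14593.0, 84788.0]))  # (14593, 84788)
```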

Convert ticker names to PERMNO codes.

In [57]:
ticker_permno_df = ticker_history[['permno', 'ticker']]
ticker_to_permno_map = ticker_permno_df.set_index('ticker')
permno_to_ticker_map = ticker_permno_df.set_index('permno')

In [70]:
# PERMNOs on the not-complete list
not_complete_ticker_list = [18428, 76076, 86868, 90215, 92611]
permno_to_ticker_map.loc[not_complete_ticker_list]

Unnamed: 0_level_0,ticker
permno,Unnamed: 1_level_1
18428,DOW
76076,CSCO
86868,GS
90215,CRM
92611,V


### 3. Panel data of Dow 30

In [38]:
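# Look up the PERMNO of each Dow ticker and format the list as a SQL tuple
dow_permno_list = ticker_to_permno_map.loc[dow_tickers_list, 'permno'].astype(int).tolist()
dow_permno_tuple_str = str(tuple(dow_permno_list))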

query = "\
SELECT * \
FROM crsp.dsf \
WHERE permno in {} \
AND date BETWEEN '1990-01-01' AND '2020-01-01'\
".format(dow_permno_tuple_str)

dow_df = conn.raw_sql(query)

### 4. Dow 30 return DataFrame

In [39]:
dow_df = dow_df.set_index(['permno', 'date'])
dow_ret_df = dow_df['ret']
dow_ret_df.head()

permno   date      
10107.0  1990-01-02    0.020115
         1990-01-03    0.005634
         1990-01-04    0.029412
         1990-01-05   -0.024490
         1990-01-08    0.015342
Name: ret, dtype: float64

In [75]:
dow_ret_df = dow_ret_df.unstack().T

#not_complete_ticker_list = [18428, 76076, 86868, 90215, 92611]
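The long-to-wide step can be checked on a small example. In the sketch below, the first two returns for PERMNO 10107 are copied from the output above, while PERMNO 99999 and its returns are made up for illustration. It also shows that `unstack(level="permno")` gives the same wide table as `unstack().T` in one step.

```python
import pandas as pd

long_ret = pd.DataFrame(
    {
        "permno": [10107, 10107, 99999, 99999],  # 99999 is a made-up PERMNO
        "date": pd.to_datetime(
            ["1990-01-02", "1990-01-03", "1990-01-02", "1990-01-03"]
        ),
        "ret": [0.020115, 0.005634, 0.01, -0.02],
    }
).set_index(["permno", "date"])["ret"]

# unstack() moves the last index level (date) to columns; .T flips to dates x PERMNOs
wide_a = long_ret.unstack().T
# Unstacking the permno level directly yields dates as rows and PERMNOs as columns
wide_b = long_ret.unstack(level="permno")

print(wide_b)
```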

### 5. Merge Dow returns with existing return data

In [113]:
ret_data = pd.read_csv('/Users/cheng/Google Drive/PhD/Research/Portfolio Selection via TBN/data/Data/permno_ret.csv', parse_dates = True, index_col=0)
ret_data.columns = ret_data.columns.astype(int)

In [172]:
ticker_intersection_set = set(ret_data.columns).intersection(set(dow_ret_df.columns))
ticker_intersection_list = list(ticker_intersection_set)

dow_ret_df_nonlap = dow_ret_df.drop(ticker_intersection_list, axis=1)

data = pd.concat([ret_data, dow_ret_df_nonlap], axis=1)
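The merge logic above (drop the columns that already exist, then concatenate side by side) can be illustrated on two tiny frames. All return values below are made up; only the pattern matters.

```python
import pandas as pd

dates = pd.to_datetime(["2019-12-30", "2019-12-31"])
# Made-up returns keyed by PERMNO; 14593 appears in both frames
existing = pd.DataFrame({10107: [0.01, 0.02], 14593: [0.03, 0.04]}, index=dates)
new = pd.DataFrame({14593: [0.03, 0.04], 84788: [0.05, 0.06]}, index=dates)

# Columns present in both frames are kept from `existing` and dropped from `new`
overlap = existing.columns.intersection(new.columns)
combined = pd.concat([existing, new.drop(columns=overlap)], axis=1)

print(combined)
```

Dropping the overlap before `pd.concat` keeps the column labels unique, which a plain column-wise concatenation would not guarantee.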

In [174]:
data.to_csv('/Users/cheng/Google Drive/PhD/Research/Portfolio Selection via TBN/data/Data/stock_return_crsp_159.csv')