# WRDS Data Pull

This notebook shows how to pull the RepRisk, Markit Securities Finance Analytics, and CRSP tables from WRDS using the `wrds` Python package.

In [1]:
import wrds

import config

WRDS_USERNAME = config.WRDS_USERNAME

In [2]:
db = wrds.Connection(wrds_username=WRDS_USERNAME)

Loading library list...
Done


In [3]:
# RepRisk - Company Identifiers

RepRisk_company_identifiers = db.get_table(library='reprisk', table='v2_company_identifiers')
RepRisk_company_identifiers.head()

Unnamed: 0,reprisk_id,company_name,headquarters_country,headquarters_country_isocode,sectors,url,isins,primary_isin,no_reported_risk_exposure
0,10,Acer Inc,Taiwan,TW,Technology Hardware and Equipment,https://www.acer.com,US0044341065 | US0044342055 | TW0002353000,TW0002353000,False
1,100,Rio Tinto PLC,United Kingdom of Great Britain and Northern I...,GB,Mining,https://www.riotinto.com/,GB0007406639 | BRRIOTBDR007 | ARDEUT112638 | G...,GB0007188757,False
2,1000,Terrane Metals Corp,Canada,CA,Mining,terranemetals.com,CA88103A1167 | US88103A3068 | CA88103A1084 | C...,CA88103A1084,False
3,10000,RAK Properties PJSC,United Arab Emirates,AE,Financial Services,https://www.rakproperties.ae/,AER000601016,AER000601016,True
4,100000,BLUECOM Co Ltd,"Korea, the Republic of (South Korea)",KR,Technology Hardware and Equipment,http://www.bluec.co.kr/eng/,KR7033560004,KR7033560004,False


In [4]:
# RepRisk - Risk Incidents available since 2007

RepRisk_risk_incidents = db.get_table(library='reprisk', table='v2_risk_incidents')
RepRisk_risk_incidents.head()

Unnamed: 0,reprisk_id,story_id,incident_date,unsharp_incident,related_countries,related_countries_codes,severity,reach,novelty,environment,...,ungc_principle_1,ungc_principle_2,ungc_principle_3,ungc_principle_4,ungc_principle_5,ungc_principle_6,ungc_principle_7,ungc_principle_8,ungc_principle_9,ungc_principle_10
0,10,826,2007-02-28,0,China,CN,2,2,1,F,...,F,F,F,F,F,F,F,F,F,F
1,10,1793,2007-09-09,0,Germany,DE,1,2,1,F,...,F,F,F,F,F,F,F,F,F,F
2,10,2335,2007-11-26,0,Unspecified,,1,2,1,F,...,F,F,F,F,F,F,F,F,F,F
3,10,2365,2007-08-21,1,China,CN,2,1,2,F,...,T,T,F,F,T,F,F,F,F,F
4,10,2513,2007-11-30,0,Russian Federation;South Africa,RU;ZA,1,1,2,F,...,T,F,F,F,F,F,F,F,F,F


We keep the variables up to `governance`; the columns after it are not relevant to our project.
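A minimal sketch of that column selection, run on a toy frame rather than the real table. It assumes that `governance` sits after the columns we want to keep and that "up to" is meant inclusively; the toy values are made up, and only column names visible in the output above are used.

```python
import pandas as pd

# Toy frame that mimics the column order of v2_risk_incidents (values are made up).
incidents = pd.DataFrame({
    "reprisk_id": [10, 10],
    "story_id": [826, 1793],
    "severity": [2, 1],
    "environment": ["F", "F"],
    "governance": ["T", "F"],        # assumed to be the last column we keep
    "ungc_principle_1": ["F", "F"],  # stands in for the columns we drop
})

# Label-based slicing with .loc includes the end label.
incidents_kept = incidents.loc[:, :"governance"]
print(list(incidents_kept.columns))
```

Slicing by label rather than by position keeps the selection valid if WRDS adds columns after `governance`.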

In [5]:
# RepRisk - Metrics available since 2007

RepRisk_metrics = db.get_table(library='reprisk', table='v2_metrics', obs=10)
RepRisk_metrics.head()

Unnamed: 0,reprisk_id,date,current_rri,trend_rri,peak_rri,peak_rri_date,reprisk_rating,country_sector_average,principle1_human_rights,principle2_human_rights,principle3_labour,principle4_labour,principle5_labour,principle6_labour,principle7_environment,principle8_environment,principle9_environment,principle10_anti_corruption
0,10,2007-01-01,24,24,24,2007-01-01,AA,13,,,,,,,,,,
1,10,2007-01-02,24,24,24,2007-01-01,AA,13,,,,,,,,,,
2,10,2007-01-03,24,24,24,2007-01-01,AA,13,,,,,,,,,,
3,10,2007-01-04,24,24,24,2007-01-01,AA,13,,,,,,,,,,
4,10,2007-01-05,24,24,24,2007-01-01,AA,13,,,,,,,,,,


Current RRI: The Current RRI denotes the current level of media and stakeholder attention of a company related to ESG issues.

Trend RRI: Difference in the RepRisk Index (RRI) between the current date and the date 30 days earlier.

RepRisk Rating: The RepRisk Rating (RRR) facilitates corporate benchmarking against a peer group and the sector, as well as integration of ESG and business conduct risks into business processes. It combines the company-specific ESG risk exposure (provided by the Peak RRI) and the Country-Sector ESG risk exposure (provided by the Country-Sector Average value of a company; see below for details).

We can join the RepRisk tables on `reprisk_id` to get a full table with all the information.
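As an illustration, the join could look like the sketch below. It uses two small stand-in frames with made-up rows instead of the tables pulled above, and the choice of a left join with a many-to-one check is an assumption, not something the data documentation prescribes.

```python
import pandas as pd

# Stand-ins for the identifiers and incidents tables (a few made-up rows).
identifiers = pd.DataFrame({
    "reprisk_id": [10, 100],
    "company_name": ["Acer Inc", "Rio Tinto PLC"],
    "primary_isin": ["TW0002353000", "GB0007188757"],
})
incidents = pd.DataFrame({
    "reprisk_id": [10, 10],
    "story_id": [826, 1793],
    "incident_date": pd.to_datetime(["2007-02-28", "2007-09-09"]),
})

# Each incident row picks up the identifiers of its company.
# validate raises if reprisk_id is not unique in the identifiers table.
reprisk_full = incidents.merge(
    identifiers, on="reprisk_id", how="left", validate="many_to_one"
)
print(reprisk_full)
```

Both key columns need the same dtype for the merge to match rows. The metrics table is daily, so attaching it would probably also require a date key.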

In [6]:
# Markit Securities Finance Analytics - American Equities available since 2002

db.get_table(library='msfanly', table='msfaamer')

ProgrammingError: (psycopg2.errors.UndefinedTable) relation "msfanly.msfaamer" does not exist
LINE 1: SELECT * FROM msfanly.msfaamer  OFFSET 0;
                      ^

[SQL: SELECT * FROM msfanly.msfaamer  OFFSET 0;]
(Background on this error at: https://sqlalche.me/e/14/f405)

In [7]:
db.get_table(library='markit_msf_analytics_eqty_amer', table='msfaamer')

ProgrammingError: (psycopg2.errors.UndefinedTable) relation "markit_msf_analytics_eqty_amer.msfaamer" does not exist
LINE 1: SELECT * FROM markit_msf_analytics_eqty_amer.msfaamer  OFFSE...
                      ^

[SQL: SELECT * FROM markit_msf_analytics_eqty_amer.msfaamer  OFFSET 0;]
(Background on this error at: https://sqlalche.me/e/14/f405)

In [8]:
db.list_tables(library="msfanly")

['amereqty2002',
 'amereqty2003',
 'amereqty2004',
 'amereqty2005',
 'amereqty2006',
 'amereqty2007',
 'amereqty2008',
 'amereqty2009',
 'amereqty2010',
 'amereqty2011',
 'amereqty2012',
 'amereqty2013',
 'amereqty2014',
 'amereqty2015',
 'amereqty2016',
 'amereqty2017',
 'amereqty2018',
 'amereqty2019',
 'amereqty2020',
 'amereqty2021',
 'amereqty2022',
 'amereqty2023',
 'amereqty2024',
 'asiaexjapauseqty2002',
 'asiaexjapauseqty2003',
 'asiaexjapauseqty2004',
 'asiaexjapauseqty2005',
 'asiaexjapauseqty2006',
 'asiaexjapauseqty2007',
 'asiaexjapauseqty2008',
 'asiaexjapauseqty2009',
 'asiaexjapauseqty2010',
 'asiaexjapauseqty2011',
 'asiaexjapauseqty2012',
 'asiaexjapauseqty2013',
 'asiaexjapauseqty2014',
 'asiaexjapauseqty2015',
 'asiaexjapauseqty2016',
 'asiaexjapauseqty2017',
 'asiaexjapauseqty2018',
 'asiaexjapauseqty2019',
 'asiaexjapauseqty2020',
 'asiaexjapauseqty2021',
 'asiaexjapauseqty2022',
 'asiaexjapauseqty2023',
 'asiaexjapauseqty2024',
 'auseqty2002',
 'auseqty2003',
 '

In [9]:
db.list_tables(library="markit_msf_analytics_eqty_amer")

['amereqty2002',
 'amereqty2003',
 'amereqty2004',
 'amereqty2005',
 'amereqty2006',
 'amereqty2007',
 'amereqty2008',
 'amereqty2009',
 'amereqty2010',
 'amereqty2011',
 'amereqty2012',
 'amereqty2013',
 'amereqty2014',
 'amereqty2015',
 'amereqty2016',
 'amereqty2017',
 'amereqty2018',
 'amereqty2019',
 'amereqty2020',
 'amereqty2021',
 'amereqty2022',
 'amereqty2023',
 'amereqty2024',
 'msfa_amer_chars']

It appears that the full table for Markit Securities Finance Analytics - American Equities, known as `msfaamer`, is no longer available in either the `msfanly` library or the `markit_msf_analytics_eqty_amer` library. We can get the tables year by year instead.
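One way to pull the yearly tables and stack them is sketched below. The helper `pull_yearly_tables` and the `source_year` column are hypothetical names introduced here. The fetch function is passed in so the sketch runs offline with a stand-in; with a live connection it would wrap `db.get_table`.

```python
import pandas as pd

def pull_yearly_tables(fetch, years, prefix="amereqty"):
    """Call fetch(table_name) once per year and stack the results."""
    frames = []
    for year in years:
        frame = fetch(f"{prefix}{year}").copy()
        frame["source_year"] = year  # remember which yearly table a row came from
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)

# Offline stand-in for the WRDS call (returns one made-up row per table).
def fake_fetch(table_name):
    return pd.DataFrame({"table": [table_name], "quantityonloan": [1.0]})

stacked = pull_yearly_tables(fake_fetch, range(2022, 2025))
print(stacked)

# With the connection from above, the fetch function could be:
# fetch = lambda name: db.get_table(library="msfanly", table=name, obs=10)
```

The full yearly tables are likely large, so it may be safer to test with `obs` set and to restrict the columns before pulling every year.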

In [10]:
# Markit Securities Finance Analytics - American Equities 2024

MarkitSecurities_american_equities_2024 = db.get_table(library='msfanly', table='amereqty2024', obs=10)
MarkitSecurities_american_equities_2024.head()

Unnamed: 0,datadate,dxlid,isin,sedol,cusip,quick,instrumentname,marketarea,valueonloan,quantityonloan,...,lendablequantitystability,lendablevaluestability,lenderquantityonloanstability,lendervalueonloanstability,indicativefee1day,indicativefee7day,indicativerebate1day,indicativerebate7day,saf,sar
0,2024-01-01,DX00000021,US98956P1021,2783815,98956P102,,Zimmer Biomet Holdings Inc,US Equity (S&P500),227014800.0,1855436.0,...,74.12062,74.0883,52.82086,52.82014,,0.002162,,0.051038,56.3462,475.6539
1,2024-01-01,DX00000023,US7901481009,2768663,790148100,,St Joe Co,US Equity (RUSSELL 2000),13916060.0,229522.0,...,83.24494,83.28563,71.45999,71.46178,,0.002615,,0.050585,77.8825,454.1175
2,2024-01-01,DX00000026,US6934751057,2692665,693475105,,Pnc Financial Services Group Inc,US Equity (S&P500),978845200.0,6262167.0,...,77.79643,77.79671,78.8587,78.82349,,0.00281,,0.05039,52.4547,479.5453
3,2024-01-01,DX00000029,US6516391066,2636607,651639106,,Newmont Corporation,US Equity (S&P500),352309900.0,8448317.0,...,87.88121,87.88775,94.43805,94.39819,,0.002624,,0.050576,38.5648,493.4352
4,2024-01-01,DX00000030,US5951121038,2588184,595112103,,Micron Technology Inc,US Equity (S&P500),232887600.0,2693965.0,...,74.48444,74.49966,71.63672,71.67267,,0.002648,,0.050552,53.7643,478.2357


- Short interest ratio (shares on loan / shares outstanding) = ShortLoanQuantity/SHROUT or QuantityOnLoan/SHROUT
- Loan supply ratio (shares available to be lent / shares outstanding) = LendableQuantity/SHROUT
- Loan utilization ratio (shares demanded / shares supplied) = Utilisation
- Loan fee = IndicativeFee
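The first two ratios can be sketched as follows, on a toy frame where the Markit columns and CRSP `shrout` are already side by side. The lowercase name `lendablequantity` is an assumption (that column is hidden in the truncated output above), the values are made up, and the factor of 1,000 reflects CRSP reporting `shrout` in thousands of shares.

```python
import pandas as pd

# Toy rows with Markit quantities and CRSP shrout already merged (made-up values).
loans = pd.DataFrame({
    "quantityonloan": [1_000_000.0, 250_000.0],
    "lendablequantity": [5_000_000.0, 1_000_000.0],  # assumed column name
    "shrout": [200_000.0, 50_000.0],                 # thousands of shares
})

# Convert shrout to a share count so both sides of the ratio use the same unit.
shares_outstanding = loans["shrout"] * 1_000

loans["short_interest_ratio"] = loans["quantityonloan"] / shares_outstanding
loans["loan_supply_ratio"] = loans["lendablequantity"] / shares_outstanding
print(loans[["short_interest_ratio", "loan_supply_ratio"]])
```

The utilization ratio and the loan fee need no computation, since Markit provides them directly.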

A few other variables in Markit that would be interesting to explore are LenderConcentration, BorrowerConcentration, and InventoryConcentration.

We can join the RepRisk and Markit Securities Finance Analytics - American Equities tables on `primary_isin` or `isins`, but the `isins` column must be reformatted first because it holds several ISINs separated by ` | `. We also need to check what percentage of stocks in Markit and in RepRisk have a populated ISIN. If ISIN is not well populated, we may need to merge on names.
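A possible way to reformat the `isins` column and measure ISIN coverage is sketched below on a toy frame. The third row, with a missing ISIN, is made up to exercise the coverage check, and `isin_long` and `isin_coverage` are names introduced here.

```python
import pandas as pd

# Toy identifiers table; the row with reprisk_id 99999 is made up.
identifiers = pd.DataFrame({
    "reprisk_id": [10, 10000, 99999],
    "isins": [
        "US0044341065 | US0044342055 | TW0002353000",
        "AER000601016",
        None,
    ],
})

# Split the pipe-separated string into a list, then emit one row per ISIN.
isin_long = identifiers.assign(
    isin=identifiers["isins"].str.split("|")
).explode("isin", ignore_index=True)
isin_long["isin"] = isin_long["isin"].str.strip()
isin_long = isin_long.dropna(subset=["isin"])[["reprisk_id", "isin"]]

# Share of companies with at least one ISIN, in percent.
isin_coverage = identifiers["isins"].notna().mean() * 100
print(isin_long)
print(f"{isin_coverage:.1f}% of companies have an ISIN")
```

The long table can then be merged with the Markit `isin` column. The same `notna().mean()` check applies on the Markit side, although empty strings, if present, would need to be treated as missing first.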

The RepRisk tables can be joined on `reprisk_id`.

For shares outstanding, we need the `shrout` variable from CRSP Daily Stock, which then has to be joined to the Markit table. Documentation is available [here](https://wrds-www.wharton.upenn.edu/pages/wrds-research/database-linking-matrix/linking-markit-with-crsp-2/).
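The sketch below shows one simple linking idea, not necessarily the procedure in the linked documentation. In the outputs of this notebook, Markit CUSIPs have 9 characters while CRSP CUSIPs have 8, so it truncates the Markit CUSIP to 8 characters and matches on that plus the date. The rows, the `permno`, and the `shrout` value are made up, and `cusip8` is a name introduced here.

```python
import pandas as pd

# Toy Markit and CRSP rows (made-up values, one security, one date).
markit = pd.DataFrame({
    "datadate": pd.to_datetime(["2024-01-02"]),
    "cusip": ["98956P102"],          # 9 characters, includes the check digit
    "quantityonloan": [1_855_436.0],
})
crsp = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-02"]),
    "cusip": ["98956P10"],           # 8 characters
    "permno": [12345],               # made-up identifier
    "shrout": [205_000.0],           # made-up value, thousands of shares
})

# Drop the check digit so both tables share an 8-character key.
markit["cusip8"] = markit["cusip"].str[:8]

merged = markit.merge(
    crsp.rename(columns={"cusip": "cusip8", "date": "datadate"}),
    on=["cusip8", "datadate"],
    how="left",
)
print(merged[["datadate", "cusip8", "permno", "quantityonloan", "shrout"]])
```

A left join keeps Markit rows that have no CRSP match, which makes the unmatched share easy to measure. CUSIPs can change over time, so a match on the current CUSIP alone may miss some securities.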

In [20]:
# CRSP Daily Stock available since 1925

CRSP_daily_stock = db.get_table(library='crspq', table='dsf', columns=['cusip', 'date', 'permco', 'permno', 'shrout'], obs=10)
CRSP_daily_stock.head()

Unnamed: 0,cusip,date,permco,permno,shrout
0,68391610,1986-01-07,7952,10000,3680.0
1,68391610,1986-01-08,7952,10000,3680.0
2,68391610,1986-01-09,7952,10000,3680.0
3,68391610,1986-01-10,7952,10000,3680.0
4,68391610,1986-01-13,7952,10000,3680.0
