# <span style='color:#547DCD'> How to retrieve data from Global Factor Data  </span> 

The database is hosted on Wharton Research Data Services (WRDS) and was developed by Theis I. Jensen (Yale), Bryan Kelly (Yale, AQR Capital, and NBER), and Lasse H. Pedersen (Copenhagen Business School and AQR Capital).

This code shows how to connect to that database and query data from it.

## <span style='color:#7F8BC7'> Preamble  </span> 

### <span style='color:#AA9AC2'> Notebook setup  </span> 

In [10]:
import pandas as pd  # Imported here too, so this cell runs on its own

pd.set_option('display.max_columns', None)  # Show all columns
pd.set_option('display.width', 0)  # Use full cell width
pd.set_option('display.expand_frame_repr', False)  # Prevent line breaks

### <span style='color:#AA9AC2'> Package import  </span> 

First, the necessary libraries are imported. `dotenv` is not strictly necessary, but it keeps personal information such as usernames and passwords out of the notebook.

In [24]:
from dotenv import load_dotenv

import pandas as pd
import wrds
import os

### <span style='color:#AA9AC2'> Credentials import  </span> 

The username and password for Wharton Research Data Services are read from a `.env` file into environment variables, which keeps these credentials hidden.

In [25]:
load_dotenv()
wrds_usr = os.getenv("MY_WRDS_USERNAME")
wrds_pw = os.getenv("MY_WRDS_PASSWORD")
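
If an entry is missing from the `.env` file, `os.getenv` returns `None` without any warning, and the problem only shows up later at login. A minimal sketch of a fail-fast check follows; the helper name `require_env` is hypothetical and not part of `dotenv` or `wrds`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        # Fail early with a readable message instead of passing None to WRDS.
        raise RuntimeError(f"Environment variable {name!r} is not set; check the .env file.")
    return value
```

With this helper, the cell above could read `wrds_usr = require_env("MY_WRDS_USERNAME")`, assuming the `.env` file contains one `KEY=value` line per credential.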

## <span style='color:#7F8BC7'> Data import  </span> 

Below, a `SQL` query is loaded and data is queried from the `contrib.global_factor` database hosted on WRDS.

### <span style='color:#AA9AC2'> Initialize connection to WRDS  </span> 

The code below connects this session to the WRDS server, so that *all* data available on the website can be pulled into this session via a `SQL` query.

In [26]:
wrds_db = wrds.Connection(
    wrds_username=wrds_usr,
    wrds_password=wrds_pw,
)

Loading library list...
Done
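
Once connected, it can help to inspect the table before writing a full query. The sketch below assumes `db` is an open `wrds.Connection` (such as `wrds_db` above) and that the installed `wrds` version provides `describe_table` and `get_table` with the keyword arguments shown; the function name `preview_global_factor` is hypothetical.

```python
def preview_global_factor(db, n_rows=5):
    """Return the column description and the first `n_rows` rows of contrib.global_factor.

    `db` is assumed to be an open wrds.Connection.
    """
    # Column names and types of the table.
    schema = db.describe_table(library="contrib", table="global_factor")
    # Fetch only a few rows to avoid downloading the full table.
    head = db.get_table(library="contrib", table="global_factor", obs=n_rows)
    return schema, head
```

A call such as `schema, head = preview_global_factor(wrds_db)` would then give a quick look at the available columns.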


### <span style='color:#AA9AC2'> Available variables  </span> 

The code below loads tables of the countries and variables available in the database `contrib.global_factor`.

In [27]:
# Load Excel sheets from the official GitHub page (created by the Global Factor Data team)
countries = pd.read_excel('https://github.com/bkelly-lab/ReplicationCrisis/raw/master/GlobalFactors/Country%20Classification.xlsx')
variables = pd.read_excel('https://github.com/bkelly-lab/ReplicationCrisis/raw/master/GlobalFactors/Factor%20Details.xlsx')

The code below cleans both DataFrames: it keeps only countries whose `msci_development` value is developed or emerging, and only variables that have an abbreviation in `abr_jkp`.

In [28]:
countries = countries[countries['msci_development'].isin(('developed', 'emerging'))][['excntry', 'msci_development']]
variables = variables[variables['abr_jkp'].notna()][['abr_jkp', 'name_new']]
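
To make the two filters concrete, the sketch below applies the same `isin` and `notna` selections to small made-up tables. The rows (including the country code `XXX`, the label `frontier`, and the factor names) and the variable names `countries_demo` and `variables_demo` are illustrative assumptions, not values taken from the Excel sheets.

```python
import pandas as pd

# Made-up rows standing in for the country classification sheet.
countries_demo = pd.DataFrame({
    "excntry": ["USA", "BRA", "XXX"],
    "msci_development": ["developed", "emerging", "frontier"],
    "other_column": [1, 2, 3],
})

# Made-up rows standing in for the factor details sheet.
variables_demo = pd.DataFrame({
    "abr_jkp": ["be_me", None],
    "name_new": ["Book-to-market equity", "Factor without abbreviation"],
})

# Keep developed and emerging markets, and only the two columns of interest.
is_selected = countries_demo["msci_development"].isin(["developed", "emerging"])
countries_kept = countries_demo.loc[is_selected, ["excntry", "msci_development"]]

# Keep variables that have an abbreviation.
has_abbreviation = variables_demo["abr_jkp"].notna()
variables_kept = variables_demo.loc[has_abbreviation, ["abr_jkp", "name_new"]]
```

Here `countries_kept` retains the `USA` and `BRA` rows, and `variables_kept` retains only the `be_me` row.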

The variables we can select are shown below.

In [29]:
print(variables.T)
print("")
print(countries.T)


### <span style='color:#AA9AC2'> Querying data from WRDS  </span> 

This is an example query that pulls data from the `contrib.global_factor` database introduced above. The SQL used for this can be found in the `query.sql` file in this folder.

In [30]:
with open("query.sql", "r") as file: 
    sql_query = file.read()

data = wrds_db.raw_sql(sql_query)
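
The contents of `query.sql` are not shown here. As an alternative, a query could be assembled from the variable and country lists created earlier. The following is a minimal sketch under stated assumptions: the function `build_query` is hypothetical, the identifier columns `id` and `eom` are assumed to exist in `contrib.global_factor` alongside `excntry`, and the default start date is arbitrary. Names are validated before being placed in the SQL string because column names cannot be passed as bound parameters.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_COUNTRY = re.compile(r"[A-Z]{3}")
_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def build_query(characteristics, countries, start_date="2000-01-01"):
    """Build a SELECT statement for contrib.global_factor (illustrative sketch)."""
    if not characteristics or not countries:
        raise ValueError("At least one characteristic and one country are required.")
    for name in characteristics:
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"Invalid column name: {name!r}")
    for code in countries:
        if not _COUNTRY.fullmatch(code):
            raise ValueError(f"Invalid country code: {code!r}")
    if not _DATE.fullmatch(start_date):
        raise ValueError(f"Invalid date: {start_date!r}")

    # `id` and `eom` are assumed identifier columns; `excntry` appears in the country sheet.
    columns = ", ".join(["id", "eom", "excntry", *characteristics])
    country_list = ", ".join(f"'{code}'" for code in countries)
    return (
        f"SELECT {columns} "
        f"FROM contrib.global_factor "
        f"WHERE excntry IN ({country_list}) AND eom >= '{start_date}'"
    )
```

The result could then be passed to the connection, for example `wrds_db.raw_sql(build_query(variables['abr_jkp'].tolist(), countries['excntry'].tolist()))`. Selecting many variables for many countries can produce a very large download, so starting with a short list is advisable.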