<img alt="QuantRocket logo" src="https://www.quantrocket.com/assets/img/notebook-header-logo.png">

<a href="https://www.quantrocket.com/disclaimer/">Disclaimer</a>

# Universe Selection

The Alpha Architect white paper calls for the trading strategy to run on the universe of NYSE stocks, excluding financials, REITs, and ADRs. Thus our first step is to create universes that define these different groups of securities.

> Note the use of the `domain="sharadar"` parameter in the functions below. This parameter is required because we want to define our universes in the Sharadar securities master database (`quantrocket.master.sharadar.sqlite`), not in the default IB securities master database (`quantrocket.master.main.sqlite`).

## All NYSE securities

First, download a CSV of all NYSE securities from the Sharadar master: 

```python
from quantrocket.master import download_master_file
download_master_file("sharadar_nyse_securities.csv", exchanges="NYSE", domain="sharadar")
```

We can use the file to create the universe of all NYSE securities:

```python
from quantrocket.master import create_universe
create_universe("nyse-stk", "sharadar_nyse_securities.csv", domain="sharadar")
```

```
{'code': 'nyse-stk',
 'provided': 5067,
 'inserted': 5067,
 'total_after_insert': 5067}
```

## Financials

Next we create a universe of financials. We'll exclude this universe (along with REITs and ADRs) when it comes time to run our backtest. 

First, load the securities into pandas and list the distinct sectors:

```python
import pandas as pd
nyse_securities = pd.read_csv("sharadar_nyse_securities.csv")
nyse_securities.Sector.unique()
```

```
array(['Industrials', 'Consumer Cyclical', 'Utilities', 'Healthcare',
       'Technology', 'Real Estate', 'Financial Services',
       'Basic Materials', 'Energy', nan, 'Communication Services',
       'Consumer Defensive'], dtype=object)
```
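The `nan` in the array above means some securities have no sector assigned. One way to see how many rows fall into each sector, including the missing ones, is `value_counts(dropna=False)`. The sketch below uses a small hypothetical DataFrame in place of `sharadar_nyse_securities.csv`; the symbols and sectors are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Sharadar master file; values are illustrative only
securities = pd.DataFrame({
    "Symbol": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "Sector": ["Financial Services", "Energy", np.nan, "Financial Services", "Real Estate"],
})

# Count securities per sector, keeping missing sectors as their own bucket
sector_counts = securities.Sector.value_counts(dropna=False)
print(sector_counts)

# A missing sector never equals "Financial Services", so an equality filter
# leaves these rows out of the financials universe
missing_sector = securities[securities.Sector.isna()]
print(missing_sector.Symbol.tolist())  # ['CCC']
```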

In the Sharadar data, the financial sector is called "Financial Services". We filter the DataFrame to stocks in this sector, write them to a file (we use an in-memory file so as not to clutter the hard drive), and upload the file to create the universe of financial stocks:

```python
import io
f = io.StringIO()
nyse_securities[nyse_securities.Sector == "Financial Services"].to_csv(f)
create_universe("nyse-financials", f, domain="sharadar")
```

```
{'code': 'nyse-financials',
 'provided': 708,
 'inserted': 708,
 'total_after_insert': 708}
```
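The in-memory file is just an `io.StringIO` buffer holding CSV text. The sketch below, built on a hypothetical DataFrame, shows what ends up in the buffer and how to read it back. After `to_csv` the buffer's position sits at the end of the text, so a reader must rewind with `seek(0)` first; since the cell above hands `f` to `create_universe` without rewinding, presumably that function rewinds the buffer itself, which is an assumption this notebook does not confirm. Passing `index=False` (not used in the cells above) omits the DataFrame index, which `to_csv` otherwise writes as an unnamed first column.

```python
import io
import pandas as pd

# Hypothetical securities, for illustration only
securities = pd.DataFrame({
    "Symbol": ["AAA", "BBB", "CCC"],
    "Sector": ["Financial Services", "Energy", "Financial Services"],
})

f = io.StringIO()
# Write only the financials; index=False drops the unnamed index column
securities[securities.Sector == "Financial Services"].to_csv(f, index=False)

# After writing, the buffer position is at the end of the text
print(f.tell() == len(f.getvalue()))  # True

# Rewind before reading, as any consumer of the buffer must
f.seek(0)
roundtrip = pd.read_csv(f)
print(roundtrip.Symbol.tolist())  # ['AAA', 'CCC']
```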

## REITs

Next we create a universe of REITs. Inspecting the master file shows that REITs are identified in the "Industry" column, so we select securities whose industry name contains "REIT":

```python
f = io.StringIO()
nyse_securities[nyse_securities.Industry.fillna("").str.contains("REIT")].to_csv(f)
create_universe("nyse-reits", f, domain="sharadar")
```

```
{'code': 'nyse-reits',
 'provided': 487,
 'inserted': 487,
 'total_after_insert': 487}
```
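The `fillna("")` call in the REIT cell guards against securities with no industry. Depending on the pandas version and column dtype, `.str.contains` can return a missing value rather than `False` for a missing input, and a mask containing missing values cannot be used for boolean indexing. The sketch below, using hypothetical industry labels, shows the `fillna("")` approach alongside the equivalent `na=False` argument of `str.contains`.

```python
import numpy as np
import pandas as pd

# Hypothetical industries; one security has no industry recorded
securities = pd.DataFrame({
    "Symbol": ["AAA", "BBB", "CCC", "DDD"],
    "Industry": ["REIT - Office", np.nan, "Oil & Gas E&P", "REIT - Residential"],
})

# Approach used above: replace missing industries with "" before matching
mask_fillna = securities.Industry.fillna("").str.contains("REIT")

# Alternative: have str.contains treat missing values as non-matches
mask_na = securities.Industry.str.contains("REIT", na=False)

reits = securities[mask_fillna]
print(reits.Symbol.tolist())  # ['AAA', 'DDD']
print(mask_fillna.tolist() == mask_na.tolist())  # True
```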

## ADRs

To create a universe of ADRs, we can take advantage of the "Category" field in the Sharadar data, which indicates whether a security is an ADR. First, take a peek at its distinct values:

```python
nyse_securities.Category.unique()
```

```
array(['Domestic', 'Canadian', 'ADR', nan], dtype=object)
```

```python
nyse_securities[nyse_securities.Category == "ADR"][["Symbol","LongName","Category"]].head()
```

|    | Symbol | LongName | Category |
|---:|:-------|:---------|:---------|
| 6 | MSC | Studio City International Holdings Ltd | ADR |
| 11 | SB-PD | Safe Bulkers Inc | ADR |
| 12 | SB-PC | Safe Bulkers Inc | ADR |
| 13 | SAN-PB | Banco Santander Sa | ADR |
| 26 | RBS-PS | Royal Bank Of Scotland Group Plc | ADR |

Then create the ADR universe:

```python
f = io.StringIO()
nyse_securities[nyse_securities.Category == "ADR"].to_csv(f)
create_universe("nyse-adrs", f, domain="sharadar")
```

```
{'code': 'nyse-adrs',
 'provided': 578,
 'inserted': 578,
 'total_after_insert': 578}
```
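A security can belong to more than one exclusion group (for example, a bank that trades as an ADR), so the securities left to trade are the NYSE universe minus the *union* of the three groups, not the total minus the sum of their sizes. The actual exclusion happens when the backtest runs; the pandas sketch below only illustrates that set logic on a hypothetical DataFrame, reusing the three filters from the cells above.

```python
import pandas as pd

# Hypothetical securities; "EEE" is both a financial and an ADR
securities = pd.DataFrame({
    "Symbol": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "Sector": ["Industrials", "Financial Services", "Real Estate", "Energy", "Financial Services"],
    "Industry": ["Airlines", "Banks - Regional", "REIT - Office", "Oil & Gas E&P", "Banks - Diversified"],
    "Category": ["Domestic", "Domestic", "Domestic", "ADR", "ADR"],
})

# Same three filters used to build the exclusion universes
is_financial = securities.Sector == "Financial Services"
is_reit = securities.Industry.fillna("").str.contains("REIT")
is_adr = securities.Category == "ADR"

# Tradable set: everything outside every exclusion group
excluded = is_financial | is_reit | is_adr
tradable = securities[~excluded]
print(tradable.Symbol.tolist())  # ['AAA']

# Summing group sizes double-counts overlaps; the union counts each security once
naive_count = int(is_financial.sum() + is_reit.sum() + is_adr.sum())
excluded_count = int(excluded.sum())
print(naive_count, excluded_count)  # 5 4
```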

***

## *Next Up*

Part 3: [Interactive Strategy Development](Part3-Interactive-Strategy-Development.ipynb)