<img alt="QuantRocket logo" src="https://www.quantrocket.com/assets/img/notebook-header-logo.png">

<a href="https://www.quantrocket.com/disclaimer/">Disclaimer</a>

# IB Sample Data Collection overview

IB sample data listings are pre-loaded in the securities master database, so the data collection process consists of two steps:

1. create a "universe" of sample securities to use in your backtest
2. collect historical price data for your universe

# Define a universe

QuantRocket relies heavily on the concept of universes, which are user-defined groupings of securities. Universes provide a convenient way to refer to and manipulate groups of securities when collecting historical data, running a trading strategy, etc. You can create universes based on exchanges, security types, sectors, liquidity, or any criteria you like. A universe could consist of one or two securities or one or two thousand securities.

## Download master file
To create our first universe, we will download a CSV of all sample stock listings for all exchanges, pare down the CSV to US stock listings, then upload the pared-down CSV to create our universe.

First download the listings from the securities master database to a CSV file:

In [1]:
from quantrocket.master import download_master_file
download_master_file("securities.csv", sec_types="STK")

> In QuantRocket terminology, the word "collect" refers to retrieving data from IB and saving it to your QuantRocket databases. The word "download" refers to retrieving data out of your QuantRocket databases into a file for use by you or your algorithms.

We can load the CSV into pandas:

In [2]:
import pandas as pd
securities = pd.read_csv("securities.csv")
securities.head()

Unnamed: 0,ConId,Symbol,Etf,SecType,PrimaryExchange,Currency,LocalSymbol,TradingClass,MarketName,LongName,...,UnderSymbol,UnderSecType,MarketRuleIds,Strike,Right,Cusip,EvRule,EvMultiplier,Delisted,DateDelisted
0,8719,JNJ,0,STK,NYSE,USD,JNJ,JNJ,JNJ,JOHNSON & JOHNSON,...,,,"26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,2...",0.0,,ISIN:US4781601046,,0.0,0,
1,13977,XOM,0,STK,NYSE,USD,XOM,XOM,XOM,EXXON MOBIL CORP,...,,,"26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,2...",0.0,,ISIN:US30231G1022,,0.0,0,
2,228891,BP.,0,STK,LSE,GBP,BP.,BP.,BP.,BP PLC,...,,,19191919191919191919,0.0,,ISIN:GB0007980591,,0.0,0,
3,265598,AAPL,0,STK,NASDAQ,USD,AAPL,NMS,NMS,APPLE INC,...,,,"26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,2...",0.0,,ISIN:US0378331005,,0.0,0,
4,272093,MSFT,0,STK,NASDAQ,USD,MSFT,NMS,NMS,MICROSOFT CORP,...,,,"26,26,26,26,26,26,26,26,26,26,26,26,26,26,26,2...",0.0,,ISIN:US5949181045,,0.0,0,


Note the `ConId` column in the CSV file: ConId is short for "contract ID" and is IB's unique identifier for a particular security or contract. ConIds are used throughout QuantRocket to refer to securities.
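
Because ConIds rather than ticker symbols identify securities, it can be handy to look up a ConId from a symbol. The following is a minimal sketch using only the Python standard library; the inline sample rows are copied from the listing above, and `build_conid_lookup` is a hypothetical helper written for illustration, not part of QuantRocket. It keys the lookup by symbol and exchange, on the assumption that a ticker symbol alone may not be unique across exchanges.

```python
import csv
import io

# Sample rows copied from the listing above (only the columns needed here).
SAMPLE_CSV = """ConId,Symbol,PrimaryExchange
8719,JNJ,NYSE
13977,XOM,NYSE
228891,BP.,LSE
265598,AAPL,NASDAQ
272093,MSFT,NASDAQ
"""


def build_conid_lookup(csv_file):
    """Return a dict mapping (Symbol, PrimaryExchange) to an integer ConId.

    Hypothetical helper for illustration; not a QuantRocket function.
    """
    reader = csv.DictReader(csv_file)
    return {
        (row["Symbol"], row["PrimaryExchange"]): int(row["ConId"])
        for row in reader
    }


lookup = build_conid_lookup(io.StringIO(SAMPLE_CSV))
print(lookup[("AAPL", "NASDAQ")])  # prints 265598
```

To run the same lookup against the downloaded file, pass an open file handle for `securities.csv` instead of the in-memory sample.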

## Filter master file

QuantRocket supports working with large universes such as every stock on an exchange. However, for this introductory tutorial we will pare down the master file of sample data to US stock listings. This will help illustrate the flexibility of universe creation (and will also simplify the IB market data permissions required to collect historical data). 

To pare down the master file we'll use `qgrid`, a tool that provides Excel-like filtering and sorting of DataFrames inside Jupyter notebooks. We limit the number of columns to make the grid more readable:

In [None]:
import qgrid
widget = qgrid.show_grid(securities[["ConId","PrimaryExchange","Symbol","LongName","Sector","Industry","Category"]])
widget

> (This is an image of the grid. Execute the cell above to see the interactive grid.)

![QGrid widget](../static/qgrid-widget.png)

Use the grid above to filter the DataFrame to NYSE and NASDAQ sample stocks. Then use `get_changed_df()` to access the filtered DataFrame:

In [4]:
filtered_securities = widget.get_changed_df()
filtered_securities.head()

Unnamed: 0,ConId,PrimaryExchange,Symbol,LongName,Sector,Industry,Category
0,8719,NYSE,JNJ,JOHNSON & JOHNSON,"Consumer, Non-cyclical",Pharmaceuticals,Medical-Drugs
1,13977,NYSE,XOM,EXXON MOBIL CORP,Energy,Oil&Gas,Oil Comp-Integrated
3,265598,NASDAQ,AAPL,APPLE INC,Technology,Computers,Computers
4,272093,NASDAQ,MSFT,MICROSOFT CORP,Technology,Software,Applications Software
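
If you prefer not to filter interactively (for example, in a script), the same selection can be made in code. The sketch below uses only the standard library and inline sample rows copied from the listing above; `filter_by_exchange` and `US_EXCHANGES` are hypothetical names introduced for illustration.

```python
import csv
import io

# Sample rows copied from the master file listing above.
SAMPLE_CSV = """ConId,PrimaryExchange,Symbol
8719,NYSE,JNJ
13977,NYSE,XOM
228891,LSE,BP.
265598,NASDAQ,AAPL
272093,NASDAQ,MSFT
"""

# Exchanges to keep (the US exchanges used in this tutorial).
US_EXCHANGES = {"NYSE", "NASDAQ"}


def filter_by_exchange(csv_file, exchanges):
    """Return the rows whose PrimaryExchange is in the given set.

    Hypothetical helper for illustration; not a QuantRocket function.
    """
    reader = csv.DictReader(csv_file)
    return [row for row in reader if row["PrimaryExchange"] in exchanges]


us_rows = filter_by_exchange(io.StringIO(SAMPLE_CSV), US_EXCHANGES)
print([row["Symbol"] for row in us_rows])  # prints ['JNJ', 'XOM', 'AAPL', 'MSFT']
```

With the `securities` DataFrame already loaded, the equivalent pandas filter is a boolean mask: `securities[securities["PrimaryExchange"].isin(["NYSE", "NASDAQ"])]`.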


## Create universe

To create a universe from the filtered securities, we write the DataFrame to a CSV and upload the CSV. (Only the ConId column in the CSV matters for this purpose; other columns are ignored.) We'll name the universe "demo-stocks":

In [5]:
filtered_securities.to_csv("filtered_securities.csv")

In [6]:
from quantrocket.master import create_universe
create_universe("demo-stocks", infilepath_or_buffer="filtered_securities.csv")

{'code': 'demo-stocks', 'provided': 4, 'inserted': 4, 'total_after_insert': 4}

The function output confirms the universe code (`demo-stocks`) and its size: 4 securities were provided, 4 were inserted, and the universe contains 4 in total.
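
Since only the ConId column matters for the upload, the CSV can be as small as a single column. The following is a minimal sketch of producing such a file's contents with the standard library; `conids_to_csv_text` is a hypothetical helper, and the ConIds are the four from the filtered listing above. It does not call QuantRocket itself.

```python
import csv
import io

# ConIds of the four filtered securities shown above.
conids = [8719, 13977, 265598, 272093]


def conids_to_csv_text(conids):
    """Return CSV text with a single ConId column.

    Hypothetical helper for illustration; not a QuantRocket function.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ConId"])                        # header row
    writer.writerows([conid] for conid in conids)     # one ConId per row
    return buffer.getvalue()


csv_text = conids_to_csv_text(conids)
print(csv_text)
```

Writing `csv_text` to a file and passing that file's path to `create_universe`, as in the cell above, should produce the same universe, assuming the upload reads only the ConId column as described.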

Now that we have a universe, the next step is to collect historical data for our backtest.

***

## *Next Up*

Part B: [Collect Historical Data](PartB-Collect-Historical-Data.ipynb)