# Order Book as a Primary Source of Market Data

The order book is a real-time record of buy and sell orders in a given market. It reflects the current supply and demand for an asset, such as a stock, and offers insight into potential near-term price movements. Traders and investors rely on this information to make informed decisions about buying or selling securities.

## Data Tiers in the United States

### Level I

- Provides basic real-time bid and ask prices.
- This information is readily available from various online sources.
- Useful for understanding the current market prices but lacks detailed information about market depth.

### Level II

- Adds more granularity by providing information about bid and ask prices from specific market makers.
- Also includes details about the size and time of recent transactions, giving traders a better understanding of the liquidity in the market.
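
The relationship between the first two tiers can be sketched in a few lines: Level II is the per-market-maker depth, and a Level I quote is the best bid and ask derived from it. This is a minimal illustration, not a feed handler; the `Quote` class, the market maker IDs, and all prices and sizes are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    market_maker: str  # hypothetical market maker ID
    price: float
    size: int


# Level II: each market maker's resting bid and ask (illustrative values)
bids = [Quote("MM_A", 99.98, 300), Quote("MM_B", 99.99, 500), Quote("MM_C", 99.97, 200)]
asks = [Quote("MM_A", 100.02, 400), Quote("MM_B", 100.03, 100), Quote("MM_C", 100.01, 250)]


def level_1(bids, asks):
    """Collapse Level II depth into the Level I top-of-book quote."""
    best_bid = max(bids, key=lambda q: q.price)  # highest price a buyer will pay
    best_ask = min(asks, key=lambda q: q.price)  # lowest price a seller will accept
    return best_bid, best_ask


best_bid, best_ask = level_1(bids, asks)
spread = round(best_ask.price - best_bid.price, 2)
print(f"bid {best_bid.price} x {best_bid.size} | ask {best_ask.price} x {best_ask.size} | spread {spread}")
```

The Level I view discards everything except the two best quotes, which is why it says nothing about how much size sits behind them.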

### Level III

- Offers the highest level of detail and functionality.
- Allows market makers and exchange member firms to enter or change quotes, execute orders, and confirm trades.
- Access to Level III quotes is crucial for registered brokers to meet best execution requirements.

## Data Examples

### Level I Data

If you check a stock quote on a financial news website or your brokerage platform, you're likely seeing Level I data. It typically shows the current bid and ask prices for a particular stock.

### Level II Data

When using a trading platform that provides more detailed market information, you might see Level II data. This could include a list of market makers and their respective bid and ask prices, along with recent transaction information.

### Level III Data

Only market makers and exchange member firms have access to Level III data. For instance, a market maker could use Level III data to enter new quotes, update existing quotes, execute orders, and confirm trades, all in real-time.


## Trading Activity and Messages:

Trading activity generates a significant number of messages about orders sent by market participants. These messages typically adhere to the Financial Information eXchange (FIX) protocol. FIX is an industry-standard protocol for the real-time exchange of securities transactions and market data. Alternatively, exchanges may use their native protocols for communication.

### FIX Protocol:

FIX is a set of rules designed to facilitate fast and reliable electronic communication for securities transactions. It standardizes the way financial transactions are communicated, allowing different entities to connect and trade seamlessly. FIX is widely adopted in the financial industry, including stock exchanges, investment banks, and asset management firms.

Example: Consider a scenario where a trader at an investment bank wants to execute a trade with a counterparty. The two entities use the FIX protocol to communicate seamlessly. The FIX messages exchanged between them include details about the trade, such as the security being traded, the quantity, price, and other relevant information.
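
To make the message format concrete, here is a hedged sketch of how a simplified FIX order message could be assembled and parsed by hand. FIX messages are sequences of `tag=value` fields separated by the SOH character (`\x01`), with a body length in tag 9 and a checksum in tag 10. The helper names, company IDs, and order details are hypothetical, and a real session would require further fields and a FIX engine rather than string handling.

```python
SOH = "\x01"  # FIX field delimiter (non-printable)


def build_fix_message(msg_type, fields, begin_string="FIX.4.2"):
    """Assemble a FIX message with BodyLength (9) and CheckSum (10)."""
    body = f"35={msg_type}{SOH}" + "".join(f"{tag}={value}{SOH}" for tag, value in fields)
    # BodyLength counts the bytes after the 9=...SOH field up to the checksum field
    header = f"8={begin_string}{SOH}9={len(body.encode('ascii'))}{SOH}"
    # CheckSum is the byte sum of everything before the 10= field, modulo 256
    checksum = sum((header + body).encode("ascii")) % 256
    return f"{header}{body}10={checksum:03d}{SOH}"


def parse_fix_message(message):
    """Split a FIX message into an ordered list of (tag, value) string pairs."""
    return [tuple(field.split("=", 1)) for field in message.split(SOH) if field]


# Simplified New Order Single (35=D); all values are illustrative
order_fields = [
    (49, "BANK_A"),             # SenderCompID (hypothetical)
    (56, "BROKER_B"),           # TargetCompID (hypothetical)
    (34, 1),                    # MsgSeqNum
    (52, "20240102-14:30:00"),  # SendingTime
    (11, "ORD-0001"),           # ClOrdID
    (55, "AAPL"),               # Symbol
    (54, 1),                    # Side: 1 = buy
    (38, 100),                  # OrderQty
    (40, 2),                    # OrdType: 2 = limit
    (44, "185.50"),             # Price
]

message = build_fix_message("D", order_fields)
print(message.replace(SOH, "|"))  # show the delimiter as a pipe for readability
parsed = dict(parse_fix_message(message))
```

Printing with `|` in place of SOH is only a display convention; the wire format uses the non-printable delimiter.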



### Market Makers:

**Role:** Market makers facilitate the buying and selling of financial instruments by providing liquidity to the market.

**Primary Activity:** They continuously quote bid and ask prices, manage risk, and aim to profit from the bid-ask spread.

**Examples:** Citadel Securities, Jane Street, Susquehanna International Group (SIG), Virtu Financial.

### Investment Banks (IBD - Investment Banking Division):

**Role:** Investment banks provide a wide range of financial services, including underwriting, mergers and acquisitions (M&A) advisory, and securities trading.

**Primary Activity:** Investment banking divisions may engage in market-making activities, but their focus is often on advisory services, capital raising, and strategic financial transactions.

**Examples:** Goldman Sachs, JPMorgan Chase, Morgan Stanley.

### Hedge Funds:

**Role:** Hedge funds are investment funds that pool capital from accredited individuals or institutional investors and employ various strategies to generate returns.

**Primary Activity:** Hedge funds typically engage in active trading across different asset classes, including equities, fixed income, derivatives, and currencies.

**Examples:** Bridgewater Associates, Renaissance Technologies, Citadel (note: Citadel operates both as a market maker and a hedge fund).

### Asset Management Firms:

**Role:** Asset management firms manage investment portfolios on behalf of clients, such as individuals, institutions, and pension funds.

**Primary Activity:** Asset managers focus on investing in a diversified portfolio of securities to achieve specific financial objectives for their clients.

**Examples:** BlackRock, Vanguard, Fidelity Investments.

While there can be some overlap, especially in the case of diversified financial firms, these categories represent different roles and functions within the financial industry. Market makers are specialized entities focused on providing liquidity, while investment banks, hedge funds, and asset managers may have broader mandates that include market-making activities or other financial services.


## Interactive Brokers Interface:

Interactive Brokers (IB) is a well-known brokerage firm, and it provides an interface for traders and developers to interact with its systems. This interface supports the FIX protocol for communication, allowing users to submit trade orders, receive market data, and manage their accounts using the standardized FIX messaging format.

Use Case: Traders using Interactive Brokers can employ the FIX protocol to programmatically execute trades, monitor market data, and manage their portfolios. This interface provides a standardized and efficient way to interact with Interactive Brokers' trading infrastructure.

In short, Interactive Brokers supports the FIX protocol as part of its interface, enabling traders to communicate seamlessly with its trading systems using FIX messages.


## Nasdaq offers a TotalView ITCH direct data-feed protocol

While FIX has a dominant market share, exchanges also offer native protocols.

The Nasdaq offers a TotalView ITCH direct data-feed protocol that allows subscribers to track individual orders for equity instruments from placement to execution or cancellation.

The ITCH specification defines the message types and their binary layout.
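
Unlike FIX, ITCH messages are fixed-width binary records, so they are typically decoded with a byte-level format string. The sketch below packs and then parses a synthetic Add Order message using Python's `struct` module. The field layout is an assumption based on the ITCH 5.0 Add Order (`A`) message and should be checked against the current specification; the order values are made up.

```python
import struct

# Assumed Add Order ("A") layout, big-endian, 36 bytes in total:
# type (1), stock locate (2), tracking number (2), timestamp (6, ns since midnight),
# order reference (8), buy/sell indicator (1), shares (4), stock (8), price (4)
ADD_ORDER = struct.Struct(">cHH6sQcI8sI")


def parse_add_order(raw):
    """Decode one Add Order message into a dictionary."""
    msg_type, locate, tracking, ts, order_ref, side, shares, stock, price = ADD_ORDER.unpack(raw)
    return {
        "type": msg_type.decode("ascii"),
        "stock_locate": locate,
        "tracking_number": tracking,
        "timestamp_ns": int.from_bytes(ts, "big"),
        "order_ref": order_ref,
        "side": side.decode("ascii"),
        "shares": shares,
        "stock": stock.decode("ascii").strip(),  # symbols are space-padded
        "price": price / 10_000,  # integer price with four implied decimals
    }


# Synthetic message: buy 100 shares at 185.50, timestamped 09:30:00
raw = ADD_ORDER.pack(
    b"A", 1, 0, (34_200_000_000_000).to_bytes(6, "big"), 42, b"B", 100, b"AAPL    ", 1_855_000
)
order = parse_add_order(raw)
print(order)
```

Tracking an order from placement to execution or cancellation then amounts to following its order reference number across subsequent messages.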


# AlgoSeek:

**Role:** AlgoSeek provides historical intraday data, particularly minute bar data for equity quote and trade information. It caters to quantitative analysts and algorithmic traders, offering high-quality historical data for research and strategy development.

**Notable Feature:** AlgoSeek's minute bars include comprehensive information such as OHLCV details, bid-ask spread, tick information, and more.
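
For intuition about what a minute bar contains, the following sketch aggregates a handful of trade ticks into OHLCV bars with pandas. It does not reproduce AlgoSeek's format; the column names and tick values are illustrative assumptions.

```python
import pandas as pd

# Illustrative trade ticks; timestamps, prices, and share counts are made up
ticks = pd.DataFrame(
    {
        "price": [100.0, 100.2, 99.9, 100.1, 100.4, 100.3],
        "shares": [200, 100, 300, 150, 50, 400],
    },
    index=pd.to_datetime(
        [
            "2024-01-02 09:30:05",
            "2024-01-02 09:30:20",
            "2024-01-02 09:30:55",
            "2024-01-02 09:31:10",
            "2024-01-02 09:31:30",
            "2024-01-02 09:31:45",
        ]
    ),
)

by_minute = ticks.resample("1min")
bars = by_minute["price"].ohlc()                  # open, high, low, close per minute
bars["volume"] = by_minute["shares"].sum()        # shares traded per minute
bars["n_ticks"] = by_minute["shares"].count()     # number of trades per minute
# Volume-weighted average price per minute
notional = (ticks["price"] * ticks["shares"]).resample("1min").sum()
bars["vwap"] = notional / bars["volume"]
print(bars)
```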

# Securities Information Processor (SIP):

**Role:** SIP refers to the Securities Information Processor, which is a centralized system responsible for consolidating and disseminating trade and quote information from various exchanges. The SIP is crucial for providing consolidated market data.

**Function:** SIP helps ensure that investors and market participants receive a unified and comprehensive view of trading activity across different exchanges.
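
The consolidation step can be illustrated with a toy example that merges per-exchange quotes into a single national best bid and offer (NBBO). This is a simplification of what a SIP does: the quotes are hypothetical, and summing size across venues at the best price is an assumption made for the sketch.

```python
# Hypothetical top-of-book quotes per exchange: (bid, bid_size, ask, ask_size)
exchange_quotes = {
    "NYSE": (100.00, 300, 100.04, 200),
    "NASDAQ": (100.01, 500, 100.03, 100),
    "ARCA": (100.01, 200, 100.05, 400),
}


def consolidate(quotes):
    """Merge per-exchange quotes into a single best bid and offer."""
    best_bid = max(bid for bid, _, _, _ in quotes.values())
    best_ask = min(ask for _, _, ask, _ in quotes.values())
    return {
        "bid": best_bid,
        # Size at the best price, summed across the venues quoting it (assumption)
        "bid_size": sum(size for bid, size, _, _ in quotes.values() if bid == best_bid),
        "bid_venues": [venue for venue, q in quotes.items() if q[0] == best_bid],
        "ask": best_ask,
        "ask_size": sum(size for _, _, ask, size in quotes.values() if ask == best_ask),
        "ask_venues": [venue for venue, q in quotes.items() if q[2] == best_ask],
    }


nbbo = consolidate(exchange_quotes)
print(nbbo)
```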

# Polygon:

**Role:** Polygon is a financial technology company that provides market data and trading infrastructure services. It offers real-time and historical market data, as well as tools for algorithmic trading.

**Notable Feature:** Polygon's platform includes a wide range of data, including equities, cryptocurrencies, and more. It aims to provide low-latency and reliable market data for developers and traders.

# Interactive Brokers:

**Role:** Interactive Brokers is a well-known brokerage firm that provides a trading platform for retail and institutional clients. It facilitates the execution of trades across various asset classes, including stocks, options, futures, and forex.

**Notable Feature:** Interactive Brokers also offers an API (Application Programming Interface) that allows developers to programmatically interact with its trading platform. The API supports algorithmic trading and the integration of third-party applications.

# Are They Similar?

While AlgoSeek, SIP, Polygon, and Interactive Brokers are all involved in the financial industry, they are not directly similar in terms of their functions or offerings:

- **AlgoSeek:** Primarily provides historical intraday data for strategy development.

- **SIP (Securities Information Processor):** Centralized system for consolidating and disseminating trade and quote information from exchanges.

- **Polygon:** Offers real-time and historical market data along with trading infrastructure services.

- **Interactive Brokers:** A brokerage platform that facilitates the execution of trades and provides an API for algorithmic trading.

In summary, AlgoSeek, SIP, Polygon, and Interactive Brokers serve different roles in the financial ecosystem, with AlgoSeek and Polygon focusing on data services, SIP on data consolidation, and Interactive Brokers on brokerage services.


## How to work with Fundamental data
Fundamental data pertains to the economic drivers that determine the value of securities. The nature of the data depends on the asset class:

- For equities and corporate credit, it includes corporate financials as well as industry and economy-wide data.
- For government bonds, it includes international macro-data and foreign exchange.
- For commodities, it includes asset-specific supply-and-demand determinants, such as weather data for crops.

We will focus on equity fundamentals for the US, where data is easier to access. More than 13,000 public companies worldwide generate 2 million pages of annual reports and more than 30,000 hours of earnings calls. In algorithmic trading, fundamental data and features engineered from it may be used to derive trading signals directly, for example as value indicators, and are an essential input for predictive models, including machine learning models.

### Financial statement data
The Securities and Exchange Commission (SEC) requires US issuers, that is, listed companies and securities, including mutual funds, to file three quarterly financial statements (Form 10-Q) and one annual report (Form 10-K), in addition to various other regulatory filings.

Since the early 1990s, the SEC has made these filings available through its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system. They constitute the primary data source for the fundamental analysis of equity and other securities, such as corporate credit, where the value depends on the business prospects and financial health of the issuer.

## Automated processing using XBRL markup
Automated analysis of regulatory filings has become much easier since the SEC introduced XBRL, a free, open, and global standard for the electronic representation and exchange of business reports. XBRL is based on XML; it relies on taxonomies that define the meaning of the elements of a report and map to tags that highlight the corresponding information in the electronic version of the report. One such taxonomy represents the US Generally Accepted Accounting Principles (GAAP).

The SEC introduced voluntary XBRL filings in 2005 in response to accounting scandals, has required this format for all filers since 2009, and continues to expand the mandatory coverage to other regulatory filings. The SEC maintains a website that lists the current taxonomies that shape the content of different filings and can be used to extract specific items.
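
Because XBRL is XML, tagged items can be pulled out with a standard XML parser. The sketch below parses a synthetic XBRL-style fragment with Python's built-in `xml.etree.ElementTree`. The namespace URI, context name, and all values are illustrative assumptions; real filings are far larger, and the taxonomy namespace varies by year.

```python
import xml.etree.ElementTree as ET

# Synthetic XBRL-style fragment; namespace URI, context, and values are made up
DOC = """<xbrl xmlns="http://www.xbrl.org/2003/instance"
      xmlns:us-gaap="http://fasb.org/us-gaap/2023">
  <us-gaap:Revenues contextRef="FY2023" unitRef="USD" decimals="-6">1200000000</us-gaap:Revenues>
  <us-gaap:NetIncomeLoss contextRef="FY2023" unitRef="USD" decimals="-6">150000000</us-gaap:NetIncomeLoss>
</xbrl>"""

# Map the prefix used in queries to the taxonomy namespace declared in the document
NS = {"us-gaap": "http://fasb.org/us-gaap/2023"}

root = ET.fromstring(DOC)
facts = {}
for tag in ("Revenues", "NetIncomeLoss"):
    element = root.find(f"us-gaap:{tag}", NS)
    facts[tag] = {
        "value": int(element.text),
        "context": element.get("contextRef"),  # reporting period or entity context
        "unit": element.get("unitRef"),
    }

# A simple engineered feature: net margin for the period
net_margin = facts["NetIncomeLoss"]["value"] / facts["Revenues"]["value"]
print(facts, net_margin)
```

The taxonomy is what makes this work across companies: the same element name identifies the same accounting concept in every filing that uses it.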

There are several avenues to track and access fundamental data reported to the SEC:

- As part of the EDGAR Public Dissemination Service (PDS), electronic feeds of accepted filings are available for a fee.
- The SEC updates RSS feeds every 10 minutes, which list structured disclosure submissions.
- There are public index files for the retrieval of all filings through FTP for automated processing.
- The financial statement (and notes) datasets contain parsed XBRL data from all financial statements and the accompanying notes.

The SEC also publishes log files containing the internet search traffic for EDGAR filings through SEC.gov, albeit with a six-month delay.

## Efficient data storage with pandas
We'll be using many different data sets in this book, and it's worth comparing the main formats for efficiency and performance. In particular, we compare the following:

### CSV: 
Comma-separated, standard flat text file format.

### HDF5: 
Hierarchical Data Format, developed initially at the National Center for Supercomputing Applications (NCSA). It is a fast and scalable storage format for numerical data, available in pandas using the PyTables library.

### Parquet: 
A binary, columnar storage format, part of the Apache Hadoop ecosystem, that provides efficient data compression and encoding and has been developed by Cloudera and Twitter. It is available for pandas through the pyarrow library, led by Wes McKinney, the original author of pandas.
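
A rough way to compare the three formats is to time a write and a read of the same DataFrame and record the file size. The sketch below does this on random numerical data. The DataFrame shape and file names are arbitrary choices, results will differ by machine and data type, and formats whose optional dependency (pyarrow or PyTables) is not installed are skipped.

```python
import tempfile
import time
from pathlib import Path

import numpy as np
import pandas as pd

# Random numerical test data; the size is arbitrary
rng = np.random.default_rng(seed=42)
df = pd.DataFrame(rng.standard_normal((50_000, 10)), columns=[f"col_{i}" for i in range(10)])

# (writer, reader) pairs for each format
FORMATS = {
    "csv": (lambda d, p: d.to_csv(p, index=False), lambda p: pd.read_csv(p)),
    "parquet": (lambda d, p: d.to_parquet(p), lambda p: pd.read_parquet(p)),  # needs pyarrow
    "hdf5": (lambda d, p: d.to_hdf(p, key="df", mode="w"), lambda p: pd.read_hdf(p, key="df")),  # needs PyTables
}

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, (write, read) in FORMATS.items():
        path = Path(tmp) / f"data.{name}"
        try:
            start = time.perf_counter()
            write(df, str(path))
            write_s = time.perf_counter() - start

            start = time.perf_counter()
            restored = read(str(path))
            read_s = time.perf_counter() - start
        except ImportError:
            print(f"{name}: skipped (optional dependency not installed)")
            continue
        results[name] = {
            "write_s": write_s,
            "read_s": read_s,
            "size_mb": path.stat().st_size / 1e6,
            "shape_ok": restored.shape == df.shape,
        }

print(pd.DataFrame(results).T)
```

A single timed run like this is noisy; repeating each measurement and testing with text-heavy as well as numerical data gives a fairer picture.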