A modern, high-performance C++20 library for parsing NASDAQ TotalView-ITCH 5.0 protocol data feeds. This parser is designed for maximum speed, minimal memory overhead, and type safety, making it ideal for latency-sensitive financial applications, market data analysis, and quantitative research.
- Project Philosophy
- Key Features
- Architectural Overview
- Getting Started
- Dependency Management
- Comprehensive Usage Guide
- API Reference: ITCH 5.0 Message Types
- Performance
- Coding Standards
- Frequently Asked Questions (FAQ)
- Contributing
- License
- References
The design of this ITCH parser is guided by three principles:
-
Performance Above All: In market data processing, performance is not just a feature; it's a requirement. This library is architected to eliminate unnecessary overhead. It avoids dynamic memory allocations, data copies, and stream I/O in its critical path. The goal is to convert raw binary data into a usable C++ struct as efficiently as possible.
-
Correctness and Safety: Financial data is unforgiving. A single misinterpreted byte can corrupt analysis. We prioritize correctness by adhering strictly to the ITCH 5.0 specification and ensuring data is represented in a type-safe manner.
-
Modern and Maintainable C++: The library leverages C++20 and above to provide a clean, maintainable, and efficient implementation. We use modern features to create a robust and developer-friendly API that is both powerful and easy to use correctly.
- High Throughput: Capable of multi-gigabyte-per-second parsing speeds on modern hardware.
- Zero-Allocation Core: The primary parsing loop performs zero dynamic memory allocations, minimizing latency and jitter.
- Type-Safe API: All ITCH messages are deserialized into a
std::variantof dedicated, packedstructs, ensuring compile-time safety. - Specification Compliant: Faithfully implements the message formats outlined in the Nasdaq TotalView-ITCH 5.0 specification.
- Flexible Parsing Strategies:
- Eagerly parse a buffer into a
std::vectorfor convenience. - Use a callback for efficient, low-memory stream processing of large files.
- Filter messages by type at the parsing stage to reduce processing load.
- Eagerly parse a buffer into a
- Cross-Platform: Compatible with any platform supporting a C++20 compiler and CMake.
- Minimal Dependencies: Zero runtime dependencies. Test and benchmark libraries are managed by
vcpkg.
The library is built on some key principles to deliver maximum performance and safety.
To achieve its performance, the parser uses a zero-overhead deserialization strategy. The parser achieves its speed by reading messages directly from the raw data stream without performing heavy data conversions. Each message type is defined in a compact format that matches exactly how the data appears on the network. This means the parser can quickly copy incoming bytes into a message structure in memory, skipping the usual per-field decoding steps. The result is near-zero overhead when reading and processing messages.
All possible ITCH messages are handled using a single, modern C++ object: itch::Message, which is a std::variant. This approach provides significant advantages over traditional designs:
- Compile-Time Safety: The compiler verifies that your code handles every possible message type. This prevents unexpected errors at runtime if a message type is overlooked.
- High Performance: This design allows the compiler to generate highly optimized code for processing different messages, which is typically faster than conventional virtual function lookups.
- Efficient Memory Layout: Messages are stored contiguously in memory, which improves CPU cache utilization and overall processing speed.
The ITCH protocol specifies that all multi-byte numbers use a big-endian (network) byte order. Most modern CPUs, however, use a little-endian format.
The parser handles this discrepancy automatically. After a message is read into a struct, the library transparently converts all multi-byte fields (such as prices and quantities) from the network order to the native order of the machine. This ensures all numerical data is immediately correct and ready for use without requiring any manual conversion.
- C++ Compiler: A compiler with full C++20 support (e.g., GCC 10+, Clang 12+, MSVC 19.29+).
- CMake: Version 3.20 or newer.
-
Clone the Repository
git clone https://github.com/bbalouki/itchcpp.git cd itchcpp -
Configure with CMake This step generates the build system and automatically downloads any developer dependencies if needed.
cmake -S . -B build -
Compile the Code Build the library and any example targets.
cmake --build build --config Release
The project includes unit tests and performance benchmarks that can be built by enabling the corresponding CMake options.
-
To build and run tests:
# Configure with tests enabled cmake -S . -B build -DITCH_BUILD_TESTS=ON # Build cmake --build build --config Release # Run tests from the build directory ctest --test-dir build --output-on-failure
-
To build and run benchmarks:
# Configure with benchmarks enabled cmake -S . -B build -DITCH_BUILD_BENCHMARKS=ON # Build cmake --build build --config Release # Run benchmarks ./build/benchmarks/parser_bench
This library has zero external runtime dependencies.
For development purposes, it uses two popular libraries:
- Google Test: For unit testing.
- Google Benchmark: For performance microbenchmarks.
These are handled automatically by Microsoft's vcpkg package manager. When you enable tests or benchmarks, You will need to install them on your system separately or install them in manifest mode.
Download some sample data here
This is the simplest approach. It reads an entire file into memory and returns a std::vector of parsed messages. This method is convenient for smaller files but may consume significant memory for large datasets.
#include "itch/parser.hpp"
#include <fstream>
#include <iostream>
#include <vector>
int main(int argc, char* argv[]) {
if (argc != 2) {
std::cerr << "Usage: " << argv[0] << " <path_to_itch_file>\n";
return 1;
}
std::ifstream file(argv[1], std::ios::binary);
if (!file) {
std::cerr << "Error: Cannot open file " << argv[1] << "\n";
return 1;
}
itch::Parser parser;
try {
// Parse the entire file stream into a vector of messages.
std::vector<itch::Message> messages = parser.parse(file);
std::cout << "Successfully parsed " << messages.size() << " messages.\n";
// You can now iterate over the messages for analysis.
for (const auto& msg : messages) {
// ... process each message ...
}
} catch (const std::runtime_error& e) {
std::cerr << "Parsing failed: " << e.what() << "\n";
return 1;
}
return 0;
}This is the most memory-efficient way to process large files. The parser invokes your callback function for each message it finds, avoiding the need to store all messages in memory at once. For best performance, read the file into a buffer first.
#include "itch/parser.hpp"
#include <iostream>
#include <vector>
#include <fstream>
// A simple callback function that counts messages
void message_counter(const itch::Message& message) {
static long count = 0;
count++;
if (count % 1000000 == 0) {
std::cout << "Processed " << count << " messages...\n";
}
}
int main(int argc, char* argv[]) {
if (argc != 2) {
std::cerr << "Usage: " << argv[0] << " <path_to_itch_file>\n";
return 1;
}
std::ifstream file(argv[1], std::ios::binary);
if (!file.is_open()) {
std::cerr << "Error opening file!\n";
return 1;
}
// Reading the entire file into a buffer minimizes I/O overhead during parsing.
std::vector<char> buffer(std::istreambuf_iterator<char>(file), {});
itch::Parser parser;
try {
// Parse the buffer and invoke `message_counter` for each message.
parser.parse(buffer.data(), buffer.size(), &message_counter);
std::cout << "Finished processing file.\n";
} catch (const std::runtime_error& e) {
std::cerr << "Parsing failed: " << e.what() << "\n";
return 1;
}
return 0;
}You can instruct the parser to only deserialize specific message types. This provides a significant performance boost if you are only interested in a subset of the data (e.g., just trades and quotes).
#include "itch/parser.hpp"
#include <fstream>
#include <iostream>
#include <vector>
int main(int argc, char* argv[]) {
if (argc != 2) { /*...*/ return 1; }
std::ifstream file(argv[1], std::ios::binary);
if (!file) { /*...*/ return 1; }
itch::Parser parser;
try {
// We only care about Add Order, Order Executed, and Broken Trade messages.
// The parser will skip all other message types.
std::vector<char> filter = {'A', 'F', 'E', 'C', 'B'};
std::vector<itch::Message> trade_events = parser.parse(file, filter);
std::cout << "Found " << trade_events.size() << " relevant order and trade messages.\n";
} catch (const std::runtime_error& e) {
std::cerr << "Parsing failed: " << e.what() << "\n";
return 1;
}
return 0;
}This standalone example demonstrates how to use std::visit with a visitor to process different message types and print their details correctly.
#include "itch/parser.hpp"
#include <iostream>
#include <vector>
#include <fstream>
#include <iomanip> // For std::setprecision
// A visitor object to handle different message types.
// Using `if constexpr` ensures only valid code is compiled for each message type.
auto message_visitor = [](const auto& msg) {
using T = std::decay_t<decltype(msg)>;
// Set fixed-point notation for prices
std::cout << std::fixed << std::setprecision(4);
if constexpr (std::is_same_v<T, itch::AddOrderMessage>) {
std::cout << "ADD ORDER: Ref #" << msg.order_reference_number
<< ", " << msg.shares << " shares of "
<< itch::to_string(msg.stock, sizeof(msg.stock)) // Helper to convert char array
<< " @ $" << static_cast<double>(msg.price) / itch::PRICE_DIVISOR << "\n";
} else if constexpr (std::is_same_v<T, itch::OrderExecutedMessage>) {
std::cout << "EXECUTION: Ref #" << msg.order_reference_number
<< ", executed " << msg.executed_shares << " shares.\n";
} else if constexpr (std::is_same_v<T, itch::SystemEventMessage>) {
std::cout << "SYSTEM EVENT: Code '" << msg.event_code
<< "' (O=Start, S=SysHours, Q=MktHours, M=EndMkt, E=EndSys, C=End)\n";
} else {
// This block will be visited for any other message types.
}
};
int main(int argc, char* argv[]) {
if (argc != 2) {
std::cerr << "Usage: " << argv[0] << " <path_to_itch_file>\n";
return 1;
}
std::ifstream file(argv[1], std::ios::binary);
if (!file) {
std::cerr << "Error: Cannot open file " << argv[1] << "\n";
return 1;
}
itch::Parser parser;
try {
// Use a lambda to pass each message to our visitor
parser.parse(file, [](const itch::Message& msg) {
std::visit(message_visitor, msg);
});
} catch (const std::runtime_error& e) {
std::cerr << "Parsing failed: " << e.what() << "\n";
return 1;
}
return 0;
}This library supports all message types defined in the ITCH 5.0 specification. The char represents the message type on the wire.
| Type | Struct Name | Description |
|---|---|---|
S |
SystemEventMessage |
Signals a market or data feed handler event. |
| Type | Struct Name | Description |
|---|---|---|
R |
StockDirectoryMessage |
Conveys stock symbol directory information. |
H |
StockTradingActionMessage |
Indicates a change in trading status for a security (e.g., Halted, Trading). |
Y |
RegSHORestrictionMessage |
Regulation SHO short sale price test restricted indicator. |
L |
MarketParticipantPositionMessage |
Conveys a market participant's status (e.g., Primary Market Maker). |
V |
MWCBDeclineLevelMessage |
Informs of the Market-Wide Circuit Breaker (MWCB) breach points for the day. |
W |
MWCBStatusMessage |
Indicates that a MWCB level has been breached. |
h |
OperationalHaltMessage |
Indicates an operational halt for a security on a specific market center. |
| Type | Struct Name | Description |
|---|---|---|
A |
AddOrderMessage |
A new order has been accepted (no MPID attribution). |
F |
AddOrderMPIDMessage |
A new order has been accepted (with MPID attribution). |
E |
OrderExecutedMessage |
An order on the book was executed in part or in full. |
C |
OrderExecutedWithPriceMessage |
An order was executed at a price different from the display price. |
X |
OrderCancelMessage |
An order was partially canceled; this message contains the canceled quantity. |
D |
OrderDeleteMessage |
An order was canceled in its entirety and removed from the book. |
U |
OrderReplaceMessage |
An existing order has been replaced with a new order. |
| Type | Struct Name | Description |
|---|---|---|
P |
TradeMessage |
Reports a trade for a non-displayable order type. |
Q |
CrossTradeMessage |
Reports a cross trade (e.g., Opening, Closing, IPO Cross). |
B |
BrokenTradeMessage |
Reports that a previously disseminated execution has been broken. |
| Type | Struct Name | Description |
|---|---|---|
K |
IPOQuotingPeriodUpdateMessage |
Provides the anticipated IPO quotation release time. |
J |
LULDAuctionCollarMessage |
Indicates auction collar thresholds for a security in a LULD Trading Pause. |
I |
NOIIMessage |
Net Order Imbalance Indicator message, used for opening/closing crosses. |
N |
RetailPriceImprovementIndicator |
Indicates the presence of Retail Price Improvement (RPII) interest. |
O |
DirectListingCapitalRaisePriceDiscovery |
Disseminated for Direct Listing with Capital Raise (DLCR) securities. |
The parser is designed for high-throughput scenarios. Performance is heavily dependent on the underlying hardware (CPU speed, memory bandwidth, and I/O speed).
To run the performance benchmarks:
# Configure and build with benchmarks enabled
cmake -S . -B build -DITCH_BUILD_BENCHMARKS=ON
cmake --build build --config Release
# Run the benchmark suite
./build/benchmarks/parser_benchThe output will show the parsing rate in messages per second and throughput in GB/s. On modern server hardware, expect throughput in the multi-gigabytes per second range when parsing from an in-memory buffer.
The project uses .clang-format to enforce a consistent code style. A .clang-tidy configuration is also provided to catch common programming errors and enforce best practices. Developers contributing to the project are expected to format their code using these tools.
-
What exactly is this NASDAQ ITCH data? It's the most detailed data feed available from NASDAQ. It's not just stock prices; it’s every single order placed, modified, cancelled, and executed. It's the complete "play-by-play" of the market.
-
Why can't I just use a simple program to read this data? The data is in a highly optimized, machine-only binary format, not human-readable text. More importantly, the volume and velocity are immense—a single day of trading can generate tens or hundreds of gigabytes of data. A standard program would be far too slow to keep up.
-
Is this a program I can just double-click and run? No, it’s a specialized component—a library—that software developers use as the "engine" inside a larger application (like a trading platform or an analysis tool). It does the heavy lifting of data processing so they can focus on building the features their users need.
Contributions are welcome! Whether it's a bug report, a feature request, or a pull request, your input is valuable. Please feel free to open an issue or submit a PR.
This project is licensed under the MIT License. See the LICENSE file for details.
- Nasdaq TotalView-ITCH 5.0 Specification: The official documentation is the definitive source for protocol details.