Arctic is a high-performance datastore for numeric data. It supports Pandas, numpy arrays and pickled objects out-of-the-box, with pluggable support for other data types and optional versioning.
Arctic can query millions of rows per second per client, achieves ~10x compression on network bandwidth, ~10x compression on disk, and scales to hundreds of millions of rows per second per MongoDB instance.
Arctic has been under active development at Man AHL since 2012.
mongod --dbpath <path/to/db_directory>
from arctic import Arctic
import Quandl  # legacy Quandl Python package, used for the download below

# Connect to the local MongoDB instance
store = Arctic('localhost')
# Create the library - defaults to VersionStore
store.initialize_library('NASDAQ')
# Access the library
library = store['NASDAQ']
# Load some data - maybe from Quandl
aapl = Quandl.get("NASDAQ/AAPL", authtoken="your token here")
# Store the data in the library
library.write('AAPL', aapl, metadata={'source': 'Quandl'})
# Reading the data
item = library.read('AAPL')
aapl = item.data
metadata = item.metadata
VersionStore supports much more: See the HowTo!
Plugging a custom class in as a library type is straightforward. This example shows how.
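As a rough illustration of the plug-in idea, the sketch below keeps a registry that maps a type name to a Python class and instantiates that class when a library is opened. It is a toy model only, not Arctic's own code: `register_type`, `open_library`, `DictStore` and the type name `'example.DictStore'` are all hypothetical names invented for this sketch.

```python
# Toy plug-in registry. Every name here is hypothetical and does not
# come from the Arctic API.
_LIBRARY_TYPES = {}


def register_type(type_name, cls):
    """Associate a library type name with the class that implements it."""
    if type_name in _LIBRARY_TYPES:
        raise ValueError('library type already registered: {}'.format(type_name))
    _LIBRARY_TYPES[type_name] = cls


def open_library(type_name, library_name):
    """Instantiate the class registered for type_name."""
    return _LIBRARY_TYPES[type_name](library_name)


class DictStore(object):
    """A minimal custom store that keeps items in a plain dict."""

    def __init__(self, library_name):
        self.library_name = library_name
        self._items = {}

    def write(self, symbol, data):
        self._items[symbol] = data

    def read(self, symbol):
        return self._items[symbol]


register_type('example.DictStore', DictStore)
lib = open_library('example.DictStore', 'user.NOTES')
lib.write('greeting', 'hello')
```

The registry is the only point of coupling: a new storage class only has to be registered under a name before a library of that type is opened.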
Arctic provides namespaced libraries of data. These libraries allow bucketing data by source, user or some other metric (for example frequency: End-Of-Day; Minute Bars; etc.).
Arctic supports multiple data libraries per user. A user (or namespace) maps to a MongoDB database (the granularity of mongo authentication). The library itself is composed of a number of collections within the database. Libraries look like:
- user.EOD
- user.ONEMINUTE
A library is mapped to a Python class. All library databases in MongoDB are prefixed with 'arctic_'.
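The naming scheme above can be sketched as a small helper that splits a library name into a database name and a collection name. `split_library_name` is a hypothetical function written for illustration, and the handling of names without a user part (placing them in a database named after the bare prefix) is an assumption rather than something stated here.

```python
DB_PREFIX = 'arctic'  # prefix described above


def split_library_name(library):
    """Return (database_name, collection_name) for a library name.

    Hypothetical helper for illustration only.
    """
    parts = library.split('.', 1)
    if len(parts) == 2:
        user, name = parts
        # 'user.EOD' -> database 'arctic_user', collection 'EOD'
        return '{}_{}'.format(DB_PREFIX, user), name
    # Assumption: libraries without a user part share one database
    # named after the bare prefix.
    return DB_PREFIX, library


eod = split_library_name('user.EOD')
nasdaq = split_library_name('NASDAQ')
```

Because the user part selects the database, MongoDB authentication can be applied per user while each library remains a separate group of collections.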
Arctic includes two storage engines:
- VersionStore: a key-value versioned TimeSeries store. It supports:
    - Pandas data types (other Python types are pickled)
    - Multiple versions of each data item, with easy access to previous versions
    - Point-in-time snapshots across symbols in a library
    - Soft quota support
    - Hooks for persisting other data types
    - Audited writes: an API for saving metadata and data before and after a write
    - A wide range of TimeSeries data frequencies: End-Of-Day to Minute bars
    - See the HowTo
- TickStore: a column-oriented tick database. It supports dynamic fields; chunks aren't versioned. Designed for large, continuously ticking data.
Arctic storage implementations are pluggable. VersionStore is the default.
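To make the versioning and snapshot ideas concrete, here is a minimal in-memory sketch of a versioned key-value store. It is a toy model under stated assumptions, not how VersionStore is implemented: the class `ToyVersionStore`, its method signatures and the `as_of` parameter are chosen for this illustration, and nothing is persisted to MongoDB.

```python
class ToyVersionStore(object):
    """In-memory toy: every write appends a new version of a symbol."""

    def __init__(self):
        # symbol -> list of (data, metadata); version number = index + 1
        self._versions = {}
        # snapshot name -> {symbol: version number at snapshot time}
        self._snapshots = {}

    def write(self, symbol, data, metadata=None):
        """Store a new version and return its version number."""
        history = self._versions.setdefault(symbol, [])
        history.append((data, metadata))
        return len(history)

    def snapshot(self, name):
        """Record the current version of every symbol under a name."""
        self._snapshots[name] = dict(
            (symbol, len(history))
            for symbol, history in self._versions.items())

    def read(self, symbol, as_of=None):
        """Read the latest version, a numbered version, or a snapshot."""
        history = self._versions[symbol]
        if as_of is None:
            version = len(history)
        elif isinstance(as_of, str):
            version = self._snapshots[as_of][symbol]
        else:
            version = as_of
        return history[version - 1]


store = ToyVersionStore()
store.write('AAPL', [1, 2, 3], metadata={'source': 'Quandl'})
store.snapshot('eod')
store.write('AAPL', [1, 2, 3, 4])

latest = store.read('AAPL')
first = store.read('AAPL', as_of=1)
at_snapshot = store.read('AAPL', as_of='eod')
```

Old versions are never overwritten in this model, which is what makes reading a previous version or a named snapshot a simple lookup.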
Arctic currently works with:
- Python 2.7
- pymongo >= 3.0
- Pandas
- MongoDB >= 2.4.x
Arctic wouldn't be possible without the work of the AHL Data Engineering Team, including:
- Richard Bounds
- James Blackburn
- Vlad Mereuta
- Tom Taylor
- Tope Olukemi
- Drake Siard
- Slavi Marinov
- Wilfred Hughes
- ... and many others ...
Contributions welcome!
Arctic is licensed under the GNU LGPL v2.1, a copy of which is included in LICENSE.