Time series analysis with Apache Spark based on Chronix

Chronix Spark

An Apache Spark RDD implementation for time series processing - based on Chronix.

Design Principles

  • A ChronixRDD is a collection of univariate time series. Each of them has its own vector of timestamps - they are not aligned on one common vector of timestamps.

  • Time series are multi-dimensional. Each time series is associated with one or more dimensions. The identity of a time series is the combination of some of its dimension values.

  • ChronixRDD has its own storage engine based on Solr Cloud and the Chronix format. The time series data is therefore stored in a storage-efficient, sharded form and is equipped with low-level queries that allow predicate pushdown.
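
The first two principles can be made concrete with a small data-model sketch. The code below is illustrative only: `TimeSeriesSketch`, `Series`, `identity` and `dims` are hypothetical names and are not part of the Chronix Spark API. It assumes timestamps are epoch milliseconds and that the identity is formed from a caller-chosen subset of dimensions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: these types are NOT part of the Chronix Spark API. */
final class TimeSeriesSketch {

    /** A univariate series that carries its own timestamp vector. */
    static final class Series {
        final Map<String, String> dimensions; // e.g. host, metric, unit
        final long[] timestamps;              // assumed epoch millis, ascending
        final double[] values;                // values[i] belongs to timestamps[i]

        Series(Map<String, String> dimensions, long[] timestamps, double[] values) {
            if (timestamps.length != values.length) {
                throw new IllegalArgumentException("timestamps and values must have equal length");
            }
            this.dimensions = dimensions;
            this.timestamps = timestamps;
            this.values = values;
        }

        /** Builds the identity from a chosen subset of dimension values. */
        String identity(String... identityDimensions) {
            StringBuilder key = new StringBuilder();
            for (String name : identityDimensions) {
                if (key.length() > 0) {
                    key.append('|');
                }
                key.append(name).append('=').append(dimensions.get(name));
            }
            return key.toString();
        }

        /** Arithmetic mean of the values; NaN for an empty series. */
        double mean() {
            if (values.length == 0) {
                return Double.NaN;
            }
            double sum = 0;
            for (double v : values) {
                sum += v;
            }
            return sum / values.length;
        }
    }

    /** Helper: builds an ordered dimension map from alternating key/value arguments. */
    static Map<String, String> dims(String... keyValues) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            map.put(keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    public static void main(String[] args) {
        // Two series with different lengths and different timestamps.
        Series cpu = new Series(dims("host", "web-1", "metric", "cpu", "unit", "percent"),
                new long[] {1000, 2000, 3000}, new double[] {10, 20, 30});
        Series heap = new Series(dims("host", "web-1", "metric", "heap", "unit", "MB"),
                new long[] {1500, 4000}, new double[] {512, 640});

        // "unit" is a dimension but is not part of the identity here.
        System.out.println(cpu.identity("host", "metric") + " mean=" + cpu.mean());
        System.out.println(heap.identity("host", "metric") + " mean=" + heap.mean());
        // prints:
        // host=web-1|metric=cpu mean=20.0
        // host=web-1|metric=heap mean=576.0
    }
}
```

The two series never share a timestamp vector, so each one can be stored and processed on its own.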

FAQ

How does Chronix Spark compare to Spark-TS?

  • Spark-TS provides no specific time series storage; it uses the Spark persistence mechanisms instead. This leads to less efficient storage usage and fewer opportunities for performance optimization via predicate pushdown.

  • In contrast to Spark-TS, Chronix does not align all time series values on one vector of timestamps. This leads to greater flexibility in time series aggregation.

  • Chronix provides multi-dimensional time series, which are very useful for data warehousing and APM.

  • Chronix supports Datasets, as this will be an important Spark API in the near future. However, Chronix currently doesn’t support an IndexedRowMatrix for SparkML.

  • Chronix is written purely in Java. There is no explicit support for Python or Scala yet.

  • Chronix does not support a ZonedTime, as this would make things considerably more complicated.
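
To illustrate the point about timestamp alignment, the following generic sketch contrasts two approaches on plain arrays. It is not taken from the Spark-TS or Chronix code bases; `AlignmentSketch`, `windowMean`, `alignTo` and `unionTimestamps` are hypothetical names. It assumes ascending epoch-millisecond timestamps and shows that projecting a sparse series onto a shared timestamp vector introduces gaps (here `NaN`), whereas a per-series window aggregation needs no common vector.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative only: not the Spark-TS or Chronix implementation. */
final class AlignmentSketch {

    /** Mean per time window, computed from the series' own timestamps only. */
    static TreeMap<Long, Double> windowMean(long[] ts, double[] vs, long windowMillis) {
        TreeMap<Long, double[]> acc = new TreeMap<>(); // window start -> {sum, count}
        for (int i = 0; i < ts.length; i++) {
            long start = Math.floorDiv(ts[i], windowMillis) * windowMillis;
            double[] sumCount = acc.computeIfAbsent(start, k -> new double[2]);
            sumCount[0] += vs[i];
            sumCount[1] += 1;
        }
        TreeMap<Long, Double> out = new TreeMap<>();
        for (Map.Entry<Long, double[]> e : acc.entrySet()) {
            out.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return out;
    }

    /** Sorted union of several timestamp vectors, i.e. one shared vector. */
    static long[] unionTimestamps(long[]... vectors) {
        TreeSet<Long> all = new TreeSet<>();
        for (long[] vector : vectors) {
            for (long t : vector) {
                all.add(t);
            }
        }
        long[] out = new long[all.size()];
        int i = 0;
        for (long t : all) {
            out[i++] = t;
        }
        return out;
    }

    /** Projects a series onto a shared timestamp vector; missing points become NaN. */
    static double[] alignTo(long[] common, long[] ts, double[] vs) {
        double[] out = new double[common.length];
        Arrays.fill(out, Double.NaN);
        for (int i = 0; i < ts.length; i++) {
            int idx = Arrays.binarySearch(common, ts[i]);
            if (idx >= 0) {
                out[idx] = vs[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] cpuTs = {1000, 2000, 3000};
        double[] cpuVs = {10, 20, 30};
        long[] heapTs = {1500, 4000};
        double[] heapVs = {512, 640};

        // Per-series aggregation: no shared timestamp vector required.
        System.out.println(windowMean(cpuTs, cpuVs, 2000));
        // prints {0=10.0, 2000=25.0}

        // Alignment on a shared vector: the sparse series is mostly gaps.
        long[] common = unionTimestamps(cpuTs, heapTs);
        System.out.println(Arrays.toString(alignTo(common, heapTs, heapVs)));
        // prints [NaN, 512.0, NaN, NaN, 640.0]
    }
}
```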