femtotrader edited this page Oct 6, 2018 · 8 revisions

Overview

There once was a time when RRD was the only time series database available. This has changed, and there are now tons to choose from, each with its own benefits and drawbacks.

I hope to give a comprehensive list here and show what Newts may bring to the table. I will confess I have an inadequate understanding of almost all of these so I might make mistakes about their purpose or how they work. Please feel free to directly correct this, or send me an email if you want me to fix it.

One thing I don't list under limitations is library or program support. If one of these projects sounds like one you want to use, check whether there is a plugin for Grafana, whether it works with Cubism.js, or whether it supports whatever else you're going to be using for graphing.

Who am I?

Just some guy who needed a TSDB for some projects I work on, so I wanted to figure out which one I should use, and now I don't know. I like the people doing OpenNMS things so I thought I would spam their wiki with this in the hopes it benefits everyone.

RRD

Who

Tobias Oetiker (http://oss.oetiker.ch/rrdtool/)

Technology

File-based datastore written in C, with as-you-write aggregation. Supports built-in functions like min/max/avg for different aggregation types. Has APIs for many languages, making it popular for both data collection and graphing in open-source projects.
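To make "as-you-write aggregation" concrete, here is a minimal sketch of the idea: a fixed-size archive that consolidates every few raw samples at write time and overwrites its oldest row when full. This is not RRD's file format or API. The `RoundRobinArchive` class, its parameters, and the `average` helper are all names I made up for illustration.

```python
import math


class RoundRobinArchive:
    """Hypothetical fixed-size archive that consolidates samples on write."""

    def __init__(self, rows, steps, func):
        self.rows = [math.nan] * rows   # fixed storage, allocated up front
        self.steps = steps              # raw samples per consolidated row
        self.func = func                # consolidation function (min, max, average)
        self.pending = []               # raw samples waiting to be consolidated
        self.next_row = 0               # next slot to overwrite

    def update(self, value):
        self.pending.append(value)
        if len(self.pending) == self.steps:
            # Aggregate now, at write time; the raw samples are then discarded.
            self.rows[self.next_row] = self.func(self.pending)
            self.next_row = (self.next_row + 1) % len(self.rows)  # wrap around
            self.pending = []


def average(values):
    return sum(values) / len(values)


archive = RoundRobinArchive(rows=3, steps=2, func=average)
for sample in [1, 3, 10, 20, 5, 7, 100, 200]:
    archive.update(sample)

print(archive.rows)  # [150.0, 15.0, 6.0]
```

The fourth pair of samples wraps around and overwrites the oldest row, which is why the storage never grows.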

Used by

Cacti, Nagios, Zenoss, MRTG, munin

Limitations

  • Reboots need "spike removal" or other manual intervention to fix the data after counter resets (this may apply to all TSDBs)
  • Heavy disk writes with 5 IOPS per value updated
  • Scaling horizontally is a completely manual operation. Since it is a library, there is no daemon, so you can't run something on multiple systems to gain redundancy or split load
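The counter-reset problem in the first bullet can be sketched roughly like this. It assumes the collector sees `(timestamp, counter)` pairs and treats any decrease as a reset rather than a real value. The function name and the optional `max_rate` sanity limit are my own invention, and counter wrap-around (for example 32-bit SNMP counters) is deliberately ignored.

```python
def counter_to_rates(samples, max_rate=None):
    """Convert (timestamp, counter) pairs into per-second rates.

    A decrease in the counter is treated as a reset (e.g. a reboot) and the
    interval is reported as None instead of a negative or huge spike.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0 or c1 < c0:
            rates.append((t1, None))    # reset or bad timestamp: unknown
            continue
        rate = (c1 - c0) / elapsed
        if max_rate is not None and rate > max_rate:
            rate = None                 # implausible jump: treat as unknown
        rates.append((t1, rate))
    return rates


samples = [(0, 1000), (300, 4000), (600, 200), (900, 3200)]
print(counter_to_rates(samples))
# [(300, 10.0), (600, None), (900, 10.0)]
```

Storing "unknown" for the interval that spans the reboot is what avoids the spike. The cost is a gap in the graph.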

License

GPL2

JRobin

Who

Mostly OpenNMS these days; historically the JRobin project (http://oldwww.jrobin.org/)

Technology

File-based datastore written in Java. Supports built-in functions like min/max/avg for different aggregation types. Writes an endian-agnostic file format.

Limitations

  • Has the same problems as RRD
  • Has fewer aggregation functions written for it (this may have changed)

License

LGPL

Whisper

Who

Graphite (http://graphite.wikidot.com/whisper)

Technology

File-based datastore written in Python. It has different requirements for time series than RRD, which the authors spell out on their website. Essentially, they need to be able to skip updates, which is impossible with RRD: RRD considers a late update to be just the next update in the file, so in order to "skip" some you would need to write NaN until you get to the new time.

You also can't work backwards in RRD to post a previous value that was missed. You could compensate for this in your RRD library, but it would be very I/O intensive, so they wrote a new TSDB.
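A rough way to picture the difference: if the timestamp, rather than arrival order, decides which slot a value lands in, then skipped and late updates both work without padding. The sketch below illustrates that idea only. It is not Whisper's actual on-disk layout, and the `SlotArchive` class is hypothetical.

```python
class SlotArchive:
    """Hypothetical fixed-size archive where the timestamp picks the slot."""

    def __init__(self, step, slots):
        self.step = step                # seconds covered by each slot
        self.slots = [None] * slots     # each entry: (interval_start, value)

    def _index(self, timestamp):
        return (timestamp // self.step) % len(self.slots)

    def write(self, timestamp, value):
        interval_start = timestamp - (timestamp % self.step)
        self.slots[self._index(timestamp)] = (interval_start, value)

    def read(self, timestamp):
        interval_start = timestamp - (timestamp % self.step)
        entry = self.slots[self._index(timestamp)]
        # A slot holding a different interval means nothing was written here.
        if entry is None or entry[0] != interval_start:
            return None
        return entry[1]


archive = SlotArchive(step=60, slots=5)
archive.write(0, 1.0)
archive.write(180, 4.0)   # skips the intervals at 60 and 120, no NaN padding
archive.write(60, 2.0)    # late arrival backfills an earlier interval

print([archive.read(t) for t in (0, 60, 120, 180)])
# [1.0, 2.0, None, 4.0]
```

Each write touches exactly one slot regardless of how many intervals were skipped, which is the I/O saving over padding with NaN.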

Limitations

  • Slow, but not slow enough to stop people using it for polling intervals of 1 minute or less
  • Can't scale horizontally. Doesn't scale well in general (needs RAIDed SSDs or a big SAN)
  • Pain to install (?)

License

Apache V2

OpenTSDB

Who

http://opentsdb.net

The project was started by StumbleUpon.

Technology

Runs on Hadoop and HBase. Data is stored exactly as provided without aggregation and without removing old data. According to their FAQ they are considering Cassandra support now.

Limitations

  • Ironically, it manipulates the data returned so it might not be exactly as it is stored
  • People say it's a pain to run an HBase cluster. I imagine it's easier now with Puppet/Chef/etc.
  • According to InfluxDB's write-up of why the project was started, "it was too easy to create hot spots that would kill performance"

License

LGPL 2.1+ and GPL3+ (dual licensed)

KairosDB

Who

http://kairosdb.github.io/

Technology

Can use an in-memory H2 database (for testing) or Cassandra. Supports aggregation using built-in functions like min/max/avg. This project started as a fork of OpenTSDB due to differing requirements.

Limitations

  • unknown

License

Apache V2

InfluxDB

Who

https://influxdb.com/

Technology

Written in Go. No underlying database (like Cassandra or HBase). Has the ability to compute queries continuously and send the data to the client.

Limitations

  • Clustering, Replication and HA are in "alpha" state

License

MIT

Prometheus

Who

SoundCloud (http://prometheus.io)

Technology

Written in Go. File-based datastore with external indexing and an in-memory cache. Made to be better and faster than Graphite but easier than something like OpenTSDB. InfluxDB didn't exist when they started building it. When the two are compared, InfluxDB uses more disk space per metric.

Limitations

  • Horizontal scaling isn't possible (they acknowledge this, but seem to be working on being the best single-system monitoring solution)

License

Apache V2

Druid

Who

Metamarkets open-sourced Druid in 2013 (http://druid.io/)

Technology

Java-based, built on Hadoop/HDFS/ZooKeeper. Druid was designed to handle analytics for online advertising. It doesn't bill itself as strictly for time series, but that isn't a reason to exclude it. The biggest barrier you might face is the complexity. From what I can see, it can do much more than the other platforms mentioned here, so setup and usage may be difficult.

You might start off with something like this:

https://github.com/Quantiply/druid-vagrant

Limitations

  • unknown

License

Apache V2

Blueflood

Who

Rackspace (http://blueflood.io/)

Technology

Java-based, Cassandra-backed system. Has fixed rollup intervals for aggregation with a few function types (min/max/variance/average).

Limitations

  • Seems to be a work in progress, or perhaps it's exactly what Rackspace needed, without the extra bits
  • No API. The part that handled that was not open-sourced, so you need to write to the database in Java or write your own compatibility layer

License

Apache V2

Cyanite

Who

http://cyanite.io

Technology

Clojure application, Cassandra-backed system. Seems to be based around Graphite.

Limitations

  • Lots of open issues
  • Might only work with Graphite currently

License

Attribution, Share-alike

Level-TSD

Who

InMobi (https://github.com/InMobi/level-tsd)

Technology

An embedded database based on LevelDB that is tailored to Graphite. This was written after Ceres was considered and limitations were found in its use (specifically with inode usage). They also tested PostgreSQL arrays as a datastore and found them to be faster than Whisper and Ceres, but the design created a need for constant VACUUM.

While this is a single-system datastore, they were able to scale to 500K metrics/minute on RAID 5 with 4x15K drives. http://www.inmobi.com/blog/2014/01/24/extending-graphites-mileage

Limitations

  • No horizontal scalability
  • LevelDB only allows one process to access the database at a time, so you can't spread the load across multiple processes
  • Project hasn't been updated in 5 months

License

Apache V2

Ceres

Who

Graphite (https://github.com/graphite-project/ceres)

Technology

Distributed database written in Python. Not a fixed-size database; instead, aggregation and expiration are done by maintenance plugins in Carbon. It is in a partially usable state (people report using it in "production", but documentation is incomplete and development is slow).

Limitations

  • ??

License

Apache V2

Newts

Who

The OpenNMS Group (http://opennms.org)

Technology

Java-based, Cassandra-backed system. It uses delayed aggregation (aggregation at read) to keep writes fast.
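As I understand it, "aggregation at read" means writes just append raw samples, and the min/max/avg work only happens when somebody asks for a graph. Here is a minimal in-memory sketch of that idea. It is not the Newts API. The `RawStore` class and its `query` parameters are hypothetical, and a real system would keep the samples in Cassandra rather than a Python list.

```python
from collections import defaultdict


class RawStore:
    """Hypothetical store: samples are kept as written, aggregated on query."""

    def __init__(self):
        self.samples = []   # (timestamp, value) pairs, unmodified

    def write(self, timestamp, value):
        self.samples.append((timestamp, value))   # cheap append, no rollup

    def query(self, start, end, resolution, func):
        # Group samples into buckets of `resolution` seconds, then aggregate.
        buckets = defaultdict(list)
        for timestamp, value in self.samples:
            if start <= timestamp < end:
                offset = (timestamp - start) // resolution
                buckets[start + offset * resolution].append(value)
        return [(b, func(values)) for b, values in sorted(buckets.items())]


store = RawStore()
for t, v in [(0, 1.0), (60, 3.0), (120, 5.0), (180, 7.0), (240, 9.0)]:
    store.write(t, v)

print(store.query(0, 300, 120, max))
# [(0, 3.0), (120, 7.0), (240, 9.0)]
print(store.query(0, 300, 120, lambda vs: sum(vs) / len(vs)))
# [(0, 2.0), (120, 6.0), (240, 9.0)]
```

The same raw data can be queried at any resolution with any function. The trade-off is that the store keeps every sample until something expires it.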

Limitations

  • ?? At a guess, runaway disk usage until read/rollup intervals happen. Maybe there will be a maintenance task that can run at quiet times or late hours to aggregate graphs.

License

Apache V2

Links

Another comparison with others I hadn't found yet

http://www.erol.si/2015/01/the-complete-list-of-all-timeseries-databases-for-your-iot-project/

A breakdown from someone who has used some of the databases

https://lobste.rs/s/kjn5an/recommended_reading_for_building_time_series_databases

Ultimate Comparison of open source TSDBs

https://tsdbbench.github.io/Ultimate-TSDB-Comparison/

It's based on the publication "Survey and Comparison of Open Source Time Series Databases" [slides].