Performance

Highlights

On a laptop:

  • Ingestion rate: safe for up to 5K samples per second; batching is the safest way to ingest
  • Query rate: good for queries over ~10 million datapoints in about a second

On a server:

  • Ingestion rate: safe for up to 10K samples per second; again, batching is recommended
  • Query rate: each core is good for roughly 2 million datapoints per second

On a Raspberry Pi:

  • Ingestion rate: good for 100 - 300 samples per second
  • Query rate: TBD

Explanations and scenario outlines

Sybil's performance can be broken into three parts: record ingestion, record collation (digestion), and querying. Sybil is fastest at querying, but it is by no means slow at ingestion. Below is an example schema and timings.

Machine Specs

These numbers come from an Intel i7-4500U (benchmark data), a dual-core hyperthreaded laptop CPU presenting 4 logical CPUs, with 8GB RAM and a modestly performing SSD.

Schema

  • time: high cardinality int
  • weight: low cardinality int
  • random int: low cardinality int
  • ping: low cardinality int
  • host name: low cardinality string
  • status code: low cardinality string
  • groups: low cardinality strings inside a set column
  • host_index: low cardinality int

Commands to run ingestion test

In multiple shells, run a loop that generates samples and sends them into sybil:

# generate 1000 samples per batch and ingest them
bash scripts/steady_ingest.sh 1000
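
For reference, a minimal sketch of what such a loop might do is below. The JSON field names, table name, and batch size are illustrative and do not reproduce scripts/steady_ingest.sh exactly; it assumes sybil's ingest subcommand reads newline-delimited JSON records from stdin, as described in the sybil README.

# illustrative sketch only - field names and table name are made up
BATCH=${1:-1000}
while true; do
  for i in $(seq $BATCH); do
    echo "{\"time\": $(date +%s), \"weight\": $((RANDOM % 100)), \"ping\": $((RANDOM % 50)), \"host\": \"host_$((RANDOM % 10))\", \"status\": \"200\"}"
  done | ./sybil ingest -table ingest_test
  sleep 0.01
done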

Summary

Batched writing makes a difference!

With 10 records per batch

  • comfortably ingest 500K events per hour with 10 fields per event
  • comfortably run digestion once every 10 seconds
  • comfortably run queries on 5 - 10 million datapoints in under a second

The above operations can all be done simultaneously while keeping the machine under 10% load.

With 1,000 records per batch

  • comfortably ingest 5K records per second, for a total of 15M+ per hour
  • digest every 10 seconds
  • comfortably run queries on 5 - 10 million datapoints in under a second

The above operations can all be done simultaneously while keeping the machine under 10 - 20% load. Adjusting ingestion to 10K records per second will bump machine load to 30 - 40%.
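
To reproduce the two scenarios above, run the same ingestion script with different batch sizes (assuming, as the comment in the earlier example suggests, that the first argument is the samples-per-batch count):

# roughly the "10 records per batch" scenario
bash scripts/steady_ingest.sh 10

# roughly the "1,000 records per batch" scenario
bash scripts/steady_ingest.sh 1000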

Row Ingestion

Ingesting 10,000 records of the above schema (generating and writing 10 records at a time, with a 10ms pause between batches) takes approximately 60 seconds on an SSD and keeps my laptop's 4 CPUs at about 10 - 20% load. That translates to a comfortable ingest rate of 600K events per hour on commodity hardware.

If 600K events per hour seems like a small amount, consider that each event has about 10 fields (some are left unlisted above because they don't have semantic meaning).

Writing 1,000 records per batch (with 10ms pauses between batches) takes 100 seconds to ingest 1 million datapoints, translating to 10K records per second (and a machine load of 30 - 40%).
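
As a quick sanity check, these rates follow directly from the timings above:

# back-of-envelope arithmetic for the rates quoted above
echo $((10000 / 60))          # ~166 records/sec (10,000 records in ~60 seconds)
echo $((10000 / 60 * 3600))   # ~600K records/hour
echo $((1000000 / 100))       # 10K records/sec (1 million records in ~100 seconds)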

Row Digestion

Reading, digesting, and collating 1,000 of the row store files generated above (10 records each) into a half-filled block (one already containing 37K of its 65K records) takes ~500ms, with the time varying based on how full the block currently is. Filling a complete 65K-record block takes about 600 - 800ms, depending on how many columns are involved (in this example there are 10 columns, mostly low cardinality).
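
Digestion is triggered separately from ingestion, so the "digest every 10 seconds" cadence from the summary can be driven by a trivial loop. A minimal sketch, assuming the sybil digest -table <name> invocation from the sybil README and the hypothetical ingest_test table from the earlier example:

# collate accumulated row store files into column blocks every 10 seconds
while true; do
  ./sybil digest -table ingest_test
  sleep 10
done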

Querying

As expected of a column store, query times depend on how many columns have to be loaded off disk. By default, sybil will try to skip columns that are not relevant to the current query (and not bother loading them off disk). For example, running a query that does a group by host on the above data will take less time than one that groups by host and status, because the status column then has to be read off disk as well.
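
As a concrete illustration, the two queries might look like the following (flag names are taken from my reading of the sybil README - verify against sybil query -help; the table and column names are the illustrative ones used earlier):

# only the host and ping columns need to be read off disk
./sybil query -table ingest_test -group host -int ping -op avg

# the status column also has to be read, so this is somewhat slower
./sybil query -table ingest_test -group host,status -int ping -op avg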

Sybil also tries to skip blocks that don't match the query filters - for example, it will not load blocks off disk that fall outside the query time window (if one is supplied) - which translates into significant savings in query time as the database grows. However, in order to skip a block, sybil does have to load its metadata off disk. Once there are hundreds of blocks (or millions of samples), that can add an extra 50 - 100ms per query just to read block metadata.
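
For example, a query restricted to the last hour should let sybil skip any block whose metadata shows it lies entirely outside that window. The sketch below assumes an -int-filter "column:op:value" filter syntax, which I have not verified here - check sybil query -help for the exact flag:

# only blocks overlapping the last hour should be loaded off disk
./sybil query -table ingest_test -group host -int ping -op avg \
  -int-filter "time:gt:$(($(date +%s) - 3600))"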

Once the blocks to read off disk are chosen, loading and aggregating them takes up the main portion of the query time. For example, doing a group by host on 9 million samples on my laptop takes about 900ms. Adding another string column to the group by can increase that to 1,400ms. As you can see, query time grows with how much data is being loaded off disk.

For comparison, MongoDB can take upwards of 10 - 30 seconds to scan those same 9 million documents and do the same aggregations.