Performance

Highlights

On a laptop:

  • Ingestion rate: safe for up to 5K samples per second; batching is the safest way to ingest
  • Query rate: good for queries over ~10 million datapoints in about a second

On a server:

  • Ingestion rate: safe for up to 10K samples per second; again, batching is recommended
  • Query rate: each core is good for roughly 2 million datapoints per second

On a Raspberry Pi:

  • Ingestion rate: good for 100 - 300 samples per second
  • Query rate: TBD

Explanations and scenario outlines

Sybil's performance can be broken into three parts: record ingestion, record collation (digestion), and querying. Sybil is fastest at querying, but it is by no means slow at ingestion. Below is an example schema and timings.

Machine Specs

These numbers come from an Intel i7-4500U (benchmark data), a dual-core hyperthreaded laptop CPU presenting 4 logical CPUs, with 8GB RAM and a modestly performing SSD.

Schema

  • time: high cardinality int
  • weight: low cardinality int
  • random int: low cardinality int
  • ping: low cardinality int
  • host name: low cardinality string
  • status code: low cardinality string
  • groups: low cardinality strings inside a set column
  • host_index: low cardinality int

Commands to run ingestion test

In multiple shells, run a loop that generates samples and sends them into sybil:

# generate 1000 samples per batch and ingest them
bash scripts/steady_ingest.sh 1000
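
For reference, a minimal sketch of what such a loop might do is below. The JSON field names, table name, and batch size are illustrative and do not reproduce scripts/steady_ingest.sh exactly; it assumes sybil's ingest subcommand reads newline-delimited JSON records from stdin, as described in the sybil README.

# illustrative sketch only - field names and table name are made up
BATCH=${1:-1000}
while true; do
  for i in $(seq $BATCH); do
    echo "{\"time\": $(date +%s), \"weight\": $((RANDOM % 100)), \"ping\": $((RANDOM % 50)), \"host\": \"host_$((RANDOM % 10))\", \"status\": \"200\"}"
  done | ./sybil ingest -table ingest_test
  sleep 0.01
done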

Summary

Batched writing makes a difference!

With 10 records per batch

  • comfortably ingest 500K events per hour with 10 fields per event
  • comfortably run digestion once every 10 seconds
  • comfortably run queries on 5 - 10 million datapoints in under a second

The above operations can all be done simultaneously while keeping the machine under 10% load.

With 1,000 records per batch

  • comfortably ingest 5K records per second, for a total of 15M+ per hour
  • digest every 10 seconds
  • comfortably run queries on 5 - 10 million datapoints in under a second

The above operations can all be done simultaneously while keeping the machine under 10 - 20% load. Adjusting ingestion to 10K records per second will bump machine load to 30 - 40%.
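
To reproduce the two scenarios above, run the same ingestion script with different batch sizes (assuming, as the comment in the earlier example suggests, that the first argument is the samples-per-batch count):

# roughly the "10 records per batch" scenario
bash scripts/steady_ingest.sh 10

# roughly the "1,000 records per batch" scenario
bash scripts/steady_ingest.sh 1000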

Row Ingestion

Ingesting 10,000 records of the above schema (generating and writing 10 records at a time, with a 10ms pause between batches) takes approximately 60 seconds on an SSD and keeps my laptop's 4 CPUs at about 10 - 20% load. That translates to a comfortable ingest rate of 600K events per hour on commodity hardware.

If 600K events per hour seems like a small amount, consider that each event has about 10 fields (some are left unlisted above because they don't have semantic meaning).

Writing 1,000 records per batch (with 10ms pauses between batches) takes 100 seconds to ingest 1 million datapoints, translating to 10K records per second (and a machine load of 30 - 40%).
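
As a quick sanity check, these rates follow directly from the timings above:

# back-of-envelope arithmetic for the rates quoted above
echo $((10000 / 60))          # ~166 records/sec (10,000 records in ~60 seconds)
echo $((10000 / 60 * 3600))   # ~600K records/hour
echo $((1000000 / 100))       # 10K records/sec (1 million records in ~100 seconds)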

Row Digestion

Reading, digesting, and collating 1,000 of the row store files generated above (10 records each) into a half-filled block (one already containing 37K of its 65K records) takes ~500ms, with the time varying based on how full the block currently is. Filling a complete 65K-record block takes about 600 - 800ms, depending on how many columns are involved (in this example there are 10 columns, mostly low cardinality).
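
Digestion is triggered separately from ingestion, so the "digest every 10 seconds" cadence from the summary can be driven by a trivial loop. A minimal sketch, assuming the sybil digest -table <name> invocation from the sybil README and the hypothetical ingest_test table from the earlier example:

# collate accumulated row store files into column blocks every 10 seconds
while true; do
  ./sybil digest -table ingest_test
  sleep 10
done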

Querying

As expected of a column store, query times depend on how many columns have to be loaded off disk. By default, sybil will try to skip columns that are not relevant to the current query (and not bother loading them off disk). For example, running a query that does a group by host on the above data will take less time than one that groups by host and status, because the status column then has to be read off disk as well.
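
As a concrete illustration, the two queries might look like the following (flag names are taken from my reading of the sybil README - verify against sybil query -help; the table and column names are the illustrative ones used earlier):

# only the host and ping columns need to be read off disk
./sybil query -table ingest_test -group host -int ping -op avg

# the status column also has to be read, so this is somewhat slower
./sybil query -table ingest_test -group host,status -int ping -op avg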

Sybil also tries to skip blocks that don't match the query filters - for example, it will not load blocks off disk that fall outside the query time window (if one is supplied) - which translates into significant savings in query time as the database grows. However, in order to skip a block, sybil does have to load its metadata off disk. Once there are hundreds of blocks (or millions of samples), that can add an extra 50 - 100ms per query just to read block metadata.
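
For example, a query restricted to the last hour should let sybil skip any block whose metadata shows it lies entirely outside that window. The sketch below assumes an -int-filter "column:op:value" filter syntax, which I have not verified here - check sybil query -help for the exact flag:

# only blocks overlapping the last hour should be loaded off disk
./sybil query -table ingest_test -group host -int ping -op avg \
  -int-filter "time:gt:$(($(date +%s) - 3600))"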

Once the blocks to read off disk are chosen, loading and aggregating them takes up the main portion of the query time. For example, doing a group by host on 9 million samples on my laptop takes about 900ms. Adding another string column to the group by can increase that to 1,400ms. As you can see, query time grows with how much data is being loaded off disk.

For comparison, MongoDB can take upwards of 10 - 30 seconds to scan those same 9 million documents and do the same aggregations.