Citus TPC-H tests
The idea is to compare different PostgreSQL Cloud-based offerings. This repository contains a partial implementation of TPC-H with Direct Loading support for PostgreSQL and some scripting to orchestrate highly concurrent tests.
Running the benchmarks
To run the Citus TPC-H benchmark, follow those steps:
Review the target infrastructure setup:
Review the benchmark schedule and jobs:
Manually start a Citus cluster, and register its DSN in
Give a name to your test setup:
./control.py setup infra.ini tpch.ini
That way, it's possible to run several benchmarks in parallel, with maybe very different infrastructure properties (
db.t4.16xlargecomes to mind) and scale factors etc.
After all we're using Cloud providers, where the hardware is unlimited. We can run all the tests in parallel, in some level of isolation.
This command returns a name, keep that somewhere, you'll need it. Every becnhmark you want to run is given a name.
Choose what systems you want to compare/exercise in this benchmark
./control.py register <schedule> citus rds aurora
Run the tests
./control.py benchmark <name>
The tests are now running. It's possible to see what's happening, of course. Try the following commands:
./control.py tail <name> ./control.py status <name> ./control.py update <name>
For those commands to work you need to register a PostgreSQL database that is local to your controler node (usually your laptop):
[results] section should contain a
where you can put the connection URI
It's possible to list currently known benchmark runs:
$ ./control.py list enticingly_detect currently has 3 systems registered, 3 running aurora: stage 13/vacuum analyze in a day with 184 queries citus: stage 16/vacuum analyze in a day with 1120 queries rds: stage 15/vacuum analyze in a day with 116 queries engage_busily is not currently running promise_never currently has 2 systems registered, 2 running aurora: stage 11/vacuum analyze in 9 hours with 36 queries citus: stage 8/vacuum analyze in 2 hours with 112 queries
Running the benchmarks locally
It's possible to use the TPC-H benchmarks driver with a local PostgreSQL
database, without all the Cloud infrastructure in place. To that end, just
tpch.py command directly, without using the
Here's an example of doing that:
$ DSN=postgresql:///tpch ./tpch.py benchmark pgsql --schedule quick 2018-02-12 11:20:22,544 INFO pgsql: starting benchmark sleep_slowly 2018-02-12 11:20:22,554 INFO pgsql: starting schedule initdb 2018-02-12 11:20:22,554 INFO pgsql: initializing the TPC-H schema 2018-02-12 11:20:22,554 INFO pgsql: create initial schema, pgsql variant psql:./schema/cardinalities.sql:1: NOTICE: view "cardinalities" does not exist, skipping 2018-02-12 11:20:22,707 INFO pgsql: loading 1 steps of data using 16 CPU:  2018-02-12 11:20:28,836 INFO pgsql: loaded step 1/100 for Scale Factor 1 2018-02-12 11:20:28,838 INFO pgsql: vacuum analyze 2018-02-12 11:20:30,758 INFO pgsql: loaded 1 steps of data in 6.12964s, using 16 CPU 2018-02-12 11:20:30,759 INFO pgsql: install constraints from 'schema/tpch-pkeys.sql' 2018-02-12 11:20:30,912 INFO pgsql: install constraints from 'schema/tpch-index.sql' 2018-02-12 11:20:31,253 INFO pgsql: install constraints from 'schema/tpch-fkeys.sql' 2018-02-12 11:20:31,379 INFO pgsql: imported 1 initial steps in 8.82457s, using 16 CPU 2018-02-12 11:20:31,379 INFO pgsql: starting schedule stream 2018-02-12 11:20:31,379 INFO pgsql: Running TPCH with 16 CPUs for 5s, stream 1 4 6 12 2018-02-12 11:20:32,518 INFO pgsql: 1 query streams executed in 1.13002s 2018-02-12 11:20:34,036 INFO pgsql: 17 query streams executed in 2.64805s 2018-02-12 11:20:35,038 INFO pgsql: 17 query streams executed in 3.64983s 2018-02-12 11:20:36,575 INFO pgsql: 33 query streams executed in 5.1877s 2018-02-12 11:20:38,393 INFO pgsql: executed 49 streams (196 queries) in 6.99254s, using 16 CPU
To be able to reproduce this locally, you need to firts create a
tpch-results database and configure it in the
Analysing the Results
tpch.py benchmark driver collects timings as the benchmark runs in a
local PostgreSQL database named
tpch-results. This database is local to
each loader instance, so an extra step is needed to collect the results
Here's the results of running a series of local benchmark as done
previously, where the name of the run as been assigned as
tpch-results# select system, schedule, job, duration, count from results where run = 'sleep_slowly'; system │ schedule │ job │ duration │ count ════════╪══════════╪═══════════════════════╪═════════════════╪═══════ pgsql │ quick │ drop tables │ @ 0.060019 secs │ 1 pgsql │ quick │ create tables │ @ 0.042395 secs │ 1 pgsql │ quick │ initdb │ @ 6.12964 secs │ 1 pgsql │ quick │ schema/tpch-pkeys.sql │ @ 0.144589 secs │ 1 pgsql │ quick │ schema/tpch-index.sql │ @ 0.331428 secs │ 1 pgsql │ quick │ schema/tpch-fkeys.sql │ @ 0.11635 secs │ 1 pgsql │ quick │ stream │ @ 6.992539 secs │ 196 (7 rows)
tpch.py driver is meant to be called on a remote machine, the loader
node, and the DSN is then passed in the environment by the main
Benchmark Schedule and Jobs
This benchmark is based on TPC-H and meant to incrementally reach the Scale
Factor, by implementing the data load in multiple phases. It is possible to
configure several load phases in the file
tpch.ini, following this
[scale] cpu = 16 factor = 300 children = 100 [schedule] full = initdb, stream, phase1, stream quick = initdb, stream stream = stream [initdb] type = load steps = 1..10 [phase1] type = load steps = 11..30 [phase2] type = load steps = 31..100 cpu = 2 [stream] type = stream queries = 1 4 6 12 duration = 5
With such a setup, the target database size is 300 GB (the TPC-H scale factor unit is roughly 1GB), and we can load the data using the three phases arbitrarily named initdb, phase1 and phase2.
To run the
full schedule on all systems in parallel, it's possible to run
the following command:
make -j 4 SCHEDULE=full make benchmark
This command then connects to each of the controller nodes for the tested systems, and runs the following command there:
DSN=... ./tpch.py benchmark <system name> --name <name> --schedule full
The system name is going to be replaced by each of the systems considered
in this benchmark, currently that's _citus, pgsql, rds, and aurora.
The name of the benchmark is computed once on the controller node (the one
where you're tying the
make commands) then the same name is used on every
node. That allows to then easily merge all the tracked timings to further
analyze them as part of the same benchmark run and configuration.
initdb job is an hard-coded special job name that does several
Initialiazing the TPC-H database
The initdb job consists of the following actions:
- create the schema
- install the
cardinalitiesview, a simple COUNT wrapper
- load the configured steps of data, see next section
- install the SQL constraints: primary and foreign keys, and indexes
- vacuum analyze the resulting database
It is possible to select a schema variant thanks to the
--kind option on
tpch.py load command, currently the
pgsql (default) and
Loading Phase Specifications
tpch.ini file expects a Load Phase Specification for each option in
[load] section. Make sure that the section contains the
option, as seen above.
Then, each option is an arbitrary name of a loading phase. Each phase consists of a set of steps as per the TPC-H specifications. The steps must be a continuous range.
[phase1] type = load steps = 11..30
In this exemple, the
phase1 phase consists of the 20 steps from 11 to 30.
The STEP numbers are used as the
-S argument to the
dbgen program. Of
course, for such a setup to make any sense the steps should all be within
the range 1..children, with
children being an option of the
section in the same
The load phases and the streams are done concurrently with a Python pool of
cpu is used to configure how many process
are being started on the coordinator.
With the previous setup where
cpu = 16, the
phase1 load phase of steps
11 to 30 included in going to be ran on the pool of 16 worker process, one
per CPUs. As soon as a worker process is done with a step, the driver starts
another process load one of the remaining steps, until all the steps are
Streaming Queries Concurrently
The stream testing is limited in time, and we measure how much work could be done in a specified amount of time by the different systems in competition.
[stream] type = stream queries = 1 4 6 12 duration = 600
In this setup, a STREAM consists of running the TPC-H queries 1, then 4,
then 6, then 12, in this order, one after the other. As many streams as we
have CPU in the
[scale] section are started concurrently, and as soon as a
stream is done, it is replaced by another one.
Each query execution time is registered, and new streams are started for as
duration allows. The duration is read as a number of seconds.
After having started new streams during duration seconds, then the process
pool waits until the currently running streams are all done.
Here's a sample output of a query stream ran for 5s on a single CPU:
$ DSN=postgresql:///tpch ./tpch.py benchmark pgsql --schedule stream 2018-02-12 11:34:36,730 INFO pgsql: starting benchmark solve_sometimes 2018-02-12 11:34:36,739 INFO pgsql: starting schedule stream 2018-02-12 11:34:36,739 INFO pgsql: Running TPCH with 16 CPUs for 5s, stream 1 4 6 12 2018-02-12 11:34:37,830 INFO pgsql: 1 query streams executed in 1.07669s 2018-02-12 11:34:39,396 INFO pgsql: 17 query streams executed in 2.6434s 2018-02-12 11:34:40,400 INFO pgsql: 17 query streams executed in 3.64771s 2018-02-12 11:34:41,901 INFO pgsql: 33 query streams executed in 5.14833s 2018-02-12 11:34:43,581 INFO pgsql: executed 49 streams (196 queries) in 6.81578s, using 16 CPU
The main Makefile targets are listed with
make help. To test several
systems in parallel, use e.g.
make -j2 stream.
$ make help TPC-H benchmark for PostgreSQL and Citus Data Use make to drive the benchmark, with the following targets: help this help message infra create AWS test infrastructure terminate destroy AWS test infrastructure drop drop TPC-H test tables benchmark bench-citus bench-pgsql bench-rds bench-aurora bench-citus run given SCHEDULE on the citus system bench-pgsql run given SCHEDULE on the pgsql system bench-rds run given SCHEDULE on the rds system bench-aurora run given SCHEDULE on the aurora system tail-f see logs from currently running benchmark fetch-logs fetch logs in ./logs/YYYYMMDD_name/system.log dump-results dump results in ./logs/YYYYMMDD_name/system.dump merge-results merge the results into the RESULTS_DSN database cardinalities run SELECT count(*) on all the tables
Benchmark setup and roles
This benchmark uses 3 kinds of services, that needs to be connected either using SSH or the PostgreSQL protocol directly:
- a controller node, typically localhost, your usual laptop
- several loader nodes
- several database clusters or instances
That's where you type your commands from. The controller mainly uses the following files:
- infra.py and the infra Python module
infra.ini setup registers AWS configuration parameters (such as the
region, availability zone, VPC, subnets, security groups, keyname) and the
setup of each kind of machine that is going to be used.
When running the benchmark, the infrastructure consists of:
- a controller node
- a loader node per database system being benchmarked
- the database services to test
infra.py file knows how to start EC2 instances, RDS database instances
and Aurora PostgreSQL clusters with a single database instance. To test the
Citus and PostgreSQL core instances, you need to manage the services
manually and then register the Database Source Name (or DSN) used to connect
to the running service.
control.py file is using the
infra API directly, so that you don't
need to worry about the details in
infra.py. In case you have to, though,
you can begin with:
$ ./infra.py ec2 list $ ./infra.py rds list $ ./infra.py aurora list
While the loader is meant to be remotely controlled by the controller, it is also made easy to interact with directly. The main controller Makefile uses targets that ssh into the loader and run command there. A typical action from the controller will use something that looks like the following:
ssh -l ec2-user DSN=postgresql://.../db make -f Makefile.loader target
The loader mainly uses the following files:
- tpch-pg PostgreSQL port of the TPC-H sources (dbgen and qgen)
- tpch.py and the tpch Python module
- schema/tracking.sql and a PostgreSQL database where to register the stats
The main entry point of the loader is the
tpch.py command, which
implements its action by means of calling into the
with arguments made available on the command line. A typical command run
from the loader looks like the following:
DSN=postgresql:///tpch ./tpch.py benchmark pgsql --schedule SCHEDULE
tpch.py tool then reads its
tpch.ini configuration file that
contains the benchmarking setup and applies it by calling into the
Makefile.loader with the right arguments passed in the command line, such
as in the following example:
make -f Makefile.loader SF=30000 C=300 S=1 load make -f Makefile.loader STREAM='1 3 6 12' stream
Such a command is expected to be called from the
tpch.py command line, not
interactively. The reason why the Python driver exists is so that we can
easily run 16 such commands, with S varying from 1 to 16, on 16 different
CPU cores concurrently. It's fair to consider the
tcpy.py program as a
concurrent driver for the benchmark suite.
DSN environment variable should be set when calling
Python script does nothing with it, but then
Makefile.loader is using it
to know how to contact the database system being tested. Setting the DSN is
the job of the main controller scripts.
infra.py script knows how to create and terminate both RDS and Aurora
PostgreSQL instances on AWS. The details of the instances are setup using
the following INI syntax:
[rds] name = tpch size = 4500 iops = 10000 class = db.r4.2xlarge stype = io1 pgversion = 9.6.5 [aurora] name = tpch class = db.r4.2xlarge stype = io1
It's then possible to retrieve the DSN to be used to connect to a database created thanks to the infra driver's command:
$ ./infra.py rds dsn --json aws.out/db.rds.json
This might result in an error, so in order to wait until the service is
available, it's possible to use the
$ ./infra.py rds dsn --json aws.out/db.rds.json
That's what the main Makefile is using.
We use the
infra.py script in this repository to spin-off some AWS
instances of EC2 virtual machines and RDS databases.
Getting the numbers
A benchmark consists of the following activities.
First, load the data set:
- create the database model
- load an initial set of data
- add primary keys, foreign keys and extra indexes to the model
- vacuum verbose analyze it all
Now that we have a data set, run the queries:
- start N+1 concurrent sessions
- the first one runs the database update scripts
- each other one is doing a stream of TPCH analytical Queries
- measure time spent on each query in each session, and to run each stream
Now, enlarge the data set and repeat step 2 before
- generate more data with DBGEN
The interesting test is going to be with a 30TB database size when completely loaded, and with the following steps:
- 100 GB
- 300 GB
- 1 TB
- 3 TB
- 10 TB
- 30 TB
Here's how to use DBGEN to achieve that:
./dbgen -s 30000 -C 300 -S 1 -D -n DSN
This command produces the first 100GB of data for a 30TB Scale Factor of a test.
./dbgen -s 30000 -C 300 -U 1 -D -n DSN
This command produces the set of updates that go with the first 100GB of data.
DSS_QUERY=../queries/ ./qgen -s 100 1
This command produces Q1 with parameters adapted to SF=100, and the SQL text is found on stdout, ready to be sent to PostgreSQL.
Autora claims the following on their main page, Amazon Aurora Product Details says:
The PostgreSQL-compatible edition of Aurora delivers up to 3X the throughput of standard PostgreSQL running on the same hardware
Some reading and viewing/listening for background information about Aurora:
TCP keepalives and Admission Resource Control?
So the 3X throughput actually seems to be obtained when using Sysbench, a benchmarking suite that has been known to be good for MySQL and put PostgreSQL into bad lights.
No checkpoints, no WAL, 4-out-of-6 writes on distributed block storage, or something like that.