digoal
2023-03-03
PostgreSQL , PolarDB , duckdb , benchmark
如何通过duckdb 内置的benchmark, 了解duckdb在各种场景中的性能. 包括tpch, tpcds, clickhouse, imdb 等benchmark.
1、编译duckdb 包含benchmark.
《Debian学习入门 - (作为服务器使用, Debian 操作系统可能是长期更好的选择?)》
2、了解duckdb 自带的 benchmark_runner 如何使用
https://github.com/duckdb/duckdb/blob/master/benchmark/README.md
root@9b780f5ea2e8:~/duckdb# build/release/benchmark/benchmark_runner --help
Benchmark to run could not be found.
Usage: benchmark_runner
--list Show a list of all benchmarks
--profile Prints the query profile information
--detailed-profile Prints detailed query profile information
--threads=n Sets the amount of threads to use during execution (default: hardware concurrency)
--out=[file] Move benchmark output to file
--log=[file] Move log output to file
--info Prints info about the benchmark
--query Prints query of the benchmark
[name_pattern] Run only the benchmark which names match the specified name pattern, e.g., DS.* for TPC-DS benchmarks
3、列出有哪些可选的benchmark
root@9b780f5ea2e8:~/duckdb# build/release/benchmark/benchmark_runner --list
Append100KIntegersINSERT
Append100KIntegersINSERTDisk
Append100KIntegersINSERTPrimary
Append100KIntegersINSERTAutoCommit
Append100KIntegersPREPARED
Append100KIntegersPREPAREDDisk
Append100KIntegersPREPAREDPrimary
Append100KIntegersAPPENDER
Append100KIntegersAPPENDERDisk
Append100KIntegersAPPENDERPrimary
Append100KIntegersCOPY
Append100KIntegersCOPYDisk
Append100KIntegersCOPYPrimary
Write100KIntegers
Appender10MRows
Appender10MRowsPrimaryKey
Appender10MRowsDisk
Appender10MRowsDiskPrimaryKey
BulkUpdate
BulkDelete
CastDateToString
.........................
4、列出某个benchmark的信息
build/release/benchmark/benchmark_runner benchmark/micro/nulls/no_nulls_addition.benchmark --info
display_name:NULL Addition (no nulls)
group:micro
subgroup:nulls
5、列出某个benchmark的query
--query
will print the query that is run by the benchmark.
SELECT MIN(i + 1) FROM integers
6、列出某个benchmark的query profile
--profile
will output a query tree (pretty printed), primarily intended for interactive use.
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││ Query Profiling Information ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
SELECT MIN(i + 1) FROM integers
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││ Total Time: 0.176s ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
┌───────────────────────────┐
│ UNGROUPED_AGGREGATE │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ min(#0) │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 1 │
│ (0.03s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ +(i, 1) │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 100000000 │
│ (0.05s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ SEQ_SCAN │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ integers │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ i │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 100000000 │
│ (0.08s) │
└───────────────────────────┘
7、列出详细profile
--detailed-profile
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││ Query Profiling Information ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
SELECT MIN(i + 1) FROM integers
┌─────────────────────────────────────┐
│┌───────────────────────────────────┐│
││ Total Time: 0.152s ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Optimizer: 0.0001s │
│┌───────────────────────────────────┐│
││ Column Lifetime: 0.0000s ││
││ Common Aggregate: 0.0000s ││
││ Common Subexpressions: 0.0000s ││
││ Deliminator: 0.0000s ││
││ Expression Rewriter: 0.0000s ││
││ Filter Pullup: 0.0000s ││
││ Filter Pushdown: 0.0000s ││
││ In Clause: 0.0000s ││
││ Join Order: 0.0000s ││
││ Regex Range: 0.0000s ││
││ Reorder Filter: 0.0000s ││
││ Statistics Propagation: 0.0000s ││
││ Top N: 0.0000s ││
││ Unnest Rewriter: 0.0000s ││
││ Unused Columns: 0.0000s ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Physical Planner: 0.0000s │
│┌───────────────────────────────────┐│
││ Column Binding: 0.0000s ││
││ Create Plan: 0.0000s ││
││ Resolve Types: 0.0000s ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
┌─────────────────────────────────────┐
│ Planner: 0.0004s │
│┌───────────────────────────────────┐│
││ Binder: 0.0003s ││
│└───────────────────────────────────┘│
└─────────────────────────────────────┘
┌───────────────────────────┐
│ RESULT_COLLECTOR │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 0 │
│ (0.00s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ UNGROUPED_AGGREGATE │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ min(#0) │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 1 │
│ (0.08s) │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ sample_count: 71 │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│sample_tuples_count: 145408│
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ total_count: 0 │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ PROJECTION │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ (i + 1) │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 100000000 │
│ (0.11s) │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│ SEQ_SCAN │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ integers │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ i │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ EC: 0 │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ │
│ 100000000 │
│ (0.21s) │
└───────────────────────────┘
0.152318
8、执行benchmark
build/release/benchmark/benchmark_runner benchmark/micro/nulls/no_nulls_addition.benchmark
name run timing
benchmark/micro/nulls/no_nulls_addition.benchmark 1 0.121234
benchmark/micro/nulls/no_nulls_addition.benchmark 2 0.121702
benchmark/micro/nulls/no_nulls_addition.benchmark 3 0.122948
benchmark/micro/nulls/no_nulls_addition.benchmark 4 0.122534
benchmark/micro/nulls/no_nulls_addition.benchmark 5 0.124102
9、如果你想执行一系列benchmark(使用正则)
Regex
You can also use a regex to specify which benchmarks to run. Be careful of shell expansion of certain regex characters (e.g. *
will likely be expanded by your shell, hence this requires proper quoting or escaping).
build/release/benchmark/benchmark_runner "benchmark/micro/nulls/.*"
例如
build/release/benchmark/benchmark_runner "benchmark/tpch/sf1/.*"
name run timing
benchmark/tpch/sf1/q01.benchmark 1 0.094016
benchmark/tpch/sf1/q01.benchmark 2 0.125059
benchmark/tpch/sf1/q01.benchmark 3 0.110736
benchmark/tpch/sf1/q01.benchmark 4 0.100461
benchmark/tpch/sf1/q01.benchmark 5 0.099787
benchmark/tpch/sf1/q02.benchmark 1 0.021591
benchmark/tpch/sf1/q02.benchmark 2 0.023860
benchmark/tpch/sf1/q02.benchmark 3 0.025475
......
benchmark/tpch/sf1/q22.benchmark 2 0.051606
benchmark/tpch/sf1/q22.benchmark 3 0.048059
benchmark/tpch/sf1/q22.benchmark 4 0.052484
benchmark/tpch/sf1/q22.benchmark 5 0.053868
10、设置并行度
root@9b780f5ea2e8:~/duckdb# build/release/benchmark/benchmark_runner benchmark/micro/aggregate/grouped_distinct.benchmark --threads=1
name run timing
benchmark/micro/aggregate/grouped_distinct.benchmark 1 0.542637
benchmark/micro/aggregate/grouped_distinct.benchmark 2 0.532041
benchmark/micro/aggregate/grouped_distinct.benchmark 3 0.532411
benchmark/micro/aggregate/grouped_distinct.benchmark 4 0.542322
benchmark/micro/aggregate/grouped_distinct.benchmark 5 0.532524
root@9b780f5ea2e8:~/duckdb# build/release/benchmark/benchmark_runner benchmark/micro/aggregate/grouped_distinct.benchmark --threads=4
name run timing
benchmark/micro/aggregate/grouped_distinct.benchmark 1 0.163530
benchmark/micro/aggregate/grouped_distinct.benchmark 2 0.157072
benchmark/micro/aggregate/grouped_distinct.benchmark 3 0.152295
benchmark/micro/aggregate/grouped_distinct.benchmark 4 0.152698
benchmark/micro/aggregate/grouped_distinct.benchmark 5 0.155544
11、benchmark_runner的数据会放在duckdb目录的duckdb_benchmark_data目录中:
drwxr-xr-x 2 root root 4.0K Mar 3 08:14 duckdb_benchmark_data
root@9b780f5ea2e8:~/duckdb# ll duckdb_benchmark_data/
total 245M
drwxr-xr-x 16 root root 4.0K Mar 3 07:43 ..
-rw-r--r-- 1 root root 245M Mar 3 07:52 tpch_sf1.duckdb
drwxr-xr-x 2 root root 4.0K Mar 3 08:14 .
root@9b780f5ea2e8:~/duckdb/duckdb_benchmark_data# ../build/release/duckdb ./tpch_sf1.duckdb
v0.7.1 b00b93f
Enter ".help" for usage hints.
D .table
customer lineitem nation orders part partsupp region supplier
D .tables
customer lineitem nation orders part partsupp region supplier
D select count(*) from customer;
┌──────────────┐
│ count_star() │
│ int64 │
├──────────────┤
│ 150000 │
└──────────────┘
12、benchmark 的详细内容可以参考每一项对应的代码:
https://github.com/duckdb/duckdb/tree/master/benchmark
例如: https://github.com/duckdb/duckdb/blob/master/benchmark/micro/aggregate/simple_aggregate.benchmark
# name: benchmark/micro/aggregate/simple_aggregate.benchmark
# description: SUM(i) over a bunch of integers
# group: [aggregate]
name Integer Sum (Ungrouped)
group aggregate
load
CREATE TABLE integers AS SELECT i % 5 AS i FROM range(0, 10000000) tbl(i);
run
SELECT SUM(i) FROM integers
result I
20000000