Skip to content

Latest commit

 

History

History
364 lines (295 loc) · 16.6 KB

20230303_01.md

File metadata and controls

364 lines (295 loc) · 16.6 KB

DuckDB 内置 benchmark 的使用

作者

digoal

日期

2023-03-03

标签

PostgreSQL , PolarDB , duckdb , benchmark


背景

如何通过duckdb 内置的benchmark, 了解duckdb在各种场景中的性能. 包括tpch, tpcds, clickhouse, imdb 等benchmark.

1、编译duckdb 包含benchmark.

《Debian学习入门 - (作为服务器使用, Debian 操作系统可能是长期更好的选择?)》

2、了解duckdb 自带的 benchmark_runner 如何使用

https://github.com/duckdb/duckdb/blob/master/benchmark/README.md

root@9b780f5ea2e8:~/duckdb# build/release/benchmark/benchmark_runner --help  
Benchmark to run could not be found.  
Usage: benchmark_runner  
              --list                 Show a list of all benchmarks  
              --profile              Prints the query profile information  
              --detailed-profile     Prints detailed query profile information  
              --threads=n            Sets the amount of threads to use during execution (default: hardware concurrency)  
              --out=[file]           Move benchmark output to file  
              --log=[file]           Move log output to file  
              --info                 Prints info about the benchmark  
              --query                Prints query of the benchmark  
              [name_pattern]         Run only the benchmark which names match the specified name pattern, e.g., DS.* for TPC-DS benchmarks  

3、列出有哪些可选的benchmark

root@9b780f5ea2e8:~/duckdb# build/release/benchmark/benchmark_runner --list  
Append100KIntegersINSERT  
Append100KIntegersINSERTDisk  
Append100KIntegersINSERTPrimary  
Append100KIntegersINSERTAutoCommit  
Append100KIntegersPREPARED  
Append100KIntegersPREPAREDDisk  
Append100KIntegersPREPAREDPrimary  
Append100KIntegersAPPENDER  
Append100KIntegersAPPENDERDisk  
Append100KIntegersAPPENDERPrimary  
Append100KIntegersCOPY  
Append100KIntegersCOPYDisk  
Append100KIntegersCOPYPrimary  
Write100KIntegers  
Appender10MRows  
Appender10MRowsPrimaryKey  
Appender10MRowsDisk  
Appender10MRowsDiskPrimaryKey  
BulkUpdate  
BulkDelete  
CastDateToString  
.........................  

4、列出某个benchmark的信息

build/release/benchmark/benchmark_runner benchmark/micro/nulls/no_nulls_addition.benchmark --info  
display_name:NULL Addition (no nulls)  
group:micro  
subgroup:nulls  

5、列出某个benchmark的query

--query will print the query that is run by the benchmark.

SELECT MIN(i + 1) FROM integers  

6、列出某个benchmark的query profile

--profile will output a query tree (pretty printed), primarily intended for interactive use.

┌─────────────────────────────────────┐  
│┌───────────────────────────────────┐│  
││    Query Profiling Information    ││  
│└───────────────────────────────────┘│  
└─────────────────────────────────────┘  
SELECT MIN(i + 1) FROM integers  
┌─────────────────────────────────────┐  
│┌───────────────────────────────────┐│  
││         Total Time: 0.176s        ││  
│└───────────────────────────────────┘│  
└─────────────────────────────────────┘  
┌───────────────────────────┐  
│    UNGROUPED_AGGREGATE    │  
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │  
│          min(#0)          │  
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │  
│             1             │  
│          (0.03s)          │  
└─────────────┬─────────────┘                               
┌─────────────┴─────────────┐  
│         PROJECTION        │  
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │  
│          +(i, 1)          │  
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │  
│         100000000         │  
│          (0.05s)          │  
└─────────────┬─────────────┘                               
┌─────────────┴─────────────┐  
│          SEQ_SCAN         │  
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │  
│          integers         │  
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │  
│             i             │  
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │  
│         100000000         │  
│          (0.08s)          │  
└───────────────────────────┘        

7、列出详细profile

--detailed-profile  
  
┌─────────────────────────────────────┐  
│┌───────────────────────────────────┐│  
││    Query Profiling Information    ││  
│└───────────────────────────────────┘│  
└─────────────────────────────────────┘  
SELECT MIN(i + 1) FROM integers  
┌─────────────────────────────────────┐  
│┌───────────────────────────────────┐│  
││         Total Time: 0.152s        ││  
│└───────────────────────────────────┘│  
└─────────────────────────────────────┘  
┌─────────────────────────────────────┐  
│          Optimizer: 0.0001s         │  
│┌───────────────────────────────────┐│  
││      Column Lifetime: 0.0000s     ││  
││     Common Aggregate: 0.0000s     ││  
││   Common Subexpressions: 0.0000s  ││  
││        Deliminator: 0.0000s       ││  
││    Expression Rewriter: 0.0000s   ││  
││       Filter Pullup: 0.0000s      ││  
││      Filter Pushdown: 0.0000s     ││  
││         In Clause: 0.0000s        ││  
││        Join Order: 0.0000s        ││  
││        Regex Range: 0.0000s       ││  
││      Reorder Filter: 0.0000s      ││  
││  Statistics Propagation: 0.0000s  ││  
││           Top N: 0.0000s          ││  
││      Unnest Rewriter: 0.0000s     ││  
││      Unused Columns: 0.0000s      ││  
│└───────────────────────────────────┘│  
└─────────────────────────────────────┘  
┌─────────────────────────────────────┐  
│      Physical Planner: 0.0000s      │  
│┌───────────────────────────────────┐│  
││      Column Binding: 0.0000s      ││  
││        Create Plan: 0.0000s       ││  
││       Resolve Types: 0.0000s      ││  
│└───────────────────────────────────┘│  
└─────────────────────────────────────┘  
┌─────────────────────────────────────┐  
│           Planner: 0.0004s          │  
│┌───────────────────────────────────┐│  
││          Binder: 0.0003s          ││  
│└───────────────────────────────────┘│  
└─────────────────────────────────────┘  
┌───────────────────────────┐  
│      RESULT_COLLECTOR     │  
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │  
│             0             │  
│          (0.00s)          │  
└─────────────┬─────────────┘                               
┌─────────────┴─────────────┐  
│    UNGROUPED_AGGREGATE    │  
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │  
│          min(#0)          │  
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │  
│             1             │  
│          (0.08s)          │  
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │  
│      sample_count: 71     │  
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │  
│sample_tuples_count: 145408│  
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │  
│       total_count: 0      │  
└─────────────┬─────────────┘                               
┌─────────────┴─────────────┐  
│         PROJECTION        │  
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │  
│          (i + 1)          │  
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │  
│         100000000         │  
│          (0.11s)          │  
└─────────────┬─────────────┘                               
┌─────────────┴─────────────┐  
│          SEQ_SCAN         │  
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │  
│          integers         │  
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │  
│             i             │  
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │  
│           EC: 0           │  
│   ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─   │  
│         100000000         │  
│          (0.21s)          │  
└───────────────────────────┘                               
  
  
  
0.152318  

8、执行benchmark

build/release/benchmark/benchmark_runner benchmark/micro/nulls/no_nulls_addition.benchmark  
  
name	run	timing  
benchmark/micro/nulls/no_nulls_addition.benchmark	1	0.121234  
benchmark/micro/nulls/no_nulls_addition.benchmark	2	0.121702  
benchmark/micro/nulls/no_nulls_addition.benchmark	3	0.122948  
benchmark/micro/nulls/no_nulls_addition.benchmark	4	0.122534  
benchmark/micro/nulls/no_nulls_addition.benchmark	5	0.124102  

9、如果你想执行一系列benchmark(使用正则)

Regex

You can also use a regex to specify which benchmarks to run. Be careful of shell expansion of certain regex characters (e.g. * will likely be expanded by your shell, hence this requires proper quoting or escaping).

build/release/benchmark/benchmark_runner "benchmark/micro/nulls/.*"   

例如

build/release/benchmark/benchmark_runner "benchmark/tpch/sf1/.*"  
name	run	timing  
benchmark/tpch/sf1/q01.benchmark	1	0.094016  
benchmark/tpch/sf1/q01.benchmark	2	0.125059  
benchmark/tpch/sf1/q01.benchmark	3	0.110736  
benchmark/tpch/sf1/q01.benchmark	4	0.100461  
benchmark/tpch/sf1/q01.benchmark	5	0.099787  
benchmark/tpch/sf1/q02.benchmark	1	0.021591  
benchmark/tpch/sf1/q02.benchmark	2	0.023860  
benchmark/tpch/sf1/q02.benchmark	3	0.025475  
......  
benchmark/tpch/sf1/q22.benchmark	2	0.051606  
benchmark/tpch/sf1/q22.benchmark	3	0.048059  
benchmark/tpch/sf1/q22.benchmark	4	0.052484  
benchmark/tpch/sf1/q22.benchmark	5	0.053868  

10、设置并行度

root@9b780f5ea2e8:~/duckdb# build/release/benchmark/benchmark_runner benchmark/micro/aggregate/grouped_distinct.benchmark --threads=1  
name	run	timing  
benchmark/micro/aggregate/grouped_distinct.benchmark	1	0.542637  
benchmark/micro/aggregate/grouped_distinct.benchmark	2	0.532041  
benchmark/micro/aggregate/grouped_distinct.benchmark	3	0.532411  
benchmark/micro/aggregate/grouped_distinct.benchmark	4	0.542322  
benchmark/micro/aggregate/grouped_distinct.benchmark	5	0.532524  
  
root@9b780f5ea2e8:~/duckdb# build/release/benchmark/benchmark_runner benchmark/micro/aggregate/grouped_distinct.benchmark --threads=4  
name	run	timing  
benchmark/micro/aggregate/grouped_distinct.benchmark	1	0.163530  
benchmark/micro/aggregate/grouped_distinct.benchmark	2	0.157072  
benchmark/micro/aggregate/grouped_distinct.benchmark	3	0.152295  
benchmark/micro/aggregate/grouped_distinct.benchmark	4	0.152698  
benchmark/micro/aggregate/grouped_distinct.benchmark	5	0.155544  

11、benchmark_runner的数据会放在duckdb目录的duckdb_benchmark_data目录中:

drwxr-xr-x  2 root root 4.0K Mar  3 08:14 duckdb_benchmark_data

root@9b780f5ea2e8:~/duckdb# ll duckdb_benchmark_data/
total 245M
drwxr-xr-x 16 root root 4.0K Mar  3 07:43 ..
-rw-r--r--  1 root root 245M Mar  3 07:52 tpch_sf1.duckdb
drwxr-xr-x  2 root root 4.0K Mar  3 08:14 .
  
root@9b780f5ea2e8:~/duckdb/duckdb_benchmark_data# ../build/release/duckdb ./tpch_sf1.duckdb 
v0.7.1 b00b93f
Enter ".help" for usage hints.
D .table
customer  lineitem  nation    orders    part      partsupp  region    supplier
D .tables
customer  lineitem  nation    orders    part      partsupp  region    supplier
D select count(*) from customer;
┌──────────────┐
│ count_star() │
│    int64     │
├──────────────┤
│       150000 │
└──────────────┘

12、benchmark 的详细内容可以参考每一项对应的代码:

https://github.com/duckdb/duckdb/tree/master/benchmark

例如: https://github.com/duckdb/duckdb/blob/master/benchmark/micro/aggregate/simple_aggregate.benchmark

# name: benchmark/micro/aggregate/simple_aggregate.benchmark
# description: SUM(i) over a bunch of integers
# group: [aggregate]

name Integer Sum (Ungrouped)
group aggregate

load
CREATE TABLE integers AS SELECT i % 5 AS i FROM range(0, 10000000) tbl(i);

run
SELECT SUM(i) FROM integers

result I
20000000

digoal's wechat