digoal
2023-12-01
PostgreSQL , DuckDB , tpc-ds
duckdb 内置tpcds, tpch模块, 可以快速生成数据, 生产测试SQL, 快速测试.
如果你有其他数据库产品需要测试, 可以直接使用duckdb来生成测试数据和SQL, 横向对比.
- 《DuckDB 采用外部 parquet 格式存储 - tpch 测试 - in_memory VS in_parquet》
- 《DuckDB vs PostgreSQL TPC-H 测试》
- 《DuckDB TPC-H 测试》
步骤如下
- 安装tpch/tpcds extension
- 加载extension
- 生成数据
- 导出query
- 打开结果重定向、时间、profiling、等配置
- 执行query, 导出执行结果
- 查看profile结果, 执行结果.
详情
https://duckdb.org/docs/extensions/tpcds.html
安装、加载extension
D install 'tpcds';
D load 'tpcds';
生成数据
D CALL dsdgen(sf=1);
导出tpcds SQL:
D copy (select query as " " from tpcds_queries()) to 'tpcds.sql' with (quote '');
查看sql内容, 一共99条
cat tpcds.sql
WITH customer_total_return AS
(SELECT sr_customer_sk AS ctr_customer_sk,
sr_store_sk AS ctr_store_sk,
sum(sr_return_amt) AS ctr_total_return
FROM store_returns,
date_dim
WHERE sr_returned_date_sk = d_date_sk
AND d_year = 2000
GROUP BY sr_customer_sk,
sr_store_sk)
SELECT c_customer_id
FROM customer_total_return ctr1,
store,
customer
WHERE ctr1.ctr_total_return >
(SELECT avg(ctr_total_return)*1.2
FROM customer_total_return ctr2
WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
AND s_store_sk = ctr1.ctr_store_sk
AND s_state = 'TN'
AND ctr1.ctr_customer_sk = c_customer_sk
ORDER BY c_customer_id
LIMIT 100;
...
按sql编号(1~99)执行测试
D .timer on
D PRAGMA tpcds(8);
┌──────────────┬────────────────────┐
│ s_store_name │ sum(ss_net_profit) │
│ varchar │ decimal(38,2) │
├──────────────┼────────────────────┤
│ able │ -10354620.18 │
│ ation │ -10576395.52 │
│ bar │ -10625236.01 │
│ ese │ -10076698.16 │
│ ought │ -10994052.78 │
└──────────────┴────────────────────┘
Run Time (s): real 0.035 user 0.074382 sys 0.004634
使用刚才导出的tpcds.sql, 测试性能
配置profile, 输出重定向等.
D PRAGMA enable_profiling='QUERY_TREE_OPTIMIZER';
D PRAGMA explain_output='all';
D PRAGMA profiling_mode='detailed';
D PRAGMA profile_output='tpcds.profile';
D .timer on
将执行结果重定向到my_results.txt
D .output my_results.txt
执行SQL, 每一条的耗时看real, 单位秒. 我这里使用的机器是MacBook Pro (15-inch, 2018) 2.2 GHz 六核Intel Core i7 , 32G 2400MHz
下面的docker容器(资源限制8GB, 6C, 0 SWAP)
D .read tpcds.sql
Run Time (s): real 0.045 user 0.102511 sys 0.005212
Run Time (s): real 0.093 user 0.357725 sys 0.009345
Run Time (s): real 0.011 user 0.024834 sys 0.005110
Run Time (s): real 0.345 user 1.253968 sys 0.124454
Run Time (s): real 0.049 user 0.120580 sys 0.024272
Run Time (s): real 0.021 user 0.063336 sys 0.003390
Run Time (s): real 0.059 user 0.201122 sys 0.010204
Run Time (s): real 0.025 user 0.059354 sys 0.004384
Run Time (s): real 0.131 user 0.371283 sys 0.002992
Run Time (s): real 0.046 user 0.173898 sys 0.003287
Run Time (s): real 0.225 user 0.636656 sys 0.051982
Run Time (s): real 0.014 user 0.034692 sys 0.002332
Run Time (s): real 0.038 user 0.173474 sys 0.002123
Run Time (s): real 0.278 user 1.037221 sys 0.140083
Run Time (s): real 0.034 user 0.129587 sys 0.002064
Run Time (s): real 0.037 user 0.096894 sys 0.012613
Run Time (s): real 0.049 user 0.177654 sys 0.007142
Run Time (s): real 0.064 user 0.196052 sys 0.007506
Run Time (s): real 0.170 user 0.427845 sys 0.030514
Run Time (s): real 0.017 user 0.043945 sys 0.001574
Run Time (s): real 0.022 user 0.071809 sys 0.001300
Run Time (s): real 0.326 user 1.614572 sys 0.060703
Run Time (s): real 0.482 user 2.400375 sys 0.174530
Run Time (s): real 0.051 user 0.134917 sys 0.009525
Run Time (s): real 0.029 user 0.091872 sys 0.002064
Run Time (s): real 0.040 user 0.125914 sys 0.008500
Run Time (s): real 0.123 user 0.526933 sys 0.035607
Run Time (s): real 0.058 user 0.298473 sys 0.003460
Run Time (s): real 0.034 user 0.105500 sys 0.003202
Run Time (s): real 0.030 user 0.063444 sys 0.003466
Run Time (s): real 0.078 user 0.255402 sys 0.029586
Run Time (s): real 0.008 user 0.021511 sys 0.001074
Run Time (s): real 0.019 user 0.047749 sys 0.001191
Run Time (s): real 0.026 user 0.069697 sys 0.001201
Run Time (s): real 0.077 user 0.284389 sys 0.033933
Run Time (s): real 0.141 user 0.475610 sys 0.044567
Run Time (s): real 0.031 user 0.157093 sys 0.002996
Run Time (s): real 0.050 user 0.199169 sys 0.005099
Run Time (s): real 0.056 user 0.191889 sys 0.003279
Run Time (s): real 0.031 user 0.098717 sys 0.011701
Run Time (s): real 0.015 user 0.020655 sys 0.002227
Run Time (s): real 0.010 user 0.030898 sys 0.003248
Run Time (s): real 0.026 user 0.119584 sys 0.002149
Run Time (s): real 0.042 user 0.179673 sys 0.021623
Run Time (s): real 0.030 user 0.056407 sys 0.003384
Run Time (s): real 0.050 user 0.112786 sys 0.017813
Run Time (s): real 0.413 user 1.810031 sys 0.232826
Run Time (s): real 0.040 user 0.183546 sys 0.004902
Run Time (s): real 0.035 user 0.111519 sys 0.007789
Run Time (s): real 0.024 user 0.083776 sys 0.003429
Run Time (s): real 0.206 user 1.038731 sys 0.044445
Run Time (s): real 0.009 user 0.027915 sys 0.000158
Run Time (s): real 0.016 user 0.048616 sys 0.005734
Run Time (s): real 0.022 user 0.055986 sys 0.004590
Run Time (s): real 0.009 user 0.025713 sys 0.006221
Run Time (s): real 0.019 user 0.048297 sys 0.005444
Run Time (s): real 0.176 user 0.742320 sys 0.094409
Run Time (s): real 0.047 user 0.138009 sys 0.014770
Run Time (s): real 0.105 user 0.510612 sys 0.011856
Run Time (s): real 0.024 user 0.065717 sys 0.009596
Run Time (s): real 0.022 user 0.061963 sys 0.001357
Run Time (s): real 0.016 user 0.050796 sys 0.002128
Run Time (s): real 0.014 user 0.033107 sys 0.004378
Run Time (s): real 0.462 user 1.074837 sys 0.069583
Run Time (s): real 0.047 user 0.208255 sys 0.003455
Run Time (s): real 0.086 user 0.171171 sys 0.029407
Run Time (s): real 0.497 user 2.422080 sys 0.231068
Run Time (s): real 0.037 user 0.098806 sys 0.020772
Run Time (s): real 0.045 user 0.177807 sys 0.008435
Run Time (s): real 0.035 user 0.139556 sys 0.003121
Run Time (s): real 0.025 user 0.060287 sys 0.004220
Run Time (s): real 0.077 user 0.252835 sys 0.003162
Run Time (s): real 0.027 user 0.065962 sys 0.008655
Run Time (s): real 0.146 user 0.522634 sys 0.030312
Run Time (s): real 0.097 user 0.337094 sys 0.021532
Run Time (s): real 0.024 user 0.076816 sys 0.011812
Run Time (s): real 0.026 user 0.062120 sys 0.001268
Run Time (s): real 0.186 user 0.887985 sys 0.052854
Run Time (s): real 0.027 user 0.089989 sys 0.009524
Run Time (s): real 0.110 user 0.449026 sys 0.019728
Run Time (s): real 0.048 user 0.100241 sys 0.016073
Run Time (s): real 0.033 user 0.172670 sys 0.000000
Run Time (s): real 0.043 user 0.103319 sys 0.017033
Run Time (s): real 0.019 user 0.042995 sys 0.005229
Run Time (s): real 0.054 user 0.202849 sys 0.004261
Run Time (s): real 0.014 user 0.040690 sys 0.006200
Run Time (s): real 0.054 user 0.235763 sys 0.005898
Run Time (s): real 0.057 user 0.161726 sys 0.005051
Run Time (s): real 0.029 user 0.086155 sys 0.015303
Run Time (s): real 0.007 user 0.017269 sys 0.001076
Run Time (s): real 0.032 user 0.108520 sys 0.002269
Run Time (s): real 0.009 user 0.018233 sys 0.004383
Run Time (s): real 0.049 user 0.201757 sys 0.006041
Run Time (s): real 0.035 user 0.076491 sys 0.011798
Run Time (s): real 0.348 user 1.202210 sys 0.255382
Run Time (s): real 0.007 user 0.021925 sys 0.002413
Run Time (s): real 0.063 user 0.252291 sys 0.017730
Run Time (s): real 0.043 user 0.092025 sys 0.008163
Run Time (s): real 0.028 user 0.068277 sys 0.006165
查询结果:
- 执行结果: my_results.txt -- 执行时间没有被重定向.
- profile结果: tpcds.profile -- overwrite了, 只有最后一条
profile_output:
- This file is overwritten with each query that is issued. If you want to store the profile output for later it should be copied to a different file.
dsdgen用法:
D .mode table
D select * from duckdb_functions() where function_name='dsdgen';
+---------------+-------------+---------------+---------------+-------------+-------------+------------------------------------------------+-------------------------------------------------------+---------+------------------+------------------+----------+--------------+---------+
| database_name | schema_name | function_name | function_type | description | return_type | parameters | parameter_types | varargs | macro_definition | has_side_effects | internal | function_oid | example |
+---------------+-------------+---------------+---------------+-------------+-------------+------------------------------------------------+-------------------------------------------------------+---------+------------------+------------------+----------+--------------+---------+
| system | main | dsdgen | table | | | [suffix, schema, catalog, keys, overwrite, sf] | [VARCHAR, VARCHAR, VARCHAR, BOOLEAN, BOOLEAN, DOUBLE] | | | | true | 1775 | |
+---------------+-------------+---------------+---------------+-------------+-------------+------------------------------------------------+-------------------------------------------------------+---------+------------------+------------------+----------+--------------+---------+
数据库配置如下:
D SELECT * FROM duckdb_settings();
+----------------------------------+---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+------------+
| name | value | description | input_type |
+----------------------------------+---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+------------+
| access_mode | automatic | Access mode of the database (AUTOMATIC, READ_ONLY or READ_WRITE) | VARCHAR |
| checkpoint_threshold | 16.7MB | The WAL size threshold at which to automatically trigger a checkpoint (e.g. 1GB) | VARCHAR |
| debug_checkpoint_abort | none | DEBUG SETTING: trigger an abort while checkpointing for testing purposes | VARCHAR |
| debug_force_external | false | DEBUG SETTING: force out-of-core computation for operators that support it, used for testing | BOOLEAN |
| debug_force_no_cross_product | false | DEBUG SETTING: Force disable cross product generation when hyper graph isn't connected, used for testing | BOOLEAN |
| debug_asof_iejoin | false | DEBUG SETTING: force use of IEJoin to implement AsOf joins | BOOLEAN |
| prefer_range_joins | false | Force use of range joins with mixed predicates | BOOLEAN |
| debug_window_mode | NULL | DEBUG SETTING: switch window mode to use | VARCHAR |
| default_collation | | The collation setting used when none is specified | VARCHAR |
| default_order | asc | The order type used when none is specified (ASC or DESC) | VARCHAR |
| default_null_order | nulls_last | Null ordering used when none is specified (NULLS_FIRST or NULLS_LAST) | VARCHAR |
| disabled_filesystems | | Disable specific file systems preventing access (e.g. LocalFileSystem) | VARCHAR |
| disabled_optimizers | | DEBUG SETTING: disable a specific set of optimizers (comma separated) | VARCHAR |
| enable_external_access | true | Allow the database to access external state (through e.g. loading/installing modules, COPY TO/FROM, CSV readers, pandas replacement scans, etc) | BOOLEAN |
| enable_fsst_vectors | false | Allow scans on FSST compressed segments to emit compressed vectors to utilize late decompression | BOOLEAN |
| allow_unsigned_extensions | false | Allow to load extensions with invalid or missing signatures | BOOLEAN |
| custom_extension_repository | | Overrides the custom endpoint for remote extension installation | VARCHAR |
| autoinstall_extension_repository | | Overrides the custom endpoint for extension installation on autoloading | VARCHAR |
| autoinstall_known_extensions | true | Whether known extensions are allowed to be automatically installed when a query depends on them | BOOLEAN |
| autoload_known_extensions | true | Whether known extensions are allowed to be automatically loaded when a query depends on them | BOOLEAN |
| enable_object_cache | false | Whether or not object cache is used to cache e.g. Parquet metadata | BOOLEAN |
| enable_http_metadata_cache | false | Whether or not the global http metadata is used to cache HTTP metadata | BOOLEAN |
| enable_profiling | NULL | Enables profiling, and sets the output format (JSON, QUERY_TREE, QUERY_TREE_OPTIMIZER) | VARCHAR |
| enable_progress_bar | true | Enables the progress bar, printing progress to the terminal for long queries | BOOLEAN |
| enable_progress_bar_print | true | Controls the printing of the progress bar, when 'enable_progress_bar' is true | BOOLEAN |
| explain_output | physical_only | Output of EXPLAIN statements (ALL, OPTIMIZED_ONLY, PHYSICAL_ONLY) | VARCHAR |
| extension_directory | | Set the directory to store extensions in | VARCHAR |
| external_threads | 0 | The number of external threads that work on DuckDB tasks. | BIGINT |
| file_search_path | | A comma separated list of directories to search for input files | VARCHAR |
| force_compression | Auto | DEBUG SETTING: forces a specific compression method to be used | VARCHAR |
| force_bitpacking_mode | auto | DEBUG SETTING: forces a specific bitpacking mode | VARCHAR |
| home_directory | | Sets the home directory used by the system | VARCHAR |
| log_query_path | NULL | Specifies the path to which queries should be logged (default: empty string, queries are not logged) | VARCHAR |
| lock_configuration | false | Whether or not the configuration can be altered | BOOLEAN |
| immediate_transaction_mode | false | Whether transactions should be started lazily when needed, or immediately when BEGIN TRANSACTION is called | BOOLEAN |
| integer_division | 0 | Whether or not the / operator defaults to integer division, or to floating point division | BOOLEAN |
| max_expression_depth | 1000 | The maximum expression depth limit in the parser. WARNING: increasing this setting and using very deep expressions might lead to stack overflow errors. | UBIGINT |
| max_memory | 6.6GB | The maximum memory of the system (e.g. 1GB) | VARCHAR |
| memory_limit | 6.6GB | The maximum memory of the system (e.g. 1GB) | VARCHAR |
| null_order | nulls_last | Null ordering used when none is specified (NULLS_FIRST or NULLS_LAST) | VARCHAR |
| ordered_aggregate_threshold | 262144 | The number of rows to accumulate before sorting, used for tuning | UBIGINT |
| password | NULL | The password to use. Ignored for legacy compatibility. | VARCHAR |
| perfect_ht_threshold | 12 | Threshold in bytes for when to use a perfect hash table (default: 12) | BIGINT |
| pivot_filter_threshold | 10 | The threshold to switch from using filtered aggregates to LIST with a dedicated pivot operator | BIGINT |
| pivot_limit | 100000 | The maximum number of pivot columns in a pivot statement (default: 100000) | BIGINT |
| preserve_identifier_case | true | Whether or not to preserve the identifier case, instead of always lowercasing all non-quoted identifiers | BOOLEAN |
| preserve_insertion_order | true | Whether or not to preserve insertion order. If set to false the system is allowed to re-order any results that do not contain ORDER BY clauses. | BOOLEAN |
| profiler_history_size | NULL | Sets the profiler history size | BIGINT |
| profile_output | | The file to which profile output should be saved, or empty to print to the terminal | VARCHAR |
| profiling_mode | NULL | The profiling mode (STANDARD or DETAILED) | VARCHAR |
| profiling_output | | The file to which profile output should be saved, or empty to print to the terminal | VARCHAR |
| progress_bar_time | 2000 | Sets the time (in milliseconds) how long a query needs to take before we start printing a progress bar | BIGINT |
| schema | main | Sets the default search schema. Equivalent to setting search_path to a single value. | VARCHAR |
| search_path | | Sets the default catalog search path as a comma-separated list of values | VARCHAR |
| temp_directory | | Set the directory to which to write temp files | VARCHAR |
| threads | 6 | The number of total threads used by the system. | BIGINT |
| username | NULL | The username to use. Ignored for legacy compatibility. | VARCHAR |
| arrow_large_buffer_size | false | If arrow buffers for strings, blobs, uuids and bits should be exported using large buffers | BOOLEAN |
| user | NULL | The username to use. Ignored for legacy compatibility. | VARCHAR |
| wal_autocheckpoint | 16.7MB | The WAL size threshold at which to automatically trigger a checkpoint (e.g. 1GB) | VARCHAR |
| worker_threads | 6 | The number of total threads used by the system. | BIGINT |
| allocator_flush_threshold | 134.2MB | Peak allocation threshold at which to flush the allocator after completing a task. | VARCHAR |
| duckdb_api | duckdb/v0.9.2(linux_amd64_gcc4) | DuckDB API surface | VARCHAR |
| custom_user_agent | | Metadata from DuckDB callers | VARCHAR |
| binary_as_string | | In Parquet files, interpret binary data as a string. | BOOLEAN |
| Calendar | gregorian | The current calendar | VARCHAR |
| TimeZone | Etc/UTC | The current time zone | VARCHAR |
+----------------------------------+---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+------------+