Skip to content

Latest commit

 

History

History
342 lines (296 loc) · 28.7 KB

20231201_01.md

File metadata and controls

342 lines (296 loc) · 28.7 KB

DuckDB 0.9.2 TPC-DS 测试

作者

digoal

日期

2023-12-01

标签

PostgreSQL , DuckDB , tpc-ds


背景

duckdb 内置tpcds, tpch模块, 可以快速生成数据, 生产测试SQL, 快速测试.

如果你有其他数据库产品需要测试, 可以直接使用duckdb来生成测试数据和SQL, 横向对比.

步骤如下

  • 安装tpch/tpcds extension
  • 加载extension
  • 生成数据
  • 导出query
  • 打开结果重定向、时间、profiling、等配置
  • 执行query, 导出执行结果
  • 查看profile结果, 执行结果.

详情

https://duckdb.org/docs/extensions/tpcds.html

安装、加载extension

D install 'tpcds';    
D load 'tpcds';    

生成数据

D CALL dsdgen(sf=1);  

导出tpcds SQL:

D copy (select query as " " from tpcds_queries()) to 'tpcds.sql' with (quote '');   

查看sql内容, 一共99条

cat tpcds.sql  
  
  
WITH customer_total_return AS  
  (SELECT sr_customer_sk AS ctr_customer_sk,  
          sr_store_sk AS ctr_store_sk,  
          sum(sr_return_amt) AS ctr_total_return  
   FROM store_returns,  
        date_dim  
   WHERE sr_returned_date_sk = d_date_sk  
     AND d_year = 2000  
   GROUP BY sr_customer_sk,  
            sr_store_sk)  
SELECT c_customer_id  
FROM customer_total_return ctr1,  
     store,  
     customer  
WHERE ctr1.ctr_total_return >  
    (SELECT avg(ctr_total_return)*1.2  
     FROM customer_total_return ctr2  
     WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)  
  AND s_store_sk = ctr1.ctr_store_sk  
  AND s_state = 'TN'  
  AND ctr1.ctr_customer_sk = c_customer_sk  
ORDER BY c_customer_id  
LIMIT 100;  
  
...  

按sql编号(1~99)执行测试

D .timer on  
D PRAGMA tpcds(8);  
┌──────────────┬────────────────────┐  
│ s_store_name │ sum(ss_net_profit) │  
│   varchar    │   decimal(38,2)    │  
├──────────────┼────────────────────┤  
│ able         │       -10354620.18 │  
│ ation        │       -10576395.52 │  
│ bar          │       -10625236.01 │  
│ ese          │       -10076698.16 │  
│ ought        │       -10994052.78 │  
└──────────────┴────────────────────┘  
Run Time (s): real 0.035 user 0.074382 sys 0.004634  

使用刚才导出的tpcds.sql, 测试性能

配置profile, 输出重定向等.

D PRAGMA enable_profiling='QUERY_TREE_OPTIMIZER';    
D PRAGMA explain_output='all';    
D PRAGMA profiling_mode='detailed';    
D PRAGMA profile_output='tpcds.profile';    
D .timer on    

将执行结果重定向到my_results.txt

D .output my_results.txt    

执行SQL, 每一条的耗时看real, 单位秒. 我这里使用的机器是MacBook Pro (15-inch, 2018) 2.2 GHz 六核Intel Core i7 , 32G 2400MHz 下面的docker容器(资源限制8GB, 6C, 0 SWAP)

D .read tpcds.sql    
  
  
Run Time (s): real 0.045 user 0.102511 sys 0.005212  
Run Time (s): real 0.093 user 0.357725 sys 0.009345  
Run Time (s): real 0.011 user 0.024834 sys 0.005110  
Run Time (s): real 0.345 user 1.253968 sys 0.124454  
Run Time (s): real 0.049 user 0.120580 sys 0.024272  
Run Time (s): real 0.021 user 0.063336 sys 0.003390  
Run Time (s): real 0.059 user 0.201122 sys 0.010204  
Run Time (s): real 0.025 user 0.059354 sys 0.004384  
Run Time (s): real 0.131 user 0.371283 sys 0.002992  
Run Time (s): real 0.046 user 0.173898 sys 0.003287  
Run Time (s): real 0.225 user 0.636656 sys 0.051982  
Run Time (s): real 0.014 user 0.034692 sys 0.002332  
Run Time (s): real 0.038 user 0.173474 sys 0.002123  
Run Time (s): real 0.278 user 1.037221 sys 0.140083  
Run Time (s): real 0.034 user 0.129587 sys 0.002064  
Run Time (s): real 0.037 user 0.096894 sys 0.012613  
Run Time (s): real 0.049 user 0.177654 sys 0.007142  
Run Time (s): real 0.064 user 0.196052 sys 0.007506  
Run Time (s): real 0.170 user 0.427845 sys 0.030514  
Run Time (s): real 0.017 user 0.043945 sys 0.001574  
Run Time (s): real 0.022 user 0.071809 sys 0.001300  
Run Time (s): real 0.326 user 1.614572 sys 0.060703  
Run Time (s): real 0.482 user 2.400375 sys 0.174530  
Run Time (s): real 0.051 user 0.134917 sys 0.009525  
Run Time (s): real 0.029 user 0.091872 sys 0.002064  
Run Time (s): real 0.040 user 0.125914 sys 0.008500  
Run Time (s): real 0.123 user 0.526933 sys 0.035607  
Run Time (s): real 0.058 user 0.298473 sys 0.003460  
Run Time (s): real 0.034 user 0.105500 sys 0.003202  
Run Time (s): real 0.030 user 0.063444 sys 0.003466  
Run Time (s): real 0.078 user 0.255402 sys 0.029586  
Run Time (s): real 0.008 user 0.021511 sys 0.001074  
Run Time (s): real 0.019 user 0.047749 sys 0.001191  
Run Time (s): real 0.026 user 0.069697 sys 0.001201  
Run Time (s): real 0.077 user 0.284389 sys 0.033933  
Run Time (s): real 0.141 user 0.475610 sys 0.044567  
Run Time (s): real 0.031 user 0.157093 sys 0.002996  
Run Time (s): real 0.050 user 0.199169 sys 0.005099  
Run Time (s): real 0.056 user 0.191889 sys 0.003279  
Run Time (s): real 0.031 user 0.098717 sys 0.011701  
Run Time (s): real 0.015 user 0.020655 sys 0.002227  
Run Time (s): real 0.010 user 0.030898 sys 0.003248  
Run Time (s): real 0.026 user 0.119584 sys 0.002149  
Run Time (s): real 0.042 user 0.179673 sys 0.021623  
Run Time (s): real 0.030 user 0.056407 sys 0.003384  
Run Time (s): real 0.050 user 0.112786 sys 0.017813  
Run Time (s): real 0.413 user 1.810031 sys 0.232826  
Run Time (s): real 0.040 user 0.183546 sys 0.004902  
Run Time (s): real 0.035 user 0.111519 sys 0.007789  
Run Time (s): real 0.024 user 0.083776 sys 0.003429  
Run Time (s): real 0.206 user 1.038731 sys 0.044445  
Run Time (s): real 0.009 user 0.027915 sys 0.000158  
Run Time (s): real 0.016 user 0.048616 sys 0.005734  
Run Time (s): real 0.022 user 0.055986 sys 0.004590  
Run Time (s): real 0.009 user 0.025713 sys 0.006221  
Run Time (s): real 0.019 user 0.048297 sys 0.005444  
Run Time (s): real 0.176 user 0.742320 sys 0.094409  
Run Time (s): real 0.047 user 0.138009 sys 0.014770  
Run Time (s): real 0.105 user 0.510612 sys 0.011856  
Run Time (s): real 0.024 user 0.065717 sys 0.009596  
Run Time (s): real 0.022 user 0.061963 sys 0.001357  
Run Time (s): real 0.016 user 0.050796 sys 0.002128  
Run Time (s): real 0.014 user 0.033107 sys 0.004378  
Run Time (s): real 0.462 user 1.074837 sys 0.069583  
Run Time (s): real 0.047 user 0.208255 sys 0.003455  
Run Time (s): real 0.086 user 0.171171 sys 0.029407  
Run Time (s): real 0.497 user 2.422080 sys 0.231068  
Run Time (s): real 0.037 user 0.098806 sys 0.020772  
Run Time (s): real 0.045 user 0.177807 sys 0.008435  
Run Time (s): real 0.035 user 0.139556 sys 0.003121  
Run Time (s): real 0.025 user 0.060287 sys 0.004220  
Run Time (s): real 0.077 user 0.252835 sys 0.003162  
Run Time (s): real 0.027 user 0.065962 sys 0.008655  
Run Time (s): real 0.146 user 0.522634 sys 0.030312  
Run Time (s): real 0.097 user 0.337094 sys 0.021532  
Run Time (s): real 0.024 user 0.076816 sys 0.011812  
Run Time (s): real 0.026 user 0.062120 sys 0.001268  
Run Time (s): real 0.186 user 0.887985 sys 0.052854  
Run Time (s): real 0.027 user 0.089989 sys 0.009524  
Run Time (s): real 0.110 user 0.449026 sys 0.019728  
Run Time (s): real 0.048 user 0.100241 sys 0.016073  
Run Time (s): real 0.033 user 0.172670 sys 0.000000  
Run Time (s): real 0.043 user 0.103319 sys 0.017033  
Run Time (s): real 0.019 user 0.042995 sys 0.005229  
Run Time (s): real 0.054 user 0.202849 sys 0.004261  
Run Time (s): real 0.014 user 0.040690 sys 0.006200  
Run Time (s): real 0.054 user 0.235763 sys 0.005898  
Run Time (s): real 0.057 user 0.161726 sys 0.005051  
Run Time (s): real 0.029 user 0.086155 sys 0.015303  
Run Time (s): real 0.007 user 0.017269 sys 0.001076  
Run Time (s): real 0.032 user 0.108520 sys 0.002269  
Run Time (s): real 0.009 user 0.018233 sys 0.004383  
Run Time (s): real 0.049 user 0.201757 sys 0.006041  
Run Time (s): real 0.035 user 0.076491 sys 0.011798  
Run Time (s): real 0.348 user 1.202210 sys 0.255382  
Run Time (s): real 0.007 user 0.021925 sys 0.002413  
Run Time (s): real 0.063 user 0.252291 sys 0.017730  
Run Time (s): real 0.043 user 0.092025 sys 0.008163  
Run Time (s): real 0.028 user 0.068277 sys 0.006165  

查询结果:

  • 执行结果: my_results.txt -- 执行时间没有被重定向.
  • profile结果: tpcds.profile -- overwrite了, 只有最后一条

profile_output:

  • This file is overwritten with each query that is issued. If you want to store the profile output for later it should be copied to a different file.

dsdgen用法:

D .mode table  
D select * from duckdb_functions() where function_name='dsdgen';  
+---------------+-------------+---------------+---------------+-------------+-------------+------------------------------------------------+-------------------------------------------------------+---------+------------------+------------------+----------+--------------+---------+  
| database_name | schema_name | function_name | function_type | description | return_type |                   parameters                   |                    parameter_types                    | varargs | macro_definition | has_side_effects | internal | function_oid | example |  
+---------------+-------------+---------------+---------------+-------------+-------------+------------------------------------------------+-------------------------------------------------------+---------+------------------+------------------+----------+--------------+---------+  
| system        | main        | dsdgen        | table         |             |             | [suffix, schema, catalog, keys, overwrite, sf] | [VARCHAR, VARCHAR, VARCHAR, BOOLEAN, BOOLEAN, DOUBLE] |         |                  |                  | true     | 1775         |         |  
+---------------+-------------+---------------+---------------+-------------+-------------+------------------------------------------------+-------------------------------------------------------+---------+------------------+------------------+----------+--------------+---------+  

数据库配置如下:

D SELECT * FROM duckdb_settings();    
+----------------------------------+---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+------------+  
|               name               |              value              |                                                                       description                                                                       | input_type |  
+----------------------------------+---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+------------+  
| access_mode                      | automatic                       | Access mode of the database (AUTOMATIC, READ_ONLY or READ_WRITE)                                                                                        | VARCHAR    |  
| checkpoint_threshold             | 16.7MB                          | The WAL size threshold at which to automatically trigger a checkpoint (e.g. 1GB)                                                                        | VARCHAR    |  
| debug_checkpoint_abort           | none                            | DEBUG SETTING: trigger an abort while checkpointing for testing purposes                                                                                | VARCHAR    |  
| debug_force_external             | false                           | DEBUG SETTING: force out-of-core computation for operators that support it, used for testing                                                            | BOOLEAN    |  
| debug_force_no_cross_product     | false                           | DEBUG SETTING: Force disable cross product generation when hyper graph isn't connected, used for testing                                                | BOOLEAN    |  
| debug_asof_iejoin                | false                           | DEBUG SETTING: force use of IEJoin to implement AsOf joins                                                                                              | BOOLEAN    |  
| prefer_range_joins               | false                           | Force use of range joins with mixed predicates                                                                                                          | BOOLEAN    |  
| debug_window_mode                | NULL                            | DEBUG SETTING: switch window mode to use                                                                                                                | VARCHAR    |  
| default_collation                |                                 | The collation setting used when none is specified                                                                                                       | VARCHAR    |  
| default_order                    | asc                             | The order type used when none is specified (ASC or DESC)                                                                                                | VARCHAR    |  
| default_null_order               | nulls_last                      | Null ordering used when none is specified (NULLS_FIRST or NULLS_LAST)                                                                                   | VARCHAR    |  
| disabled_filesystems             |                                 | Disable specific file systems preventing access (e.g. LocalFileSystem)                                                                                  | VARCHAR    |  
| disabled_optimizers              |                                 | DEBUG SETTING: disable a specific set of optimizers (comma separated)                                                                                   | VARCHAR    |  
| enable_external_access           | true                            | Allow the database to access external state (through e.g. loading/installing modules, COPY TO/FROM, CSV readers, pandas replacement scans, etc)         | BOOLEAN    |  
| enable_fsst_vectors              | false                           | Allow scans on FSST compressed segments to emit compressed vectors to utilize late decompression                                                        | BOOLEAN    |  
| allow_unsigned_extensions        | false                           | Allow to load extensions with invalid or missing signatures                                                                                             | BOOLEAN    |  
| custom_extension_repository      |                                 | Overrides the custom endpoint for remote extension installation                                                                                         | VARCHAR    |  
| autoinstall_extension_repository |                                 | Overrides the custom endpoint for extension installation on autoloading                                                                                 | VARCHAR    |  
| autoinstall_known_extensions     | true                            | Whether known extensions are allowed to be automatically installed when a query depends on them                                                         | BOOLEAN    |  
| autoload_known_extensions        | true                            | Whether known extensions are allowed to be automatically loaded when a query depends on them                                                            | BOOLEAN    |  
| enable_object_cache              | false                           | Whether or not object cache is used to cache e.g. Parquet metadata                                                                                      | BOOLEAN    |  
| enable_http_metadata_cache       | false                           | Whether or not the global http metadata is used to cache HTTP metadata                                                                                  | BOOLEAN    |  
| enable_profiling                 | NULL                            | Enables profiling, and sets the output format (JSON, QUERY_TREE, QUERY_TREE_OPTIMIZER)                                                                  | VARCHAR    |  
| enable_progress_bar              | true                            | Enables the progress bar, printing progress to the terminal for long queries                                                                            | BOOLEAN    |  
| enable_progress_bar_print        | true                            | Controls the printing of the progress bar, when 'enable_progress_bar' is true                                                                           | BOOLEAN    |  
| explain_output                   | physical_only                   | Output of EXPLAIN statements (ALL, OPTIMIZED_ONLY, PHYSICAL_ONLY)                                                                                       | VARCHAR    |  
| extension_directory              |                                 | Set the directory to store extensions in                                                                                                                | VARCHAR    |  
| external_threads                 | 0                               | The number of external threads that work on DuckDB tasks.                                                                                               | BIGINT     |  
| file_search_path                 |                                 | A comma separated list of directories to search for input files                                                                                         | VARCHAR    |  
| force_compression                | Auto                            | DEBUG SETTING: forces a specific compression method to be used                                                                                          | VARCHAR    |  
| force_bitpacking_mode            | auto                            | DEBUG SETTING: forces a specific bitpacking mode                                                                                                        | VARCHAR    |  
| home_directory                   |                                 | Sets the home directory used by the system                                                                                                              | VARCHAR    |  
| log_query_path                   | NULL                            | Specifies the path to which queries should be logged (default: empty string, queries are not logged)                                                    | VARCHAR    |  
| lock_configuration               | false                           | Whether or not the configuration can be altered                                                                                                         | BOOLEAN    |  
| immediate_transaction_mode       | false                           | Whether transactions should be started lazily when needed, or immediately when BEGIN TRANSACTION is called                                              | BOOLEAN    |  
| integer_division                 | 0                               | Whether or not the / operator defaults to integer division, or to floating point division                                                               | BOOLEAN    |  
| max_expression_depth             | 1000                            | The maximum expression depth limit in the parser. WARNING: increasing this setting and using very deep expressions might lead to stack overflow errors. | UBIGINT    |  
| max_memory                       | 6.6GB                           | The maximum memory of the system (e.g. 1GB)                                                                                                             | VARCHAR    |  
| memory_limit                     | 6.6GB                           | The maximum memory of the system (e.g. 1GB)                                                                                                             | VARCHAR    |  
| null_order                       | nulls_last                      | Null ordering used when none is specified (NULLS_FIRST or NULLS_LAST)                                                                                   | VARCHAR    |  
| ordered_aggregate_threshold      | 262144                          | The number of rows to accumulate before sorting, used for tuning                                                                                        | UBIGINT    |  
| password                         | NULL                            | The password to use. Ignored for legacy compatibility.                                                                                                  | VARCHAR    |  
| perfect_ht_threshold             | 12                              | Threshold in bytes for when to use a perfect hash table (default: 12)                                                                                   | BIGINT     |  
| pivot_filter_threshold           | 10                              | The threshold to switch from using filtered aggregates to LIST with a dedicated pivot operator                                                          | BIGINT     |  
| pivot_limit                      | 100000                          | The maximum number of pivot columns in a pivot statement (default: 100000)                                                                              | BIGINT     |  
| preserve_identifier_case         | true                            | Whether or not to preserve the identifier case, instead of always lowercasing all non-quoted identifiers                                                | BOOLEAN    |  
| preserve_insertion_order         | true                            | Whether or not to preserve insertion order. If set to false the system is allowed to re-order any results that do not contain ORDER BY clauses.         | BOOLEAN    |  
| profiler_history_size            | NULL                            | Sets the profiler history size                                                                                                                          | BIGINT     |  
| profile_output                   |                                 | The file to which profile output should be saved, or empty to print to the terminal                                                                     | VARCHAR    |  
| profiling_mode                   | NULL                            | The profiling mode (STANDARD or DETAILED)                                                                                                               | VARCHAR    |  
| profiling_output                 |                                 | The file to which profile output should be saved, or empty to print to the terminal                                                                     | VARCHAR    |  
| progress_bar_time                | 2000                            | Sets the time (in milliseconds) how long a query needs to take before we start printing a progress bar                                                  | BIGINT     |  
| schema                           | main                            | Sets the default search schema. Equivalent to setting search_path to a single value.                                                                    | VARCHAR    |  
| search_path                      |                                 | Sets the default catalog search path as a comma-separated list of values                                                                                | VARCHAR    |  
| temp_directory                   |                                 | Set the directory to which to write temp files                                                                                                          | VARCHAR    |  
| threads                          | 6                               | The number of total threads used by the system.                                                                                                         | BIGINT     |  
| username                         | NULL                            | The username to use. Ignored for legacy compatibility.                                                                                                  | VARCHAR    |  
| arrow_large_buffer_size          | false                           | If arrow buffers for strings, blobs, uuids and bits should be exported using large buffers                                                              | BOOLEAN    |  
| user                             | NULL                            | The username to use. Ignored for legacy compatibility.                                                                                                  | VARCHAR    |  
| wal_autocheckpoint               | 16.7MB                          | The WAL size threshold at which to automatically trigger a checkpoint (e.g. 1GB)                                                                        | VARCHAR    |  
| worker_threads                   | 6                               | The number of total threads used by the system.                                                                                                         | BIGINT     |  
| allocator_flush_threshold        | 134.2MB                         | Peak allocation threshold at which to flush the allocator after completing a task.                                                                      | VARCHAR    |  
| duckdb_api                       | duckdb/v0.9.2(linux_amd64_gcc4) | DuckDB API surface                                                                                                                                      | VARCHAR    |  
| custom_user_agent                |                                 | Metadata from DuckDB callers                                                                                                                            | VARCHAR    |  
| binary_as_string                 |                                 | In Parquet files, interpret binary data as a string.                                                                                                    | BOOLEAN    |  
| Calendar                         | gregorian                       | The current calendar                                                                                                                                    | VARCHAR    |  
| TimeZone                         | Etc/UTC                         | The current time zone                                                                                                                                   | VARCHAR    |  
+----------------------------------+---------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+------------+  

digoal's wechat