string_agg aggregate function is 1000x slower than duckdb (SQLStorm) #17789

@2010YOUY01

Is your feature request related to a problem or challenge?

Reproducer

  1. Generate the dataset with tpchgen-cli: https://github.com/clflushopt/tpchgen-rs/tree/main/tpchgen-cli
  2. Run the following in datafusion-cli:
DataFusion CLI v50.0.0
> CREATE EXTERNAL TABLE partsupp
STORED AS PARQUET
LOCATION '/Users/yongting/Code/datafusion-sqlstorm/data/partsupp.parquet';

> select ps_partkey, string_agg(ps_comment, ';')
from partsupp
group by ps_partkey;

...
20000 row(s) fetched. (First 40 displayed. Use --maxrows to adjust)
Elapsed 53.737 seconds.

Memory usage is also quite high.
For comparison, DuckDB finishes the same query in 0.03s.
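
For reference, the result the query above is expected to produce can be sketched in a few lines of Python. This is only an illustration of the semantics of string_agg with GROUP BY, not how DataFusion or DuckDB implement it. The function name and the sample rows are hypothetical. The sketch assumes NULL inputs are skipped, as in PostgreSQL, and it concatenates in input order, which SQL does not guarantee without an ORDER BY.

```python
from collections import defaultdict


def string_agg_by_key(rows, delimiter=";"):
    """Group (key, text) pairs by key and join each group's texts.

    Hypothetical helper mirroring:
        select key, string_agg(text, ';') from t group by key
    """
    groups = defaultdict(list)
    for key, text in rows:
        parts = groups[key]  # register the group even if text is NULL
        if text is not None:  # assumption: NULL inputs are skipped
            parts.append(text)
    # A group with no non-NULL input yields None (SQL NULL).
    return {
        key: (delimiter.join(parts) if parts else None)
        for key, parts in groups.items()
    }


# Hypothetical rows standing in for (ps_partkey, ps_comment).
sample = [(1, "a"), (2, "b"), (1, "c"), (2, None), (1, "d"), (3, None)]
result = string_agg_by_key(sample)
print(result)
```

Each input row is appended once and each group is joined once, so the work in this sketch grows with the total size of the input strings.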

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

Found by SQLStorm #17698

Labels

enhancement: New feature or request
