Is your feature request related to a problem or challenge?
Reproducer
- Generate dataset with https://github.com/clflushopt/tpchgen-rs/tree/main/tpchgen-cli
- In datafusion-cli:
DataFusion CLI v50.0.0
> CREATE EXTERNAL TABLE partsupp
STORED AS PARQUET
LOCATION '/Users/yongting/Code/datafusion-sqlstorm/data/partsupp.parquet';
> select ps_partkey, string_agg(ps_comment, ';')
from partsupp
group by ps_partkey;
...
20000 row(s) fetched. (First 40 displayed. Use --maxrows to adjust)
Elapsed 53.737 seconds.
Memory usage is also quite high.
For comparison, DuckDB finishes in 0.03s.
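For anyone checking results while investigating, here is a minimal Python sketch of what the query above is expected to compute: one row per key, with that key's non-NULL comments joined by `;`. This is not DataFusion's implementation, and the function name `string_agg_by_key` and the sample rows are hypothetical, made up for illustration. Without an ORDER BY inside the aggregate, SQL does not guarantee the order of values within a group, so the sketch simply uses input order.

```python
def string_agg_by_key(rows, sep=";"):
    """Reference semantics for: SELECT key, string_agg(text, sep) ... GROUP BY key.

    `rows` is an iterable of (key, text) pairs; text may be None (SQL NULL).
    Hypothetical helper for illustration only.
    """
    groups = {}
    for key, text in rows:
        # Every key gets a group, even if all of its values are NULL.
        bucket = groups.setdefault(key, [])
        if text is not None:  # aggregates skip NULL inputs
            bucket.append(text)
    # A group with no non-NULL values aggregates to NULL (None here).
    return {key: (sep.join(texts) if texts else None)
            for key, texts in groups.items()}


# Made-up sample data standing in for (ps_partkey, ps_comment) pairs.
rows = [(1, "a"), (2, "b"), (1, "c"), (2, None)]
result = string_agg_by_key(rows)
print(result)
```

Comparing a small sample of DataFusion's output against a reference like this can help confirm that any optimization of the aggregate keeps the same results.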
Describe the solution you'd like
No response
Describe alternatives you've considered
No response
Additional context
Found by SQLStorm #17698