Replies: 3 comments 5 replies
-
Hi! I think it would be fairer to compare DataFusion with duckdb-parquet: in that case both systems have to process the same data without first transforming it into the system's internal format. Currently there are several configuration options, disabled by default, which could significantly improve performance for queries over Parquet files if enabled:
To sum up: at the moment, DataFusion by default doesn't use all the features the Parquet format offers to speed up scanning and filtering.
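To make "format features that speed up scanning/filtering" a bit more concrete, here is a toy sketch of one such feature: skipping row groups based on the min/max statistics Parquet stores per row group. This is not DataFusion or DuckDB code. `RowGroup` and `scan_greater_than` are hypothetical names made up for the illustration, and it does not say anything about which options are on or off by default.

```python
from dataclasses import dataclass, field


@dataclass
class RowGroup:
    # Hypothetical toy structure, not a real Parquet or DataFusion type.
    values: list
    min_value: int = field(init=False)
    max_value: int = field(init=False)

    def __post_init__(self):
        # A Parquet writer records statistics like these in the file metadata.
        self.min_value = min(self.values)
        self.max_value = max(self.values)


def scan_greater_than(row_groups, threshold):
    """Return (matching values, number of row groups actually read)."""
    matches, groups_read = [], 0
    for rg in row_groups:
        # Prune: if even the max is not above the threshold, nothing can match,
        # so the row group never has to be decompressed or decoded.
        if rg.max_value <= threshold:
            continue
        groups_read += 1
        matches.extend(v for v in rg.values if v > threshold)
    return matches, groups_read


row_groups = [RowGroup([1, 5, 9]), RowGroup([10, 14, 19]), RowGroup([20, 25, 30])]
matches, groups_read = scan_greater_than(row_groups, 18)
print(matches, groups_read)
```

In this toy run only two of the three row groups are read for the filter `value > 18`. A reader that ignores the statistics would have to decode all three.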
-
@korowa this is interesting. So if DataFusion uses Parquet you can’t avoid decompression and syscalls, but if it uses Arrow then it doesn’t have statistics, bloom filters, or a page index?
-
Is there any benchmark tracking so we can observe how performance changes over time?
-
Hello,
I opened the benchmarks linked in the README, and it turns out that on a comparable machine DataFusion performs a few times worse than, e.g., DuckDB. What could be the reason for this?