Query optimization for a ducklake table #800
Unanswered
billziss-gh asked this question in Q&A
Replies: 0 comments
I have a large table on a ducklake with parquet files stored on object storage. I am trying to understand the time difference between the two queries below. I ran each query after restarting duckdb to avoid caching effects (and for the same reason I did not use `cache_httpfs`).
In my mind the second query should not take 9.5x longer than the first one as it returns a subset of the results of the first one. I modified the second query as follows to get a time similar to the first query.
I will admit that I am no expert in database optimizers, but to my untrained eye this looks like a failure of the optimizer to rewrite the query into one that performs significantly better (almost 10x).
I include full `explain analyze` plans for the queries below. From these it can be seen that all 3 queries fetch about the same amount of data (approx. 258 MiB) but the second query issues substantially more GETs (19965 vs 2502).
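A rough back-of-the-envelope check of these figures, as a minimal sketch: it treats the fetched volume as exactly 258 MiB for every query (the plans only say approximately), and the variable and function names are made up for illustration.

```python
# Figures quoted from the explain analyze plans in this post.
# ASSUMPTION: the fetched volume is taken as exactly 258 MiB for every query,
# although the post only says "approx. 258 MiB".
TOTAL_MIB = 258
GETS_SECOND_QUERY = 19965  # the slow (second) query
GETS_OTHER = 2502          # the GET count it is compared against


def avg_kib_per_get(total_mib: float, gets: int) -> float:
    """Average size of one GET request in KiB."""
    return total_mib * 1024 / gets


get_ratio = GETS_SECOND_QUERY / GETS_OTHER
avg_other = avg_kib_per_get(TOTAL_MIB, GETS_OTHER)
avg_second = avg_kib_per_get(TOTAL_MIB, GETS_SECOND_QUERY)

print(f"GET ratio (second query vs other): {get_ratio:.1f}x")
print(f"average GET size, other query:  {avg_other:.1f} KiB")
print(f"average GET size, second query: {avg_second:.1f} KiB")
```

Under that assumption the second query issues roughly 8x as many requests, each about 8x smaller on average. If per-request latency to object storage dominates the runtime, that alone could account for much of the 9.5x slowdown, though the plans do not confirm this.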