Automatically materialize CTEs #12290
Hi @lnkuiper, cool idea! I briefly discussed a similar (yet nonexistent) PR with @Mytherin a while back. I only glanced over the code, so I'm not sure whether this is actually an issue currently, but I think this optimization should not be applied when a CTE explicitly requests to be
LGTM w.r.t. the join order optimizer. I had a couple of comments about some of the other logic.
I also don't quite understand the reference counting stuff, but I think that's just to get a count of the CTEs in a query plan, right?
I've also added statistics to EDIT: I've also implemented the feedback by Tom/Denis - thanks a lot!
I've added
Thanks! LGTM - could you just solve the merge conflict?
Thanks!
Merge pull request duckdb/duckdb#12290 from lnkuiper/auto_cte_materialize
This PR adds functionality to inspect query plans before binding, to find out whether CTEs should be materialized. Currently, DuckDB never materializes CTEs. If CTEs are queried multiple times, they are recomputed, which can be very expensive for large complex CTEs.
The heuristic introduced in this PR is pretty simple: if the CTE performs a (grouped) aggregation and is queried more than once, it should be materialized. This heuristic can be improved in a later PR to also materialize when, e.g., the CTE contains a window function.
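To make the rule concrete, here is a minimal Python sketch of the decision as described above. It is not DuckDB's implementation (which is C++ and operates on the unbound query tree); `CteInfo`, `has_aggregate`, `reference_count`, and `should_materialize` are hypothetical names invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class CteInfo:
    """Hypothetical summary of one CTE, gathered before binding."""
    name: str
    has_aggregate: bool   # the CTE body performs a (grouped) aggregation
    reference_count: int  # how many times the rest of the query reads the CTE


def should_materialize(cte: CteInfo) -> bool:
    """Heuristic from the description: aggregation AND queried more than once."""
    return cte.has_aggregate and cte.reference_count > 1


# Three illustrative CTEs (made-up names, not taken from any benchmark query).
ctes = [
    CteInfo("totals", has_aggregate=True, reference_count=2),
    CteInfo("filtered", has_aggregate=False, reference_count=3),
    CteInfo("summary", has_aggregate=True, reference_count=1),
]

decisions = {cte.name: should_materialize(cte) for cte in ctes}
```

Under this sketch only `totals` would be materialized: `filtered` has no aggregation, and `summary` is read only once, so recomputing it costs nothing extra.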
Pipeline sharing, which is on our roadmap, would serve a similar purpose while being more generally useful, but it is much more difficult to implement, so it will have to wait.
To make sure query plans do not regress, I've improved CTE statistics in the join order optimizer.
Some performance improvement highlights below.
TPC-H
TPC-DS