-
Notifications
You must be signed in to change notification settings - Fork 6.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Materialized CTE #61086
base: master
Are you sure you want to change the base?
Materialized CTE #61086
Conversation
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
This is an automated comment for commit 790ba94 with description of existing statuses. It's updated for the latest CI running ❌ Click here to open a full report in a separate page
Successful checks
|
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Just want to say this feature is incredibly necessary! We have CTEs being re-run all over the place, but it would be super cumbersome and error-prone to extract them all into temp tables. Keep up the awesome work, really excited to see this merged :) |
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Awesome work! |
Resolve conflicts Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
…eeded for index Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
…ted query Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
4c3ba37
to
a685d32
Compare
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
…, update test Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
…e for index analysis Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
…subquery Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
0671b27
to
f34f8cb
Compare
Hi @kitaisreal @KochetovNicolai can you help to take a look at this PR? Briefly about the implementation (same for analyzer and non-analyzer):
WDYT about this? For us, this is sufficient (and I think this is sufficient for most users) but I can make a more sophisticated implementation (with proper scope for materialized CTE) if needed. |
Resolve conflicts Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
@canhld94 This is amazing work - just wanted to speak to our use case that would be made a lot more straightforward with properly scoped materialized CTEs. We have a query builder / BI tool type web dashboard for analyzing cloud usage and cost, with which the user can construct complicated queries. Relevant features include "compare to a prior period" and "aggregate cost across multiple usage types or cloud services". So in our backend, the constructed queries look structurally something like this (although they can get significantly more nested):
We intentionally construct one very complex query rather than multiple sequential queries for a couple reasons:
To make our use case work with this materialized CTE implementation, we could in theory extract all CTEs at every level of nesting to the top level and prefix them (e.g. Maybe I'm overlooking something or there's an alternative approach we could take with our query builder? |
@adamshugar yes, it is possible to implement properly scoped CTE with new analyzer. But I hardly see any case that materialized cte in deep subquery will bring benefit.
It looks like you only need |
@canhld94 Apologies, I should've given more detail. In our case:
This may seem a little contrived, and I'm happy to share some more concrete examples. But the queries can get quite complex and I don't want to add too much irrelevant information. Distilled to its essence, our use case is: you have an expensive-to-compute CTE (e.g. Please let me know if I'm misunderstanding something. Regardless, just wanted to offer an example in support of the scoped version. But we can make it work either way, and what you've built already is massively helpful. 🙏 |
@adamshugar Thanks for the example, I understand your point. |
Resolve conflicts Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Just an update, after checking code today I think it's possible to implement materialized CTE with properly scope using new analyzer. Also it's possible to implement it as an optimisation: even without I will start working on this soon. Nevertheless, if you only need materialised CTE at query level, this PR will work w/o issue (we're running it now). |
- Decouple CTE from QueryNode and UnionNode, a CTE will be represented in query tree by a CTENode with one child is the QueryNode or UnionNode - Add MaterializeCTEPass that will materialize the CTENode - TODO: add CTEScanStep as a query plan step and CTESource Signed-off-by: Duc Canh Le <duccanh.le@ahrefs.com>
Just wanted to echo @adamshugar's comments. This change would be so powerful. My use case would also benefit greatly from scoped CTEs. I think I'm in a similar boat as @adamshugar - the product we're working on dynamically builds up queries for the end users, and we use CTEs to break the query up into parts which are referenced in several places, including other CTEs.
Quick thought here - while the super majority of cases would likely benefit from materialized CTEs, there's at least one use case where I would not want the CTE to be materialized: when working with any of the random functions, for example to create mock data. IMHO automatically materializing is also just riskier in general. @canhld94 does the latest support scoped CTEs? Eager to try it out and see how it impacts our use cases. |
Is the latest commit expected to be working? Built things locally to experiment, and I get the following error: WITH all_traits AS MATERIALIZED
(
SELECT number FROM numbers(100)
)
SELECT * FROM all_traits Error: Adding Memory: WITH all_traits AS MATERIALIZED
(
SELECT number FROM numbers(100)
) MEMORY
SELECT * FROM all_traits Error: Thanks for working on this one, I really think it could be a great feature. |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Syntax (inspired by Postgres):
WITH t AS MATERIALIZED (subquery)
.The implement is basic now, mostly influenced by implementation of
GLOBAL JOIN
.Memory
(default) orJoin
engine.Close #53449
Documentation entry for user-facing changes