-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
opt: hoist WITHs to the top level #41033
Conversation
outScope = inScope.push() | ||
|
||
start := len(b.ctes) | ||
// Make a fake subquery to ensure that no CTEs are correlated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rytaft is this correct/ok to do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we only wanted to pull up With
s that have mutations / side-effects?
FWIW postgres supports correlated With
expressions when they're not mutations (and mutations inside subqueries error out with WITH clause containing a data-modifying statement must be at the top level
). We could do the same (but we still want to hoist up the implicit Withs created by [ .. ]
).
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @RaduBerinde and @rytaft)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 13 of 13 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @justinj)
pkg/sql/opt/norm/custom_funcs.go, line 1838 at r1 (raw file):
} if w, ok := r.(*memo.WithExpr); ok {
A comment would help here
pkg/sql/opt/norm/testdata/rules/decorrelate, line 2103 at r1 (raw file):
FAMILY "primary" (_int2, _int8, _timestamptz, _bool, _decimal, _string, rowid) ) ----
This table can be deleted. Not sure if there is another way to add a regression test for #40592...
pkg/sql/opt/optbuilder/select.go, line 571 at r1 (raw file):
Previously, justinj (Justin Jaffray) wrote…
@rytaft is this correct/ok to do?
I think this probably works, but it's sort of hard to reason about. Another option might be: whenever you build a CTE, just pass in an empty scope (similar to how Explain works).
If you want to support correlated CTEs that don't have mutations, however, then maybe creating a fake subquery is the way to go....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I fleshed this out a bunch more and addressed comments, I probably shouldn't have made it so much bigger, but that's where we are :/ I can break it up into smaller PRs if that would make things easier, it's already in three commits.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @rytaft)
pkg/sql/opt/norm/custom_funcs.go, line 1838 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
A comment would help here
Done.
pkg/sql/opt/norm/testdata/rules/decorrelate, line 2103 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
This table can be deleted. Not sure if there is another way to add a regression test for #40592...
Done.
pkg/sql/opt/optbuilder/select.go, line 571 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
I think this probably works, but it's sort of hard to reason about. Another option might be: whenever you build a CTE, just pass in an empty scope (similar to how Explain works).
If you want to support correlated CTEs that don't have mutations, however, then maybe creating a fake subquery is the way to go....
I think we can't just pass in an empty scope, since CTEs still have to be able to refer to parent CTEs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool! Especially like how the second commit turned out!
On the first commit: we are adding this new buildCtx
but we don't seem to be threading it everywhere, and in a lot of places we assume that the empty context is ok to pass down (which won't be true if we want to expand the use of this context; there should probably be a child()
method on the input context). Until now, we've been using the scope
to store everything, couldn't we add the atRoot
flag there?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @justinj and @rytaft)
pkg/sql/opt/optbuilder/select.go, line 580 at r3 (raw file):
outScope = inScope.push() // Make a fake subquery to ensure that no CTEs are correlated.
I thought we would still allow correlated CTEs that aren't mutations? (and only pull up mutations, and all square bracket stuff). Restricting read-only CTEs looks like it could be a regression in functionality (but no idea if it matters). Sorry if we discussed this and I forgot the latest conclusion
pkg/sql/opt/optbuilder/select.go, line 173 at r4 (raw file):
inScope.cols = nil i := 0 for _, col := range cte.cols {
[nit] looks like we can just do for i, col :=
pkg/sql/opt/xform/optimizer.go, line 791 at r4 (raw file):
neededCols := rootProps.ColSet() if !neededCols.SubsetOf(root.Relational().OutputCols) { fmt.Println("yeet")
needs cleanup
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
I agree with Radu that it feels like atRoot
should be part of scope
.
Reviewed 17 of 19 files at r2, 13 of 14 files at r3, 23 of 23 files at r4.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @justinj)
pkg/sql/opt/norm/testdata/rules/ordering, line 221 at r4 (raw file):
│ └── (123,) [type=tuple{int}] └── projections └── variable: field [type=string, outer=(3)]
Why can't this be a passthrough column anymore?
pkg/sql/opt/optbuilder/select.go, line 571 at r1 (raw file):
Previously, justinj (Justin Jaffray) wrote…
I think we can't just pass in an empty scope, since CTEs still have to be able to refer to parent CTEs.
Ok! I can't think of any reason why this would fail...
But to be safe, have you tested something like including a WITH inside another correlated subquery?
pkg/sql/opt/xform/testdata/physprops/ordering, line 1163 at r4 (raw file):
├── columns: a:7(int!null) b:8(int!null) c:9(int!null) ├── ordering: +8 └── with &1
shouldn't WITHs that are only scanned once be optimized away?
pkg/sql/opt/xform/testdata/rules/join_order, line 513 at r4 (raw file):
JOIN [INSERT INTO x (a) SELECT NULL FROM x RETURNING 1] ON false JOIN x ON true JOIN [UPDATE x SET a = 1 RETURNING 1] ON true
Is this test still testing what it was meant to?
247ef14
to
96b4ff2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't thread it through because it seemed difficult to verify it was threaded everywhere, and would be difficult to test that that remained correct. In retrospect this has the same problem. Going to think about how this could be improved.
The problem with using the scope
is that we use the same scope object in situations that change the build context (say, a subselect used as a data source). We could perhaps set a "context" on the scope and defer
it back to what it was before, but that seems kind of sketchy and I think it would probably be more reasonable to just have the two concepts separate. Maybe the way to phrase this is that scope boundaries and context boundaries are different? Or maybe there's a way they can be unified somehow? It's not obvious to me how to do that. It does seem problematic to have these as two separate concepts, I'm open to being convinced there's a better solution here.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @justinj, @RaduBerinde, and @rytaft)
pkg/sql/opt/norm/testdata/rules/ordering, line 221 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Why can't this be a passthrough column anymore?
I think because we hoist it, then subsequently inline it, rather than building it inlined. I'm not sure if there's a nice way to fix it short of having some kind of norm rule to re-number columns that could be passthrough columns after a normalization occurred?
pkg/sql/opt/optbuilder/select.go, line 571 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Ok! I can't think of any reason why this would fail...
But to be safe, have you tested something like including a WITH inside another correlated subquery?
Added a test for this.
pkg/sql/opt/optbuilder/select.go, line 580 at r3 (raw file):
Previously, RaduBerinde wrote…
I thought we would still allow correlated CTEs that aren't mutations? (and only pull up mutations, and all square bracket stuff). Restricting read-only CTEs looks like it could be a regression in functionality (but no idea if it matters). Sorry if we discussed this and I forgot the latest conclusion
I think making this change is doable here but I'd rather make it a follow-up, just because there's some hairiness (it introduces two separate code-paths for where WITHs are introduced and I'm finding it tricky to square it with the rest of the changes here.
pkg/sql/opt/optbuilder/select.go, line 173 at r4 (raw file):
Previously, RaduBerinde wrote…
[nit] looks like we can just do
for i, col :=
Hm, not sure why I had it like this before!
pkg/sql/opt/xform/optimizer.go, line 791 at r4 (raw file):
Previously, RaduBerinde wrote…
needs cleanup
Yikes, fixed
pkg/sql/opt/xform/testdata/physprops/ordering, line 1163 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
shouldn't WITHs that are only scanned once be optimized away?
We don't do so in the presence of side effects. We could maybe do it in some limited cases but it's by design that we want to hoist up all the mutations.
pkg/sql/opt/xform/testdata/rules/join_order, line 513 at r4 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
Is this test still testing what it was meant to?
This change actually obviates what this test was originally supposed to test, so deleted!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I still don't fully understand the difference between scope and context... is there some formal definition of when the context changes?
Reviewed 7 of 43 files at r5, 12 of 14 files at r6, 23 of 23 files at r7.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @justinj and @RaduBerinde)
pkg/sql/opt/norm/testdata/rules/ordering, line 221 at r4 (raw file):
Previously, justinj (Justin Jaffray) wrote…
I think because we hoist it, then subsequently inline it, rather than building it inlined. I'm not sure if there's a nice way to fix it short of having some kind of norm rule to re-number columns that could be passthrough columns after a normalization occurred?
Makes sense. Probably not worth trying to fix at this point...
pkg/sql/opt/optbuilder/misc_statements.go, line 59 at r7 (raw file):
emptyScope := &scope{builder: b} colTypes := []*types.T{types.String} inputScope := b.buildStmt(n.Queries, colTypes, emptyScope, ctx)
why do some of these have ctx.child()
but others just have ctx
?
pkg/sql/opt/xform/testdata/physprops/ordering, line 1163 at r4 (raw file):
Previously, justinj (Justin Jaffray) wrote…
We don't do so in the presence of side effects. We could maybe do it in some limited cases but it's by design that we want to hoist up all the mutations.
Ok! Thanks for the explanation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only definition I have is "when it needs to change", whereas scopes have a semantically-defined lifecycle (right? or else hopping up the scope tree to figure out the right level for an aggregation to live at wouldn't make sense). If you think there's a way to mangle this into the scope I'd be happy to discuss! I'm not super happy with this as it's ended up.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @justinj and @RaduBerinde)
36e3926
to
408476b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, finally rebased—PTAL! Sorry for how long this is taking—lots of finicky scope
stuff to work out.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @justinj, @RaduBerinde, and @rytaft)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me, just some minor comments!
Reviewed 1 of 43 files at r5, 10 of 44 files at r8, 10 of 12 files at r9, 24 of 24 files at r10.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @justinj and @RaduBerinde)
pkg/sql/opt/optbuilder/delete.go, line 44 at r8 (raw file):
var wrapWiths func(*scope) inScope, wrapWiths = b.processWith(del.With, inScope) defer func() { wrapWiths(outScope) }()
Why not just call wrapWiths()
before return
? The magic defer that modifies the result is usually trickier to think about.
We could also consider taking the processWith
call outside of these functions and change it to take a function for building the specific statement. Something like
func (b *Builder) processWith(with *tree.With, inScope *scope, buildStmt fn(inScope *scope) *scope) *scope {
...
inScope = b.buildCTEs(with, inScope)
outScope := buildStmt(inScope)
...
return outScope
}
In most cases the builder can just call this directly, e.g.
return b.processWith(ins.With, inScope, func (inScope *scope) *scope {
return b.buildInsert(ins, inScope)
})
We'd split buildSelect
to have a wrapper that does this since that is used in a bunch of places.
pkg/sql/opt/optbuilder/scope.go, line 286 at r10 (raw file):
} func (s *scope) makePresentationWithHiddenCols() physical.Presentation {
[nit] add a comment, this sounds like is should be used only in this special case
pkg/sql/opt/optbuilder/select.go, line 580 at r3 (raw file):
Previously, justinj (Justin Jaffray) wrote…
I think making this change is doable here but I'd rather make it a follow-up, just because there's some hairiness (it introduces two separate code-paths for where WITHs are introduced and I'm finding it tricky to square it with the rest of the changes here.
Sure, just file an issue so we don't forget
pkg/sql/opt/optbuilder/select.go, line 583 at r9 (raw file):
addedCTEs := make([]cteSource, len(with.CTEList)) // Make a fake subquery to ensure that no CTEs are correlated.
Add a TODO here if we plan to relax this
pkg/sql/opt/optbuilder/select.go, line 634 at r9 (raw file):
} // constructed within an EXPLAIN should not be hoisted above it, and so we need
Missing text
pkg/sql/opt/optbuilder/select.go, line 648 at r9 (raw file):
// flushCTEs adds With expressions on top of an expression. func (b *Builder) flushCTEs(expr memo.RelExpr) memo.RelExpr { var ctes []cteSource
[nit] ctes := b.cteStack[len(b.cteStack)-1]
on the first line would be easier to read
25cfa7a
to
5fbc8c3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTRs! Gonna finally try to merge this pupper
bors r+
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @justinj, @RaduBerinde, and @rytaft)
pkg/sql/opt/optbuilder/delete.go, line 44 at r8 (raw file):
Previously, RaduBerinde wrote…
Why not just call
wrapWiths()
beforereturn
? The magic defer that modifies the result is usually trickier to think about.We could also consider taking the
processWith
call outside of these functions and change it to take a function for building the specific statement. Something likefunc (b *Builder) processWith(with *tree.With, inScope *scope, buildStmt fn(inScope *scope) *scope) *scope { ... inScope = b.buildCTEs(with, inScope) outScope := buildStmt(inScope) ... return outScope }In most cases the builder can just call this directly, e.g.
return b.processWith(ins.With, inScope, func (inScope *scope) *scope { return b.buildInsert(ins, inScope) })We'd split
buildSelect
to have a wrapper that does this since that is used in a bunch of places.
Done, yeah my apprehension for a while was that I wanted it to still be safe to just call buildSelect
, but after making this change I do think it's a lot better.
pkg/sql/opt/optbuilder/misc_statements.go, line 59 at r7 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
why do some of these have
ctx.child()
but others just havectx
?
Not relevant any more
pkg/sql/opt/optbuilder/scope.go, line 286 at r10 (raw file):
Previously, RaduBerinde wrote…
[nit] add a comment, this sounds like is should be used only in this special case
Done.
pkg/sql/opt/optbuilder/select.go, line 580 at r3 (raw file):
Previously, RaduBerinde wrote…
Sure, just file an issue so we don't forget
Done.
pkg/sql/opt/optbuilder/select.go, line 583 at r9 (raw file):
Previously, RaduBerinde wrote…
Add a TODO here if we plan to relax this
Done.
pkg/sql/opt/optbuilder/select.go, line 634 at r9 (raw file):
Previously, RaduBerinde wrote…
Missing text
Fixed.
pkg/sql/opt/optbuilder/select.go, line 648 at r9 (raw file):
Previously, RaduBerinde wrote…
[nit]
ctes := b.cteStack[len(b.cteStack)-1]
on the first line would be easier to read
Done.
Merge conflict (retrying...) |
1 similar comment
Merge conflict (retrying...) |
Merge conflict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rebased, trying again
bors r+
Reviewable status:
complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @justinj, @RaduBerinde, and @rytaft)
Build failed |
This change simplifies hoisting CTEs quite a bit. Release note (sql change): Mutations in CTEs not at the top level are no longer allowed. This restriction is part of Postgres as well.
Release note (sql change): WITH expressions are now hoisted to the top level in a query when possible.
Release note: None
looked like a flake, givin it another go bors r+ |
41033: opt: hoist WITHs to the top level r=justinj a=justinj This commit changes how we build WITHs to always hoist them up to the highest level possible. In most cases this means they're hoisted up to be the root expression of the tree, but in some cases they cannot travel further up past certain operators which logically denote a different query (say, the input to an EXPLAIN). It also stops WITHs from being correlated, fixing an overlooked case from before. This will mean that pruning columns through WITH expressions is increasingly important and I will work on that next. Release note (sql change): WITH expressions can no longer be correlated, even when using LATERAL. Co-authored-by: Justin Jaffray <justin@cockroachlabs.com>
Build succeeded |
This commit changes how we build WITHs to always hoist them up to the
highest level possible. In most cases this means they're hoisted up to
be the root expression of the tree, but in some cases they cannot travel
further up past certain operators which logically denote a different
query (say, the input to an EXPLAIN).
It also stops WITHs from being correlated, fixing an overlooked case
from before.
This will mean that pruning columns through WITH expressions is
increasingly important and I will work on that next.
Release note (sql change): WITH expressions can no longer be correlated,
even when using LATERAL.