-
-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Subquery caching rules onceAfterAll #1470
Conversation
We cannot cache subquery results for a scope until the child scope has been finalized. A child scope cannot be finalized until the parent scope is finalized. Caching subqueries has to happen after all scopes are finalized. Applying hash lookups in the old way is much more difficult with this setup. This also exposed some bugs related to cacheability checks and subquery/union finalization recursion.
sql/analyzer/scope.go
Outdated
@@ -80,6 +79,8 @@ func (s *Scope) newScopeFromSubqueryAlias(sqa *plan.SubqueryAlias) *Scope { | |||
// We don't include the current inner node so that the outer scope nodes are still present, but not the lateral nodes | |||
if s.currentNodeIsFromSubqueryExpression { | |||
sqa.OuterScopeVisibility = true | |||
} | |||
if sqa.OuterScopeVisibility { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this concerns me, can't really explain yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hrm... yeah... that does seem weird... leaving line 84 inside the if s.currentNodeIsFromSubqueryExpression
block doesn't work? Seems like that currentNodeIsFromSubqueryExpression
bit isn't getting set consistently?
It would probably be interesting to see if we can catch it where s.currentNodeIsFromSubqueryExpression == false && sqa.OuterScopeVisibility == true
. That shouldn't ever happen, but if you need that separate if
block, seems like it must be happening somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my later changes appeared to resolved this, still don't remember which test broke, but reverting back to the original code
…lthub/go-mysql-server into max/cache-sq-after-resolving-children
@@ -316,11 +373,9 @@ var PlanTests = []QueryPlanTest{ | |||
Query: `with cte(a,b) as (select * from ab) select * from xy where exists (select * from cte where a = x)`, | |||
ExpectedPlan: "SemiJoin(cte.a = xy.x)\n" + | |||
" ├─ Table(xy)\n" + | |||
" └─ HashLookup(child: (cte.a), lookup: (xy.x))\n" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was an invalid plan, partial joins are not correct with hash lookups yet. need a new operator to support this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very awesome changes @max-hoffman! 👏
Just some minor comments and questions.
sql/analyzer/exec_builder.go
Outdated
case plan.JoinTypeSemi: | ||
newOp = plan.JoinTypeSemiHash | ||
case plan.JoinTypeAnti: | ||
newOp = plan.JoinTypeAntiHash | ||
default: | ||
panic("can only apply hash join to InnerJoin or LeftOuterJoin") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor)
panic("can only apply hash join to InnerJoin or LeftOuterJoin") | |
panic("can only apply hash join to InnerJoin, LeftOuterJoin, SemiJoin, or AntiJoin") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually... similar to another comment below... would be safer to just return an error. Seems like that would still have the effect of failing the statement and communicating the message, but wouldn't crash the whole process if somehow this codepath becomes active.
sql/analyzer/scope.go
Outdated
@@ -80,6 +79,8 @@ func (s *Scope) newScopeFromSubqueryAlias(sqa *plan.SubqueryAlias) *Scope { | |||
// We don't include the current inner node so that the outer scope nodes are still present, but not the lateral nodes | |||
if s.currentNodeIsFromSubqueryExpression { | |||
sqa.OuterScopeVisibility = true | |||
} | |||
if sqa.OuterScopeVisibility { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hrm... yeah... that does seem weird... leaving line 84 inside the if s.currentNodeIsFromSubqueryExpression
block doesn't work? Seems like that currentNodeIsFromSubqueryExpression
bit isn't getting set consistently?
It would probably be interesting to see if we can catch it where s.currentNodeIsFromSubqueryExpression == false && sqa.OuterScopeVisibility == true
. That shouldn't ever happen, but if you need that separate if
block, seems like it must be happening somewhere?
sql/analyzer/analyzer.go
Outdated
parallelizeId, | ||
pushdownFiltersId, | ||
subqueryIndexesId: | ||
finalizeSubqueryExprsId: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
finalizeSubqueries
now takes care of subqueries as well as subquery expressions, so you can kill off finalizeSubqueryExprsId
(sorry I didn't clean that up when I combined the rules!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
// recursively handle subqueries from the bottom of the plan up. | ||
case | ||
// skip recursive resolve rules | ||
resolveSubqueryExprsId, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(minor) similar comment for resolveSubqueryExprs
... I needed to combine the handling of subqs and subq expressions when I added outer scope visibility to sqas so the scopes would be nested correctly, so the resolveSubqueryExprs
rule has been removed. Sorry for missing cleaning up these Ids!
target = c.Name() | ||
case *plan.RecursiveCte: | ||
case *plan.UnresolvedTable: | ||
panic("Table not resolved") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these need to be panics? Seems like we could just return an error
and still fail the statement. Having the entire sql-server process crash from a single bad statement sucks. I know we don't expect these to ever be hit, but seems like a good process to avoid future crashes.
case *plan.TableAlias: | ||
alias = n.Name() | ||
switch c := n.Child.(type) { | ||
case *plan.ResolvedTable, *plan.SubqueryAlias, *plan.ValueDerivedTable, *plan.TransformedNamedNode, *plan.RecursiveTable, *plan.DeferredAsOfTable: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth using the sql.Table
interface to capture table types more aggressively? For example... would other implementations like FilteredTable
be valid here? (not sure if we'd push a filter down through a TableAlias currently or not, just brainstorming)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think sql.Table
is normally under one of these nodes, not a plan node itself, although the main two ResolvedTable
and IndexedTableAccess
do implement sql.Table
.Mmight need a different interface i the future to capture TableNode
return nil | ||
} | ||
indexes, err := newIndexAnalyzerForNode(ctx, rt) | ||
if err != nil { | ||
return nil | ||
} | ||
defer indexes.releaseUsedIndexes() | ||
idx := indexes.MatchingIndex(ctx, ctx.GetCurrentDatabase(), rt.Name(), normalizeExpressions(tableAliases, gf)...) | ||
if rt := scope.get(strings.ToLower(gf.Table())); rt != "" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(super minor) might be worth adding the strings.ToLower
to the aliasScope.get
method so that it's symmetric with how the aliasScope.add
automatically lower cases the name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep
We cannot cache subquery results for a scope until the child scope has been finalized. A child scope cannot be finalized until the parent scope is finalized. Caching subqueries has to happen after all scopes are finalized. Applying hash lookups in the old way is much more difficult with this setup. This also exposed some bugs related to cacheability checks and subquery/union finalization recursion. Rewrite rules that depend on subquery caching to be in order traversals in OnceAfterAll.
The original bug sourced from a bad resolve in a query like this:
The join and two scopes appear necessary to trigger the original bug. The join's children will be unresolved when we index the inner scopes, whereas a single relation scope's child will be a resolved table. The two scopes provide a gap for determining cachability in the middle scope while the child scope still had a
deferredColumn
, which previously permitted caching.GMS bump: dolthub/dolt#4987