-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xform: multiple MIN/MAX subqueries rewrite, filter support #70854
Conversation
Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good!
nit: in the PR message you can use backticks to format the code nicely (single backticks for inline code elements, or three backticks at the beginning and end for blocks of code). You can just update the PR message directly in GitHub.
Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @msirek)
pkg/sql/opt/xform/groupby_funcs.go, line 57 at r1 (raw file):
newScanExpr := c.e.f.ConstructScan(newScanPrivate) var inputExpr memo.RelExpr
nit: you could define this earlier (maybe outside the for loop) and then directly assign the output of ConstructScan
to it.
pkg/sql/opt/xform/groupby_funcs.go, line 61 at r1 (raw file):
// If the input to the scalar group by is a Select with filters, remap the // column IDs in the filters and use that to build a new Select. if grp.Child(0).Op() == opt.SelectOp {
instead of checking that the group is a Select, I would just check that len(filters) > 0
pkg/sql/opt/xform/rules/groupby.opt, line 21 at r1 (raw file):
# ReplaceFilteredScalarMinMaxWithSubqueries replaces a scalar group by from a # SelectExpr having a Filter, a single ScanExpr and n > 1 MIN/MAX expressions
nit: all SelectExprs
should have a filter -- I'd just say a "SelectExpr with a ScanExpr as input and...".
pkg/sql/opt/xform/testdata/rules/groupby, line 1187 at r1 (raw file):
└── z:3 # We expect ReplaceScalarMinMaxWithScalarSubqueries not to be preferred here.
This comment isn't quite right. ReplaceScalarMinMaxWithScalarSubqueries
cannot be applied because there is a filter. ReplaceFilteredScalarMinMaxWithSubqueries
is applied, but the resulting plan is not chosen because the ordering of the scan does not match the aggregation column. (FYI, expect-not
means the rule did not generate any new expressions in the memo. expect
means the rule did generate a new expression in the memo, but it does not necessarily mean the new expression is chosen).
I would just leave the old comment as-is.
pkg/sql/opt/xform/testdata/rules/groupby, line 1333 at r1 (raw file):
# ReplaceFilteredScalarMinMaxWithSubqueries can be applied due to min/max on # leading index column.
nit: the rule will be applied regardless -- the plan is actually chosen because the min/max is on the leading index column. I'd just remove this comment.
pkg/sql/opt/xform/testdata/rules/groupby, line 1402 at r1 (raw file):
# ReplaceFilteredScalarMinMaxWithSubqueries can be applied due to min/max on # leading index column.
ditto
8e23bdb
to
be96571
Compare
Thank you for updating your pull request. My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing! Added the backticks.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @rytaft)
pkg/sql/opt/xform/groupby_funcs.go, line 57 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: you could define this earlier (maybe outside the for loop) and then directly assign the output of
ConstructScan
to it.
Done.
pkg/sql/opt/xform/groupby_funcs.go, line 61 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
instead of checking that the group is a Select, I would just check that
len(filters) > 0
Right, I guess this is after the rewrite phase, so the EliminateSelect rule probably wouldn't kick in. Made the change.
pkg/sql/opt/xform/rules/groupby.opt, line 21 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: all
SelectExprs
should have a filter -- I'd just say a "SelectExpr with a ScanExpr as input and...".
Done.
pkg/sql/opt/xform/testdata/rules/groupby, line 1187 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
This comment isn't quite right.
ReplaceScalarMinMaxWithScalarSubqueries
cannot be applied because there is a filter.ReplaceFilteredScalarMinMaxWithSubqueries
is applied, but the resulting plan is not chosen because the ordering of the scan does not match the aggregation column. (FYI,expect-not
means the rule did not generate any new expressions in the memo.expect
means the rule did generate a new expression in the memo, but it does not necessarily mean the new expression is chosen).I would just leave the old comment as-is.
The old comment was: 'ReplaceScalarMinMaxWithScalarSubqueries can only be applied to canonical
scans', doesn't really describe this case any more. I cleaned up the comment to describe what's actually going on and moved this test down into the the ReplaceFilteredScalarMinMaxWithSubqueries section. The new comment is:
We expect the ReplaceFilteredScalarMinMaxWithSubqueries and
ReplaceScalarMinMaxWithLimit rules to fire, however, as we know nothing about
the ordering of y on the index zyx after a scan on xyz with z > 0, a sort is
required, making the new plan more expensive and therefore not selected.
opt expect=ReplaceFilteredScalarMinMaxWithSubqueries expect=ReplaceScalarMinMaxWithLimit
SELECT min(y), max(x) FROM xyz@zyx WHERE z > 0
pkg/sql/opt/xform/testdata/rules/groupby, line 1333 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
nit: the rule will be applied regardless -- the plan is actually chosen because the min/max is on the leading index column. I'd just remove this comment.
Right, confusing statement. The leading index column just allows the sort requested in ReplaceScalarMinMaxWithLimit to be avoided. Removed these comments.
pkg/sql/opt/xform/testdata/rules/groupby, line 1402 at r1 (raw file):
Previously, rytaft (Rebecca Taft) wrote…
ditto
Removed.
be96571
to
b3b64bf
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work!
Reviewed 2 of 5 files at r1, 3 of 3 files at r2, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @msirek and @rytaft)
-- commits, line 4 at r2:
nit: rewrite => rewrote
-- commits, line 6 at r2:
nit: is => was
-- commits, line 10 at r2:
nit: Adding empty lines around your examples here and below would make this easier to read.
-- commits, line 13 at r2:
nit: Make this a full sentence like "The purpose of this transformation was to allow each subquery to ...". It's a bit confusing to read this sentence extension after the examples.
-- commits, line 31 at r2:
This rule applies in more cases than described here. It applies for two or more min or max aggregates. This release note implies that the rule only works for min(a), max(b)
, but not min(a), min(b)
, or min(a), max(b), min(c)
. Maybe there's a way to make it more clear that these are optimized too?
-- commits, line 35 at r2:
nit: I'd move this example above the release note. I don't think an example in the release note is necessary.
pkg/sql/logictest/testdata/logic_test/aggregate, line 788 at r2 (raw file):
query FI SELECT max(z), min(x) FROM xyz having (max(z), min(x)) = (select max(z), min(x) from xyz)
nit: capitalize keywords from
, where
, select
, having
in these 3 tests. I think we have a linter that will enforce this, but I'm not sure.
pkg/sql/opt/exec/execbuilder/testdata/aggregate, line 909 at r2 (raw file):
# Filter on derived table is supported. query T EXPLAIN SELECT max(z), min(x) FROM (select x,y,z from xyz a) dt where dt.y > 0
nit: capitalize keywords
pkg/sql/opt/exec/execbuilder/testdata/aggregate, line 957 at r2 (raw file):
# Scalar subquery in filter is supported. query T EXPLAIN SELECT max(z), min(x) FROM xyz where (z,x) = (select max(z), min(x) from xyz)
nit: capitalize keywords
pkg/sql/opt/xform/groupby_funcs.go, line 61 at r1 (raw file):
Previously, msirek (Mark Sirek) wrote…
Right, I guess this is after the rewrite phase, so the EliminateSelect rule probably wouldn't kick in. Made the change.
EliminateSelect
should still kick in when you call c.e.f.ConstructSelect
. If you look at the internals of that function you'll see the generated code for EliminateSelect
. But you'll still want the if len(filters) > 0 {
check because MapFilterCols
will probably panic with nil
filters. I think @rytaft's suggestion was for the sake of simplicity.
pkg/sql/opt/xform/groupby_funcs.go, line 133 at r2 (raw file):
case opt.MinOp, opt.MaxOp: // The child of a min or max aggregation should always be a variable, but // we add a sanity check here anyway.
👍 Thanks for adding this!
pkg/sql/opt/xform/testdata/rules/groupby, line 1378 at r2 (raw file):
opt expect=ReplaceFilteredScalarMinMaxWithSubqueries SELECT max(z), min(x) FROM xyz WHERE y is not null
nit: capitalize IS NOT NULL
pkg/sql/opt/xform/testdata/rules/groupby, line 1447 at r2 (raw file):
# expressions which are projections. opt expect-not=ReplaceFilteredScalarMinMaxWithSubqueries SELECT max(z+1), min(x) FROM xyz WHERE y is not null
nit: capitalize IS NOT NULL
pkg/sql/opt/xform/testdata/rules/groupby, line 1480 at r2 (raw file):
# ORDER BY should not disable ReplaceFilteredScalarMinMaxWithSubqueries. opt expect=ReplaceFilteredScalarMinMaxWithSubqueries SELECT max(z), min(x) FROM xyz WHERE y is not null order by 1
nit: capitalize IS NOT NULL ORDER BY
pkg/sql/opt/xform/testdata/rules/groupby, line 1550 at r2 (raw file):
# the ordering of y on the index zyx after a scan on xyz with z > 0, a sort is # required, making the new plan more expensive and therefore not selected. opt expect=ReplaceFilteredScalarMinMaxWithSubqueries expect=ReplaceScalarMinMaxWithLimit
To expect multipled rules, use expect=(FirstRule,SecondRule)
rather than two expect
s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
once you address comments from @mgartner. Nice work!
Reviewed 3 of 3 files at r2, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @msirek)
Previously, we only rewrote single-table queries with scalar min and max aggregate expressions into separate subqueries if there was no WHERE clause, for example: SELECT min(pk_col1), max(pk_col1) FROM t1; becomes: SELECT (SELECT min(pk_col1) FROM t1), (SELECT max(pk_col1) FROM t1); The purpose of this transformation was to allow each subquery to utilize a limited scan via the primary key. This was inadequate because it is of limited value. Many user queries use a WHERE clause, so WHERE clause support is crucial to allow this optimization to be applied in more cases. To address this, this patch adds a new rewrite rule named ReplaceFilteredScalarMinMaxWithSubqueries to detect a Select as input to two or more scalar MIN or MAX expressions. If present, a new Select is built for each MIN or MAX expression with the same Filters (WHERE clause) as the original Select, but with mapped column IDs in a copy of the tree. Each new Select is used as input to a newly constructed subquery and combined via a values clause. The values clause replaces the original ScalarGroupByExpr. Example: CREATE TABLE t1 (a INT, b INT, PRIMARY KEY(a)); EXPLAIN (OPT) SELECT min(a), max(a) FROM t1 WHERE b < 6; --------------------------------------------- values └── tuple ├── subquery │ └── scalar-group-by │ ├── limit │ │ ├── select │ │ │ ├── scan t1 │ │ │ └── filters │ │ │ └── b < 6 │ │ └── 1 │ └── aggregations │ └── const-agg │ └── a └── subquery └── scalar-group-by ├── limit │ ├── select │ │ ├── scan t1,rev │ │ └── filters │ │ └── b < 6 │ └── 1 └── aggregations └── const-agg └── a Release note (performance improvement): A SELECT query from a single table with more than one MIN or MAX scalar aggregate expression and a WHERE clause can now be performed with LIMITED SCANs, one per aggregate expression, instead of a single FULL SCAN. Note: No other aggregate function, such as SUM, may be present in the query block in order for it to be eligible for this transformation. This optimization should occur when each MIN or MAX expression involves a leading index column, so that a sort is not required for the limit operation, and the resulting query plan will appear cheapest to the optimizer.
b3b64bf
to
8b7de6e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)
Previously, mgartner (Marcus Gartner) wrote…
nit: rewrite => rewrote
Done.
Previously, mgartner (Marcus Gartner) wrote…
nit: is => was
Done.
Previously, mgartner (Marcus Gartner) wrote…
nit: Adding empty lines around your examples here and below would make this easier to read.
Done.
The purpose of this transformation was to allow each subquery to
Done.
Previously, mgartner (Marcus Gartner) wrote…
This rule applies in more cases than described here. It applies for two or more min or max aggregates. This release note implies that the rule only works for
min(a), max(b)
, but notmin(a), min(b)
, ormin(a), max(b), min(c)
. Maybe there's a way to make it more clear that these are optimized too?
OK, changed it to the following. What do you think?
Release note (performance improvement): A SELECT query from
a single table with more than one MIN or MAX scalar aggregate
expression and a WHERE clause can now be performed with LIMITED SCANs,
one per aggregate expression, instead of a single FULL SCAN.
Note: No other aggregate function, such as SUM, may be present
in the query block in order for it to be eligible for
this transformation. This optimization should occur when
each MIN or MAX expression involves a leading index column,
so that a sort is not required for the limit operation, and
the resulting query plan will appear cheapest to the optimizer.
Previously, mgartner (Marcus Gartner) wrote…
nit: I'd move this example above the release note. I don't think an example in the release note is necessary.
Done.
pkg/sql/logictest/testdata/logic_test/aggregate, line 788 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: capitalize keywords
from
,where
,select
,having
in these 3 tests. I think we have a linter that will enforce this, but I'm not sure.
Done.
pkg/sql/opt/exec/execbuilder/testdata/aggregate, line 909 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: capitalize keywords
Done.
pkg/sql/opt/exec/execbuilder/testdata/aggregate, line 957 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: capitalize keywords
Done.
pkg/sql/opt/xform/groupby_funcs.go, line 61 at r1 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
EliminateSelect
should still kick in when you callc.e.f.ConstructSelect
. If you look at the internals of that function you'll see the generated code forEliminateSelect
. But you'll still want theif len(filters) > 0 {
check becauseMapFilterCols
will probably panic withnil
filters. I think @rytaft's suggestion was for the sake of simplicity.
OK, thanks.
pkg/sql/opt/xform/groupby_funcs.go, line 133 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
👍 Thanks for adding this!
Sure
pkg/sql/opt/xform/testdata/rules/groupby, line 1378 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: capitalize
IS NOT NULL
Done.
pkg/sql/opt/xform/testdata/rules/groupby, line 1447 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: capitalize
IS NOT NULL
Done.
pkg/sql/opt/xform/testdata/rules/groupby, line 1480 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
nit: capitalize
IS NOT NULL ORDER BY
Done.
pkg/sql/opt/xform/testdata/rules/groupby, line 1550 at r2 (raw file):
Previously, mgartner (Marcus Gartner) wrote…
To expect multipled rules, use
expect=(FirstRule,SecondRule)
rather than twoexpect
s.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)
Previously, msirek (Mark Sirek) wrote…
OK, changed it to the following. What do you think?
Release note (performance improvement): A SELECT query from
a single table with more than one MIN or MAX scalar aggregate
expression and a WHERE clause can now be performed with LIMITED SCANs,
one per aggregate expression, instead of a single FULL SCAN.Note: No other aggregate function, such as SUM, may be present
in the query block in order for it to be eligible for
this transformation. This optimization should occur when
each MIN or MAX expression involves a leading index column,
so that a sort is not required for the limit operation, and
the resulting query plan will appear cheapest to the optimizer.
Looks great!
bors r+ |
Build succeeded: |
Fixes #70735
Previously, we only rewrote single-table queries with scalar
min and max aggregate expressions into separate subqueries
if there was no WHERE clause, for example:
becomes:
The purpose of this transformation was to allow each subquery
to utilize a limited scan via the primary key.
This was inadequate because it is of limited value.
Many user queries use a WHERE clause, so WHERE clause
support is crucial to allow this optimization to be
applied in more cases.
To address this, this patch adds a new rewrite rule named
ReplaceFilteredScalarMinMaxWithSubqueries to detect a Select
as input to two or more scalar MIN or MAX expressions. If present,
a new Select is built for each MIN or MAX expression with the same
Filters (WHERE clause) as the original Select, but with mapped column
IDs in a copy of the tree. Each new Select is used as input
to a newly constructed subquery and combined via a values clause.
The values clause replaces the original ScalarGroupByExpr.
Example:
Release note (performance improvement): A SELECT query from
a single table with more than one MIN or MAX scalar aggregate
expression and a WHERE clause can now be performed with LIMITED SCANs,
one per aggregate expression, instead of a single FULL SCAN.
Note: No other aggregate function, such as SUM, may be present
in the query block in order for it to be eligible for
this transformation. This optimization should occur when
each MIN or MAX expression involves a leading index column,
so that a sort is not required for the limit operation, and
the resulting query plan will appear cheapest to the optimizer.