Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xform: multiple MIN/MAX subqueries rewrite, filter support #70854

Merged
merged 1 commit into from
Sep 30, 2021

Conversation

msirek
Copy link
Contributor

@msirek msirek commented Sep 29, 2021

Fixes #70735

Previously, we only rewrote single-table queries with scalar
min and max aggregate expressions into separate subqueries
if there was no WHERE clause, for example:

    SELECT min(pk_col1), max(pk_col1) FROM t1;

becomes:

    SELECT (SELECT min(pk_col1) FROM t1),
           (SELECT max(pk_col1) FROM t1);

The purpose of this transformation was to allow each subquery
to utilize a limited scan via the primary key.

This was inadequate because it is of limited value.
Many user queries use a WHERE clause, so WHERE clause
support is crucial to allow this optimization to be
applied in more cases.

To address this, this patch adds a new rewrite rule named
ReplaceFilteredScalarMinMaxWithSubqueries to detect a Select
as input to two or more scalar MIN or MAX expressions. If present,
a new Select is built for each MIN or MAX expression with the same
Filters (WHERE clause) as the original Select, but with mapped column
IDs in a copy of the tree. Each new Select is used as input
to a newly constructed subquery and combined via a values clause.
The values clause replaces the original ScalarGroupByExpr.

Example:

    CREATE TABLE t1 (a INT, b INT, PRIMARY KEY(a));
    
    EXPLAIN (OPT) SELECT min(a), max(a) FROM t1 WHERE b < 6;
    ---------------------------------------------
      values
       └── tuple
            ├── subquery
            │    └── scalar-group-by
            │         ├── limit
            │         │    ├── select
            │         │    │    ├── scan t1
            │         │    │    └── filters
            │         │    │         └── b < 6
            │         │    └── 1
            │         └── aggregations
            │              └── const-agg
            │                   └── a
            └── subquery
                 └── scalar-group-by
                      ├── limit
                      │    ├── select
                      │    │    ├── scan t1,rev
                      │    │    └── filters
                      │    │         └── b < 6
                      │    └── 1
                      └── aggregations
                           └── const-agg
                                └── a

Release note (performance improvement): A SELECT query from
a single table with more than one MIN or MAX scalar aggregate
expression and a WHERE clause can now be performed with LIMITED SCANs,
one per aggregate expression, instead of a single FULL SCAN.

Note: No other aggregate function, such as SUM, may be present
in the query block in order for it to be eligible for
this transformation. This optimization should occur when
each MIN or MAX expression involves a leading index column,
so that a sort is not required for the limit operation, and
the resulting query plan will appear cheapest to the optimizer.

@msirek msirek requested a review from a team as a code owner September 29, 2021 01:56
@blathers-crl
Copy link

blathers-crl bot commented Sep 29, 2021

Thank you for contributing to CockroachDB. Please ensure you have followed the guidelines for creating a PR.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added the O-community Originated from the community label Sep 29, 2021
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good!

nit: in the PR message you can use backticks to format the code nicely (single backticks for inline code elements, or three backticks at the beginning and end for blocks of code). You can just update the PR message directly in GitHub.

Reviewed 5 of 5 files at r1, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @msirek)


pkg/sql/opt/xform/groupby_funcs.go, line 57 at r1 (raw file):

		newScanExpr := c.e.f.ConstructScan(newScanPrivate)

		var inputExpr memo.RelExpr

nit: you could define this earlier (maybe outside the for loop) and then directly assign the output of ConstructScan to it.


pkg/sql/opt/xform/groupby_funcs.go, line 61 at r1 (raw file):

		// If the input to the scalar group by is a Select with filters, remap the
		// column IDs in the filters and use that to build a new Select.
		if grp.Child(0).Op() == opt.SelectOp {

instead of checking that the group is a Select, I would just check that len(filters) > 0


pkg/sql/opt/xform/rules/groupby.opt, line 21 at r1 (raw file):


# ReplaceFilteredScalarMinMaxWithSubqueries replaces a scalar group by from a
# SelectExpr having a Filter, a single ScanExpr and n > 1 MIN/MAX expressions

nit: all SelectExprs should have a filter -- I'd just say a "SelectExpr with a ScanExpr as input and...".


pkg/sql/opt/xform/testdata/rules/groupby, line 1187 at r1 (raw file):

           └── z:3

# We expect ReplaceScalarMinMaxWithScalarSubqueries not to be preferred here.

This comment isn't quite right. ReplaceScalarMinMaxWithScalarSubqueries cannot be applied because there is a filter. ReplaceFilteredScalarMinMaxWithSubqueries is applied, but the resulting plan is not chosen because the ordering of the scan does not match the aggregation column. (FYI, expect-not means the rule did not generate any new expressions in the memo. expect means the rule did generate a new expression in the memo, but it does not necessarily mean the new expression is chosen).

I would just leave the old comment as-is.


pkg/sql/opt/xform/testdata/rules/groupby, line 1333 at r1 (raw file):


# ReplaceFilteredScalarMinMaxWithSubqueries can be applied due to min/max on
# leading index column.

nit: the rule will be applied regardless -- the plan is actually chosen because the min/max is on the leading index column. I'd just remove this comment.


pkg/sql/opt/xform/testdata/rules/groupby, line 1402 at r1 (raw file):


# ReplaceFilteredScalarMinMaxWithSubqueries can be applied due to min/max on
# leading index column.

ditto

@blathers-crl
Copy link

blathers-crl bot commented Sep 30, 2021

Thank you for updating your pull request.

My owl senses detect your PR is good for review. Please keep an eye out for any test failures in CI.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing! Added the backticks.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @rytaft)


pkg/sql/opt/xform/groupby_funcs.go, line 57 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: you could define this earlier (maybe outside the for loop) and then directly assign the output of ConstructScan to it.

Done.


pkg/sql/opt/xform/groupby_funcs.go, line 61 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

instead of checking that the group is a Select, I would just check that len(filters) > 0

Right, I guess this is after the rewrite phase, so the EliminateSelect rule probably wouldn't kick in. Made the change.


pkg/sql/opt/xform/rules/groupby.opt, line 21 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: all SelectExprs should have a filter -- I'd just say a "SelectExpr with a ScanExpr as input and...".

Done.


pkg/sql/opt/xform/testdata/rules/groupby, line 1187 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

This comment isn't quite right. ReplaceScalarMinMaxWithScalarSubqueries cannot be applied because there is a filter. ReplaceFilteredScalarMinMaxWithSubqueries is applied, but the resulting plan is not chosen because the ordering of the scan does not match the aggregation column. (FYI, expect-not means the rule did not generate any new expressions in the memo. expect means the rule did generate a new expression in the memo, but it does not necessarily mean the new expression is chosen).

I would just leave the old comment as-is.

The old comment was: 'ReplaceScalarMinMaxWithScalarSubqueries can only be applied to canonical
scans', doesn't really describe this case any more. I cleaned up the comment to describe what's actually going on and moved this test down into the the ReplaceFilteredScalarMinMaxWithSubqueries section. The new comment is:

We expect the ReplaceFilteredScalarMinMaxWithSubqueries and

ReplaceScalarMinMaxWithLimit rules to fire, however, as we know nothing about

the ordering of y on the index zyx after a scan on xyz with z > 0, a sort is

required, making the new plan more expensive and therefore not selected.

opt expect=ReplaceFilteredScalarMinMaxWithSubqueries expect=ReplaceScalarMinMaxWithLimit
SELECT min(y), max(x) FROM xyz@zyx WHERE z > 0


pkg/sql/opt/xform/testdata/rules/groupby, line 1333 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

nit: the rule will be applied regardless -- the plan is actually chosen because the min/max is on the leading index column. I'd just remove this comment.

Right, confusing statement. The leading index column just allows the sort requested in ReplaceScalarMinMaxWithLimit to be avoided. Removed these comments.


pkg/sql/opt/xform/testdata/rules/groupby, line 1402 at r1 (raw file):

Previously, rytaft (Rebecca Taft) wrote…

ditto

Removed.

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

Reviewed 2 of 5 files at r1, 3 of 3 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @msirek and @rytaft)


-- commits, line 4 at r2:
nit: rewrite => rewrote


-- commits, line 6 at r2:
nit: is => was


-- commits, line 10 at r2:
nit: Adding empty lines around your examples here and below would make this easier to read.


-- commits, line 13 at r2:
nit: Make this a full sentence like "The purpose of this transformation was to allow each subquery to ...". It's a bit confusing to read this sentence extension after the examples.


-- commits, line 31 at r2:
This rule applies in more cases than described here. It applies for two or more min or max aggregates. This release note implies that the rule only works for min(a), max(b), but not min(a), min(b), or min(a), max(b), min(c). Maybe there's a way to make it more clear that these are optimized too?


-- commits, line 35 at r2:
nit: I'd move this example above the release note. I don't think an example in the release note is necessary.


pkg/sql/logictest/testdata/logic_test/aggregate, line 788 at r2 (raw file):


query FI
SELECT max(z), min(x) FROM xyz having (max(z), min(x)) = (select max(z), min(x) from xyz)

nit: capitalize keywords from, where, select, having in these 3 tests. I think we have a linter that will enforce this, but I'm not sure.


pkg/sql/opt/exec/execbuilder/testdata/aggregate, line 909 at r2 (raw file):

# Filter on derived table is supported.
query T
EXPLAIN SELECT max(z), min(x) FROM (select x,y,z from xyz a) dt where dt.y > 0

nit: capitalize keywords


pkg/sql/opt/exec/execbuilder/testdata/aggregate, line 957 at r2 (raw file):

# Scalar subquery in filter is supported.
query T
EXPLAIN SELECT max(z), min(x) FROM xyz where (z,x) = (select max(z), min(x) from xyz)

nit: capitalize keywords


pkg/sql/opt/xform/groupby_funcs.go, line 61 at r1 (raw file):

Previously, msirek (Mark Sirek) wrote…

Right, I guess this is after the rewrite phase, so the EliminateSelect rule probably wouldn't kick in. Made the change.

EliminateSelect should still kick in when you call c.e.f.ConstructSelect. If you look at the internals of that function you'll see the generated code for EliminateSelect. But you'll still want the if len(filters) > 0 { check because MapFilterCols will probably panic with nil filters. I think @rytaft's suggestion was for the sake of simplicity.


pkg/sql/opt/xform/groupby_funcs.go, line 133 at r2 (raw file):

		case opt.MinOp, opt.MaxOp:
			// The child of a min or max aggregation should always be a variable, but
			// we add a sanity check here anyway.

👍 Thanks for adding this!


pkg/sql/opt/xform/testdata/rules/groupby, line 1378 at r2 (raw file):


opt expect=ReplaceFilteredScalarMinMaxWithSubqueries
SELECT max(z), min(x) FROM xyz WHERE y is not null

nit: capitalize IS NOT NULL


pkg/sql/opt/xform/testdata/rules/groupby, line 1447 at r2 (raw file):

# expressions which are projections.
opt expect-not=ReplaceFilteredScalarMinMaxWithSubqueries
SELECT max(z+1), min(x) FROM xyz WHERE y is not null

nit: capitalize IS NOT NULL


pkg/sql/opt/xform/testdata/rules/groupby, line 1480 at r2 (raw file):

# ORDER BY should not disable ReplaceFilteredScalarMinMaxWithSubqueries.
opt expect=ReplaceFilteredScalarMinMaxWithSubqueries
SELECT max(z), min(x) FROM xyz WHERE y is not null order by 1

nit: capitalize IS NOT NULL ORDER BY


pkg/sql/opt/xform/testdata/rules/groupby, line 1550 at r2 (raw file):

# the ordering of y on the index zyx after a scan on xyz with z > 0, a sort is
# required, making the new plan more expensive and therefore not selected.
opt expect=ReplaceFilteredScalarMinMaxWithSubqueries expect=ReplaceScalarMinMaxWithLimit

To expect multipled rules, use expect=(FirstRule,SecondRule) rather than two expects.

Copy link
Collaborator

@rytaft rytaft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: once you address comments from @mgartner. Nice work!

Reviewed 3 of 3 files at r2, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @msirek)

Previously, we only rewrote single-table queries with scalar
min and max aggregate expressions into separate subqueries
if there was no WHERE clause, for example:

    SELECT min(pk_col1), max(pk_col1) FROM t1;

becomes:

    SELECT (SELECT min(pk_col1) FROM t1),
           (SELECT max(pk_col1) FROM t1);

The purpose of this transformation was to allow each subquery
to utilize a limited scan via the primary key.

This was inadequate because it is of limited value.
Many user queries use a WHERE clause, so WHERE clause
support is crucial to allow this optimization to be
applied in more cases.

To address this, this patch adds a new rewrite rule named
ReplaceFilteredScalarMinMaxWithSubqueries to detect a Select
as input to two or more scalar MIN or MAX expressions. If present,
a new Select is built for each MIN or MAX expression with the same
Filters (WHERE clause) as the original Select, but with mapped column
IDs in a copy of the tree. Each new Select is used as input
to a newly constructed subquery and combined via a values clause.
The values clause replaces the original ScalarGroupByExpr.

Example:

    CREATE TABLE t1 (a INT, b INT, PRIMARY KEY(a));

    EXPLAIN (OPT) SELECT min(a), max(a) FROM t1 WHERE b < 6;
    ---------------------------------------------
      values
       └── tuple
            ├── subquery
            │    └── scalar-group-by
            │         ├── limit
            │         │    ├── select
            │         │    │    ├── scan t1
            │         │    │    └── filters
            │         │    │         └── b < 6
            │         │    └── 1
            │         └── aggregations
            │              └── const-agg
            │                   └── a
            └── subquery
                 └── scalar-group-by
                      ├── limit
                      │    ├── select
                      │    │    ├── scan t1,rev
                      │    │    └── filters
                      │    │         └── b < 6
                      │    └── 1
                      └── aggregations
                           └── const-agg
                                └── a

Release note (performance improvement): A SELECT query from
a single table with more than one MIN or MAX scalar aggregate
expression and a WHERE clause can now be performed with LIMITED SCANs,
one per aggregate expression, instead of a single FULL SCAN.

Note: No other aggregate function, such as SUM, may be present
      in the query block in order for it to be eligible for
      this transformation. This optimization should occur when
      each MIN or MAX expression involves a leading index column,
      so that a sort is not required for the limit operation, and
      the resulting query plan will appear cheapest to the optimizer.
Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)


-- commits, line 4 at r2:

Previously, mgartner (Marcus Gartner) wrote…

nit: rewrite => rewrote

Done.


-- commits, line 6 at r2:

Previously, mgartner (Marcus Gartner) wrote…

nit: is => was

Done.


-- commits, line 10 at r2:

Previously, mgartner (Marcus Gartner) wrote…

nit: Adding empty lines around your examples here and below would make this easier to read.

Done.


-- commits, line 13 at r2:

The purpose of this transformation was to allow each subquery to
Done.


-- commits, line 31 at r2:

Previously, mgartner (Marcus Gartner) wrote…

This rule applies in more cases than described here. It applies for two or more min or max aggregates. This release note implies that the rule only works for min(a), max(b), but not min(a), min(b), or min(a), max(b), min(c). Maybe there's a way to make it more clear that these are optimized too?

OK, changed it to the following. What do you think?

Release note (performance improvement): A SELECT query from
a single table with more than one MIN or MAX scalar aggregate
expression and a WHERE clause can now be performed with LIMITED SCANs,
one per aggregate expression, instead of a single FULL SCAN.

Note: No other aggregate function, such as SUM, may be present
in the query block in order for it to be eligible for
this transformation. This optimization should occur when
each MIN or MAX expression involves a leading index column,
so that a sort is not required for the limit operation, and
the resulting query plan will appear cheapest to the optimizer.


-- commits, line 35 at r2:

Previously, mgartner (Marcus Gartner) wrote…

nit: I'd move this example above the release note. I don't think an example in the release note is necessary.

Done.


pkg/sql/logictest/testdata/logic_test/aggregate, line 788 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: capitalize keywords from, where, select, having in these 3 tests. I think we have a linter that will enforce this, but I'm not sure.

Done.


pkg/sql/opt/exec/execbuilder/testdata/aggregate, line 909 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: capitalize keywords

Done.


pkg/sql/opt/exec/execbuilder/testdata/aggregate, line 957 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: capitalize keywords

Done.


pkg/sql/opt/xform/groupby_funcs.go, line 61 at r1 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

EliminateSelect should still kick in when you call c.e.f.ConstructSelect. If you look at the internals of that function you'll see the generated code for EliminateSelect. But you'll still want the if len(filters) > 0 { check because MapFilterCols will probably panic with nil filters. I think @rytaft's suggestion was for the sake of simplicity.

OK, thanks.


pkg/sql/opt/xform/groupby_funcs.go, line 133 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

👍 Thanks for adding this!

Sure


pkg/sql/opt/xform/testdata/rules/groupby, line 1378 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: capitalize IS NOT NULL

Done.


pkg/sql/opt/xform/testdata/rules/groupby, line 1447 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: capitalize IS NOT NULL

Done.


pkg/sql/opt/xform/testdata/rules/groupby, line 1480 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

nit: capitalize IS NOT NULL ORDER BY

Done.


pkg/sql/opt/xform/testdata/rules/groupby, line 1550 at r2 (raw file):

Previously, mgartner (Marcus Gartner) wrote…

To expect multipled rules, use expect=(FirstRule,SecondRule) rather than two expects.

Done.

Copy link
Contributor Author

@msirek msirek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner and @rytaft)

Copy link
Collaborator

@mgartner mgartner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Thanks again.
:lgtm:

Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @mgartner)


-- commits, line 31 at r2:

Previously, msirek (Mark Sirek) wrote…

OK, changed it to the following. What do you think?

Release note (performance improvement): A SELECT query from
a single table with more than one MIN or MAX scalar aggregate
expression and a WHERE clause can now be performed with LIMITED SCANs,
one per aggregate expression, instead of a single FULL SCAN.

Note: No other aggregate function, such as SUM, may be present
in the query block in order for it to be eligible for
this transformation. This optimization should occur when
each MIN or MAX expression involves a leading index column,
so that a sort is not required for the limit operation, and
the resulting query plan will appear cheapest to the optimizer.

Looks great!

@mgartner
Copy link
Collaborator

bors r+

@craig
Copy link
Contributor

craig bot commented Sep 30, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-community Originated from the community
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt: support for filters in the multiple MIN/MAX subqueries rewrite
4 participants