Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: fix incorrect ordering inside distributed streaming aggregation #37713

Merged

Conversation

Projects
None yet
4 participants
@RaduBerinde
Copy link
Member

commented May 21, 2019

When running a streaming aggregation, groupNode had the list of
ordered columns, but not the exact ordering. The exact ordering is
needed when planning a distributed aggregation - this ordering must be
maintained between the local and the final stage. The distsql planner
was trying to deduce the ordering from the input, which is usually
OK but not in all cases: if another column is known to be equal with
a grouping column, that column could be used in the input ordering
instead of the grouping column.

The fix is to store the ordering in groupNode. We also display the
ordering in EXPLAIN output, and fix the grouping columns to also use
column names for consistency.

Fixes #37211.

Release note (bug fix): Fixed incorrect results or "incorrectly
ordered stream" error in some cases of queries with aggregations.

@RaduBerinde RaduBerinde requested review from jordanlewis and justinj May 21, 2019

@RaduBerinde RaduBerinde requested review from cockroachdb/distsql-prs as code owners May 21, 2019

@cockroach-teamcity

This comment has been minimized.

Copy link
Member

commented May 21, 2019

This change is Reviewable

@RaduBerinde RaduBerinde force-pushed the RaduBerinde:fix-groupby-streaming-ordering branch 2 times, most recently from dae09cb to d23f296 May 22, 2019

@justinj
Copy link
Member

left a comment

:lgtm:

Reviewed 27 of 27 files at r1.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @jordanlewis and @RaduBerinde)


pkg/sql/distsql_physical_planner.go, line 1653 at r1 (raw file):

			}
			if !found {
				return errors.Errorf("group column ordering contains non-grouping column %d", o.ColIdx)

should this be an assertionerror?


pkg/sql/explain_tree_test.go, line 130 at r1 (raw file):

2 .aggregate 0 cid (cid int, sum decimal) weak-key(cid)
2 .aggregate 1 sum(value) (cid int, sum decimal) weak-key(cid)
2 .group by cid (cid int, sum decimal) weak-key(cid)

I didn't know we still made explain plans that looked like this...


pkg/sql/opt/ordering/group_by.go, line 125 at r1 (raw file):

			ordering = append(
				ordering,
				opt.MakeOrderingColumn(opt.ColumnID(col), inputOrdering.Columns[i].Descending),

Not sure I understand what's going on here. Say we have a==b as an equality FD and ORDER BY a, c. Does this mean we'll report a,b,c as the ordering here?

In what circumstances can someone make use of b here? It seems to me that in any situation where a consumer of this ordering just knew the ordering but not about the equality relationship between a and b they wouldn't be able to make any inferences based on knowledge of b and not a, is that wrong?

@RaduBerinde
Copy link
Member Author

left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis and @justinj)


pkg/sql/distsql_physical_planner.go, line 1653 at r1 (raw file):

Previously, justinj (Justin Jaffray) wrote…

should this be an assertionerror?

Done (though there are a lot of other instances in this file that I left alone).


pkg/sql/explain_tree_test.go, line 130 at r1 (raw file):

Previously, justinj (Justin Jaffray) wrote…

I didn't know we still made explain plans that looked like this...

I don't know what this is and I'd love to keep it that way :)


pkg/sql/opt/ordering/group_by.go, line 125 at r1 (raw file):

Previously, justinj (Justin Jaffray) wrote…

Not sure I understand what's going on here. Say we have a==b as an equality FD and ORDER BY a, c. Does this mean we'll report a,b,c as the ordering here?

In what circumstances can someone make use of b here? It seems to me that in any situation where a consumer of this ordering just knew the ordering but not about the equality relationship between a and b they wouldn't be able to make any inferences based on knowledge of b and not a, is that wrong?

In your example, if we are grouping on a,b,c, yes, we will report the ordering +a,+b,+c here. The idea was that the more columns we are streaming on the better. In principle, knowing that we're streaming on all columns could be helpful from the execution side (you always work on one bucket at a time).

In practice, this shouldn't happen as the optimizer has rules to remove redundant grouping columns. So I see no reason why I can't change it to just pick a single grouping column if that makes more sense.

@RaduBerinde RaduBerinde force-pushed the RaduBerinde:fix-groupby-streaming-ordering branch from d23f296 to 163c633 May 22, 2019

@RaduBerinde
Copy link
Member Author

left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis and @justinj)


pkg/sql/opt/ordering/group_by.go, line 125 at r1 (raw file):

Previously, RaduBerinde wrote…

In your example, if we are grouping on a,b,c, yes, we will report the ordering +a,+b,+c here. The idea was that the more columns we are streaming on the better. In principle, knowing that we're streaming on all columns could be helpful from the execution side (you always work on one bucket at a time).

In practice, this shouldn't happen as the optimizer has rules to remove redundant grouping columns. So I see no reason why I can't change it to just pick a single grouping column if that makes more sense.

Went ahead and made that change.

sql: fix incorrect ordering inside distributed streaming aggregation
When running a streaming aggregation, `groupNode` had the list of
ordered columns, but not the exact ordering. The exact ordering is
needed when planning a distributed aggregation - this ordering must be
maintained between the local and the final stage. The distsql planner
was trying to deduce the ordering from the input, which is usually
OK but not in all cases: if another column is known to be equal with
a grouping column, that column could be used in the input ordering
instead of the grouping column.

The fix is to store the ordering in `groupNode`. We also display the
ordering in EXPLAIN output, and fix the grouping columns to also use
column names for consistency.

Fixes #37211.

Release note (bug fix): Fixed incorrect results or "incorrectly
ordered stream" error in some cases of queries with aggregations.

@RaduBerinde RaduBerinde force-pushed the RaduBerinde:fix-groupby-streaming-ordering branch from 163c633 to bbf7759 May 22, 2019

@jordanlewis
Copy link
Member

left a comment

:lgtm_strong: yay, thanks for dealing with this bug - and the huge win that is saying bye-bye to the @n explain syntax.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @jordanlewis and @justinj)

@RaduBerinde

This comment has been minimized.

Copy link
Member Author

commented May 23, 2019

bors r+

1 similar comment
@RaduBerinde

This comment has been minimized.

Copy link
Member Author

commented May 24, 2019

bors r+

craig bot pushed a commit that referenced this pull request May 24, 2019

Merge #37713
37713: sql: fix incorrect ordering inside distributed streaming aggregation r=RaduBerinde a=RaduBerinde

When running a streaming aggregation, `groupNode` had the list of
ordered columns, but not the exact ordering. The exact ordering is
needed when planning a distributed aggregation - this ordering must be
maintained between the local and the final stage. The distsql planner
was trying to deduce the ordering from the input, which is usually
OK but not in all cases: if another column is known to be equal with
a grouping column, that column could be used in the input ordering
instead of the grouping column.

The fix is to store the ordering in `groupNode`. We also display the
ordering in EXPLAIN output, and fix the grouping columns to also use
column names for consistency.

Fixes #37211.

Release note (bug fix): Fixed incorrect results or "incorrectly
ordered stream" error in some cases of queries with aggregations.

Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
@craig

This comment has been minimized.

Copy link

commented May 24, 2019

Build succeeded

@craig craig bot merged commit bbf7759 into cockroachdb:master May 24, 2019

3 checks passed

GitHub CI (Cockroach) TeamCity build finished
Details
bors Build succeeded
Details
license/cla Contributor License Agreement is signed.
Details

@RaduBerinde RaduBerinde deleted the RaduBerinde:fix-groupby-streaming-ordering branch May 24, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.