Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: fix the handling of sorting, grouping and column ordinals. #10799

Merged
merged 1 commit into from
Nov 22, 2016

Conversation

knz
Copy link
Contributor

@knz knz commented Nov 19, 2016

Prior to this patch there was some confusion in the code about how to
interpret the arguments to ORDER BY and GROUP BY clauses.

This patch restores sane behavior:

  • for sorting:

    1. column ordinals. If a simple integer literal is used, optionally
      enclosed within parentheses but not subject to any arithmetic,
      then this refers to one of the columns of the data source. Then
      use the render expression at that ordinal position as sort key.

    2. non-integer constant. If ORDER BY is followed by a constant that
      is not a column ordinal, an error is reported.

    3. if the expression is the aliased (AS) name of a render expression
      in a SELECT clause, then use that render as sort key.
      e.g. SELECT a AS b, b AS c ORDER BY b this sorts on the first
      render (column a).

    4. otherwise, if the expression is already rendered, then use that
      render as sort key. e.g. SELECT b AS c ORDER BY b sorts on the
      first render. (this is a new optimization)

    5. if the sort key is not dependent on the data source (no IndexedVar,
      no stars) then simply do not sort. e.g. SELECT * FROM t ORDER BY 3+3
      simplifies to SELECT * FROM t.
      (this is a new optimization)

    6. otherwise, add a new render with the ORDER BY expression
      and use that as sort key.
      e.g. SELECT a FROM t ORDER by b
      e.g. SELECT a, b FROM t ORDER by a+b
      But also: SELECT a FROM t ORDER BY t.*

  • for grouping:

    1. column ordinals are detected first thing, like for ORDER BY above.

    2. constant expressions are rejected, like for ORDER BY above.

Fixes #10776.

cc @paperstreet @RaduBerinde.


This change is Reviewable

@nvanbenschoten
Copy link
Member

Reviewed 18 of 18 files at r1.
Review status: all files reviewed at latest revision, 5 unresolved discussions, some commit checks failed.


pkg/sql/sort.go, line 95 at r1 (raw file):

The logical data source for ORDER BY is the list of render expressions for a SELECT

Doesn't this sentence contradict cases where renders are added when needed by the ORDER BY but not previously present in the render list?


pkg/sql/sort.go, line 96 at r1 (raw file):

		// The logical data source for ORDER BY is the list of render
		// expressions for a SELECT (or an entire UNION or VALUES clause).
		// Alas, SQL has some historical baggage and there is some

s/is some special case/are some special cases/


pkg/sql/sort.go, line 111 at r1 (raw file):

		//    this sorts on the first render.
		//
		// 3) otherwise, if the expression is already rendered,

What do you mean by "already rendered"? Already added to the render list?


pkg/sql/sort.go, line 218 at r1 (raw file):

				// If more than 1 column were expanded, turn them into sort columns too.
				// Except the last one, which will be added below.
				ordering = append(ordering, sqlbase.ColumnOrderInfo{ColIdx: extraIdx, Direction: direction})

long line


pkg/sql/testdata/order_by, line 204 at r1 (raw file):


query error non-integer constant in ORDER BY: 1 \+ 2
SELECT * FROM t ORDER BY 1+2

Isn't this a valid query? Do we want to be breaking compatibility here?

Example:

postgres=# select * from a order by 2.5+1;
 x
---
 1
 3
 4
(3 rows)

postgres=# select * from a order by 2.5;
ERROR:  non-integer constant in ORDER BY
LINE 1: select * from a order by 2.5;
                                 ^

Comments from Reviewable

@knz
Copy link
Contributor Author

knz commented Nov 21, 2016

Review status: 13 of 18 files reviewed at latest revision, 5 unresolved discussions.


pkg/sql/sort.go, line 95 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote… > > The logical data source for ORDER BY is the list of render expressions for a SELECT > > Doesn't this sentence contradict cases where renders are added when needed by the ORDER BY but not previously present in the render list?
Clarified in comment: the ordinals refer to renders as specified in the SQL text, excluding those added later programmatically.

pkg/sql/sort.go, line 96 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote… > `s/is some special case/are some special cases/`
Done.

pkg/sql/sort.go, line 111 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote… > What do you mean by "already rendered"? Already added to the render list?
Yes this is what I mean, isn't it equivalent?

pkg/sql/sort.go, line 218 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote… > long line
My editor says it's ok... But oh well.

pkg/sql/testdata/order_by, line 204 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote… > Isn't this a valid query? Do we want to be breaking compatibility here? > > Example: > ``` > postgres=# select * from a order by 2.5+1; > x > --- > 1 > 3 > 4 > (3 rows) > > postgres=# select * from a order by 2.5; > ERROR: non-integer constant in ORDER BY > LINE 1: select * from a order by 2.5; > ^ > ```
Yes you are right, I was misreading the pg sources. Fixed.

Comments from Reviewable

@RaduBerinde
Copy link
Member

Review status: 12 of 18 files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.


pkg/sql/group.go, line 56 at r1 (raw file):

		expr := parser.StripParens(rawExpr)

		col, err := p.colIndex(s.numOriginalCols, expr, "GROUP BY")

maybe add a short comment, "Check if the group by expression refers a render target by index"


pkg/sql/group.go, line 62 at r1 (raw file):

		if col != -1 {
			groupBy[i] = s.render[col]

NYC but as far as I can tell we then add these expressions again using addRender below.. is that normal? or is there something to optimize here


Comments from Reviewable

@knz
Copy link
Contributor Author

knz commented Nov 21, 2016

Review status: 12 of 18 files reviewed at latest revision, 7 unresolved discussions, some commit checks failed.


pkg/sql/group.go, line 56 at r1 (raw file):

Previously, RaduBerinde wrote… > maybe add a short comment, "Check if the group by expression refers a render target by index"
Done.

pkg/sql/group.go, line 62 at r1 (raw file):

Previously, RaduBerinde wrote… > NYC but as far as I can tell we then add these expressions again using `addRender` below.. is that normal? or is there something to optimize here
A lot of the remainder of this code assumes there is a 1-1 mapping between the positions in the groupBy array and the additional renders added to the select; also the groupBy array is used to decide which columns will actually cause aggregation. Optimizing this code to reuse the existing user-defined renders would require some significant refactoring, and I do not feel comfortable doing so myself. Filed as #10919.

Comments from Reviewable

Prior to this patch there was some confusion in the code about how to
interpret the arguments to ORDER BY and GROUP BY clauses.

This patch restores sane behavior:

- for sorting:

  1) column ordinals. If a simple integer literal is used, optionally
     enclosed within parentheses but *not subject to any arithmetic*,
     then this refers to one of the columns of the data source. Then
     use the render expression at that ordinal position as sort key.

  2) non-integer constant. If ORDER BY is followed by a constant that
     is not a column ordinal, an error is reported.

  3) if the expression is the aliased (AS) name of a render expression
     in a SELECT clause, then use that render as sort key.
     e.g. `SELECT a AS b, b AS c ORDER BY b` this sorts on the first
     render (column `a`).

  4) otherwise, if the expression is already rendered, then use that
     render as sort key.  e.g. `SELECT b AS c ORDER BY b` sorts on the
     first render.  (this is a new optimization)

  5) if the sort key is not dependent on the data source (no IndexedVar,
     no stars) then simply do not sort. e.g. `SELECT * FROM t ORDER BY lower('A')`
     simplifies to `SELECT * FROM t`.
     (this is a new optimization)

  6) otherwise, add a new render with the ORDER BY expression
     and use that as sort key.
     e.g. `SELECT a FROM t ORDER by b`
     e.g. `SELECT a, b FROM t ORDER by a+b`
     But also: `SELECT a FROM t ORDER BY t.*`

- for grouping:

  1) column ordinals are detected first thing, like for ORDER BY above.

  2) constant expressions are rejected, like for ORDER BY above.
@nvanbenschoten
Copy link
Member

:lgtm:


Reviewed 7 of 8 files at r2.
Review status: 16 of 17 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.


Comments from Reviewable

@knz knz merged commit a83c960 into cockroachdb:master Nov 22, 2016
@knz knz deleted the fix-ord branch November 22, 2016 00:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

sql: fix basic ordinal references
3 participants