sql: enable referring to columns symbolically #10729

knz · 2016-11-16T15:27:40Z

This patch introduces a new semantic concept that is useful for
pretty-printing the AST back to valid SQL and again back to AST; this
will be useful during development work on optimizations. The
motivation is detailed below.

A side effect is that it also introduces a user-visible feature.
Because it is visible to user it will need to be documented, but some
caveats apply. More details are given below too.

Motivation

During optimizations, column references (IndexedVar) can move around
and be substituted in ways that may lose track of the original name
used in the query to name the column. More specifically, we can move a
reference to a position where there is no name known for the column
yet. For example we want to go from:

SELECT * FROM (SELECT * FROM (VALUES (1), (2))) AS a(x) WHERE x > 2

to:

SELECT * FROM (SELECT * FROM ... WHERE ??? > 2) AS a(x)

Now although we can move the IndexedVar to the position in the ???
it is not so clear which name to give this indexed var if we
pretty-print the resulting tree -- in this example the name "x" does
not exist at that position.

This patch makes it possible to side-step this issue, by introducing a
new SQL syntax that refers to columns at the "current level" by ordinal
position: "@n" where N is an integer literal.

With this patch the example above can then be reliably serialized to:

SELECT * FROM (SELECT * FROM ... WHERE @1 > 2) AS a(x)

User-visible changes

SQL already has a traditional, limited way to do refer to column
numerically in some syntactic positions, specifically ORDER BY and
GROUP BY. For example, in SELECT a + b FROM foo ORDER BY 1, "1"
refers to the first value rendered, i.e. a + b.

These are called "column ordinals"; they are supported in some SQL
engines, sometimes for backward compatibility, sometimes because of
historical reasons.

The feature added in this patch complements and extends this mechanism.

However, the use of column ordinals by client applications is also
customarily strongly discouraged. The use of the new column ordinal
references added in this patch should be equally discouraged. The
reason why is that they are not robust against schema updates. Say, a
table is initially created with columns a, b, c in this order. Then
a query is designed to refer to column a by position, with number 1.
Then later, independently a DB admin changes the schema and removes
column a, and adds a new version of column a with e.g. a different
type. Now the schema is b, c, a, and all the queries that expect to
refer to a by position 1 are now broken. The new feature in this
patch is also subject to this limitation. It is intended primarily for
use during development when the schema updates are tightly controlled
by the operator manipulating the query.

Meanwhile, since the feature is visible to users it should still be
(minimally) documented. The salient aspects that should be
communicated are:

don't use this feature in client applications unless you 100%
understand the limitation described above.
the @ notation refers to a column number in the data source, not
in the rendered columns. The data source is the thing named after
FROM. For example, suppose a table foo has columns a and b
in this order. Then the query

SELECT b, a FROM foo WHERE @2 = 123

is equivalent to SELECT b, a FROM foo WHERE b = 123.
point 2 above means that there is a difference between the new
column ordinal references and the traditional SQL ordinals, which
can be illustrated as follows. With SQL ordinals, the query

SELECT b, a FROM foo ORDER BY 1

sorts with column b, because this is the first value rendered
(columns after SELECT); whereas

SELECT b, a FROM foo ORDER BY @1

sorts with column a, because this is the first column in the data
source (columns after FROM).

This feature is needed for #10632.

cc @nvanbenschoten.

This change is

knz · 2016-11-16T15:28:21Z

I just realized I forgot to add the tests. Please comment on the general direction, I'll add the tests in parallel.

RaduBerinde · 2016-11-17T13:37:08Z

Nice, I like this idea! Seems like something we should use for distsql (I'm currently using $0, $1, ..). Though I would still need to have my own visitor which does just the OrdinalReference->IndexedVar conversion.

Review status: 0 of 7 files reviewed at latest revision, all discussions resolved, some commit checks failed.

Comments from Reviewable

nvanbenschoten · 2016-11-17T20:58:33Z

LGTM so far, but as you mentioned this needs lots of testing.

Also, this is introducing new syntax into SQL, so I'd like to think through the syntax as a user-facing feature here as well. A full RFC isn't necessary, but do you mind writing up a brief description of these new Ordinal References (something like what you'd find in our docs). I get the motivation for their introduction in regard to #10632, but talking through their introduction as a user-facing feature would help justify this approach over others.

Reviewed 7 of 7 files at r1.
Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.

pkg/sql/parser/expr.go, line 548 at r1 (raw file):

func (DefaultVal) ResolvedType() Type { return nil }

var _ VariableExpr = &Placeholder{}

Looks like this assertion got separated from its declaration.

pkg/sql/parser/sql.y, line 32 at r1 (raw file):

const maxUint = ^uint(0)
const maxInt = int(^uint(0) >> 1)

s/^uint(0)/maxUint/?

Comments from Reviewable

RaduBerinde · 2016-11-17T23:40:03Z

Didn't we conclude that it makes more sense to make all node types pointers? E.g. OrdinalReference.Walk returns a new instance.

Anyway, I was thinking that it would be great if we didn't need to modify trees to convert between OrdinalReferences and IndexedVars. Couldn't we simply use IndexedVar for this? We could make an IndexedVar with nil container act as an OrdinalReference (i.e. format as @(idx+1), and disallow evaluation). We would still to populate the container and check the indices, but we wouldn't need to change the tree itself.

knz · 2016-11-18T07:34:31Z

@RaduBerinde thanks for your suggestions.
Indeed a pointer type was needed, that is fixed.

Regarding using IndexedVar I will try it and see if it fits. I like the idea!

@nvanbenschoten: I will add some more text in the commit message to explain with examples.

Review status: all files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.

pkg/sql/parser/expr.go, line 548 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Looks like this assertion got separated from its declaration.

Done.

pkg/sql/parser/sql.y, line 32 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/^uint(0)/maxUint/?

Yes that's what I meant. Thanks.

Comments from Reviewable

knz · 2016-11-20T19:27:17Z

@RaduBerinde I have simplified to use only IndexedVar as you suggested. PTAL.

@nvanbenschoten I have extended the commit message (and the header of this PR) with your suggestions. PTAL.

knz · 2016-11-20T19:27:46Z

Note that there are two commits now but the first one will disappear after #10799 gets merged.

RaduBerinde · 2016-11-20T23:17:06Z

Awesome!!

Review status: 0 of 22 files reviewed at latest revision, 2 unresolved discussions, some commit checks failed.

Comments from Reviewable

RaduBerinde · 2016-11-21T01:44:58Z

Review status: 0 of 23 files reviewed at latest revision, 4 unresolved discussions, some commit checks failed.

pkg/sql/parser/indexed_vars.go, line 93 at r3 (raw file):

}

// IndexedVarHelper is a structure that helps with initialization of IndexVars.

IndexedVars

pkg/sql/parser/indexed_vars.go, line 101 at r3 (raw file):

// BindIfUnbound attaches an IndexedVar to an existing container.
// This is needed for standalone column ordinals created during parsing.
func (h *IndexedVarHelper) BindIfUnbound(ivar *IndexedVar) error {

It feels a bit off that we need to allocate a set of IndexedVars (as part of IndexedVarHelper) just to bind a container to existing IndexedVars. I don't have a better suggestion though (given that we need to support IndexedVarUsed)..

Comments from Reviewable

nvanbenschoten · 2016-11-21T05:03:15Z

I like this approach better than adding a new Expr type.

Review status: 0 of 23 files reviewed at latest revision, 3 unresolved discussions, some commit checks failed.

pkg/sql/parser/expr.go, line 548 at r1 (raw file):

Previously, knz (kena) wrote…

> Done.

The move isn't necessary now that you removed the new expression type. I'd revert the change.

Comments from Reviewable

knz · 2016-11-21T08:09:23Z

TFYRs!

Review status: 0 of 23 files reviewed at latest revision, 3 unresolved discussions.

pkg/sql/parser/expr.go, line 548 at r1 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

> The move isn't necessary now that you removed the new expression type. I'd revert the change.

Done.

pkg/sql/parser/indexed_vars.go, line 93 at r3 (raw file):

Previously, RaduBerinde wrote…

> IndexedVars

Done.

pkg/sql/parser/indexed_vars.go, line 101 at r3 (raw file):

Previously, RaduBerinde wrote…

> It feels a bit off that we need to allocate a set of `IndexedVar`s (as part of `IndexedVarHelper`) just to bind a container to existing `IndexedVars`. I don't have a better suggestion though (given that we need to support IndexedVarUsed)..

Do not forget that the helper is also supporting the indexedvars created by name resolution (`a` -> IndexedVar), and this will be the more common case. I do not feel anything off here.

Comments from Reviewable

@n

This patch introduces a new semantic concept that is useful for pretty-printing the AST back to valid SQL and again back to AST; this will be useful during development work on optimizations. The motivation is detailed below. A side effect is that it also introduces a user-visible feature. Because it is visible to user it will need to be documented, but some caveats apply. More details are given below too. **Motivation** During optimizations, column references (IndexedVar) can move around and be substituted in ways that may lose track of the original name used in the query to name the column. More specifically, we can move a reference to a position where there is no name known for the column *yet*. For example we want to go from: `SELECT * FROM (SELECT * FROM (VALUES (1), (2))) AS a(x) WHERE x > 2` to: `SELECT * FROM (SELECT * FROM ... WHERE ??? > 2) AS a(x)` Now although we can move the IndexedVar to the position in the `???` it is not so clear which *name* to give this indexed var if we pretty-print the resulting tree -- in this example the name "x" does not exist at that position. This patch makes it possible to side-step this issue, by introducing a new SQL syntax that refers to columns at the "current level" by ordinal position: "@n" where N is an integer literal. With this patch the example above can then be reliably serialized to: `SELECT * FROM (SELECT * FROM ... WHERE @1 > 2) AS a(x)` **User-visible changes** SQL already has a traditional, limited way to do refer to column numerically in some syntactic positions, specifically ORDER BY and GROUP BY. For example, in `SELECT a + b FROM foo ORDER BY 1`, "1" refers to the first value rendered, i.e. `a + b`. These are called "column ordinals"; they are supported in *some* SQL engines, sometimes for backward compatibility, sometimes because of historical reasons. The feature added in this patch complements and extends this mechanism. **However, the use of column ordinals by client applications is also customarily strongly discouraged.** The use of the new column ordinal references added in this patch should be equally discouraged. The reason why is that they are not robust against schema updates. Say, a table is initially created with columns `a, b, c` in this order. Then a query is designed to refer to column `a` by position, with number 1. Then later, independently a DB admin changes the schema and removes column `a`, and adds a new version of column `a` with e.g. a different type. Now the schema is `b, c, a`, and all the queries that expect to refer to `a` by position 1 are now broken. The new feature in this patch is also subject to this limitation. It is intended primarily for use during development when the schema updates are tightly controlled by the operator manipulating the query. Meanwhile, since the feature is visible to users it should still be (minimally) documented. The salient aspects that should be communicated are: 1) don't use this feature in client applications unless you 100% understand the limitation described above. 2) **the @ notation refers to a column number in the data source, not in the rendered columns**. The data source is the thing named after FROM. For example, suppose a table `foo` has columns `a` and `b` in this order. Then the query `SELECT b, a FROM foo WHERE @2 = 123` is equivalent to `SELECT b, a FROM foo WHERE b = 123`. 3) point 2 above means that there is a difference between the new column ordinal references and the traditional SQL ordinals, which can be illustrated as follows. With SQL ordinals, the query `SELECT b, a FROM foo ORDER BY 1` sorts with column `b`, because this is the first value rendered (columns after SELECT); whereas `SELECT b, a FROM foo ORDER BY @1` sorts with column `a`, because this is the first column in the data source (columns after FROM).

@1

This commit disallows ordinal column references by default. Any statements using them will result in an error. The session setting `allow_ordinal_column_references` can be set to true to revert to previous behavior. Note that numeric tuple indexing (e.g., `SELECT ((1,2,3)).@2`) is still allowed. This deprecation is motivated by the inconsistent and surprising behavior that ordinal column references provide. Ordinal column references (e.g., `SELECT @1 FROM t`) are a vestigial and undocumented feature originally motivated to aid the heuristic planner. The PR that added them, cockroachdb#10729, discourages their use by users because "they are not robust against schema changes". It also points out a subtle difference between ordinal column references and SQL standard ordinals that could confuse users. As an example, the following statements are not equivalent: ```sql SELECT @2 FROM t ORDER BY @1; SELECT @2 FROM t ORDER BY 1; ``` In the current implementation, an ordinal column reference `@n` resolves to the n-th column in the current scope's slice of columns. This has several implications: 1. Ordinal columns can refer to any column, not just columns of data sources in `FROM` clauses, as was originally intended. This makes it hard to reason about the resolution of an ordinal column reference. For example, it's somewhat surprising that the result of the `SELECT` below is not `(10, 1)`. ``` CREATE TABLE t (a INT, b INT); INSERT INTO t VALUES (1, 10); SELECT @2, @1 FROM (SELECT @2, @1 FROM t); ?column? | ?column? -----------+----------- 1 | 10 (1 row) ``` 2. The ordering of columns in the scope's slice is not guaranteed to be consistent, so any reasonable ordinal column resolution occurs more-or-less by chance. As an example of unexpected behavior, consider: ``` CREATE TABLE t (a INT PRIMARY KEY, INDEX ((a+10))); INSERT INTO t(a) VALUES (1); ALTER TABLE t ADD COLUMN b INT; SELECT @1, @2 FROM t; ?column? | ?column? -----------+----------- 1 | 11 (1 row) ``` Most users would expect the result of the `SELECT` statement to be `(1, NULL)` because `@1` should resolve to `a` and `@2` should resolve to `b`. Instead, the ordinal column reference `@2` circumvents logic that keeps inaccessible columns from being referenced, and resolves to the virtual computed column backing the secondary index. Epic: None Release note (sql change): Ordinal column references (e.g., `SELECT @1, @2 FROM t`) are now deprecated. By default, statements using this syntax will now result in an error. They can be allowed using the session setting `SET allow_ordinal_column_references=true`. Support for ordinal column references is scheduled to be removed in version 23.2.

93754: sql: deprecate ordinal column references r=mgartner a=mgartner #### sql: deprecate ordinal column references This commit disallows ordinal column references by default. Any statements using them will result in an error. The session setting `allow_ordinal_column_references` can be set to true to revert to previous behavior. Note that numeric tuple indexing (e.g., `SELECT ((1,2,3)).`@2`)` is still allowed. This deprecation is motivated by the inconsistent and surprising behavior that ordinal column references provide. Ordinal column references (e.g., `SELECT `@1` FROM t`) are a vestigial and undocumented feature originally motivated to aid the heuristic planner. The PR that added them, #10729, discourages their use by users because "they are not robust against schema changes". It also points out a subtle difference between ordinal column references and SQL standard ordinals that could confuse users. As an example, the following statements are not equivalent: ```sql SELECT `@2` FROM t ORDER BY `@1;` SELECT `@2` FROM t ORDER BY 1; ``` In the current implementation, an ordinal column reference ``@n`` resolves to the n-th column in the current scope's slice of columns. This has several implications: 1. Ordinal columns can refer to any column, not just columns of data sources in `FROM` clauses, as was originally intended. This makes it hard to reason about the resolution of an ordinal column reference. For example, it's somewhat surprising that the result of the `SELECT` below is not `(10, 1)`. ``` CREATE TABLE t (a INT, b INT); INSERT INTO t VALUES (1, 10); SELECT `@2,` `@1` FROM (SELECT `@2,` `@1` FROM t); ?column? | ?column? -----------+----------- 1 | 10 (1 row) ``` 2. The ordering of columns in the scope's slice is not guaranteed to be consistent, so any reasonable ordinal column resolution occurs more-or-less by chance. As an example of unexpected behavior, consider: ``` CREATE TABLE t (a INT PRIMARY KEY, INDEX ((a+10))); INSERT INTO t(a) VALUES (1); ALTER TABLE t ADD COLUMN b INT; SELECT `@1,` `@2` FROM t; ?column? | ?column? -----------+----------- 1 | 11 (1 row) ``` Most users would expect the result of the `SELECT` statement to be `(1, NULL)` because ``@1`` should resolve to `a` and ``@2`` should resolve to `b`. Instead, the ordinal column reference ``@2`` circumvents logic that keeps inaccessible columns from being referenced, and resolves to the virtual computed column backing the secondary index. Epic: None Release note (sql change): Ordinal column references (e.g., `SELECT `@1,` `@2` FROM t`) are now deprecated. By default, statements using this syntax will now result in an error. They can be allowed using the session setting `SET allow_ordinal_column_references=true`. Support for ordinal column references is scheduled to be removed in version 23.2. Co-authored-by: Marcus Gartner <marcus@cockroachlabs.com>

knz assigned RaduBerinde Nov 16, 2016

knz mentioned this pull request Nov 16, 2016

sql: replace JOIN ... ON with equality operations to JOIN with an equalityPredicate #10631

Closed

knz force-pushed the col-ref branch from b50177f to b5e3bb2 Compare November 16, 2016 17:10

knz mentioned this pull request Nov 17, 2016

sql: fix basic ordinal references #10776

Closed

knz force-pushed the col-ref branch 4 times, most recently from b98aa48 to 1be19c9 Compare November 20, 2016 19:24

knz added the docs-todo label Nov 20, 2016

knz force-pushed the col-ref branch from 1be19c9 to 5a1b8d4 Compare November 20, 2016 21:19

knz force-pushed the col-ref branch from 5a1b8d4 to 8309d54 Compare November 20, 2016 23:36

knz force-pushed the col-ref branch 2 times, most recently from 24399fc to 0929a77 Compare November 21, 2016 08:09

knz force-pushed the col-ref branch 2 times, most recently from b147b8b to ee6d44a Compare November 21, 2016 21:36

knz force-pushed the col-ref branch from ee6d44a to 146344b Compare November 22, 2016 00:04

knz merged commit 172f1d8 into cockroachdb:master Nov 22, 2016

knz deleted the col-ref branch November 22, 2016 00:31

knz mentioned this pull request Nov 30, 2016

sql: translate predicates coming from an outer context in selectNode #11723

Closed

mgartner mentioned this pull request Dec 16, 2022

sql: deprecate ordinal column references #93754

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: enable referring to columns symbolically #10729

sql: enable referring to columns symbolically #10729

knz commented Nov 16, 2016 •

edited

knz commented Nov 16, 2016

RaduBerinde commented Nov 17, 2016

nvanbenschoten commented Nov 17, 2016

RaduBerinde commented Nov 17, 2016 •

edited

knz commented Nov 18, 2016

knz commented Nov 20, 2016

knz commented Nov 20, 2016

RaduBerinde commented Nov 20, 2016

RaduBerinde commented Nov 21, 2016

nvanbenschoten commented Nov 21, 2016

knz commented Nov 21, 2016

sql: enable referring to columns symbolically #10729

sql: enable referring to columns symbolically #10729

Conversation

knz commented Nov 16, 2016 • edited

knz commented Nov 16, 2016

RaduBerinde commented Nov 17, 2016

nvanbenschoten commented Nov 17, 2016

RaduBerinde commented Nov 17, 2016 • edited

knz commented Nov 18, 2016

knz commented Nov 20, 2016

knz commented Nov 20, 2016

RaduBerinde commented Nov 20, 2016

RaduBerinde commented Nov 21, 2016

nvanbenschoten commented Nov 21, 2016

knz commented Nov 21, 2016

knz commented Nov 16, 2016 •

edited

RaduBerinde commented Nov 17, 2016 •

edited