This patch introduces a new semantic concept that makes it possible to
pretty-print the AST to valid SQL and parse that SQL back into an AST;
this will be useful during development work on optimizations. The
motivation is detailed below.
A side effect is that it also introduces a user-visible feature.
Because it is visible to users, it needs to be documented, but some
caveats apply. These are also detailed below.
**Motivation**
During optimizations, column references (IndexedVar) can move around
and be substituted in ways that may lose track of the original name
used in the query to name the column. More specifically, we can move a
reference to a position where there is no name known for the column
*yet*. For example, we want to go from:
`SELECT * FROM (SELECT * FROM (VALUES (1), (2))) AS a(x) WHERE x > 2`
to:
`SELECT * FROM (SELECT * FROM ... WHERE ??? > 2) AS a(x)`
Although we can move the IndexedVar to the `???` position, it is
unclear which *name* to give this indexed var when we pretty-print
the resulting tree: in this example, the name "x" does not exist at
that position.
This patch makes it possible to side-step this issue by introducing a
new SQL syntax that refers to columns at the "current level" by ordinal
position: `@N`, where N is an integer literal.
With this patch the example above can then be reliably serialized to:
`SELECT * FROM (SELECT * FROM ... WHERE @1 > 2) AS a(x)`
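
The fallback described above can be sketched as follows. This is only an illustration in Python, not the code in this patch: the `IndexedVar` class, its 0-based `idx` field, and the `format_var` helper are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexedVar:
    """Hypothetical stand-in for a column reference in the AST."""
    idx: int                    # assumed 0-based position in the data source
    name: Optional[str] = None  # original column name, if known at this position


def format_var(v: IndexedVar) -> str:
    # Use the name when one is known at this position; otherwise
    # fall back to the ordinal reference, which is 1-based (@1 is the first column).
    if v.name is not None:
        return v.name
    return f"@{v.idx + 1}"


print(format_var(IndexedVar(0, "x")))  # name known: prints x
print(format_var(IndexedVar(0)))       # name lost after the move: prints @1
```

With a fallback of this kind, the pretty-printer always has something valid to emit for the `???` position.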
**User-visible changes**
SQL already has a traditional, limited way to refer to columns
numerically in some syntactic positions, specifically ORDER BY and
GROUP BY. For example, in `SELECT a + b FROM foo ORDER BY 1`, "1"
refers to the first value rendered, i.e. `a + b`.
These are called "column ordinals"; they are supported in *some* SQL
engines, sometimes for backward compatibility, sometimes for
historical reasons.
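
To see traditional column ordinals in action, here is a small Python sketch using SQLite, which happens to be one of the engines that supports them in ORDER BY. The table contents are made up for the example; the rows are chosen so that sorting by `a` and sorting by `a + b` give different orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (a INTEGER, b INTEGER)")
# Illustrative rows: a + b is 10, 3, 7 in insertion order.
conn.executemany("INSERT INTO foo VALUES (?, ?)", [(1, 9), (2, 1), (3, 4)])

# "ORDER BY 1" sorts by the first rendered value, i.e. a + b, not by column a.
rows = [r[0] for r in conn.execute("SELECT a + b FROM foo ORDER BY 1")]
print(rows)  # [3, 7, 10]
```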
The feature added in this patch complements and extends this mechanism.
**However, the use of column ordinals by client applications is
customarily strongly discouraged.** The use of the new column ordinal
references added in this patch should be equally discouraged. The
reason is that they are not robust against schema updates. Say a
table is initially created with columns `a, b, c` in this order, and
a query is designed to refer to column `a` by position, with number
1. Later, independently, a DB admin changes the schema by removing
column `a` and adding a new version of column `a` with e.g. a different
type. Now the schema is `b, c, a`, and all the queries that expect to
refer to `a` by position 1 are broken. The new feature in this
patch is also subject to this limitation. It is intended primarily for
use during development, when schema updates are tightly controlled
by the operator manipulating the query.
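
The breakage scenario above can be modeled in a few lines of Python. This is a toy model only: the schema is represented as a plain list of column names, and `resolve` is a hypothetical helper, not part of any real engine.

```python
def resolve(schema, position):
    """Return the column name found at a 1-based ordinal position."""
    return schema[position - 1]


schema = ["a", "b", "c"]
before = resolve(schema, 1)  # the query author expects "a" here

# The admin drops column a and re-adds it; the new column lands at the end.
schema.remove("a")
schema.append("a")
after = resolve(schema, 1)   # position 1 now silently points at "b"

print(before, after)  # a b
```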
Meanwhile, since the feature is visible to users it should still be
(minimally) documented. The salient aspects that should be
communicated are:
1) don't use this feature in client applications unless you fully
understand the limitation described above.
2) **the @ notation refers to a column number in the data source, not
in the rendered columns**. The data source is the thing named after
FROM. For example, suppose a table `foo` has columns `a` and `b`
in this order. Then the query
`SELECT b, a FROM foo WHERE @2 = 123`
is equivalent to `SELECT b, a FROM foo WHERE b = 123`.
3) point 2 above means that the new column ordinal references differ
from the traditional SQL ordinals, which can be illustrated as
follows. With SQL ordinals, the query
`SELECT b, a FROM foo ORDER BY 1`
sorts by column `b`, because this is the first value rendered
(columns after SELECT); whereas
`SELECT b, a FROM foo ORDER BY @1`
sorts by column `a`, because this is the first column in the data
source (columns after FROM).
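
The two resolution rules in points 2 and 3 can be contrasted with a short Python sketch. It is a simplified model under stated assumptions, not the resolver in this patch: `resolve_order_by` is a hypothetical helper, and both the data source and the rendered columns are reduced to lists of names.

```python
def resolve_order_by(term, source_cols, rendered_cols):
    """Map an ORDER BY term to the column it designates (toy model)."""
    if term.startswith("@"):
        # New syntax: @N indexes the data source (columns after FROM).
        return source_cols[int(term[1:]) - 1]
    if term.isdigit():
        # Traditional ordinal: N indexes the rendered columns (after SELECT).
        return rendered_cols[int(term) - 1]
    # Anything else is treated as a plain column name.
    return term


source = ["a", "b"]    # table foo has columns a, b in this order
rendered = ["b", "a"]  # SELECT b, a FROM foo

print(resolve_order_by("1", source, rendered))   # b
print(resolve_order_by("@1", source, rendered))  # a
```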