sql: enable referring to columns symbolically #10729

Merged
merged 1 commit into from
Nov 22, 2016

Commits on Nov 22, 2016

  1. sql: enable referring to columns symbolically

    This patch introduces a new semantic concept that is useful for
    pretty-printing the AST back to valid SQL and again back to AST; this
    will be useful during development work on optimizations. The
    motivation is detailed below.
    
    A side effect is that it also introduces a user-visible feature.
    Because it is visible to users, it will need to be documented, with
    some caveats. Details are given below.
    
    **Motivation**
    
    During optimizations, column references (IndexedVar) can move around
    and be substituted in ways that may lose track of the original name
    used in the query to name the column. More specifically, we can move a
    reference to a position where there is no name known for the column
    *yet*. For example we want to go from:
    
      `SELECT * FROM (SELECT * FROM (VALUES (1), (2))) AS a(x) WHERE x > 2`
    
    to:
    
      `SELECT * FROM (SELECT * FROM ... WHERE ??? > 2) AS a(x)`
    
    Now although we can move the IndexedVar to the position in the `???`
    it is not so clear which *name* to give this indexed var if we
    pretty-print the resulting tree -- in this example the name "x" does
    not exist at that position.
    
    This patch makes it possible to side-step this issue by introducing a
    new SQL syntax that refers to columns at the "current level" by ordinal
    position: "@N", where N is an integer literal.
    
    With this patch the example above can then be reliably serialized to:
    
      `SELECT * FROM (SELECT * FROM ... WHERE @1 > 2) AS a(x)`
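
The fallback described above can be sketched as follows. This is a minimal illustrative model in Python, not the patch's actual Go code: the `IndexedVar` class here is a hypothetical stand-in that only carries a 0-based index, and `format_var` is an assumed helper name.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class IndexedVar:
    """Hypothetical stand-in for a column reference: a 0-based index into the data source."""
    idx: int


def format_var(var: IndexedVar, names_in_scope: Optional[Sequence[str]]) -> str:
    """Print the column name when one is known at this position, else the ordinal form."""
    if names_in_scope is not None and var.idx < len(names_in_scope):
        return names_in_scope[var.idx]
    return f"@{var.idx + 1}"  # ordinal references are 1-based


x = IndexedVar(0)
outer = f"WHERE {format_var(x, ['x'])} > 2"  # outer level: the alias a(x) is visible
inner = f"WHERE {format_var(x, None)} > 2"   # pushed down: no name exists here yet
print(outer)
print(inner)
```

The point of the sketch is only that the ordinal form is always available, so the printed tree can be parsed back even where no column name is in scope.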
    
    **User-visible changes**
    
    SQL already has a traditional, limited way to refer to columns
    numerically in some syntactic positions, specifically ORDER BY and
    GROUP BY.  For example, in `SELECT a + b FROM foo ORDER BY 1`, "1"
    refers to the first value rendered, i.e. `a + b`.
    
    These are called "column ordinals"; they are supported in *some* SQL
    engines, sometimes for backward compatibility, sometimes for
    historical reasons.
    
    The feature added in this patch complements and extends this mechanism.
    
    **However, the use of column ordinals by client applications is
    customarily strongly discouraged.** The use of the new column ordinal
    references added in this patch should be equally discouraged, because
    they are not robust against schema updates. Say a table is initially
    created with columns `a, b, c` in this order, and a query is designed
    to refer to column `a` by position, with number 1. Later, a DB admin
    independently changes the schema: they remove column `a` and add a
    new version of column `a` with, e.g., a different type. Now the schema
    is `b, c, a`, position 1 designates `b`, and every query that expects
    to refer to `a` by position 1 is broken. The new feature in this
    patch is also subject to this limitation. It is intended primarily for
    use during development, when schema updates are tightly controlled
    by the operator manipulating the query.
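
The schema-drift scenario above can be replayed with a small model. This is only a sketch: the schema is represented as a plain Python list of column names, and `resolve_ordinal` is a hypothetical helper, not part of any SQL engine.

```python
def resolve_ordinal(columns, n):
    """Return the name of the column at 1-based position n in the schema."""
    if not 1 <= n <= len(columns):
        raise IndexError(f"ordinal {n} out of range for {len(columns)} columns")
    return columns[n - 1]


schema = ["a", "b", "c"]
before = resolve_ordinal(schema, 1)  # the query was written against this layout

# The admin drops column a and re-adds it (e.g. with a different type):
# the new column lands at the end of the schema.
schema.remove("a")
schema.append("a")
after = resolve_ordinal(schema, 1)   # same ordinal, different column

print(before, after)
```

A query that names the column keeps working after such a change; a query that relies on the position silently starts reading a different column.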
    
    Meanwhile, since the feature is visible to users it should still be
    (minimally) documented. The salient aspects that should be
    communicated are:
    
    1) don't use this feature in client applications unless you 100%
       understand the limitation described above.
    
    2) **the @ notation refers to a column number in the data source, not
       in the rendered columns**. The data source is the thing named after
       FROM.  For example, suppose a table `foo` has columns `a` and `b`
       in this order. Then the query
    
         `SELECT b, a FROM foo WHERE @2 = 123`
    
       is equivalent to `SELECT b, a FROM foo WHERE b = 123`.
    
    3) point 2 above means that there is a difference between the new
       column ordinal references and the traditional SQL ordinals, which
       can be illustrated as follows. With SQL ordinals, the query
    
         `SELECT b, a FROM foo ORDER BY 1`
    
       sorts by column `b`, because this is the first value rendered
       (columns after SELECT); whereas
    
         `SELECT b, a FROM foo ORDER BY @1`
    
       sorts by column `a`, because this is the first column in the data
       source (columns after FROM).
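
The contrast in points 2 and 3 can be tried out with a short script. This is a sketch under stated assumptions: it uses SQLite through Python's standard `sqlite3` module, which supports traditional `ORDER BY 1` ordinals but not the `@N` references from this patch (SQLite reads `@1` as a bind parameter). The `@1` case is therefore emulated by looking up the first column of the data source and sorting by its name; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO foo (a, b) VALUES (?, ?)", [(1, 30), (2, 20), (3, 10)])

# Traditional SQL ordinal: 1 is the first rendered value, i.e. b.
by_render = conn.execute("SELECT b, a FROM foo ORDER BY 1").fetchall()

# Emulation of @1: take the first column of the data source (table order)
# and sort by its name. PRAGMA table_info lists columns in declaration order;
# the column name is the second field of each row.
source_cols = [row[1] for row in conn.execute("PRAGMA table_info(foo)")]
first_source_col = source_cols[0]
by_source = conn.execute(
    f"SELECT b, a FROM foo ORDER BY {first_source_col}"
).fetchall()

print(by_render)
print(by_source)
```

With these rows the two result orders are the reverse of each other, which makes the difference between "first rendered value" and "first source column" easy to see.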
    knz committed Nov 22, 2016
    146344b