Describe how identifiers in SELECT queries are resolved #23194
Comments
Should also consider #6571.
Do we want to support UTF identifiers? #25594
@CurtizJ I have no idea why they are not already supported.
See also #18185.
See also #26030.
Notes about current
Let's consider two tables:
So,
As for the return type, it should be the same as in the original table, except in the case with
psql -- OK
with d as (select 'key'::Varchar(255) c, 'x'::Varchar(255) s)
SELECT r1, c as r2
FROM (
SELECT t as s, c as r1
FROM ( SELECT 'y'::Varchar(255) as t, 'x'::Varchar(255) as s) t1
LEFT JOIN d USING (s)
) t2
LEFT JOIN d using (s)

 r1  | r2
-----+----
 key |

CH -- not OK
with d as (select 'key'::Varchar(255) c, 'x'::Varchar(255) s)
SELECT r1, c as r2
FROM (
SELECT t as s, c as r1
FROM ( SELECT 'y'::Varchar(255) as t, 'x'::Varchar(255) as s) t1
LEFT JOIN d USING (s)
) t2
LEFT JOIN d using (s);

┌─r1─┬─r2─┐
│    │    │
└────┴────┘
Query analysis in ClickHouse is more complicated than it is in standard SQL due to the following extensions:
These extensions are very convenient for writing SQL queries, and we don't regret them :)
But they are inconvenient from the implementation standpoint: the logic is scattered across different places in the code and is underspecified, which makes further development difficult.
Let's try to describe how identifiers in SELECT queries are resolved.
Then we will have a chance to implement it in a more consistent, easy-to-maintain fashion.
I will try to describe how it works right now and how it is supposed to work after the upcoming extensions.
The text below is only for ClickHouse experts.
FROM system.one is present. This query also works:
But we may remove support for asterisk expansion when the FROM section is omitted.
This query works in the current version of ClickHouse, but we are not proud of this fact:
In this example, x is resolved to a column in table t, and table, db.table and t are resolved to tables.
In this example, nest.key.subkey is a subcolumn of the complex column nest. Ambiguity is possible; in case of ambiguity, an exception should be thrown.
On the left is a table and on the right is a column of Array type.
The column for ARRAY JOIN can be specified by an identifier (arr in the example above) or as an immediate array (literal):
If it is specified as a literal, an alias is required. A question: maybe we should not require an alias.
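As a reading aid, here is a minimal sketch of what ARRAY JOIN does to rows. It also covers the alias rule described later in this section. It is a hypothetical Python model over dict rows, not ClickHouse code. Each row is repeated once per array element, and rows with empty arrays disappear (plain ARRAY JOIN, not LEFT ARRAY JOIN). With an alias, the alias holds the element and the original name keeps the whole array. Without an alias, the original name holds the element.

```python
def array_join(rows, column, alias=None):
    """Hypothetical model of `ARRAY JOIN column [AS alias]` over dict rows.

    Rows whose array is empty produce no output, as in plain ARRAY JOIN.
    """
    result = []
    for row in rows:
        for element in row[column]:
            new_row = dict(row)
            if alias is None:
                # No alias: the original name now refers to the element,
                # so the whole array is no longer reachable by that name.
                new_row[column] = element
            else:
                # Alias given: alias -> element, original name -> whole array.
                new_row[alias] = element
            result.append(new_row)
    return result


rows = [{"id": 1, "arr": [10, 20]}, {"id": 2, "arr": []}]
print(array_join(rows, "arr"))
print(array_join(rows, "arr", "joined"))
```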
The column for ARRAY JOIN can be resolved to a complex column (nested data structure) as well:
The Nested data type is syntactic sugar for Array(Tuple(...)):
If an alias is specified for the right-hand side of ARRAY JOIN (ARRAY JOIN arr AS joined), the joined alias will reference the array elements and the original name arr will reference the original value.
If an alias is not specified for the right-hand side of ARRAY JOIN (ARRAY JOIN arr), the original name arr (qualified or not) will reference the array elements, and the original arrays cannot be referenced, unless they are specified in the WITH clause and enable_global_with_statement is set.
We want to change it in the following way:
A question: should we allow referring to aliases from subqueries by names qualified with the subquery name?
Nothing is "exported" from scalar subqueries, IN subqueries or other subquery expressions:
prefer_column_name_to_alias.
This query will return either 2, 1 or 2, 2, depending on the value of prefer_column_name_to_alias.
The order of alias usage and alias definition does not matter (this may be subject to change).
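The two rules above can be sketched as a two-pass lookup. The first pass collects every `expr AS alias` pair, so an alias can be used before it is defined. The second pass resolves a name that is both a column and an alias, and prefer_column_name_to_alias decides which one wins. This is a hypothetical model: the names Scope, collect_aliases and resolve are illustrative only and do not exist in ClickHouse.

```python
from dataclasses import dataclass, field


@dataclass
class Scope:
    columns: set                                 # column names visible from FROM
    aliases: dict = field(default_factory=dict)  # alias name -> expression text


def collect_aliases(select_items):
    """First pass over (expression, alias_or_None) pairs, in SELECT order."""
    return {alias: expr for expr, alias in select_items if alias is not None}


def resolve(name, scope, prefer_column_name_to_alias=False):
    """Second pass: resolve an identifier against aliases and columns."""
    is_column = name in scope.columns
    is_alias = name in scope.aliases
    # An alias wins unless the name is also a column and the setting says otherwise.
    if is_alias and not (is_column and prefer_column_name_to_alias):
        return ("alias", scope.aliases[name])
    if is_column:
        return ("column", name)
    raise LookupError(f"unknown identifier: {name}")


# y is used in the first item but defined as an alias only in the second one.
items = [("y * 2", None), ("x + 1", "y")]
scope = Scope(columns={"x", "y"}, aliases=collect_aliases(items))
print(resolve("y", scope))        # the alias wins by default
print(resolve("y", scope, True))  # the column wins with the setting enabled
```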
Here x is a formal parameter. Parameters of lambda functions are scoped. Their names can clash with other identifiers. In case of ambiguity, the lambda parameter is preferred, and among parameters with the same name, the one from the nearest enclosing scope is preferred.
This query is disambiguated as follows:
In the expression x -> expr(x), the scope of x is expr(x). So the expression arrayMap(x -> x, x) should be read as arrayMap(x1 -> x1, x2).
If an alias refers to an expression that uses a name unavailable at the place where the alias is substituted, an exception should be thrown.
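The lambda scoping rule (nearest parameter wins, and arrayMap(x -> x, x) reads as arrayMap(x1 -> x1, x2)) can be modelled as a renaming pass. The sketch assumes a hypothetical tuple-based AST, not ClickHouse's real one. Each lambda parameter gets a fresh name, and identifiers bound by that lambda are rewritten. Free identifiers (the document's x2) are left as they are so that later column or alias resolution can handle them.

```python
from itertools import count

# Hypothetical AST: ("id", name) | ("lambda", [params], body) | ("call", fname, [args])


def disambiguate_lambda_params(node):
    """Rename every lambda parameter to a fresh name (x -> x1, x2, ...).

    An identifier bound by the nearest enclosing lambda gets that lambda's
    fresh name; free identifiers (columns, aliases) stay unchanged.
    """
    fresh_ids = count(1)

    def walk(node, env):
        kind = node[0]
        if kind == "id":
            return ("id", env.get(node[1], node[1]))
        if kind == "lambda":
            _, params, body = node
            renamed = {p: f"{p}{next(fresh_ids)}" for p in params}
            # Inner parameters shadow outer ones with the same name.
            return ("lambda", [renamed[p] for p in params], walk(body, {**env, **renamed}))
        if kind == "call":
            _, fname, args = node
            return ("call", fname, [walk(arg, env) for arg in args])
        raise ValueError(f"unknown node kind: {kind}")

    return walk(node, {})


# arrayMap(x -> x, x): only the lambda-bound x is renamed.
expr = ("call", "arrayMap", [("lambda", ["x"], ("id", "x")), ("id", "x")])
print(disambiguate_lambda_params(expr))
```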
The last y refers to an expression that depends on x, which is a formal parameter of the lambda function and is not available outside of its scope.
This will work, because alias z is not scoped.
This should throw an exception, because the aliases y refer to different expressions.
The scope of the formal parameter x is not related to any aliases that are using x.
In the right-hand side of ARRAY JOIN it is resolved to a column.
In other parts of FROM it is resolved to a table.
In all other places it is resolved to a column. This includes the right-hand side of the IN operator, dictGet and joinGet.
This is disambiguated as follows:
Note: we could add more cases where we simply throw an exception on ambiguity.
The previous rule can theoretically be applied to subcolumns of complex columns as well:
In this example, db.tbl.col can be resolved to the name of a subcolumn of the Nested column.
But it would be much better to simply throw an exception in all cases of ambiguity with subcolumns.
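The "throw on any ambiguity" policy for dotted identifiers can be sketched as follows. Every reading of the identifier is enumerated: column.subcolumn..., table.column..., or db.table.column.... If more than one reading matches, an exception is raised. This is a hypothetical model. The catalog layout, the function name resolve_compound and the AmbiguousIdentifier exception are illustrative assumptions.

```python
class AmbiguousIdentifier(Exception):
    pass


def resolve_compound(identifier, tables):
    """Resolve a dotted identifier; raise if it has zero or several readings.

    tables: {(database, table): {column: set of subcolumn paths (tuples)}}
    Returns (database, table, column, subcolumn_path).
    """
    parts = tuple(identifier.split("."))
    readings = []
    for (db, table), columns in tables.items():
        # Unqualified, table-qualified and database-qualified readings.
        for prefix in ((), (table,), (db, table)):
            if parts[:len(prefix)] != prefix:
                continue
            rest = parts[len(prefix):]
            if not rest or rest[0] not in columns:
                continue
            column, subpath = rest[0], rest[1:]
            if subpath == () or subpath in columns[column]:
                readings.append((db, table, column, subpath))
    if not readings:
        raise LookupError(f"unknown identifier: {identifier}")
    if len(readings) > 1:
        raise AmbiguousIdentifier(f"{identifier} is ambiguous: {readings}")
    return readings[0]


# default.t has a Nested column "db" with subcolumn tbl.col; db.tbl has column col.
tables = {
    ("default", "t"): {"db": {("tbl", "col")}, "x": set()},
    ("db", "tbl"): {"col": set(), "y": set()},
}
print(resolve_compound("tbl.y", tables))
# resolve_compound("db.tbl.col", tables) raises AmbiguousIdentifier.
```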
asterisk_include_materialized_columns, asterisk_include_alias_columns.
In case of ambiguous column names across tables, the asterisk is expanded to an identifier qualified by the table name (and, if the table name is ambiguous, also by the database name). If there is no ambiguity, the unqualified name is used.
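The qualification rule for asterisk expansion can be sketched in a few lines. A column name that appears in only one table stays unqualified. Otherwise it gets the table name as a qualifier, and also the database name when the table name itself is not unique. This is a hypothetical model with an illustrative function name. It ignores the settings and the flattening of complex columns mentioned elsewhere.

```python
from collections import Counter


def expand_asterisk(tables):
    """tables: list of (database, table, [column names]) in FROM order."""
    column_counts = Counter(col for _, _, cols in tables for col in cols)
    table_counts = Counter(table for _, table, _ in tables)
    result = []
    for db, table, cols in tables:
        for col in cols:
            if column_counts[col] == 1:
                result.append(col)                     # no ambiguity
            elif table_counts[table] == 1:
                result.append(f"{table}.{col}")        # qualify by table
            else:
                result.append(f"{db}.{table}.{col}")   # table name ambiguous too
    return result


print(expand_asterisk([("db1", "a", ["id", "x"]), ("db1", "b", ["id", "y"])]))
```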
A question: should we allow an asterisk qualified by database only?
For compatibility reasons, asterisk expansion also expands all complex columns recursively, so they are flattened into multiple columns. This may be subject to adjustment by settings.
Column transformers can be chained:
A COLUMNS expression or a qualified COLUMNS expression can be used in place of an asterisk or a qualified asterisk. It is processed similarly, but with the specified filter. The filter, in the form of a regular expression, is applied to unqualified names from the corresponding tables.
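Below is a minimal sketch of COLUMNS expansion followed by a chain of column transformers. The regex is applied to unqualified column names, and the qualified form restricts matching to one table. Python's re.search semantics (a match anywhere in the name) is an assumption, not a statement about ClickHouse's regex engine. The transformers are modelled after ClickHouse's EXCEPT, REPLACE and APPLY modifiers and are applied left to right. All function names here are hypothetical.

```python
import re


def columns_matcher(pattern, tables, qualifier=None):
    """Model of COLUMNS('pattern') or qualifier.COLUMNS('pattern').

    tables: list of (table name, [column names]) in FROM order.
    Returns (output name, expression) pairs.
    """
    regex = re.compile(pattern)
    return [
        (col, col)
        for table, cols in tables
        if qualifier is None or table == qualifier
        for col in cols
        if regex.search(col)  # assumption: match anywhere in the unqualified name
    ]


def apply_transformers(columns, transformers):
    """Apply (kind, argument) transformers in order; output names are kept."""
    for kind, arg in transformers:
        if kind == "EXCEPT":     # arg: set of names to drop
            columns = [(n, e) for n, e in columns if n not in arg]
        elif kind == "REPLACE":  # arg: {name: replacement expression}
            columns = [(n, arg.get(n, e)) for n, e in columns]
        elif kind == "APPLY":    # arg: function name wrapped around each expression
            columns = [(n, f"{arg}({e})") for n, e in columns]
        else:
            raise ValueError(f"unknown transformer: {kind}")
    return columns


tables = [("t", ["a1", "a2", "b"]), ("u", ["a3"])]
cols = columns_matcher("^a", tables)
print(apply_transformers(cols, [("EXCEPT", {"a2"}), ("REPLACE", {"a1": "a1 + 1"}), ("APPLY", "sum")]))
```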
A question: we could keep these automatic names only for the outermost query and simply disallow referring to unnamed expressions from subqueries.
This is needed to allow SQL UDFs.
Aggregate functions can be found as well:
In case of ambiguity, a function is less preferred than an alias, a column name or a table name.
These columns can be referred to as ordinary columns, but they are expanded during query analysis (for better optimization opportunities).
If an expression of an ALIAS column defines other aliases, these aliases are only visible inside this expression.
ALIAS columns may refer to other ALIAS columns in the same table. The order of ALIAS columns does not matter for that.
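Expanding ALIAS columns that refer to each other, in any declaration order, can be sketched as recursive substitution. Each reference is replaced by its parenthesized definition. The sketch assumes that a cycle between ALIAS columns should be reported as an error. It is a simplified, hypothetical model: it substitutes by matching identifier tokens with a regex, so it ignores string literals that contain identifiers and the scoping of aliases defined inside an ALIAS expression.

```python
import re

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def expand_alias_columns(expr, alias_columns, _stack=()):
    """Recursively inline ALIAS column definitions into expr.

    alias_columns: {alias column name: defining expression}; order is irrelevant.
    Raises ValueError on a cycle (an assumption of this sketch).
    """
    def substitute(match):
        name = match.group(0)
        if name not in alias_columns:
            return name  # ordinary column or function name
        if name in _stack:
            raise ValueError("cyclic ALIAS columns: " + " -> ".join(_stack + (name,)))
        inner = expand_alias_columns(alias_columns[name], alias_columns, _stack + (name,))
        return f"({inner})"

    return IDENTIFIER.sub(substitute, expr)


# full is declared before last_upper, yet refers to it.
alias_columns = {"full": "first || ' ' || last_upper", "last_upper": "upper(last)"}
print(expand_alias_columns("full", alias_columns))
```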