-
Notifications
You must be signed in to change notification settings - Fork 6.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Interpolate feature #35349
Interpolate feature #35349
Conversation
this is a preliminary version, some polishing is required
|
9439f29
to
a8e1671
Compare
INTERPOLATE is added
INTERPOLATE is added
INTERPOLATE is added
INTERPOLATE is added
INTERPOLATE fix
INTERPOLATE fix
ce9a6e3
to
5ae6f80
Compare
use range-based for loop
bugfix: check column existence for INTERPOLATE expression target
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally ok
NOTE: we have decided to refactor filling transform itself to simplify everything and possibly optimize |
… are selected or not, some optimization on expressionless INTERPOLATE
wrong merge fix
@@ -2499,7 +2557,9 @@ void InterpreterSelectQuery::executeWithFill(QueryPlan & query_plan) | |||
if (fill_descr.empty()) | |||
return; | |||
|
|||
auto filling_step = std::make_unique<FillingStep>(query_plan.getCurrentDataStream(), std::move(fill_descr)); | |||
InterpolateDescriptionPtr interpolate_descr = | |||
getInterpolateDescription(query, source_header, result_header, syntax_analyzer_result->aliases, context); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we use a source_header
here? I think it should be query_plan.getCurrentDataStream().header
instead.
SELECT
n,
any(source),
sum(inter) AS inter_s
FROM
(
SELECT
toFloat32(number % 10) AS n,
'original' AS source,
number AS inter
FROM numbers(10)
WHERE (number % 3) = 1
)
GROUP BY n
ORDER BY n ASC WITH FILL FROM 0 TO 11.51 STEP 0.5
INTERPOLATE ( inter_s AS inter_s + 1 )
Query id: c4a37d26-ec08-42dc-9e0b-eee9a4419eb3
0 rows in set. Elapsed: 0.002 sec.
Received exception from server (version 22.4.1):
Code: 47. DB::Exception: Received from localhost:9000. DB::Exception: Missing columns: 'inter_s' while processing query: 'SELECT n, any(source), sum(inter) AS inter_s FROM (SELECT toFloat32(number % 10) AS n, 'original' AS source, number AS inter FROM numbers(10) WHERE (number % 3) = 1) GROUP BY n ORDER BY n ASC WITH FILL FROM 0 TO 11.51 STEP 0.5 INTERPOLATE ( inter_s AS inter_s + 1 )', required columns: 'n' 'inter_s' 'source' 'inter' 'n' 'inter_s' 'source' 'inter'. (UNKNOWN_IDENTIFIER)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, why spaces inside interpolate expression? Let's fix formatting
INTERPOLATE ( inter_s AS inter_s + 1 )
-> INTERPOLATE (inter_s AS inter_s + 1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually I need columns which participate in interpolate expressions - they can be from source_header and result_header, but getCurrentDataStream().header seems the same as result_header
{ | ||
if (auto * ident = fn->as<ASTIdentifier>()) | ||
{ | ||
step.addRequiredOutput(ident->getColumnName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like here we add alias to required output instead of full column name.
At list this does not work:
SELECT
n,
source,
inter + 1 AS inter_p
FROM
(
SELECT
toFloat32(number % 10) AS n,
'original' AS source,
number AS inter
FROM numbers(10)
WHERE (number % 3) = 1
)
ORDER BY n ASC WITH FILL FROM 0 TO 11.51 STEP 0.5
INTERPOLATE ( inter_p AS inter_p + 1 )
Query id: 9c5b264e-0dca-406e-a2c3-7d9dd43f6609
0 rows in set. Elapsed: 0.002 sec.
Received exception from server (version 22.4.1):
Code: 47. DB::Exception: Received from localhost:9000. DB::Exception: Missing columns: 'inter_p' while processing query: 'SELECT n, source, inter + 1 AS inter_p FROM (SELECT toFloat32(number % 10) AS n, 'original' AS source, number AS inter FROM numbers(10) WHERE (number % 3) = 1) ORDER BY n ASC WITH FILL FROM 0 TO 11.51 STEP 0.5 INTERPOLATE ( inter_p AS inter_p + 1 )', required columns: 'n' 'inter_p' 'source' 'inter' 'n' 'inter_p' 'source' 'inter'. (UNKNOWN_IDENTIFIER)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can build actions for interpolate
here as well?
Now I think that maybe it would be easier to substitute aliases for interpolate.
Like, fix QueryNormalizer a little bit (add ASTInterpolateElement)
ClickHouse/src/Interpreters/QueryNormalizer.cpp
Lines 251 to 262 in 5a1392a
if (auto * node_id = ast->as<ASTIdentifier>()) | |
visit(*node_id, ast, data); | |
else if (auto * node_tables = ast->as<ASTTablesInSelectQueryElement>()) | |
visit(*node_tables, ast, data); | |
else if (auto * node_select = ast->as<ASTSelectQuery>()) | |
visit(*node_select, ast, data); | |
else if (auto * node_param = ast->as<ASTQueryParameter>()) | |
throw Exception("Query parameter " + backQuote(node_param->name) + " was not set", ErrorCodes::UNKNOWN_QUERY_PARAMETER); | |
else if (auto * node_function = ast->as<ASTFunction>()) | |
if (node_function->parameters) | |
visit(node_function->parameters, data); | |
so that in AST
inter_p AS inter_p + 1
would be rewritten into inter_p AS (inter + 1) + 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's rather I shouldn't add this column as required because it's not a source column...
src/Parsers/ASTInterpolateElement.h
Outdated
ASTPtr clone() const override | ||
{ | ||
auto clone = std::make_shared<ASTInterpolateElement>(*this); | ||
clone->cloneChildren(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, I think you need to fix ptr for clone->expr
; otherwise it still points to old ast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe like here
ClickHouse/src/Parsers/ASTColumnDeclaration.cpp
Lines 10 to 48 in 5a1392a
ASTPtr ASTColumnDeclaration::clone() const | |
{ | |
const auto res = std::make_shared<ASTColumnDeclaration>(*this); | |
res->children.clear(); | |
if (type) | |
{ | |
// Type may be an ASTFunction (e.g. `create table t (a Decimal(9,0))`), | |
// so we have to clone it properly as well. | |
res->type = type->clone(); | |
res->children.push_back(res->type); | |
} | |
if (default_expression) | |
{ | |
res->default_expression = default_expression->clone(); | |
res->children.push_back(res->default_expression); | |
} | |
if (comment) | |
{ | |
res->comment = comment->clone(); | |
res->children.push_back(res->comment); | |
} | |
if (codec) | |
{ | |
res->codec = codec->clone(); | |
res->children.push_back(res->codec); | |
} | |
if (ttl) | |
{ | |
res->ttl = ttl->clone(); | |
res->children.push_back(res->ttl); | |
} | |
return res; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did something, but probably still wrong - will revisit tomorrow...
|
||
auto elem = std::make_shared<ASTInterpolateElement>(); | ||
elem->column = ident->getColumnName(); | ||
elem->expr = expr; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You also need to add expr
to children. Right now, when I do EXPLAIN AST
, it does not show interpolate child:
EXPLAIN AST
SELECT
n,
source,
inter + 1 AS inter_p
FROM
(
SELECT
toFloat32(number % 10) AS n,
'original' AS source,
number AS inter
FROM numbers(10)
WHERE (number % 3) = 1
)
ORDER BY n ASC WITH FILL FROM 0 TO 11.51 STEP 0.5
INTERPOLATE ( inter_p AS inter_p + 1 )
Query id: a0864985-0068-40ae-91d1-64e528442940
┌─explain──────────────────────────────────────────────┐
│ SelectWithUnionQuery (children 1) │
...
│ ExpressionList (children 1) │
│ InterpolateElement │
└──────────────────────────────────────────────────────┘
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
found by fuzzer
|
Another one with logical error
|
…ASTInterpolateElement::children
conversion for nullable column is not fixed yet... |
@@ -134,7 +135,7 @@ void QueryNormalizer::visit(ASTTablesInSelectQueryElement & node, const ASTPtr & | |||
|
|||
static bool needVisitChild(const ASTPtr & child) | |||
{ | |||
return !(child->as<ASTSelectQuery>() || child->as<ASTTableExpression>()); | |||
return !(child->as<ASTSelectQuery>() || child->as<ASTTableExpression>() || child->as<ASTInterpolateElement>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a comment why do we skip interpolate.
auto find_columns = [&step, &select](IAST * function) | ||
{ | ||
auto f_impl = [&step, &select](IAST * fn, auto fi) | ||
{ | ||
if (auto * ident = fn->as<ASTIdentifier>()) | ||
{ | ||
if (select.count(ident->getColumnName()) == 0) | ||
step.addRequiredOutput(ident->getColumnName()); | ||
return; | ||
} | ||
if (fn->as<ASTFunction>() || fn->as<ASTExpressionList>()) | ||
for (const auto & ch : fn->children) | ||
fi(ch.get(), fi); | ||
return; | ||
}; | ||
f_impl(function, f_impl); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can add a small comment here as well
+1 for adding PARTITION BY |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
added INTERPOLATE extension to the ORDER BY ... WITH FILL
closes #34903