-
Notifications
You must be signed in to change notification settings - Fork 6.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support optimize where
clause with sorting key expression move to prewhere
for query with final
#38950
Conversation
/// Do not take into consideration the conditions consisting only of the first primary key column | ||
&& !hasPrimaryKeyAtoms(node) | ||
/// Do not take into consideration the conditions consisting of column that not belong to the primary key columns | ||
&& isColumnAllPrimaryKey(node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this change is unclear to me.
Previously, we did not allow to move condition to PREWHERE in case this condition was an expression over first PK column. Like, if we had PK (a, b, c)
then we could not move condition where a = 42
, but could move where b = 42
and where another_column = 42
. This logic is understandable : most of the data would be filtered by PK if condition contains the first PK column (and this is mostly not true for other PK columns). Actually, I thought that maybe we can remove this check - because even for the condition over first PK column we will likely read 1-2 granules per part.
Now, we allow to move condition only if it is expression over PK key. Maybe we should check for FINAL as well? Something like && (!is_final || isColumnAllPrimaryKey(node))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw, why here we use PK, not sorting key? Final requires sorting key to be processed, not just PK (which is a prefix of sorting key). PK is a set of columns for which we write an index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
bool hasPrimaryKeyAtoms(const ASTPtr & ast) const; | ||
|
||
bool isPrimaryKeyAtom(const ASTPtr & ast) const; | ||
bool isColumnAllPrimaryKey(const ASTPtr & ast) const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a very good name. Maybe isExpressionOverSortingKey
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is better
|| (first_primary_key_column == first_arg_name && functionIsInOrGlobalInOperator(func->name))) | ||
return true; | ||
for (const auto & arg : args) | ||
if (!isColumnAllPrimaryKey(arg)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PK itself can contain an expression.
I think you should check pk_names.contains(arg->getColumnName())
first, otherwise the case
order by (toHour(event_time));
select ... where toHour(event_time) = toHour(now());
will not work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
where
clause with sorting key expression move to prewhere
for query with final
Now it looks fine. |
@KochetovNicolai
I'm not sure the new result of query |
@KochetovNicolai latest review was about one month ago. |
@@ -193,8 +193,8 @@ void MergeTreeWhereOptimizer::analyzeImpl(Conditions & res, const ASTPtr & node, | |||
/// Condition depend on some column. Constant expressions are not moved. | |||
!cond.identifiers.empty() | |||
&& !cannotBeMoved(node, is_final) | |||
/// Do not take into consideration the conditions consisting only of the first primary key column | |||
&& !hasPrimaryKeyAtoms(node) | |||
/// when use final, do not take into consideration the conditions consisting of column that not belong to the sorting key columns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't understand this: why we will not consider non-sorting columns when using final? Any correctness issues? @KochetovNicolai
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is a correctness issue.
For queries with FINAL we may want to merge some rows with the same sorting key values. So, we cannot filter (except if the condition is fully on top of sorting key) before FINAL happens (=cannot move the condition to prewhere)
test failure unrelated |
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
close issue: #38893