Support optimize `where` clause with sorting key expression move to `prewhere` for query with `final` #38950

hexiaoting · 2022-07-07T09:42:22Z

Changelog category (leave one):

Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Closes #38893.

KochetovNicolai · 2022-07-07T10:30:24Z

src/Storages/MergeTree/MergeTreeWhereOptimizer.cpp

-            /// Do not take into consideration the conditions consisting only of the first primary key column
-            && !hasPrimaryKeyAtoms(node)
+            /// Do not take into consideration the conditions consisting of column that not belong to the primary key columns
+            && isColumnAllPrimaryKey(node)


Hm, this change is unclear to me.
Previously, we did not allow moving a condition to PREWHERE if that condition was an expression over the first PK column. For example, with PK (a, b, c) we could not move `where a = 42`, but could move `where b = 42` and `where another_column = 42`. This logic is understandable: if a condition contains the first PK column, most of the data is already filtered by the PK index (and this is mostly not true for the other PK columns). Actually, I thought we could maybe remove this check, because even for a condition over the first PK column we will likely read only 1-2 granules per part.
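The old heuristic described above can be sketched as follows. This is a hypothetical, simplified model, not ClickHouse code: a condition is represented only by the set of column names it reads, and `Condition` and `canMoveToPrewhereOld` are illustrative names. The real `hasPrimaryKeyAtoms` walks the AST and also recognizes forms such as `a IN (...)`.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model: a WHERE condition is described only by the
// set of column names it reads (the real optimizer inspects the AST).
struct Condition
{
    std::set<std::string> identifiers;
};

// Old heuristic: keep a condition in WHERE if it consists only of the first
// primary key column, because the PK index already filters most granules.
bool canMoveToPrewhereOld(const Condition & cond, const std::vector<std::string> & primary_key)
{
    // Constant expressions (no columns) are never moved.
    if (cond.identifiers.empty())
        return false;

    if (primary_key.empty())
        return true;

    const std::string & first_pk_column = primary_key.front();
    const bool only_first_pk_column
        = cond.identifiers.size() == 1 && *cond.identifiers.begin() == first_pk_column;
    return !only_first_pk_column;
}
```

With PK (a, b, c), a condition reading only `a` stays in WHERE, while conditions on `b` or `another_column` may be moved, matching the example in the comment.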

Now we allow moving a condition only if it is an expression over the PK. Maybe we should also check for FINAL? Something like `&& (!is_final || isColumnAllPrimaryKey(node))`.

BTW, why do we use the PK here and not the sorting key? FINAL requires the sorting key to be processed, not just the PK (which is a prefix of the sorting key). The PK is the set of columns for which we write an index.
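A minimal sketch of the suggested FINAL-aware rule, assuming again that a condition is reduced to the set of columns it reads. `isOverKey` and `canMoveToPrewhere` are hypothetical names, not the optimizer's API; the point is that the check is made against the sorting key, of which the PK is only a prefix.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: returns true if every column the condition reads
// belongs to `key`.
bool isOverKey(const std::set<std::string> & identifiers, const std::vector<std::string> & key)
{
    return std::all_of(identifiers.begin(), identifiers.end(), [&key](const std::string & column)
    {
        return std::find(key.begin(), key.end(), column) != key.end();
    });
}

// Without FINAL any column-dependent condition may go to PREWHERE; with FINAL
// only conditions over the sorting key (not merely the PK) are safe to move.
bool canMoveToPrewhere(const std::set<std::string> & identifiers, bool is_final,
                       const std::vector<std::string> & sorting_key)
{
    if (identifiers.empty())
        return false; // constant expressions are not moved
    return !is_final || isOverKey(identifiers, sorting_key);
}
```

For a table with `ORDER BY (a, b)` and `PRIMARY KEY a`, a condition on `b` may be moved under FINAL even though `b` is not in the PK, while a condition on an ordinary column `c` stays in WHERE.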

KochetovNicolai · 2022-07-07T10:33:33Z

src/Storages/MergeTree/MergeTreeWhereOptimizer.h

-    bool hasPrimaryKeyAtoms(const ASTPtr & ast) const;
-
-    bool isPrimaryKeyAtom(const ASTPtr & ast) const;
+    bool isColumnAllPrimaryKey(const ASTPtr & ast) const;


Not a very good name. Maybe `isExpressionOverSortingKey`?

Yeah, this is better

KochetovNicolai · 2022-07-07T10:40:49Z

src/Storages/MergeTree/MergeTreeWhereOptimizer.cpp

-            || (first_primary_key_column == first_arg_name && functionIsInOrGlobalInOperator(func->name)))
-            return true;
+        for (const auto & arg : args)
+            if (!isColumnAllPrimaryKey(arg))


The PK itself can contain an expression.
I think you should check `pk_names.contains(arg->getColumnName())` first; otherwise a case like

`order by (toHour(event_time)); select ... where toHour(event_time) = toHour(now());`

will not work.
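To illustrate why the name check has to come first, here is a hypothetical mini-AST; `Node`, `isExpressionOverSortingKey`, and the helpers `ident`, `lit`, and `func` are assumptions standing in for `ASTPtr` and the real optimizer. Each subexpression is first looked up by its printed name among the sorting key elements, and only if that fails does the check recurse into the arguments. Treating literals and argument-less functions such as `now()` as row-independent is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical mini-AST standing in for ClickHouse's ASTPtr.
struct Node
{
    enum class Kind { Identifier, Literal, Function };

    Kind kind;
    std::string name;                        // column name, literal text or function name
    std::vector<std::shared_ptr<Node>> args; // function arguments (empty otherwise)

    // Printed form of the expression, e.g. "toHour(event_time)".
    std::string getColumnName() const
    {
        if (kind != Kind::Function)
            return name;
        std::string result = name + "(";
        for (std::size_t i = 0; i < args.size(); ++i)
            result += (i ? ", " : "") + args[i]->getColumnName();
        return result + ")";
    }
};

using NodePtr = std::shared_ptr<Node>;

NodePtr ident(std::string name) { return std::make_shared<Node>(Node{Node::Kind::Identifier, std::move(name), {}}); }
NodePtr lit(std::string value) { return std::make_shared<Node>(Node{Node::Kind::Literal, std::move(value), {}}); }
NodePtr func(std::string name, std::vector<NodePtr> args)
{
    return std::make_shared<Node>(Node{Node::Kind::Function, std::move(name), std::move(args)});
}

// Check the whole subexpression against the sorting key first (the key may
// contain expressions such as toHour(event_time)); only then recurse.
bool isExpressionOverSortingKey(const NodePtr & node, const std::set<std::string> & sorting_key_names)
{
    if (sorting_key_names.count(node->getColumnName()) > 0)
        return true;

    switch (node->kind)
    {
        case Node::Kind::Literal:
            return true;  // constants do not depend on the row
        case Node::Kind::Identifier:
            return false; // a column that is not a sorting key element
        case Node::Kind::Function:
        {
            // Simplification: a function is accepted if all of its arguments are,
            // so argument-less functions such as now() count as row-independent.
            for (const auto & arg : node->args)
                if (!isExpressionOverSortingKey(arg, sorting_key_names))
                    return false;
            return true;
        }
    }
    return false;
}
```

With sorting key `toHour(event_time)`, the condition `toHour(event_time) = toHour(now())` is accepted because its left side matches a key element by name. Recursing first would reach the bare `event_time` identifier and reject it.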

KochetovNicolai · 2022-07-08T09:23:31Z

Now it looks fine.
The tests are failing mostly because more conditions are now moved to PREWHERE.

hexiaoting · 2022-07-11T02:09:29Z

@KochetovNicolai
The error message in CI:

2022-07-08 13:51:03 --- /ClickHouse/tests/queries/0_stateless/01947_mv_subquery.reference	2022-07-08 13:30:17.834988543 +0300
2022-07-08 13:51:03 +++ /ClickHouse/tests/queries/0_stateless/01947_mv_subquery.stdout	2022-07-08 13:51:03.066948009 +0300
2022-07-08 13:51:03 @@ -1,6 +1,6 @@
2022-07-08 13:51:03  {"test":"1947 #1 CHECK - TRUE","sleep_calls":"0","sleep_microseconds":"0"}
2022-07-08 13:51:03  {"test":"1947 #2 CHECK - TRUE","sleep_calls":"2","sleep_microseconds":"2000"}
2022-07-08 13:51:03 -{"test":"1947 #3 CHECK - TRUE","sleep_calls":"0","sleep_microseconds":"0"}
2022-07-08 13:51:03 -{"test":"1947 #1 CHECK - FALSE","sleep_calls":"0","sleep_microseconds":"0"}
2022-07-08 13:51:03 -{"test":"1947 #2 CHECK - FALSE","sleep_calls":"2","sleep_microseconds":"2000"}
2022-07-08 13:51:03 -{"test":"1947 #3 CHECK - FALSE","sleep_calls":"0","sleep_microseconds":"0"}
2022-07-08 13:51:03 +{"test":"1947 #3 CHECK - TRUE","sleep_calls":"4","sleep_microseconds":"4000"}
2022-07-08 13:51:03 +{"test":"1947 #1 CHECK - FALSE","sleep_calls":"6","sleep_microseconds":"6000"}
2022-07-08 13:51:03 +{"test":"1947 #2 CHECK - FALSE","sleep_calls":"6","sleep_microseconds":"6000"}
2022-07-08 13:51:03 +{"test":"1947 #3 CHECK - FALSE","sleep_calls":"4","sleep_microseconds":"4000"}
2022-07-08 13:51:03 
2022-07-08 13:51:03 
2022-07-08 13:51:03 Database: test_po0t2c

I'm not sure whether the new result of the query `SELECT '1947 #3 CHECK - FALSE' as test, ProfileEvents['SleepFunctionCalls'] as sleep_calls, ProfileEvents['SleepFunctionMicroseconds'] as sleep_microseconds FROM system.query_log ....` in test 01947_mv_subquery is correct. Can you help me?

alexey-milovidov · 2022-09-04T02:03:30Z

@KochetovNicolai the latest review was about one month ago.

hanfei1991 · 2023-02-13T12:48:21Z

src/Storages/MergeTree/MergeTreeWhereOptimizer.cpp

@@ -193,8 +193,8 @@ void MergeTreeWhereOptimizer::analyzeImpl(Conditions & res, const ASTPtr & node,
            /// Condition depend on some column. Constant expressions are not moved.
            !cond.identifiers.empty()
            && !cannotBeMoved(node, is_final)
-            /// Do not take into consideration the conditions consisting only of the first primary key column
-            && !hasPrimaryKeyAtoms(node)
+            /// when use final, do not take into consideration the conditions consisting of column that not belong to the sorting key columns


I don't understand this: why don't we consider conditions on columns outside the sorting key when using FINAL? Is there a correctness issue? @KochetovNicolai

KochetovNicolai · 2023-02-14

Yes, this is a correctness issue.
For queries with FINAL, rows with the same sorting key values may need to be merged. So we cannot filter before FINAL is applied (that is, we cannot move the condition to PREWHERE) unless the condition is built entirely on top of the sorting key.
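The correctness problem can be reproduced with a toy model (an assumption, not ClickHouse code) of a ReplacingMergeTree-like table, where FINAL keeps only the row with the highest version for each sorting key value. `Row`, `applyFinal`, `whereAfterFinal`, and `prewhereBeforeFinal` are hypothetical names. Filtering on a non-key column before FINAL lets an outdated row survive, while a condition on the sorting key gives the same result either way.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Toy row: `key` plays the sorting key, `version` decides which row survives
// FINAL, `status` is an ordinary (non-key) column.
struct Row
{
    int key;
    int version;
    std::string status;
};

using Predicate = std::function<bool(const Row &)>;

// FINAL in this model: keep the highest-version row per sorting key value.
std::vector<Row> applyFinal(const std::vector<Row> & rows)
{
    std::map<int, Row> latest;
    for (const auto & row : rows)
    {
        auto it = latest.find(row.key);
        if (it == latest.end() || it->second.version < row.version)
            latest.insert_or_assign(row.key, row);
    }
    std::vector<Row> result;
    for (const auto & entry : latest)
        result.push_back(entry.second);
    return result;
}

std::vector<Row> filter(const std::vector<Row> & rows, const Predicate & pred)
{
    std::vector<Row> result;
    for (const auto & row : rows)
        if (pred(row))
            result.push_back(row);
    return result;
}

// WHERE evaluated after FINAL: the intended semantics.
std::vector<Row> whereAfterFinal(const std::vector<Row> & rows, const Predicate & pred)
{
    return filter(applyFinal(rows), pred);
}

// Condition moved to PREWHERE: evaluated before FINAL merges rows.
std::vector<Row> prewhereBeforeFinal(const std::vector<Row> & rows, const Predicate & pred)
{
    return applyFinal(filter(rows, pred));
}
```

With rows `{key 1, version 1, "active"}` and `{key 1, version 2, "deleted"}`, the condition `status = 'active'` correctly returns nothing after FINAL, but returns the stale version-1 row if applied first. A deterministic condition over the key alone is safe because all rows sharing a key value pass or fail it together, so early filtering cannot change which row FINAL keeps.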

hanfei1991 · 2023-02-15T17:18:12Z

Test failure is unrelated.

hexiaoting added 2 commits July 7, 2022 17:26

Allow more situations with final to push where to prewhere

99995f5

fix style

009fd0c

robot-ch-test-poll1 added the `pr-improvement` label (Pull request with some product improvements) Jul 7, 2022

KochetovNicolai reviewed Jul 7, 2022


KochetovNicolai self-assigned this Jul 7, 2022

fix bug

ee4137c

hexiaoting changed the title from "Dev prewhere" to "Support optimize `where` clause with sorting key expression move to `prewhere` for query with `final`" Jul 8, 2022

fix test cases

f19511a

KochetovNicolai and others added 6 commits July 18, 2022 10:52

Do not sleep at key kondition analysis.

b589f87

Merge branch 'master' into dev-prewhere

12c14b2

Merge branch 'master' into dev-prewhere

d0b6363

Fixing tests.

8888058

Merge branch 'master' into dev-prewhere

cb9c82f

Fixing tests.

ffe8c86

alexey-milovidov unassigned KochetovNicolai Feb 8, 2023

hanfei1991 self-assigned this Feb 9, 2023

hanfei1991 reviewed Feb 13, 2023


hanfei1991 added 5 commits February 14, 2023 18:39

Merge branch 'master' into dev-prewhere

32050ac

some clean up

86fda9b

clean up

937fade

change tests

051f551

Merge branch 'master' into dev-prewhere

5458d42

hanfei1991 approved these changes Feb 15, 2023


hanfei1991 merged commit b152419 into ClickHouse:master Feb 15, 2023
