New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a pass in Analyzer for time filter optimization with preimage #52091
Add a pass in Analyzer for time filter optimization with preimage #52091
Conversation
bfa81bb
to
c933d6a
Compare
This is an automated comment for commit 9ff409f with description of existing statuses. It's updated for the latest CI running
|
c933d6a
to
4b54ba8
Compare
A new issue is observed in the functional test 00515_enhanced_time_zones when the Analyzer is enabled by default. I've basically identified the root cause and will suspend the code review temporarily until the issue is fixed. |
The preimage has been identified as the general solution to the optimization of predicates with time converters (ClickHouse#15257). PR ClickHouse#50951 implemented this solution via the AST rewrite. As a follow-up, this commit extends the optimization to the experi- mental analyzer by replacing the sub-QueryTree of the time filter with its preimage-transformed one. The optimization is implemented as a new pass in the Analyzer.
2195b70
to
9ff409f
Compare
Date/Date32/DateTime/DateTime64 columns are required for arguments of time converters, such as toYear and toYYYYMM.
Hi @hanfei1991, with the failed cases fixed, I think this PR is ready for the code review. This PR basically follows the implementation of the AST rewrite (#50951), and could you kindly help me review this one as well? Thanks a lot for your continuous help! |
Thanks for your review and acceptance! @hanfei1991 |
Why is this optimization not under the setting? There is no way to disable it. If it's a performance improvement why is there no new performance test? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The whole cpp file is written not like in the style guide. Very hard to read, especially because of huge amount of the copy-paste.
size_t literal_id = 1 - func_id; | ||
const auto * literal = function->getArguments().getNodes()[literal_id]->as<ConstantNode>(); | ||
|
||
if (!literal || literal->getValue().getType() != Field::Types::UInt64) return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the field type used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The literal in a SQL query is parsed as one of UInt64, Int64, Float64, String, Null (as ParserLiteral described),
and in this case, where the literal is compared with results from toYear or toYYYYMM converters, we expect the type of the literal to be UInt64, and decide to return early elsewise.
const auto lhs = std::make_shared<FunctionNode>("less"); | ||
lhs->getArguments().getNodes().push_back(std::make_shared<ColumnNode>(column_node.getColumn(), column_node.getColumnSource())); | ||
lhs->getArguments().getNodes().push_back(std::make_shared<ConstantNode>(start_date_or_date_time)); | ||
resolveOrdinaryFunctionNode(*lhs, lhs->getFunctionName()); | ||
|
||
const auto rhs = std::make_shared<FunctionNode>("greaterOrEquals"); | ||
rhs->getArguments().getNodes().push_back(std::make_shared<ColumnNode>(column_node.getColumn(), column_node.getColumnSource())); | ||
rhs->getArguments().getNodes().push_back(std::make_shared<ConstantNode>(end_date_or_date_time)); | ||
resolveOrdinaryFunctionNode(*rhs, rhs->getFunctionName()); | ||
|
||
const auto new_date_filter = std::make_shared<FunctionNode>("or"); | ||
new_date_filter->getArguments().getNodes() = {lhs, rhs}; | ||
resolveOrdinaryFunctionNode(*new_date_filter, new_date_filter->getFunctionName()); | ||
|
||
return new_date_filter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost a complete copy-paste of equals
case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing out this flaw! Maybe I should implement a utility function for constructing FunctionNode
and have it build the subtrees of equals
, notEquals
, etc. instead, which would make the code neater.
Hi Dmitry, I think it's required to add this optimization under the settings if it unintendedly breaks another optimization, and I will soon send out another PR for it. Meanwhile, I'd also like to explore if this preimage optimization could co-work with the read-in-order one. And exactly, this optimization is supported more by the performance data of SSB than a CI performance test. I'm not sure if it is still necessary to add the performance test in this case, but I will try to add one if so. The code style is also to be refined in the upcoming PR, where the code duplication problem will be further mitigated. Thanks for your comments! @novikd |
The fuzzer is reporting a problem: https://s3.amazonaws.com/clickhouse-test-reports/0/f69bfa452633d2fcc2052a258f7202666053c787/ast_fuzzer__debug_.html The comments from @novikid still stands by the way. The format of the file is bad, and there is no setting to disable this optimization. I'm having a look to see if it's something easy or I'll simply disable this completely until fixed |
Extremely sorry for failing to make progress in this code refinement in the last months due to lots of urgent internal tasks. I will try to revise the code accordingly in the coming days to 1) fix the reported problem 2) refine the code style 3) add new setting to turn the optimization on optionally. But it's still OK if it needs to be disabled temporarily before I could completely resolve the known issues. Thanks for your comments and picking up this thread. |
All 3 items should be covered by #60453 |
Thanks for the fixes from which I've learned a lot. And I also noticed that the implementation of generating |
#60875 has been created for refactoring this part of code. Could you help me review it? Thanks in advance for your help! |
The preimage has been identified as the general solution to the
optimization of predicates with time converters (#15257). PR #50951
implemented this solution via the AST rewrite.
As a follow-up, this commit extends the optimization to the experi-
mental analyzer by replacing the sub-QueryTree of the time filter
with its preimage-transformed one. The optimization is implemented
as a new pass in the Analyzer.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
The performance experiments of SSB on the ICX device (Intel Xeon Platinum 8380 CPU, 80 cores, 160 threads) show that this change could bring an improvement of 8.5% to the geomean QPS when the experimental analyzer is enabled. The details are shown below: