Fix race condition in IPartitionStrategy::computePartitionKey (used by object storage sinks)#99400
Fix race condition in IPartitionStrategy::computePartitionKey (used by object storage sinks)#99400arthurpassos wants to merge 5 commits intoClickHouse:masterfrom
IPartitionStrategy::computePartitionKey (used by object storage sinks)#99400Conversation
|
Workflow [PR], commit [20b9624] Summary: ❌
AI ReviewSummaryThis PR makes Missing context
ClickHouse Rules
Final Verdict
|
| WildcardPartitionStrategy(KeyDescription partition_key_description_, const Block & sample_block_, ContextPtr context_); | ||
|
|
||
| ColumnPtr computePartitionKey(const Chunk & chunk) override; | ||
| ColumnPtr computePartitionKey(const Chunk & chunk) const override; |
There was a problem hiding this comment.
❌ WildcardPartitionStrategy::computePartitionKey signature was made const in the header, but the .cpp definition is still non-const (src/Storages/IPartitionStrategy.cpp, near line 299). This makes the member definition not match any declaration and breaks compilation.
Please update the definition to ColumnPtr WildcardPartitionStrategy::computePartitionKey(const Chunk & chunk) const.
| BuildAST && build_ast) | ||
| { | ||
| if (cached_result) | ||
| return *cached_result; |
There was a problem hiding this comment.
To be honest code structure seems strange - like we pass cached_result into this method as an arg and return from this method if it already has a value, so I see no point of calling this method with cached_result arg unless we would have set cached_result value at the end of this method when it turned out empty and we rebuilt it.
There was a problem hiding this comment.
I am also not a big fan of it. The idea of the cached_result if statement inside the method was to avoid duplicating it in the two computePartitionKey implementations.
I can make it simpler if you want.
There was a problem hiding this comment.
diff --git a/src/Storages/IPartitionStrategy.cpp b/src/Storages/IPartitionStrategy.cpp
index 77826c80afb..87a503c8685 100644
--- a/src/Storages/IPartitionStrategy.cpp
+++ b/src/Storages/IPartitionStrategy.cpp
@@ -97,27 +97,6 @@ namespace
return makeASTFunction("toString", std::move(arguments));
}
- template <typename BuildAST>
- PartitionExpressionActionsAndColumnName getCachedOrBuildActions(
- const std::optional<PartitionExpressionActionsAndColumnName> & cached_result,
- const IPartitionStrategy & partition_strategy,
- BuildAST && build_ast)
- {
- if (cached_result)
- return *cached_result;
-
- auto expression_ast = build_ast();
- return partition_strategy.getPartitionExpressionActions(expression_ast);
- }
-
- void cacheDeterministicActions(
- std::optional<PartitionExpressionActionsAndColumnName> & cached_result,
- const PartitionExpressionActionsAndColumnName & actions_with_column)
- {
- if (!actions_with_column.actions->getActionsDAG().hasNonDeterministic())
- cached_result = actions_with_column;
- }
-
std::shared_ptr<IPartitionStrategy> createHivePartitionStrategy(
ASTPtr partition_by,
const Block & sample_block,
@@ -289,19 +268,29 @@ std::shared_ptr<IPartitionStrategy> PartitionStrategyFactory::get(StrategyType s
WildcardPartitionStrategy::WildcardPartitionStrategy(KeyDescription partition_key_description_, const Block & sample_block_, ContextPtr context_)
: IPartitionStrategy(partition_key_description_, sample_block_, context_)
{
- auto actions_with_column = getCachedOrBuildActions(
- cached_result,
- *this,
- [&] { return buildToStringPartitionAST(partition_key_description.definition_ast); });
- cacheDeterministicActions(cached_result, actions_with_column);
+ ASTs arguments(1, partition_key_description.definition_ast);
+ ASTPtr partition_by_string = makeASTFunction("toString", std::move(arguments));
+ auto actions_with_column = getPartitionExpressionActions(partition_by_string);
+ if (!actions_with_column.actions->getActionsDAG().hasNonDeterministic())
+ {
+ cached_result = actions_with_column;
+ }
}
ColumnPtr WildcardPartitionStrategy::computePartitionKey(const Chunk & chunk) const
{
- auto actions_with_column = getCachedOrBuildActions(
- cached_result,
- *this,
- [&] { return buildToStringPartitionAST(partition_key_description.definition_ast); });
+ PartitionExpressionActionsAndColumnName actions_with_column;
+
+ if (cached_result)
+ {
+ actions_with_column = *cached_result;
+ }
+ else
+ {
+ ASTs arguments(1, partition_key_description.definition_ast);
+ ASTPtr partition_by_string = makeASTFunction("toString", std::move(arguments));
+ actions_with_column = getPartitionExpressionActions(partition_by_string);
+ }
Block block_with_partition_by_expr = sample_block.cloneWithoutColumns();
block_with_partition_by_expr.setColumns(chunk.getColumns());
@@ -341,11 +330,13 @@ HiveStylePartitionStrategy::HiveStylePartitionStrategy(
block_without_partition_columns = buildBlockWithoutPartitionColumns(sample_block, partition_columns_name_set);
- auto actions_with_column = getCachedOrBuildActions(
- cached_result,
- *this,
- [&] { return buildHivePartitionAST(partition_key_description.definition_ast, getPartitionColumns()); });
- cacheDeterministicActions(cached_result, actions_with_column);
+ auto hive_ast = buildHivePartitionAST(partition_key_description.definition_ast, getPartitionColumns());
+ auto actions_with_column = getPartitionExpressionActions(hive_ast);
+
+ if (!actions_with_column.actions->getActionsDAG().hasNonDeterministic())
+ {
+ cached_result = actions_with_column;
+ }
}
std::string HiveStylePartitionStrategy::getPathForRead(const std::string & prefix)
@@ -385,10 +376,17 @@ std::string HiveStylePartitionStrategy::getPathForWrite(
ColumnPtr HiveStylePartitionStrategy::computePartitionKey(const Chunk & chunk) const
{
- auto actions_with_column = getCachedOrBuildActions(
- cached_result,
- *this,
- [&] { return buildHivePartitionAST(partition_key_description.definition_ast, getPartitionColumns()); });
+ PartitionExpressionActionsAndColumnName actions_with_column;
+
+ if (cached_result)
+ {
+ actions_with_column = *cached_result;
+ }
+ else
+ {
+ auto hive_ast = buildHivePartitionAST(partition_key_description.definition_ast, getPartitionColumns());
+ actions_with_column = getPartitionExpressionActions(hive_ast);
+ }
Let me know if you want me to push this diff
There was a problem hiding this comment.
Ok let's leave as it already is, but add a comment at the beginning of this function mentioning that cached_result must have been cached in constructor if deterministic, otherwise we always rebuild from scratch...
| auto hive_ast = buildHivePartitionAST(partition_key_description.definition_ast, getPartitionColumns()); | ||
| auto actions_with_column = getPartitionExpressionActions(hive_ast); | ||
| auto actions_with_column = getCachedOrBuildActions( | ||
| cached_result, |
There was a problem hiding this comment.
We've cached the cached_result in HiveStylePartitionStrategy constructor, why call getCachedOrBuildActions again?
There was a problem hiding this comment.
cacheDeterministicActions might decide not to cache it because of ActionsDAG::hasNonDeterministic
LLVM Coverage Report
PR changed lines: PR changed-lines coverage: 100.00% (57/57, 0 noise lines excluded) |

IPartitionStrategy::computePartitionKeymight be called from different threads, and it writes tocached_resultconcurrently without any sort of protection. It would be easier to add a mutex around it, but we can actually make it lock-free by moving the cache write to the constructor.Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix possible race condition in
IPartitionStrategy::cached_resultintroduced in #92844Documentation entry for user-facing changes
Note
Medium Risk
Touches object-storage partition key computation and caching; while intended to be behavior-preserving, it changes when expression actions are built/cached and could impact partitioning/perf under concurrency.
Overview
Fixes a potential race on
IPartitionStrategy::cached_resultby removing cache writes duringcomputePartitionKey/getPartitionExpressionActionsand instead precomputing and caching deterministic expression actions in theWildcardPartitionStrategyandHiveStylePartitionStrategyconstructors.Refactors action creation into small helpers (
buildToStringPartitionAST,getCachedOrBuildActions,cacheDeterministicActions) and makescomputePartitionKey/getPartitionExpressionActionsconstto reflect thread-safe usage.Written by Cursor Bugbot for commit fb89e89. This will update automatically on new commits. Configure here.