26.1 Antalya port - improvements for cluster requests #1414
zvonand merged 24 commits into antalya-26.1
Conversation
…e_count_in_datalake 25.8 Antalya ports: Read optimization using Iceberg metadata
…wipes_my_snot 25.8 Antalya: Fixes for #1069
…el_segfailt Check multiplexed connection before usage
…_in_task Small iceberg read optimization
…orage_cluster_profile_events Profile events for task distribution in ObjectStorageCluster requests
…_storage_cluster_distribution
…n_unexpected_shutdown Fix segfault on unexpected node shutdown
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9bd95cb8cd
```cpp
replica_to_files_to_be_processed.erase(number_of_current_replica);
for (const auto & file : processed_file_list_ptr->second)
```
Keep task list before removing lost replica entry
In rescheduleTasksFromReplica, the code erases replica_to_files_to_be_processed[number_of_current_replica] and then immediately iterates processed_file_list_ptr->second. That iterator points to the erased element, so this is undefined behavior on the replica-failover path. When a replica drops mid-query, this can crash the coordinator or prevent tasks from being re-queued reliably.
```cpp
if (read_context.packet.type == Protocol::Server::Data)
    read_context.has_data_packets = true;
```
Ignore empty Data packets when deciding task rerun
The failover guard marks has_data_packets true for any Protocol::Server::Data packet, but empty Data blocks are valid and do not imply rows were processed. If a replica sends only empty Data blocks and then disconnects, the EOF handler will skip the ConnectionLost rerun path (!has_data_packets is false), so unprocessed tasks are not reassigned even with skip_unavailable_shards enabled.
Failed stateless and integration tests seem unrelated.
```python
# Weird, but it looks like ReadFileMetadata does not use the local file cache in 26.1.
# metadata.json is always downloaded in 26.1, once per query or subquery.
# In 25.8 the count was equal to expected; in 26.1 it is expected * 3 + 1 for the Local case
# and expected * 3 + 4 for the Cluster case, because each subquery loads metadata.json.
```
PR #1414 Audit Review
1) Scope and partitions
PR reviewed: Altinity/ClickHouse#1414. Large PR; reviewed in functional partitions:
2) Call graph
Main changed runtime path for cluster table functions:
State/caches touched:
3) Transition matrix
4) Logical code-path testing summary
Reviewed changed branches:
Concurrency/interleavings checked:
5) Fault categories and category-by-category injection results
6) Confirmed defects (High/Medium/Low)
High — Reschedule path uses a different task identity than normal scheduling, causing task collapse/misrouting after replica loss
Evidence:

```cpp
for (const auto & file : processed_file_list_ptr->second)
{
    auto file_replica_idx = getReplicaForFile(file->getPath());
    unprocessed_files.emplace(file->getPath(), std::make_pair(file, file_replica_idx));
    connection_to_files[file_replica_idx].push_back(file);
}
```

```cpp
String file_identifier;
if (send_over_whole_archive && object_info->isArchive())
    file_identifier = object_info->getPathOrPathToArchiveIfArchive();
else
    file_identifier = object_info->getIdentifier();
```

```cpp
String ObjectInfo::getIdentifier() const
{
    String result = getPath();
    if (file_bucket_info)
        result += file_bucket_info->getIdentifier();
```
Medium — Any JSON-looking file path is interpreted as a control command and can be dropped as an empty path
Evidence:

```cpp
auto json = parser.parse(task).extract<Poco::JSON::Object::Ptr>();
if (!json)
    return;
is_valid = true;
if (json->has("file_path"))
    file_path = json->getValue<std::string>("file_path");
if (json->has("retry_after_us"))
    retry_after_us = json->getValue<size_t>("retry_after_us");
```

```cpp
explicit RelativePathWithMetadata(String command_or_path, std::optional<ObjectMetadata> metadata_ = std::nullopt)
    : relative_path(std::move(command_or_path))
    , metadata(std::move(metadata_))
    , command(relative_path)
{
    if (command.isValid())
    {
        relative_path = command.getFilePath().value_or("");
```

```cpp
if (object_info->relative_path_with_metadata.getCommand().isValid())
{
    auto retry_after_us = object_info->relative_path_with_metadata.getCommand().getRetryAfterUs();
    ...
}
if (object_info->getPath().empty())
    return {};
```
7) Coverage accounting + stop-condition status
8) Assumptions & Limits
9) Confidence rating and confidence-raising evidence
10) Residual risks and untested paths
QA Verification
PR #1414 is a frontport of 9 Antalya sub-PRs from 25.8 to 26.1, covering swarm mode, Iceberg read optimization, cluster task distribution improvements, and crash fixes.
Verdict: All new integration tests pass; no CI failures are related to the PR. Approved.
CI Failures Analysis (non-regression)
All builds passed. No test failures are related to PR #1414.
| Test | Coverage | CI Result |
|---|---|---|
| test_s3_cache_locality (3 tests) | Cache locality, rendezvous hashing, lock_object_storage_task_distribution_ms | OK on amd_binary/arm_binary. BROKEN on ASAN/TSAN |
| test_s3_cluster::test_graceful_shutdown | Graceful shutdown of S3 cluster node (SYSTEM STOP SWARM MODE) | OK on all builds |
| test_read_constant_columns_optimization (4 variants: S3/Azure × True/False) | Iceberg constant columns read optimization | OK on all builds |
| gtest_rendezvous_hashing (unit test, updated) | Rendezvous hashing algorithm | OK |
Note: Sub-PR #1201 (segfault on unexpected node shutdown) does not have a dedicated test. The scenario would be covered by the swarms/node_failure regression suite, but it cannot run yet due to the missing object_storage_cluster setting.
Regression Tests (Altinity/clickhouse-regression)
All failures are pre-existing and affect all antalya-26.1 PRs:
| Suite | x86 | arm64 | Reason |
|---|---|---|---|
| swarms | FAIL | FAIL | object_storage_cluster setting not yet forward-ported to 26.1 (will be added by PR #1390) |
| iceberg_1, iceberg_2 | FAIL | FAIL | Same object_storage_cluster missing setting; swarm and iterator race condition tests use it |
| parquet | FAIL | FAIL | Antalya features still being forward-ported to 26.1 |
| settings | FAIL | FAIL | Settings list/default values not updated for 26.1 yet |
The swarms and iceberg suites test functionality related to this PR, but all tests fail immediately because the object_storage_cluster setting (String, cluster name) does not exist yet in antalya-26.1. It exists in antalya-25.8 and will be forward-ported by PR #1390.
The new integration tests added by this PR (test_graceful_shutdown, test_s3_cache_locality, test_read_constant_columns_optimization) confirm the PR code works correctly.
@CarlosFelipeOR

Issue created for "Reschedule path uses a different task identity": #1486
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Frontports for Antalya 26.1
CI/CD Options
Exclude tests:
Regression jobs to run: