Fix cluster functions with hive partitioning#1863

Open
ianton-ru wants to merge 5 commits into
antalya-26.3from
bugfix/antalya-26.3/1855_s3cluster_hive

@ianton-ru

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix cluster functions with hive partitioning

Documentation entry for user-facing changes

Solved #1855
The list of virtual columns can include hive columns.
In the non-cluster case, the iterator is created with a filtered list of hive columns, from which columns already present among the virtual columns are removed (see https://github.com/Altinity/ClickHouse/blob/antalya-26.3/src/Storages/prepareReadingFromFormat.cpp#L49).
In the cluster case, the iterator was created with the full lists. As a result, hive columns were added twice: later the first copy was filled with data while the second stayed empty, and the attempt to filter failed with the error SIZES_OF_COLUMNS_DOESNT_MATCH.
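
The deduplication described above can be illustrated with a minimal standalone sketch. It is not the code from this PR: `NameAndType`, `hiveColumnsWithoutVirtuals`, `combine`, and `hasDuplicateNames` are hypothetical names, the column types are placeholders, and plain standard-library containers stand in for ClickHouse's own types. It only shows the idea of dropping hive columns that already appear among the virtual columns before both lists are handed to an iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a (name, type) column description.
struct NameAndType
{
    std::string name;
    std::string type;
};

using NamesAndTypes = std::vector<NameAndType>;

// Return the hive columns whose names do not already appear among the virtual columns.
NamesAndTypes hiveColumnsWithoutVirtuals(const NamesAndTypes & hive_columns, const NamesAndTypes & virtuals)
{
    NamesAndTypes result;
    for (const auto & column : hive_columns)
    {
        const bool is_virtual = std::any_of(
            virtuals.begin(), virtuals.end(),
            [&](const NameAndType & v) { return v.name == column.name; });
        if (!is_virtual)
            result.push_back(column);
    }
    return result;
}

// Concatenate both lists, the way a consumer that receives both of them would see the columns.
NamesAndTypes combine(const NamesAndTypes & virtuals, const NamesAndTypes & hive_columns)
{
    NamesAndTypes all = virtuals;
    all.insert(all.end(), hive_columns.begin(), hive_columns.end());
    return all;
}

// True if any column name occurs more than once.
bool hasDuplicateNames(const NamesAndTypes & columns)
{
    for (std::size_t i = 0; i < columns.size(); ++i)
        for (std::size_t j = i + 1; j < columns.size(); ++j)
            if (columns[i].name == columns[j].name)
                return true;
    return false;
}
```

With virtuals `_path`, `_file`, `year` and hive columns `year`, `month`, passing the full lists yields `year` twice, while passing `hiveColumnsWithoutVirtuals(hive, virtuals)` leaves only `month` in the hive list, so each name appears once.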

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@github-actions

github-actions Bot commented Jun 2, 2026

Workflow [PR], commit [9d5e1d1]

@ianton-ru

Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f8cc3a0eb6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tests/integration/test_s3_cluster/test.py Outdated
@ianton-ru

Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9850be48c3

Comment thread tests/integration/test_file_cluster/test.py Outdated
@ianton-ru

Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. What shall we delve into next?


NamesAndTypesList IStorageCluster::getHivePartitionColumnsWithoutVirtuals() const
{
    // Virtual columns can contain hive columns, so we remove these hive columns to avoid duplicates.

Collaborator

iirc, that wasn't supposed to be the case

Author

Hive columns are added to the virtuals in the getVirtualsForFileLikeStorage method when partition_strategy == PartitionStrategyFactory::StrategyType::NONE.
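
A rough sketch of the behavior described in this reply, under loud assumptions: `StrategyType` below is a hypothetical two-value stand-in for `PartitionStrategyFactory::StrategyType`, `virtualsForFileLikeStorage` is an illustrative function and not the real `getVirtualsForFileLikeStorage` signature, and the base virtuals are reduced to `_path` and `_file`. It only shows that the virtual list grows by the hive columns when no partition strategy is set.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the partition strategy enum; only the two values needed here.
enum class StrategyType { None, Hive };

// Build the list of virtual column names for a file-like storage (illustrative only).
std::vector<std::string> virtualsForFileLikeStorage(
    StrategyType strategy,
    const std::vector<std::string> & hive_columns_from_path)
{
    // Base virtual columns, reduced to two for the sketch.
    std::vector<std::string> virtuals = {"_path", "_file"};

    // Assumption taken from the comment above: with no partition strategy,
    // the hive columns parsed from the path are appended to the virtuals.
    if (strategy == StrategyType::None)
        virtuals.insert(virtuals.end(), hive_columns_from_path.begin(), hive_columns_from_path.end());

    return virtuals;
}
```

In this sketch, calling the function with `StrategyType::None` and hive columns `year`, `month` returns four names, so a caller that also passes the hive list separately would see `year` and `month` twice unless it filters them first.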

Collaborator

well, I have just checked the source code and the behavior has indeed been modified. Upstream is now maintaining this. When I implemented it, it wasn't supposed to be like this.

Regardless, then I think this change is fine. Will approve it.

@Selfeer

Selfeer commented Jun 8, 2026

Collaborator

AI audit note: This review comment was generated by AI (gpt-5.3-codex).

Audit update for PR #1863 (Fix cluster functions with hive partitioning):

Confirmed defects:

No confirmed defects in reviewed scope.

Coverage summary:

  • Scope reviewed: PR diff in src/Storages/IStorageCluster.cpp, src/Storages/IStorageCluster.h, src/Storages/ObjectStorage/StorageObjectStorageCluster.cpp, src/Storages/ObjectStorage/StorageObjectStorageCluster.h, src/Storages/StorageFileCluster.cpp, src/Storages/StorageFileCluster.h, src/Storages/StorageURLCluster.cpp, src/Storages/StorageURLCluster.h, and added integration tests in tests/integration/test_file_cluster/test.py, tests/integration/test_s3_cluster/test.py, tests/integration/test_storage_url/test.py.
  • Categories failed: None confirmed.
  • Categories passed: Call-graph and transition checks for task iterator construction/filtering; fail-open/fail-closed checks for missing/filtered paths; shared-state/interleaving checks for iterator/filter state; exception/partial-update checks (no new mutation-heavy state transitions); required C++ bug classes reviewed (lifetime, iterator invalidation, races/deadlocks, exception safety, integer/signedness, ownership/RAII, UB) with no confirmed defect in changed paths.
  • Assumptions/limits: Static audit only (no runtime execution in this pass); conclusions are limited to changed code paths and their direct interactions.

@Selfeer

Selfeer commented Jun 8, 2026

Collaborator

PR #1863 CI Triage: "Fix cluster functions with hive partitioning"

PR: #1863
Title: Fix cluster functions with hive partitioning
Branch: bugfix/antalya-26.3/1855_s3cluster_hive
Head SHA: 9d5e1d15b7e88e2017a778370f5c8239f843f0e1
Author: Anton Ivashkin (@ianton-ru)

Files Changed

  • src/Storages/IStorageCluster.cpp / .h — new getHivePartitionColumnsWithoutVirtuals() helper; moves hive_partition_columns_to_read_from_file_path to base class
  • src/Storages/ObjectStorage/StorageObjectStorageCluster.cpp / .h — use getHivePartitionColumnsWithoutVirtuals()
  • src/Storages/StorageFileCluster.cpp / .h — use getHivePartitionColumnsWithoutVirtuals(); fix setVirtuals() to pass partition strategy for hive column dedup
  • src/Storages/StorageURLCluster.cpp / .h — use getHivePartitionColumnsWithoutVirtuals()
  • tests/integration/test_file_cluster/test.py — new hive partitioning + WHERE filter test
  • tests/integration/test_s3_cluster/test.py — new hive partitioning + WHERE filter test
  • tests/integration/test_storage_url/test.py — new hive partitioning + WHERE filter test

Scope of change: Purely hive partition column deduplication in cluster storage task iterators. No fault tolerance, retry logic, or node failure handling is modified.


Summary

| Category | Count | Tests |
|---|---|---|
| PR-caused regression | 0 | |
| Pre-existing flaky | 4 | 00157_cache_dictionary, 03443_shared_storage_snapshots, stress test state pollution, swarms node failure timing |
| Infrastructure issue (known broken) | 1 | test_different_versions (NETLINK_ERROR) |

Conclusion: None of the 5 CI failures are caused by this PR. Safe to merge.


Failure-by-Failure Analysis

1. Stateless tests (arm_binary, sequential): 00157_cache_dictionary FAIL


2. Stateless tests (arm_asan, azure, sequential, 2/2): 03443_shared_storage_snapshots FAIL


3. Stress test (amd_debug): Cannot start clickhouse-server FAIL

Error:

Code: 60. DB::Exception: Table test_1.local_hot does not exist:
Cannot attach table `test_1`.`t3` ...
ENGINE = Hybrid(remote('localhost:9000', 'test_1', 'local_hot'), ...)

  • PR does not touch Hybrid engine code
  • Verdict: Pre-existing flaky — stress test state pollution from Altinity-specific Hybrid engine table not cleaned up between runs

4. Integration tests (arm_binary, distributed plan, 4/4) — Two failures

4a. test_storage_s3_queue/test_0.py::test_move_after_processing[another_bucket-AzureQueue] (counted as Failure: 1/1039)

Error:

assert 0 == (5 * 10)
  where 0 = int('0\n')
  where '0\n' = query("SELECT sum(rows_processed) FROM system.azure_queue_log
                       WHERE table = 'move_after_processing_unordered_AzureQueue_ztwdcp'")

  • AzureQueue log shows 0 rows processed, which points to a queue message delivery/timing issue
  • No relation to cluster functions or hive partitioning
  • Verdict: Pre-existing flaky test — AzureQueue processing timing

4b. test_backup_restore_on_cluster/test_different_versions.py::test_different_versions (filtered out by broken tests handler)

Broken tests handler log:

test_backup_restore_on_cluster/test_different_versions.py::test_different_versions
Matched rule: {'reason': 'INVESTIGATE - NETLINK_ERROR', 'message': 'NETLINK_ERROR'}

  • Correctly filtered as a known infrastructure networking issue (Docker NETLINK_ERROR)
  • Not counted in the CI failure summary
  • Verdict: Known infrastructure issue — CI properly recognized and filtered this

5. Regression release swarms/swarms/feature/node failure/check restart swarm node FAIL

Test purpose: Verify that a query running on a swarm cluster fails when a swarm node is killed during execution.

What the test does:

  1. Loads 100 rows into an Iceberg table
  2. Runs SELECT count(), hostName() FROM iceberg_table WHERE NOT ignore(sleepEachRow(1)) GROUP BY hostName() SETTINGS object_storage_cluster='static_swarm_cluster' — expected to take ~100 seconds
  3. In parallel: stops clickhouse3 and immediately restarts it
  4. Expects the query to fail with exit code 138 (DB::Exception: Query was cancelled.)

What actually happened:

[clickhouse1] 100    clickhouse2
[clickhouse1] bash# echo $?
[clickhouse1] 0

  • The query succeeded (exit code 0, expected 138)
  • All 100 rows returned from clickhouse2 (original distribution was clickhouse2: 47, clickhouse3: 53)
  • Query took 1m 47s ≈ 100 rows × ~1 second/row (all rows processed by clickhouse2 after clickhouse3 recovery)
  • clickhouse3 was stopped for only ~1 second (docker-compose stop + start took ~3s total)

Why this is NOT caused by the PR:

The PR changes are limited to:

  • Adding getHivePartitionColumnsWithoutVirtuals() — filters duplicate hive columns in task iterators
  • Moving hive_partition_columns_to_read_from_file_path to the base class

Neither change touches fault tolerance, retry logic, or node failure handling in the cluster query execution path. The getTaskIteratorExtension change only affects which columns are passed to the file iterator at query setup time.

@Selfeer

Selfeer commented Jun 8, 2026

Collaborator

Need to take a closer look at the swarm test failures before marking this as verified.
