Antalya 26.3: Query condition cache for iceberg tables #1804

Merged
zvonand merged 2 commits into antalya-26.3 from feature/antalya-26.3/ClickHouse-ClickHouse-pr-102115 on May 18, 2026

Conversation

@zvonand commented on May 15, 2026

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Query condition cache for iceberg tables (ClickHouse#102115 by @scanhex12).

Cherry-picked from ClickHouse#102115.


Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

scanhex12 and others added 2 commits May 15, 2026 21:35
…solution in next commit)

---
Original cherry-pick message follows:

Merge pull request ClickHouse#102115 from scanhex12/iceberg_qcc

Query condition cache for iceberg tables

# Conflicts:
#	src/Databases/DataLake/DatabaseDataLake.cpp
#	src/Processors/Formats/Impl/ParquetV3BlockInputFormat.cpp
#	src/Processors/Formats/Impl/ParquetV3BlockInputFormat.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFileIterator.cpp
#	src/Storages/ObjectStorage/StorageObjectStorageSource.cpp
@zvonand added the releasy (Created/managed by RelEasy) and ai-resolved (Port conflict auto-resolved by Claude) labels on May 15, 2026
@github-actions

Workflow [PR], commit [9966fee]

@svb-alt added the backport label on May 16, 2026

zvonand commented May 17, 2026

RelEasy analyze-fails — UNRESOLVED

run completed at 2026-05-17T17:44:01Z

  • Head SHA: 9966fee1df (feature/antalya-26.3/ClickHouse-ClickHouse-pr-102115)
  • Tests considered: 8 across 4 CI shard(s)
  • Outcomes: 0 done · 0 partial · 3 unrelated · 1 unresolved
  • Commits added by AI: 0 (—)
  • Anthropic cost: $1.9734

Per-shard outcomes

⏭️ UNRELATED — Stateless tests (arm_asan, azure, sequential, 1/2)

2 failed test(s) considered — cost $0.6093
full report

AI narration

Both failures are clearly unrelated to this PR. Let me confirm by examining what each test does and what this PR actually changes.
Both failing tests are clearly unrelated to this PR. The PR only changes iceberg query condition cache code (DataLake catalogs, Parquet format filters, object storage iteration). The failing tests are:

  1. 03760_backup_tar_archive — MergeTree BACKUP/RESTORE to tar archive; failed with TOO_SLOW exception on a 500-row INSERT FROM numbers(500). No code path here touches iceberg, parquet, object storage, or QCC. Flaky-elsewhere on 4 other tracked PRs — strong evidence of master/infra flake on this azure-sequential shard.

  2. Scraping system tables: clickhouse-local Aborted while dumping query_log and transactions_info_log. Wholly unrelated to QCC for iceberg. Flaky-elsewhere on 1 other PR.

No code changes are needed; this is a clean UNRELATED outcome.

Final summary

  • 03760_backup_tar_archive [unrelated]: INSERT FROM numbers(500) aborted with TOO_SLOW (612s estimated for 500 rows) on the arm_asan azure sequential shard. This PR only changes iceberg/parquet/object-storage code (cherry-pick of upstream ClickHouse/ClickHouse#102115, query condition cache for iceberg tables); the test exercises MergeTree BACKUP/RESTORE to a tar archive and never touches any code in the diff. Marked flaky on 4 other tracked PRs, which is overwhelming evidence of an azure-shard / CI-environment flake.

  • Scraping system tables [unrelated]: clickhouse-local Aborted when dumping query_log / transactions_info_log. No connection to QCC for iceberg, parquet readers, or IObjectIterator changes. Marked flaky on 1 other PR.

No commits made; no code changed.

UNRELATED

⏭️ UNRELATED — Stateless tests (arm_asan, targeted)

1 failed test(s) considered — cost $0.6944
full report

AI narration

Triage summary

Test: 01161_all_system_tables

What happened: Test timed out after 300s on SELECT * FROM system.parts_columns LIMIT 10e3, total runtime 446.2s in a debug+ASan arm shard.

This PR's diff: A cherry-pick of upstream PR ClickHouse#102115 — adds query condition cache for Iceberg tables. The changes are confined to:

  • src/Databases/DataLake/*
  • src/Formats/FormatFilterInfo.{cpp,h} and src/Processors/Formats/IInputFormat.{cpp,h}
  • src/Processors/Formats/Impl/ParquetV3BlockInputFormat.{cpp,h}
  • src/Processors/QueryPlan/ReadFromObjectStorageStep.{cpp,h}
  • src/Storages/ObjectStorage/* and Iceberg manifest iterator
  • new integration tests for the cache

None of these touch system.parts_columns, MergeTreeData part enumeration, or anything in the hot path of the failing query. system.parts_columns is a virtual MergeTree-parts table and the changed code only kicks in for Iceberg/Parquet object-storage reads.

01161_all_system_tables is a well-known long-running test (tagged long, no-parallel, no-object-storage, no-distributed-cache); under debug+ASan+arm it is right at the edge of the 300s per-query timeout, and a single slow system.parts_columns block-read (across the parts left behind by prior tests in the same DB) is enough to push it over. Cannot write a one-sentence "this PR broke this test because " justification, so per the scoping rule this is reported and not patched.

No code changes made. No commits.


Final report

  • 01161_all_system_tables [unrelated]: timeout on SELECT * FROM system.parts_columns LIMIT 10e3 in debug+ASan+arm. PR diff is scoped to the Iceberg/Parquet query condition cache (cherry-pick of ClickHouse/ClickHouse#102115) and touches no MergeTree parts code paths exercised by this query; the test is a known slow long/no-parallel system-tables sweep that lives near its 300s per-query timeout in this shard configuration.

UNRELATED

⏭️ UNRELATED — Integration tests (amd_msan, 2/6)

1 failed test(s) considered — cost $0.6697
full report

AI narration

The PR's actual diff (vs its true parent 59bfcc082dc) is entirely about Iceberg/Parquet query condition cache. It does not touch src/Common/Scheduler/ or test_scheduler_cpu_preemptive/test.py. The failing test exercises CPU lease allocation fairness — an unrelated subsystem — and is already annotated as failing on 2 other tracked PRs (strong UNRELATED signal). The failure mode (timeout in thread.join()) is a classic scheduler-test flake, not something an Iceberg/QCC patch could plausibly cause.

No code changes to make.


Summary

  • test_scheduler_cpu_preemptive/test.py::test_cpu_time_fairness[fixed_longer_prd] [unrelated]: PR #1804 (Antalya 26.3: Query condition cache for iceberg tables) is a cherry-pick of ClickHouse#102115 (Iceberg query condition cache). Its diff vs parent touches only Iceberg/Parquet/ObjectStorage code and test_query_condition_cache.py / test_storage_s3/test.py; nothing in src/Common/Scheduler/ or in this test. The CPULeaseAllocation changes on the branch came from an earlier backport (a44e12daa09), not this PR. Test is flagged flaky-elsewhere on 2 other tracked PRs, and the failure mode (900s pytest-timeout in development.stop() → thread.join()) is a generic scheduler-test hang unrelated to QCC.

UNRELATED

❌ UNRESOLVED — Integration tests (amd_tsan, 2/6)

4 failed test(s) considered
full report

AI narration

I found the root cause: ThreadSanitizer detected a lock-order-inversion (potential deadlock) in TokenAccessStorage that was introduced by this PR. This is CAUSED-BY-THIS-PR. Let me look at the code.
Found the root cause. The TSan report identifies a lock-order-inversion between two mutexes:

  • M0 = AccessChangesNotifier mutex
  • M1 = TokenAccessStorage::mutex

Thread T5 (CREATE ROLE → notifier drain): acquires M0 → M1 (AccessChangesNotifier::sendNotifications at line 89, then calls into TokenAccessStorage::processRoleChange at line 338).

Thread T6 (JWT authentication): acquires M1 → M0 (TokenAccessStorage::authenticateImpl at line 571, then calls sendNotifications at line 759 which takes M0).

The newly-added sendNotifications call at line 759 (inside authenticateImpl, while holding the storage mutex) is what creates the inversion. The fix is to drop the storage mutex before draining the notifier.

Let me look at AccessChangesNotifier to be sure of the lock interaction:
Confirmed: AccessChangesNotifier::sendNotifications holds its sending_notifications mutex (M0) while dispatching handlers; one of those handlers is TokenAccessStorage::processRoleChange which takes M1. Meanwhile authenticateImpl takes M1 and then calls sendNotifications which takes M0. Classic lock-order inversion.

The fix: release the storage mutex before draining the notifier.
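
The lock ordering described above can be illustrated with a minimal C++ sketch. It is not the ClickHouse code: `Notifier`, `Storage`, `authenticate`, `sending_mutex`, `queue_mutex` and `runDemo` are hypothetical stand-ins for `AccessChangesNotifier`, `TokenAccessStorage` and their members, and the only point is the shape of the fix, which is to leave the storage mutex scope (M1) before calling `sendNotifications` (M0), so that every path acquires the two locks in the order M0 then M1.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for AccessChangesNotifier.
class Notifier
{
public:
    using Handler = std::function<void(const std::string &)>;

    void subscribe(Handler handler)
    {
        std::lock_guard<std::mutex> lock(queue_mutex);
        handlers.push_back(std::move(handler));
    }

    void enqueue(std::string event)
    {
        // queue_mutex is a leaf lock: no other lock is taken while it is held.
        std::lock_guard<std::mutex> lock(queue_mutex);
        queue.push_back(std::move(event));
    }

    void sendNotifications()
    {
        std::lock_guard<std::mutex> sending_lock(sending_mutex); // M0
        std::vector<std::string> events;
        std::vector<Handler> handlers_copy;
        {
            std::lock_guard<std::mutex> lock(queue_mutex);
            events.swap(queue);
            handlers_copy = handlers;
        }
        for (const auto & event : events)
            for (const auto & handler : handlers_copy)
                handler(event); // handlers may take M1 while M0 is held
    }

private:
    std::mutex sending_mutex;
    std::mutex queue_mutex;
    std::vector<std::string> queue;
    std::vector<Handler> handlers;
};

// Hypothetical stand-in for TokenAccessStorage.
class Storage
{
public:
    explicit Storage(Notifier & notifier_) : notifier(notifier_)
    {
        notifier.subscribe([this](const std::string & role) { processRoleChange(role); });
    }

    void processRoleChange(const std::string & role)
    {
        std::lock_guard<std::mutex> lock(mutex); // M1, taken under M0
        seen_roles.push_back(role);
    }

    void authenticate(const std::string & user)
    {
        {
            std::lock_guard<std::mutex> lock(mutex); // M1
            notifier.enqueue("role_for_" + user);
        } // M1 is released here, before the notifier is drained
        notifier.sendNotifications(); // takes M0, then handlers take M1
    }

    std::size_t seenCount()
    {
        std::lock_guard<std::mutex> lock(mutex);
        return seen_roles.size();
    }

private:
    Notifier & notifier;
    std::mutex mutex;
    std::vector<std::string> seen_roles;
};

// Exercises both paths: authentication and an external notifier drain.
std::size_t runDemo()
{
    Notifier notifier;
    Storage storage(notifier);
    storage.authenticate("alice");
    notifier.enqueue("new_role");
    notifier.sendNotifications();
    return storage.seenCount();
}
```

Calling `runDemo()` delivers both queued events to `processRoleChange`. If `sendNotifications` were instead called inside the inner scope of `authenticate`, the same thread would try to take M1 again in the handler, and a concurrent drain from another thread would produce the M1 then M0 ordering that TSan reported.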
Now let me run the build:
Build succeeded. Now run the 4 failing tests as a batch.
[runner] error streaming claude output: closing tag '[/home/zvonand/releases/ant263/build/programs/clickhouse]' at position 13285 doesn't match any open tag


🤖 Posted automatically by releasy analyze-fails. Re-run the command to refresh.

zvonand commented May 18, 2026

The failing token-auth test is unrelated to this PR and will be fixed in a separate PR (#1799).

@zvonand merged commit 5779b86 into antalya-26.3 on May 18, 2026
539 of 580 checks passed
@zvonand added the verified (Approved for release) label on May 18, 2026
@zvonand added the port-antalya (PRs to be ported to all new Antalya releases) label on May 19, 2026

Labels

ai-resolved (Port conflict auto-resolved by Claude), antalya, antalya-26.3, backport (Backport), port-antalya (PRs to be ported to all new Antalya releases), releasy (Created/managed by RelEasy), verified (Approved for release)

3 participants