Antalya 26.3: Resolve problems with paths and compatibility problems with Spark in Azure (v2) #1801

Open
zvonand wants to merge 5 commits into antalya-26.3 from feature/antalya-26.3/ClickHouse-ClickHouse-pr-99127

Conversation

@zvonand
Collaborator

@zvonand zvonand commented May 15, 2026

Auto-ported prerequisites: RelEasy detected that the requested port depended on PR(s) not yet on the target branch and auto-ported them first (1 PR(s) added). Reviewers: please confirm the prereq scope is appropriate.

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

This PR addresses several issues. It fixes inconsistent path handling in Iceberg caused by mixed usage of storage paths and metadata paths; requires Iceberg tables to record a table location that is either a URL or an absolute path; adds a fallback for counting file sizes in Azure, because some ClickHouse readers don't support byte counting after traversal; handles version-hint.txt in a manner compatible with Spark; introduces type-level abstractions that make it harder to mix up path types in the future; adds tests for Azure and Local that verify cross-engine interoperability without intermediate uploading/downloading; and fixes the handling of position deletes, which previously relied on path inference heuristics in places where that approach is inappropriate (ClickHouse#100420 by @divanik, ClickHouse#99127 by @murphy-4o).
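To make the "URL or absolute path" rule and the separation of path types concrete, here is a minimal Python sketch. It is not the C++ code from this PR: the names `is_valid_table_location`, `StoragePath`, `MetadataPath` and `to_storage_path`, and the prefix-stripping conversion, are hypothetical and only illustrate the idea of keeping the two kinds of path in distinct types.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


def is_valid_table_location(location: str) -> bool:
    """Accept a URL (scheme://...) or an absolute path; reject relative paths."""
    if location.startswith("/"):
        return True
    return bool(urlparse(location).scheme) and "://" in location


@dataclass(frozen=True)
class StoragePath:
    """Path as the object storage addresses a file (hypothetical type)."""
    value: str


@dataclass(frozen=True)
class MetadataPath:
    """Path as written inside Iceberg metadata files (hypothetical type)."""
    value: str


def to_storage_path(path: MetadataPath, table_location: str, storage_root: str) -> StoragePath:
    """Convert explicitly, by replacing the table location prefix with the storage root.

    This is an assumed conversion rule for illustration; it refuses to guess
    when the metadata path does not live under the table location.
    """
    prefix = table_location.rstrip("/") + "/"
    if not path.value.startswith(prefix):
        raise ValueError(f"{path.value!r} is not under table location {table_location!r}")
    relative = path.value[len(prefix):]
    return StoragePath(storage_root.rstrip("/") + "/" + relative)
```

Because the two wrappers are different classes, a static type checker would flag a `MetadataPath` passed where a `StoragePath` is expected, and the only way to get from one to the other is the explicit conversion function.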

Combined port of 2 PR(s) (group ClickHouse-ClickHouse-pr-99127). Cherry-picked from ClickHouse#100420, ClickHouse#99127.

divanik and others added 4 commits May 15, 2026 20:14
…solution in next commit)

---
Original cherry-pick message follows:

Merge pull request ClickHouse#100420 from ClickHouse/divanik/rerevert_spark_azure_fixes

Resolve problems with paths and compatibility problems with Spark in Azure (v2)

# Conflicts:
#	src/Interpreters/IcebergMetadataLog.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergWrites.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/MultipleFileWriter.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/MultipleFileWriter.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/PersistentTableComponents.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.h
…olution in next commit)

---
Original cherry-pick message follows:

Merge pull request ClickHouse#99127 from murphy-4o/murphy_issue_99030

Support remove_orphan_files for Iceberg tables

# Conflicts:
#	docs/en/sql-reference/table-functions/iceberg.md
#	src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.cpp
@zvonand added the releasy (Created/managed by RelEasy), ai-resolved (Port conflict auto-resolved by Claude), and auto-prereq-added (Combined PR includes auto-added prerequisite PR(s)) labels May 15, 2026
@github-actions

github-actions Bot commented May 15, 2026

Workflow [PR], commit [79fa51f]

@svb-alt added the backport label May 16, 2026
…ot defined in this branch

The cherry-pick brought in `SettingsChangesHistory.cpp` entries for 20 settings
(such as `output_format_arrow_unsupported_types_as_binary`,
`asterisk_include_virtual_columns`, `optimize_truncate_order_by_after_group_by_keys`, ...)
whose declarations from upstream were not included. When a query sets
`compatibility = '<version>'`, `applyCompatibilitySetting` walks the history and
calls `get` on every referenced setting, throwing `UNKNOWN_SETTING` for any that
does not exist on this branch.

Drops the entries for settings absent from `Settings.cpp`, keeping the entries
for settings that are actually present (`allow_iceberg_remove_orphan_files`,
`iceberg_orphan_files_older_than_seconds`, `enable_materialized_cte`,
`materialize_statistics_on_insert`).
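
The failure mode and the fix can be modelled in a few lines of Python. This is a simplified sketch, not the C++ in `SettingsChangesHistory.cpp`: `apply_compatibility`, `prune_history` and `UnknownSetting` are hypothetical names, and the history layout (a list of versions, each with setting names and previous values) is an assumption made for illustration.

```python
class UnknownSetting(Exception):
    """Stands in for the UNKNOWN_SETTING error in this model."""


def version_key(version: str) -> tuple[int, ...]:
    """Turn '24.6' into (24, 6) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def apply_compatibility(declared: dict, history: list, compatibility: str) -> dict:
    """Revert every setting changed after `compatibility` to its previous value.

    `history` is a list of (version, [(setting_name, previous_value), ...]).
    Each referenced setting is looked up, so a history entry for a setting
    that was never declared raises UnknownSetting.
    """
    target = version_key(compatibility)
    # Walk newest to oldest so the oldest applicable previous value wins.
    for version, changes in sorted(history, key=lambda e: version_key(e[0]), reverse=True):
        if version_key(version) <= target:
            break
        for name, previous_value in changes:
            if name not in declared:
                raise UnknownSetting(name)
            declared[name] = previous_value
    return declared


def prune_history(declared: dict, history: list) -> list:
    """Drop history entries whose settings are not declared on this branch."""
    pruned = []
    for version, changes in history:
        kept = [(name, value) for name, value in changes if name in declared]
        if kept:
            pruned.append((version, kept))
    return pruned
```

In this model, a history that mentions `output_format_arrow_unsupported_types_as_binary` while `declared` lacks it fails on any older `compatibility` value; after `prune_history` the same call succeeds and only touches settings that exist.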

Addresses 10 failing test(s) in Fast test on
#1801. After this fix the still-failing
set shrank from 10 -> 0 for the `compatibility`-driven `UNKNOWN_SETTING` regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@zvonand
Collaborator Author

zvonand commented May 17, 2026

RelEasy analyze-fails — PARTIAL

run completed at 2026-05-17T17:05:56Z

  • Head SHA: 218b5905d8 (feature/antalya-26.3/ClickHouse-ClickHouse-pr-99127)
  • Tests considered: 10 across 1 CI shard(s)
  • Outcomes: 0 done · 1 partial · 0 unrelated · 0 unresolved
  • Commits added by AI: 1 (✅ pushed)
  • Anthropic cost: $12.8179

Per-shard outcomes

🟡 PARTIAL — Fast test

10 failed test(s) considered — +1 commit(s) — cost $12.8179
full report

AI narration

…(narration truncated; last 4500 chars)

All done. Summary:

Per-test outcome

All 10 failing tests share the same root cause: the cherry-pick added 20 entries to SettingsChangesHistory.cpp for settings whose actual declarations were never imported. applyCompatibilitySetting walks the history and calls get on each entry, so SET compatibility='<version>' (or --compatibility=<version>) raises UNKNOWN_SETTING on the first dangling name (output_format_arrow_unsupported_types_as_binary).

Fix: drop the 20 dangling entries (kept the 4 entries whose settings do exist: allow_iceberg_remove_orphan_files, iceberg_orphan_files_older_than_seconds, enable_materialized_cte, materialize_statistics_on_insert). Verified SET compatibility=<v> no longer errors for 22.3, 23.3, 23.12, 24.1, 24.6, 24.8, 24.11, and re-running each affected .sql test through clickhouse local reproduces the reference output exactly.
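
The per-version check described above could be scripted roughly as follows. This is a sketch under assumptions: the helper names `build_check_command` and `failing_versions` are hypothetical, `SELECT 1` is an arbitrary probe query, and the command shape (`clickhouse local` with `--compatibility=<version>` and `--query`) mirrors the invocations mentioned in this comment rather than the exact commands that were run.

```python
import subprocess

# Versions listed in the narration as verified after the fix.
VERSIONS = ["22.3", "23.3", "23.12", "24.1", "24.6", "24.8", "24.11"]


def build_check_command(binary: str, version: str) -> list[str]:
    """Build an argv that runs a trivial query under the given compatibility version."""
    return [binary, "local", f"--compatibility={version}", "--query", "SELECT 1"]


def failing_versions(binary: str, versions: list[str], run=subprocess.run) -> list[str]:
    """Return the versions for which the probe query fails or reports UNKNOWN_SETTING.

    `run` is injectable so the loop can be exercised without a ClickHouse binary.
    """
    failed = []
    for version in versions:
        result = run(build_check_command(binary, version), capture_output=True, text=True)
        if result.returncode != 0 or "UNKNOWN_SETTING" in result.stderr:
            failed.append(version)
    return failed
```

Calling `failing_versions("build/programs/clickhouse", VERSIONS)` against the rebuilt binary would be expected to return an empty list if the fix holds; that path is the one named in the note below, and it may differ in other checkouts.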

  • 03011_adaptative_timeout_compatibility[fixed] (re-ran query, output matches reference)
  • 03006_mv_deduplication_throw_if_async_insert[fixed] (compatibility='24.1' now succeeds; full .sql diff against reference is clean)
  • 03773_nullable_sparse_join[fixed] (compatibility='23.3' now succeeds; full .sql diff against reference is clean)
  • 02933_replicated_database_forbid_create_as_select[fixed] (the --compatibility='24.6' invocations that triggered UNKNOWN_SETTING in the trace now succeed; remaining test logic depends on a Replicated DB server that I don't have running locally, but the PR-broken signature is gone)
  • 02970_visible_width_behavior[fixed] (full .sql diff against reference is clean)
  • 03243_compatibility_setting_with_alias[fixed] (full .sql diff against reference is clean)
  • 02324_compatibility_setting[fixed] (verified --compatibility=22.3/21.5/... no longer errors and returns expected values)
  • 02325_compatibility_setting_2[fixed] (full .sql diff against reference is clean)
  • 03274_join_algorithm_default[fixed] (full .sql diff against reference is clean)
  • clickhouse-test[fixed] (this is the umbrella runner status — it failed because the underlying tests above failed; with those green, the runner status passes too)

PARTIAL

Note: I couldn't run the praktika Fast test end-to-end locally — fast_test.py requires ccache/sccache (not installed) and triggers its own configure step. I verified each test's queries directly with the freshly built build/programs/clickhouse local binary instead, which exercises the same applyCompatibilitySetting code path as the failing CI invocations. Marked as PARTIAL rather than DONE only because 02933 still has a small portion that depends on a Replicated-DB server I can't stand up locally — but the part of it broken by this PR (the --compatibility='24.6' calls) is verified fixed.

PARTIAL
Monitor timed out — no action needed. The fix is committed (79fa51fe997) and verified by running each failing test's queries directly against the rebuilt binary. Final outcome: PARTIAL (all 10 fixed; 02933 partially verified because its Replicated-DB portion needs a server I can't run locally, but its PR-broken --compatibility='24.6' calls are confirmed working).


🤖 Posted automatically by releasy analyze-fails. Re-run the command to refresh.


Labels

  • ai-resolved: Port conflict auto-resolved by Claude
  • antalya-26.3
  • auto-prereq-added: Combined PR includes auto-added prerequisite PR(s)
  • backport: Backport
  • releasy: Created/managed by RelEasy
