Antalya 26.3: Resolve problems with paths and compatibility problems with Spark in Azure (v2)#1812

Open
zvonand wants to merge 5 commits into antalya-26.3 from feature/antalya-26.3/ClickHouse-ClickHouse-pr-90740

@zvonand zvonand commented May 19, 2026

Dropped from this backport: the AI dropped these surfaces rather than pulling in a missing prerequisite. Reviewers: confirm each is genuinely optional.

  • src/Storages/ObjectStorage/DataLakes/Iceberg/ExpireSnapshotsExecute.cpp: depends on upstream commit 933f564a71e (Extract per-command EXECUTE handlers), which is not on antalya-26.3
  • src/Storages/ObjectStorage/DataLakes/Iceberg/ExpireSnapshotsExecute.h: depends on upstream commit 933f564a71e (Extract per-command EXECUTE handlers), which is not on antalya-26.3
  • src/Storages/ObjectStorage/DataLakes/Iceberg/RemoveOrphanFilesExecute.cpp: depends on upstream commit 933f564a71e, which is not on antalya-26.3
  • src/Storages/ObjectStorage/DataLakes/Iceberg/RemoveOrphanFilesExecute.h: depends on upstream commit 933f564a71e, which is not on antalya-26.3
  • src/Storages/ObjectStorage/DataLakes/Iceberg/SnapshotFilesTraversal.cpp: depends on upstream commit 6a0ed7ff912 (Reuse snapshot traversal), which is not on antalya-26.3
  • src/Storages/ObjectStorage/DataLakes/Iceberg/SnapshotFilesTraversal.h: depends on upstream commit 6a0ed7ff912, which is not on antalya-26.3

Auto-ported prerequisites: RelEasy detected that the requested port depended on PR(s) not yet on the target branch and auto-ported them first (1 PR(s) added). Reviewers: please confirm the prereq scope is appropriate.

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

This PR addresses several Iceberg issues. It fixes inconsistent path handling caused by mixing storage paths and metadata paths; enforces that Iceberg tables record a table location that is either a URL or an absolute path; adds a fallback for counting file sizes in Azure, because some ClickHouse readers don't support byte counting after traversal; handles version-hint.txt in a Spark-compatible manner; introduces type-level abstractions that make it harder to mix up path types in the future; adds tests for Azure and Local that verify cross-engine interoperability without intermediate uploading/downloading; and fixes the handling of position deletes, which previously relied on path-inference heuristics where that approach is inappropriate (ClickHouse#100420 by @divanik, ClickHouse#90740 by @zvonand).
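
To illustrate the idea of keeping path kinds apart at the type level, here is a minimal Python sketch. It is not the ClickHouse implementation (which is C++): the names `StoragePath`, `MetadataPath`, `is_url_or_absolute_path`, and `to_storage_path` are hypothetical, and the prefix-stripping rule is an assumption made only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePath:
    """Hypothetical type: a key relative to the object storage root."""
    value: str


@dataclass(frozen=True)
class MetadataPath:
    """Hypothetical type: a path exactly as written in Iceberg metadata."""
    value: str


def is_url_or_absolute_path(location: str) -> bool:
    """Return True if location looks like 'scheme://...' or starts with '/'."""
    scheme, sep, rest = location.partition("://")
    if sep:
        return scheme.isalnum() and bool(rest)
    return location.startswith("/")


def to_storage_path(path: MetadataPath, table_location: str) -> StoragePath:
    """Convert a metadata path to a storage path by stripping the table location.

    Assumption for this sketch: paths outside the table location are rejected.
    """
    if not is_url_or_absolute_path(table_location):
        raise ValueError(f"table location must be a URL or absolute path: {table_location!r}")
    prefix = table_location.rstrip("/") + "/"
    if not path.value.startswith(prefix):
        raise ValueError(f"{path.value!r} is not under {table_location!r}")
    return StoragePath(path.value[len(prefix):])
```

Because the two wrappers are distinct types, a function that expects a `StoragePath` cannot silently receive a `MetadataPath` under a type checker, and the two never compare equal even when they wrap the same string.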

Combined port of 2 PR(s) (group ClickHouse-ClickHouse-pr-90740). Cherry-picked from ClickHouse#100420, ClickHouse#90740.

divanik and others added 4 commits May 19, 2026 22:31
…solution in next commit)

---
Original cherry-pick message follows:

Merge pull request ClickHouse#100420 from ClickHouse/divanik/rerevert_spark_azure_fixes

Resolve problems with paths and compatibility problems with Spark in Azure (v2)

# Conflicts:
#	src/Interpreters/IcebergMetadataLog.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergWrites.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/ManifestFileIterator.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/MultipleFileWriter.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/MultipleFileWriter.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/PersistentTableComponents.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.h
…olution in next commit)

---
Original cherry-pick message follows:

Merge 136b6b2 into a1bf94d

# Conflicts:
#	src/IO/S3/URI.cpp
#	src/IO/S3/URI.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/ExpireSnapshotsExecute.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/ExpireSnapshotsExecute.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergIterator.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergIterator.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/RemoveOrphanFilesExecute.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/RemoveOrphanFilesExecute.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/SnapshotFilesTraversal.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/SnapshotFilesTraversal.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.h
#	src/Storages/ObjectStorage/StorageObjectStorageSource.cpp
#	src/Storages/ObjectStorage/StorageObjectStorageStableTaskDistributor.cpp
Adapted PR 90740 to antalya-26.3:

- src/IO/S3/URI.{cpp,h}: dropped the `S3UriStyle uri_style` parameter from the
  constructor (S3UriStyle does not exist on antalya-26.3) and kept only the new
  `enable_url_encoding` parameter the PR introduces.
- src/Storages/ObjectStorage/Utils.cpp: removed `S3UriStyle::AUTO` arguments
  from URI ctor calls to match the simplified signature.
- IcebergIterator.{cpp,h}: kept antalya-26.3 `table_schema_id` member/init and
  the `setFileMetaInfo` call alongside the PR's `secondary_storages` member,
  new constructor parameter, and `requires_external_storage` check loops.
- IcebergMetadata.{cpp,h}: added `secondary_storages` member, threaded it
  through `Iceberg::getManifestFile` in the prefetcher; kept antalya-26.3's
  `expire_snapshots` dispatcher (`Iceberg::expireSnapshots` +
  `expireSnapshotsResultToPipe`) unchanged.
- StorageObjectStorageSource.cpp: kept `IcebergDataObjectInfo.h` include along
  with PR's `Utils.h` include; kept antalya-26.3 `getCompressionMethod` getter
  and swarm-mode guard; renamed local `task` to `raw` to match PR's variable
  name used in the iceberg-aware block below. Did not import `.storage_id` for
  virtual columns (not part of PR 90740's diff).
- StorageObjectStorageStableTaskDistributor.cpp: kept antalya-26.3 helper
  `getFileIdentifier` and the rich `getAnyUnprocessedFile` / iceberg
  optimization paths; pulled PR's `getMetadataPathFromObjectInfo`
  fallback into `getFileIdentifier` so all callers benefit.
- Mutations.cpp: added local empty `SecondaryStorages` to
  `collectRetainedFiles` / `collectExpiredFiles` so the new mandatory
  parameter on `getManifestList` / `getManifestFileEntriesHandle` compiles
  without threading external storages into `Iceberg::expireSnapshots`.

Dropped: ExpireSnapshotsExecute.{cpp,h}, RemoveOrphanFilesExecute.{cpp,h},
SnapshotFilesTraversal.{cpp,h}. These are the extracted EXECUTE handlers
introduced by upstream commits 933f564 and 6a0ed7f, which are not on
antalya-26.3. PR 90740 only modifies these files to thread
`secondary_storages`; the underlying refactor is the dependency, not
PR 90740 itself.

Dropped: arrow `executeExpireSnapshots` / `executeRemoveOrphanFiles` dispatch
in `IcebergMetadata::executeCommand` — depends on dropped files above; the
antalya-26.3 `Iceberg::expireSnapshots` path is kept instead.

Dropped: src/Storages/ObjectStorage/DataLakes/Iceberg/ExpireSnapshotsExecute.cpp (depends on upstream commit 933f564a71e, Extract per-command EXECUTE handlers, which is not on antalya-26.3)
Dropped: src/Storages/ObjectStorage/DataLakes/Iceberg/ExpireSnapshotsExecute.h (depends on upstream commit 933f564a71e, Extract per-command EXECUTE handlers, which is not on antalya-26.3)
Dropped: src/Storages/ObjectStorage/DataLakes/Iceberg/RemoveOrphanFilesExecute.cpp (depends on upstream commit 933f564a71e, which is not on antalya-26.3)
Dropped: src/Storages/ObjectStorage/DataLakes/Iceberg/RemoveOrphanFilesExecute.h (depends on upstream commit 933f564a71e, which is not on antalya-26.3)
Dropped: src/Storages/ObjectStorage/DataLakes/Iceberg/SnapshotFilesTraversal.cpp (depends on upstream commit 6a0ed7ff912, Reuse snapshot traversal, which is not on antalya-26.3)
Dropped: src/Storages/ObjectStorage/DataLakes/Iceberg/SnapshotFilesTraversal.h (depends on upstream commit 6a0ed7ff912, which is not on antalya-26.3)
Adapted: src/IO/S3/URI.{cpp,h} — dropped S3UriStyle parameter (type missing on antalya-26.3)
Adapted: src/Storages/ObjectStorage/Utils.cpp — removed S3UriStyle::AUTO from URI ctor calls
Adapted: src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp — added local empty SecondaryStorages to compile against new mandatory parameter
Adapted: src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp — kept Iceberg::expireSnapshots path; switched manifest prefetch to pass *secondary_storages
Adapted: src/Storages/ObjectStorage/StorageObjectStorageStableTaskDistributor.cpp — folded PR's getMetadataPathFromObjectInfo into existing getFileIdentifier helper

@zvonand added the releasy, antalya-26.3, ai-resolved, and auto-prereq-added labels May 19, 2026

github-actions Bot commented May 19, 2026

Workflow [PR], commit [90519de]

zvonand added a commit to Altinity/RelEasy that referenced this pull request May 19, 2026
`commit_cherry_pick_conflict_as_is` and `commit_conflict_markers`
were doing `git add --all` before committing the with-conflict-
markers checkpoint. That sweeps everything in the working tree that
isn't gitignored — and real C++ repos accumulate plenty outside
.gitignore: ClickHouse leaves server runtime data under
`tmp/server_data*/store/<uuid>/<part>/...cmrk2`, build pipelines
spit out generated headers, autosaves, etc.

Bug seen 2026-05-19: Altinity/ClickHouse#1812 ended up with
**696 429 additions across 19 683 files** because tmp/server_data*
was tracked-modified at the time of cherry-pick and got swept in.

New helper `_stage_unmerged_paths` uses `git diff --name-only
--diff-filter=U` to stage exactly the conflict-marked files. The
clean parts of the cherry-pick are already staged by git
automatically — only the unmerged paths (whose textual content is
the markers themselves) need explicit staging.
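
The approach described above can be sketched roughly as follows. This is an illustration, not the RelEasy code: the function names `parse_nul_separated` and `stage_unmerged_paths` and the `repo_dir` parameter are hypothetical, and the `-z` flag (NUL-separated output, so that unusual file names are not quoted) is an addition not mentioned in the commit message.

```python
import subprocess


def parse_nul_separated(output: str) -> list[str]:
    """Split NUL-separated `git diff -z --name-only` output into paths."""
    return [path for path in output.split("\0") if path]


def stage_unmerged_paths(repo_dir: str) -> list[str]:
    """Stage only the files git reports as unmerged, instead of `git add --all`."""
    result = subprocess.run(
        ["git", "diff", "--name-only", "-z", "--diff-filter=U"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    paths = parse_nul_separated(result.stdout)
    if paths:
        # "--" keeps git from treating a path as an option or revision.
        subprocess.run(["git", "add", "--", *paths], cwd=repo_dir, check=True)
    return paths
```

Untracked or modified files elsewhere in the working tree are never passed to `git add`, so they cannot be swept into the checkpoint commit.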

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@zvonand force-pushed the feature/antalya-26.3/ClickHouse-ClickHouse-pr-90740 branch from 5636ee3 to 90519de May 19, 2026 22:45
 cherry-pick

The cherry-pick of PR ClickHouse#90740 (90519de) was applied against an upstream
merge commit that pulled in `SettingsChangesHistory.cpp` entries from many
unrelated PRs. Those PRs were not cherry-picked to antalya-26.3, so the
referenced settings are not declared in `Settings.cpp` / `FormatFactorySettings.h`.

This broke any `SET compatibility = 'X.Y'` whose target version is older than
26.3 — `SettingsImpl::applyCompatibilitySetting` iterates the history and
throws `UNKNOWN_SETTING` (e.g. `optimize_dictget_tuple_element`) before
reaching real settings.

Failing fast tests:
- `02324_compatibility_setting`
- `02325_compatibility_setting_2`
- `02970_visible_width_behavior`
- `03006_mv_deduplication_throw_if_async_insert`
- `03011_adaptative_timeout_compatibility`
- `03243_compatibility_setting_with_alias`
- `03274_join_algorithm_default`
- `03773_nullable_sparse_join`

Kept only the entries whose settings exist on antalya-26.3:
- `object_storage_cluster_join_mode` (pre-existing)
- `output_format_parquet_use_custom_encoder`, `output_format_parquet_version`,
  `output_format_parquet_compliant_nested_types`,
  `input_format_parquet_use_native_reader_v3` (in `FormatFactorySettings.h`)
- `s3_propagate_credentials_to_other_storages` (the one PR ClickHouse#90740 introduces)
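
The failure mode and the fix can be modelled with a small Python sketch. It is a simplified model, not ClickHouse's `SettingsImpl::applyCompatibilitySetting`: the data layout, the function names, and the previous values (`"0"`) are assumptions chosen for illustration; only the setting names come from the text above.

```python
class UnknownSetting(Exception):
    """Stands in for the UNKNOWN_SETTING error in this model."""


def version_key(version: str) -> tuple[int, ...]:
    """Turn '26.3' into (26, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def apply_compatibility(history, declared, target):
    """Revert every change made after `target`, newest version first.

    history:  list of (version, [(setting_name, previous_value), ...])
    declared: set of setting names the server actually knows
    """
    reverted = {}
    ordered = sorted(history, key=lambda item: version_key(item[0]), reverse=True)
    for version, changes in ordered:
        if version_key(version) <= version_key(target):
            break
        for name, previous in changes:
            if name not in declared:
                raise UnknownSetting(name)
            reverted[name] = previous
    return reverted


def prune_history(history, declared):
    """Keep only history entries whose settings are declared."""
    return [
        (version, [(name, prev) for name, prev in changes if name in declared])
        for version, changes in history
    ]


history = [
    ("26.3", [
        ("optimize_dictget_tuple_element", "0"),  # not declared on the branch
        ("s3_propagate_credentials_to_other_storages", "0"),
    ]),
]
declared = {"s3_propagate_credentials_to_other_storages"}
```

In this model, `apply_compatibility(history, declared, "25.8")` raises on the undeclared entry, while the same call on `prune_history(history, declared)` succeeds. A target of `"26.3"` never reaches the entry, which matches the observation that only targets older than 26.3 were affected.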

CI report:
https://altinity-build-artifacts.s3.amazonaws.com/json.html?PR=1812&sha=90519de4fcfccf62720a84e54fe851a094e696c0&name_0=PR&name_1=Fast%20test

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Labels

  • ai-resolved: Port conflict auto-resolved by Claude
  • antalya-26.3
  • auto-prereq-added: Combined PR includes auto-added prerequisite PR(s)
  • releasy: Created/managed by RelEasy

3 participants