Antalya 26.3: Resolve problems with paths and compatibility problems with Spark in Azure (v2) #1801
…solution in next commit)
---
Original cherry-pick message follows:

Merge pull request ClickHouse#100420 from ClickHouse/divanik/rerevert_spark_azure_fixes

Resolve problems with paths and compatibility problems with Spark in Azure (v2)

# Conflicts:
#	src/Interpreters/IcebergMetadataLog.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergWrites.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/MultipleFileWriter.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/MultipleFileWriter.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/PersistentTableComponents.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.h
…olution in next commit)
---
Original cherry-pick message follows:

Merge pull request ClickHouse#99127 from murphy-4o/murphy_issue_99030

Support remove_orphan_files for Iceberg tables

# Conflicts:
#	docs/en/sql-reference/table-functions/iceberg.md
#	src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp
#	src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.h
#	src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.cpp
…ot defined in this branch

The cherry-pick brought in `SettingsChangesHistory.cpp` entries for 20 settings (such as `output_format_arrow_unsupported_types_as_binary`, `asterisk_include_virtual_columns`, `optimize_truncate_order_by_after_group_by_keys`, ...) whose declarations from upstream were not included. When a query sets `compatibility = '<version>'`, `applyCompatibilitySetting` walks the history and calls `get` on every referenced setting, throwing `UNKNOWN_SETTING` for any that does not exist on this branch.

Drops the entries for settings absent from `Settings.cpp`, keeping the entries for settings that are actually present (`allow_iceberg_remove_orphan_files`, `iceberg_orphan_files_older_than_seconds`, `enable_materialized_cte`, `materialize_statistics_on_insert`).

Addresses 10 failing test(s) in Fast test on #1801. After this fix the still-failing set shrank from 10 -> 0 for the `compatibility`-driven `UNKNOWN_SETTING` regression.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
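The failure mode described in this commit message can be illustrated with a minimal C++ sketch. It is a simplified model, not ClickHouse's implementation: the `Settings` and `SettingChange` types, the `applyCompatibility` and `dropUndeclared` functions, and the stored values are all hypothetical stand-ins for `Settings.cpp`, `SettingsChangesHistory.cpp`, and `applyCompatibilitySetting`.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for one history entry: a setting name plus the
// value it had before the change.
struct SettingChange
{
    std::string name;
    std::string previous_value;
};

// Hypothetical stand-in for the set of settings declared on this branch.
struct Settings
{
    std::map<std::string, std::string> values;

    // Mirrors the behaviour described above: looking up an undeclared
    // setting is an error.
    const std::string & get(const std::string & name) const
    {
        auto it = values.find(name);
        if (it == values.end())
            throw std::runtime_error("UNKNOWN_SETTING: " + name);
        return it->second;
    }
};

// Walks the history and touches every referenced setting, so a single
// entry without a declaration makes the whole call throw.
void applyCompatibility(Settings & settings, const std::vector<SettingChange> & history)
{
    for (const auto & change : history)
    {
        settings.get(change.name); // throws for undeclared settings
        settings.values[change.name] = change.previous_value;
    }
}

// The shape of the fix: keep only entries whose setting is declared.
std::vector<SettingChange> dropUndeclared(const std::vector<SettingChange> & history, const Settings & settings)
{
    std::vector<SettingChange> kept;
    for (const auto & change : history)
        if (settings.values.count(change.name) != 0)
            kept.push_back(change);
    return kept;
}
```

In this model, a history that mentions a setting missing from `Settings::values` makes `applyCompatibility` throw, while the same history passed through `dropUndeclared` applies cleanly. In the actual commit the filtering was done by editing the history file by hand rather than at run time.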
RelEasy
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
This PR addresses several Iceberg issues: it fixes inconsistent path handling caused by mixing storage paths and metadata paths; enforces that Iceberg tables record a table location that is either a URL or an absolute path; adds a fallback for computing file sizes in Azure, because some ClickHouse readers don't support byte counting after traversal; handles `version-hint.txt` in a Spark-compatible way; introduces type-level abstractions that make it harder to mix up path types in the future; adds tests for Azure and Local that verify cross-engine interoperability without intermediate uploading or downloading; and fixes position deletes, which previously relied on path-inference heuristics in cases where that approach is inappropriate (ClickHouse#100420 by @divanik, ClickHouse#99127 by @murphy-4o).
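As a rough illustration of what a type-level separation of path kinds could look like, the sketch below uses tag-based wrapper types so that a storage path and a metadata path cannot be passed in place of one another. It is an assumption-laden example, not the code from the PR: `TypedPath`, `StoragePath`, `MetadataPath`, `isUrlOrAbsolute`, and `toStoragePath` are hypothetical names, and the URL check is deliberately simplistic.

```cpp
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical strong typedef: the Tag parameter makes each alias a
// distinct type even though both wrap a std::string.
template <typename Tag>
class TypedPath
{
public:
    explicit TypedPath(std::string value_) : value(std::move(value_)) {}
    const std::string & str() const { return value; }

private:
    std::string value;
};

struct StorageTag {};
struct MetadataTag {};

using StoragePath = TypedPath<StorageTag>;   // key inside the object storage
using MetadataPath = TypedPath<MetadataTag>; // path as written in table metadata

// The two kinds of path are unrelated types, so mixing them up fails to compile.
static_assert(!std::is_same_v<StoragePath, MetadataPath>);
static_assert(!std::is_convertible_v<MetadataPath, StoragePath>);

// Simplified check for "URL or absolute path" (an assumption, not the PR's rule).
bool isUrlOrAbsolute(const std::string & location)
{
    return location.find("://") != std::string::npos
        || (!location.empty() && location.front() == '/');
}

// The only way to go from a metadata path to a storage path is an explicit
// conversion that needs the table location and the storage prefix.
StoragePath toStoragePath(const MetadataPath & path, const std::string & table_location, const std::string & storage_prefix)
{
    if (!isUrlOrAbsolute(table_location))
        throw std::invalid_argument("table location must be a URL or an absolute path");
    const std::string & full = path.str();
    if (full.compare(0, table_location.size(), table_location) != 0)
        throw std::invalid_argument("metadata path is outside the table location");
    return StoragePath(storage_prefix + full.substr(table_location.size()));
}
```

With these types, a function that reads from object storage can take a `StoragePath` parameter, and handing it a `MetadataPath` by mistake is rejected at compile time instead of relying on heuristics at run time.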
Combined port of 2 PRs (group ClickHouse-ClickHouse-pr-99127). Cherry-picked from ClickHouse#100420, ClickHouse#99127.