Skip to content

Fix silent ignore of delta_lake_snapshot_version without DeltaKernel#101489

Merged
alexey-milovidov merged 11 commits intoClickHouse:masterfrom
Desel72:fix/issue-100502
Apr 10, 2026
Merged

Fix silent ignore of delta_lake_snapshot_version without DeltaKernel#101489
alexey-milovidov merged 11 commits intoClickHouse:masterfrom
Desel72:fix/issue-100502

Conversation

@Desel72
Copy link
Copy Markdown
Contributor

@Desel72 Desel72 commented Apr 1, 2026

Closes #100502

Previously, delta_lake_snapshot_version and CDF settings (delta_lake_snapshot_start_version / delta_lake_snapshot_end_version) were silently ignored when DeltaKernel was not active (Azure, GCS, or allow_experimental_delta_kernel_rs = 0), causing queries to return the latest table state instead of the requested snapshot with no indication of error.

Now the legacy metadata reader raises UNSUPPORTED_METHOD if any of these settings are set to a non-default value, so users get a clear error instead of wrong data.

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Throw an error when delta_lake_snapshot_version or CDF version settings are used without DeltaKernel enabled, instead of silently returning wrong data.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

When using DeltaLake time travel (delta_lake_snapshot_version) or change data feed (delta_lake_snapshot_start_version / delta_lake_snapshot_end_version) with storage backends that do not support DeltaKernel (e.g. Azure, GCS), ClickHouse now raises an UNSUPPORTED_METHOD error instead of silently ignoring the setting. To use these features, use S3 or Local storage with allow_experimental_delta_kernel_rs = 1.

Desel72 added 3 commits April 1, 2026 04:12
The legacy `DeltaLakeMetadataImpl` (used for Azure and non-kernel S3
paths) threw `NOT_IMPLEMENTED` when encountering schema changes in
`_delta_log` metadata entries or checkpoint files. This made
`deltaLakeAzure` fail on tables with schema evolution (e.g., added
columns).

Instead of throwing, adopt the latest schema — matching the behavior
of the DeltaKernel path. Delta log entries are processed in version
order, so the last `metaData` entry always represents the current
table schema. Older data files with fewer columns will return NULLs
for missing columns, which is the expected behavior.

Closes ClickHouse#100438
…ernel

Previously, `delta_lake_snapshot_version` and CDF settings
(`delta_lake_snapshot_start_version` / `delta_lake_snapshot_end_version`)
were silently ignored when DeltaKernel was not active (Azure, GCS, or
`allow_experimental_delta_kernel_rs = 0`), causing queries to return the
latest table state instead of the requested snapshot with no indication
of error.

Now the legacy metadata reader raises `UNSUPPORTED_METHOD` if any of
these settings are set to a non-default value, so users get a clear error
instead of wrong data.

Closes ClickHouse#100502
@Avogar Avogar added the can be tested Allows running workflows for external contributors label Apr 1, 2026
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Apr 1, 2026

Workflow [PR], commit [6a6d55c]

Summary:


AI Review

Summary

This PR fixes a real correctness issue in the legacy Delta Lake metadata path: non-default delta_lake_snapshot_version and CDF settings are now rejected when DeltaKernel is unavailable, instead of being silently ignored and returning unintended data. The implementation is focused, the guard now correctly handles negative non-default values (!= -1), and regression coverage was added for both time-travel and CDF setting usage in the non-kernel path. I did not find additional correctness, safety, or compatibility issues in the current PR head.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
No large/binary files
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Apr 1, 2026
"Reading from files with different schema is not possible "
"({} is different from {})",
file_schema.toString(), current_schema.toString());
LOG_INFO(log, "Schema evolved: {} -> {}", file_schema.toString(), current_schema.toString());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change silently drops the previous NOT_IMPLEMENTED guard and effectively enables schema evolution by replacing file_schema with current_schema.

That is risky here because the rest of this path still assumes a single stable schema (see the comment above processMetadataFile), and this PR does not add coverage for mixed-schema logs/checkpoints. A query may now return inconsistent results instead of raising a clear exception.

Please keep the strict exception in this PR (focused on delta_lake_snapshot_* settings), or add dedicated mixed-schema tests and explicit compatibility handling in both JSON log and checkpoint paths.

@Avogar Avogar self-assigned this Apr 1, 2026
Copy link
Copy Markdown
Member

@Avogar Avogar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR also contains commit from #101446, please, clean commits up

Comment thread tests/queries/0_stateless/04075_delta_lake_snapshot_version_without_kernel.sql Outdated
Comment thread src/Storages/ObjectStorage/DataLakes/DeltaLakeMetadata.cpp Outdated
@alexey-milovidov
Copy link
Copy Markdown
Member

The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.

…eltaKernel

Change guard from `>= 0` to `!= -1` so that invalid negative values
like `-2` are also rejected instead of being silently ignored.
Add regression test for negative non-default value.
@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Apr 10, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.00% 84.00% +0.00%
Functions 90.90% 90.90% +0.00%
Branches 76.50% 76.50% +0.00%

Changed lines: 100.00% (20/20) · Uncovered code

Full report · Diff report

@alexey-milovidov alexey-milovidov added this pull request to the merge queue Apr 10, 2026
Merged via the queue into ClickHouse:master with commit f4b8323 Apr 10, 2026
163 checks passed
@clickgapai
Copy link
Copy Markdown
Contributor

Hi — this PR may need backporting to 26.3 (LTS), 26.2, 26.1, 25.8 (LTS), but no backport label was found.

Affected code: DeltaLakeMetadata::create in DeltaLakeMetadata.cpp — introduced in 25.8.

Why: delta_lake_snapshot_version was introduced in 25.8 (SettingsChangesHistory.cpp line 362), and delta_lake_snapshot_start_version / delta_lake_snapshot_end_version were introduced in 25.12 (lines 187-188). All supported branches contain at least delta_lake_snapshot_version and exhibit the silent-ignore bug. On 25.8, only the delta_lake_snapshot_version check is relevant; on 26.1+ all three settings need the validation.

If this should be backported, consider adding pr-must-backport or a version-specific label (e.g. v26.3-must-backport). Ignore this if backporting is not applicable.

@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 10, 2026
@Desel72
Copy link
Copy Markdown
Contributor Author

Desel72 commented Apr 10, 2026

Thanks @alexey-milovidov

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DeltaLake: delta_lake_snapshot_version (time travel) silently ignored for Azure — returns wrong data instead of error

5 participants