Skip to content

Antalya 26.1: Fix view over iceberg#1671

Open
ianton-ru wants to merge 3 commits intoantalya-26.1from
bugfix/antalya-26.1/view_over_iceberg
Open

Antalya 26.1: Fix view over iceberg#1671
ianton-ru wants to merge 3 commits intoantalya-26.1from
bugfix/antalya-26.1/view_over_iceberg

Conversation

@ianton-ru
Copy link
Copy Markdown

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix incorrect work view over Iceberg table

Documentation entry for user-facing changes

Solved #1669 - do not make null values for complex column names like materialized(time).
Solved #1629 - use ParquetReadRowGroups in test instead of S3GetObject/AzureGetObject.

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 21, 2026

Workflow [PR], commit [2ceb1ab]

@ianton-ru ianton-ru added antalya bugfix port-antalya PRs to be ported to all new Antalya releases antalya-26.1 labels Apr 22, 2026
@ianton-ru ianton-ru changed the title Fix view over iceberg Antalya 26.1: Fix view over iceberg Apr 22, 2026
@ianton-ru
Copy link
Copy Markdown
Author

@codex review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 024385ff6f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


/// With View over Iceberg table we have someting like 'materialize(time)' as column_name
ParserExpression parser;
ASTPtr expr = parseQuery(parser, column_name, 0, DBMS_DEFAULT_MAX_PARSER_DEPTH, DBMS_DEFAULT_MAX_PARSER_BACKTRACKS);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Guard expression parsing for non-SQL-safe column names

parseQuery(parser, column_name, ...) throws on syntax errors, but column_name here comes from getNameInStorage() (raw storage names), not guaranteed to be a standalone SQL expression. When schema evolution leaves a nullable column absent in some files and that column name requires quoting (for example names with spaces/symbols), this new path raises SYNTAX_ERROR and aborts the read instead of applying the NULL-constant fallback.

Useful? React with 👍 / 👎.

Comment on lines +778 to +779
if (expr->as<ASTFunction>())
continue;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid treating arbitrary parsed expressions as generated columns

Using expr->as<ASTFunction>() as the skip condition is too broad: quoted physical column names like a-b are parsed as expression ASTs, so this branch skips NULL-constant synthesis for genuinely missing nullable columns. In evolved Iceberg schemas, that leaves such columns in requested_columns_copy, and later readers may try to fetch non-existent parquet fields instead of returning NULLs.

Useful? React with 👍 / 👎.

@ianton-ru ianton-ru force-pushed the bugfix/antalya-26.1/view_over_iceberg branch from 6a7740c to 12337b9 Compare April 23, 2026 10:33
@ianton-ru
Copy link
Copy Markdown
Author

@codex review

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. 👍

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@ianton-ru
Copy link
Copy Markdown
Author

PR #1671 — Antalya 26.1: Fix view over Iceberg

AI audit note: This document was generated by AI (gpt-5.3-codex).

Scope: Altinity fork, branch vs antalya-26.1StorageObjectStorageSource.cpp, tests/integration/test_storage_iceberg_with_spark/test_read_constant_columns_optimization.py.


Confirmed defects

No confirmed defects in reviewed scope.

(Earlier revision had orphaned parser includes after dropping parseQuery; the current branch tip removes those includes, and the net diff vs antalya-26.1 adds only the materialize() heuristic in createReader.)


Coverage summary

Item Detail
Scope reviewed Iceberg experimental read optimization path that synthesizes NULL constants for nullable columns missing from per-file metadata, and skips that for names shaped like materialize(...) (view-over-Iceberg). Integration test changes: ParquetReadRowGroups assertions and new view test.
Categories failed
Categories passed No SQL parse of raw storage names (no throw on odd names); no broad “any function AST” skip; includes unchanged vs base in net diff; changed logic uses only local state (no new shared-state / locking concerns in the diff hunk).

Call graph (in-scope)

Entry: StorageObjectStorageSource::createReader
Branch: allow_experimental_iceberg_read_optimization → file metadata present → iterate columns_info (constants / all-null) → second loop over requested_columns_list keyed by getNameInStorage()
Effect: Optional constant_columns / constant_columns_with_valuesrequested_columns_copy.eraseNames → downstream pipe / count fast path.


Residual risks (not confirmed defects)

  1. Heuristic breadth: Only names matching materialize( prefix and ) suffix are skipped. If another computed storage name can appear in the same nullable-missing-metadata situation, wrong NULL synthesis remains theoretically possible; no in-repo reproduction was traced in review.
  2. Tests: Assertions use ParquetReadRowGroups and assume one row group per data file as in the fixture. check_events accepts is_cluster but the SQL does not use it (harmless, slightly misleading API).

Changelog alignment (from PR description)

  • Addresses incorrect behavior for views over Iceberg when complex names like materialize(time) must not be forced to NULL-from-missing-field.
  • Tests: prefer ParquetReadRowGroups over object-store get-object profile events where appropriate.

References

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

antalya antalya-26.1 bugfix port-antalya PRs to be ported to all new Antalya releases

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants