Skip to content

Change the interface for Iceberg inserts with the catalog#100334

Merged
scanhex12 merged 7 commits intomasterfrom
change_insert_interface
Apr 8, 2026
Merged

Change the interface for Iceberg inserts with the catalog#100334
scanhex12 merged 7 commits intomasterfrom
change_insert_interface

Conversation

@scanhex12
Copy link
Copy Markdown
Member

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Change the interface for Iceberg inserts with the catalog.
Deprecate settings: storage_catalog_type, storage_aws_access_key_id, etc.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Mar 22, 2026

Workflow [PR], commit [b479af7]

Summary:

job_name test_name status info comment
Stateless tests (arm_asan_ubsan, flaky check) failure
00746_sql_fuzzy FAIL cidb
Stress test (arm_tsan) failure
Hung check failed, possible deadlock found FAIL cidb, issue

AI Review

Summary

This PR migrates IcebergS3 catalog integration to reuse the enclosing DataLakeCatalog database catalog and removes old per-table storage_catalog_* settings from examples/tests. I found one additional docs issue and confirmed existing blocker comments from clickhouse-gh[bot] already cover the critical code/test problems (backward compatibility guard mismatch and Python syntax exceptions in integration tests). Verdict is to request changes until those blockers are fixed.

Findings
  • ❌ Blockers
    • [src/Storages/ObjectStorage/DataLakes/DataLakeConfiguration.h:340] Backward-compatibility regression on ATTACH/upgrade path and deprecated-setting validation mismatch are already correctly reported by existing clickhouse-gh[bot] inline comments; they must be fixed before merge.
    • [tests/integration/test_database_iceberg/test.py:154, tests/integration/test_database_glue/test.py:228] Both files contain invalid f-strings (f-string: expecting '}'), which breaks test module import/collection. This is already reported by existing clickhouse-gh[bot] inline comments and must be fixed.
  • ⚠️ Majors
    • [docs/en/sql-reference/table-functions/iceberg.md:83] Docs text says to create IcebergS3 tables with necessary catalog settings, but examples now omit these settings entirely. This is misleading after interface change. Suggested fix: explicitly state that catalog routing now comes from DataLakeCatalog database context and legacy storage_catalog_* table settings are deprecated.
Tests
  • ⚠️ python3 -m py_compile tests/integration/test_database_iceberg/test.py tests/integration/test_database_glue/test.py fails with SyntaxError: f-string: expecting '}' in both modified test files; fix syntax first so integration tests can run.
ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility Deprecated-setting rejection currently impacts old metadata paths (already flagged inline by clickhouse-gh[bot]).
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout ⚠️ Rollout blocked by current backward-compatibility and test-syntax issues.
Compilation time
Final Verdict
  • Status: ⚠️ Request changes
  • Minimum required actions:
    • Fix deprecated-setting validation in DataLakeConfiguration::getCatalog so checks and error text match, and preserve backward compatibility for existing ATTACH/upgraded metadata.
    • Fix invalid Python f-strings in both integration tests so files import and test collection works.
    • Update iceberg docs wording to match the new catalog interface and avoid implying deprecated table-level catalog settings.

@clickhouse-gh clickhouse-gh Bot added the pr-improvement Pull request with some product improvements label Mar 22, 2026

#endif
#if USE_AVRO
if ((*settings)[DataLakeStorageSetting::storage_catalog_type].changed || (*settings)[DataLakeStorageSetting::storage_aws_access_key_id].changed)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ The deprecation guard checks storage_aws_access_key_id, but the exception text says storage_catalog_url is deprecated. As written, storage_catalog_url can still be provided without triggering this error, so a deprecated setting is silently ignored.

Please either:

  • check storage_catalog_url here (likely intended), or
  • update both the validation and message so they match the actual deprecated set.


#endif
#if USE_AVRO
if ((*settings)[DataLakeStorageSetting::storage_catalog_type].changed || (*settings)[DataLakeStorageSetting::storage_aws_access_key_id].changed)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ This validation is applied during storage construction for both CREATE and ATTACH paths (via configuration->getCatalog(...)). As a result, tables created in older versions with explicit storage_catalog_* settings can fail to attach after upgrade, which is a backward-incompatible startup/query regression.

Please gate this check so it does not reject existing metadata on ATTACH (or behind compatibility), and enforce it only for new CREATE statements.

@divanik divanik self-assigned this Mar 26, 2026
Comment thread tests/integration/test_database_iceberg/test.py
Comment thread src/Storages/ObjectStorage/DataLakes/DataLakeConfiguration.h Outdated
Comment thread src/Storages/ObjectStorage/DataLakes/DataLakeConfiguration.h Outdated
@scanhex12 scanhex12 force-pushed the change_insert_interface branch from 415d00e to d073d01 Compare April 7, 2026 15:23
Comment thread tests/integration/test_database_iceberg/test.py
@@ -151,23 +153,16 @@ def create_clickhouse_iceberg_database(
def create_clickhouse_iceberg_table(
started_cluster, node, database_name, table_name, schema, additional_settings={}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ This line is a Python syntax error: the f-string expression starts with {" , ...} due to nested double quotes, so the module fails to parse (f-string: expecting '}').

This breaks test collection for this integration suite.

Please build the suffix without nested quote collisions, for example:

settings_suffix = ""
if additional_settings:
    settings_suffix = "SETTINGS " + ",".join(
        k + "=" + repr(v) for k, v in additional_settings.items()
    )


settings.update(additional_settings)
settings_suffix = "" if len(additional_settings) == 0 else f"SETTINGS {",".join((k+"="+repr(v) for k, v in additional_settings.items()))}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❌ Same syntax issue as in test_database_iceberg: this f-string cannot be parsed because of nested double quotes inside the expression (f-string: expecting '}').

As written, the test_database_glue module also fails to import/collect.

Please construct settings_suffix without nested quote collisions (same fix pattern as above).

@scanhex12 scanhex12 force-pushed the change_insert_interface branch from d073d01 to b479af7 Compare April 7, 2026 21:31
'minio_access_key',
'minio_secret_key'
)
SETTINGS
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ The narrative says to create IcebergS3 tables with "necessary settings", but both examples now omit catalog settings entirely.

Please update the surrounding text to match the new interface (for example, explain that catalog routing now comes from the DataLakeCatalog database context) so users do not try obsolete storage_catalog_* settings.

@clickhouse-gh
Copy link
Copy Markdown
Contributor

clickhouse-gh Bot commented Apr 8, 2026

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.00% 84.10% +0.10%
Functions 90.90% 90.90% +0.00%
Branches 76.60% 76.60% +0.00%

Changed lines: 87.18% (34/39) | lost baseline coverage: 12 line(s) · Uncovered code

Full report · Diff report

@scanhex12 scanhex12 enabled auto-merge April 8, 2026 09:05
@scanhex12 scanhex12 added this pull request to the merge queue Apr 8, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 8, 2026
@scanhex12 scanhex12 enabled auto-merge April 8, 2026 12:12
@scanhex12 scanhex12 added this pull request to the merge queue Apr 8, 2026
Merged via the queue into master with commit 4f3766e Apr 8, 2026
160 of 163 checks passed
@scanhex12 scanhex12 deleted the change_insert_interface branch April 8, 2026 12:32
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 8, 2026
@clickgapai
Copy link
Copy Markdown
Contributor

Hi @scanhex12 @divanik — while reviewing this PR I found the following:

Happy to discuss — close anything that's wrong or already addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants