Skip to content

Profile events for Iceberg catalogs#1868

Open
ianton-ru wants to merge 2 commits into
antalya-26.3from
fix/datalake-rest-catalog-profile-events
Open

Profile events for Iceberg catalogs#1868
ianton-ru wants to merge 2 commits into
antalya-26.3from
fix/datalake-rest-catalog-profile-events

Conversation

@ianton-ru

Copy link
Copy Markdown

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Profile events for Iceberg catalogs

Documentation entry for user-facing changes

Add several profile events for requests to Iceberg catalogs (REST, Glue, Unity)

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • S3 Export (2h)
  • Swarms (30m)
  • Tiered Storage (2h)

@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown

Workflow [PR], commit [a2339bf]

@ianton-ru ianton-ru changed the title Fix/datalake rest catalog profile events Profile events for Iceberg catalogs Jun 4, 2026
@ianton-ru

Copy link
Copy Markdown
Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. Hooray!

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@ianton-ru ianton-ru added antalya antalya-26.3 port-antalya PRs to be ported to all new Antalya releases labels Jun 5, 2026
@svb-alt svb-alt added the roadmap Key features and improvements for Antalya project label Jun 7, 2026
@Selfeer

Selfeer commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

AI audit note: This review comment was generated by AI (gpt-5.3-codex).

Audit update for PR #1868 (Profile events for Iceberg catalogs):

Confirmed defects:

  • Low: DataLakeRestCatalogDropTable* is incremented for S3TablesCatalog::dropTable
    • Impact: DataLakeRestCatalogDropTable and DataLakeRestCatalogDropTableMicroseconds include S3 Tables deletions, so REST-catalog observability is polluted and cannot be interpreted as "Iceberg REST only".
    • Anchor: src/Databases/DataLake/S3TablesCatalog.cpp / S3TablesCatalog::dropTable; src/Common/ProfileEvents.cpp / DataLakeRestCatalogDropTable documentation
    • Trigger: Run DROP TABLE against a S3TablesCatalog-backed database.
    • Why defect: The S3 Tables path emits a profile event explicitly documented as "requests to Iceberg REST catalog", creating systematic metric misattribution.
    • Evidence: S3TablesCatalog::dropTable calls ProfileEvents::increment(ProfileEvents::DataLakeRestCatalogDropTable) while the event text says "Number of 'drop table' requests to Iceberg REST catalog."
    • Fix direction (short): Add S3 Tables-specific profile events (e.g. DataLakeS3TablesCatalogDropTable*) or remove REST event emission from the S3 Tables override.
    • Regression test direction (short): Add a profile-events test that executes S3 Tables dropTable and verifies only S3 Tables counters move, while REST counters remain unchanged.

Coverage summary:

  • Scope reviewed: PR diff in src/Common/ProfileEvents.cpp, src/Databases/DataLake/GlueCatalog.cpp, src/Databases/DataLake/RestCatalog.cpp, src/Databases/DataLake/S3TablesCatalog.cpp, and src/Databases/DataLake/UnityCatalog.cpp; audited request entrypoints, response/error branches, and event emission points.
  • Categories failed: Metrics attribution/isolation across catalog implementations (REST vs S3 Tables).
  • Categories passed: Call-path coverage for changed branches; fail-open/fail-closed behavior on HTTP/error paths; exception/partial-update safety (instrumentation-only edits); multithread profile-counter access safety; C++ lifetime/iterator/overflow/UB classes (no new risky mutation in reviewed scope).
  • Assumptions/limits: Static audit of changed code paths only; no runtime execution or CI log validation in this pass.

@Selfeer

Selfeer commented Jun 8, 2026

Copy link
Copy Markdown
Collaborator

PR #1868 CI Triage: "Profile events for Iceberg catalogs"

PR: #1868
Title: Profile events for Iceberg catalogs
Branch: fix/datalake-rest-catalog-profile-events
Head SHA: a2339bf021d5781e389141829e8a3f4264d06673
Author: Anton Ivashkin (@ianton-ru)

Files Changed

  • src/Common/ProfileEvents.cpp
  • src/Databases/DataLake/GlueCatalog.cpp
  • src/Databases/DataLake/RestCatalog.cpp
  • src/Databases/DataLake/S3TablesCatalog.cpp
  • src/Databases/DataLake/UnityCatalog.cpp

Summary

Category Count Tests
PR-caused regression 0
Pre-existing flaky 3 00157_cache_dictionary, stress test state pollution, test_sync_replica
Pre-existing crash (unrelated) 1 SIGSEGV in executeQuery.cpp on shardNum() + serialize_query_plan=1

Conclusion: None of the 4 CI failures are caused by this PR. Safe to merge.


Failure-by-Failure Analysis

1. Stateless tests (amd_debug, sequential)00157_cache_dictionary FAIL


2. Stress test (amd_debug)Cannot start clickhouse-server FAIL

Error:

Code: 60. DB::Exception: Table default.test_hybrid_alias_predicate_right does not exist.
Cannot attach table `test_2`.`test_hybrid_alias_predicate` from metadata file
...ENGINE = Hybrid(remote('127.0.0.1:9000', 'test_2', 'test_hybrid_alias_predicate_left'), ...)
  • The PR touches zero Hybrid engine code
  • Verdict: Pre-existing flaky — stress test state pollution from an Altinity-specific Hybrid engine table not cleaned up between runs

3. Stress test (arm_asan, s3)Server died (SIGSEGV) FAIL

Crash location: executeQuery.cpp:1649:73typeid_cast<ASTSelectQuery*>(iast) returned null

Stack trace (abbreviated):

3. executeQuery.cpp:1649:73 - DB::executeQueryImpl (typeid_cast<ASTSelectQuery*>)
4. executeQuery.cpp:2127:11 - DB::executeQuery
5. TCPHandler.cpp:830:68   - DB::TCPHandler::runImpl

Crashing query:

SELECT shardNum() AS _shard_num, __table1.key AS key
FROM test_3.dist_2 AS __table1
ORDER BY shardNum() ASC, __table1.key ASC
-- settings: serialize_query_plan=1, prefer_localhost_replica=0, enable_analyzer=1
  • The PR does not touch executeQuery.cpp, AST processing, or distributed query planning
  • Verdict: Pre-existing bug — NULL pointer in new analyzer path when processing shardNum() over a distributed table with serialize_query_plan=1. Unrelated to Iceberg ProfileEvents.

4. Integration tests (amd_asan, db disk, old analyzer, 4/6)test_sync_replica timeout FAIL

Error:

FAILED test_replicated_database/test.py::test_sync_replica - Failed: Timeout ...
E   Failed: Timeout (>900.0s) from pytest-timeout.
    dummy_node.query("DROP DATABASE test_sync_database SYNC")
  • Replicated database synchronization is completely unrelated to Iceberg catalog ProfileEvents
  • Verdict: Pre-existing flaky test — replicated database DROP SYNC timeouts are a known flaky pattern in the CI

@DimensionWieldr

Copy link
Copy Markdown
Collaborator

Based on CI Triage (above comment), failures are not related to PR. Can be verified once reviewed by dev, given that no other changes are made.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

antalya antalya-26.3 port-antalya PRs to be ported to all new Antalya releases roadmap Key features and improvements for Antalya project

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants