Iceberg writes: data files min-max pruning for simple types #85746

Merged
scanhex12 merged 21 commits into ClickHouse:master from scanhex12:iceberg_stats
Aug 21, 2025

scanhex12 (Member) commented Aug 16, 2025

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Write more Iceberg statistics (column sizes, lower and upper bounds) to the metadata (manifest entries) so that readers can apply min-max pruning to data files.
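
For readers unfamiliar with the technique, here is a minimal sketch of how a reader could use per-file lower and upper bounds to skip data files. It is not the ClickHouse implementation: `DataFileStats`, `may_contain`, and `prune` are hypothetical names, the bounds are shown as already-decoded Python values keyed by field ID, and only simple comparison predicates on a single column are handled.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DataFileStats:
    """Hypothetical, simplified view of the statistics kept for one data file."""
    path: str
    column_sizes: Dict[int, int] = field(default_factory=dict)   # field ID -> bytes
    lower_bounds: Dict[int, Any] = field(default_factory=dict)   # field ID -> min value
    upper_bounds: Dict[int, Any] = field(default_factory=dict)   # field ID -> max value


def may_contain(stats: DataFileStats, field_id: int, op: str, value: Any) -> bool:
    """Return False only when the bounds prove that no row can match."""
    lo = stats.lower_bounds.get(field_id)
    hi = stats.upper_bounds.get(field_id)
    if lo is None or hi is None:
        return True  # no statistics for this column: the file must be read
    if op == "=":
        return lo <= value <= hi
    if op == "<":
        return lo < value
    if op == "<=":
        return lo <= value
    if op == ">":
        return hi > value
    if op == ">=":
        return hi >= value
    return True  # unknown operator: stay conservative and read the file


def prune(files: List[DataFileStats], field_id: int, op: str, value: Any) -> List[str]:
    """Keep only the paths of files that might contain matching rows."""
    return [f.path for f in files if may_contain(f, field_id, op, value)]


# Illustrative data: field ID 1 is an integer column.
files = [
    DataFileStats("data/a.parquet", {1: 4096}, {1: 0}, {1: 99}),
    DataFileStats("data/b.parquet", {1: 4096}, {1: 100}, {1: 199}),
    DataFileStats("data/c.parquet"),  # written without statistics
]
selected = prune(files, 1, ">", 150)
```

With this sample data, a filter such as `col > 150` skips `data/a.parquet` entirely, while the file without statistics is always kept, because missing bounds prove nothing.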

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

scanhex12 marked this pull request as draft August 16, 2025 19:07
scanhex12 marked this pull request as ready for review August 17, 2025 10:44
clickhouse-gh bot (Contributor) commented Aug 17, 2025

Workflow [PR], commit [5adab06]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Stateless tests (amd_binary, old analyzer, s3 storage, DatabaseReplicated, parallel) | | failure | | |
| | 03595_alter_drop_column_comment_if_exists | FAIL | | |
| | 03573_json_keys_with_dots | FAIL | | |
| Integration tests (amd_asan, old analyzer, 4/6) | | failure | | |
| | test_storage_kafka/test_batch_fast.py::test_kafka_no_holes_when_write_suffix_failed[generate_old_create_table_query] | FAIL | | |
| Integration tests (arm_binary, distributed plan, 2/4) | | failure | | |
| | test_database_delta/test.py::test_complex_table_schema[1] | FAIL | | |
| | test_database_delta/test.py::test_embedded_database_and_tables[0] | FAIL | | |
| | test_database_delta/test.py::test_timestamp_ntz[1] | FAIL | | |
| | test_database_delta/test.py::test_complex_table_schema[0] | FAIL | | |
| | test_database_delta/test.py::test_multiple_schemes_tables | FAIL | | |
| | test_database_delta/test.py::test_timestamp_ntz[0] | FAIL | | |
| | test_database_delta/test.py::test_embedded_database_and_tables[1] | FAIL | | |
| | test_storage_rabbitmq/test_failed_connection.py::test_rabbitmq_restore_failed_connection_without_losses_2 | FAIL | | |
| Integration tests (amd_asan, flaky check) | | failure | | |
| | test_storage_iceberg/test.py::test_filesystem_cache[s3-1-3] | FAIL | | |
| | test_storage_iceberg/test.py::test_metadata_cache[azure-1-3] | FAIL | | |
| | test_storage_iceberg/test.py::test_metadata_cache[azure-2-3] | FAIL | | |
| | test_storage_iceberg/test.py::test_filesystem_cache[s3-2-3] | FAIL | | |

clickhouse-gh bot added the pr-improvement (Pull request with some product improvements) label Aug 17, 2025
divanik self-assigned this Aug 18, 2025

divanik (Member) left a comment

  1. Waiting for a test that shows the added functionality works correctly
  2. Waiting for the documentation to be written

scanhex12 changed the title from "Write more iceberg statistics in metadata" to "Data files pruning for simple types" Aug 20, 2025
scanhex12 changed the title from "Data files pruning for simple types" to "Data files min-max pruning for simple types" Aug 20, 2025
scanhex12 changed the title from "Data files min-max pruning for simple types" to "Iceberg writes: data files min-max pruning for simple types" Aug 20, 2025
scanhex12 requested a review from divanik August 20, 2025 16:08
scanhex12 added the pr-feature (Pull request with new product feature) label and removed the pr-improvement (Pull request with some product improvements) label Aug 20, 2025
scanhex12 requested a review from divanik August 20, 2025 23:06

divanik (Member) commented Aug 21, 2025

OK, the test looks fine. Documentation for Iceberg writes has not even been started, so I don't think it is reasonable to block this PR on unwritten documentation.

scanhex12 enabled auto-merge August 21, 2025 14:31
scanhex12 added this pull request to the merge queue Aug 21, 2025
Merged via the queue into ClickHouse:master with commit bd6a5a8 Aug 21, 2025
114 of 121 checks passed
scanhex12 deleted the iceberg_stats branch August 21, 2025 16:30
robot-ch-test-poll3 added the pr-synced-to-cloud (The PR is synced to the cloud repo) label Aug 21, 2025
Labels

pr-feature: Pull request with new product feature
pr-synced-to-cloud: The PR is synced to the cloud repo

3 participants