[feature](be) Support Iceberg position delete predicates #63799
Merged
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Add file-layer DeletePredicate execution for Parquet row positions and wire IcebergTableReader v2 to convert Iceberg position deletes and deletion vectors into file-local deleted row positions. Equality delete files are detected and fail explicitly instead of being silently ignored.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- Added NewParquetReaderTest.DeletePredicateFiltersRowPositions. Local targeted run was attempted but blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.
- Behavior changed: Yes. Iceberg v2 position deletes and deletion vectors are applied by TableReader; equality deletes now return NotSupported until the table-level equality-delete path is implemented.
- Does this need documentation: No
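
The row-position filtering described above can be sketched as a merge between a batch of file-local row positions and the deleted positions. This is only an illustration: `build_keep_mask` is a hypothetical helper, not the actual DeletePredicate interface, and it assumes both inputs are sorted in ascending order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not the Doris DeletePredicate API): builds a keep-mask
// for one batch of file-local row positions. Both inputs are assumed to be
// sorted in ascending order.
std::vector<uint8_t> build_keep_mask(const std::vector<int64_t>& row_positions,
                                     const std::vector<int64_t>& deleted_positions) {
    std::vector<uint8_t> keep(row_positions.size(), 1);
    size_t d = 0;
    for (size_t i = 0; i < row_positions.size(); ++i) {
        // Skip deleted positions that lie before the current row.
        while (d < deleted_positions.size() && deleted_positions[d] < row_positions[i]) {
            ++d;
        }
        if (d < deleted_positions.size() && deleted_positions[d] == row_positions[i]) {
            keep[i] = 0; // this row position is deleted
        }
    }
    return keep;
}
```

A batch covering positions 10..13 with deleted positions 11 and 13 would keep only rows 10 and 12.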
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Paimon and Iceberg TableReader v2 parsed deletion vector metadata and buffers through separate paths. This change routes both formats through the TableReader delete-file interface and shared deletion-vector read/decode handling, while position-delete and equality-delete handling stays within Iceberg table semantics.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- Added TableReaderTest coverage for Iceberg deletion vector descriptor parsing and multiple deletion-vector rejection.
- Ran git diff --check.
- Attempted build-support/clang-format.sh, blocked because llvm@16 is not installed locally.
- Attempted ./run-be-ut.sh --run --filter=TableReaderTest.IcebergDeletionVectorUsesTableReaderDeleteFileInterface:TableReaderTest.IcebergDeletionVectorRejectsMultipleDeleteFiles, blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Paimon and Iceberg deletion vector decoding shared the same length and magic header parsing but used separate functions. This change unifies header validation and dispatches only the bitmap decoding based on the deletion vector format.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- Ran git diff --check.
- Attempted ./run-be-ut.sh --run --filter=TableReaderTest.IcebergDeletionVectorUsesTableReaderDeleteFileInterface:TableReaderTest.IcebergDeletionVectorRejectsMultipleDeleteFiles, blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.
- Behavior changed: No
- Does this need documentation: No
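
A rough sketch of the "validate the header once, dispatch only the bitmap decoding" idea follows. Everything in it is an assumption made for illustration: the `DvFormat` enum, the `DvPayload` struct, the big-endian `[length][magic][bitmap]` layout, the convention that the length counts the magic plus the bitmap bytes, and the caller-supplied magic value. The real formats and the Doris code may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical format tag; the real reader's types may differ.
enum class DvFormat { kIceberg, kPaimon };

struct DvPayload {
    DvFormat format;
    const uint8_t* bitmap; // start of the serialized bitmap
    size_t bitmap_size;    // number of bitmap bytes
};

// Reads a 4-byte big-endian unsigned integer.
inline uint32_t read_be32(const uint8_t* p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) | (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

// Shared step: validate "[length][magic][bitmap...]" once for every format.
// Assumption in this sketch: the length field counts the magic plus the bitmap.
std::optional<DvPayload> parse_dv_header(const std::vector<uint8_t>& buf, DvFormat format,
                                         uint32_t expected_magic) {
    constexpr size_t kHeader = 8;
    if (buf.size() < kHeader) return std::nullopt;
    const uint32_t length = read_be32(buf.data());
    const uint32_t magic = read_be32(buf.data() + 4);
    if (magic != expected_magic) return std::nullopt;
    if (length < 4 || length - 4 > buf.size() - kHeader) return std::nullopt;
    return DvPayload{format, buf.data() + kHeader, length - 4};
}

// Format-specific step: only the bitmap decoding is dispatched. The decoders
// are stubs here; this function just reports which path would run.
const char* decoder_name(const DvPayload& payload) {
    switch (payload.format) {
    case DvFormat::kIceberg: return "iceberg-bitmap";
    case DvFormat::kPaimon: return "paimon-bitmap";
    }
    return "unknown";
}
```

Keeping the length and magic checks in one function means a malformed buffer is rejected the same way for both formats, and only the last step needs to know which bitmap encoding is in use.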
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Iceberg position delete and deletion vector both produce deleted row positions, but DeletePredicate planning lived in IcebergTableReader. This change moves row-position DeletePredicate planning into TableReader and lets Iceberg only materialize deleted row positions, so position delete and deletion vector share the same planning path.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- Ran git diff --check.
- Attempted build-support/clang-format.sh be/src/format/reader/table_reader.h be/src/format/table/iceberg_reader_v2.h, blocked because llvm@16 is not installed locally.
- Attempted ./run-be-ut.sh --run --filter=TableReaderTest.IcebergDeletionVectorUsesTableReaderDeleteFileInterface:TableReaderTest.IcebergDeletionVectorRejectsMultipleDeleteFiles:NewParquetReaderTest.DeletePredicateFiltersRowPositions, blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.
- Behavior changed: No
- Does this need documentation: No
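
Since position deletes and deletion vectors both end up as deleted row positions, a shared planning path can consume a single merged list. The sketch below shows one plausible way to build that list; `merge_deleted_positions` is a hypothetical name and the sort-and-deduplicate strategy is an assumption, not necessarily what TableReader does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: merges row positions materialized from position delete
// files and from a deletion vector into one sorted, duplicate-free list.
std::vector<int64_t> merge_deleted_positions(std::vector<int64_t> from_position_deletes,
                                             const std::vector<int64_t>& from_deletion_vector) {
    from_position_deletes.insert(from_position_deletes.end(), from_deletion_vector.begin(),
                                 from_deletion_vector.end());
    std::sort(from_position_deletes.begin(), from_position_deletes.end());
    // A row deleted by both sources should only appear once.
    from_position_deletes.erase(
            std::unique(from_position_deletes.begin(), from_position_deletes.end()),
            from_position_deletes.end());
    return from_position_deletes;
}
```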
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Row position can be required by both DeletePredicate evaluation and Iceberg row lineage output. This change makes predicate columns take precedence over non-predicate columns when appending scan columns, so a file column is not scanned through both paths.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- Added TableReaderTest.RowPositionDeletePredicateColumnIsNotRepeatedAsOutputColumn.
- Ran git diff --check.
- Attempted build-support/clang-format.sh be/src/format/reader/table_reader.h be/test/format/reader/table_reader_test.cpp, blocked because llvm@16 is not installed locally.
- Attempted ./run-be-ut.sh --run --filter=TableReaderTest.RowPositionDeletePredicateColumnIsNotRepeatedAsOutputColumn, blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.
- Behavior changed: No
- Does this need documentation: No
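
The precedence rule can be illustrated with a small planning function in which predicate columns are registered first and output columns are appended only if they have not been seen. The `ScanColumns` struct, the function name, and the use of column names as keys are assumptions for this sketch.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical result type: columns read for predicates vs. for output only.
struct ScanColumns {
    std::vector<std::string> predicate_columns;
    std::vector<std::string> output_columns;
};

// Predicate columns take precedence: a column that is already scanned for a
// predicate is not appended again as a non-predicate column.
ScanColumns plan_scan_columns(const std::vector<std::string>& predicate_cols,
                              const std::vector<std::string>& requested_output_cols) {
    ScanColumns plan;
    std::unordered_set<std::string> seen;
    for (const auto& col : predicate_cols) {
        if (seen.insert(col).second) plan.predicate_columns.push_back(col);
    }
    for (const auto& col : requested_output_cols) {
        if (seen.insert(col).second) plan.output_columns.push_back(col);
    }
    return plan;
}
```

With a row-position column required by both the delete predicate and the output, the column appears only in `predicate_columns`.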
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Add an inline comment explaining why Iceberg position delete rows are merged into member storage before wiring them to the common DeletePredicate path.
### Release note
None
### Check List (For Author)
- Test: No need to test (comment-only change)
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: TableReader deletion handling used one delete-file hook name for deletion vectors while Iceberg position delete rows were collected later in scan request customization. This change splits the hooks into deletion-vector descriptor parsing and position-delete row collection, so TableReader prepares all row-position deletes before common DeletePredicate planning.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- Ran git diff --check.
- Attempted build-support/clang-format.sh for modified files, blocked because llvm@16 is not installed locally.
- Attempted ./run-be-ut.sh --run --filter=TableReaderTest.IcebergDeletionVectorUsesTableReaderDeleteFileInterface:TableReaderTest.IcebergDeletionVectorRejectsMultipleDeleteFiles:TableReaderTest.RowPositionDeletePredicateColumnIsNotRepeatedAsOutputColumn, blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Position delete is Iceberg-specific, so TableReader should not expose a generic position-delete collection hook. This change keeps TableReader responsible for deletion vector parsing and common DeletePredicate planning, while IcebergTableReader handles position-delete row collection in its own prepare_split flow.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- Ran git diff --check.
- Attempted build-support/clang-format.sh for modified files, blocked because llvm@16 is not installed locally.
- Attempted ./run-be-ut.sh --run --filter=TableReaderTest.IcebergDeletionVectorUsesTableReaderDeleteFileInterface:TableReaderTest.IcebergDeletionVectorRejectsMultipleDeleteFiles:TableReaderTest.RowPositionDeletePredicateColumnIsNotRepeatedAsOutputColumn, blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: The common DeletePredicate planning helper applies to any row-position delete rows, including deletion vectors and Iceberg position deletes. Rename the helper to avoid implying it is position-delete specific.
### Release note
None
### Check List (For Author)
- Test: No need to test (rename-only change)
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: DeletePredicate used Block::rows() and returned early when the first block column was empty. Parquet filter blocks may have predicate columns materialized at later positions while non-predicate columns are still empty, so DeletePredicate must use its row-position child column size and always append a result column.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- Ran git diff --check.
- Attempted ./run-be-ut.sh --run --filter=NewParquetReaderTest.DeletePredicateFiltersRowPositions, blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.
- Attempted build-support/clang-format.sh be/src/format/reader/expr/delete_predicate.cpp, blocked because llvm@16 is not installed locally.
- Behavior changed: No
- Does this need documentation: No
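
The fix can be pictured with a toy block model in which the first column is still empty while the row-position column is materialized. The `Column` and `Block` structs below are simplified stand-ins, not the Doris classes, and `execute_delete_predicate` is a hypothetical function that assumes the deleted positions are sorted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified stand-ins for a columnar block (not the Doris types).
struct Column {
    std::vector<int64_t> data;
};
struct Block {
    std::vector<Column> columns;
};

// Always appends one result column (1 = keep, 0 = deleted) and returns its index.
// The row count comes from the row-position child column, not from the first
// column, which may not be materialized yet.
size_t execute_delete_predicate(Block& block, size_t row_pos_idx,
                                const std::vector<int64_t>& sorted_deleted) {
    Column result;
    const std::vector<int64_t>& positions = block.columns[row_pos_idx].data;
    result.data.reserve(positions.size());
    for (int64_t pos : positions) {
        const bool deleted = std::binary_search(sorted_deleted.begin(), sorted_deleted.end(), pos);
        result.data.push_back(deleted ? 0 : 1);
    }
    // Append even when there are zero rows, so callers always find a result column.
    block.columns.push_back(std::move(result));
    return block.columns.size() - 1;
}
```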
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Remove a trailing whitespace left in DeletePredicate source.
### Release note
None
### Check List (For Author)
- Test: No need to test (whitespace-only change)
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: ParquetReader can execute DeletePredicate directly from a FileScanRequest test path without opening the expression first. Remove the open-state DCHECK from DeletePredicate::execute so the expression follows the same direct execution contract as its child slot ref.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- Ran git diff --check.
- Attempted ./run-be-ut.sh --run --filter=NewParquetReaderTest.DeletePredicateFiltersRowPositions, blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Add ParquetReader coverage for the combined path where a normal query conjunct and DeletePredicate are both present, verifying that row-position delete filtering composes correctly with query predicate selection.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- Added NewParquetReaderTest.QueryPredicateAndDeletePredicateFilterRowPositions.
- Ran git diff --check.
- Attempted ./run-be-ut.sh --run --filter=NewParquetReaderTest.QueryPredicateAndDeletePredicateFilterRowPositions:NewParquetReaderTest.DeletePredicateFiltersRowPositions, blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.
- Attempted build-support/clang-format.sh be/test/format/new_parquet/parquet_reader_test.cpp, blocked because llvm@16 is not installed locally.
- Behavior changed: No
- Does this need documentation: No
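
Composition of a query conjunct with the delete predicate can be thought of as a row-wise AND of two keep-masks. The sketch below assumes both masks cover the same batch; `combine_filters` is a hypothetical name and does not reflect how ParquetReader stores its filters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A row survives only if the query predicate selects it and it is not deleted.
std::vector<uint8_t> combine_filters(const std::vector<uint8_t>& query_keep,
                                     const std::vector<uint8_t>& delete_keep) {
    assert(query_keep.size() == delete_keep.size()); // both masks describe the same batch
    std::vector<uint8_t> out(query_keep.size());
    for (size_t i = 0; i < out.size(); ++i) {
        out[i] = (query_keep[i] != 0 && delete_keep[i] != 0) ? 1 : 0;
    }
    return out;
}
```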
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Make the Iceberg delete file table reader test helper stop immediately when get_block returns an error, so failing reads do not loop on stale eos state.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- git diff --check
- ./run-be-ut.sh --run --filter=TableReaderTest.IcebergTableReaderAppliesDeletionVectorFile:TableReaderTest.IcebergTableReaderAppliesPositionDeleteFile:TableReaderTest.IcebergTableReaderMergesDeletionVectorAndPositionDeleteFiles (blocked: JAVA_HOME points to JDK 11 and JDK_17 is not set)
- Behavior changed: No
- Does this need documentation: No
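
The helper behavior described above amounts to checking the returned status before trusting `eos`. A minimal sketch, with a simplified `Status` struct and a fake reader that are both assumptions made for illustration:

```cpp
#include <cassert>
#include <string>

// Simplified status type (not the Doris Status class).
struct Status {
    bool ok;
    std::string msg;
};

// Fake reader: the first call succeeds, the second fails without updating eos.
struct FailingReader {
    int calls = 0;
    Status get_block(bool* eos) {
        ++calls;
        if (calls == 1) {
            *eos = false;
            return Status{true, ""};
        }
        return Status{false, "read failed"}; // eos is left untouched (stale)
    }
};

// Reads until eos, but returns immediately on the first error so a stale
// eos value cannot keep the loop running.
template <typename Reader>
Status read_all(Reader& reader, int* blocks_read) {
    bool eos = false;
    while (!eos) {
        Status st = reader.get_block(&eos);
        if (!st.ok) return st;
        ++*blocks_read;
    }
    return Status{true, ""};
}
```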
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Initialize IOContext file reader and cache statistics in Iceberg delete file table reader tests so TracingFileReader can record reads without dereferencing null stats.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- git diff --check
- ./run-be-ut.sh --run --filter=TableReaderTest.IcebergTableReaderAppliesDeletionVectorFile:TableReaderTest.IcebergTableReaderAppliesPositionDeleteFile:TableReaderTest.IcebergTableReaderMergesDeletionVectorAndPositionDeleteFiles (blocked: JAVA_HOME points to JDK 11 and JDK_17 is not set)
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Move non-trivial IcebergTableReader v2 method bodies from the header into iceberg_reader_v2.cpp to reduce header weight and keep implementation details local.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- git diff --check
- build-support/clang-format.sh be/src/format/table/iceberg_reader_v2.h be/src/format/table/iceberg_reader_v2.cpp (blocked: local llvm@16 is not installed)
- ./run-be-ut.sh --run --filter=TableReaderTest.IcebergTableReaderAppliesDeletionVectorFile:TableReaderTest.IcebergTableReaderAppliesPositionDeleteFile:TableReaderTest.IcebergTableReaderMergesDeletionVectorAndPositionDeleteFiles (blocked: JAVA_HOME points to JDK 11 and JDK_17 is not set)
- Behavior changed: No
- Does this need documentation: No