[feature](be) Support Iceberg position delete predicates #63799

Merged
Gabriel39 merged 18 commits into apache:refact_reader_branch from Gabriel39:refactor_0528
May 28, 2026

Conversation

@Gabriel39
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Add file-layer DeletePredicate execution for Parquet row positions and wire IcebergTableReader v2 to convert Iceberg position deletes and deletion vectors into file-local deleted row positions. Equality delete files are detected and fail explicitly instead of being silently ignored.

Release note

None

Check List (For Author)

  • Test: Unit Test

    • Added NewParquetReaderTest.DeletePredicateFiltersRowPositions. Local targeted run was attempted but blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.
  • Behavior changed: Yes. Iceberg v2 position deletes and deletion vectors are applied by TableReader; equality deletes now return NotSupported until the table-level equality-delete path is implemented.

  • Does this need documentation: No
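
As a rough illustration of what filtering by file-local row positions means, the sketch below builds a keep mask for one batch of row positions. The helper name `keep_mask_for_positions` and its signature are hypothetical and are not the DeletePredicate API added in this PR; it only assumes the deleted positions are available as a sorted list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not the Doris DeletePredicate implementation.
// `sorted_deleted` holds file-local deleted row positions in ascending order.
// Returns one byte per input row: 1 = keep the row, 0 = the row is deleted.
std::vector<uint8_t> keep_mask_for_positions(const std::vector<int64_t>& row_positions,
                                             const std::vector<int64_t>& sorted_deleted) {
    assert(std::is_sorted(sorted_deleted.begin(), sorted_deleted.end()));
    std::vector<uint8_t> keep;
    keep.reserve(row_positions.size());
    for (int64_t pos : row_positions) {
        // Binary search is valid because the deleted positions are sorted.
        const bool deleted =
                std::binary_search(sorted_deleted.begin(), sorted_deleted.end(), pos);
        keep.push_back(deleted ? 0 : 1);
    }
    return keep;
}
```

For a batch holding positions 0 through 4 and deleted positions 1 and 3, the mask marks rows 0, 2, and 4 as kept.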

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

Gabriel39 added 2 commits May 28, 2026 14:15
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Add file-layer DeletePredicate execution for Parquet row positions and wire IcebergTableReader v2 to convert Iceberg position deletes and deletion vectors into file-local deleted row positions. Equality delete files are detected and fail explicitly instead of being silently ignored.

### Release note

None

### Check List (For Author)

- Test: Unit Test

    - Added NewParquetReaderTest.DeletePredicateFiltersRowPositions. Local targeted run was attempted but blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.

- Behavior changed: Yes. Iceberg v2 position deletes and deletion vectors are applied by TableReader; equality deletes now return NotSupported until the table-level equality-delete path is implemented.

- Does this need documentation: No
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Paimon and Iceberg TableReader v2 parsed deletion vector metadata and buffers through separate paths. This change routes both formats through the TableReader delete-file interface and shared deletion-vector read/decode handling, while keeping position/equality delete handling in Iceberg table semantics.

### Release note

None

### Check List (For Author)

- Test: Unit Test

    - Added TableReaderTest coverage for Iceberg deletion vector descriptor parsing and multiple deletion-vector rejection.

    - Ran git diff --check.

    - Attempted build-support/clang-format.sh, blocked because llvm@16 is not installed locally.

    - Attempted ./run-be-ut.sh --run --filter=TableReaderTest.IcebergDeletionVectorUsesTableReaderDeleteFileInterface:TableReaderTest.IcebergDeletionVectorRejectsMultipleDeleteFiles, blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.

- Behavior changed: No

- Does this need documentation: No
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Gabriel39 added 16 commits May 28, 2026 15:01
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Paimon and Iceberg deletion vector decoding shared the same length and magic header parsing but used separate functions. This change unifies header validation and dispatches only the bitmap decoding based on the deletion vector format.
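
The shape of this refactor can be sketched as one shared header check followed by a format-specific payload decode. Everything below is hypothetical: the names, the assumed `[4-byte big-endian length][4-byte magic][payload]` layout, the caller-supplied magic value, and the two toy payload encodings, which stand in for the real bitmap formats and are not the Paimon or Iceberg encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical toy formats; the real bitmap encodings are not modeled here.
enum class ToyDvFormat { kPositions32, kPositions64 };

// Read an n-byte big-endian unsigned integer (n <= 8).
uint64_t read_be(const uint8_t* p, size_t n) {
    assert(n <= 8);
    uint64_t v = 0;
    for (size_t i = 0; i < n; ++i) v = (v << 8) | p[i];
    return v;
}

struct ToyPayload {
    const uint8_t* data;
    size_t size;
};

// Shared step: validate the assumed [length][magic] header.
// Assumption: `length` counts the magic bytes plus the payload.
std::optional<ToyPayload> validate_header(const std::vector<uint8_t>& blob,
                                          uint32_t expected_magic) {
    if (blob.size() < 8) return std::nullopt;
    const uint64_t length = read_be(blob.data(), 4);
    if (length != blob.size() - 4) return std::nullopt;
    if (read_be(blob.data() + 4, 4) != expected_magic) return std::nullopt;
    return ToyPayload{blob.data() + 8, blob.size() - 8};
}

// Format-specific step: only the payload decoding depends on the format.
std::optional<std::vector<uint64_t>> decode_positions(const std::vector<uint8_t>& blob,
                                                      ToyDvFormat format,
                                                      uint32_t expected_magic) {
    const std::optional<ToyPayload> payload = validate_header(blob, expected_magic);
    if (!payload) return std::nullopt;
    const size_t width = (format == ToyDvFormat::kPositions32) ? 4 : 8;
    if (payload->size % width != 0) return std::nullopt;
    std::vector<uint64_t> positions;
    for (size_t off = 0; off < payload->size; off += width) {
        positions.push_back(read_be(payload->data + off, width));
    }
    return positions;
}
```

Keeping the header check in one function means a malformed length or magic is rejected the same way for both formats, and adding a format only touches the dispatch.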

### Release note

None

### Check List (For Author)

- Test: Unit Test

    - Ran git diff --check.

    - Attempted ./run-be-ut.sh --run --filter=TableReaderTest.IcebergDeletionVectorUsesTableReaderDeleteFileInterface:TableReaderTest.IcebergDeletionVectorRejectsMultipleDeleteFiles, blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.

- Behavior changed: No

- Does this need documentation: No
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Iceberg position delete and deletion vector both produce deleted row positions, but DeletePredicate planning lived in IcebergTableReader. This change moves row-position DeletePredicate planning into TableReader and lets Iceberg only materialize deleted row positions, so position delete and deletion vector share the same planning path.
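
One way two sources of deleted rows could feed a single planning path is to merge them into one sorted, duplicate-free list first. The sketch below assumes both inputs are already sorted; `merge_deleted_positions` is a hypothetical name and not a function from this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical helper: combine deleted row positions from position-delete
// files and from a deletion vector into one sorted list without duplicates.
std::vector<int64_t> merge_deleted_positions(const std::vector<int64_t>& from_position_deletes,
                                             const std::vector<int64_t>& from_deletion_vector) {
    assert(std::is_sorted(from_position_deletes.begin(), from_position_deletes.end()));
    assert(std::is_sorted(from_deletion_vector.begin(), from_deletion_vector.end()));
    std::vector<int64_t> merged;
    merged.reserve(from_position_deletes.size() + from_deletion_vector.size());
    std::set_union(from_position_deletes.begin(), from_position_deletes.end(),
                   from_deletion_vector.begin(), from_deletion_vector.end(),
                   std::back_inserter(merged));
    // set_union keeps repeats that occur within a single input; drop them.
    merged.erase(std::unique(merged.begin(), merged.end()), merged.end());
    return merged;
}
```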

### Release note

None

### Check List (For Author)

- Test: Unit Test

    - Ran git diff --check.

    - Attempted build-support/clang-format.sh be/src/format/reader/table_reader.h be/src/format/table/iceberg_reader_v2.h, blocked because llvm@16 is not installed locally.

    - Attempted ./run-be-ut.sh --run --filter=TableReaderTest.IcebergDeletionVectorUsesTableReaderDeleteFileInterface:TableReaderTest.IcebergDeletionVectorRejectsMultipleDeleteFiles:NewParquetReaderTest.DeletePredicateFiltersRowPositions, blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.

- Behavior changed: No

- Does this need documentation: No
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Row position can be required by both DeletePredicate evaluation and Iceberg row lineage output. This change makes predicate columns take precedence over non-predicate columns when appending scan columns, so a file column is not scanned through both paths.
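
The precedence rule can be pictured as a two-pass de-duplication in which predicate columns are registered first. This is a minimal sketch with hypothetical names (`ToyScanPlan`, `plan_scan_columns`) that identifies columns by name for simplicity; the real TableReader code may key columns differently.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical plan: each file column ends up in exactly one list.
struct ToyScanPlan {
    std::vector<std::string> predicate_columns;
    std::vector<std::string> output_only_columns;
};

ToyScanPlan plan_scan_columns(const std::vector<std::string>& predicate_cols,
                              const std::vector<std::string>& output_cols) {
    ToyScanPlan plan;
    std::unordered_set<std::string> seen;
    // Predicate columns go first, so they win when a column is needed twice.
    for (const std::string& col : predicate_cols) {
        if (seen.insert(col).second) plan.predicate_columns.push_back(col);
    }
    // Output columns are added only if no predicate already scans them.
    for (const std::string& col : output_cols) {
        if (seen.insert(col).second) plan.output_only_columns.push_back(col);
    }
    // Invariant: every distinct column was placed in exactly one list.
    assert(plan.predicate_columns.size() + plan.output_only_columns.size() == seen.size());
    return plan;
}
```

With a row-position column required by both a delete predicate and the output, the column appears only in `predicate_columns`.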

### Release note

None

### Check List (For Author)

- Test: Unit Test

    - Added TableReaderTest.RowPositionDeletePredicateColumnIsNotRepeatedAsOutputColumn.

    - Ran git diff --check.

    - Attempted build-support/clang-format.sh be/src/format/reader/table_reader.h be/test/format/reader/table_reader_test.cpp, blocked because llvm@16 is not installed locally.

    - Attempted ./run-be-ut.sh --run --filter=TableReaderTest.RowPositionDeletePredicateColumnIsNotRepeatedAsOutputColumn, blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.

- Behavior changed: No

- Does this need documentation: No
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Add an inline comment explaining why Iceberg position delete rows are merged into member storage before wiring them to the common DeletePredicate path.

### Release note

None

### Check List (For Author)

- Test: No need to test (comment-only change)

- Behavior changed: No

- Does this need documentation: No
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: TableReader deletion handling used one delete-file hook name for deletion vectors while Iceberg position delete rows were collected later in scan request customization. This change splits the hooks into deletion-vector descriptor parsing and position-delete row collection, so TableReader prepares all row-position deletes before common DeletePredicate planning.

### Release note

None

### Check List (For Author)

- Test: Unit Test

    - Ran git diff --check.

    - Attempted build-support/clang-format.sh for modified files, blocked because llvm@16 is not installed locally.

    - Attempted ./run-be-ut.sh --run --filter=TableReaderTest.IcebergDeletionVectorUsesTableReaderDeleteFileInterface:TableReaderTest.IcebergDeletionVectorRejectsMultipleDeleteFiles:TableReaderTest.RowPositionDeletePredicateColumnIsNotRepeatedAsOutputColumn, blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.

- Behavior changed: No

- Does this need documentation: No
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Position delete is Iceberg-specific, so TableReader should not expose a generic position-delete collection hook. This change keeps TableReader responsible for deletion vector parsing and common DeletePredicate planning, while IcebergTableReader handles position-delete row collection in its own prepare_split flow.

### Release note

None

### Check List (For Author)

- Test: Unit Test

    - Ran git diff --check.

    - Attempted build-support/clang-format.sh for modified files, blocked because llvm@16 is not installed locally.

    - Attempted ./run-be-ut.sh --run --filter=TableReaderTest.IcebergDeletionVectorUsesTableReaderDeleteFileInterface:TableReaderTest.IcebergDeletionVectorRejectsMultipleDeleteFiles:TableReaderTest.RowPositionDeletePredicateColumnIsNotRepeatedAsOutputColumn, blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.

- Behavior changed: No

- Does this need documentation: No
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: The common DeletePredicate planning helper applies to any row-position delete rows, including deletion vectors and Iceberg position deletes. Rename the helper to avoid implying it is position-delete specific.

### Release note

None

### Check List (For Author)

- Test: No need to test (rename-only change)

- Behavior changed: No

- Does this need documentation: No
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: DeletePredicate used Block::rows() and returned early when the first block column was empty. Parquet filter blocks may have predicate columns materialized at later positions while non-predicate columns are still empty, so DeletePredicate must use its row-position child column size and always append a result column.
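
The failure mode can be reproduced with a toy block whose `rows()` reports the size of its first column. The types and the function below are hypothetical stand-ins, not the Doris `Block` or `DeletePredicate` classes; the sketch only shows why the row count should come from the row-position column and why a result column is appended unconditionally.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using ToyColumn = std::vector<int64_t>;

// Toy block: rows() looks only at the first column, which may still be empty
// while predicate columns at later positions are already materialized.
struct ToyBlock {
    std::vector<ToyColumn> columns;
    size_t rows() const { return columns.empty() ? 0 : columns.front().size(); }
};

// Appends a 0/1 keep column sized by the row-position column, never by
// block.rows(), and returns the index of the appended column.
size_t execute_delete_predicate(ToyBlock& block, size_t row_pos_idx,
                                const std::vector<int64_t>& sorted_deleted) {
    assert(row_pos_idx < block.columns.size());
    const ToyColumn& positions = block.columns[row_pos_idx];
    ToyColumn result;
    result.reserve(positions.size());
    for (int64_t pos : positions) {
        const bool deleted =
                std::binary_search(sorted_deleted.begin(), sorted_deleted.end(), pos);
        result.push_back(deleted ? 0 : 1);
    }
    // `positions` is not used after this point; push_back may reallocate.
    block.columns.push_back(std::move(result));
    return block.columns.size() - 1;
}
```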

### Release note

None

### Check List (For Author)

- Test: Unit Test

    - Ran git diff --check.

    - Attempted ./run-be-ut.sh --run --filter=NewParquetReaderTest.DeletePredicateFiltersRowPositions, blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.

    - Attempted build-support/clang-format.sh be/src/format/reader/expr/delete_predicate.cpp, blocked because llvm@16 is not installed locally.

- Behavior changed: No

- Does this need documentation: No
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Remove a trailing whitespace left in DeletePredicate source.

### Release note

None

### Check List (For Author)

- Test: No need to test (whitespace-only change)

- Behavior changed: No

- Does this need documentation: No
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: ParquetReader can execute DeletePredicate directly from a FileScanRequest test path without opening the expression first. Remove the open-state DCHECK from DeletePredicate::execute so the expression follows the same direct execution contract as its child slot ref.

### Release note

None

### Check List (For Author)

- Test: Unit Test

    - Ran git diff --check.

    - Attempted ./run-be-ut.sh --run --filter=NewParquetReaderTest.DeletePredicateFiltersRowPositions, blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.

- Behavior changed: No

- Does this need documentation: No
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Add ParquetReader coverage for the combined path where a normal query conjunct and DeletePredicate are both present, verifying that row-position delete filtering composes correctly with query predicate selection.
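
The composition being tested amounts to a logical AND: a row survives only if it is not deleted and the query conjunct accepts it. The sketch below uses hypothetical names (`ToyRow`, `scan_with_predicates`) and a row-at-a-time loop for clarity; it is not how ParquetReader evaluates filters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

struct ToyRow {
    int64_t position;  // file-local row position
    int32_t value;     // a data column the query predicate inspects
};

// Hypothetical scan: keep rows that pass the query predicate and are not
// listed in the sorted deleted positions.
std::vector<ToyRow> scan_with_predicates(const std::vector<ToyRow>& rows,
                                         const std::function<bool(const ToyRow&)>& query_predicate,
                                         const std::vector<int64_t>& sorted_deleted) {
    assert(std::is_sorted(sorted_deleted.begin(), sorted_deleted.end()));
    std::vector<ToyRow> out;
    for (const ToyRow& row : rows) {
        const bool deleted =
                std::binary_search(sorted_deleted.begin(), sorted_deleted.end(), row.position);
        if (!deleted && query_predicate(row)) out.push_back(row);
    }
    return out;
}
```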

### Release note

None

### Check List (For Author)

- Test: Unit Test

    - Added NewParquetReaderTest.QueryPredicateAndDeletePredicateFilterRowPositions.

    - Ran git diff --check.

    - Attempted ./run-be-ut.sh --run --filter=NewParquetReaderTest.QueryPredicateAndDeletePredicateFilterRowPositions:NewParquetReaderTest.DeletePredicateFiltersRowPositions, blocked because JAVA_HOME points to JDK 11 and JDK_17 is not set.

    - Attempted build-support/clang-format.sh be/test/format/new_parquet/parquet_reader_test.cpp, blocked because llvm@16 is not installed locally.

- Behavior changed: No

- Does this need documentation: No
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Make the Iceberg delete file table reader test helper stop immediately when get_block returns an error, so failing reads do not loop on stale eos state.
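
The fix follows a common read-loop pattern: check the status before trusting `eos`. The sketch below uses a hypothetical `ToyStatus` and callback type instead of the Doris `Status` and reader interfaces.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical status type standing in for the real one.
struct ToyStatus {
    bool ok;
    std::string msg;
};

using ToyGetBlock = std::function<ToyStatus(bool* eos)>;

// Reads until eos, but returns on the first error. Without the early return,
// a failing call that leaves `eos` untouched would keep the loop spinning.
ToyStatus read_all(const ToyGetBlock& get_block, int* blocks_read) {
    assert(get_block && blocks_read != nullptr);
    *blocks_read = 0;
    bool eos = false;
    while (!eos) {
        ToyStatus st = get_block(&eos);
        if (!st.ok) return st;  // `eos` may be stale here, so do not consult it
        ++*blocks_read;
    }
    return {true, ""};
}
```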

### Release note

None

### Check List (For Author)

- Test: Unit Test

    - git diff --check

    - ./run-be-ut.sh --run --filter=TableReaderTest.IcebergTableReaderAppliesDeletionVectorFile:TableReaderTest.IcebergTableReaderAppliesPositionDeleteFile:TableReaderTest.IcebergTableReaderMergesDeletionVectorAndPositionDeleteFiles (blocked: JAVA_HOME points to JDK 11 and JDK_17 is not set)

- Behavior changed: No

- Does this need documentation: No
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Initialize IOContext file reader and cache statistics in Iceberg delete file table reader tests so TracingFileReader can record reads without dereferencing null stats.

### Release note

None

### Check List (For Author)

- Test: Unit Test

    - git diff --check

    - ./run-be-ut.sh --run --filter=TableReaderTest.IcebergTableReaderAppliesDeletionVectorFile:TableReaderTest.IcebergTableReaderAppliesPositionDeleteFile:TableReaderTest.IcebergTableReaderMergesDeletionVectorAndPositionDeleteFiles (blocked: JAVA_HOME points to JDK 11 and JDK_17 is not set)

- Behavior changed: No

- Does this need documentation: No
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary: Move non-trivial IcebergTableReader v2 method bodies from the header into iceberg_reader_v2.cpp to reduce header weight and keep implementation details local.

### Release note

None

### Check List (For Author)

- Test: Unit Test

    - git diff --check

    - build-support/clang-format.sh be/src/format/table/iceberg_reader_v2.h be/src/format/table/iceberg_reader_v2.cpp (blocked: local llvm@16 is not installed)

    - ./run-be-ut.sh --run --filter=TableReaderTest.IcebergTableReaderAppliesDeletionVectorFile:TableReaderTest.IcebergTableReaderAppliesPositionDeleteFile:TableReaderTest.IcebergTableReaderMergesDeletionVectorAndPositionDeleteFiles (blocked: JAVA_HOME points to JDK 11 and JDK_17 is not set)

- Behavior changed: No

- Does this need documentation: No
@Gabriel39 Gabriel39 merged commit 2cfd503 into apache:refact_reader_branch May 28, 2026
9 of 12 checks passed

2 participants