Equality Delete File Scanning via Join in DataFusion #1530

### Is your feature request related to a problem or challenge?

Equality deletes are currently handled inside the file reader layer (https://github.com/apache/iceberg-rust/issues/630), where the delete keys are loaded and applied as a row-by-row filter. However, this approach doesn’t support spilling to disk. It is possible to implement spill logic directly in the reader, but another direction is to express the deletes as a semi join in DataFusion and rely on its ability to spill to disk.

Other potential benefits **may be**:
- Unified resource control (e.g. compute threads): once the filter computation moves into a join operator, its resources are controlled by the DataFusion compute engine.
- Potential for physical plan optimization: expressing equality deletes as part of the physical plan (e.g., via a semi join) opens the door to cost-based optimizations, join reordering, and operator fusion.

I plan to run more performance experiments to evaluate whether this approach is worthwhile in practice, especially with large datasets and under memory pressure. Any suggestions on this idea are welcome.

### Describe the solution you'd like

<img width="569" height="554" alt="Image" src="https://github.com/user-attachments/assets/721064f9-0ac1-4b2c-87fe-95a6201960f9" />

When `scan` is called on the `TableProvider` (converting the logical plan to a physical plan), we return an execution plan like the one above: a semi hash join connecting:
- equality delete scan
- data file with position delete file scan
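As a rough illustration of what this join computes, the sketch below uses only the Rust standard library rather than DataFusion operators. The equality delete keys act as the build side (a hash set), the data rows act as the probe side, and only rows without a matching key survive. The `Row` type, the single `id` key column, and the function name `apply_equality_deletes` are hypothetical and chosen only for this example.

```rust
use std::collections::HashSet;

/// A data row reduced to a key and a payload (hypothetical schema).
#[derive(Debug, Clone, PartialEq)]
struct Row {
    id: i64,
    name: String,
}

/// Build side: collect the equality-delete keys into a hash set.
/// Probe side: keep only the data rows whose key is NOT in the set.
fn apply_equality_deletes(delete_keys: &[i64], data_rows: Vec<Row>) -> Vec<Row> {
    let deleted: HashSet<i64> = delete_keys.iter().copied().collect();
    data_rows
        .into_iter()
        .filter(|row| !deleted.contains(&row.id))
        .collect()
}
```

A real join operator would do the same work batch by batch over Arrow arrays and could spill the build side if the engine supports it; this sketch keeps everything in memory.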

The join returns only the rows from the right side (the data scan) that have no match on the left side (the equality delete scan). Strictly speaking, this is anti-join semantics rather than a semi join; in DataFusion terms it would presumably be `JoinType::RightAnti`.

We partition the files based on the parallelism level set by DataFusion. The rule is to first divide the file scan tasks according to the configured parallelism, and then assign the equality delete files to each partition based on the file scan tasks it contains. This ensures that all relevant equality delete files are included in the same partition as the data they affect.
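The partitioning rule can be sketched as follows. This is a minimal sketch under stated assumptions, not the proposed implementation: `FileScanTask` and `ScanPartition` here are simplified, hypothetical stand-ins (not the iceberg-rust types), and the round-robin distribution is an assumption, since the issue does not say how tasks are divided.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified stand-in for a file scan task: one data file
/// plus the equality delete files that apply to it.
#[derive(Debug, Clone)]
struct FileScanTask {
    data_file: String,
    equality_deletes: Vec<String>,
}

/// One partition of the plan: the data files it scans and the
/// de-duplicated equality delete files needed by those data files.
#[derive(Debug, Default)]
struct ScanPartition {
    data_files: Vec<String>,
    equality_deletes: BTreeSet<String>,
}

/// Distribute tasks round-robin (an assumption) over `parallelism`
/// partitions, then attach each task's equality delete files to the
/// partition that received the task.
fn partition_tasks(tasks: Vec<FileScanTask>, parallelism: usize) -> Vec<ScanPartition> {
    // Guard against a parallelism of zero.
    let n = parallelism.max(1);
    let mut partitions: Vec<ScanPartition> =
        (0..n).map(|_| ScanPartition::default()).collect();
    for (i, task) in tasks.into_iter().enumerate() {
        let p = &mut partitions[i % n];
        p.data_files.push(task.data_file);
        p.equality_deletes.extend(task.equality_deletes);
    }
    partitions
}
```

One consequence visible in this sketch is that an equality delete file referenced by data files in different partitions is attached to each of them, so it may be read more than once.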

### Willingness to contribute

Yes
