-
Notifications
You must be signed in to change notification settings - Fork 358
Description
Is your feature request related to a problem or challenge?
The current handling of equality deletes is done inside the file reader layer #630, where the delete keys are loaded and filtered row-by-row. However, this approach doesn’t support spilling to disk. While it’s possible to implement spill logic directly in the reader, but another direction is to use semi join in datafusion and we can utilize the spill disk ability of it.
Another potential benefit may be:
- Unified resource control (e.g. compute thread), as we move the filter compute to a join operator, the resource of this part will be control by datafusion compute engine
- Potential for physical plan optimization: By expressing equality deletes as part of the physical plan (e.g., via a semi join), it opens the door for cost-based optimizations, join reordering, and operator fusion.
I plan to conduct more performance experiments to evaluate whether this approach is worthwhile in practice, especially under large datasets and memory pressure and welcome any suggestion for this idea.
Describe the solution you'd like
When we call scan for TableProvider(convert the logical plan to physical plan), we return a execution plan like above: a semi hash join connect:
- equality delete scan
- data file with position delete file scan
The semi-hash join will only return the batches from the right side that do not have matches in the left side. We partition the files based on the parallelism level set by DataFusion. The rule is to first divide the file scan tasks according to the configured parallelism, and then assign the equality delete files to each partition based on the corresponding file scan tasks. This ensures that all relevant equality delete files are included in the same partition as the data they affect.
Willingness to contribute
Yes