-
Notifications
You must be signed in to change notification settings - Fork 43
chore(rust/sedona-spatial-join): Evaluate spatial predicate operands in EvaluateOperandBatchStream #521
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(rust/sedona-spatial-join): Evaluate spatial predicate operands in EvaluateOperandBatchStream #521
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR refactors the spatial join implementation to work with EvaluatedBatch streams instead of RecordBatch streams. The changes introduce a new EvaluateOperandBatchStream that evaluates spatial predicate operands and automatically compacts batches to optimize memory usage.
Changes:
- Introduces
compact_batchandcompact_arrayfunctions to reorganize payload buffers in view arrays - Creates
EvaluateOperandBatchStreamto evaluate geometry expressions and produceEvaluatedBatches - Updates
BuildSideBatchesCollectorandSpatialJoinStreamto consumeEvaluatedBatchstreams
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| rust/sedona-spatial-join/src/utils/arrow_utils.rs | Adds compaction utilities for view arrays to reduce memory usage |
| rust/sedona-spatial-join/src/stream.rs | Updates probe stream to use evaluated batches |
| rust/sedona-spatial-join/src/index/build_side_collector.rs | Refactors to consume evaluated batch streams and adds concurrent/sequential collection modes |
| rust/sedona-spatial-join/src/evaluated_batch/evaluated_batch_stream/in_mem.rs | Adds schema field to in-memory stream |
| rust/sedona-spatial-join/src/evaluated_batch/evaluated_batch_stream/evaluate.rs | New file implementing operand evaluation stream |
| rust/sedona-spatial-join/src/evaluated_batch/evaluated_batch_stream.rs | Adds schema method to trait |
| rust/sedona-spatial-join/src/build_index.rs | Simplifies partition collection logic |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
rust/sedona-spatial-join/src/evaluated_batch/evaluated_batch_stream/evaluate.rs
Show resolved
Hide resolved
2a08105 to
705bf1b
Compare
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
| if let Some(list_array) = array.as_any().downcast_ref::<ListArray>() { | ||
| let (new_values, mutated) = compact_array(list_array.values().clone())?; | ||
| if !mutated { | ||
| return Ok((array, false)); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a test for this and the below. Perhaps we could test these branches here or remove them and include this code as a suggested future improvement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added several test cases to cover the optimized case.
This refactor makes
BuildSideBatchesCollectorandSpatialJoinStreamwork with streams ofEvaluatedBatches instead of directly with streams ofRecordBatches. We will haveEvaluateBatches read directly by spill readers so adding this layer of abstraction make the main part of spatial join care less about whether the stream is directly from the source of read from spill files.The stream for the build-side automatically compact batches to avoid holding large sparse binary view arrays in memory. The
EvaluateOperandBatchStreamperforms the batch compaction automatically before evaluating the operands of spatial predicates.