-
Notifications
You must be signed in to change notification settings - Fork 43
chore(rust/sedona-spatial-join): Support right mark joins and enabled tests for all kinds of spatial joins #514
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds support for RightMark joins to the Rust spatial join implementation and enables previously commented-out tests for all spatial join types. The changes align the implementation with DataFusion 50.2.0.
Changes:
- Added RightMark join handling in join utilities to compute mark indices from the build side
- Updated function signatures to accept join_type parameter for proper mark column generation
- Enabled comprehensive spatial join tests for all join types (Semi, Anti, and Mark variants)
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| rust/sedona-spatial-join/src/utils/join_utils.rs | Added RightMark join logic, refactored bitmap building, updated to DataFusion 50.2.0, changed return type of asymmetric_join_output_partitioning |
| rust/sedona-spatial-join/src/stream.rs | Added join_type parameter to build_batch_from_indices calls |
| rust/sedona-spatial-join/src/exec.rs | Updated asymmetric_join_output_partitioning calls with error handling, enabled all join type tests, added mark join test implementation |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
This would also benefit from a Python integration/sanity for each join type while you're here.
| pub(crate) fn get_mark_indices<T: ArrowPrimitiveType, R: ArrowPrimitiveType>( | ||
| range: &Range<usize>, | ||
| input_indices: &PrimitiveArray<T>, | ||
| ) -> PrimitiveArray<R> | ||
| where | ||
| NativeAdapter<T>: From<<T as ArrowPrimitiveType>::Native>, | ||
| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would help (me + future llms) to have a comment on what exactly this is doing (outputting an array consisting of all zeroes where some of the values are null?).
Does it need to be generic on Range and the primitive type? (might be much easier to read if one can assume these are just int64s).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have commented this function. I'd like to keep the generic interface as-is, in order to be consistent with the original DataFusion code.
|
Python integration tests were added for SEMI/ANIT/MARK joins in 1051f5a. |
… tests for all kinds of spatial joins (apache#514) This patch updates join_utils.rs to support RightMark join. Spatial join tests for all join types were enabled.
This patch updates join_utils.rs to support RightMark join. Spatial join tests for all join types were enabled.