@Kontinuation
Member

This refactor makes BuildSideBatchesCollector and SpatialJoinStream work with streams of EvaluatedBatches instead of working directly with streams of RecordBatches. We will have EvaluatedBatches read directly by spill readers, so adding this layer of abstraction makes the main part of the spatial join care less about whether a stream comes directly from the source or is read from spill files.
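
To illustrate the idea, here is a minimal sketch of a consumer that only depends on an abstract source of evaluated batches. It is not the code from this PR: `EvaluatedBatchSource`, `EvaluatingSource`, `SpillReplaySource`, and `collect_build_side` are hypothetical names, the batch type is a simplified stand-in, and a synchronous pull interface replaces the async streams the real implementation presumably uses.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for an evaluated batch: the original rows plus the
/// already-evaluated geometry operand for each row.
#[derive(Debug, Clone, PartialEq)]
pub struct EvaluatedBatch {
    pub rows: Vec<String>,
    pub geometries: Vec<Vec<u8>>,
}

/// Consumers depend only on this trait, not on where the batches come from.
/// (Assumed name; a synchronous simplification of a batch stream.)
pub trait EvaluatedBatchSource {
    fn next_batch(&mut self) -> Option<Result<EvaluatedBatch, String>>;
}

/// Source 1: evaluates the operand on the fly for each raw input batch.
pub struct EvaluatingSource<F> {
    pub input: VecDeque<Vec<String>>,
    pub evaluate: F,
}

impl<F: Fn(&str) -> Vec<u8>> EvaluatedBatchSource for EvaluatingSource<F> {
    fn next_batch(&mut self) -> Option<Result<EvaluatedBatch, String>> {
        let rows = self.input.pop_front()?;
        let geometries = rows.iter().map(|r| (self.evaluate)(r)).collect();
        Some(Ok(EvaluatedBatch { rows, geometries }))
    }
}

/// Source 2: replays batches that were evaluated earlier (for example, after
/// being read back from a spill file), without re-evaluating anything.
pub struct SpillReplaySource {
    pub spilled: VecDeque<EvaluatedBatch>,
}

impl EvaluatedBatchSource for SpillReplaySource {
    fn next_batch(&mut self) -> Option<Result<EvaluatedBatch, String>> {
        self.spilled.pop_front().map(Ok)
    }
}

/// A collector that is agnostic about the origin of its batches.
pub fn collect_build_side(
    source: &mut dyn EvaluatedBatchSource,
) -> Result<Vec<EvaluatedBatch>, String> {
    let mut collected = Vec::new();
    while let Some(batch) = source.next_batch() {
        collected.push(batch?);
    }
    Ok(collected)
}
```

With this shape, the collector produces the same result whether it is handed an `EvaluatingSource` or a `SpillReplaySource` holding the same batches.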

The build-side stream automatically compacts batches to avoid holding large, sparse binary view arrays in memory. EvaluateOperandBatchStream performs this batch compaction before evaluating the operands of spatial predicates.
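
As a rough illustration of why compaction helps, the sketch below uses a toy model of a view array in which each view is an `(offset, len)` slice into one shared buffer. `ToyViewArray` and its methods are hypothetical and much simpler than Arrow's real view layout (which, among other things, inlines short values and can reference several buffers). The point is only that a few small views can keep a large buffer alive until the referenced bytes are copied into a fresh one.

```rust
/// Toy model of a view array: each view is an (offset, len) slice into a
/// single shared buffer. Not Arrow's actual layout.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyViewArray {
    pub buffer: Vec<u8>,
    pub views: Vec<(usize, usize)>,
}

impl ToyViewArray {
    /// Returns the bytes referenced by the i-th view.
    pub fn value(&self, i: usize) -> &[u8] {
        let (offset, len) = self.views[i];
        &self.buffer[offset..offset + len]
    }

    /// Copies only the referenced bytes into a fresh buffer and rewrites the
    /// views to point into it, so unreferenced bytes are no longer retained.
    pub fn compact(&self) -> ToyViewArray {
        let needed: usize = self.views.iter().map(|&(_, len)| len).sum();
        let mut buffer = Vec::with_capacity(needed);
        let mut views = Vec::with_capacity(self.views.len());
        for &(offset, len) in &self.views {
            views.push((buffer.len(), len));
            buffer.extend_from_slice(&self.buffer[offset..offset + len]);
        }
        ToyViewArray { buffer, views }
    }
}
```

In this model, an array with two views covering 5 bytes of a 1000-byte buffer shrinks to a 5-byte buffer after `compact()`, while every value still reads back unchanged.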

@Kontinuation Kontinuation requested a review from Copilot January 16, 2026 02:17
Contributor

Copilot AI left a comment

Pull request overview

This PR refactors the spatial join implementation to work with EvaluatedBatch streams instead of RecordBatch streams. The changes introduce a new EvaluateOperandBatchStream that evaluates spatial predicate operands and automatically compacts batches to optimize memory usage.

Changes:

  • Introduces compact_batch and compact_array functions to reorganize payload buffers in view arrays
  • Creates EvaluateOperandBatchStream to evaluate geometry expressions and produce EvaluatedBatches
  • Updates BuildSideBatchesCollector and SpatialJoinStream to consume EvaluatedBatch streams

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| rust/sedona-spatial-join/src/utils/arrow_utils.rs | Adds compaction utilities for view arrays to reduce memory usage |
| rust/sedona-spatial-join/src/stream.rs | Updates probe stream to use evaluated batches |
| rust/sedona-spatial-join/src/index/build_side_collector.rs | Refactors to consume evaluated batch streams and adds concurrent/sequential collection modes |
| rust/sedona-spatial-join/src/evaluated_batch/evaluated_batch_stream/in_mem.rs | Adds schema field to in-memory stream |
| rust/sedona-spatial-join/src/evaluated_batch/evaluated_batch_stream/evaluate.rs | New file implementing operand evaluation stream |
| rust/sedona-spatial-join/src/evaluated_batch/evaluated_batch_stream.rs | Adds schema method to trait |
| rust/sedona-spatial-join/src/build_index.rs | Simplifies partition collection logic |

@Kontinuation Kontinuation force-pushed the evaluated-batch-streams branch from 2a08105 to 705bf1b January 16, 2026 02:48
Member

@paleolimbot paleolimbot left a comment

Thank you!

Comment on lines +113 to +117
    if let Some(list_array) = array.as_any().downcast_ref::<ListArray>() {
        let (new_values, mutated) = compact_array(list_array.values().clone())?;
        if !mutated {
            return Ok((array, false));
        }
Member

I don't see a test for this branch or the ones below it. Perhaps we could test these branches here, or remove them and include this code as a suggested future improvement?

Member Author

I have added several test cases to cover the optimized case.
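
For readers following this thread, the pattern under discussion (recurse into a list's child values, and hand back the original array untouched when nothing was compacted) can be sketched with a toy type. This is an illustration only: `ToyArray` and `compact_toy` are hypothetical, use `Rc` in place of Arrow's array references, and do not reproduce the PR's `compact_array` or its tests.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for a nested array.
#[derive(Debug, PartialEq)]
pub enum ToyArray {
    /// View-like leaf: `used` live bytes held in a buffer of `capacity` bytes.
    View { used: usize, capacity: usize },
    /// Plain leaf with nothing to compact.
    Int(Vec<i64>),
    /// List wrapper around child values.
    List(Rc<ToyArray>),
}

/// Returns the (possibly rebuilt) array and whether anything was compacted.
/// When nothing changed, the original `Rc` is returned without copying.
pub fn compact_toy(array: Rc<ToyArray>) -> (Rc<ToyArray>, bool) {
    let replacement = match array.as_ref() {
        // A sparse view leaf is rebuilt with a tightly sized buffer.
        ToyArray::View { used, capacity } if used < capacity => {
            Some(Rc::new(ToyArray::View { used: *used, capacity: *used }))
        }
        // A list is rebuilt only if compacting its child values changed them.
        ToyArray::List(values) => {
            let (new_values, mutated) = compact_toy(Rc::clone(values));
            if mutated {
                Some(Rc::new(ToyArray::List(new_values)))
            } else {
                None
            }
        }
        // Dense views and plain leaves are left alone.
        _ => None,
    };
    match replacement {
        Some(new_array) => (new_array, true),
        None => (array, false),
    }
}
```

A test for the list branch would then cover both outcomes: a list over dense values should come back as the same allocation with `mutated == false`, and a list over a sparse view should come back rebuilt with `mutated == true`.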

@Kontinuation Kontinuation merged commit f251ab4 into apache:main Jan 16, 2026
15 checks passed
@paleolimbot paleolimbot added this to the 0.3.0 milestone Jan 26, 2026

2 participants