
feat(rust/sedona): Auto-configure spilled batch in-memory size threshold based on global memory limit #680

Merged
paleolimbot merged 2 commits into apache:main from Kontinuation:auto-configure-spill-batch-size-threshold on Mar 4, 2026


@Kontinuation (Member) commented Mar 3, 2026

Summary

  • Automatically configure the spilled_batch_in_memory_size_threshold based on the global memory limit, setting it to 5% of the per-partition memory limit (with a minimum of 10MB).
  • This is a by-product of writing the documentation for configuring memory management and spilling (docs: add memory management and spill configuration guide #679). We decided to configure this option automatically rather than require users to derive it manually from the global memory limit.
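As a rough worked form of the rule in the first bullet: the PR states only "5% of the per-partition memory limit (with a minimum of 10MB)". The sketch below assumes the per-partition limit is the global limit $M$ divided evenly by the number of target partitions $P$, and it does not settle whether "10MB" means $10^7$ or $10 \cdot 2^{20}$ bytes. The symbols $M$, $P$, and $T$ are introduced here for illustration only.

```latex
T = \max\!\left(0.05 \cdot \frac{M}{P},\; 10\,\mathrm{MB}\right)

% Example 1: M = 8 GiB, P = 16
%   M / P = 512 MiB,  0.05 * 512 MiB = 25.6 MiB  >  10 MB   =>  T = 25.6 MiB
% Example 2: M = 1 GiB, P = 16
%   M / P = 64 MiB,   0.05 * 64 MiB  = 3.2 MiB   <  10 MB   =>  T = 10 MB
```

Under these assumptions the 10MB floor only takes effect when the per-partition limit falls below roughly 200MB.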

Tests

We tested spatial join queries in SpatialBench at SF=10 and SF=100, as well as spatial joins over OvertureMaps datasets (address, building, landuse), and found that this default value worked well. Users can still override the default with SET sedona.spatial_join.spilled_batch_in_memory_size_threshold = <custom_value>.

@Kontinuation Kontinuation requested a review from Copilot March 4, 2026 05:59
@Kontinuation Kontinuation marked this pull request as ready for review March 4, 2026 06:02
Copilot AI (Contributor) left a comment

Pull request overview

Automatically derives Sedona’s spatial-join spill batching threshold from the configured DataFusion global memory limit, to reduce manual tuning when enabling spilling/memory limits.

Changes:

  • Reads RuntimeEnv memory pool limit and computes a per-partition limit using target_partitions.
  • Sets sedona.spatial_join.spilled_batch_in_memory_size_threshold to 5% of per-partition memory (min 10MB) when the memory limit is finite.
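The two changes listed above could be sketched roughly as follows. This is not the code merged in the PR: the `MemoryLimit` enum is a local stand-in for whatever the memory pool reports, the function name `auto_spill_threshold` is hypothetical, even division by `target_partitions` is an assumption, and "10MB" is taken to mean 10 MiB.

```rust
/// Local stand-in for the limit reported by a memory pool
/// (hypothetical type defined only for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
enum MemoryLimit {
    /// No limit configured.
    Infinite,
    /// A finite limit in bytes.
    Finite(usize),
}

/// Assumed floor for the threshold: 10 MiB (the PR says "10MB").
const MIN_THRESHOLD_BYTES: usize = 10 * 1024 * 1024;

/// Returns the derived threshold in bytes, or `None` when the limit is not
/// finite, in which case the caller would leave the existing default alone.
fn auto_spill_threshold(limit: MemoryLimit, target_partitions: usize) -> Option<usize> {
    match limit {
        MemoryLimit::Finite(global_bytes) => {
            // Guard against a zero partition count to avoid dividing by zero.
            let partitions = target_partitions.max(1);
            // Assumption: the global limit is split evenly across partitions.
            let per_partition = global_bytes / partitions;
            // 5% of the per-partition limit, using integer division.
            let five_percent = per_partition / 20;
            Some(five_percent.max(MIN_THRESHOLD_BYTES))
        }
        MemoryLimit::Infinite => None,
    }
}
```

Returning `Option` keeps the "only when the memory limit is finite" condition explicit at the call site; an explicit user setting would still take precedence over whatever this returns.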


@paleolimbot (Member) left a comment

Thank you!

@paleolimbot (Member) commented

Merging so I can do a few tests!

@paleolimbot paleolimbot merged commit 4ea183f into apache:main Mar 4, 2026
17 checks passed
paleolimbot pushed a commit that referenced this pull request Mar 5, 2026
