Disable memory-limited NLJ fallback for left-emitting joins with multi-partition right inputs #22671
Draft
kosiew wants to merge 3 commits into
…regression tests
- Disabled NLJ spill fallback when right_partition_count > 1 and when the join requires final left-side emission (Left, LeftSemi, LeftAnti, LeftMark, Full).
- Added rationale comment to clarify the changes.
- Introduced regression tests for:
  - Multi-partition LEFT join with tight memory resulting in OOM.
  - Single-partition LEFT join with tight memory which still spills but succeeds.
- Updated overalloc test join scopes accordingly.

- Added private test helper `assert_multi_partition_join_oom`
- Replaced duplicate OOM assertion blocks
- Renamed `_columns` and `_batches` to `_`

… and improved caller parameters
- Enhanced assert_multi_partition_join_oom to create a fresh constrained TaskContext.
- Updated callers to pass memory_limit and batch_size parameters.
- Removed duplicate context setup in tests for better clarity and efficiency.
Which issue does this PR close?
Rationale for this change
The memory-limited NestedLoopJoin fallback path can produce incorrect results for joins that require final left-side emission when the right side has multiple partitions.
The fallback implementation tracks match state locally for each probe partition. Without global coordination across all right partitions, a partition may incorrectly emit unmatched, semi, anti, or mark results for left rows that actually match in another partition. This can lead to incorrect query results under memory pressure.
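The failure mode described above can be illustrated with a toy model. This is a minimal sketch, not the operator's actual code: it assumes an equality join on `i32` keys, and the function names (`local_matches`, `unmatched_per_partition`, `unmatched_global`) are hypothetical. It contrasts emitting unmatched left rows from each partition's local match state with emitting them once after combining match state across all right partitions.

```rust
/// For one right partition, mark which left rows found at least one match.
/// (Toy equality predicate; the real operator evaluates an arbitrary join filter.)
fn local_matches(left: &[i32], right_partition: &[i32]) -> Vec<bool> {
    left.iter().map(|l| right_partition.contains(l)).collect()
}

/// Unsafe behaviour: every partition emits the left rows that are unmatched
/// according to its own local state only.
fn unmatched_per_partition(left: &[i32], right_partitions: &[Vec<i32>]) -> Vec<i32> {
    let mut out = Vec::new();
    for part in right_partitions {
        let matched = local_matches(left, part);
        out.extend(
            left.iter()
                .zip(&matched)
                .filter(|(_, m)| !**m)
                .map(|(l, _)| *l),
        );
    }
    out
}

/// Coordinated behaviour: OR the match state across all partitions first,
/// then emit unmatched left rows exactly once.
fn unmatched_global(left: &[i32], right_partitions: &[Vec<i32>]) -> Vec<i32> {
    let mut matched = vec![false; left.len()];
    for part in right_partitions {
        for (global, local) in matched.iter_mut().zip(local_matches(left, part)) {
            *global |= local;
        }
    }
    left.iter()
        .zip(&matched)
        .filter(|(_, m)| !**m)
        .map(|(l, _)| *l)
        .collect()
}
```

With left keys `[1, 2, 3]` and right partitions `[1]` and `[2]`, the per-partition version reports rows 1 and 2 as unmatched even though each matches in the other partition, and reports row 3 twice. The coordinated version reports only row 3.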
To preserve correctness, this change disables the memory-limited fallback for join types that depend on final left-side emission when the right side is partitioned. In these cases, the operator will continue to fail with resource exhaustion rather than produce incorrect output.
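The shape of this guard can be sketched as follows. This is an illustration under assumptions, not the PR's diff: the `JoinType` enum here is a local stand-in, `need_produce_result_in_final` is reimplemented from the join types the PR lists (Left, LeftSemi, LeftAnti, LeftMark, Full), and `fallback_allowed` is a hypothetical name for the decision.

```rust
/// Local stand-in for the engine's join type enum (assumption for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum JoinType {
    Inner,
    Left,
    Right,
    Full,
    LeftSemi,
    RightSemi,
    LeftAnti,
    RightAnti,
    LeftMark,
    RightMark,
}

/// True for join types that emit left-side rows only after the whole right
/// side has been seen. Reimplemented here from the list given in the PR.
fn need_produce_result_in_final(join_type: JoinType) -> bool {
    matches!(
        join_type,
        JoinType::Left
            | JoinType::LeftSemi
            | JoinType::LeftAnti
            | JoinType::LeftMark
            | JoinType::Full
    )
}

/// Hypothetical helper: may the memory-limited fallback be used?
fn fallback_allowed(join_type: JoinType, right_partition_count: usize) -> bool {
    // Unsafe combination: final left-side emission with more than one right
    // partition and no cross-partition left-match tracking.
    let left_final_multi_partition =
        need_produce_result_in_final(join_type) && right_partition_count > 1;
    !left_final_multi_partition
}
```

Under this sketch a `Left` join with four right partitions is refused the fallback, while the same join with a single right partition, or an `Inner` join with any partition count, is still allowed to use it.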
What changes are included in this PR?
- Generalize the existing FULL JOIN fallback guard to cover all joins that require final left-side emission when `right_partition_count > 1`.
- Replace the previous `full_join_multi_partition` check with a `left_final_multi_partition` check based on `need_produce_result_in_final(join_type)`.
- Update comments to document why memory-limited fallback is unsafe for these join types without cross-partition left-match tracking.
- Adjust memory-limited join tests so that:
  - `LEFT`, `LEFT SEMI`, `LEFT ANTI`, `LEFT MARK`, and `FULL` joins verify that fallback remains disabled and returns a resource exhaustion error under tight memory limits.
- Add a dedicated regression test:
  - `test_nlj_memory_limited_left_join_multi_partition_fallback_disabled`
Are these changes tested?
Yes.
The following tests were added or updated:
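Added:
- `test_nlj_memory_limited_left_join_multi_partition_fallback_disabled`
- `assert_multi_partition_join_oom`
Updated memory-limited fallback coverage to verify:
- Join types that do not require final left-side emission (`Inner`, `Right`, `RightSemi`, `RightAnti`, `RightMark`).
- Join types that require final left-side emission (`Left`, `LeftSemi`, `LeftAnti`, `LeftMark`, `Full`).
Are there any user-facing changes?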
Yes.
Under memory pressure, NestedLoopJoin will no longer attempt the memory-limited fallback path for joins that require final left-side emission when the right side has multiple partitions. Instead of potentially returning incorrect results, these queries will fail with a resource exhaustion error until cross-partition coordination is implemented.
This change prioritizes correctness and ensures join results do not vary based on memory availability.
LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.