[SPARK-46653][SQL] Code-gen for full outer sort merge join output row by row #44660
zml1206 wants to merge 2 commits into apache:master from
@cloud-fan Can you help take a look if you have time? Thanks.
Why do you need to replace the for loop with a while loop?
It makes the control flow easier: the index has to be incremented (+1) in different places.
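A small, hypothetical Java sketch of the loop-control point (it is not the PR's generated code, which is only available through the linked gists). When a scan over buffered rows must return after each emitted row and resume on the next call, the cursor has to live in a field, and a while loop lets it be advanced explicitly in each branch. With `for (; index < n; index++) { if (match) return index; }` the `return` skips the `index++` update clause, so the next call would return the same row again. The names `ResumableMatchScan`, `nextMatch` and `reset` are made up for illustration; `index` only loosely plays the role of a cursor such as `smj_leftIndex_0`.

```java
// Hypothetical sketch (not Spark code): a scan over buffered keys that returns
// one match per call and resumes where the previous call stopped.
final class ResumableMatchScan {
    private final int[] bufferedKeys;
    private int index = 0; // cursor kept across calls (assumed analogue of a codegen index field)

    ResumableMatchScan(int[] bufferedKeys) {
        this.bufferedKeys = bufferedKeys;
    }

    // Returns the position of the next buffered key equal to `key`, or -1 when exhausted.
    int nextMatch(int key) {
        while (index < bufferedKeys.length) {
            if (bufferedKeys[index] == key) {
                return index++; // emit this row; the next call resumes at the following one
            }
            index++;            // skip a non-matching row
        }
        return -1;
    }

    // Rewinds the cursor, e.g. before scanning the buffer again for a new streamed row.
    void reset() {
        index = 0;
    }
}
```

For example, scanning `{5, 7, 5, 5}` for key 5 returns positions 0, 2 and 3 on successive calls, then -1; the cursor only goes back to the start when `reset()` is called explicitly.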
cc @cloud-fan @wankunde @ulysses-you, do you have any thoughts about this? Thanks.
Could we wrap "$matchRowsInBuffer" as a function? There is too much duplicated code.
Done, generated code in PR description also updated.
Sorry, where is smj_leftIndex_0 reset to 0?
Whenever
Fix oom by code-gen for full outer sort merge join
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
What changes were proposed in this pull request?
To be consistent with the behavior when code-gen is disabled, update the code-gen for full outer sort merge join so that it outputs rows one by one.
For example:
Before this PR, the generated code: https://gist.github.com/zml1206/aff18fc313a7164d6f65096a97d233eb
After this PR: https://gist.github.com/zml1206/a27350b8849951e6efac0fb6088e527f
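A minimal, hypothetical sketch of what "output row by row" means for a full outer merge join. It is not the PR's generated code; it assumes two in-memory, already-sorted arrays of non-null int keys, no extra (non-equi) join condition, and joined rows represented as `Integer[] {leftKey, rightKey}` with `null` for the side that has no match. The class name `RowByRowFullOuterJoin` is made up. The state needed to resume (the bounds of the current duplicate-key groups and two cursors inside them) lives in fields, so each `next()` call returns a single row instead of first materializing every combination for a duplicate key.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch (not Spark code): full outer merge join of two sorted,
// non-null int key arrays that returns one joined row per next() call.
// A joined row is {leftKey, rightKey}; null marks the side without a match.
final class RowByRowFullOuterJoin implements Iterator<Integer[]> {
    private final int[] left;
    private final int[] right;
    private int li = 0;        // next unconsumed position in left
    private int ri = 0;        // next unconsumed position in right
    private int leftEnd = -1;  // exclusive end of the active left key group; -1 = none
    private int rightEnd = -1; // exclusive end of the active right key group
    private int i;             // cursor over the active left group
    private int j;             // cursor over the active right group

    RowByRowFullOuterJoin(int[] left, int[] right) {
        this.left = left;
        this.right = right;
    }

    @Override
    public boolean hasNext() {
        return leftEnd >= 0 || li < left.length || ri < right.length;
    }

    @Override
    public Integer[] next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        if (leftEnd < 0) {
            // No active key group: emit an unmatched row, or open a new group.
            if (ri >= right.length || (li < left.length && left[li] < right[ri])) {
                return new Integer[] {left[li++], null};
            }
            if (li >= left.length || right[ri] < left[li]) {
                return new Integer[] {null, right[ri++]};
            }
            // Equal keys: only record the bounds of both duplicate runs.
            leftEnd = li;
            while (leftEnd < left.length && left[leftEnd] == left[li]) {
                leftEnd++;
            }
            rightEnd = ri;
            while (rightEnd < right.length && right[rightEnd] == right[ri]) {
                rightEnd++;
            }
            i = li;
            j = ri;
        }
        // Emit exactly one pair of the key group's cross product.
        Integer[] row = {left[i], right[j]};
        j++;
        if (j == rightEnd) { // finished one left row: rewind the right cursor
            j = ri;
            i++;
        }
        if (i == leftEnd) {  // finished the whole group: move past it
            li = leftEnd;
            ri = rightEnd;
            leftEnd = -1;
            rightEnd = -1;
        }
        return row;
    }
}
```

For example, joining `{1, 2, 2, 4}` with `{2, 2, 3, 4}` yields `[1, null]`, four `[2, 2]` rows, `[null, 3]` and `[4, 4]`, one per call. In this sketch only the row being returned is allocated per call, whereas appending a whole key group to a list first holds n × m rows for a key with n left and m right duplicates (for 100,000 duplicates on each side, 10^10 rows).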
Why are the changes needed?
Avoid OOM. When code-gen for full outer sort merge join is enabled and the parent of
SortMergeJoin cannot be code-generated, full outer sort merge join has to append all output rows for the same key to BufferedRowIterator.currentRows, whose type is LinkedList. If there are a large number of duplicate keys, this is likely to cause an executor OOM.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing UTs and a local test.
The local test ran in local mode with 1G of driver memory.
Before this PR, it fails with OOM.
After this PR, it passes.
Was this patch authored or co-authored using generative AI tooling?
No.