[SPARK-35767] [SQL] Avoid executing child plan twice in CoalesceExec #32920
ok to test
dongjoon-hyun
+1, LGTM. This is a very old bug introduced by SPARK-22238 at 2.3.0.
Nice catch, @andygrove .
cc @cloud-fan
+1 changes lgtm.
Kubernetes integration test starting
Thank you, @andygrove and @tgravescs.
### What changes were proposed in this pull request?
`CoalesceExec` needlessly calls `child.execute` twice when it could just call it once and re-use the results. This only happens when `numPartitions == 1`.
### Why are the changes needed?
It is more efficient to execute the child plan once rather than twice.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
There are no functional changes. This is just a performance optimization, so the existing tests should be sufficient to catch any regressions.
Closes #32920 from andygrove/coalesce-exec-executes-twice.
Authored-by: Andy Grove <andygrove73@gmail.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
(cherry picked from commit 1012967)
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Kubernetes integration test status success
Test build #139820 has finished for PR 32920 at commit
late LGTM
What changes were proposed in this pull request?
`CoalesceExec` needlessly calls `child.execute` twice when it could just call it once and re-use the results. This only happens when `numPartitions == 1`.
Why are the changes needed?
It is more efficient to execute the child plan once rather than twice.
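To illustrate the double execution, here is a minimal self-contained sketch. It is not the Spark source: `FakeRdd`, `CountingChild`, and `CoalesceSketch` are hypothetical stand-ins invented for this example, and the shape of the condition is an assumption based on the description above. It only shows how calling `execute()` both in a condition and in a branch runs the child twice, and how binding the result to a `val` avoids that.

```scala
// Hypothetical stand-in for an RDD; only tracks a partition count.
final case class FakeRdd(numPartitions: Int) {
  // Coalescing without a shuffle cannot increase the partition count.
  def coalesce(n: Int): FakeRdd = FakeRdd(math.min(n, numPartitions))
}

// Hypothetical stand-in for a child plan that counts how often it is executed.
final class CountingChild(partitions: Int) {
  var executeCalls: Int = 0
  def execute(): FakeRdd = {
    executeCalls += 1
    FakeRdd(partitions)
  }
}

object CoalesceSketch {
  // Pattern described as the problem: execute() appears in the condition
  // and again in the else branch. Because && short-circuits, the first
  // call only happens when numPartitions == 1.
  def executeTwice(child: CountingChild, numPartitions: Int): FakeRdd =
    if (numPartitions == 1 && child.execute().numPartitions < 1) FakeRdd(1)
    else child.execute().coalesce(numPartitions)

  // Pattern described as the fix: execute once and reuse the result.
  def executeOnce(child: CountingChild, numPartitions: Int): FakeRdd = {
    val rdd = child.execute()
    if (numPartitions == 1 && rdd.numPartitions < 1) FakeRdd(1)
    else rdd.coalesce(numPartitions)
  }
}
```

In this sketch, `executeTwice` with `numPartitions == 1` and a non-empty child increments `executeCalls` twice, while `executeOnce` increments it once and returns the same result.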
Does this PR introduce any user-facing change?
No.
How was this patch tested?
There are no functional changes. This is just a performance optimization, so the existing tests should be sufficient to catch any regressions.