[SPARK-16331] [SQL] Reduce code generation time #14000
ok to test
case nonChild: AnyRef => nonChild
case null => null
if (changed) makeCopy(newArgs) else this
}
a small style nit:
} else {
this
}
LGTM other than the small nit. The actual diff is very small: https://github.com/apache/spark/pull/14000/files?w=1
Test build #61567 has finished for PR 14000 at commit
I fixed the coding style issue.
Test build #61586 has finished for PR 14000 at commit
LGTM. good catch!
Merging in master. Thanks.
What changes were proposed in this pull request?
During code generation, a LocalRelation often has a huge Vector object as data. In the simple example below, a LocalRelation has a Vector with 1000000 elements of UnsafeRow.

At TreeNode.transformChildren, all elements of the vector are unnecessarily iterated to check whether any children exist in the vector, since Vector is Traversable. This part significantly increases code generation time.

This patch avoids this overhead by checking the number of children before iterating over all elements; LocalRelation does not have children since it extends LeafNode.

The performance of the above example
How was this patch tested?
Using existing unit tests.
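
To illustrate the early exit described under "What changes were proposed in this pull request?", here is a minimal, self-contained sketch. It is not Spark's actual TreeNode code: Node, Leaf, Wrapper, args, withArgs and the inspected counter are hypothetical names invented for this illustration, and the real implementation works on Product arguments via makeCopy. The only point shown is that checking for children first means a leaf's large data Vector is never walked.

```scala
object TransformChildrenSketch {
  // Demonstration-only counter of how many argument elements were inspected.
  var inspected = 0L

  // Hypothetical, simplified stand-in for a Catalyst-style tree node.
  sealed trait Node {
    def children: Seq[Node]
    // Constructor arguments, loosely analogous to a case class's product elements.
    def args: Seq[Any]
    def withArgs(newArgs: Seq[Any]): Node

    def transformChildren(rule: Node => Node): Node = {
      if (children.isEmpty) {
        // Early exit: with no children, no argument (including a huge
        // data Vector) has to be examined at all.
        this
      } else {
        var changed = false
        def visit(arg: Any): Any = arg match {
          case n: Node =>
            inspected += 1
            val replaced = rule(n)
            // Reference comparison keeps the check cheap for large nodes.
            if (!(replaced eq n)) changed = true
            replaced
          case xs: Iterable[_] => xs.map(visit)
          case other =>
            inspected += 1
            other
        }
        val newArgs = args.map(visit)
        if (changed) {
          withArgs(newArgs)
        } else {
          this
        }
      }
    }
  }

  // Leaf carrying a potentially huge Vector, in the spirit of LocalRelation.
  final case class Leaf(data: Vector[Int]) extends Node {
    def children: Seq[Node] = Nil
    def args: Seq[Any] = Seq(data)
    def withArgs(newArgs: Seq[Any]): Node =
      Leaf(newArgs.head.asInstanceOf[Vector[Int]])
  }

  // Node with exactly one child.
  final case class Wrapper(child: Node) extends Node {
    def children: Seq[Node] = Seq(child)
    def args: Seq[Any] = Seq(child)
    def withArgs(newArgs: Seq[Any]): Node =
      Wrapper(newArgs.head.asInstanceOf[Node])
  }
}
```

In this sketch, calling transformChildren on a Leaf returns the same instance without touching its data, while a Wrapper still has its single child rewritten by the rule.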