[SPARK-22263][SQL] Refactor deterministic as lazy value #19478
Test build #82664 has finished for PR 19478 at commit
@@ -79,7 +79,9 @@ abstract class Expression extends TreeNode[Expression] {
   * An example would be `SparkPartitionID` that relies on the partition id returned by TaskContext.
   * By default leaf expressions are deterministic as Nil.forall(_.deterministic) returns true.
   */
-  def deterministic: Boolean = children.forall(_.deterministic)
+  lazy val deterministic: Boolean = isDeterministic
I doubt this saves much time. But why not just:
lazy val deterministic: Boolean = children.forall(_.deterministic)
I think that is equivalent to this change.
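To make the difference between the two declarations concrete, here is a minimal sketch of how a `def` and a `lazy val` behave when queried repeatedly on a small tree. `DefExpr`, `LazyExpr`, and the evaluation counters are hypothetical stand-ins written for illustration; they are not Spark's `Expression` class.

```scala
// Hypothetical, simplified stand-ins for an expression tree; not Spark's actual classes.
object DeterministicSketch {
  // Counters recording how many times each variant's body is evaluated.
  var defEvaluations = 0
  var lazyEvaluations = 0

  // `def`: the children are traversed again on every call.
  final class DefExpr(val children: Seq[DefExpr]) {
    def deterministic: Boolean = {
      defEvaluations += 1
      children.forall(_.deterministic)
    }
  }

  // `lazy val`: the body runs once per node, then the result is cached.
  final class LazyExpr(val children: Seq[LazyExpr]) {
    lazy val deterministic: Boolean = {
      lazyEvaluations += 1
      children.forall(_.deterministic)
    }
  }

  // Queries the root of a three-node tree ten times with each variant
  // and returns (evaluations with def, evaluations with lazy val).
  def run(): (Int, Int) = {
    defEvaluations = 0
    lazyEvaluations = 0
    val defRoot = new DefExpr(Seq(new DefExpr(Nil), new DefExpr(Nil)))
    val lazyRoot = new LazyExpr(Seq(new LazyExpr(Nil), new LazyExpr(Nil)))
    (1 to 10).foreach { _ =>
      defRoot.deterministic
      lazyRoot.deterministic
    }
    (defEvaluations, lazyEvaluations)
  }
}
```

With a root and two leaves, the `def` variant evaluates its body on every node for every call, while the `lazy val` variant evaluates each node once. Whether this matters in practice depends on how often the optimizer queries the flag, which is what the benchmark request later in the thread is about.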
+1
@gengliangwang do you have any benchmark that shows that this is a performance bottleneck?
Test build #82667 has finished for PR 19478 at commit
Compare the total optimization time for the TPC-DS queries?
@viirya @hvanhovell @gatorsmile Thanks, I have attached the performance result in the description of this PR.
LGTM cc @JoshRosen
Test build #82698 has finished for PR 19478 at commit
Ok. The performance result looks good. LGTM.
Thanks! Merged to master.
What changes were proposed in this pull request?
The method `deterministic` is frequently called in the optimizer. Refactor `deterministic` as a lazy value in order to avoid redundant computations.
How was this patch tested?
Simple benchmark over TPC-DS queries, measuring the run time from query string to optimized plan (20 consecutive runs, averaging the last 5 results):
Before changes: 12601 ms
After changes: 11993 ms
This is a 4.8% performance improvement.
Also ran the unit tests.
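The benchmark code itself is not included in this PR, so the following is only a rough sketch of the measurement procedure described above: run a workload 20 times, average the last 5 timings, and compute the relative improvement. `averageOfLastRuns` and `improvementPercent` are hypothetical helper names, and the workload passed in is a placeholder for the query-string-to-optimized-plan step.

```scala
// Hypothetical timing harness; the PR does not show the code that was actually used.
object BenchmarkSketch {
  // Runs `body` `runs` times in a row and returns the average wall-clock
  // time in milliseconds of the last `keep` runs (earlier runs act as warm-up).
  def averageOfLastRuns(runs: Int, keep: Int)(body: => Unit): Double = {
    val timingsMs = (1 to runs).map { _ =>
      val start = System.nanoTime()
      body
      (System.nanoTime() - start) / 1e6
    }
    timingsMs.takeRight(keep).sum / keep
  }

  // Relative improvement of `afterMs` over `beforeMs`, in percent.
  def improvementPercent(beforeMs: Double, afterMs: Double): Double =
    (beforeMs - afterMs) / beforeMs * 100.0
}
```

For the figures reported above, `BenchmarkSketch.improvementPercent(12601, 11993)` evaluates to roughly 4.8, matching the stated improvement. A call such as `BenchmarkSketch.averageOfLastRuns(20, 5) { /* optimize all queries */ }` would mirror the "20 runs, average of last 5" procedure.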