Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-16331] [SQL] Reduce code generation time #14000

Closed
wants to merge 3 commits into from

Conversation

inouehrs
Copy link
Contributor

What changes were proposed in this pull request?

During the code generation, a LocalRelation often has a huge Vector object as data. In the simple example below, a LocalRelation has a Vector with 1000000 elements of UnsafeRow.

val numRows = 1000000
val ds = (1 to numRows).toDS().persist()
benchmark.addCase("filter+reduce") { iter =>
  ds.filter(a => (a & 1) == 0).reduce(_ + _)
}

At TreeNode.transformChildren, all elements of the vector is unnecessarily iterated to check whether any children exist in the vector since Vector is Traversable. This part significantly increases code generation time.

This patch avoids this overhead by checking the number of children before iterating all elements; LocalRelation does not have children since it extends LeafNode.

The performance of the above example

without this patch
Java HotSpot(TM) 64-Bit Server VM 1.8.0_91-b14 on Mac OS X 10.11.5
Intel(R) Core(TM) i5-5257U CPU @ 2.70GHz
compilationTime:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
filter+reduce                                 4426 / 4533          0.2        4426.0       1.0X

with this patch
compilationTime:                         Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
filter+reduce                                 3117 / 3391          0.3        3116.6       1.0X

How was this patch tested?

using existing unit tests

@marmbrus
Copy link
Contributor

ok to test

case nonChild: AnyRef => nonChild
case null => null
if (changed) makeCopy(newArgs) else this
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a small style nit:

} else {
  this
}

@rxin
Copy link
Contributor

rxin commented Jun 30, 2016

LGTM other than the small nit.

The actual diff is very small: https://github.com/apache/spark/pull/14000/files?w=1

@SparkQA
Copy link

SparkQA commented Jun 30, 2016

Test build #61567 has finished for PR 14000 at commit 153e170.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@inouehrs
Copy link
Contributor Author

inouehrs commented Jul 1, 2016

I fixed the coding style issue.

@SparkQA
Copy link

SparkQA commented Jul 1, 2016

Test build #61586 has finished for PR 14000 at commit 639090d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Copy link
Member

viirya commented Jul 1, 2016

LGTM. good catch!

@rxin
Copy link
Contributor

rxin commented Jul 1, 2016

Merging in master. Thanks.

@asfgit asfgit closed this in 14cf61e Jul 1, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants