[SPARK-15742][SQL] Reduce temp collections allocations in TreeNode transform methods #13484
Conversation
LGTM. How big of a difference in the benchmarks?
About 20%, give or take.
Can you include some perf data in the description?
Updated.
Test build #59901 has finished for PR 13484 at commit
I want to do more benchmarking to investigate something, so don't merge yet.
Force-pushed da79ec6 to b3e63bb (compare)
val arr = Array.ofDim[B](productArity)
var i = 0
while (i < arr.length) {
  arr(i) = f(productElement(i))
  i += 1
}
It turns out that `productIterator` maintains its own internal counter and calls `productElement(i)` itself:
/** An iterator over all the elements of this product.
* @return in the default implementation, an `Iterator[Any]`
*/
def productIterator: Iterator[Any] = new scala.collection.AbstractIterator[Any] {
private var c: Int = 0
private val cmax = productArity
def hasNext = c < cmax
def next() = { val result = productElement(c); c += 1; result }
}
Calling `productElement` ourselves here avoids another layer of object allocation and buys a bit of additional perf boost.
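To make the allocation difference concrete, here is a standalone sketch (illustrative only, not Spark code): the `productIterator` path allocates an extra `Iterator` object per call (plus an `ArrayBuilder` when combined with `.toArray`), while the indexed loop allocates only the result array, whose size is already known from `productArity`.

```scala
// Hypothetical example type; any case class is a Product.
case class Point(x: Int, y: Int, z: Int)

// Iterator-based path: allocates an Iterator and an ArrayBuilder.
def viaIterator(p: Product): Array[Any] =
  p.productIterator.toArray

// Indexed path: allocates only the final array, sized up-front.
def viaIndexing(p: Product): Array[Any] = {
  val arr = Array.ofDim[Any](p.productArity)
  var i = 0
  while (i < arr.length) {
    arr(i) = p.productElement(i)
    i += 1
  }
  arr
}
```

Both functions produce the same elements in the same order; only the intermediate allocations differ.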
LGTM too. This will probably be even more of an improvement once we get the other bottlenecks fixed.
Test build #59946 has finished for PR 13484 at commit
LGTM2
Alright, I'm going to merge this into master and 2.0.
…ansform methods

In Catalyst's TreeNode transform methods we end up calling `productIterator.map(...).toArray` in a number of places, which is slightly inefficient because it needs to allocate an `ArrayBuilder` and grow a temporary array. Since we already know the size of the final output (`productArity`), we can simply allocate an array up-front and use a while loop to consume the iterator and populate the array.

For most workloads, this performance difference is negligible but it does make a measurable difference in optimizer performance for queries that operate over very wide schemas (such as the benchmark queries in #13456).

### Perf results (from #13456 benchmarks)

**Before**

```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_66-b17 on Mac OS X 10.10.5
Intel(R) Core(TM) i7-4960HQ CPU 2.60GHz

parsing large select:               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
1 select expressions                       19 /   22          0.0    19119858.0       1.0X
10 select expressions                      23 /   25          0.0    23208774.0       0.8X
100 select expressions                     55 /   73          0.0    54768402.0       0.3X
1000 select expressions                   229 /  259          0.0   228606373.0       0.1X
2500 select expressions                   530 /  554          0.0   529938178.0       0.0X
```

**After**

```
parsing large select:               Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
1 select expressions                       15 /   21          0.0    14978203.0       1.0X
10 select expressions                      22 /   27          0.0    22492262.0       0.7X
100 select expressions                     48 /   64          0.0    48449834.0       0.3X
1000 select expressions                   189 /  208          0.0   189346428.0       0.1X
2500 select expressions                   429 /  449          0.0   428943897.0       0.0X
```

Author: Josh Rosen <joshrosen@databricks.com>

Closes #13484 from JoshRosen/treenode-productiterator-map.

(cherry picked from commit e526913)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
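The pattern described in the commit message can be sketched as a small standalone helper. The name `mapProductIterator` mirrors the helper this PR adds to TreeNode, but the code below is an illustration under that assumption, not Spark's actual implementation: it allocates the output array at its final size (`productArity`) and fills it with a while loop instead of calling `productIterator.map(f).toArray`.

```scala
import scala.reflect.ClassTag

// Illustrative sketch (not Spark's actual code): map a function over a
// Product's elements without the Iterator/ArrayBuilder allocations of
// productIterator.map(f).toArray.
def mapProductIterator[B: ClassTag](p: Product)(f: Any => B): Array[B] = {
  val arr = Array.ofDim[B](p.productArity) // exact size known up-front
  var i = 0
  while (i < arr.length) {
    arr(i) = f(p.productElement(i)) // index directly, no Iterator object
    i += 1
  }
  arr
}
```

For example, since any tuple is a `Product`, `mapProductIterator(("a", "bbb"))(_.toString.length)` yields `Array(1, 3)`.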