
[SPARK-15742][SQL] Reduce temp collections allocations in TreeNode transform methods #13484

Closed

Conversation


@JoshRosen JoshRosen commented Jun 3, 2016

In Catalyst's TreeNode transform methods we end up calling `productIterator.map(...).toArray` in a number of places, which is slightly inefficient because it needs to allocate an `ArrayBuilder` and grow a temporary array. Since we already know the size of the final output (`productArity`), we can simply allocate an array up-front and use a while loop to consume the iterator and populate the array.

For most workloads, this performance difference is negligible but it does make a measurable difference in optimizer performance for queries that operate over very wide schemas (such as the benchmark queries in #13456).
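
For illustration, here is a minimal, self-contained sketch of the before/after pattern described above (the actual change lives inside `TreeNode`'s product-mapping helper; the object and method names below are illustrative, not Spark code):

```scala
// Sketch only: mapping over a Product's elements with and without a temporary collection.
object ProductMapSketch {
  // Before: productIterator.map(...).toArray allocates an ArrayBuilder and grows a
  // temporary buffer before producing the final array.
  def mapViaIterator(p: Product, f: Any => Any): Array[Any] =
    p.productIterator.map(f).toArray

  // After: the output size is already known (productArity), so allocate the array
  // up-front and fill it with a while loop, skipping the intermediate builder.
  def mapPreallocated(p: Product, f: Any => Any): Array[Any] = {
    val arr = new Array[Any](p.productArity)
    var i = 0
    while (i < arr.length) {
      arr(i) = f(p.productElement(i))
      i += 1
    }
    arr
  }

  def main(args: Array[String]): Unit = {
    val node = ("a", 1, true) // any case class or tuple is a Product
    println(mapPreallocated(node, _.toString).mkString(", ")) // prints: a, 1, true
  }
}
```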

Perf results (from #13456 benchmarks)

Before

Java HotSpot(TM) 64-Bit Server VM 1.8.0_66-b17 on Mac OS X 10.10.5
Intel(R) Core(TM) i7-4960HQ CPU @ 2.60GHz

parsing large select:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
1 select expressions                            19 /   22          0.0    19119858.0       1.0X
10 select expressions                           23 /   25          0.0    23208774.0       0.8X
100 select expressions                          55 /   73          0.0    54768402.0       0.3X
1000 select expressions                        229 /  259          0.0   228606373.0       0.1X
2500 select expressions                        530 /  554          0.0   529938178.0       0.0X

After

parsing large select:                    Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
1 select expressions                            15 /   21          0.0    14978203.0       1.0X
10 select expressions                           22 /   27          0.0    22492262.0       0.7X
100 select expressions                          48 /   64          0.0    48449834.0       0.3X
1000 select expressions                        189 /  208          0.0   189346428.0       0.1X
2500 select expressions                        429 /  449          0.0   428943897.0       0.0X


marmbrus commented Jun 3, 2016

LGTM

How big of a difference in the benchmarks?

@JoshRosen
Contributor Author

About 20%, give or take.


rxin commented Jun 3, 2016

Can you include some perf data in the description?

@JoshRosen
Contributor Author

Updated.


SparkQA commented Jun 3, 2016

Test build #59901 has finished for PR 13484 at commit da79ec6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor Author

Want to do more benchmarking to investigate something so don't merge yet.
On Thu, Jun 2, 2016 at 7:27 PM UCB AMPLab notifications@github.com wrote:

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59901/
Test PASSed.



@JoshRosen JoshRosen force-pushed the treenode-productiterator-map branch from da79ec6 to b3e63bb Compare June 3, 2016 18:12
val arr = Array.ofDim[B](productArity)
var i = 0
while (i < arr.length) {
  arr(i) = f(productElement(i))
  i += 1
}
arr
Contributor Author


It turns out that ProductIterator was maintaining its own internal counter and was calling productElement(i):

 /** An iterator over all the elements of this product.
   *  @return     in the default implementation, an `Iterator[Any]`
   */
  def productIterator: Iterator[Any] = new scala.collection.AbstractIterator[Any] {
    private var c: Int = 0
    private val cmax = productArity
    def hasNext = c < cmax
    def next() = { val result = productElement(c); c += 1; result }
  }

Calling productElement ourselves here avoids another layer of object allocation and buys a small additional performance boost.


ericl commented Jun 3, 2016

LGTM too, probably this will be even more of an improvement once we get the other bottlenecks fixed


SparkQA commented Jun 3, 2016

Test build #59946 has finished for PR 13484 at commit b3e63bb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Contributor

LGTM2

@JoshRosen
Contributor Author

Alright, I'm going to merge this into master and 2.0.

asfgit pushed a commit that referenced this pull request Jun 3, 2016
Author: Josh Rosen <joshrosen@databricks.com>

Closes #13484 from JoshRosen/treenode-productiterator-map.

(cherry picked from commit e526913)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
@asfgit asfgit closed this in e526913 Jun 3, 2016
@JoshRosen JoshRosen deleted the treenode-productiterator-map branch June 3, 2016 21:05