SPARK-14091 [core] Consider improving performance of SparkContext.get… #11911

Closed
rajeshbalamohan wants to merge 4 commits

Conversation

rajeshbalamohan

What changes were proposed in this pull request?

Currently SparkContext.getCallSite() unconditionally calls Utils.getCallSite():

 private[spark] def getCallSite(): CallSite = {
    val callSite = Utils.getCallSite()
    CallSite(
      Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
      Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm)
    )
  }

However, in some places Utils.withDummyCallSite(sc) is invoked precisely to avoid the expensive thread dumps inside getCallSite(). Because Utils.getCallSite() is evaluated eagerly, the thread dumps are computed anyway.

This can have a severe impact on small queries (those that finish in 10-20 seconds) that create a large number of RDDs.

This patch makes the evaluation of Utils.getCallSite() inside SparkContext.getCallSite() lazy.
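
For orientation, the initial commit (excerpted in the review thread below) appears to drop the upfront call and inline Utils.getCallSite() into each fallback. A rough reconstruction, not the verbatim diff:

private[spark] def getCallSite(): CallSite = {
  // Reconstruction based on the review excerpts, not the committed code: the
  // expensive thread dump now runs only when an override is missing, but it can
  // run twice if neither the short form nor the long form is set.
  CallSite(
    Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(Utils.getCallSite().shortForm),
    Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(Utils.getCallSite().longForm)
  )
}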

How was this patch tested?

No new test cases were added. The following standalone test was run manually. The entire Spark binary was also built and exercised with a few TPC-DS and TPC-H SQL queries on a multi-node cluster.

// Imports, package and object wrapper added for completeness (the snippet in the
// PR omits them); SerializableConfiguration and Utils are private[spark], so the
// test has to live under the org.apache.spark package. The object name is arbitrary.
package org.apache.spark

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{NullWritable, Writable}
import org.apache.spark.rdd.HadoopRDD
import org.apache.spark.util.{SerializableConfiguration, Utils}

object CallSiteTest {

  def run(): Unit = {
    val conf = new SparkConf()
    val sc = new SparkContext("local[1]", "test-context", conf)
    val start: Long = System.currentTimeMillis()
    val confBroadcast = sc.broadcast(new SerializableConfiguration(new Configuration()))
    Utils.withDummyCallSite(sc) {
      // Large tables end up creating 5500 RDDs
      for (i <- 1 to 5000) {
        // ignore the nulls in the HadoopRDD constructor; this is mainly for testing callSite
        val testRDD = new HadoopRDD(sc, confBroadcast, None, null,
          classOf[NullWritable], classOf[Writable], 10)
      }
    }
    val end: Long = System.currentTimeMillis()
    println("Time taken : " + (end - start))
  }

  def main(args: Array[String]): Unit = {
    run()
  }
}


CallSite(
Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm)
Option(getLocalProperty(CallSite.SHORT_FORM))

srowen (Member)

I see your point, though now this calls Utils.getCallSite twice when neither property is set. That might be OK, but I wonder if you can instead retrieve both property values, and then proceed to call Utils.getCallSite once if either is null.
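
As a rough sketch of that suggestion (illustrative only, not the committed code): look up both overrides first, then compute the call site at most once, and only when something is missing.

var shortForm = getLocalProperty(CallSite.SHORT_FORM)
var longForm = getLocalProperty(CallSite.LONG_FORM)
if (shortForm == null || longForm == null) {
  // A single thread dump, and only when an override is missing. Note that written
  // this way both forms get replaced whenever either is null, which is the
  // subtlety raised in the next review comment.
  val callSite = Utils.getCallSite()
  shortForm = callSite.shortForm
  longForm = callSite.longForm
}
CallSite(shortForm, longForm)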

rajeshbalamohan (Author)

Thanks @srowen. Incorporated the review comments in the latest commit. Please review.


if (shortForm == null || longForm == null) {
val callSite = Utils.getCallSite()
shortForm = callSite.shortForm

srowen (Member)

Better, but now it will overwrite both props if either is null. That's slightly different behavior from before. It may be true that they're always both null or not null; if that's pretty sure then we can leave this. Otherwise you may need = Option(shortForm).getOrElse(callSite.shortForm)?
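
A sketch of the per-field fallback being suggested (resolvedShort and resolvedLong are illustrative names, not from the patch); inside the null check, only the form that is actually missing falls back to the freshly computed call site, so a user-supplied override is preserved:

val callSite = Utils.getCallSite()
val resolvedShort = Option(shortForm).getOrElse(callSite.shortForm)
val resolvedLong = Option(longForm).getOrElse(callSite.longForm)
CallSite(resolvedShort, resolvedLong)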

rajeshbalamohan (Author)

Thanks @srowen. In Utils.withDummyCallSite(), both LONG_FORM and SHORT_FORM are explicitly set to "". But I can see that it is possible to explicitly set one of them via setCallSite(shortCallSite).
Incorporated your review comments in the latest commit.
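
The mixed case mentioned here, as an illustration (this assumes, per the comment above, that the public setCallSite(shortCallSite) sets only the short form; the call-site string is made up):

sc.setCallSite("count at MyJob.scala:42")   // SHORT_FORM set, LONG_FORM still unset
val cs = sc.getCallSite()
// With a per-field fallback, cs.shortForm keeps the user's override and only
// cs.longForm falls back to Utils.getCallSite(); an overwrite-both fallback
// would discard the override.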

@srowen
Member

srowen commented Mar 23, 2016

Jenkins test this please

@SparkQA

SparkQA commented Mar 23, 2016

Test build #53932 has finished for PR 11911 at commit 1a580f6.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Mar 23, 2016

Jenkins retest this please

@SparkQA

SparkQA commented Mar 23, 2016

Test build #53945 has finished for PR 11911 at commit 1a580f6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1745,11 +1745,16 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
* has overridden the call site using `setCallSite()`, this will return the user's version.
*/
private[spark] def getCallSite(): CallSite = {
val callSite = Utils.getCallSite()

JoshRosen (Contributor)

Would making this into a lazy val have the same performance-improving impact?

srowen (Member)

I thought I tried that and it didn't work, but I must have done something wrong. It seems to:

scala> def foo(): Int = { println("foo"); 42 }
foo: ()Int

scala> def bar(arg: Boolean): Int = { lazy val f = foo(); if (arg) { f } else { 0 } }
bar: (arg: Boolean)Int

scala> bar(true)
foo
res0: Int = 42

scala> bar(false)
res1: Int = 0

So yeah that could be a much cleaner solution.

@rajeshbalamohan
Author

Thanks @JoshRosen and @srowen. Retested with "lazy val", which has the same perf improvement. Added "lazy val" in the latest commit.
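
The end state is then presumably a one-keyword change to the method quoted in the PR description, roughly (a sketch, not the verbatim commit):

private[spark] def getCallSite(): CallSite = {
  // lazy: the thread dump inside Utils.getCallSite() runs only if one of the
  // getOrElse fallbacks below is actually taken.
  lazy val callSite = Utils.getCallSite()
  CallSite(
    Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
    Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm)
  )
}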

@srowen
Member

srowen commented Mar 24, 2016

Jenkins retest this please

@SparkQA

SparkQA commented Mar 24, 2016

Test build #54029 has finished for PR 11911 at commit f59c85f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

Jenkins, retest this please.

@SparkQA

SparkQA commented Mar 24, 2016

Test build #54049 has finished for PR 11911 at commit f59c85f.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

Jenkins, retest this please.

@SparkQA

SparkQA commented Mar 25, 2016

Test build #54195 has finished for PR 11911 at commit f59c85f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

LGTM. It looks like the eager evaluation was accidentally introduced in 6600786; prior to that patch it used to be lazy. I'm going to merge this into master. Thanks!

@asfgit closed this in ff7cc45 on Mar 25, 2016