SPARK-14091 [core] Consider improving performance of SparkContext.get… #11911

Closed
rajeshbalamohan wants to merge 4 commits

Conversation

rajeshbalamohan

What changes were proposed in this pull request?

Currently SparkContext.getCallSite() unconditionally calls Utils.getCallSite():

 private[spark] def getCallSite(): CallSite = {
    val callSite = Utils.getCallSite()
    CallSite(
      Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
      Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm)
    )
  }

However, in some places Utils.withDummyCallSite(sc) is invoked precisely to avoid the expensive thread dumps inside getCallSite(). Because Utils.getCallSite() is evaluated eagerly, the thread dumps are computed anyway.

This can have a severe impact on small queries (those that finish in 10-20 seconds) that create a large number of RDDs.

This patch makes the evaluation of Utils.getCallSite() inside SparkContext.getCallSite() lazy.
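
For orientation, the initial commit (excerpted in the review thread below) appears to drop the upfront call and inline Utils.getCallSite() into each fallback. A rough reconstruction, not the verbatim diff:

private[spark] def getCallSite(): CallSite = {
  // Reconstruction based on the review excerpts, not the committed code: the
  // expensive thread dump now runs only when an override is missing, but it can
  // run twice if neither the short form nor the long form is set.
  CallSite(
    Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(Utils.getCallSite().shortForm),
    Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(Utils.getCallSite().longForm)
  )
}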

How was this patch tested?

No new test cases were added. The following standalone test was run manually. The entire Spark binary was also built and exercised with a few TPC-DS and TPC-H SQL queries on a multi-node cluster.

// Imports, package and object wrapper added for completeness (the snippet in the
// PR omits them); SerializableConfiguration and Utils are private[spark], so the
// test has to live under the org.apache.spark package. The object name is arbitrary.
package org.apache.spark

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{NullWritable, Writable}
import org.apache.spark.rdd.HadoopRDD
import org.apache.spark.util.{SerializableConfiguration, Utils}

object CallSiteTest {

  def run(): Unit = {
    val conf = new SparkConf()
    val sc = new SparkContext("local[1]", "test-context", conf)
    val start: Long = System.currentTimeMillis()
    val confBroadcast = sc.broadcast(new SerializableConfiguration(new Configuration()))
    Utils.withDummyCallSite(sc) {
      // Large tables end up creating 5500 RDDs
      for (i <- 1 to 5000) {
        // ignore the nulls in the HadoopRDD constructor; this is mainly for testing callSite
        val testRDD = new HadoopRDD(sc, confBroadcast, None, null,
          classOf[NullWritable], classOf[Writable], 10)
      }
    }
    val end: Long = System.currentTimeMillis()
    println("Time taken : " + (end - start))
  }

  def main(args: Array[String]): Unit = {
    run()
  }
}


CallSite(
Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm)
Option(getLocalProperty(CallSite.SHORT_FORM))

srowen (Member)

I see your point, though now this calls Utils.getCallSite twice when neither property is set. That might be OK, but I wonder if you can instead retrieve both property values, and then proceed to call Utils.getCallSite once if either is null.
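
As a rough sketch of that suggestion (illustrative only, not the committed code): look up both overrides first, then compute the call site at most once, and only when something is missing.

var shortForm = getLocalProperty(CallSite.SHORT_FORM)
var longForm = getLocalProperty(CallSite.LONG_FORM)
if (shortForm == null || longForm == null) {
  // A single thread dump, and only when an override is missing. Note that written
  // this way both forms get replaced whenever either is null, which is the
  // subtlety raised in the next review comment.
  val callSite = Utils.getCallSite()
  shortForm = callSite.shortForm
  longForm = callSite.longForm
}
CallSite(shortForm, longForm)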

rajeshbalamohan (Author)

Thanks @srowen. Incorporated the review comments in the latest commit. Please review.


if (shortForm == null || longForm == null) {
val callSite = Utils.getCallSite()
shortForm = callSite.shortForm

srowen (Member)

Better, but now it will overwrite both props if either is null. That's slightly different behavior from before. It may be true that they're always both null or not null; if that's pretty sure then we can leave this. Otherwise you may need = Option(shortForm).getOrElse(callSite.shortForm)?
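
A sketch of the per-field fallback being suggested (resolvedShort and resolvedLong are illustrative names, not from the patch); inside the null check, only the form that is actually missing falls back to the freshly computed call site, so a user-supplied override is preserved:

val callSite = Utils.getCallSite()
val resolvedShort = Option(shortForm).getOrElse(callSite.shortForm)
val resolvedLong = Option(longForm).getOrElse(callSite.longForm)
CallSite(resolvedShort, resolvedLong)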

rajeshbalamohan (Author)

Thanks @srowen. In Utils.withDummyCallSite(), both LONG_FORM and SHORT_FORM are explicitly set to "". But I can see that it is possible to explicitly set one of them via setCallSite(shortCallSite).
Incorporated your review comments in the latest commit.
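
The mixed case mentioned here, as an illustration (this assumes, per the comment above, that the public setCallSite(shortCallSite) sets only the short form; the call-site string is made up):

sc.setCallSite("count at MyJob.scala:42")   // SHORT_FORM set, LONG_FORM still unset
val cs = sc.getCallSite()
// With a per-field fallback, cs.shortForm keeps the user's override and only
// cs.longForm falls back to Utils.getCallSite(); an overwrite-both fallback
// would discard the override.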

@srowen
Member

srowen commented Mar 23, 2016

Jenkins test this please

@SparkQA

SparkQA commented Mar 23, 2016

Test build #53932 has finished for PR 11911 at commit 1a580f6.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Mar 23, 2016

Jenkins retest this please

@SparkQA

SparkQA commented Mar 23, 2016

Test build #53945 has finished for PR 11911 at commit 1a580f6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -1745,11 +1745,16 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
* has overridden the call site using `setCallSite()`, this will return the user's version.
*/
private[spark] def getCallSite(): CallSite = {
val callSite = Utils.getCallSite()

JoshRosen (Contributor)

Would making this into a lazy val have the same performance-improving impact?

srowen (Member)

I thought I tried that and it didn't work, but I must have done something wrong. It seems to:

scala> def foo(): Int = { println("foo"); 42 }
foo: ()Int

scala> def bar(arg: Boolean): Int = { lazy val f = foo(); if (arg) { f } else { 0 } }
bar: (arg: Boolean)Int

scala> bar(true)
foo
res0: Int = 42

scala> bar(false)
res1: Int = 0

So yeah that could be a much cleaner solution.

@rajeshbalamohan
Author

Thanks @JoshRosen and @srowen. Retested with "lazy val", which has the same perf improvement. Added "lazy val" in the latest commit.
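
The end state is then presumably a one-keyword change to the method quoted in the PR description, roughly (a sketch, not the verbatim commit):

private[spark] def getCallSite(): CallSite = {
  // lazy: the thread dump inside Utils.getCallSite() runs only if one of the
  // getOrElse fallbacks below is actually taken.
  lazy val callSite = Utils.getCallSite()
  CallSite(
    Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
    Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm)
  )
}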

@srowen
Member

srowen commented Mar 24, 2016

Jenkins retest this please

@SparkQA

SparkQA commented Mar 24, 2016

Test build #54029 has finished for PR 11911 at commit f59c85f.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

Jenkins, retest this please.

@SparkQA

SparkQA commented Mar 24, 2016

Test build #54049 has finished for PR 11911 at commit f59c85f.

  • This patch fails from timeout after a configured wait of 250m.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

Jenkins, retest this please.

@SparkQA

SparkQA commented Mar 25, 2016

Test build #54195 has finished for PR 11911 at commit f59c85f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Contributor

LGTM. It looks like the eager evaluation was accidentally introduced in 6600786; prior to that patch it used to be lazy. I'm going to merge this into master. Thanks!

@asfgit closed this in ff7cc45 on Mar 25, 2016