
[SPARK-10649][STREAMING] Prevent inheriting job group and irrelevant job description in streaming jobs #8781

Closed
wants to merge 10 commits

Conversation

tdas (Contributor) commented Sep 16, 2015

The job group and job description are passed through thread-local properties and get inherited by child threads. In the case of Spark Streaming, the streaming jobs inherit these properties from the thread that called streamingContext.start(). This may not make sense.

  1. Job group: This is mainly used for cancelling a group of jobs together. It does not make sense to cancel streaming jobs like this, as the effect would be unpredictable. It is not a valid use case anyway; to cancel a streaming context, call streamingContext.stop().
  2. Job description: This is used to attach human-readable descriptions to jobs shown in the UI. The job description of the thread that calls streamingContext.start() is not useful for the streaming jobs: it does not make sense for all of them to share the same description, and that description may have nothing to do with streaming.

The solution in this PR is meant for the Spark master branch, where local properties are inherited by cloning them. The job group and job description are explicitly removed in the thread that starts the streaming scheduler, so that subsequent child threads do not inherit them. Also, the start is done in a new child thread, so that setting the job group and description for streaming does not change those properties in the thread that called streamingContext.start(). Condensed, the change looks roughly like the sketch below.
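A condensed sketch assembled from the diff hunks quoted later in this conversation (not the verbatim merged code; details may differ):

    ThreadUtils.runInNewThread("streaming-start") {
      sparkContext.setCallSite(startSite.get)
      // Drop the job group/description inherited from the thread that
      // called streamingContext.start().
      sparkContext.clearJobGroup()
      // Be explicit rather than relying on the property's default value.
      sparkContext.setLocalProperty(SparkContext.SPARK_JOB_INTERRUPT_ON_CANCEL, "false")
      scheduler.start()
    }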

tdas (Contributor Author) commented Sep 16, 2015

@andrewor14 Could you take a look at the core changes in this patch?
@zsxwing Could you take a look at all the changes?

ThreadUtils.runInNewThread("streaming-start") {
  sparkContext.setCallSite(startSite.get)
  sparkContext.setJobGroup(
    StreamingContext.STREAMING_JOB_GROUP_ID, StreamingContext.STREAMING_JOB_DESCRIPTION, false)
Contributor Author:

I am forced to set a specific job group and description, because there is no way to remove them (because of the default Properties in the local properties).

SparkQA commented Sep 16, 2015

Test build #42546 has finished for PR 8781 at commit df6f029.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class TaskCommitDenied(
    • abstract class LocalNode(conf: SQLConf) extends QueryPlan[LocalNode] with Logging

SparkQA commented Sep 16, 2015

Test build #42549 has finished for PR 8781 at commit 5b6ab28.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 16, 2015

Test build #42552 has finished for PR 8781 at commit 0553664.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 16, 2015

Test build #42550 has finished for PR 8781 at commit 83d037d.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 17, 2015

Test build #1767 has finished for PR 8781 at commit 23a4c2c.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class TaskCommitDenied(
    • final val probabilityCol: Param[String] = new Param[String](this, "probabilityCol", "Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities")
    • abstract class LocalNode(conf: SQLConf) extends QueryPlan[LocalNode] with Logging

SparkQA commented Sep 17, 2015

Test build #1768 has finished for PR 8781 at commit 23a4c2c.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 17, 2015

Test build #42565 has finished for PR 8781 at commit c4534fd.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val exception = intercept[Exception] {
  runInNewThread("thread-name") { throw new IllegalArgumentException("test") }
}
assert(exception.isInstanceOf[IllegalArgumentException])
assert(exception.asInstanceOf[IllegalArgumentException].getMessage.contains("test"))
Member:

You can update these 5 lines to:

    val exception = intercept[IllegalArgumentException] {
      runInNewThread("thread-name") { throw new IllegalArgumentException("test") }
    }
    assert(exception.getMessage.contains("test"))

Contributor Author:

That's true!
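For context, the helper under test runs a body in a new named thread and rethrows any failure in the calling thread, which is what the assertions above check. A simplified standalone sketch of those semantics (the real ThreadUtils.runInNewThread also rewrites stack traces, which this sketch omits):

    def runInNewThread[T](threadName: String)(body: => T): T = {
      var result: Option[T] = None
      var error: Option[Throwable] = None
      val thread = new Thread(threadName) {
        override def run(): Unit = {
          try result = Some(body)
          catch { case t: Throwable => error = Some(t) }
        }
      }
      thread.start()
      thread.join() // join() also guarantees visibility of the writes above
      error.foreach(e => throw e) // rethrow in the caller, as the test asserts
      result.get
    }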

zsxwing (Member) commented Sep 17, 2015

LGTM except some nits

SparkQA commented Sep 17, 2015

Test build #42600 has finished for PR 8781 at commit 2afc50e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

tdas (Contributor Author) commented Sep 18, 2015

@andrewor14 Will you be able to take a look? This is blocking other PRs.
@zsxwing I addressed the comments.

ThreadUtils.runInNewThread("streaming-start") {
  sparkContext.setCallSite(startSite.get)
  sparkContext.clearJobGroup()
  sparkContext.setLocalProperty(SparkContext.SPARK_JOB_INTERRUPT_ON_CANCEL, "false")
Member:

Is it necessary to set SPARK_JOB_INTERRUPT_ON_CANCEL to false, considering clearJobGroup will clean it?

Contributor:

agreed

Contributor Author:

Basically, I don't want to rely on the default value of this parameter, and would rather explicitly specify not to interrupt.

SparkQA commented Sep 18, 2015

Test build #42658 has finished for PR 8781 at commit d8600cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -610,7 +614,7 @@ class StreamingContext private[streaming] (
     assert(env.metricsSystem != null)
     env.metricsSystem.registerSource(streamingSource)
     uiTab.foreach(_.attach())
-    logInfo("StreamingContext started")
+    this.logInfo("StreamingContext started")
Contributor:

not needed?

Contributor Author:

My bad. Forgot to remove that.

StreamingContext.ACTIVATION_LOCK.synchronized {
   StreamingContext.assertNoOtherContextIsActive()
   try {
     validate()
-    scheduler.start()
+    ThreadUtils.runInNewThread("streaming-start") {
+      sparkContext.setCallSite(startSite.get)
Contributor:

any reason why we need to move this in here? It's an atomic reference, so it doesn't really matter which thread reads it, right?

Contributor Author:

Because this sets a thread-local variable.

Contributor:

an inheritable thread-local variable, so it still doesn't matter

(anyway we can just keep this change, not a big deal, mainly just wondering)
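For readers following this exchange: SparkContext stores local properties in an InheritableThreadLocal, so a thread created after a value is set picks that value up from its parent. A minimal plain-JVM illustration of the behavior (not Spark code; assumes Scala 2.12+ for the Runnable lambda):

    object InheritDemo extends App {
      val prop = new InheritableThreadLocal[String]
      prop.set("parent-value")
      // The child thread inherits the parent's value at construction time,
      // which is why it matters which thread sets these properties.
      val child = new Thread(() => println(prop.get())) // prints "parent-value"
      child.start()
      child.join()
    }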

andrewor14 (Contributor) commented:

@tdas this looks good. Once you address the comments I can merge this.

andrewor14 (Contributor) commented:

LGTM

SparkQA commented Sep 21, 2015

Test build #42764 has finished for PR 8781 at commit 7550490.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

SparkQA commented Sep 21, 2015

Test build #42768 has finished for PR 8781 at commit 8a900f3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

tdas (Contributor Author) commented Sep 21, 2015

I am merging this patch to master. I will issue a separate PR for branch 1.5. Thanks @andrewor14 and @zsxwing for reviewing.

asfgit closed this in 7286988 on Sep 21, 2015
tdas added a commit to tdas/spark that referenced this pull request Sep 21, 2015
… job description in streaming jobs

The job group and job description are passed through thread-local properties and get inherited by child threads. In the case of Spark Streaming, the streaming jobs inherit these properties from the thread that called streamingContext.start(). This may not make sense.

1. Job group: This is mainly used for cancelling a group of jobs together. It does not make sense to cancel streaming jobs like this, as the effect would be unpredictable. It is not a valid use case anyway; to cancel a streaming context, call streamingContext.stop().

2. Job description: This is used to attach human-readable descriptions to jobs shown in the UI. The job description of the thread that calls streamingContext.start() is not useful for the streaming jobs: it does not make sense for all of them to share the same description, and that description may have nothing to do with streaming.

The solution in this PR is meant for the Spark master branch, where local properties are inherited by cloning them. The job group and job description are explicitly removed in the thread that starts the streaming scheduler, so that subsequent child threads do not inherit them. Also, the start is done in a new child thread, so that setting the job group and description for streaming does not change those properties in the thread that called streamingContext.start().

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes apache#8781 from tdas/SPARK-10649.
SparkQA commented Sep 22, 2015

Test build #42769 timed out for PR 8781 at commit f84f479 after a configured wait of 250m.

asfgit pushed a commit that referenced this pull request Sep 22, 2015
… job description in streaming jobs

**Note that this PR is only for branch 1.5. See #8781 for the solution for Spark master.**

The job group and job description are passed through thread-local properties and get inherited by child threads. In the case of Spark Streaming, the streaming jobs inherit these properties from the thread that called streamingContext.start(). This may not make sense.

1. Job group: This is mainly used for cancelling a group of jobs together. It does not make sense to cancel streaming jobs like this, as the effect would be unpredictable. It is not a valid use case anyway; to cancel a streaming context, call streamingContext.stop().

2. Job description: This is used to attach human-readable descriptions to jobs shown in the UI. The job description of the thread that calls streamingContext.start() is not useful for the streaming jobs: it does not make sense for all of them to share the same description, and that description may have nothing to do with streaming.

The solution in this PR is meant for Spark branch 1.5, where local properties are inherited by cloning them only when the Spark config `spark.localProperties.clone` is set to `true` (see #8781 for the PR for the Spark master branch). Similar to the approach taken by #8721, StreamingContext sets that configuration to true, which makes sure that all subsequent child threads get a cloned copy of the thread-local properties. This allows the job group and job description to be explicitly removed in the thread that starts the streaming scheduler, so that subsequent child threads do not inherit them. Also, the start is done in a new child thread, so that setting the job group and description for streaming does not change those properties in the thread that called streamingContext.start().

Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #8856 from tdas/SPARK-10649-1.5.
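As a minimal illustration of the branch-1.5 mechanism described in the commit message above (the config key comes from that message; the surrounding setup is illustrative only):

    import org.apache.spark.SparkConf

    // On branch 1.5, child threads receive a cloned copy of the parent's
    // local properties only when this config is true; per the commit
    // message above, StreamingContext set it itself.
    val conf = new SparkConf()
      .setAppName("streaming-app")
      .set("spark.localProperties.clone", "true")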