Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-5411] Allow SparkListeners to be specified in SparkConf and loaded when creating SparkContext #4111

Conversation

JoshRosen
Copy link
Contributor

This patch introduces a new configuration option, spark.extraListeners, that allows SparkListeners to be specified in SparkConf and registered before the SparkContext is initialized. From the configuration documentation:

A comma-separated list of classes that implement SparkListener; when initializing SparkContext, instances of these classes will be created and registered with Spark's listener bus. If a class has a single-argument constructor that accepts a SparkConf, that constructor will be called; otherwise, a zero-argument constructor will be called. If no valid constructor can be found, the SparkContext creation will fail with an exception.

This motivation for this patch is to allow monitoring code to be easily injected into existing Spark programs without having to modify those programs' code.

@JoshRosen
Copy link
Contributor Author

/cc @ksakellis, who originally requested this feature.

@SparkQA
Copy link

SparkQA commented Jan 20, 2015

Test build #25787 has finished for PR 4111 at commit 25988f3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class SparkContext(

@JoshRosen
Copy link
Contributor Author

@andrewor14, could you take a quick pass through this to see if it seems reasonable?

@pwendell
Copy link
Contributor

Hey so for this one, I think it might be good to use a configuration mechanism rather than exposing a different constructor. The issue with this approach is that users have to re-write their code to use a different constructor in order to use this, whereas many applications might be written using the standard SparkContext constructor.

What about having a configuration option that allows you to specify classes of additional listeners, and then we expect the listener has a constructor that accepts a SparkConf.

spark.extraListeners=a.b.C,d.e.F

Then when we construct a SparkContext we instantiate these and with the SparkConf and we attach them to the context.

@ksakellis
Copy link

@pwendell I think using configuration to add spark listeners adds complexity to the system.

  1. Can't do static checking of listeners at compile time
  2. The config option is not self describing, users will need to reference docs to know how to add these listeners.
  3. Listeners may not need to accept SparkConf so now you need to change your Listener implementation.

@vanzin had the idea, which I like, of creating a SparkContextBuilider object that would make construction easier than adding more constructor arguments. So you can imagine something like:

val context = SparkContextBuilder.addConfig(confg)
                                 .addListeners(...)
                                 .setSparkHome(..)
                                 .build()

We can obviously keep supporting the existing constructors for compatibility but move towards this for new constructor arguments. Just an Idea.

@pwendell
Copy link
Contributor

Whether we use a builder pattern vs. adding more constructors is slightly orthogonal to what I was saying.

I was saying that it would be good if there was some way to avoid making the user who instantiates a SparkContext have to explicitly identify listeners in compiled code. It would be more flexible if e.g. a packager could add listeners that will always be attached in a specific environment. Since listeners are a semi-public integration point, this seems reasonable to have a hook here.

Having builder-style constructors for SparkContext is a separate proposal. In the past we've tried to put as much of the logic as possible into configurations and generally avoid having complex constructors. The thinking was it's not good to have two disparate ways of doing configuration (what happens if I have spark.home in a config and then I manually set sparkHome, etc, can be confusing). That's something we could look at separately, though, if we want to go away from the current approach.

@SparkQA
Copy link

SparkQA commented Jan 21, 2015

Test build #25854 has finished for PR 4111 at commit b22b379.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Copy link
Contributor Author

I've updated this patch to sketch out a proposal for SparkConf-based listener configuration, as Patrick suggested. Please take another look and note the updated PR description.

@@ -379,6 +379,41 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
}
executorAllocationManager.foreach(_.start())

// Use reflection to instantiate listeners specified via the `spark.extraListeners` configuration

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can we group this logic with listenerBus.start() into a separate method called "setupListenerBus()" and call it in the constructor? The reason is this logic has to occur before the listenerBus.start() and having them grouped in a method can better ensure the order is preserved.

@SparkQA
Copy link

SparkQA commented Jan 26, 2015

Test build #26073 has finished for PR 4111 at commit 2daff9b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen JoshRosen changed the title [SPARK-5190] Allow SparkListeners to be registered before SparkContext is initialized [SPARK-5190] [WIP] Allow SparkListeners to be registered before SparkContext is initialized Jan 26, 2015
@JoshRosen JoshRosen changed the title [SPARK-5190] [WIP] Allow SparkListeners to be registered before SparkContext is initialized [SPARK-5411] Allow SparkListeners to be specified in SparkConf and loaded when creating SparkContext Jan 26, 2015
@JoshRosen
Copy link
Contributor Author

I've updated this PR to remove the SPARK_EXTRA_LISTENERS variable. I've also edited the PR description and title to link to a new JIRA, https://issues.apache.org/jira/browse/SPARK-5411, which describes the new mechanism added in this PR, rather than https://issues.apache.org/jira/browse/SPARK-5190, which should be addressed via a different mechanism (either another overloaded SparkContext constructor or a SparkContext builder).

@SparkQA
Copy link

SparkQA commented Jan 26, 2015

Test build #26117 has finished for PR 4111 at commit 1a5b9a0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Copy link
Contributor Author

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Jan 30, 2015

Test build #26382 has finished for PR 4111 at commit 1a5b9a0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…-sparklistener-in-sc-constructor

Conflicts:
	core/src/main/scala/org/apache/spark/scheduler/LiveListenerBus.scala
	core/src/main/scala/org/apache/spark/scheduler/SparkListenerBus.scala
@SparkQA
Copy link

SparkQA commented Feb 2, 2015

Test build #26520 has finished for PR 4111 at commit 8370839.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@JoshRosen
Copy link
Contributor Author

I'm going to merge this in a little bit unless there's any additional feedback.

@pwendell
Copy link
Contributor

pwendell commented Feb 5, 2015

LGTM

asfgit pushed a commit that referenced this pull request Feb 5, 2015
…aded when creating SparkContext

This patch introduces a new configuration option, `spark.extraListeners`, that allows SparkListeners to be specified in SparkConf and registered before the SparkContext is initialized.  From the configuration documentation:

> A comma-separated list of classes that implement SparkListener; when initializing SparkContext, instances of these classes will be created and registered with Spark's listener bus. If a class has a single-argument constructor that accepts a SparkConf, that constructor will be called; otherwise, a zero-argument constructor will be called. If no valid constructor can be found, the SparkContext creation will fail with an exception.

This motivation for this patch is to allow monitoring code to be easily injected into existing Spark programs without having to modify those programs' code.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #4111 from JoshRosen/SPARK-5190-register-sparklistener-in-sc-constructor and squashes the following commits:

8370839 [Josh Rosen] Two minor fixes after merging with master
6e0122c [Josh Rosen] Merge remote-tracking branch 'origin/master' into SPARK-5190-register-sparklistener-in-sc-constructor
1a5b9a0 [Josh Rosen] Remove SPARK_EXTRA_LISTENERS environment variable.
2daff9b [Josh Rosen] Add a couple of explanatory comments for SPARK_EXTRA_LISTENERS.
b9973da [Josh Rosen] Add test to ensure that conf and env var settings are merged, not overriden.
d6f3113 [Josh Rosen] Use getConstructors() instead of try-catch to find right constructor.
d0d276d [Josh Rosen] Move code into setupAndStartListenerBus() method
b22b379 [Josh Rosen] Instantiate SparkListeners from classes listed in configurations.
9c0d8f1 [Josh Rosen] Revert "[SPARK-5190] Allow SparkListeners to be registered before SparkContext starts."
217ecc0 [Josh Rosen] Revert "Add addSparkListener to JavaSparkContext"
25988f3 [Josh Rosen] Add addSparkListener to JavaSparkContext
163ba19 [Josh Rosen] [SPARK-5190] Allow SparkListeners to be registered before SparkContext starts.

(cherry picked from commit 9a7ce70)
Signed-off-by: Patrick Wendell <patrick@databricks.com>
@asfgit asfgit closed this in 9a7ce70 Feb 5, 2015
@JoshRosen JoshRosen deleted the SPARK-5190-register-sparklistener-in-sc-constructor branch January 14, 2016 03:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants