Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-7786][STREAMING] Allow StreamingListener to be specified in SparkConf and loaded when creating StreamingContext #6380

Closed
wants to merge 21 commits into from

Conversation

Jeffrharr
Copy link
Contributor

Add SparkConf option spark.streamingListeners to implement similar functionality to spark.extraListeners defined in [SPARK-5411].

Also modifies StatsReportListener to have a zero-valued constructor and adds a simple test

@Jeffrharr Jeffrharr changed the title [Spark 7786] Allow StreamingListener to be specified in SparkConf and loaded when creating StreamingContext [Spark 7786][STREAMING] Allow StreamingListener to be specified in SparkConf and loaded when creating StreamingContext May 24, 2015
@Jeffrharr Jeffrharr changed the title [Spark 7786][STREAMING] Allow StreamingListener to be specified in SparkConf and loaded when creating StreamingContext [SPARK-7786][STREAMING] Allow StreamingListener to be specified in SparkConf and loaded when creating StreamingContext May 24, 2015
@JoshRosen
Copy link
Contributor

Jenkins, this is ok to test.

@SparkQA
Copy link

SparkQA commented May 24, 2015

Test build #33425 has finished for PR 6380 at commit e8506d4.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StatsReportListener(numBatchInfos: Int) extends StreamingListener

@Jeffrharr
Copy link
Contributor Author

How can I run Scala style checks? By the way, thanks for writing the code that I essentially lifted for this :P

@SparkQA
Copy link

SparkQA commented May 24, 2015

Test build #33445 has finished for PR 6380 at commit 6453c90.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StatsReportListener(numBatchInfos: Int) extends StreamingListener

@@ -17,6 +17,8 @@

package org.apache.spark.streaming.scheduler

import org.apache.spark.scheduler.StatsReportListener
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have not addressed this. Why was this needed?? I dont see anything that required this.

@andrewor14
Copy link
Contributor

@His-name-is-Joof you can do dev/scalastyle locally

@@ -1481,6 +1481,17 @@ Apart from these, the following properties are also available, and may be useful
How many batches the Spark Streaming UI and status APIs remember before garbage collecting.
</td>
</tr>
<tr>
<td><code>spark.streamingListeners</code></td>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be under the spark.streaming namespace instead, so a better name would be spark.streaming.listeners, or spark.streaming.extraListeners to be more consistent with the existing spark.extraListeners.

@SparkQA
Copy link

SparkQA commented May 27, 2015

Test build #33612 has finished for PR 6380 at commit d92d55b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StatsReportListener(numBatchInfos: Int) extends StreamingListener

@SparkQA
Copy link

SparkQA commented May 28, 2015

Test build #33625 has finished for PR 6380 at commit 00c0409.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StatsReportListener(numBatchInfos: Int) extends StreamingListener

@SparkQA
Copy link

SparkQA commented May 28, 2015

Test build #33635 has finished for PR 6380 at commit c94982f.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StatsReportListener(numBatchInfos: Int) extends StreamingListener

val listenerClass = Class.forName(className)
listenerClass.getConstructors.asInstanceOf[Array[Constructor[_ <: StreamingListener]]]
}
val constructorTakingSparkConf = constructors.find { c =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

listenerClass.getConstructor(classOf[SparkConf]) is simpler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. getConstructor will throw NoSuchMethodException and make the code complex.

@zsxwing
Copy link
Member

zsxwing commented Jun 9, 2015

retest this please

@SparkQA
Copy link

SparkQA commented Jun 9, 2015

Test build #34479 has finished for PR 6380 at commit c94982f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StatsReportListener(numBatchInfos: Int) extends StreamingListener

@tdas
Copy link
Contributor

tdas commented Jun 12, 2015

Hey so, I thought about this feature a little bit. I am not this is an essential feature.The spark.extraListeners was added so that listeners can get added before SparkContext is started even in environments where the user does not explicitly start the SparkContext, e.g. spark shell. Since the SparkContext in the spark-shell is started automatically before the user gets any opportunity to attach their own listener, the extraListener conf was added to make sure the system automatically attaches them before creating any listener event.

But StreamingContext is really never started automatically like SparkContext in spark-shell. One can always add listeners before starting the StreamingContext. So what is the usecase where this is really essential?

@tdas
Copy link
Contributor

tdas commented Jun 18, 2015

Any thoughts on this?

@Jeffrharr
Copy link
Contributor Author

Sorry, been unusually busy for quite a while. You make an excellent point,
it doesn't seem essential. However it does provide a certain consistency
that may make it simpler to move from listeners to streamingListeners. A
sort of "expected functionality". That it was requested in the first place
(not by me) is some evidence of this. Perhaps ask on the JIRA the precise
motivation?

The downside being that streamingListeners must be formatted properly to
work (perhaps encouraging obsfucated zero-argument constructors) and extra
code maintained.

Any thoughts on this?


Reply to this email directly or view it on GitHub
#6380 (comment).

@SparkQA
Copy link

SparkQA commented Jul 10, 2015

Test build #36995 has finished for PR 6380 at commit 233335d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@tdas
Copy link
Contributor

tdas commented Jul 10, 2015

Jenkins, test this please.

@SparkQA
Copy link

SparkQA commented Jul 10, 2015

Test build #37027 has finished for PR 6380 at commit 233335d.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StatsReportListener(numBatchInfos: Int) extends StreamingListener

@tdas
Copy link
Contributor

tdas commented Jul 10, 2015

your test is failing :)


test("registering listeners via spark.streaming.listeners") {
val conf = new SparkConf().setMaster("local").setAppName("test")
.set("spark.streaming.listeners", classOf[BatchInfoCollector].getName + "," +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be changed too to extraListeners.

@SparkQA
Copy link

SparkQA commented Jul 10, 2015

Test build #37056 has finished for PR 6380 at commit 28e7f27.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StatsReportListener(numBatchInfos: Int) extends StreamingListener

@SparkQA
Copy link

SparkQA commented Jul 11, 2015

Test build #37069 has finished for PR 6380 at commit af41098.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StatsReportListener(numBatchInfos: Int) extends StreamingListener

}
} catch {
case e: Exception =>
try {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This stop() is not needed as the StreamingContext has not been started yet at this point. And in that case the whole try ... catch is not needed, and the whole things can be simplified to a single throw new SparkException("Exception when registering StreamingListener: $className did not have a zero-argument constructor or a ..."

<td><code>spark.streaming.extraListeners</code></td>
<td>(none)</td>
<td>
A comma-separated list of classes that implement <code>StreamingListener</code>; when initializing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect:
"when initializing SparkContext"? ==> StreamingContext

"Spark's streaming listener bus" ==> "StreamingContext's listener bus"

@SparkQA
Copy link

SparkQA commented Jul 14, 2015

Test build #37157 has finished for PR 6380 at commit 9ec68c1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class StatsReportListener(numBatchInfos: Int) extends StreamingListener

@tdas
Copy link
Contributor

tdas commented Jul 14, 2015

Couple of more comments to address and it is good to go.

@tdas
Copy link
Contributor

tdas commented Jul 16, 2015

@His-name-is-Joof ping!

@Jeffrharr
Copy link
Contributor Author

Working on it now. In the middle of moving apartments. Thanks for your
patience!

On Wed, Jul 15, 2015 at 7:07 PM, Tathagata Das notifications@github.com
wrote:

@His-name-is-Joof https://github.com/His-name-is-Joof ping!


Reply to this email directly or view it on GitHub
#6380 (comment).

@SparkQA
Copy link

SparkQA commented Jul 17, 2015

Test build #37548 has finished for PR 6380 at commit fa8c752.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds the following public classes (experimental):
    • class StatsReportListener(numBatchInfos: Int) extends StreamingListener

@tdas
Copy link
Contributor

tdas commented Jul 17, 2015

@His-name-is-Joof I am not sure you realized, but the PR does not merge cleanly! It has conflicts, please update with the master branch

val failingScc = new StreamingContext(failingConf, Seconds(1))
}
val expectedErrorMessage =
"Exception when registering Streaming Listener:" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets not be so specific, the exact text can change. Just verify whether the message has the name of the class inside it.

@tdas
Copy link
Contributor

tdas commented Jul 22, 2015

@His-name-is-Joof Mind updating this soon. Would be good to get this merged before the underlying code diverges further.

@tdas
Copy link
Contributor

tdas commented Jul 27, 2015

Hey @His-name-is-Joof if you are not able to update this PR, mind if I take over merging it?

@andrewor14
Copy link
Contributor

Let's close this PR for now and reopen it later if needed. @tdas can take over.

@asfgit asfgit closed this in 804a012 Sep 4, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants