Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-1853] Show Streaming application code context (file, line number) in Spark Stages UI #2464

Closed
wants to merge 23 commits into from

Conversation

tdas
Copy link
Contributor

@tdas tdas commented Sep 19, 2014

This is a refactored version of the original PR #1723 my @mubarak

Please take a look @andrewor14, @mubarak

mubarak and others added 21 commits July 17, 2014 18:15
Conflicts:
	core/src/main/scala/org/apache/spark/rdd/RDD.scala
	streaming/src/main/scala/org/apache/spark/streaming/dstream/ReducedWindowedDStream.scala
…prevCallSite) only once. Adding return for other code paths (for None)
Conflicts:
	core/src/main/scala/org/apache/spark/util/Utils.scala
…llsite

Conflicts:
	streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
	streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala
@SparkQA
Copy link

SparkQA commented Sep 19, 2014

QA tests have started for PR 2464 at commit 904cd92.

  • This patch merges cleanly.

None
// If RDD was already generated, then retrieve it from HashMap,
// or else compute the RDD
generatedRDDs.get(time).orElse {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are just a refactoring of the code (from case Some and case None to Option.orElse), with no change in the logic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed that this logic is the same as before

@SparkQA
Copy link

SparkQA commented Sep 19, 2014

QA tests have started for PR 2464 at commit 390b45d.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 19, 2014

QA tests have finished for PR 2464 at commit 390b45d.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 19, 2014

QA tests have finished for PR 2464 at commit 904cd92.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14
Copy link
Contributor

Thanks for working on this @tdas while on vacation. I will look at this in the next day or two.

@mubarak
Copy link
Contributor

mubarak commented Sep 20, 2014

LGTM. Thanks @tdas

*/
def getCallSite: CallSite = {
def getCallSite(classFilterFunc: String => Boolean = defaultCallSiteFilterFunc): CallSite = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would find it more natural to pass a function that excludes line from Spark code (it just inverts the logic). Something like internalExclusionFunc. Then you could have sparkCoreExclusionFunc and sparkStreamingExclusionFunc. @andrewor14 what do you find more intuitive?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, need a vote on this. I inverted the logic because the function has to distinguish not just Spark code but also ignore Scala classes (which may actually be in user code file). That;s why the default function has scala-regex inside it. So its best to treat this function as a user-code-or-not function rather than a spark-code-or-not function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think negative is more intuitive than positive here. By default every line of the call stack is kept, but if the caller passes in his own ignoreClass function then we should respect that. I actually like the name skipClass more, because down there you can just do if (skipClass(el.getClassName)) { ... }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I am outvoted! But skipClass does not express that the parameter is a function. How about classExclusionFunc be the param name, and the actual functions be coreClassExclusionFunc and streamingClassExclusionFunc ?

@SparkQA
Copy link

SparkQA commented Sep 23, 2014

QA tests have started for PR 2464 at commit dc54c71.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Sep 23, 2014

QA tests have finished for PR 2464 at commit dc54c71.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 23, 2014

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20727/

@andrewor14
Copy link
Contributor

Thanks, LGTM. I'm merging this into master and 1.1

asfgit pushed a commit that referenced this pull request Sep 23, 2014
…er) in Spark Stages UI

This is a refactored version of the original PR #1723 my mubarak

Please take a look andrewor14, mubarak

Author: Mubarak Seyed <mubarak.seyed@gmail.com>
Author: Tathagata Das <tathagata.das1565@gmail.com>

Closes #2464 from tdas/streaming-callsite and squashes the following commits:

dc54c71 [Tathagata Das] Made changes based on PR comments.
390b45d [Tathagata Das] Fixed minor bugs.
904cd92 [Tathagata Das] Merge remote-tracking branch 'apache-github/master' into streaming-callsite
7baa427 [Tathagata Das] Refactored getCallSite and setCallSite to make it simpler. Also added unit test for DStream creation site.
b9ed945 [Mubarak Seyed] Adding streaming utils
c461cf4 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
ceb43da [Mubarak Seyed] Changing default regex function name
8c5d443 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
196121b [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
491a1eb [Mubarak Seyed] Removing streaming visibility from getRDDCreationCallSite in DStream
33a7295 [Mubarak Seyed] Fixing review comments: Merging both setCallSite methods
c26d933 [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
f51fd9f [Mubarak Seyed] Fixing scalastyle, Regex for Utils.getCallSite, and changing method names in DStream
5051c58 [Mubarak Seyed] Getting return value of compute() into variable and call setCallSite(prevCallSite) only once. Adding return for other code paths (for None)
a207eb7 [Mubarak Seyed] Fixing code review comments
ccde038 [Mubarak Seyed] Removing Utils import from MappedDStream
2a09ad6 [Mubarak Seyed] Changes in Utils.scala for SPARK-1853
1d90cc3 [Mubarak Seyed] Changes for SPARK-1853
5f3105a [Mubarak Seyed] Merge remote-tracking branch 'upstream/master'
70f494f [Mubarak Seyed] Changes for SPARK-1853
1500deb [Mubarak Seyed] Changes in Spark Streaming UI
9d38d3c [Mubarak Seyed] [SPARK-1853] Show Streaming application code context (file, line number) in Spark Stages UI
d466d75 [Mubarak Seyed] Changes for spark streaming UI

(cherry picked from commit 729952a)
Signed-off-by: Andrew Or <andrewor14@gmail.com>
@asfgit asfgit closed this in 729952a Sep 23, 2014
@tdas
Copy link
Contributor Author

tdas commented Sep 24, 2014

@mubarak Thank you very much for this fix! Its finally merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants