Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-4705] Handle multiple app attempts event logs, history server. #5432

Closed
wants to merge 28 commits into from

Conversation

vanzin
Copy link
Contributor

@vanzin vanzin commented Apr 9, 2015

This change modifies the event logging listener to write the logs for different application
attempts to different files. The attempt ID is set by the scheduler backend, so as long
as the backend returns that ID to SparkContext, things should work. Currently, the
YARN backend does that.

The history server was also modified to model multiple attempts per application. Each
attempt has its own UI and a separate row in the listing table, so that users can look at
all the attempts separately. The UI "adapts" itself to avoid showing attempt-specific info
when all the applications being shown have a single attempt.

twinkle-g and others added 11 commits April 7, 2015 16:43
…the master branch. 2) Added the attempt id inside the SparkListenerApplicationStart, to make the info available independent of directory structure. 3) Changes in History Server to render the UI as per the snaphot II
This change explicitly models app attempts in the history server.
An app is now a collection of app attempts, instead of the logic
to match apps to app attempts being implicit in the rendering code.

This makes the rendering code a lot simpler, since it doesn't need
to do any fancy processing of the app list to figure out what to show.
@vanzin
Copy link
Contributor Author

vanzin commented Apr 9, 2015

This PR is an updated version of #4845. I rebased the code on top of current master, added the suggestions I made on the original PR, fixed a bunch of style nits and other issues, and added a couple of tests. Here's a screenshot:

@SparkQA
Copy link

SparkQA commented Apr 9, 2015

Test build #29905 has finished for PR 5432 at commit 657ec18.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SparkListenerApplicationStart(appName: String, appId: Option[String],
  • This patch does not change any dependencies.

@vanzin
Copy link
Contributor Author

vanzin commented Apr 9, 2015

BTW the zebra-striping in the UI looks a little broken right now, I'll take a look at that.

@SparkQA
Copy link

SparkQA commented Apr 9, 2015

Test build #29907 timed out for PR 5432 at commit 3a14503 after a configured wait of 120m.

@vanzin
Copy link
Contributor Author

vanzin commented Apr 9, 2015

Hmm, didn't find a test failure in the output.

@vanzin
Copy link
Contributor Author

vanzin commented Apr 9, 2015

Jenkins, retest this please.

@SparkQA
Copy link

SparkQA commented Apr 9, 2015

Test build #29917 has finished for PR 5432 at commit 3a14503.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SparkListenerApplicationStart(appName: String, appId: Option[String],
  • This patch does not change any dependencies.

@SparkQA
Copy link

SparkQA commented Apr 9, 2015

Test build #29949 has finished for PR 5432 at commit 9092af5.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SparkListenerApplicationStart(appName: String, appId: Option[String],
  • This patch does not change any dependencies.

@vanzin
Copy link
Contributor Author

vanzin commented Apr 9, 2015

Funny. All YARN tests (not just in this PR) are failing with this:

/bin/bash: /bin/java: No such file or directory

Wonder what changed in the environment since they were working before? (And why is github's user name search so useless it cannot autocomplete Shane's user name?)

@shaneknapp
Copy link
Contributor

so i just grepped through the code and found stuff like this:

yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala: YarnSparkHadoopUtil.expandEnvironment(Environment.JAVA_HOME) + "/bin/java", "-server"
yarn/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnable.scala: YarnSparkHadoopUtil.expandEnvironment(Environment.JAVA_HOME) + "/bin/java",

i've never explicitly set JAVA_HOME in jenkins' slave user space before, but that's obviously why it's failing. that's pretty bad code imo.

solutions:

  • explicitly set JAVA_HOME in each slave's config (bad, as it ties that slave to whatever is on system java)
  • if JAVA_HOME isn't set, use whatever java is in the path (good)
  • explicitly define which java version to test against in the jenkins build's config

i prefer the second solution.

@vanzin
Copy link
Contributor Author

vanzin commented Apr 9, 2015

I think JAVA_HOME is something that YARN exposes to all containers, so even if you don't set it for your application, that code should still work. I think we problem here is a little different - we should just make sure the tests have the same env as you'd find in an usual YARN installation.

Anyway, I'm trying something out in #5441.

@shaneknapp
Copy link
Contributor

cool. on our systems, at least, the system java we use is /usr/bin/java, which points (through /etc/alternatives), to /usr/java/latest (which itself is a link to /usr/java/jdk1.7.0_71/).

@shaneknapp
Copy link
Contributor

oh, i just had a thought: i installed a couple of different versions of java through jenkins, and right now the tests are set in the config to use 'Default', which is system level java. i bet this is why JAVA_HOME isn't being set and why the tests are failing. @JoshRosen is going to set JAVA_HOME for us to get the builds green and then we can look a little deeper in to the problem.

@andrewor14
Copy link
Contributor

Is it always safe to rely on java.home pointing to the right directory? IIUC this is independently of whether we use Maven or SBT. Then perhaps the correct way of fixing this is doing something like what AbstractCommandBuilder does, where if JAVA_HOME is not set it defaults to using java.home

On a side note: http://stackoverflow.com/questions/17023782/are-java-system-properties-always-non-null

@vanzin
Copy link
Contributor Author

vanzin commented Apr 9, 2015

Note that the YARN code is not resolving JAVA_HOME locally, it's adding a reference to $JAVA_HOME to the command that will be executed by YARN. That will be resolved on the node where the command is run. The NM generally sets JAVA_HOME for child processes.

@andrewor14
Copy link
Contributor

Ah yes, scratch that.

@vanzin
Copy link
Contributor Author

vanzin commented Apr 10, 2015

Jenkins, retest this please.

// Do not call ui.bind() to avoid creating a new server for each application
}
applications.get(appId).flatMap { appInfo =>
val attempts = appInfo.attempts.filter(_.attemptId == attemptId)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the doc for getAppUI says to use an empty string for apps with a single attempt -- but that isn't exactly what is reflected here. Some yarn apps will be successful on the first attempt, but with this implementation, you still need to pass in the actual attempt id. One way or the other, the doc & this should be resolved.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interface doc is slightly misleading, but all event logs from YARN will have an attempt ID after this change, even for a single attempt.

@SparkQA
Copy link

SparkQA commented Apr 28, 2015

Test build #31146 has finished for PR 5432 at commit bc885b7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SparkListenerApplicationStart(appName: String, appId: Option[String],
  • This patch does not change any dependencies.

Conflicts:
	core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
@squito
Copy link
Contributor

squito commented Apr 28, 2015

lgtm

I'll wait a day for any other feedback.

@andrewor14
Copy link
Contributor

@vanzin thanks for the fix. I'll have a quick look at this tonight.

@SparkQA
Copy link

SparkQA commented Apr 28, 2015

Test build #31166 has finished for PR 5432 at commit f66dcc5.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SparkListenerApplicationStart(appName: String, appId: Option[String],
  • This patch does not change any dependencies.

@squito
Copy link
Contributor

squito commented Apr 30, 2015

@andrewor14 did you have any comments on this? otherwise I am ready to merge

@andrewor14
Copy link
Contributor

Sorry I will look at this right now.

}
applications.get(appId).flatMap { appInfo =>
val attempts = appInfo.attempts.filter(_.attemptId == attemptId)
attempts.headOption.map { attempt =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: this can just be find I think

@andrewor14
Copy link
Contributor

Great to see this fixed @vanzin. My comments are mostly minor. Unfortunately I don't have the time to do a closer review. How much more work do you imagine fixing this additionally for standalone mode would be? If it's not that much we should also fix that for 1.4 in separate patch.

@vanzin
Copy link
Contributor Author

vanzin commented Apr 30, 2015

How much more work do you imagine fixing this additionally for standalone mode would be?

I have no idea, I'm mostly unfamiliar with standalone cluster mode. Feel free to file a separate bug for it.

@SparkQA
Copy link

SparkQA commented Apr 30, 2015

Test build #31464 has finished for PR 5432 at commit 7e289fa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SparkListenerApplicationStart(appName: String, appId: Option[String],
  • This patch does not change any dependencies.

@andrewor14
Copy link
Contributor

retest this please

@andrewor14
Copy link
Contributor

Latest changes LGTM based on my quick review. @squito feel free to merge it.

@SparkQA
Copy link

SparkQA commented May 1, 2015

Test build #31480 has finished for PR 5432 at commit 7e289fa.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class SparkListenerApplicationStart(appName: String, appId: Option[String],
  • This patch does not change any dependencies.

@asfgit asfgit closed this in 3052f49 May 1, 2015
@vanzin vanzin deleted the SPARK-4705 branch May 1, 2015 23:40
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
This change modifies the event logging listener to write the logs for different application
attempts to different files. The attempt ID is set by the scheduler backend, so as long
as the backend returns that ID to SparkContext, things should work. Currently, the
YARN backend does that.

The history server was also modified to model multiple attempts per application. Each
attempt has its own UI and a separate row in the listing table, so that users can look at
all the attempts separately. The UI "adapts" itself to avoid showing attempt-specific info
when all the applications being shown have a single attempt.

Author: Marcelo Vanzin <vanzin@cloudera.com>
Author: twinkle sachdeva <twinkle@kite.ggn.in.guavus.com>
Author: twinkle.sachdeva <twinkle.sachdeva@guavus.com>
Author: twinkle sachdeva <twinkle.sachdeva@guavus.com>

Closes apache#5432 from vanzin/SPARK-4705 and squashes the following commits:

7e289fa [Marcelo Vanzin] Review feedback.
f66dcc5 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
bc885b7 [Marcelo Vanzin] Review feedback.
76a3651 [Marcelo Vanzin] Fix log cleaner, add test.
7c381ec [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
1aa309d [Marcelo Vanzin] Improve sorting of app attempts.
2ad77e7 [Marcelo Vanzin] Missed a reference to the old property name.
9d59d92 [Marcelo Vanzin] Scalastyle...
d5a9c37 [Marcelo Vanzin] Update JsonProtocol test, make property name consistent.
ba34b69 [Marcelo Vanzin] Use Option[String] for attempt id.
f1cb9b3 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
c14ec19 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
9092d39 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
86de638 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
07446c6 [Marcelo Vanzin] Disable striping for app id / name when multiple attempts exist.
9092af5 [Marcelo Vanzin] Fix HistoryServer test.
3a14503 [Marcelo Vanzin] Argh scalastyle.
657ec18 [Marcelo Vanzin] Fix yarn history URL, app links.
c3e0a82 [Marcelo Vanzin] Move app name to app info, more UI fixes.
ce5ee5d [Marcelo Vanzin] Misc UI, test, style fixes.
cbe8bba [Marcelo Vanzin] Attempt ID in listener event should be an option.
88b1de8 [Marcelo Vanzin] Add a test for apps with multiple attempts.
3245aa2 [Marcelo Vanzin] Make app attempts part of the history server model.
5fd5c6f [Marcelo Vanzin] Fix my broken rebase.
318525a [twinkle.sachdeva] SPARK-4705: 1) moved from directory structure to single file, as per the master branch. 2) Added the attempt id inside the SparkListenerApplicationStart, to make the info available independent of directory structure. 3) Changes in History Server to render the UI as per the snaphot II
6b2e521 [twinkle sachdeva] SPARK-4705 Incorporating the review comments regarding formatting, will do the rest of the changes after this
4c1fc26 [twinkle sachdeva] SPARK-4705 Incorporating the review comments regarding formatting, will do the rest of the changes after this
0eb7722 [twinkle sachdeva] SPARK-4705: Doing cherry-pick of fix into master
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
This change modifies the event logging listener to write the logs for different application
attempts to different files. The attempt ID is set by the scheduler backend, so as long
as the backend returns that ID to SparkContext, things should work. Currently, the
YARN backend does that.

The history server was also modified to model multiple attempts per application. Each
attempt has its own UI and a separate row in the listing table, so that users can look at
all the attempts separately. The UI "adapts" itself to avoid showing attempt-specific info
when all the applications being shown have a single attempt.

Author: Marcelo Vanzin <vanzin@cloudera.com>
Author: twinkle sachdeva <twinkle@kite.ggn.in.guavus.com>
Author: twinkle.sachdeva <twinkle.sachdeva@guavus.com>
Author: twinkle sachdeva <twinkle.sachdeva@guavus.com>

Closes apache#5432 from vanzin/SPARK-4705 and squashes the following commits:

7e289fa [Marcelo Vanzin] Review feedback.
f66dcc5 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
bc885b7 [Marcelo Vanzin] Review feedback.
76a3651 [Marcelo Vanzin] Fix log cleaner, add test.
7c381ec [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
1aa309d [Marcelo Vanzin] Improve sorting of app attempts.
2ad77e7 [Marcelo Vanzin] Missed a reference to the old property name.
9d59d92 [Marcelo Vanzin] Scalastyle...
d5a9c37 [Marcelo Vanzin] Update JsonProtocol test, make property name consistent.
ba34b69 [Marcelo Vanzin] Use Option[String] for attempt id.
f1cb9b3 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
c14ec19 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
9092d39 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
86de638 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
07446c6 [Marcelo Vanzin] Disable striping for app id / name when multiple attempts exist.
9092af5 [Marcelo Vanzin] Fix HistoryServer test.
3a14503 [Marcelo Vanzin] Argh scalastyle.
657ec18 [Marcelo Vanzin] Fix yarn history URL, app links.
c3e0a82 [Marcelo Vanzin] Move app name to app info, more UI fixes.
ce5ee5d [Marcelo Vanzin] Misc UI, test, style fixes.
cbe8bba [Marcelo Vanzin] Attempt ID in listener event should be an option.
88b1de8 [Marcelo Vanzin] Add a test for apps with multiple attempts.
3245aa2 [Marcelo Vanzin] Make app attempts part of the history server model.
5fd5c6f [Marcelo Vanzin] Fix my broken rebase.
318525a [twinkle.sachdeva] SPARK-4705: 1) moved from directory structure to single file, as per the master branch. 2) Added the attempt id inside the SparkListenerApplicationStart, to make the info available independent of directory structure. 3) Changes in History Server to render the UI as per the snaphot II
6b2e521 [twinkle sachdeva] SPARK-4705 Incorporating the review comments regarding formatting, will do the rest of the changes after this
4c1fc26 [twinkle sachdeva] SPARK-4705 Incorporating the review comments regarding formatting, will do the rest of the changes after this
0eb7722 [twinkle sachdeva] SPARK-4705: Doing cherry-pick of fix into master
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
This change modifies the event logging listener to write the logs for different application
attempts to different files. The attempt ID is set by the scheduler backend, so as long
as the backend returns that ID to SparkContext, things should work. Currently, the
YARN backend does that.

The history server was also modified to model multiple attempts per application. Each
attempt has its own UI and a separate row in the listing table, so that users can look at
all the attempts separately. The UI "adapts" itself to avoid showing attempt-specific info
when all the applications being shown have a single attempt.

Author: Marcelo Vanzin <vanzin@cloudera.com>
Author: twinkle sachdeva <twinkle@kite.ggn.in.guavus.com>
Author: twinkle.sachdeva <twinkle.sachdeva@guavus.com>
Author: twinkle sachdeva <twinkle.sachdeva@guavus.com>

Closes apache#5432 from vanzin/SPARK-4705 and squashes the following commits:

7e289fa [Marcelo Vanzin] Review feedback.
f66dcc5 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
bc885b7 [Marcelo Vanzin] Review feedback.
76a3651 [Marcelo Vanzin] Fix log cleaner, add test.
7c381ec [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
1aa309d [Marcelo Vanzin] Improve sorting of app attempts.
2ad77e7 [Marcelo Vanzin] Missed a reference to the old property name.
9d59d92 [Marcelo Vanzin] Scalastyle...
d5a9c37 [Marcelo Vanzin] Update JsonProtocol test, make property name consistent.
ba34b69 [Marcelo Vanzin] Use Option[String] for attempt id.
f1cb9b3 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
c14ec19 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
9092d39 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
86de638 [Marcelo Vanzin] Merge branch 'master' into SPARK-4705
07446c6 [Marcelo Vanzin] Disable striping for app id / name when multiple attempts exist.
9092af5 [Marcelo Vanzin] Fix HistoryServer test.
3a14503 [Marcelo Vanzin] Argh scalastyle.
657ec18 [Marcelo Vanzin] Fix yarn history URL, app links.
c3e0a82 [Marcelo Vanzin] Move app name to app info, more UI fixes.
ce5ee5d [Marcelo Vanzin] Misc UI, test, style fixes.
cbe8bba [Marcelo Vanzin] Attempt ID in listener event should be an option.
88b1de8 [Marcelo Vanzin] Add a test for apps with multiple attempts.
3245aa2 [Marcelo Vanzin] Make app attempts part of the history server model.
5fd5c6f [Marcelo Vanzin] Fix my broken rebase.
318525a [twinkle.sachdeva] SPARK-4705: 1) moved from directory structure to single file, as per the master branch. 2) Added the attempt id inside the SparkListenerApplicationStart, to make the info available independent of directory structure. 3) Changes in History Server to render the UI as per the snaphot II
6b2e521 [twinkle sachdeva] SPARK-4705 Incorporating the review comments regarding formatting, will do the rest of the changes after this
4c1fc26 [twinkle sachdeva] SPARK-4705 Incorporating the review comments regarding formatting, will do the rest of the changes after this
0eb7722 [twinkle sachdeva] SPARK-4705: Doing cherry-pick of fix into master
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants