[SPARK-8372] History server shows incorrect information for application not started #6827
@@ -282,8 +282,14 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
val newAttempts = logs.flatMap { fileStatus =>
try {
val res = replay(fileStatus, bus)
logInfo(s"Application log ${res.logPath} loaded successfully.")
Some(res)
res.map { r =>
I think it would be better to change this to pattern-match style; using map here is a little weird.
res match {
case Some(r) => logInfo("...")
case None => logInfo("...")
}
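As a rough illustration of the suggestion above, the sketch below matches on an `Option` so that both outcomes are handled explicitly. `AttemptInfo`, `ReplayLogging`, and the wording of the `None` message are hypothetical stand-ins, not the actual Spark code; the function returns the message instead of logging it so the behaviour is easy to check.

```scala
// Hypothetical stand-in for the value returned by replay(); not Spark's class.
final case class AttemptInfo(logPath: String)

object ReplayLogging {
  // Build the message that would be logged for a replay result.
  // The None text is an assumption; the review comment elides it as "...".
  def describe(res: Option[AttemptInfo], fileName: String): String = res match {
    case Some(r) => s"Application log ${r.logPath} loaded successfully."
    case None    => s"Application log $fileName produced no application info."
  }
}
```

With a match, the `None` branch is visible at the call site, which is the readability point the reviewer is making; `map` would silently do nothing in that case.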
Thanks @jerryshao. I updated the code as suggested.
Test build #34928 has finished for PR 6827 at commit
Test build #34930 has finished for PR 6827 at commit
@jerryshao The |
@@ -160,7 +160,7 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
replayBus.addListener(appListener)
val appInfo = replay(fs.getFileStatus(new Path(logDir, attempt.logPath)), replayBus)

ui.setAppName(s"${appInfo.name} ($appId)")
appInfo.map { app => ui.setAppName(s"${app.name} ($appId)") }
Since you're not using the return value, we generally use Option.foreach, which is slightly cheaper.
In the PR summary you mention:
But I don't see that problem being fixed anywhere. The app id you set when
Hi @markhamstra, I'm pretty sure about this style
Yes, @jerryshao, the idiom is a little clearer when the contents of the Option are actually used to produce a result other than Unit, but it's also a little odd to use a different idiom just to handle the Unit result type. A lot of this is just style preference differences that often stem from how familiar and comfortable a developer is with other functional programming languages, libraries and idioms. And don't get me started on whether Anyway, the upshot of this is that
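To make the `map` versus `foreach` discussion concrete, here is a minimal sketch. `OptionIdioms`, `names`, and this `setAppName` are hypothetical stand-ins for the UI call in the diff. The "slightly cheaper" point is that `map` has to wrap the `Unit` result in a new `Some`, while `foreach` returns nothing.

```scala
import scala.collection.mutable.ListBuffer

object OptionIdioms {
  // Hypothetical stand-in for ui.setAppName: records each name it is given.
  val names: ListBuffer[String] = ListBuffer.empty[String]
  def setAppName(name: String): Unit = { names += name; () }

  // map runs the side effect but also builds an Option[Unit] nobody reads.
  def viaMap(app: Option[String]): Option[Unit] = app.map(n => setAppName(n))

  // foreach runs the same side effect and returns Unit directly.
  def viaForeach(app: Option[String]): Unit = app.foreach(n => setAppName(n))
}
```

Both variants do nothing for `None`; the difference is only in the discarded return value and in how clearly the call signals that it exists for its side effect.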
@vanzin SPARK-8275 is a different issue. What I wanted to fix in this PR is to avoid showing an App ID like application_1432793609805_009_1.inprogress. This is not a valid App ID. The cause of this is in
Thanks for reviewing this, @vanzin. The code is updated.
Ok, now I get what the patch is doing. LGTM with latest changes.
Test build #34967 has finished for PR 6827 at commit
logInfo(s"Application log ${res.logPath} loaded successfully.")
Some(res)
res match {
case Some(r) => logDebug(s"Application log ${r.logPath} loaded successfully.")
should this be info?
oh never mind, just saw @vanzin's comment
Ok, I'm going to merge this into master and 1.4 after addressing the comments myself. Thanks.
…on not started

The history server may show an incorrect App ID for an incomplete application like <App ID>.inprogress. This app info will never disappear even after the app is completed.
![incorrectappinfo](https://cloud.githubusercontent.com/assets/9278199/8156147/2a10fdbe-137d-11e5-9620-c5b61d93e3c1.png)
The cause of the issue is that a log path name is used as the app id when app id cannot be got during replay.

Author: Carson Wang <carson.wang@intel.com>

Closes #6827 from carsonwang/SPARK-8372 and squashes the following commits:

cdbb089 [Carson Wang] Fix code style
3e46b35 [Carson Wang] Update code style
90f5dde [Carson Wang] Add a unit test
d8c9cd0 [Carson Wang] Replaying events only return information when app is started

(cherry picked from commit 2837e06)
Signed-off-by: Andrew Or <andrew@databricks.com>
So, one thing that I noticed after I said "LGTM" is that this change breaks old logs (those generated by versions of Spark that do not record the app id). Those will never show up anymore (I think that should only be Spark 1.0?). If we care about that use case, this version should probably be reverted. The proper fix could be something as simple as this:
|
@vanzin can you elaborate? Is there a fix for this without reverting the patch?
@andrewor14 It's not a particular line, it's the whole patch. The patch ignores any application whose logs do not contain an application ID. No logs generated by Spark 1.0 contain an app id, so they're all ignored after this patch.
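The trade-off described in this thread can be modelled in a few lines. This is only a simplified sketch: `EventLog` and `ReplayModel` are hypothetical and do not reflect the real `FsHistoryProvider` types, and it does not attempt to reproduce the alternative fix mentioned above.

```scala
// Hypothetical model of an event log: a file name plus an app ID if one was recorded.
final case class EventLog(fileName: String, appId: Option[String])

object ReplayModel {
  // Behaviour before the patch, per the PR summary: when no app ID is found,
  // the log path name is used instead, which can surface "<App ID>.inprogress".
  def idWithFallback(log: EventLog): String = log.appId.getOrElse(log.fileName)

  // Behaviour after the patch, per the comment above: logs without an app ID
  // are skipped, which also hides old logs that never recorded one.
  def listedIds(logs: Seq[EventLog]): Seq[String] = logs.flatMap(_.appId)
}
```

In this model the fallback is what produces the invalid ID, and dropping the fallback is what makes logs without a recorded app ID disappear from the listing.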
OK, I'm going to revert this patch since the app ID is fundamental to the fix here. @vanzin would you mind submitting an alternative fix?
The history server may show an incorrect App ID for an incomplete application like <App ID>.inprogress. This app info will never disappear even after the app is completed.
![incorrectappinfo](https://cloud.githubusercontent.com/assets/9278199/8156147/2a10fdbe-137d-11e5-9620-c5b61d93e3c1.png)
The cause of the issue is that a log path name is used as the app id when the app id cannot be obtained during replay.