
[SPARK-2458] Make failed application log visible on History Server #1558

Closed

Conversation

tsudukim
Contributor

Modified to show uncompleted applications in the History Server UI.
Modified the app sort rule to be startTime-based (originally it was endTime-based) because uncompleted apps don't have a proper endTime.

@tsudukim
Contributor Author

By default, we get the same UI as now.
[screenshot: spark-2458-notinclude]

When the link above the table is clicked, we can also get a list that includes the apps which didn't finish successfully.
[screenshot: spark-2458-include]

@AmplabJenkins

Can one of the admins verify this patch?

@vanzin
Contributor

vanzin commented Jul 24, 2014

Disclaimer: haven't looked at the code yet.

I'm a little conflicted about exposing running apps in the history server, especially this way. First, "history" sort of implies things that happened in the past.

Second, misbehaving apps can cause log files to never go into a "finished" state (e.g. by failing to call SparkContext::stop()) - although you can make the argument that anyone can write anything to the root log dir anyway.

Third, the user experience from your screenshots is very weird. When just looking at finished apps, things are sorted one way, but when including unfinished ones, they're sorted another way. That's super confusing, especially when you have paging.

If listing running apps in the HS is really wanted, I'd suggest an approach where running apps are shown separately from finished ones. Either in a separate table, or a separate tab in the UI.

@tsudukim
Contributor Author

Thank you for following this PR.
Let me explain a little.
I'm sorry the word "uncompleted" made my purpose easy to misunderstand. This PR's purpose is to show "failed" apps in the HS, not running apps. But it is true that we can't tell from the log whether an app has already failed or is still running, so as a result both show up in the HS.

On the first point: the purpose is to show failed apps from the past, so this PR still fits the concept of the HS.
On the second point: the target of this PR is exactly the apps that never go into the "finished" state.
On the third point: the sort order is the same in both modes. But your suggestion makes sense; a separate table or tab might be better.

@vanzin
Contributor

vanzin commented Jul 24, 2014

Hmm. A properly-written app that fails should still show up as finished:

val sc = new SparkContext(blah)
try {
  doStuff()
} finally {
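  // stop() marks the event log as complete, so even a failed app shows up as finished.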
  sc.stop()
}

Of course, that's not guaranteed to work 100% of the time (for that we'd need an external entity monitoring the app, since we can't trust the app itself to do the right thing), but it should cover most cases.

re: sorting, I see what you mean. Still, I think sorting by end time is more natural for someone checking app history. Perhaps at some point we should let the user pick how to sort / filter the list, but that's a separate discussion.

@SparkQA

SparkQA commented Sep 5, 2014

Can one of the admins verify this patch?

@SparkQA

SparkQA commented Sep 10, 2014

QA tests have started for PR 1558. This patch DID NOT merge cleanly!
View progress: https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/14/consoleFull

@SparkQA

SparkQA commented Sep 10, 2014

QA results for PR 1558:
- This patch FAILED unit tests.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/14/consoleFull

@andrewor14
Contributor

retest this please

@andrewor14
Contributor

Hi @tsudukim, how does the user see the incomplete applications? As @vanzin suggested, the semantics of a history server are that it displays completed applications only. That said, since we can't distinguish running and failed applications, we might want a way to expose the potentially failed ones. I've had to tell people on the mailing list to manually create the APPLICATION_COMPLETE file, which is a very bad user experience.
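
For reference, that workaround is just creating the empty marker file by hand. A minimal sketch (not project code), assuming the Spark 1.x event log layout and the Hadoop FileSystem API; the log directory path below is only an example:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object MarkComplete {
  def main(args: Array[String]): Unit = {
    // e.g. hdfs://namenode:8020/spark-events/app-20141001120000-0001 (hypothetical path)
    val appLogDir = new Path(args(0))
    val fs = FileSystem.get(appLogDir.toUri, new Configuration())
    // The History Server only checks that the marker exists, so an empty file is enough.
    fs.create(new Path(appLogDir, "APPLICATION_COMPLETE")).close()
  }
}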

I think the UI should have a subtle "Show incomplete applications" link that only expands if the user clicks on it. These should be in a separate table by themselves so we don't mix them with the ones we know are complete. As for sorting, I agree with @vanzin that end time is more natural than start time. For incomplete applications, actually, won't the end time always be infinity or some special value? Maybe we can use that to detect whether an application has finished.
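
Something like this sketch could do the split (purely illustrative, not this PR's code; the AppInfo shape and the sentinel value are assumptions):

case class AppInfo(id: String, name: String, startTime: Long, endTime: Long)

// Partition the listing into completed and incomplete apps, assuming
// incomplete apps carry a sentinel (non-positive) end time.
def splitAndSort(apps: Seq[AppInfo]): (Seq[AppInfo], Seq[AppInfo]) = {
  val (completed, incomplete) = apps.partition(_.endTime > 0)
  // Completed apps sort naturally by end time; incomplete ones have no
  // real end time, so fall back to start time.
  (completed.sortBy(-_.endTime), incomplete.sortBy(-_.startTime))
}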

@SparkQA

SparkQA commented Sep 11, 2014

QA tests have started for PR 1558. This patch DID NOT merge cleanly!
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20169/consoleFull

@SparkQA

SparkQA commented Sep 11, 2014

QA results for PR 1558:
- This patch FAILED unit tests.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20169/consoleFull

@andrewor14
Contributor

Also, looks like this has merge conflicts. It would be great if you could rebase to master. Thanks!

@tsudukim
Contributor Author

tsudukim commented Oct 1, 2014

Thank you @andrewor14.
I've been researching this problem in our environment over the past few days, and it turned out to be a very rare case, as @vanzin suggested at first
(e.g. the JVM is lost and never calls SparkContext::stop(), or writing to HDFS fails for some reason).
My PR is not a smart way to solve such a rare case, so I'm dropping it.
Thank you again for your comments.

@tsudukim tsudukim closed this Oct 1, 2014
@andrewor14
Contributor

@tsudukim Actually, the high-level fix here is not a bad idea. Right now, if the logs don't show up, the user has to manually figure out whether the APPLICATION_COMPLETE file is present. It would be good to show some feedback to the user so they don't have to guess whether their paths are set properly, their application terminated properly, etc.

Let me know if you're interested in submitting a new PR that addresses the comments raised in this one.

@tsudukim
Contributor Author

@andrewor14 I created a new PR (#3467) addressing your comments. Please check it.
