Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-8593] [core] Sort app attempts by start time. #7253

Closed
wants to merge 13 commits into from

Conversation

rekhajoshm
Copy link
Contributor

This makes sure attempts are listed in the order they were executed, and that the
app's state matches the state of the most current attempt.

@andrewor14
Copy link
Contributor

add to whitelist

@andrewor14
Copy link
Contributor

@vanzin

@SparkQA
Copy link

SparkQA commented Jul 7, 2015

Test build #36709 has finished for PR 7253 at commit ab65fa1.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -168,6 +174,10 @@ private[history] class HistoryPage(parent: HistoryServer) extends WebUIPage("")
"-"
}
val lastUpdated = UIUtils.formatDate(attempt.lastUpdated)
var someAttemptCompleted = false
info.attempts.foreach{ attempt =>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be val someAttemptCompleted = info.attempts.exists(_.completed)

@vanzin
Copy link
Contributor

vanzin commented Jul 7, 2015

My interpretation of the bug is different. Basically, the bug says that when an old attempt that shows up as "running" exists, the newer attempt, even if finished, is ignored, meaning the application is actually finished but does not show up on the finished list.

I think the bug is actually in FsHistoryProvider::compareAttemptInfo. The javadoc says:

Comparison function that defines the sort order for application attempts within the same
application. Order is: running attempts before complete attempts, running attempts sorted
by start time, completed attempts sorted by end time.

That is wrong given what the bug describes; perhaps the correct sort order should be based solely on the attempt's start time.

@tgravescs
Copy link
Contributor

@vanzin interpretation of the bug is correct.

If an early attempt is left around a .inprogress file but then the next attempt completed successfully then it should be considered completed and at least show the last attempt that completed successfully.

@rekhajoshm
Copy link
Contributor Author

thanks @vanzin @tgravescs @srowen @andrewor14 for your comments.

My first line of attack was to remove early filtering in HistoryPage which entirely discards if first attempt was not completed(.filter(_.attempts.head.completed != requestedIncomplete)) but updating order of multiple attempts seems perfect.

New order: Completed attempts before running attempts, running attempts sorted by start time showing whichever started first, completed attempts sorted by end time showing whichever ended first.
Old and new way of multiple attempts ordering, as attached.please review.thanks!

updatedsort

@@ -64,61 +64,61 @@ private[history] class HistoryPage(parent: HistoryServer) extends WebUIPage("")
{providerConfig.map { case (k, v) => <li><strong>{k}:</strong> {v}</li> }}
</ul>
{
// This displays the indices of pages that are within `plusOrMinus` pages of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the changes in this file are unnecessary, please revert.

@vanzin
Copy link
Contributor

vanzin commented Jul 8, 2015

You'll also need to write a unit test for this (see FsHistoryProviderSuite.scala).

@SparkQA
Copy link

SparkQA commented Jul 8, 2015

Test build #36817 has finished for PR 7253 at commit a41ac4b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 8, 2015

Test build #36821 has finished for PR 7253 at commit 85024e8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 8, 2015

Test build #36829 has finished for PR 7253 at commit 304cb0b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 9, 2015

Test build #36846 has finished for PR 7253 at commit cc0fda7.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* application. Order is: running attempts before complete attempts, running attempts sorted
* by start time, completed attempts sorted by end time.
* application. Order is: completed attempts before running attempts, running attempts sorted
* by start time showing whichever started first,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wording is weird. What does "showing" mean?

I'd say: "sorted by descending start time", "sorted by descending end time".

Note, also, that your current code sorts in ascending order, while I think it should be the opposite.

@vanzin
Copy link
Contributor

vanzin commented Jul 9, 2015

Just a minor thing about the ordering (which will probably also affect the test code), otherwise looks ok. Since this is not really exclusive to yarn, I'd remove it from the PR title.

@rekhajoshm rekhajoshm changed the title [SPARK-8593] [YARN]: History Server: some attempt completed to work with showIncomplete [SPARK-8593] [CORE]: History Server: some attempt completed to work with showIncomplete Jul 9, 2015
@rekhajoshm
Copy link
Contributor Author

thanks @vanzin . unless i m missing something i intentionally kept ascending order for both start and end time. Users want to see which ended first (ascending endtime) on top, and which started first, which also has highest probability of getting finished first as well on top (ascending start time)

The updated order as is below:
1.Both completed: If both a1, and a2 completed then if a1 completed first , it will be shown first else a2 will be shown (ascending end time sort, sorted that which ended first)
2.Both running: If both a1, and a2 are not completed then if a1 started first , it will be shown first else a2 (running ascending start time sort, sorted that which started first)
3.a1 complete, a2 running: By changing the ! condition, showing a1 on top now
4.a1 incomplete, a2 complete: By changing the ! condition, showing a2 on top now

updated_sort

@vanzin
Copy link
Contributor

vanzin commented Jul 10, 2015

Users want to see which ended first (ascending endtime) on top

No, users want to see the most recent one on top (i.e. the one that ended last).

@rekhajoshm rekhajoshm changed the title [SPARK-8593] [CORE]: History Server: some attempt completed to work with showIncomplete [SPARK-8593] [Core]: History Server: some attempt completed to work with showIncomplete Jul 10, 2015
@rekhajoshm
Copy link
Contributor Author

Ack @vanzin updating, start time also descending order - users want to see which started last first??
Anycase, the concerned issue of jira hits just the case 4 in this case
4.a1 incomplete, a2 complete: By changing the ! condition, showing a2 on top now
So rest of case 1 and 2, will sort as you prefer. thanks.

@vanzin
Copy link
Contributor

vanzin commented Jul 10, 2015

start time also descending order - users want to see which started last first??

Correct. It should be very rare (if not impossible) for two attempts to overlap in time (at least on YARN), so an attempt that starts later will also end later.

@vanzin
Copy link
Contributor

vanzin commented Jul 10, 2015

I think you are confusing me. As I said in my first comment, descending start time should cover all cases.

@rekhajoshm
Copy link
Contributor Author

imo that would not be correct @vanzin for the case when a1 is completed, and a2 is incomplete if a1 start times before a2, doing descending start time sort would imply sort returning a2 before a1, and thus showing incompleted attempt before completed.not right as per intention of the issue.we always want to show completed first.hence i do not think descending start time can cover all cases.
please let me know your collective thoughts @vanzin @srowen @squito @andrewor14 @tgravescs as well on the commit.

The code would look as below when you say - "As I said in my first comment, descending start time should cover all cases."

  private def compareAttemptInfo(
      a1: FsApplicationAttemptInfo,
      a2: FsApplicationAttemptInfo): Boolean = {
    a1.startTime >= a2.startTime
  }

For clear illustration, that would lead to 3rd block on all states of a1 and a2.('After' block is what is in my commit presently)

updated_sort_2

@rekhajoshm
Copy link
Contributor Author

Hi. updated commit for having order of descending start time if both attempts are completed or running, else completed attempts before running attempts.maybe that is what you meant earlier? @vanzin ?
kindly review/approve? @vanzin @andrewor14 @srowen ? thanks!

@SparkQA
Copy link

SparkQA commented Jul 12, 2015

Test build #37101 has finished for PR 7253 at commit 83306a8.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Jul 13, 2015

for the case when a1 is completed, and a2 is incomplete if a1 start times before a2, doing descending start time sort would imply sort returning a2 before a1

That's the correct behavior. a2 is a later attempt than a1, and it means that a1 failed (and the app is still running). So a2 being listed before a1 is correct, since it's the most recent attempt and the one that matches the current state of the app ("running").

@rekhajoshm
Copy link
Contributor Author

hmm..I see your point @vanzin .According to your prescription then, later attempts before earlier attempts is the correct order.Did you consider all possibilities?speculative execution??
Also your thoughts on for case when a1 is incomplete and a2 has completed, if a1 start at same time as a2 (if not later by definition of attempts) Especially as that is the case @tgravescs argues in his SPARK-8593 that a1 is incomplete, but a2 has completed.and a2 does not show as order has prevented that.Hence preferred sort order to be completed before running, but if both are completed or both are running sorted by descending start time.please let me know your thoughts? thanks for your time!

@vanzin
Copy link
Contributor

vanzin commented Jul 13, 2015

Did you consider all possibilities?speculative execution??

There's no such thing as speculative execution of app attempts.

if a1 start at same time as a2 (if not later by definition of attempts)

That should never happen (see above, attempts are only started after a previous attempt has finished).

Especially as that is the case @tgravescs argues in his SPARK-8593

No it's not. The case he described is when the first attempt finished executing (i.e. there's no live SparkContext anymore) but the log file where the attempt was writing its logs wasn't properly closed. There are never two attempts executing at the same time, and it's a safe assumption that if there is more than one attempt, all attempts except the most recent one (i.e. the one with the most recent start time) are not running.

@rekhajoshm
Copy link
Contributor Author

thanks @vanzin convinced, updated to have descending start time for all cases.
guess i was nit picking to be sure; thanks for the conversation @vanzin

@SparkQA
Copy link

SparkQA commented Jul 14, 2015

Test build #37161 has finished for PR 7253 at commit 548c753.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 14, 2015

Test build #37162 has finished for PR 7253 at commit 716e0b1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@vanzin
Copy link
Contributor

vanzin commented Jul 14, 2015

Jenkins retest this please.

@SparkQA
Copy link

SparkQA commented Jul 14, 2015

Test build #15 has finished for PR 7253 at commit 716e0b1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 14, 2015

Test build #37236 has finished for PR 7253 at commit 716e0b1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 14, 2015

Test build #37262 has finished for PR 7253 at commit 874dd80.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@rekhajoshm
Copy link
Contributor Author

done @vanzin thanks

@vanzin
Copy link
Contributor

vanzin commented Jul 15, 2015

Code looks good now, thanks! Could you update the PR title and description to match the actual change? e.g.

[SPARK-8593] [core] Sort app attempts by start time.

This makes sure attempts are listed in the order they were executed, and that the
app's state matches the state of the most current attempt.

@rekhajoshm rekhajoshm changed the title [SPARK-8593] [Core]: History Server: some attempt completed to work with showIncomplete [SPARK-8593] [core] Sort app attempts by start time. Jul 15, 2015
@rekhajoshm
Copy link
Contributor Author

done @vanzin thanks

@vanzin
Copy link
Contributor

vanzin commented Jul 17, 2015

ping @andrewor14 @srowen

@srowen
Copy link
Member

srowen commented Jul 17, 2015

LGTM, will merge shortly if there are no objections into master and 1.4.

@andrewor14
Copy link
Contributor

Looks fine, feel free to merge it.

asfgit pushed a commit that referenced this pull request Jul 17, 2015
This makes sure attempts are listed in the order they were executed, and that the
app's state matches the state of the most current attempt.

Author: Joshi <rekhajoshm@gmail.com>
Author: Rekha Joshi <rekhajoshm@gmail.com>

Closes #7253 from rekhajoshm/SPARK-8593 and squashes the following commits:

874dd80 [Joshi] History Server: updated order for multiple attempts(logcleaner)
716e0b1 [Joshi] History Server: updated order for multiple attempts(descending start time works everytime)
548c753 [Joshi] History Server: updated order for multiple attempts(descending start time works everytime)
83306a8 [Joshi] History Server: updated order for multiple attempts(descending start time)
b0fc922 [Joshi] History Server: updated order for multiple attempts(updated comment)
cc0fda7 [Joshi] History Server: updated order for multiple attempts(updated test)
304cb0b [Joshi] History Server: updated order for multiple attempts(reverted HistoryPage)
85024e8 [Joshi] History Server: updated order for multiple attempts
a41ac4b [Joshi] History Server: updated order for multiple attempts
ab65fa1 [Joshi] History Server: some attempt completed to work with showIncomplete
0be142d [Rekha Joshi] Merge pull request #3 from apache/master
106fd8e [Rekha Joshi] Merge pull request #2 from apache/master
e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master

(cherry picked from commit 42d8a01)
Signed-off-by: Sean Owen <sowen@cloudera.com>
@asfgit asfgit closed this in 42d8a01 Jul 17, 2015
@rekhajoshm
Copy link
Contributor Author

thanks @vanzin @srowen @andrewor14 !

@rekhajoshm rekhajoshm deleted the SPARK-8593 branch June 21, 2018 06:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants