[WebUI][SPARK-7889] HistoryServer updates UI for incomplete apps #11118
…ompletion state; use mock spark UI
…ault = 60; document
…om string to case class
… metrics used to track load & time —and for testing
… comments & stylecheck
…viction is taking place
…this triggers callbacks in the cache
… to keep scalastyle happy
…cking up any changes
…llelize().count() call, so the FS history provider isn't seeing an update, etc, etc.
…o scans through modified files to verify this takes.
…LoggingListener attempts to do so afterwards, swallowing exceptions raised
… it's a race condition between probe time and the scanner thread: if the initial load is after the file update but before the scanner thread has looked at the file, the file isn't detected as updated. The provider has to return the actual file timestamp of its choice for use in update checks, not the time that the initial load took place
…ore time details, but I'm about to move the fshistory off time and into a generic "attempt version" counter which will be compared on the probe. If an update has happened, this will know
…r and equality check
log.debug(s"Probing at time $now for updated application $cacheKey -> $entry")
metrics.updateProbeCount.inc()
updated = time(metrics.updateProbeTimer) {
  entry.updateProbe()
Note that this check is now extremely cheap (at least with the FSHistoryProvider). Actually checking for an update to the logs happens on its own schedule, as that scans logs looking for both new apps and updates to existing ones. That suggests that we could either drop this extra interval completely, and just do this check on every request, or if we want to leave it for other HistoryProviders, we could at least make the default very rapid.
My real concern was not the cost of the probe but the cost of replay: if an app was updating rapidly while a lot of user requests were coming in, it would trigger replay too often.
Right, I guess I just wanted to point out that with the changes here, this probe is entirely independent from replay. Replay happens with normal log-checking; that frequency is controlled by spark.history.fs.update.interval. Here, we're just checking whether that regular log scanning has already loaded an updated UI for this attempt, and that is it. Since spark.history.fs.update.interval is entirely controlling the expensive part, we may not need any other interval.
Test build #2524 has finished for PR 11118 at commit
Test build #50929 has finished for PR 11118 at commit
Test build #2525 has finished for PR 11118 at commit
Test build #50936 has finished for PR 11118 at commit
Test build #2526 has finished for PR 11118 at commit
// actually read, we may never refresh the app
// we expect FileStatus to return the file size when it was initially created, but the api
// is not explicit about this so lets be extra-safe.
val eventLogLength = eventLog.getLen()
This is usually just another call to getFileStatus().length; FileStatus is required to be static once created (see http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/filesystem.html, though it skimps on concurrency issues).
Ah, I see. I expected it to behave that way but couldn't find any documentation which really made that explicit. I guess you're saying it's guaranteed by the post-conditions for getFileStatus()? I've updated the comment now.
LGTM; unifying the different probes for new-ness makes sense.
Test build #51105 has started for PR 11118 at commit
jenkins, test this please
57e937b to 04f5385
Plan to merge this a little later (assuming tests pass), any other comments?
Test build #51117 has finished for PR 11118 at commit
Test build #2536 has finished for PR 11118 at commit
Test build #51119 has finished for PR 11118 at commit
merged to master, thanks @steveloughran!
Just saw this got merged. I'm probably missing some context, but can somebody explain to me why something so conceptually simple leads to such a big patch?
Good Q. We thought it'd be simple at first too.
There are actually two other bigger things which would be possible to do on this chain
Oh, and faster boot time with a summary file alongside the full history, with main details (finished: Boolean, spark-version, ...) so that the boot time goes from O(apps*events) to O(apps)
When the HistoryServer is showing an incomplete app, it needs to check if there is a newer version of the app available. It does this by checking if a version of the app has been loaded with a larger *filesize*. If so, it detaches the current UI, attaches the new one, and redirects back to the same URL to show the new UI.

https://issues.apache.org/jira/browse/SPARK-7889

Author: Steve Loughran <stevel@hortonworks.com>
Author: Imran Rashid <irashid@cloudera.com>

Closes apache#11118 from squito/SPARK-7889-alternate.
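The behaviour described in this commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged implementation: `LoadedUi` and `UiSlot` are hypothetical stand-ins for the real UI and cache classes, and the `events` log exists only so the sketch's behaviour can be observed.

```scala
// Hypothetical types; the real patch works with the history server's own
// UI and cache classes.
final case class LoadedUi(attempt: String, fileSize: Long)

final class UiSlot(initial: LoadedUi) {
  private var current: LoadedUi = initial
  // Recorded only so the detach/attach order is visible in this sketch.
  var events: Vector[String] = Vector.empty

  def attached: LoadedUi = synchronized { current }

  // Returns true when the caller should redirect back to the same URL
  // so that the request is served by the newly attached UI.
  def refreshIfNewer(latest: LoadedUi): Boolean = synchronized {
    if (latest.fileSize > current.fileSize) {
      events = events :+ s"detach ${current.fileSize}" :+ s"attach ${latest.fileSize}"
      current = latest
      true
    } else {
      false
    }
  }
}
```

In this sketch a UI loaded from a log of the same size is treated as unchanged, so only a strictly larger file size causes the swap and the redirect.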