[SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file isn't yet visible #18979
Conversation
// file 2 is noted, but not visible
tracker.newFile(file2.toString)
touch(file3)
file3?
yeah, you are right. Spurious. Harmless but wrong. I was playing around with different sequences to see if I could confuse things. Will cut
To mimic S3-like behavior, you can overwrite the file system
Test build #80803 has finished for PR 18979 at commit
Thanks for the fix and tests, @steveloughran!
case e: FileNotFoundException =>
  // may arise against eventually consistent object stores
  logInfo(s"File $path is not yet visible", e)
  0
Hm, I feel this would be dangerous; nowhere did we document that this would return an incorrect size...
The problem is: what can be picked up if the file isn't yet reported as present by the endpoint? Adding a bool to say "results are unreliable" could be used as a warning.
One thing to consider long term: if Hadoop FS output streams added a simple <String, Long> map of statistics, could they be picked up by committers and then aggregated in job reports? Hadoop filesystems have statistics (simple ones in Hadoop <= 2.7; an arbitrary map of String -> Long in 2.8, with standard key names across filesystems), and certainly today S3ABlockOutputStream collects stats on individual streams. If that were made visible and collected, you could get a lot more detail on what is going on. Thoughts?
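A minimal sketch of that aggregation idea, assuming per-stream stats arrive as plain String -> Long maps; `mergeStats` is a hypothetical helper, not an existing Hadoop or Spark API:

```scala
// Hypothetical helper: merge two per-stream statistics maps by summing
// counters that share a key, and keeping keys present in only one map.
def mergeStats(a: Map[String, Long], b: Map[String, Long]): Map[String, Long] =
  (a.keySet ++ b.keySet).map { k =>
    k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L))
  }.toMap
```

A committer could fold this over all streams in a task, then again over all tasks, to surface stream-level counters in the job report.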
I could add a comment in the docs somewhere to state that metrics in the cloud may not be consistent.
val tracker = new BasicWriteTaskStatsTracker(conf)
tracker.newFile(file.toString)
touch(file)
assertStats(tracker, 1, 0)
We may not be able to differentiate between a 0-byte file and a missing file in the final metrics.
I'm assuming that the file will eventually come into existence; that its absence straight after collection is simply a transient create inconsistency of the endpoint, like a brief caching of negative HEAD/GET requests (which AWS S3 does do as part of its DoS defences). The files will be there later.
One option: count the number of missing files and include that in the report. It shouldn't be a metric most of the time, though: never on a "real" FS or consistent object store, and only rarely on an inconsistent one.
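A rough sketch of that option, with invented names (`MissingFileCounter` is not part of the patch):

```scala
// Hypothetical sketch: count files submitted to the tracker and files
// actually seen at the store; the difference is the "missing files" metric.
class MissingFileCounter {
  private var submitted = 0
  private var seen = 0
  def newFile(): Unit = submitted += 1      // file noted by the task
  def fileSeen(): Unit = seen += 1          // file visible at the store
  def missingFiles: Int = submitted - seen  // 0 on a consistent FS
}
```

On a consistent filesystem this stays at zero; a nonzero value would flag that the reported byte totals are unreliable.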
Btw, as the file path passed to the stats tracker should be a task temp file, is it common to directly use S3 as the temp file output destination?
Currently nobody should be using s3a:// as the temp file destination, which is the same as saying "nobody should be using s3a:// as the direct destination of work", not without a special committer (Netflix, IBM's Stocator, ...) or something to give S3 list consistency. Today, task commit relies on a list & rename of all files in the task attempt dir, and if you don't get list consistency, you can miss out on files. If you ever hear anyone complaining "it takes too long to commit to s3", they are using it this way. Tell them to use a consistency layer or to stop it :)
@gatorsmile: you will be able to do something better soon, as S3A is adding an inconsistent AWS client. The S3A FS itself needs to do more to handle throttling & failures (retry, add failure metrics so throttling & error rates can be measured). Knowing throttling rates is important, as it will help identify perf problems due to bad distribution of work across a bucket, excess use of KMS key lookup..., things that surface in support calls. This patch restores to Spark 2.3 the behaviour it had in Spark 2.2: a brief delay between object creation and visibility does not cause the task to fail.
@adrian-ionescu wrote:
No. As long as everyone is aware of it, it won't be an issue.
Test build #80841 has finished for PR 18979 at commit
Related to this: updated spec on the Hadoop output stream, Syncable and StreamCapabilities. As the doc notes, object stores != filesystems, and while a lot can be done to preserve the metaphor on input, it's on output where CRUD inconsistencies surface, along with questions such as "does a 0-byte file get created in create()?", "when is data written?", etc.
Has anyone had a look at this recently? The problem still exists, and while downstream filesystems can address it if they recognise the use case and lie about values, they will be returning invalid values to the caller: Spark will be reporting the wrong values. At least with this PR, Spark gets to make the decisions about how to react itself.
To me, this looks good.
Will review it tomorrow
I don't have a strong opinion against this. Incorrect size is an issue, but I can't think of a better solution for now...
@viirya: the new data writer API will allow a broader set of stats to be propagated back from workers. When you are working with object stores, a useful stat to get back is throttle count & retry count, as they can be the cause of why things are slow... and if it is due to throttling, throwing more workers at the job will actually slow things down. They'd be the ones to look at first.
} catch {
  case e: FileNotFoundException =>
    // may arise against eventually consistent object stores
    logInfo(s"File $path is not yet visible", e)
Could you update the log message to indicate that the zero size might be wrong, for example due to negative caching in S3?
say "Reported file size in job summary may be invalid"?
LGTM except a minor comment.
+1. This solves the regression on writing an empty dataset with ORC format, too!
Could you also include the test cases in InsertSuite.scala?
Gentle ping, @steveloughran! :)
Noted :) If so, the warning message which @gatorsmile has proposed is potentially going to mislead people into worrying about a problem which isn't there, and the numFiles metric is going to mislead too. I'm starting to worry about how noisy the log would be, both there and when working with S3 when it's playing delayed visibility (rarer).
This would line things up in future for actually returning the list of expected vs actual files as a metric where it could be reported.
Force-pushed from 649f8da to d3f96f6.
The latest PR update pulls in @dongjoon-hyun's new test; to avoid merge conflict in the Insert suite I've rebased against master.
Test build #82730 has finished for PR 18979 at commit
Hi, @steveloughran.
Yes, so far Spark leaves an empty directory in the case of ORC.
Test build #82732 has finished for PR 18979 at commit
Test build #82731 has finished for PR 18979 at commit
} catch {
  case e: FileNotFoundException =>
    // may arise against eventually consistent object stores
    logDebug(s"File $path is not yet visible", e)
For the error message, it looks okay to me. First, it's a debug message. Second, the ORC writer bug will be fixed in Spark 2.3 anyway.
Could you resolve the conflicts again?
- … of tests for various file states. Change-Id: I3269cb901a38b33e399ebef10b2dbcd51ccf9b75
- Change-Id: I38ac11c808849e2fd91f4931f4cb5cdfad43e2af
- Change-Id: I6d101ece0cccbd8403dff10004575a24109e6f1b
- Use Option to track whether or not the current file is set; guarantees once-only invocation, amongst other things. Separate counting of submitted files from the number of files actually seen. Log at debug if an FNFE is caught. Log at info only at the end of a sequence of writes. Change-Id: Id242c11338be1f41a3f9a5b8b30c796ac5b002a2
- This is going to create a merge conflict with this branch until I rebase it, which I'm about to. Change-Id: Ie2309066ad7892cb20155d9de8248c1682bba526
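The Option-based once-only tracking mentioned in that commit can be sketched roughly like this; `CurrentFileTracker` and its internals are invented for illustration, not the real Spark class:

```scala
// Hypothetical sketch: hold the current file in an Option so that closing
// it out (and the final flush) can each happen at most once.
class CurrentFileTracker {
  private var current: Option[String] = None
  private var filesSeen = 0
  def newFile(path: String): Unit = {
    current.foreach(_ => filesSeen += 1)  // close out any previous file
    current = Some(path)
  }
  def finalStats(): Int = {
    current.foreach(_ => filesSeen += 1)  // flush the last open file
    current = None                        // cleared: a repeat call can't double-count
    filesSeen
  }
}
```

Clearing the Option in `finalStats()` is what makes repeated calls idempotent, the fix suggested in the PR description for the repeat-call quirk.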
Force-pushed from d3f96f6 to c0e81a1.
done. Not writing 0-byte files will offer a significant speedup against object stores, where a call to getFileStatus() can take hundreds of milliseconds. I look forward to it.
Test build #82745 has finished for PR 18979 at commit
+1, LGTM, too.
Thanks! Merged to master.
thanks for the review, everyone!
What changes were proposed in this pull request?
BasicWriteTaskStatsTracker.getFileSize() is changed to catch FileNotFoundException, log at info, and then return 0 as the file size. This ensures that if a newly created file isn't visible due to the store not always having create consistency, the metric collection doesn't cause the failure.
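The change described above can be sketched as follows. This is a minimal, self-contained illustration rather than the actual Spark method; `lookupLen` is a hypothetical stand-in for `FileSystem.getFileStatus(path).getLen`:

```scala
import java.io.FileNotFoundException

// Sketch: resolve a file's size, treating a not-yet-visible file as size 0
// instead of letting the FNFE fail the task.
def getFileSize(path: String, lookupLen: String => Long): Long =
  try {
    lookupLen(path)
  } catch {
    case _: FileNotFoundException =>
      // may arise against eventually consistent object stores;
      // the real patch logs this before returning 0
      0L
  }
```

Any other exception still propagates, so only the transient-visibility case is swallowed.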
How was this patch tested?
New test suite included, BasicWriteTaskStatsTrackerSuite. This not only checks the resilience to missing files, but verifies the existing logic as to how file statistics are gathered.
Note that in the current implementation, if you call Tracker.getFinalStats() more than once, the file size count will increase by the size of the last file. This could be fixed by clearing the filename field inside getFinalStats() itself.
If you pass an empty or null string to Tracker.newFile(path), then IllegalArgumentException is raised, but only in getFinalStats(), rather than in newFile. There's a test for this behaviour in the new suite, as it verifies that only FNFEs get swallowed.