[SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsTracker metrics collection fails if a new file isn't yet visible #18979
Conversation
// file 2 is noted, but not visible
tracker.newFile(file2.toString)
touch(file3)
file3?
yeah, you are right. Spurious. Harmless but wrong. I was playing around with different sequences to see if I could confuse things. Will cut
To mimic S3-like behavior, you can overwrite the file system
Test build #80803 has finished for PR 18979 at commit
Thanks for the fix and tests, @steveloughran!
case e: FileNotFoundException =>
  // may arise against eventually consistent object stores
  logInfo(s"File $path is not yet visible", e)
  0
Hm, I feel this would be dangerous; nowhere did we document that this would return an incorrect size...
The problem is: what can be picked up if the file isn't yet reported as present by the endpoint? Adding a bool to say "results are unreliable" could be used as a warning.
One thing to consider long term: if Hadoop FS output streams added a simple <String, Long> map of statistics, could they be picked up by committers and then aggregated in job reports? Hadoop filesystems have statistics (simple ones in Hadoop <= 2.7; an arbitrary map of String -> Long in 2.8, with standard key names across filesystems), and certainly today S3ABlockOutputStream collects stats on individual streams. If that were made visible and collected, you could get a lot more detail on what is going on. Thoughts?
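A minimal sketch of that aggregation idea, assuming per-stream stats arrive as plain String -> Long maps; `mergeStats` is a hypothetical helper, not an existing Hadoop or Spark API:

```scala
// Hypothetical helper: merge two per-stream statistics maps by summing
// counters that share a key, and keeping keys present in only one map.
def mergeStats(a: Map[String, Long], b: Map[String, Long]): Map[String, Long] =
  (a.keySet ++ b.keySet).map { k =>
    k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L))
  }.toMap
```

A committer could fold this over all streams in a task, then again over all tasks, to surface stream-level counters in the job report.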
I could add a comment in the docs somewhere to state that metrics in the cloud may not be consistent.
val tracker = new BasicWriteTaskStatsTracker(conf)
tracker.newFile(file.toString)
touch(file)
assertStats(tracker, 1, 0)
We may not be able to differentiate between a 0-byte file and a missing file in the final metrics.
I'm assuming that the file will eventually come into existence; that its absence straight after collection is simply a transient create inconsistency of the endpoint, like a brief caching of negative HEAD/GET requests (which AWS S3 does do as part of its DoS defences). The files will be there later.
One option: count the number of missing files and include that in the report. It shouldn't be a metric most of the time, though: never on a "real" FS or consistent object store, and only rarely on an inconsistent one.
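A rough sketch of that option, with invented names (`MissingFileCounter` is not part of the patch):

```scala
// Hypothetical sketch: count files submitted to the tracker and files
// actually seen at the store; the difference is the "missing files" metric.
class MissingFileCounter {
  private var submitted = 0
  private var seen = 0
  def newFile(): Unit = submitted += 1      // file noted by the task
  def fileSeen(): Unit = seen += 1          // file visible at the store
  def missingFiles: Int = submitted - seen  // 0 on a consistent FS
}
```

On a consistent filesystem this stays at zero; a nonzero value would flag that the reported byte totals are unreliable.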
Btw, as the file path passed to the stats tracker should be a task temp file, is it common to directly use S3 as the temp file output destination?
Currently nobody should be using s3a:// as the temp file destination, which is the same as saying "nobody should be using s3a:// as the direct destination of work", not without a special committer (Netflix, IBM's Stocator, ...) or something to give S3 list consistency. Today, task commit relies on a list & rename of all files in the task attempt dir, and if you don't get list consistency, you can miss out on files. If you ever hear anyone complaining "it takes too long to commit to s3", they are using it this way. Tell them to use a consistency layer or to stop it :)
@gatorsmile: you will be able to do something better soon, as S3A is adding an inconsistent AWS client. The S3A FS itself needs to do more to handle throttling & failures (retry, add failure metrics so throttling & error rates can be measured). Knowing throttling rates is important, as it will help identify perf problems due to bad distribution of work across a bucket, excess use of KMS key lookup..., things that surface in support calls. This patch restores to Spark 2.3 the behaviour it had in Spark 2.2: a brief delay between object creation and visibility does not cause the task to fail.
@adrian-ionescu wrote:
No. As long as everyone is aware of it, it won't be an issue.
Test build #80841 has finished for PR 18979 at commit
Related to this: updated spec on the Hadoop output stream, Syncable and StreamCapabilities. As the doc notes, object stores != filesystems, and while a lot can be done to preserve the metaphor on input, it's on output where CRUD inconsistencies surface, along with questions such as "does a 0-byte file get created in create()?", "when is data written?", etc.
Has anyone had a look at this recently? The problem still exists, and while downstream filesystems can address it if they recognise the use case and lie about values, they will be returning invalid values to the caller: Spark will be reporting the wrong values. At least with this PR, Spark gets to make the decisions about how to react itself.
To me, this looks good.
Will review it tomorrow
I don't have a strong opinion against this. Incorrect size is an issue, but I can't think of a better solution for now...
@viirya: the new data writer API will allow a broader set of stats to be propagated back from workers. When you are working with object stores, a useful stat to get back is throttle count & retry count, as they can be the cause of why things are slow... and if it is due to throttling, throwing more workers at the job will actually slow things down. They'd be the ones to look at first.
} catch {
  case e: FileNotFoundException =>
    // may arise against eventually consistent object stores
    logInfo(s"File $path is not yet visible", e)
Could you update the log message to indicate that the zero size might be wrong, for example due to negative caching in S3?
say "Reported file size in job summary may be invalid"?
LGTM except a minor comment.
+1. This solves the regression on writing an empty dataset with ORC format, too!
Could you also include the test cases in InsertSuite.scala?
Gentle ping, @steveloughran! :)
Noted :) If so, the warning message which @gatorsmile has proposed is potentially going to mislead people into worrying about a problem which isn't there, and the numFiles metric is going to mislead too. I'm starting to worry about how noisy the log would be, both there and when working with S3 when it's playing delayed visibility (rarer).
This would line things up in future for actually returning the list of expected vs actual files as a metric where it could be reported.
Force-pushed from 649f8da to d3f96f6.
The latest PR update pulls in @dongjoon-hyun's new test; to avoid merge conflict in the Insert suite I've rebased against master.
Test build #82730 has finished for PR 18979 at commit
Hi, @steveloughran.
Yes, so far Spark leaves an empty directory in the case of ORC.
Test build #82732 has finished for PR 18979 at commit
Test build #82731 has finished for PR 18979 at commit
} catch {
  case e: FileNotFoundException =>
    // may arise against eventually consistent object stores
    logDebug(s"File $path is not yet visible", e)
For the error message, it looks okay to me. First, it's a debug message. Second, the ORC writer bug will be fixed in Spark 2.3 anyway.
Could you resolve the conflicts again?
- … of tests for various file states. Change-Id: I3269cb901a38b33e399ebef10b2dbcd51ccf9b75
- Change-Id: I38ac11c808849e2fd91f4931f4cb5cdfad43e2af
- Change-Id: I6d101ece0cccbd8403dff10004575a24109e6f1b
- Use Option to track whether or not the current file is set; guarantees once-only invocation, amongst other things. Separate counting of submitted files from the number of files actually seen. Log at debug if an FNFE is caught. Log at info only at the end of a sequence of writes. Change-Id: Id242c11338be1f41a3f9a5b8b30c796ac5b002a2
- This is going to create a merge conflict with this branch until I rebase it, which I'm about to. Change-Id: Ie2309066ad7892cb20155d9de8248c1682bba526
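The Option-based once-only tracking mentioned in that commit can be sketched roughly like this; `CurrentFileTracker` and its internals are invented for illustration, not the real Spark class:

```scala
// Hypothetical sketch: hold the current file in an Option so that closing
// it out (and the final flush) can each happen at most once.
class CurrentFileTracker {
  private var current: Option[String] = None
  private var filesSeen = 0
  def newFile(path: String): Unit = {
    current.foreach(_ => filesSeen += 1)  // close out any previous file
    current = Some(path)
  }
  def finalStats(): Int = {
    current.foreach(_ => filesSeen += 1)  // flush the last open file
    current = None                        // cleared: a repeat call can't double-count
    filesSeen
  }
}
```

Clearing the Option in `finalStats()` is what makes repeated calls idempotent, the fix suggested in the PR description for the repeat-call quirk.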
Force-pushed from d3f96f6 to c0e81a1.
done. Not writing 0-byte files will offer a significant speedup against object stores, where a call to getFileStatus() can take hundreds of milliseconds. I look forward to it.
Test build #82745 has finished for PR 18979 at commit
+1, LGTM, too.
Thanks! Merged to master.
thanks for the review, everyone!
What changes were proposed in this pull request?
BasicWriteTaskStatsTracker.getFileSize() is changed to catch FileNotFoundException, log at info, and then return 0 as the file size. This ensures that if a newly created file isn't visible due to the store not always having create consistency, the metric collection doesn't cause the failure.
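The change described above can be sketched as follows. This is a minimal, self-contained illustration rather than the actual Spark method; `lookupLen` is a hypothetical stand-in for `FileSystem.getFileStatus(path).getLen`:

```scala
import java.io.FileNotFoundException

// Sketch: resolve a file's size, treating a not-yet-visible file as size 0
// instead of letting the FNFE fail the task.
def getFileSize(path: String, lookupLen: String => Long): Long =
  try {
    lookupLen(path)
  } catch {
    case _: FileNotFoundException =>
      // may arise against eventually consistent object stores;
      // the real patch logs this before returning 0
      0L
  }
```

Any other exception still propagates, so only the transient-visibility case is swallowed.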
How was this patch tested?
New test suite included, BasicWriteTaskStatsTrackerSuite. This not only checks the resilience to missing files, but verifies the existing logic as to how file statistics are gathered.
Note that in the current implementation, if you call Tracker.getFinalStats() more than once, the file size count will increase by the size of the last file. This could be fixed by clearing the filename field inside getFinalStats() itself.
If you pass an empty or null string to Tracker.newFile(path), then IllegalArgumentException is raised, but only in getFinalStats(), rather than in newFile. There's a test for this behaviour in the new suite, as it verifies that only FNFEs get swallowed.