[SPARK-26089][CORE] Handle corruption in large shuffle blocks #23453
Conversation
Test build #100746 has finished for PR 23453 at commit
just a partial review so far
there should be a test that a large block is read correctly, with the partial buffering & stream concatenation. I think most unit tests will not end up creating a large block.
if (closeStreams) {
  try {
    if (count < maxSize) {
      in.close()
maxSize might be exactly the same length as the stream, in which case you'll read to the end, but not close it.
Right, I left that edge case, as it will be correctly handled by the concatenated stream later.
isStreamCopied = true
streamCompressedOrEncrypted = true
I wonder if we should do the inputStream checks even if detectCorrupt is false, and perhaps even change that default. Or add another flag just for doing this memory copy, as it's more expensive.
Also I think the comments around here need to be updated to explain what is happening now.
Sure, I can change it to always detect corruption. Which default did you mean, though? The default value of detectCorrupt is true.
Test build #100750 has finished for PR 23453 at commit
Test build #100895 has finished for PR 23453 at commit
Test build #100898 has finished for PR 23453 at commit
Thanks for the work here. This is just an initial review to better understand your approach. I think it will be great to ping the original authors of SPARK-4105 and related changes to get their feedback as well.
cc @davies since you worked on SPARK-4105
> I wonder if we should do the inputStream checks even if detectCorrupt is false, and perhaps even change that default. Or add another flag just for doing this memory copy, as it's more expensive.

> Which default did you mean, though? The default value of detectCorrupt is true.
Sorry, I didn't explain that very well. What I was wondering is whether we should change the behavior to:
- have the copy-into-memory check turned off by default
- but always do the on-the-fly check, even if detectCorrupt is false. You could add another option for this also, and have it default to true, if we really want to allow users to turn off all detection.

I'm saying this because I think I'd want to run my Spark apps with the new default I'm suggesting -- don't do the copy into memory, but do the on-the-fly check all the time, over the entire block.
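For concreteness, a rough sketch of how those suggested defaults might look from a user's perspective. `spark.shuffle.detectCorrupt` is the existing flag; the `spark.shuffle.detectCorrupt.useExtraMemory` key is assumed here as the name of the new copy-into-memory switch (the PR adds such a config, but the exact key name is an assumption in this sketch):

```scala
import org.apache.spark.SparkConf

// Sketch of the suggested defaults, with an assumed config key name for the new flag:
// the lightweight on-the-fly corruption check stays on, while the memory-hungry eager
// copy of the first maxBytesInFlight/3 bytes is opt-in.
val conf = new SparkConf()
  .set("spark.shuffle.detectCorrupt", "true")                  // existing flag: on-the-fly check
  .set("spark.shuffle.detectCorrupt.useExtraMemory", "false")  // assumed key: eager in-memory check
```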
Force-pushed from d40d396 to abbec63
Test build #101777 has finished for PR 23453 at commit
Test build #101776 has finished for PR 23453 at commit
retest this please
Test build #101832 has finished for PR 23453 at commit
real build error:
Thanks, I realized it after this morning's build failure. The previous test failure confused me and I did not see the build failure before that.
Test build #101837 has finished for PR 23453 at commit
just some test updates.
@tgravescs how do you feel about my suggested updates to the defaults?
I think you are talking about:

> so the only place detectCorrupt is used after this change is passed into BufferReleasingInputStream
Sorry, my earlier comment wasn't clear -- Ankur was confused too, but I think this version of the patch implements what I think is the right option:
OK, that makes sense to me and I agree with you.
Force-pushed from 17e36c8 to c8d3569
Test build #101996 has finished for PR 23453 at commit
Test build #101997 has finished for PR 23453 at commit
Test build #101999 has finished for PR 23453 at commit
Jenkins, retest this please
Test build #102008 has finished for PR 23453 at commit
Jenkins, retest this please
Test build #102037 has finished for PR 23453 at commit
Test build #102065 has finished for PR 23453 at commit
Test build #102223 has finished for PR 23453 at commit
a few very minor style things, otherwise lgtm
@@ -571,7 +582,8 @@ final class ShuffleBlockFetcherIterator(
     }
   }

-  private def throwFetchFailedException(blockId: BlockId, address: BlockManagerId, e: Throwable) = {
+  private[storage] def throwFetchFailedException(
+      blockId: BlockId, address: BlockManagerId, e: Throwable) = {
nit: if the method declaration is multi-line, each arg on its own line. And this method should have had a return type in the first place (probably my fault, oops)
private[storage] def throwFetchFailedException(
    blockId: BlockId,
    address: BlockManagerId,
    e: Throwable): Unit = {
    private val iterator: ShuffleBlockFetcherIterator,
    private val blockId: BlockId,
    private val address: BlockManagerId,
    private val streamCompressedOrEncrypted: Boolean)
I would rename streamCompressedOrEncrypted to detectCorruption or something like that, as the condition is a bit more complex now (you also check the detectCorrupt config when passing this in).
def copyStreamUpTo(in: InputStream, maxSize: Long): (Boolean, InputStream) = {
  var count = 0L
  val out = new ChunkedByteBufferOutputStream(64 * 1024, ByteBuffer.allocate)
  val streamCopied = tryWithSafeFinally {
nit: I think a better name for streamCopied would be fullyCopied ... every time I come back to this code I always get a bit confused that maybe no copying is happening in some situations.
// Only one block should be returned which has corruption after maxBytesInFlight/3 because the
// other block will detect corruption on first fetch, and then get added to the queue again for
// a retry
I'd reword this -- you get one block because you call next() once. You really want to explain why you get a certain block back.
We'll get back the block which has corruption after maxBytesInFlight/3 because ...
(assuming I understand this correctly...)
}

// Following will succeed as it reads part of the stream which is not corrupt. This will read
// maxBytesInFlight/3 bytes from first stream and remaining from the second stream
... will read maxBytesInFlight/3 bytes from the portion copied into memory, and the remaining from the underlying stream.
val limit = 1000
// testing for inputLength less than, equal to and greater than limit
List(998, 999, 1000, 1001, 1002).foreach { inputLength =>
(limit - 2 to limit + 2).foreach
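For illustration, a minimal sketch of such a boundary test, assuming the Utils.copyStreamUpTo(in, maxSize) helper with the (Boolean, InputStream) return shown in the diff above (the final signature may differ), and checking the concatenated stream's contents, as a later commit message here also mentions:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import scala.util.Random

import org.apache.spark.util.Utils

val limit = 1000
// Exercise lengths just below, at, and just above the in-memory copy limit.
(limit - 2 to limit + 2).foreach { inputLength =>
  val bytes = new Array[Byte](inputLength)
  Random.nextBytes(bytes)

  val (_, mergedStream) = Utils.copyStreamUpTo(new ByteArrayInputStream(bytes), limit)

  // Reading the merged stream back should yield exactly the original bytes,
  // whether they came from the in-memory prefix, the original stream, or both.
  val readBack = new ByteArrayOutputStream()
  val buf = new Array[Byte](128)
  var n = mergedStream.read(buf)
  while (n != -1) {
    readBack.write(buf, 0, n)
    n = mergedStream.read(buf)
  }
  assert(readBack.toByteArray.toSeq == bytes.toSeq, s"mismatch for inputLength=$inputLength")
}
```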
SPARK-4105 added corruption detection in shuffle blocks, but that was limited to blocks which are smaller than maxBytesInFlight/3. This commit builds upon that by adding a corruption check for large blocks. There are two changes/improvements made in this commit:
1. Large blocks are checked up to maxBytesInFlight/3 size in a similar way as smaller blocks, so if a large block is corrupt at the start, that block will be re-fetched, and if that also fails, FetchFailureException will be thrown.
2. If large blocks are corrupt after maxBytesInFlight/3, then any IOException thrown while reading the stream will be converted to FetchFailureException. This is slightly more aggressive than was originally intended, but since the consumer of the stream may have already read some records and processed them, we can't just re-fetch the block; we need to fail the whole task. Additionally, we also thought about adding a new type of TaskEndReason, which would retry the task a couple of times before failing the previous stage, but given the complexity involved in that solution we decided not to proceed in that direction.
Thanks to @squito for direction and support.
Testing Done: Changed the junit test for big blocks to check for corruption.
1. Updated comments in the code 2. If IOException is thrown while reading from a stream, it will always be converted to a FetchFailureException, even when detectCorruption is false 3. Added a junit test which verifies that data can be read from concatenated stream
…n Mockito api across versions
1. Minor changes 2. Added a new config for detecting corruption by using extra memory with default set to false 3. Added test cases for copyStreamUpTo
1. Changed test to also compare the contents of stream 2. Other minor refactoring
Changes to unit test case
Minor changes to variable names and comments
Test build #103217 has finished for PR 23453 at commit
Force-pushed from b178f65 to bd1a813
Test build #103218 has finished for PR 23453 at commit
Thanks for the update @ankuriitg. Sorry I am being very particular about this; it's just a really core piece. I have a couple of minor updates to comments -- but really I was doing one final pass and I got pretty confused about the existing in.close(). I'd like us to clean that up now, while we're looking at this, if possible. Left a longer comment inline, please check my reasoning.
 * Copy all data from an InputStream to an OutputStream upto maxSize and
 * close the input stream if all data is read.
 * @return A tuple of boolean, which is whether the stream was fully copied, and an InputStream,
 *         which is a combined stream of read data and any remaining data
this doc needs updating now. Something like
Copy the first `maxSize` bytes of data from the InputStream to an in-memory
buffer, while still exposing the entire original input stream, primarily to check
for corruption.

This returns a new InputStream which contains the same data as the original
input stream. It may be entirely in an in-memory buffer, or it may be a combination
of in-memory data, continuing to read from the original stream. The only real
use of this is if the original input stream will potentially detect corruption while the data
is being read (e.g. from compression). This allows for an eager check of corruption in
the first maxSize bytes of data.
@return A tuple of boolean, which is whether the stream was fully copied, and an
InputStream which includes all data from the original stream (combining buffered data
and remaining data in the original stream)
-      // Decompress the whole block at once to detect any corruption, which could increase
-      // the memory usage tne potential increase the chance of OOM.
+      // Decompress the block upto maxBytesInFlight/3 at once to detect any corruption which
+      // could increase the memory usage and potentially increase the chance of OOM.
this comment & the one just above the if are redundant and a little bit wrong -- I think you only need one comment (don't care whether it's above or below the if) and it should be something more like:

We optionally decompress the first maxBytesInFlight/3 bytes into memory, to check for corruption in that portion of the data. But even if that configuration is off, or if the corruption is later, we'll still try to detect the corruption later in the stream.
-        Utils.copyStream(input, out, closeStreams = true)
-        input = out.toChunkedByteBuffer.toInputStream(dispose = true)
+        val (fullyCopied: Boolean, mergedStream: InputStream) = Utils.copyStreamUpTo(
+          input, maxBytesInFlight / 3)
I'm trying to understand why the
finally {
  // TODO: release the buf here to free memory earlier
  if (isStreamCopied) {
    in.close()
  }
is needed down below. To be honest, I don't think it was needed in the old code. The old Utils.copyStream was always called with closeStreams=true, and that would always close the input in a finally itself:
spark/core/src/main/scala/org/apache/spark/util/Utils.scala (lines 302 to 307 in d9978fb):

  def copyStream(
      in: InputStream,
      out: OutputStream,
      closeStreams: Boolean = false,
      transferToEnabled: Boolean = false): Long = {
    tryWithSafeFinally {
spark/core/src/main/scala/org/apache/spark/util/Utils.scala (lines 330 to 332 in d9978fb):

      if (closeStreams) {
        try {
          in.close()
It doesn't hurt, but it also makes things unnecessarily confusing. If you didn't need to do that in.close() below, you wouldn't need to track isStreamCopied, and wouldn't need to even return fullyCopied from Utils.copyStreamUpTo. That's really the part of this which is bugging me -- something seems off that we need to know whether or not the stream is fully copied; it seems like it shouldn't matter. If it does matter, aren't we getting something wrong in the case where the stream is exactly maxBytesInFlight / 3, but we haven't realized it's fully copied because we haven't read past the end yet?
I think in.close() is needed if there is an exception while creating a wrapped stream. The first time I saw isStreamCopied, I was also confused, and by looking at it more closely now, I realize that it is not doing what it is supposed to do.

I have removed isStreamCopied and instead used another condition to close the stream. Please check and let me know if it makes sense.
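A minimal sketch of the pattern being described, under the assumption that the raw stream should be closed when constructing its decompressing/decrypting wrapper throws; the names here are illustrative, not the PR's actual helper:

```scala
import java.io.InputStream

// Sketch: if building the wrapped (decompressing/decrypting) stream fails,
// close the raw input ourselves, because no wrapper ever took ownership of it.
def wrapSafely(raw: InputStream)(wrap: InputStream => InputStream): InputStream = {
  try {
    wrap(raw)
  } catch {
    case e: Throwable =>
      try raw.close() catch { case _: Throwable => () } // best-effort cleanup
      throw e
  }
}
```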
val (fullyCopied: Boolean, mergedStream: InputStream) = Utils.copyStreamUpTo(
  input, maxBytesInFlight / 3)
isStreamCopied = fullyCopied
input = mergedStream
not related to your changes, but while you're touching this file, can you add 2 more spaces of indentation to `|| corruptedBlocks.contains(blockId)) {` a couple of lines below this?
1. Ensured input stream is closed on exception 2. Minor comments changes
Test build #103338 has finished for PR 23453 at commit
merged to master, thanks @ankuriitg!
What changes were proposed in this pull request?
SPARK-4105 added corruption detection in shuffle blocks, but that was limited to blocks which are smaller than maxBytesInFlight/3. This commit builds upon that by adding a corruption check for large blocks. There are two changes/improvements made in this commit:
1. Large blocks are checked up to maxBytesInFlight/3 size in a similar way as smaller blocks, so if a large block is corrupt at the start, that block will be re-fetched, and if that also fails, FetchFailureException will be thrown.
2. If large blocks are corrupt after maxBytesInFlight/3, then any IOException thrown while reading the stream will be converted to FetchFailureException. This is slightly more aggressive than was originally intended, but since the consumer of the stream may have already read some records and processed them, we can't just re-fetch the block; we need to fail the whole task. Additionally, we also thought about adding a new type of TaskEndReason, which would retry the task a couple of times before failing the previous stage, but given the complexity involved in that solution we decided not to proceed in that direction.
Thanks to @squito for direction and support.
How was this patch tested?
Changed the junit test for big blocks to check for corruption.
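To illustrate change 2 from the description, here is a hedged sketch of the kind of stream wrapping involved. The class and callback names are illustrative only; in the PR the equivalent logic lives around BufferReleasingInputStream and throwFetchFailedException(blockId, address, e):

```scala
import java.io.{IOException, InputStream}

// Illustrative sketch: once the consumer is reading the block, an IOException can no
// longer be handled by silently re-fetching (records may already have been processed),
// so it is reported as a fetch failure, which fails the whole task.
class FetchFailureReportingStream(in: InputStream, reportFetchFailure: Throwable => Nothing)
  extends InputStream {

  override def read(): Int =
    try in.read() catch { case e: IOException => reportFetchFailure(e) }

  override def read(b: Array[Byte], off: Int, len: Int): Int =
    try in.read(b, off, len) catch { case e: IOException => reportFetchFailure(e) }

  override def close(): Unit = in.close()
}
```

In the PR itself, the reporting callback corresponds to throwing a FetchFailedException via the iterator's throwFetchFailedException helper.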