[SPARK-18188] Add checksum for shuffle blocks #15894

Closed
wants to merge 1 commit into from

@davies (Contributor) commented Nov 16, 2016

What changes were proposed in this pull request?

TBD

How was this patch tested?

Existing tests.

SparkQA commented Nov 16, 2016

Test build #68680 has finished for PR 15894 at commit a68171c.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class ChecksumOutputStream extends FilterOutputStream

@@ -92,12 +91,27 @@ public ByteBuffer nioByteBuffer() throws IOException {
   }

   @Override
-  public InputStream createInputStream() throws IOException {
+  public InputStream createInputStream(boolean checksum) throws IOException {
Member

So, is this only for testing? Otherwise it's very expensive to compute, since it requires two passes over the file.

If so, shouldn't it live in a package-private method rather than being exposed publicly?
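To make the cost concern concrete, here is a minimal sketch of what verify-before-read implies. It is not the PR's code: `TwoPassVerify`, `adler32Of`, and `openVerified` are hypothetical names, and the expected checksum is assumed to be supplied by the caller rather than read from any particular file layout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.Adler32;
import java.util.zip.CheckedInputStream;

// Hypothetical illustration only; not the implementation from this PR.
final class TwoPassVerify {

    // Pass 1: read the whole file just to compute its Adler-32 value.
    static long adler32Of(Path file) throws IOException {
        try (CheckedInputStream in =
                 new CheckedInputStream(Files.newInputStream(file), new Adler32())) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // The wrapper updates the checksum as bytes pass through.
            }
            return in.getChecksum().getValue();
        }
    }

    // Pass 2: reopen the file so the real consumer can read it from the start.
    static InputStream openVerified(Path file, long expected) throws IOException {
        if (adler32Of(file) != expected) {
            throw new IOException("Checksum mismatch for " + file);
        }
        return Files.newInputStream(file);
    }
}
```

Every byte is read once for verification and again by the consumer, which is the overhead raised in the comment above.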

Contributor Author

This is a good question.

Actually, we already have a checksum along with compression; we could move the decompression a little earlier to detect the corruption in the block fetcher. Will try that soon.
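A rough sketch of that idea, under stated assumptions: it uses the JDK's GZIP streams as a stand-in, because the GZIP trailer carries a CRC-32 that the JDK verifies on read. Spark's actual compression codecs and their checksum behavior are not shown here, and `EagerCorruptionCheck`, `gzip`, and `isReadable` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration; GZIP stands in for whatever codec the shuffle uses.
final class EagerCorruptionCheck {

    // Compresses a block; the GZIP trailer stores a CRC-32 of the raw bytes.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Decompresses the whole block up front and reports whether that succeeded.
    // A failed codec-level check surfaces as an IOException during the read.
    static boolean isReadable(byte[] compressed) {
        try (InputStream in =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[8192];
            while (in.read(buf) != -1) {
                // Drain the stream; only the success or failure matters here.
            }
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

The trade-off is that the fetcher pays for decompressing data it would otherwise stream lazily, in exchange for learning about corruption while a refetch is still possible.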

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.zip.Adler32;
Contributor

It would be better to abstract the checksum functions into a class so they're easier to change in the future.
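One possible shape for such an abstraction, as a sketch only: the JDK's `java.util.zip.Checksum` interface is already implemented by both `Adler32` and `CRC32`, so callers can depend on the interface and pick the algorithm in one place. `ShuffleChecksumAlgorithm` and its methods are hypothetical names, not part of Spark.

```java
import java.util.function.Supplier;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical helper: callers never reference Adler32 directly, so swapping
// the algorithm later means changing one enum constant.
enum ShuffleChecksumAlgorithm {
    ADLER_32(Adler32::new),
    CRC_32(CRC32::new);

    private final Supplier<Checksum> factory;

    ShuffleChecksumAlgorithm(Supplier<Checksum> factory) {
        this.factory = factory;
    }

    // Returns a fresh, independent Checksum instance for one block.
    Checksum newChecksum() {
        return factory.get();
    }

    // Convenience for computing the checksum of an in-memory buffer.
    long compute(byte[] data) {
        Checksum checksum = newChecksum();
        checksum.update(data, 0, data.length);
        return checksum.getValue();
    }
}
```

Writers and readers would then call `newChecksum()` instead of `new Adler32()`, which keeps the algorithm choice out of the stream classes.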

@@ -171,14 +179,30 @@ final class ShuffleBlockFetcherIterator(
override def onBlockFetchSuccess(blockId: String, buf: ManagedBuffer): Unit = {
// Only add the buffer to results queue if the iterator is not zombie,
// i.e. cleanup() has not been called yet.
val in = try {
Contributor

Note: move this out of the callback so it doesn't block the network pool.
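A minimal sketch of that restructuring, assuming the callback runs on a network thread and the iterator's consumer runs elsewhere. This is plain Java rather than the Scala of `ShuffleBlockFetcherIterator`, and `DeferredBlockValidation`, `FetchedBlock`, and the `expectedCrc` parameter are hypothetical.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.zip.CRC32;

// Hypothetical illustration: the callback only enqueues, and the consumer
// thread does the expensive validation.
final class DeferredBlockValidation {

    // Raw fetch result, handed over without being inspected.
    static final class FetchedBlock {
        final String blockId;
        final byte[] data;
        final long expectedCrc;

        FetchedBlock(String blockId, byte[] data, long expectedCrc) {
            this.blockId = blockId;
            this.data = data;
            this.expectedCrc = expectedCrc;
        }
    }

    private final BlockingQueue<FetchedBlock> results = new LinkedBlockingQueue<>();

    // Runs on the network thread: constant-time work only.
    void onBlockFetchSuccess(String blockId, byte[] data, long expectedCrc) {
        results.add(new FetchedBlock(blockId, data, expectedCrc));
    }

    // Runs on the consuming thread: the checksum pass happens here.
    byte[] next() {
        FetchedBlock block;
        try {
            block = results.take();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for a block", e);
        }
        CRC32 crc = new CRC32();
        crc.update(block.data, 0, block.data.length);
        if (crc.getValue() != block.expectedCrc) {
            throw new IllegalStateException("Corrupt block " + block.blockId);
        }
        return block.data;
    }
}
```

With this split, a slow checksum or decompression pass delays only the task that consumes the block, not other fetches sharing the same network threads.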

@davies (Contributor Author) commented Nov 18, 2016

Due to the complexity and overhead here, closing this in favor of #15923.

4 participants