Skip to content

Conversation

@sitalkedia
Copy link

@sitalkedia sitalkedia commented Oct 9, 2016

What changes were proposed in this pull request?

Currently we use BufferedInputStream to read the shuffle file which copies the file content from os buffer cache to the user buffer. This adds additional latency in reading the spill files. We made a change to use java nio's direct buffer to read the spill files and for certain pipelines spilling significant amount of data, we see up to 7% speedup for the entire pipeline.

How was this patch tested?

Tested by running the job in the cluster and observed up to 7% speedup.

…ffer to read the spill files in order to avoid additional copy
@sitalkedia
Copy link
Author

cc - @rxin

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems conceptually sound and the benchmark shows good results. I wonder, is this not something already available in the JDK? I could only find this from the JDK, which doesn't do internal buffering, but does some more stuff that might be worth comparing notes with: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/sun/nio/ch/ChannelInputStream.java/

*/
public final class NioBasedBufferedFileInputStream extends InputStream {

ByteBuffer bb;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private final

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.


FileChannel ch;

public NioBasedBufferedFileInputStream(File file, int bufferSize) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be reasonable to establish a default constructor with some default buffer size, like BufferedInputStream?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that makes sense, changed accordingly.


public NioBasedBufferedFileInputStream(File file, int bufferSize) throws IOException {
bb = ByteBuffer.allocateDirect(bufferSize);
FileInputStream f = new FileInputStream(file);
Copy link
Member

@srowen srowen Oct 9, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure about this, but seems so from reading the source: closing the channel won't free the file descriptor? can you just hang on to this ref as well to close it in close() for completeness?

Better yet, can't we just use FileChannel.open(...) and not bother with the stream?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed accordingly.

bb.flip();
}

public boolean refill() throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

private?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

public void close() throws IOException {
ch.close();
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'll want to implement skip() and available() too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skip() is already implemented by underlying InputStream so that is not needed. Implemented available() .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skip() in InputStream will call read(), this is not the optimal solution

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although, we are not using skip() anywhere, but I implemented an optimal version of it anyways.


final BufferedInputStream bs =
new BufferedInputStream(new FileInputStream(file), (int) bufferSizeBytes);
final NioBasedBufferedFileInputStream bs =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I think IndexShuffleBlockResolver, RowQueue have instances that could use a similar treatment.

Can this reference be InputStream?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there are other instance in the code where we are using BufferedInputStream. I will file raise another PR to replace those with NioBasedBufferedFileInputStream.

@SparkQA
Copy link

SparkQA commented Oct 9, 2016

Test build #66610 has finished for PR 15408 at commit bd741fa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public final class NioBasedBufferedFileInputStream extends InputStream

@sitalkedia
Copy link
Author

Seems conceptually sound and the benchmark shows good results. I wonder, is this not something already available in the JDK?

Yes, I looked around a bit and unfortunately, I could not find any implementation in JDK which support buffering based nio read.

@SparkQA
Copy link

SparkQA commented Oct 9, 2016

Test build #66612 has finished for PR 15408 at commit 12c9ded.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* but does not support buffering.
*
*/
public final class NioBasedBufferedFileInputStream extends InputStream {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add a test suite specifically for this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a test suite for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great - thanks. Just FYI in general I find ScalaTest to be a better framework than JUnit, so it is a good idea to write tests in Scala even if the class is implemented in Java. Not a big deal though.

@rxin
Copy link
Contributor

rxin commented Oct 10, 2016

Can you also expand on what the 7% means? Is it some workload end-to-end that's been improved by 7%, or the sorting itself improves by 7%?

bb = ByteBuffer.allocateDirect(bufferSize);
ch = FileChannel.open(file.toPath(), StandardOpenOption.READ);
ch.read(bb);
bb.flip();
Copy link
Contributor

@witgo witgo Oct 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ch.read(bb); can be removed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed, thanks!

@sitalkedia
Copy link
Author

sitalkedia commented Oct 10, 2016

Can you also expand on what the 7% means? Is it some workload end-to-end that's been improved by 7%, or the sorting itself improves by 7%?

The perf improvement was end-to-end which means the sorting improvement is definitely more than that. Also, changed the PR description to make that clear.

@SparkQA
Copy link

SparkQA commented Oct 10, 2016

Test build #66621 has finished for PR 15408 at commit 856593a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

*/
public final class NioBasedBufferedFileInputStream extends InputStream {

private static int DEFAULT_BUFFER_SIZE = 8192;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't do this consistently, but would be useful to rename this DEFAULT_BUFFER_SIZE_BYTES so it is clear what the unit is

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed.


private final FileChannel fileChannel;

public NioBasedBufferedFileInputStream(File file, int bufferSize) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here too, bufferSizeInBytes

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

this(file, DEFAULT_BUFFER_SIZE);
}

private boolean refill() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

document this function and explain what the ret value means?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

@Override
public int read(byte[] b, int off, int len) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

off -> offset

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

public int available() throws IOException {
return byteBuffer.remaining();
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove one blank line

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

@Override
public int available() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one question: how is this used? AFAIK InputStream always return 0 for this. Is this used somewhere else for performance improvements?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a standard API method for InputStream, so if something does call it, it's better to return a useful answer because it's pretty easy to do so.

Copy link
Contributor

@mridulm mridulm Oct 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, InputStream simply has a stub impl (just like read(byte[]) loops through read). Providing .remaining() can be useful


@Override
public long skip(long n) throws IOException {
if(n < 0L) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add a space after if

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

fileChannel.position(size);
return size - currentFilePosition;
}
else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move else to previous line

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

return skipFromFileChannel(toSkip);
}

private long skipFromFileChannel(long n) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just inline this in the previous method, and add an else case so both cases are indented the same level?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually never mind this comment


@Override
public void close() throws IOException {
fileChannel.close();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to free the buffer or anything?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, there is a cleaner in DirectByteBuffer which frees up the memory when the object is garbage collected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK. I vaguely remember Spark ran into an issue in ~ Spark 1.0 that the direct bytebuffers were not cleared fast enough (because the life cycle is decoupled from GC). We introduced StorageUtils.dispose and made buffer clearing more aggressive. It might be good to call that here too, just in case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see...that makes sense. Added an explicit call to cleanup the byte buffer.

}

@Test
public void testBytesSkipped() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add a test case for negative skips?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

}

@Test
public void testReadMultipleBytes() throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should construct a test case that triggers the code path in skipFromFileChannel very explicitly

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. Actually I discovered a bug with my previous implementation after adding this test case, which I have fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reminds me --- we should add test coverage reporting to Spark so it becomes obvious what branches are not tested sufficiently.

@rxin
Copy link
Contributor

rxin commented Oct 10, 2016

Looks pretty good overall. I left some comments mostly on styling, documentation, and test cases.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good; I don't know that it makes sense to change other instances separately? Changing all instances of buffering a file input stream seems like one coherent logical change (but the titles need to be updates)

}

@Override
public int available() throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a standard API method for InputStream, so if something does call it, it's better to return a useful answer because it's pretty easy to do so.

if(n < 0L) {
return 0L;
}
if (byteBuffer.remaining() > n) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

= n ? or did I miss something obvious?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have enough data in the bytebuffer to skip then we can skip from the buffer and no need to skip from the file. Looks good to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, my comment rendered incorrectly. I'm asking if the condition can be >= n. If there are exactly n bytes remaining then you can skip the entire buffer, right? it need not be > n?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can change the condition to be >=n. Although the behavior remains the same because skipping 0 bytes from the underlying file channel is essentially a no-op. But changed it avoid confusion.

*/
public class NioBasedBufferedFileInputStreamSuite {

byte[] randomBytes;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: private should always be the default

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@sitalkedia
Copy link
Author

Thanks @srowen, @rxin for the review comments. I have addressed them.

@SparkQA
Copy link

SparkQA commented Oct 10, 2016

Test build #66667 has finished for PR 15408 at commit 3d48a16.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 11, 2016

Test build #66749 has finished for PR 15408 at commit 30173fa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sitalkedia
Copy link
Author

jenkins retest this please.

Copy link
Member

@zsxwing zsxwing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Please add synchronized to the public methods except close (the current close method is already thread-safe). The javadoc of InputStream says nothing about thread-safety but the implementation like ChannelInputStream, ByteArrayInputStream, BufferedInputStream implies that.
  • A minor behavior change is NioBufferedFileInputStream doesn't close fd in finalize like FileInputStream. But that's not a big deal since the user should do it instead of counting on GC.

private long skipFromFileChannel(long n) throws IOException {
long currentFilePosition = fileChannel.position();
long size = fileChannel.size();
if (currentFilePosition + n > size) {
Copy link
Member

@zsxwing zsxwing Oct 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use n > size - currentFilePosition to avoid overflow (e.g., n is Long.MAX_VALUE).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, changed accordingly.


@Override
public int read(byte[] b, int offset, int len) throws IOException {
if (!refill()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please add the defense codes like BufferedInputStream

        if ((off | len | (off + len) | (b.length - (off + len))) < 0) {
            throw new IndexOutOfBoundsException();
        } else if (len == 0) {
            return 0;
        }

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

@SparkQA
Copy link

SparkQA commented Oct 12, 2016

Test build #66765 has finished for PR 15408 at commit 30173fa.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also missed why close() shouldn't be synchronized. You wouldn't want it to take place during a read, right?

I think a finalizer wouldn't hurt here because FileInputStream does it. It's exactly because people may not properly close the stream.

WDYT @zsxwing ?

Getting quite close here I think.

if (!refill()) {
return -1;
}
if ((offset | len | (offset + len) | (b.length - (offset + len))) < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this must be a typo... this is a bitwise-or of a bunch of ints. I think maybe the example was given as a sort of pseudo code. offset and len can't be negative and their sum shouldn't exceed the array lenght, but that seems like it. 'else' below isn't really needed, but that's a nit. Actually, you want to check whether len is after updating on line 94 right? if no data is available you also return 0 without a no-op read?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen I think this check is correct. It checks if any of these values is negative.

offset and len can't be negative and their sum shouldn't exceed the array length.

It's possible. offset and len are set by the caller.

@sitalkedia Could you move the argument check before refill? If the argument is invalid, we should always throw an exception instead of returning -1.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the 'else' part and moved the argument check before refill as suggested.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zsxwing oh I get it, you're really just checking if any top bit is set. Hm, that seems fairly obscure compared to just checking the arguments. Sure it's a little more code, but it's readable and i don't think this is performance critical? I hadn't seen this before and had to think a while to get what it does.

Yes, I'm agreeing that offset and len can't be allowed to be negative, not that it can't be specified by the caller as something negative.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen this line comes from BufferedInputStream. I'm ok with rewriting it for readability.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see. Well it's reasonable to keep it then, but I'd at least add a comment I think. I didn't recognize this at all until I thought about it for a good few minutes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen - Agreed, the condition was little confusing. Changed it to make it clearer.

@SparkQA
Copy link

SparkQA commented Oct 12, 2016

Test build #66830 has finished for PR 15408 at commit 439cead.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@zsxwing
Copy link
Member

zsxwing commented Oct 12, 2016

I also missed why close() shouldn't be synchronized. You wouldn't want it to take place during a read, right?

Good point. I was thinking FileChannel.close is safe but didn't consider this case.

I think a finalizer wouldn't hurt here because FileInputStream does it. It's exactly because people may not properly close the stream.

Agreed that it doesn't hurt the performance since FileInputStream does it.

@SparkQA
Copy link

SparkQA commented Oct 13, 2016

Test build #66860 has finished for PR 15408 at commit b74fb36.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 14, 2016

Test build #66972 has finished for PR 15408 at commit 769b3f0.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@sitalkedia
Copy link
Author

jenkins retest this please.

@SparkQA
Copy link

SparkQA commented Oct 14, 2016

Test build #66978 has finished for PR 15408 at commit 769b3f0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we're down to pretty trivial changes, thank you. This will be a nice win and I am glad we've scrutinized it because it will be in a lot of critical paths.

* TODO: support {@link #mark(int)}/{@link #reset()}
*
*/
@ThreadSafe
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only see one location in the codebase where we use this annotation, and I think we probably shouldn't use it at all if not used consistently.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, removed it for consistency.

@ThreadSafe
public final class NioBufferedFileInputStream extends InputStream {

private static int DEFAULT_BUFFER_SIZE_BYTES = 8192;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, forgot to say this should be final

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* {@link sun.nio.ch.ChannelInputStream} supports reading a file using nio,
* but does not support buffering.
*
* TODO: support {@link #mark(int)}/{@link #reset()}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really care, but this could be a comment inside the class rather than user-facing. In fact I don't even know it's a to-do.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I removed the TODO here.


@Override
public synchronized int read(byte[] b, int offset, int len) throws IOException {
if (offset < 0 || len < 0 || (offset + len) < 0 || (b.length - (offset + len)) < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hardly matters, but now that this condition has been made more explicit, then final condition is simpler as offset + len > b.length

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We still need to check if offset and len is less than 0 right? Removed theoffset + len < 0 condition because that is covered in the last condition (b.length - (offset + len)) < 0

Copy link
Author

@sitalkedia sitalkedia Oct 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore my previous comment, we still need it.


@Override
protected void finalize() throws IOException {
close();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: 2-space indent not 4

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good eye, wonder why the checkstyle did not catch it though?


import org.apache.spark.storage.StorageUtils;

import javax.annotation.concurrent.ThreadSafe;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit^2 : no longer needed as an import


@Override
public synchronized int read(byte[] b, int offset, int len) throws IOException {
if (offset < 0 || len < 0 || (b.length - (offset + len)) < 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no I think that condition was needed. I mean: if (offset < 0 || len < 0 || offset + len < 0 || offset + len > b.length) {

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM pending tests

@SparkQA
Copy link

SparkQA commented Oct 15, 2016

Test build #67014 has finished for PR 15408 at commit f1f108f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Oct 15, 2016

Test build #67016 has finished for PR 15408 at commit 5306fb0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Copy link
Member

srowen commented Oct 17, 2016

Going once, going twice, any more comments?

@zsxwing
Copy link
Member

zsxwing commented Oct 17, 2016

LGTM. Merging to master. Thanks!

@asfgit asfgit closed this in c7ac027 Oct 17, 2016
robert3005 pushed a commit to palantir/spark that referenced this pull request Nov 1, 2016
…ream in order to avoid additional copy from os buffer cache to user buffer

## What changes were proposed in this pull request?

Currently we use BufferedInputStream to read the shuffle file which copies the file content from os buffer cache to the user buffer. This adds additional latency in reading the spill files. We made a change to use java nio's direct buffer to read the spill files and for certain pipelines spilling significant amount of data, we see up to 7% speedup for the entire pipeline.

## How was this patch tested?
Tested by running the job in the cluster and observed up to 7% speedup.

Author: Sital Kedia <skedia@fb.com>

Closes apache#15408 from sitalkedia/skedia/nio_spill_read.
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…ream in order to avoid additional copy from os buffer cache to user buffer

## What changes were proposed in this pull request?

Currently we use BufferedInputStream to read the shuffle file which copies the file content from os buffer cache to the user buffer. This adds additional latency in reading the spill files. We made a change to use java nio's direct buffer to read the spill files and for certain pipelines spilling significant amount of data, we see up to 7% speedup for the entire pipeline.

## How was this patch tested?
Tested by running the job in the cluster and observed up to 7% speedup.

Author: Sital Kedia <skedia@fb.com>

Closes apache#15408 from sitalkedia/skedia/nio_spill_read.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants