New class PagedBytesReference: BytesReference over pages #5427
Conversation
public class PagedBytesReference implements BytesReference {

    private ByteArray bytearray;
can this be final?
Used to be - done.
Squashed version with non-materializing equals()/hashCode(), support for zero-copy writeTo(OutputStream), composite ChannelBuffers and randomized tests with lots of corner-case fixes, especially for slices. The Netty buffers are non-gathering for now until I've read up on how Netty implements scatter/gather and the implications.
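The non-materializing hashCode() mentioned above can be sketched in a few lines. The class below is a hypothetical stand-in, not the actual PagedBytesReference code; the point is that the hash walks the pages directly and matches what Arrays.hashCode would produce over a contiguous copy, without ever allocating one.

```java
import java.util.ArrayList;
import java.util.List;

class PagedBytesSketch {
    static final int PAGE_SIZE = 4; // tiny page size just for illustration

    final List<byte[]> pages = new ArrayList<>();
    final int length;

    PagedBytesSketch(byte[] data) {
        this.length = data.length;
        for (int i = 0; i < data.length; i += PAGE_SIZE) {
            byte[] page = new byte[Math.min(PAGE_SIZE, data.length - i)];
            System.arraycopy(data, i, page, 0, page.length);
            pages.add(page);
        }
    }

    // Same recurrence as Arrays.hashCode over the logical byte sequence,
    // but iterating page by page -- no large temporary allocation.
    int pagedHashCode() {
        int result = 1;
        for (byte[] page : pages) {
            for (byte b : page) {
                result = 31 * result + b;
            }
        }
        return result;
    }
}
```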
Here is a quick breakdown of gathering in Netty. The gathering flag tells Netty whether to send the list of ByteBuffer[] to the NIO layer in one call (when true), or to write the buffers one by one (when false). This is only enabled on Java 7 because of bugs in sending an array of buffers on 1.6 (for the Netty logic, see …). The part that is tricky is that the NIO layer, at least in 1.7, ends up taking those ByteBuffer arrays, copying them over into direct buffers, and then trying to write as much as possible. If the message is so large that it can't be sent in a single go, that extra copying will happen over and over again. Imagine sending 1MB as 10 buffers of 100KB each: if only one buffer is sent in the first go, the remaining 9 buffers are copied back to try to send them again (see IOUtil.java in the JDK source code). This is not the end of the world, and it is what happens today without the paged data structure, so it's already an improvement (assuming we set gathering to …
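As a rough illustration of the gathering path described above (standard JDK NIO, not Elasticsearch or Netty code): FileChannel implements GatheringByteChannel, so a single write(ByteBuffer[]) call hands the whole buffer array down in one go, and a loop handles the partial writes that trigger the copy-back behavior discussed in the comment.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class GatheringWriteDemo {
    // Writes all buffers with gathering writes: one write(ByteBuffer[])
    // call per attempt, looping because a single call may only send part
    // of the payload.
    static long writeGathering(Path target, ByteBuffer[] buffers) throws IOException {
        try (FileChannel ch = FileChannel.open(target,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            long written = 0;
            while (hasRemaining(buffers)) {
                written += ch.write(buffers); // gathering write over the whole array
            }
            return written;
        }
    }

    static boolean hasRemaining(ByteBuffer[] buffers) {
        for (ByteBuffer b : buffers) {
            if (b.hasRemaining()) {
                return true;
            }
        }
        return false;
    }
}
```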
// we can use gathering writes from the ChannelBuffers, but only if they are
// moderately small to prevent OOMs due to DirectBuffer allocations.
// up to 32 buffers result in gathering writes for payloads <= 512k.
return ChannelBuffers.wrappedBuffer((numBuffers <= 32), buffers);
can we have this limit as a constant, and respect the PAGE_SIZE constant as well, in case we decide to change it in the future?
Latest commit ignores number of pages (and thus page size) and only uses a constant of 512k.
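A minimal sketch of that policy, with the threshold as a named constant (class and method names here are made up for illustration; this is not the commit's actual code):

```java
class GatheringPolicy {
    // Cap beyond which we fall back to non-gathering writes, to avoid
    // large DirectBuffer allocations inside the NIO layer.
    static final int MAX_GATHERING_BYTES = 512 * 1024;

    // Decide on total payload size alone, independent of page count or
    // page size, matching the "constant of 512k" approach above.
    static boolean useGathering(int totalPayloadBytes) {
        return totalPayloadBytes <= MAX_GATHERING_BYTES;
    }
}
```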
@Override
public boolean hasArray() {
    return (length <= PAGE_SIZE);
BigByteArray materializes as soon as you ask for something that goes across several pages. So I think this should return offset + length <= PAGE_SIZE instead.
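To make the review point concrete, here is a toy sketch (hypothetical class, not the PR's code) contrasting the two conditions: the length-only check wrongly reports a single backing array for a slice that starts near the end of a page, since such a range crosses into the next page.

```java
class PagedSliceSketch {
    static final int PAGE_SIZE = 16 * 1024; // illustrative page size

    final int offset;
    final int length;

    PagedSliceSketch(int offset, int length) {
        this.offset = offset;
        this.length = length;
    }

    // Wrong: ignores where the slice starts, so a short slice near the
    // end of a page is misreported as single-page.
    boolean hasArrayLengthOnly() {
        return length <= PAGE_SIZE;
    }

    // Suggested fix: the range is backed by one array only if it ends
    // within the first page.
    boolean hasArrayWithOffset() {
        return offset + length <= PAGE_SIZE;
    }
}
```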
It looks good to me now. Maybe one last thing that should be fixed would be to make the equals method accept any BytesReference instance by falling back to …
bytes = Arrays.copyOf(ref.bytes, length);
} else {
    // BigArray has materialized for us, no need to do it again
This assumption is wrong if offset >= N * PAGE_SIZE and offset + length < (N+1) * PAGE_SIZE. I think the only way to fix it would be to make ByteArray.get return a boolean which can be used to know whether a copy has already been performed.
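A simplified sketch of that idea (all names hypothetical, loosely modeled on the discussion rather than the actual ByteArray API): get() returns false when it can alias an existing page without copying, and true when the range crossed a page boundary and had to be materialized, so callers know whether a copy already happened.

```java
import java.util.Arrays;

class PagedGetSketch {
    static final int PAGE_SIZE = 4; // tiny page size just for illustration

    // Simple holder standing in for a BytesRef-like struct.
    static class Ref { byte[] bytes; int offset; int len; }

    final byte[][] pages;

    PagedGetSketch(byte[] data) {
        int n = (data.length + PAGE_SIZE - 1) / PAGE_SIZE;
        pages = new byte[n][];
        for (int i = 0; i < n; i++) {
            pages[i] = Arrays.copyOfRange(data, i * PAGE_SIZE,
                    Math.min(data.length, (i + 1) * PAGE_SIZE));
        }
    }

    // Returns true when the requested range had to be copied into a fresh
    // array; false when ref aliases an existing page with zero copying.
    boolean get(int index, int len, Ref ref) {
        int page = index / PAGE_SIZE;
        int pageOffset = index % PAGE_SIZE;
        if (pageOffset + len <= PAGE_SIZE) {
            ref.bytes = pages[page];
            ref.offset = pageOffset;
            ref.len = len;
            return false; // zero-copy: still backed by the page itself
        }
        ref.bytes = new byte[len];
        int copied = 0;
        while (copied < len) {
            int toCopy = Math.min(len - copied, PAGE_SIZE - pageOffset);
            System.arraycopy(pages[page], pageOffset, ref.bytes, copied, toCopy);
            copied += toCopy;
            page++;
            pageOffset = 0;
        }
        ref.offset = 0;
        ref.len = len;
        return true; // the bytes were materialized into a fresh copy
    }
}
```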
@jpountz commit 7b1c94a has the last requested change using the return value from ByteArray.get(), as discussed. (not sure why old commits are suddenly showing up here?!)
This might be because you rebased against master?
Looks good, you have my +1. Maybe @kimchy and/or @s1monw would like to do another round before it gets merged?
I only did a quick glance but it looks good - go move forward
existing tests. Non-allocating hashCode/Equals, zero-copy writeTo() and ChannelBuffer support. Fix for #5427
Finally in master & 1.x .. \o/
BytesReference over a BigArray/ByteArray list of pages. Passes both a boatload of new tests and the full test suite. About as compatible as it can be for now; minor perf improvements require extensions to BigArrays (see #5420)