
New class PagedBytesReference: BytesReference over pages #5427

Closed
hhoffstaette wants to merge 1 commit
Conversation

@hhoffstaette (Author):

BytesReference over a BigArray/ByteArray list of pages. Passes both a boatload of new tests and the full test suite. About as compatible as it can be for now; minor perf improvements require extensions to BigArrays (see #5420).


public class PagedBytesReference implements BytesReference {

    private ByteArray bytearray;
Contributor:

can this be final?

Author:

Used to be - done.

@hhoffstaette (Author):

Squashed version with non-materializing equals()/hashCode(), support for zero-copy writeTo(OutputStream), composite ChannelBuffers and randomized tests with lots of corner-case fixes esp. for slices. The Netty buffers are non-gathering for now until I've read up on how Netty implements scatter/gather and the implications.
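
For illustration, here is a minimal sketch of the page-wise, zero-copy writeTo(OutputStream) idea, assuming the bytearray, offset, length and PAGE_SIZE members referenced in this thread and the ByteArray.get(index, len, BytesRef) accessor from BigArrays; the actual code in the PR may differ:

// Sketch only: write the reference one page at a time, so no byte[] of the
// full length is ever allocated. BytesRef is org.apache.lucene.util.BytesRef.
public void writeTo(OutputStream os) throws IOException {
    BytesRef ref = new BytesRef();
    long position = offset;
    int remaining = length;
    while (remaining > 0) {
        // stay inside the current page so get() can hand back a direct reference
        int chunk = (int) Math.min(remaining, PAGE_SIZE - (position % PAGE_SIZE));
        bytearray.get(position, chunk, ref);
        os.write(ref.bytes, ref.offset, chunk);
        position += chunk;
        remaining -= chunk;
    }
}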

@kimchy (Member) commented Mar 17, 2014:

Here is a quick breakdown of gathering in Netty. Gathering tells Netty whether to send the list of ByteBuffer[] to the NIO layer as a whole (when true), or to write them one by one to the NIO layer (when false). This is only enabled on Java 7 because of bugs in sending the array of buffers on 1.6 (for the Netty logic, see SocketSendBufferPool#acquire).

The tricky part is that the NIO layer, at least in 1.7, ends up taking those ByteBuffer arrays, copying over the direct buffers, and then trying to write as much as possible. If the message is so large that it can't be sent in a single go, extra copying will happen over and over again. Imagine sending 1MB as 10 buffers of 100k each: if only 1 buffer was sent in one go, it will copy back the remaining 9 buffers to try and send them again (see IOUtil.java in the JDK source code).

This is not the end of the world, and it is what happens today without the paged data structure, so it's already an improvement (assuming we set gathering to true, which should be the default if we don't add any other logic). But we can potentially do better, and only try to send in gathering mode if the total length of the bytes is less than, let's say, 512k (or something we expect, in the most common case, to succeed). If it's more, it might be worth sending the buffers one by one.

// we can use gathering writes from the ChannelBuffers, but only if they are
// moderately small to prevent OOMs due to DirectBuffer allocations.
// up to 32 buffers result in gathering writes for payloads <= 512k.
return ChannelBuffers.wrappedBuffer((numBuffers <= 32), buffers);
Member:

can we have this limit as a constant, and respect the PAGE_SIZE constant as well in case we decide to change it in the future?

Author:

The latest commit ignores the number of pages (and thus the page size) and only uses a constant of 512k.
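
For illustration, the resulting check could look roughly like the sketch below; the constant and method names are made up here, only the ChannelBuffers.wrappedBuffer(gathering, buffers) call is taken from the snippet above:

// Sketch: enable NIO gathering writes only for moderately sized payloads, as
// discussed above. GATHER_THRESHOLD and toChannelBuffer are illustrative names.
private static final int GATHER_THRESHOLD = 512 * 1024;

private ChannelBuffer toChannelBuffer(ChannelBuffer[] buffers, int totalLength) {
    boolean gathering = totalLength <= GATHER_THRESHOLD;
    return ChannelBuffers.wrappedBuffer(gathering, buffers);
}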


@Override
public boolean hasArray() {
    return (length <= PAGE_SIZE);
Contributor:

BigByteArray materializes as soon as you ask for something that goes across several pages. So I think this should return offset + length <= PAGE_SIZE instead.
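
Expressed as code, the suggested check would be something like this (a sketch of the review suggestion, not the final committed code):

@Override
public boolean hasArray() {
    // a backing array can only be exposed without copying when the
    // slice starts and ends inside the first page
    return (offset + length) <= PAGE_SIZE;
}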

@jpountz (Contributor) commented Mar 24, 2014:

It looks good to me now. Maybe one last thing that should be fixed: make the equals method accept any BytesReference instance by falling back to BytesReference.Helper.bytesEquals (which should soon no longer materialize arrays once #5517 is pushed) when the other instance is not a PagedBytesReference?
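
A rough sketch of such a fallback, with the paged-vs-paged comparison stubbed out as a hypothetical helper named equalsPaged:

@Override
public boolean equals(Object obj) {
    if (this == obj) {
        return true;
    }
    if (!(obj instanceof BytesReference)) {
        return false;
    }
    if (obj instanceof PagedBytesReference) {
        // compare page by page without materializing either side
        return equalsPaged((PagedBytesReference) obj); // hypothetical helper
    }
    // any other BytesReference: fall back to the shared helper, which should
    // stop materializing arrays once #5517 is in
    return BytesReference.Helper.bytesEquals(this, (BytesReference) obj);
}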

    bytes = Arrays.copyOf(ref.bytes, length);
} else {
    // BigArray has materialized for us, no need to do it again
Contributor:

This assumption is wrong if offset >= N * PAGE_SIZE and offset + length < (N+1) * PAGE_SIZE. I think the only way to fix it would be to make ByteArray.get return a boolean which can be used to know whether a copy has already been performed.
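
To make the suggestion concrete, a sketch of how a caller could use such a return value; the exact semantics are an assumption here, with true meaning that get() already had to copy across a page boundary:

// Sketch: ByteArray.get() reports whether it copied across a page boundary.
// If it did, the BytesRef already owns a fresh array; otherwise it points
// straight into a backing page and the caller must copy itself.
BytesRef ref = new BytesRef();
boolean materialized = bytearray.get(offset, length, ref);
byte[] bytes;
if (materialized) {
    // BigArray has materialized the range for us, no need to do it again
    bytes = ref.bytes;
} else {
    bytes = Arrays.copyOfRange(ref.bytes, ref.offset, ref.offset + length);
}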

@hhoffstaette (Author):

@jpountz commit 7b1c94a has the last requested change using the return value from ByteArray.get(), as discussed. (not sure why old commits are suddenly showing up here?!)

@jpountz (Contributor) commented Mar 25, 2014:

> not sure why old commits are suddenly showing up here?

This might be because you rebased against master?

> commit 7b1c94a has the last requested change using the return value from ByteArray.get()

Looks good, you have my +1. Maybe @kimchy and/or @s1monw would like to do another round before it gets merged?

@s1monw (Contributor) commented Mar 25, 2014:

I only did a quick glance but it looks good - go move forward

hhoffstaette pushed a commit that referenced this pull request on Mar 25, 2014:

existing tests. Non-allocating hashCode/Equals, zero-copy writeTo() and ChannelBuffer support.

Fix for #5427
@hhoffstaette (Author):

Finally in master & 1.x .. \o/
