Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use optimized writeV methods in Mutations #669

Closed
wants to merge 1 commit into from

Conversation

@milleruntime
Copy link
Contributor

commented Oct 2, 2018

  • Moved optimized writeVLong and writeVInt to UnsynchronizedBuffer
    to replace use of WriteableUtils methods. Each optimized method will
    only make one write call to the underlying outputstream.

This was mainly done as an attempt to improve WAL performance. It "should" help performance of mutations overall since the Hadoop WriteableUtils method can make multiple write calls per long.

@@ -1613,22 +1615,28 @@ public void write(DataOutput out) throws IOException {
}
out.write((byte) (0x80 | hasValues));

WritableUtils.writeVInt(out, row.length);
int size = UnsynchronizedBuffer.writeVInt(integerBuffer, 0, row.length);

This comment has been minimized.

Copy link
@keith-turner

keith-turner Oct 2, 2018

Contributor

With a utility method like the following, could replace the repeated two lines with a single line.

public static void writeVInt(OutputStream out, byte[] workBuffer, int i){
   int size = writeVInt(workBuffer, 0, i);
   out.write(workBuffer, 0 size);
}

This comment has been minimized.

Copy link
@keith-turner

keith-turner Oct 2, 2018

Contributor

Oh, also this utillity method could have a comment about how its faster because it avoid writing single bytes which may be sync.

@@ -1603,6 +1603,8 @@ private void oldReadFields(byte first, DataInput in) throws IOException {

}

private final byte[] integerBuffer = new byte[4];

This comment has been minimized.

Copy link
@keith-turner

keith-turner Oct 2, 2018

Contributor

This could be a local var in the write method.

This comment has been minimized.

Copy link
@milleruntime

milleruntime Oct 2, 2018

Author Contributor

Wouldn't this result in a new array be created every time the method is called? Vs just once, and then reused.

This comment has been minimized.

Copy link
@keith-turner

keith-turner Oct 2, 2018

Contributor

Yeah, but I think that is ok. I think a common use case after calling write would be to discard the object. So I suspect their is no gain from the global var and an opportunity for a tighter scope is missed.

This comment has been minimized.

Copy link
@milleruntime

milleruntime Oct 2, 2018

Author Contributor

Oh I see what you are saying... it won't make a difference because of how Mutation is typically used. What about for ServerMutation?

This comment has been minimized.

Copy link
@keith-turner

keith-turner Oct 2, 2018

Contributor

Also, since the buffer is used multiple times in the method, this amortizes the cost of the allocation per method call.

This comment has been minimized.

Copy link
@milleruntime

milleruntime Oct 2, 2018

Author Contributor

Yeah, good point. I will make them local

@@ -31,6 +32,8 @@
*/
public class ServerMutation extends Mutation {
private long systemTime = 0L;
private final byte[] timeBuffer = new byte[8];

This comment has been minimized.

Copy link
@keith-turner

keith-turner Oct 2, 2018

Contributor

Could make these local vars.

@@ -1613,22 +1615,22 @@ public void write(DataOutput out) throws IOException {
}
out.write((byte) (0x80 | hasValues));

WritableUtils.writeVInt(out, row.length);
UnsynchronizedBuffer.writeVInt(out, integerBuffer, row.length);

This comment has been minimized.

Copy link
@ctubbsii

ctubbsii Oct 2, 2018

Member

Is this change compatible with the old implementation? If not, we should highlight that any persisted Mutations won't work across versions (if users did that) in the release notes.

This comment has been minimized.

Copy link
@milleruntime

milleruntime Oct 2, 2018

Author Contributor

It should be. It should be writing the same bytes... i.e. when only 2 bytes from an int get written, those same 2 bytes should get written.

This comment has been minimized.

Copy link
@keith-turner

keith-turner Oct 2, 2018

Contributor

I don't think it changes persisted data. Which makes think that this change could be made to 1.9.

This comment has been minimized.

Copy link
@ctubbsii

ctubbsii Oct 2, 2018

Member

It's just not obvious that the two different writeVInt methods should be expected to have the sample serialization format in all cases. If they do, this is fine. If not, it's also fine, we just need a note in the release notes.

This comment has been minimized.

Copy link
@keith-turner

keith-turner Oct 2, 2018

Contributor

would probably be nice to add to the javadocs that the methods are compat with hadoops methods

This comment has been minimized.

Copy link
@milleruntime

milleruntime Oct 2, 2018

Author Contributor

I added a few tests to make sure we are writing the same as WritableUtils.

@milleruntime

This comment has been minimized.

Copy link
Contributor Author

commented Oct 2, 2018

@keith-turner and @ctubbsii think this change is OK to make to 1.9? I believe the Mutation write method is only used in LogFileValue.java AKA the WAL. If this does improve WAL performance, I feel it would be worth calling it a bug fix for 1.9 ;-D

Use optimized writeV methods in Mutations
* Moved optimized writeVLong and writeVInt to UnsynchronizedBuffer
to replace use of WriteableUtils methods. Each optimized method will
only make one write call to the underlying outputstream.

@milleruntime milleruntime force-pushed the milleruntime:wal-perf branch from 69dcaf1 to 08acd65 Oct 3, 2018

asfgit pushed a commit that referenced this pull request Oct 3, 2018

Use optimized writeV methods in Mutations (#669)
* Moved optimized writeVLong and writeVInt to UnsynchronizedBuffer
to replace use of WriteableUtils methods. Each optimized method will
only make one write call to the underlying outputstream.
@milleruntime

This comment has been minimized.

Copy link
Contributor Author

commented Oct 3, 2018

Squashed and cherry picked into 1.9. Than merged forward back into master.

@milleruntime milleruntime deleted the milleruntime:wal-perf branch Dec 3, 2018

@ctubbsii ctubbsii added this to Done in 1.9.3 Jun 14, 2019

@ctubbsii ctubbsii added this to Done in 2.0.0 Jun 14, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
3 participants
You can’t perform that action at this time.