Optimize `ProtoBufSerializationProvider` by minimizing copying #943

Merged
merged 3 commits into apple:master from idelpivnitskiy:protobuf-serializer-impr on Feb 21, 2020

Conversation

idelpivnitskiy (Member) commented Feb 20, 2020

Motivation:

The current implementation of `ProtoBufSerializationProvider` has a
few performance issues:

  1. `ProtoDeserializer` may create an intermediate copy of the composite
    buffers and then copy it again inside the proto parser.
  2. `DefaultSerializer.DEFAULT_SIZE_ESTIMATOR` inaccurately predicts the
    size of the destination `Buffer`. For the aggregated API we can use
    `getSerializedSize()` to allocate enough bytes for the destination
    `Buffer` up front.
  3. `MessageLite.writeTo(OutputStream)` creates an internal copy of the
    data that could be avoided if we create a `CodedOutputStream` over
    the internal data storage of the destination `Buffer`.

Modifications:

  • Create an optimized version of `CodedInputStream` for
    `ProtoDeserializer` to do less copying during proto deserialization;
  • Provide a `bytesEstimate` for serialization for the aggregated API;
  • Create an optimized version of `CodedOutputStream` for
    `ProtoSerializer` to do less copying during proto serialization.

Results:

Less copying during protobuf serialization/deserialization improves
throughput by 5-15% on the gRPC client and server when they
(de)serialize a 16 KB payload body.
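
A minimal sketch of the serialization idea behind items 2 and 3, using plain
protobuf APIs. A `byte[]` stands in here for the writable storage of the
destination `Buffer`; everything except the protobuf calls is illustrative:

```java
import com.google.protobuf.CodedOutputStream;
import com.google.protobuf.MessageLite;

import java.io.IOException;

final class PresizedSerializeSketch {
    static byte[] serializePresized(final MessageLite message) throws IOException {
        // Item 2: size the destination exactly, so it never has to expand mid-write.
        final byte[] dst = new byte[message.getSerializedSize()];
        // Item 3: write straight into the destination storage instead of going through
        // MessageLite.writeTo(OutputStream), which buffers the data internally first.
        final CodedOutputStream out = CodedOutputStream.newInstance(dst);
        message.writeTo(out);
        out.checkNoSpaceLeft(); // the size estimate was exact
        return dst;
    }
}
```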

idelpivnitskiy requested review from NiteshKant and Scottmitch on Feb 20, 2020
// To avoid unnecessary copying, we use newCodedInputStream(buffers, lengthOfData).
final ByteBuffer[] buffers = toDeserialize.toNioBuffers(toDeserialize.readerIndex(), lengthOfData);
in = buffers.length == 1 ? CodedInputStream.newInstance(buffers[0]) :
        newCodedInputStream(buffers, lengthOfData);

NiteshKant (Member) commented Feb 21, 2020

Is this check necessary since we will only be in this else block if `toDeserialize.nioBufferCount() > 1`?

idelpivnitskiy (Author, Member) commented Feb 21, 2020

Yes, this is described in the comment above. `toDeserialize.nioBufferCount()` does not account for a requested length. We may be in a situation where the `CompositeBuffer` has 2 or more buffers internally (and those buffers may contain multiple proto messages), but we request only a few bytes (a single proto message) that can be read from the first internal buffer. In this case, `toDeserialize.toNioBuffers(..., length)` may return an array that has only the first internal `ByteBuffer`, not all of the internal buffers.
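
For illustration, a rough sketch of that behaviour, assuming the usual
ServiceTalk buffer helpers (`BufferAllocators.DEFAULT_ALLOCATOR`,
`newCompositeBuffer()`, `addBuffer`, `fromAscii`); the contents and lengths are
made up:

```java
import io.servicetalk.buffer.api.CompositeBuffer;

import java.nio.ByteBuffer;

import static io.servicetalk.buffer.netty.BufferAllocators.DEFAULT_ALLOCATOR;

public final class NioBuffersExample {
    public static void main(String[] args) {
        // Two internal components, so nioBufferCount() reports more than 1...
        final CompositeBuffer composite = DEFAULT_ALLOCATOR.newCompositeBuffer()
                .addBuffer(DEFAULT_ALLOCATOR.fromAscii("first-proto-message"))
                .addBuffer(DEFAULT_ALLOCATOR.fromAscii("second-proto-message"));

        // ...but a length-limited request that fits entirely inside the first
        // component comes back as a single ByteBuffer.
        final int lengthOfData = 5; // pretend the first proto message is only 5 bytes
        final ByteBuffer[] buffers =
                composite.toNioBuffers(composite.readerIndex(), lengthOfData);

        System.out.println(composite.nioBufferCount()); // 2
        System.out.println(buffers.length);             // 1
    }
}
```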

private static CodedInputStream newCodedInputStream(final ByteBuffer[] buffers, final int lengthOfData) {
    // Because we allocated a new internal ByteBuffer that will never be mutated, we may just wrap it and
    // enable aliasing to avoid an extra copy inside the parser for a deserialized message.
    final CodedInputStream in = unsafeWrap(mergeByteBuffers(buffers, lengthOfData)).newCodedInput();
    in.enableAliasing(true);
    return in;
}

NiteshKant (Member) commented Feb 21, 2020

Wondering why we cannot use `ByteString.unsafeWrap()` and aliasing for the `toDeserialize.nioBufferCount() == 1` case also? We are pretty sure that the buffer will not be mutated between creation of the `CodedInputStream` and usage by the parser.

idelpivnitskiy (Author, Member) commented Feb 21, 2020

If `toDeserialize.nioBufferCount() == 1`, the `Buffer` will not create a new copy when we do `toDeserialize.toNioBuffer(...)`. Instead, it will return a `ByteBuffer` wrapper that shares the same internal array.
Because users have a way to access the original `Buffer` using http filters, we cannot guarantee that they do not modify the internal array. Therefore, by default we need to be safe. However, we can provide a configuration option for `ProtoBufSerializationProviderBuilder` and `GrpcClientBuilder`/`GrpcServerBuilder` to do zero-copy deserialization when users are sure that they do not touch the payload body in any other way.
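
For illustration, a hypothetical opt-in zero-copy path for the single-buffer
case could look roughly like this (`parseZeroCopy` is a made-up helper, and it
is only safe under the "users never touch the payload afterwards" assumption
above):

```java
import com.google.protobuf.CodedInputStream;
import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.Parser;

import java.nio.ByteBuffer;

import static com.google.protobuf.UnsafeByteOperations.unsafeWrap;

final class ZeroCopyParseSketch {
    // Wrap the single ByteBuffer without copying and let the parser alias the
    // wrapped storage instead of copying it into the parsed message.
    static <T> T parseZeroCopy(final ByteBuffer single, final Parser<T> parser)
            throws InvalidProtocolBufferException {
        final CodedInputStream in = unsafeWrap(single).newCodedInput();
        in.enableAliasing(true); // parsed ByteString fields may reference the original storage
        return parser.parseFrom(in);
    }
}
```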

@@ -155,7 +157,7 @@ public GrpcSerializationProvider build() {
     @Override
     public Buffer serialize(final HttpHeaders headers, final T value, final BufferAllocator allocator) {
         addContentHeaders(headers);
-        return serializer.serialize(value, allocator);
+        return serializer.serialize(value, allocator, METADATA_SIZE + value.getSerializedSize());

NiteshKant (Member) commented Feb 21, 2020

If the size is not memoized, `getSerializedSize()` will use reflection to calculate the size. Is that worth the size optimization here?

idelpivnitskiy (Author, Member) commented Feb 21, 2020

We invoke `getSerializedSize()` anyway to fill in the metadata later, so we do not pay the reflection cost twice. Without this, profiling shows that we frequently expand the buffer during `writeTo` because it cannot fit the serialized proto.
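
A rough sketch of that flow with plain protobuf APIs. `METADATA_SIZE` here is a
hypothetical stand-in for the 5-byte gRPC message-framing prefix, and a `byte[]`
stands in for the destination `Buffer`:

```java
import com.google.protobuf.CodedOutputStream;
import com.google.protobuf.MessageLite;

import java.io.IOException;

final class FramedSerializeSketch {
    // Hypothetical value: 1 byte compressed flag + 4 byte big-endian length (gRPC framing).
    private static final int METADATA_SIZE = 5;

    static byte[] serializeWithGrpcFraming(final MessageLite message) throws IOException {
        // First call computes and memoizes the size; it is reused below for the frame header.
        final int bodySize = message.getSerializedSize();
        final byte[] dst = new byte[METADATA_SIZE + bodySize];

        // gRPC message-framing prefix.
        dst[0] = 0; // not compressed
        dst[1] = (byte) (bodySize >>> 24);
        dst[2] = (byte) (bodySize >>> 16);
        dst[3] = (byte) (bodySize >>> 8);
        dst[4] = (byte) bodySize;

        // Write the message right after the prefix; the pre-sized array never grows,
        // and writeTo reuses the memoized size, so the cost is paid only once.
        final CodedOutputStream out = CodedOutputStream.newInstance(dst, METADATA_SIZE, bodySize);
        message.writeTo(out);
        out.checkNoSpaceLeft(); // the estimate was exact
        return dst;
    }
}
```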

idelpivnitskiy merged commit 1653fc7 into apple:master on Feb 21, 2020
3 checks passed:
  - pull request validation (jdk11): Build finished.
  - pull request validation (jdk8): Build finished.
  - pull request validation (quality): Build finished.
idelpivnitskiy deleted the idelpivnitskiy:protobuf-serializer-impr branch on Feb 21, 2020