Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Protobuf deserializer is allocating unnecessary memory #1701

Open
8 tasks done
MichalBrylka opened this issue Oct 16, 2021 · 6 comments
Open
8 tasks done

Protobuf deserializer is allocating unnecessary memory #1701

MichalBrylka opened this issue Oct 16, 2021 · 6 comments

Comments

@MichalBrylka
Copy link

MichalBrylka commented Oct 16, 2021

Description

I'd like to address issue in DeserializeAsync metod of ProtobufDeserializer class:
https://github.com/confluentinc/confluent-kafka-dotnet/blob/master/src/Confluent.SchemaRegistry.Serdes.Protobuf/ProtobufDeserializer.cs#L97

I like the idea that API for IAsyncDeserializer uses ReadOnlyMemory (or ReadOnlySpan for sync version). In deserialization code however there is at lot of unnecessary allocations:

  • data.ToArray();
  • using (var stream = new MemoryStream(array))
  • using (var reader = new BinaryReader(stream))

I understand that they make code more smooth-looking but please have a look at this benchmark:
https://github.com/nemesissoft/KafkaProtobufSyncOverAsyncPerf/blob/main/DeserializerBenchmarks.cs
I've attached my results:
https://github.com/nemesissoft/KafkaProtobufSyncOverAsyncPerf/blob/main/README.md

Could we improve this class by removing these allocations ? I've already proposed a quick solution:
https://github.com/nemesissoft/KafkaProtobufSyncOverAsyncPerf/blob/main/Deserializers/NonAllocProtobufDeserializer.cs

Sadly there is no SpanStream class that would make coding easier (though I plan to provide one in future) but solution I've proposed (span + position to mock stream read-advance behaviors) should be enough for now

I personally do not see benefit of having async deserializer just to call .ConfigureAwait(continueOnCapturedContext: false).GetAwaiter().GetResult(); but this overhead seems negligible. So even if we leave Deserialize method as async (or potentially add sync version as well), could we remove allocations ?

How to reproduce

Run provided benchmark or look at results:
https://github.com/nemesissoft/KafkaProtobufSyncOverAsyncPerf/blob/main/README.md

Checklist

Please provide the following information:

  • A complete (i.e. we can run it), minimal program demonstrating the problem. No need to supply a project file: see description
  • Confluent.Kafka nuget version: 1.8.1
  • Apache Kafka version: N/A
  • Client configuration: N/A
  • Operating system: Windows 10 but should not matter
  • Provide logs (with "debug" : "..." as necessary in configuration): N/A
  • Provide broker log excerpts: N/A
  • Critical issue.: NO
@MichalBrylka
Copy link
Author

as per data.ToArray(); this is copied from documentation:
Copies the contents of this read-only span into a new array. This heap allocates, so should generally be avoided, however it is sometimes necessary to bridge the gap with APIs written in terms of arrays.

@MichalBrylka
Copy link
Author

Hello Team
would you like me to provide a pull request with a solution ?

@MichalBrylka
Copy link
Author

hi, would you like to accept a PR with improved serializer and deserializer ?

@MichalBrylka
Copy link
Author

Hi,
I took the liberty of creating an efficient deserializer for both sync and async version:
EfficientProtobufDeserializer

Would you like me to create a pull request for your codebase with such deserializer?

@mhowlett
Copy link
Contributor

thanks. yes, we'd like to but need to prioritize reviewing etc.
at some point in the future, I expect we'll review many aspects of the serdes for performance, and consider this then.

@MichalBrylka
Copy link
Author

@mhowlett let me know if/when I should prepare a PR. Code itself is ready

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants