Investigate using Cap'n Proto instead of protobuf #29

Closed
RomanIakovlev opened this issue Dec 14, 2018 · 3 comments
Labels
enhancement New feature or request

Comments

@RomanIakovlev
Owner

There might be some performance gains to be obtained from this. See https://capnproto.org/.

@juarezr
Contributor

juarezr commented Dec 14, 2018

Perhaps it would be possible to squeeze out some performance, mainly while querying:

  • On-disk size shouldn't change much, since the files are already compressed with Zstandard.
  • Query/processing performance should be better thanks to Cap'n Proto's random in-place access with no decoding.
  • Memory usage should be higher because Cap'n Proto uses more bytes than Protobuf for the same data. Maybe this trade-off is worth pursuing. Or maybe it would be best to offer two versions: memory-optimized or processing-optimized.

@juarezr
Contributor

juarezr commented Jan 10, 2019

Adding FlatBuffers as a candidate for comparison.
Serialization libraries tend to go in two opposite directions:

  1. Space efficient:
    1.1. The data is encoded and compressed/coded when stored, and a decode step is needed to access it.
    1.2. There is no random access to data/fields.
    1.3. Encoded size is smaller, and so is the network/disk/memory footprint, but reading/writing time is higher.
    1.4. This is the scheme of Protobuf.

  2. Processing efficient:
    2.1. The data is encoded but not compressed/coded when stored, and the data/fields can be accessed directly through memory.
    2.2. All data/fields can be randomly/directly accessed.
    2.3. Encoded size is about the same as the raw data and larger than in 1, but reading/writing time is lower due to zero copy and no decoding step.
    2.4. This is the scheme of Cap'n Proto and FlatBuffers.
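
The two directions above can be illustrated with a minimal sketch. It uses neither Protobuf nor Cap'n Proto: it only mimics the underlying idea with a base-128 varint stream (the variable-length integer encoding Protobuf uses) versus fixed 8-byte slots read by offset. The function names and sample values are hypothetical and chosen for illustration only.

```python
import struct


def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (7 payload bits per byte)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_varint_at_index(buf: bytes, index: int) -> int:
    """Space-efficient layout: every preceding value must be decoded first."""
    pos = 0
    value = 0
    for _ in range(index + 1):
        value, shift = 0, 0
        while True:
            byte = buf[pos]
            pos += 1
            value |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
    return value


def read_fixed_at_index(buf: bytes, index: int) -> int:
    """Processing-efficient layout: jump straight to the slot, no decode pass."""
    return struct.unpack_from("<q", buf, index * 8)[0]


values = [3, 300, 70000, 5, 12]  # hypothetical sample data
compact = b"".join(encode_varint(v) for v in values)
fixed = struct.pack(f"<{len(values)}q", *values)

print("varint bytes:", len(compact), "fixed-width bytes:", len(fixed))
print(read_varint_at_index(compact, 2), read_fixed_at_index(fixed, 2))
```

Both readers return the same value, but the varint reader has to walk the stream from the start, while the fixed-width reader computes an offset and reads in place. The fixed-width buffer is larger for small values, which is the size/speed trade-off described in the list.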

There are many serialization libraries/formats, but the trio above (Protobuf, Cap'n Proto, FlatBuffers) ranks at the top for performance, usage, and completeness. See also Thrift, BSON, MessagePack, etc.

@RomanIakovlev
Owner Author

Closing due to lack of activity. I don't want to preclude any work on improving the performance of Timeshape, but I don't have the bandwidth to work on it (and haven't had since 2019, as is evident from the age of this issue). If someone is interested in picking this up, I'll be able to provide guidance and code review, as well as make releases, so contributions are welcome!
