Support varint to have a better list size encoding #408

Closed
ghost opened this issue Jun 27, 2022 · 3 comments
Assignees
Labels
feature New feature request serialization Involve message serialization

Comments

@ghost

ghost commented Jun 27, 2022

Is your feature request related to a problem?

We currently use fixed-size integers for some encodings, for example the size prefix of lists or binaries, with the width chosen by estimation or approximation (e.g. 1 byte allows at most 255 elements in a list).
However, this approach does not scale and is inefficient: a width that is too small limits the number of items, and a width that is too large wastes bytes.
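
To make both failure modes concrete, here is a minimal sketch. The `items` list and the variable names are made up for illustration and are not taken from the codebase:

```elixir
# Hypothetical list of 300 items, more than a 1-byte size field can count.
items = List.duplicate("item", 300)

# A fixed 1-byte size silently keeps only the low 8 bits of the count,
# so the decoded size no longer matches the real length.
<<size_1_byte::8>> = <<length(items)::8>>

# A fixed 8-byte size avoids the overflow, but even a 3-item list
# then pays 8 bytes just for its size.
wasted = byte_size(<<3::64>>)

IO.inspect({size_1_byte, wasted})
```

Here `size_1_byte` ends up as 44 (300 modulo 256) and `wasted` as 8.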

Describe the solution you'd like

To tackle this problem, we can use variable-length integers (varints): an encoding format that compresses integers into fewer bytes than a fixed-width representation needs, which saves bandwidth. The idea is that smaller numbers are more common than larger ones.
So the trade-off is to spend more bits on larger numbers and fewer bits on smaller numbers.
Example: a 64-bit integer that is almost always less than 256 would be wasting the top 56 bits of a fixed width representation.

There are several approaches to this problem.

Protobuf's approach seems interesting, but it is still fairly complex because it relies on bit manipulation.
The benefit of a length-prefixed encoding is that it is simple to understand and fast to decode.
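
For comparison, here is a rough sketch of the Protobuf-style base-128 varint, to show the kind of bit manipulation involved. `Leb128Sketch` is a hypothetical module name used only for this illustration, and it handles non-negative integers only:

```elixir
defmodule Leb128Sketch do
  # Hypothetical module, for illustration only.
  import Bitwise

  # Emit 7 bits per byte, least-significant group first. The top bit of
  # each byte is 1 while more bytes follow and 0 on the last byte.
  def encode(x) when is_integer(x) and x >= 0 and x < 128, do: <<0::1, x::7>>

  def encode(x) when is_integer(x) and x >= 128 do
    <<1::1, (x &&& 0x7F)::7, encode(x >>> 7)::binary>>
  end

  # Decode one integer from the front of a binary, returning {value, rest}.
  def decode(bin), do: decode(bin, 0, 0)

  # Last byte: continuation bit is 0, so merge the group and stop.
  defp decode(<<0::1, group::7, rest::binary>>, acc, shift) do
    {acc ||| (group <<< shift), rest}
  end

  # Continuation bit is 1: merge the group and keep reading.
  defp decode(<<1::1, group::7, rest::binary>>, acc, shift) do
    decode(rest, acc ||| (group <<< shift), shift + 7)
  end
end
```

With this sketch, `Leb128Sketch.encode(300)` gives `<<172, 2>>`. The decoder cannot know the length up front and has to inspect every byte, which is the complexity mentioned above.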

So we can use the length-prefixed technique, simplified a bit: a single prefix byte identifies the size of the integer that follows:

  • 0 <= x < 2^8: <<1::8, x::8>> -> 2 bytes
  • 2^8 <= x < 2^16: <<2::8, x::16>> -> 3 bytes
  • 2^16 <= x < 2^32: <<3::8, x::32>> -> 5 bytes
  • 2^32 <= x < 2^64: <<4::8, x::64>> -> 9 bytes
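
The four ranges above could be sketched roughly as follows. This assumes the bounds are the powers of two 2^8, 2^16, 2^32 and 2^64 and that each range excludes its upper bound. `VarIntSketch` and its function names are hypothetical, not the final module:

```elixir
defmodule VarIntSketch do
  # Hypothetical module: the name and functions are illustrative only.

  # Encode a non-negative integer below 2^64 as a prefix byte (1..4)
  # followed by a big-endian payload of 8, 16, 32 or 64 bits.
  def encode(x) when is_integer(x) and x >= 0 do
    cond do
      x < 256 -> <<1::8, x::8>>
      x < 65_536 -> <<2::8, x::16>>
      x < 4_294_967_296 -> <<3::8, x::32>>
      x < 18_446_744_073_709_551_616 -> <<4::8, x::64>>
      true -> raise ArgumentError, "value does not fit in 64 bits"
    end
  end

  # Decode one integer from the front of a bitstring and return
  # {value, rest}, so the caller can keep parsing what follows.
  def decode(<<1::8, x::8, rest::bitstring>>), do: {x, rest}
  def decode(<<2::8, x::16, rest::bitstring>>), do: {x, rest}
  def decode(<<3::8, x::32, rest::bitstring>>), do: {x, rest}
  def decode(<<4::8, x::64, rest::bitstring>>), do: {x, rest}
end
```

For example, `VarIntSketch.encode(5)` gives `<<1, 5>>` and `VarIntSketch.encode(300)` gives `<<2, 1, 44>>`. Decoding is a single pattern match on the prefix byte, with no loop.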

Additional context

Should be implemented for the unbounded sizes (input lists, etc.)

@ghost ghost added feature New feature request serialization Involve message serialization labels Jun 27, 2022
@ghost ghost assigned Neylix Jun 28, 2022
@prix-uniris
Contributor

Protobuf has a clear advantage in that it supports integers of greater length.
With the Bitcoin encoding, on the other hand, we have to predefine the maximum length or range of our integers. If we wanted to support, say, u128, we would have to account for it: 0xfc would become the boundary of the one-byte range, and the following markers would simply continue as per their scheme.
The encoding proposed above is also a good option: it prepends 1 byte holding the integer length to the integer itself, so it can work fine as well.
It would be a good serialization technique and is simpler, since our stack already uses bitstrings. Protobuf's encoding is also a good alternative, but it is a bit more complex than the simple encoding described above.
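
For reference, a sketch of the Bitcoin encoding (CompactSize) as it exists today, with its fixed marker bytes. `CompactSizeSketch` is a hypothetical name used only for this illustration:

```elixir
defmodule CompactSizeSketch do
  # Hypothetical module, for illustration only.

  # Values below 0xFD fit in a single byte. Larger values use a marker
  # byte (0xFD, 0xFE or 0xFF) followed by a little-endian integer of
  # 2, 4 or 8 bytes.
  def encode(x) when is_integer(x) and x >= 0 do
    cond do
      x < 0xFD -> <<x::8>>
      x <= 0xFFFF -> <<0xFD, x::little-16>>
      x <= 0xFFFFFFFF -> <<0xFE, x::little-32>>
      x <= 0xFFFFFFFFFFFFFFFF -> <<0xFF, x::little-64>>
      true -> raise ArgumentError, "value does not fit in 64 bits"
    end
  end

  # Marker clauses come first; any other first byte is the value itself.
  def decode(<<0xFD, x::little-16, rest::binary>>), do: {x, rest}
  def decode(<<0xFE, x::little-32, rest::binary>>), do: {x, rest}
  def decode(<<0xFF, x::little-64, rest::binary>>), do: {x, rest}
  def decode(<<x::8, rest::binary>>), do: {x, rest}
end
```

All three markers are already taken, so supporting a wider integer means reserving another byte value as a marker and shrinking the one-byte range, which is the adjustment described above.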

@ghost
Author

ghost commented Jul 1, 2022

Hey team! Please add your planning poker estimate with ZenHub @apoorv-2204 @imnik11 @Neylix @prix-uniris

@ghost ghost assigned prix-uniris and unassigned Neylix Jul 1, 2022
@prix-uniris
Contributor

prix-uniris commented Jul 4, 2022

Tasks/Actions

  • Create a utility module to encode and decode variable-length numbers
  • Update the codebase wherever lists are serialized
    • Transaction Data Fields
    • Transaction
    • Encoder (DB)
    • Validation Timestamps
  • Make the matching changes in decoding
  • Update the doctests accordingly
  • Update the tests for the new encoding

@prix-uniris prix-uniris mentioned this issue Jul 7, 2022
Neylix pushed a commit that referenced this issue Jul 21, 2022
* Added VarInt Utility Module

* Added new VarInt Scheme for all lists inside transaction chain folder

* VarInt Implemented for transaction_chain, tests passing for transaction_chain

* Fixed with new VarInt in BeaconChain Lists and DocTests

* Fixed Test in Election and Mining Fees and Transaction Controller since changes in Serialization.

* Chore: Credo Fix

* Review Modification and Addition of VarInt in message.ex

* Added VarInt in db/encoding.ex for storage

* Review Changes and Modifications

* Review Changes: Remove VarInt from nb_validations and nb_cross_validations
@Neylix Neylix closed this as completed Jul 21, 2022
2 participants