Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a standard data encoding #242

Closed
erikgrinaker opened this issue May 8, 2020 · 5 comments · Fixed by #287
Closed

Use a standard data encoding #242

erikgrinaker opened this issue May 8, 2020 · 5 comments · Fixed by #287
Assignees
Labels
T:breaking Type: Breaking Change
Milestone

Comments

@erikgrinaker
Copy link
Contributor

erikgrinaker commented May 8, 2020

We currently use Amino, we should use a standard serialization format instead, e.g. Protobuf. However, Protobuf does not support random access and always decodes e.g. whole nodes, which may be too expensive when we're simply traversing the tree. Other encodings may be more appropriate.

@erikgrinaker erikgrinaker added the T:breaking Type: Breaking Change label May 8, 2020
@erikgrinaker erikgrinaker self-assigned this May 8, 2020
@alexanderbez
Copy link
Contributor

@erikgrinaker can you elaborate on random access? What are some alternatives? What is the relationship between encoding and the merkle root? If there is none, we can consider gob with streaming APIs.

@erikgrinaker
Copy link
Contributor Author

There will be an ADR on this.

can you elaborate on random access?

We may not always want to load the entire node, but only a field or two. For example, when we're checking if a given key exists it's not necessary to also load the value, which can be arbitrarily large. Protobuf does not support reading a single field, it always reads the entire message (or node, in this case).

Of course, the OS may use read-ahead anyway, so it may not matter - we'll have to look into it.

What are some alternatives?

As far as I know, IAVL currently uses Amino only for encoding of scalar values, which are laid out linearly. We may want to use a similar approach instead of a message-oriented one. In principle, any length-prefixed scalar encoding would do.

What is the relationship between encoding and the merkle root

I don't believe it matters, except for possibly the encoding of the integer node version. We'll have to look into it.

we can consider gob with streaming APIs

I feel like Gob is too tied to Go, I'd prefer something broader so that it's at least possible to access the data in other languages.

@alexanderbez
Copy link
Contributor

I see. Well, if the encoding isn't tied to the merkle-ization of the tree, then being language specific doesn't really matter (WRT to gob). Looking forward to reading the ADR 👍

@erikgrinaker
Copy link
Contributor Author

Well, this turned out to be a bit of a red herring, since we obviously have to load the entire serialized value from the KV store - Protobuf will do just fine.

Sorry about the noise, I'd been thinking about on-disk B-tree layouts all week.

@erikgrinaker
Copy link
Contributor Author

So the Amino-encoding in IAVL really just wraps binary.PutVarint and related varint methods in the Go standard library, and uses varint-prefixed byte slices. It would be trivial to remove Amino (the library) and retain the same encoding using the Go standard library.

I'm not convinced that moving to Protobuf (which would be mostly the same, varints and varint-prefixed byte slices) is worth the breakage, at least not unless we're making other changes at the same time.

mergify bot pushed a commit that referenced this issue Jun 30, 2020
Removes Amino and adds varint utility-functions instead, keeping the same serialized format. Partial fix for #242.

The last vestige of Amino is in the proof handling, namely `AbsenceOp` and `ValueOp`, which are structs encoded/decoded with Amino.
@mergify mergify bot closed this as completed in #287 Jul 2, 2020
mergify bot pushed a commit that referenced this issue Jul 2, 2020
Fixes #242. Switches ProofOp encoding from Amino to Protobuf, but retains the same byte layout for backwards compatibility. This is not strictly Protobuf since the proofs are length-prefixed, but that's what Amino did so that's what we'll keep doing.

The Protobuf types need to be moved to a separate package and cleaned up: they conflict with the native types, and we're making the same mistake as before by leaking Protobuf types through public APIs. I'll submit a separate PR for this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T:breaking Type: Breaking Change
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants