-
Notifications
You must be signed in to change notification settings - Fork 696
Non-canonical LEB128 #892
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Non-canonical LEB128 #892
Conversation
We've discussed this before, but the spec doesn't mention it.
BinaryEncoding.md
Outdated
| * non-significant zero `0x80` bytes are present in an over-large encoding; and / or | ||
| * non-significant LEB128 bits are ignored. | ||
|
|
||
| In both cases, the _N_ bit limitation applies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An additional constraint (AIUI) is that the maximum length of a valid non-canonical LEB128 is equal to the maximum length of the canonical LEB128 taking into account all possible values. So, for example, varint32 has a max length of 5 bytes so a non-canonical varint32 cannot be 6 bytes long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't that what this line says? Maybe I misunderstand what you're adding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I interpret this as saying the maximum value must be a 32-bit value (say). So a 6-byte LEB128 that decodes to a value < 2**32 would be valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah you're totally right! Updating...
|
Oh, I just noticed this actually is mentioned, including the maximum length:
|
|
Yes, as @binji points out, this is already stated. No need to add anything. |
|
That's half of the non-canonical stuff. It's missing the other half. And the maximum length. I can put everything in one section if that's clearer. |
|
What other half? And it has the maximum length part too ( |
|
"non-relevant LEB128 bits (bits past the size) are ignored" |
|
Ah, but I think that's incorrect. If we want to extend a varint (from varint32 to a varint64, say) in the future, then it's important that the bits past the size are a zero-extension (for LEB) or sign-extension (for SLEB) of the most significant bit. |
|
Is it incorrect? That's the main reason I opened this :) |
|
Hm, I don't read it that way:
Since it's extended up to a multiple of 7 bits, it should only have zeroes or ones past the size. Anyway, I think it's pretty important that the bits are not ignored. For example, in #895 you changed the resizable_limits flags from a varuint32 to a varuint1. If we ignore the bits past the size, then we can't safely extend the value. |
|
@jfbastien If I'm reading the |
|
@binji @lukewagner are you saying that non-canonical LEB128 can only be extra long (either zero or sign extended, up to max size), and that insignificant bits are and should be disallowed? |
|
@jfbastien, that was the intention of the text, and it still reads that way to me. |
|
Agreed, assuming "insignificant" means "bits that otherwise wouldn't fit in the target |
|
@jfbastien So I think we can close this out now? Or perhaps you'd like to create PR that clarifies the wording? |
|
@lukewagner yeah I'll update the PR to clarify wording. Soon :) |
|
As discussed above, this is already documented, so there's no need to add anything. |
We've discussed this before, but the spec doesn't mention it.