Non-canonical LEB128 #892

jfbastien · 2016-12-08T22:47:47Z

We've discussed this before, but the spec doesn't mention it.

binji · 2016-12-08T22:57:15Z

BinaryEncoding.md

+* non-significant zero `0x80` bytes are present in an over-large encoding; and / or
+* non-significant LEB128 bits are ignored.
+
+In both cases, the _N_ bit limitation applies.


An additional constraint (AIUI) is that the maximum length of a valid non-canonical LEB128 is equal to the maximum length of the canonical LEB128 taking into account all possible values. So, for example, varint32 has a max length of 5 bytes so a non-canonical varint32 cannot be 6 bytes long.

Isn't that what this line says? Maybe I misunderstand what you're adding.

Well, I interpret this as saying the maximum value must be a 32-bit value (say). So a 6-byte LEB128 that decodes to a value < 2**32 would be valid.

Ah yeah you're totally right! Updating...

binji · 2016-12-09T08:19:38Z

Oh, I just noticed this actually is mentioned, including the maximum length:

A LEB128 variable-length integer, limited to N bits (i.e., the values [0, 2^N-1]), represented by at most ceil(N/7) bytes that may contain padding 0x80 bytes.

A Signed LEB128 variable-length integer, limited to N bits (i.e., the values [-2^(N-1), +2^(N-1)-1]), represented by at most ceil(N/7) bytes that may contain padding 0x80 or 0xFF bytes.
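For illustration, the unsigned definition quoted above can be sketched as a decoder that enforces both limits: at most ceil(N/7) bytes, and a decoded value that fits in N bits. The name `decode_varuint` and its error messages are hypothetical, not taken from the spec or any implementation.

```python
import math


def decode_varuint(data: bytes, n_bits: int, offset: int = 0) -> tuple[int, int]:
    """Decode an unsigned LEB128 limited to n_bits.

    Returns (value, bytes_consumed). Hypothetical helper for illustration.
    """
    max_len = math.ceil(n_bits / 7)  # e.g. 5 bytes for varuint32
    result = 0
    for i in range(max_len):
        if offset + i >= len(data):
            raise ValueError("truncated LEB128")
        byte = data[offset + i]
        result |= (byte & 0x7F) << (7 * i)
        if (byte & 0x80) == 0:
            # Last byte: any payload bit at position >= n_bits must be zero.
            if result >> n_bits:
                raise ValueError("value does not fit in %d bits" % n_bits)
            return result, i + 1
    # Every one of the max_len bytes had its continuation bit set.
    raise ValueError("LEB128 longer than ceil(N/7) bytes")
```

Under this sketch, `81 80 80 80 00` is accepted as a padded varuint32 encoding of 1, while a 6-byte encoding of the same value is rejected even though the value itself fits in 32 bits.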

rossberg · 2016-12-09T08:34:56Z

Yes, as @binji points out, this is already stated. No need to add anything.

jfbastien · 2016-12-09T17:25:43Z

That's half of the non-canonical stuff. It's missing the other half. And the maximum length. I can put everything in one section if that's clearer.

binji · 2016-12-09T18:11:31Z

What other half? And it has the maximum length part too (ceil(N/7) bytes). The only thing I see that's missing is that the padding will actually be 0 or 0x7f for the last byte.
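A rough sketch of the last point, assuming a signed encoder with an optional padding length (`encode_varint` and `pad_to` are made-up names): when an encoding is padded, the intermediate bytes are `0x80` or `0xFF`, and the final byte is `0x00` or `0x7F`, because the final byte has no continuation bit.

```python
def encode_varint(value: int, pad_to: int = 0) -> bytes:
    """Encode a signed LEB128, optionally padded to pad_to bytes.

    Hypothetical helper for illustration; it does not check an N-bit range.
    """
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7  # Python's >> on ints is an arithmetic (sign-keeping) shift
        sign_bit = bool(group & 0x40)
        # Finished once everything left is pure sign extension of this group.
        done = (value == 0 and not sign_bit) or (value == -1 and sign_bit)
        more = (not done) or (len(out) + 1 < pad_to)
        out.append(group | (0x80 if more else 0x00))
        if not more:
            return bytes(out)
```

With this sketch, `encode_varint(1, pad_to=5)` ends in `0x00` and `encode_varint(-1, pad_to=5)` ends in `0x7F`.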

jfbastien · 2016-12-09T18:21:51Z

"non-relevant LEB128 bits (bits past the size) are ignored"

binji · 2016-12-09T18:25:32Z

Ah, but I think that's incorrect. If we want to extend a varint (from varint32 to a varint64, say) in the future, then it's important that the bits past the size are a zero-extension (for LEB) or sign-extension (for SLEB) of the most significant bit.
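One way to picture this rule for the signed case is a decoder that sign-extends from the full 7-bit payload of the last byte and then range-checks against N bits. That check fails exactly when the bits past the size are not a sign extension. This is only a sketch; `decode_varint` is a hypothetical name.

```python
import math


def decode_varint(data: bytes, n_bits: int) -> int:
    """Decode a signed LEB128 limited to n_bits, rejecting stray high bits.

    Hypothetical helper for illustration.
    """
    max_len = math.ceil(n_bits / 7)
    result = 0
    shift = 0
    for byte in data[:max_len]:
        result |= (byte & 0x7F) << shift
        shift += 7
        if (byte & 0x80) == 0:
            # Treat the accumulated `shift` bits as a two's complement number.
            if byte & 0x40:
                result -= 1 << shift
            lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
            if not lo <= result <= hi:
                # Bits past n_bits were not a sign extension of bit n_bits-1.
                raise ValueError("value does not fit in %d bits" % n_bits)
            return result
    raise ValueError("LEB128 not terminated within ceil(N/7) bytes")
```

For varint32, `FF FF FF FF 7F` decodes to -1, whereas `FF FF FF FF 0F` is rejected: its extra bits are zero while bit 31 is set, so they are not a sign extension.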

jfbastien · 2016-12-09T18:30:29Z

Is it incorrect? That's the main reason I opened this :)
The Wikipedia entry we refer to so authoritatively leads me to believe past-the-end bits can be anything!

binji · 2016-12-09T18:45:18Z

Hm, I don't read it that way:

To encode an unsigned number using unsigned LEB128 first represent the number in binary. Then zero extend the number up to a multiple of 7 bits...

A signed number is represented similarly, except that the two's complement number is sign extended up to a multiple of 7 bits...

Since it's extended up to a multiple of 7 bits, it should only have zeroes or ones past the size.
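The unsigned procedure quoted above could be sketched roughly as follows. `encode_varuint` and its `pad_to` parameter are hypothetical names; `pad_to` only exists to show how an over-long (non-canonical) encoding arises from further zero extension.

```python
def encode_varuint(value: int, pad_to: int = 0) -> bytes:
    """Encode an unsigned LEB128, optionally padded to pad_to bytes.

    Hypothetical helper for illustration; it does not check an N-bit range.
    """
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode a negative value")
    out = bytearray()
    while True:
        group = value & 0x7F  # low 7 bits, least significant group first
        value >>= 7
        # Keep going while bits remain, or while padding is still requested.
        more = value != 0 or len(out) + 1 < pad_to
        out.append(group | (0x80 if more else 0x00))
        if not more:
            return bytes(out)
```

Every group beyond the value's own bits is all zeroes here, which matches the reading that only zero extension (or sign extension, in the signed case) can appear past the size.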

Anyway, I think it's pretty important that the bits are not ignored. For example, in #895 you changed the resizable_limits flags from a varuint32 to a varuint1. If we ignore the bits past the size, then we can't safely extend the value.
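To make the extension concern concrete, here is a small sketch comparing two hypothetical decoders: one that ignores bits past the size and one that rejects them. All names are invented for illustration and do not describe any real implementation.

```python
def _raw_leb128(data: bytes, n_bits: int) -> int:
    """Accumulate payload bits from at most ceil(n_bits/7) bytes."""
    max_len = -(-n_bits // 7)  # integer ceil(n_bits / 7)
    value = 0
    for i, byte in enumerate(data[:max_len]):
        value |= (byte & 0x7F) << (7 * i)
        if (byte & 0x80) == 0:
            return value
    raise ValueError("LEB128 not terminated within ceil(N/7) bytes")


def decode_ignoring(data: bytes, n_bits: int) -> int:
    """Variant that silently drops bits past n_bits."""
    return _raw_leb128(data, n_bits) & ((1 << n_bits) - 1)


def decode_strict(data: bytes, n_bits: int) -> int:
    """Variant that rejects any set bit past n_bits."""
    value = _raw_leb128(data, n_bits)
    if value >> n_bits:
        raise ValueError("value does not fit in %d bits" % n_bits)
    return value
```

With the ignoring variant, the byte `0x03` reads as 1 when the field is a varuint1 but as 3 once the field is widened to a varuint32, so an existing binary would change meaning. With the strict variant, `0x03` is never accepted as a varuint1, so every previously valid encoding keeps its value after widening.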

lukewagner · 2016-12-09T18:46:04Z

@jfbastien If I'm reading the varuintN and varintN sections correctly, they are already defining their respective types as a restriction of LEB128 and no bytes are ignored.

jfbastien · 2016-12-15T23:53:59Z

@binji @lukewagner are you saying that non-canonical LEB128 can only be extra long (either zero or sign extended, up to max size), and that insignificant bits are and should be disallowed?

rossberg · 2016-12-16T07:23:03Z

@jfbastien, that was the intention of the text, and it still reads that way to me.

lukewagner · 2016-12-16T22:15:15Z

Agreed, assuming "insignificant" means "bits that otherwise wouldn't fit in the target uintN".

lukewagner · 2016-12-20T23:45:34Z

@jfbastien So I think we can close this out now? Or perhaps you'd like to create a PR that clarifies the wording?

jfbastien · 2016-12-20T23:59:16Z

@lukewagner yeah I'll update the PR to clarify wording. Soon :)

sunfishcode · 2024-02-22T21:23:01Z

As discussed above, this is already documented, so there's no need to add anything.

Non-canonical LEB128

618fa45

binji reviewed Dec 8, 2016

jfbastien added 2 commits December 8, 2016 14:57

s/significant/relevant/

8c0b88f

Rephrase the size limit

761eeb3

sunfishcode added this to the MVP milestone Jan 31, 2017

sunfishcode assigned jfbastien Jan 31, 2017

sunfishcode closed this Feb 22, 2024

sunfishcode deleted the jfbastien-patch-2 branch February 22, 2024 21:23
