@jfbastien
Member

We've discussed this before, but the spec doesn't mention it.

* non-significant zero `0x80` bytes are present in an over-large encoding; and / or
* non-significant LEB128 bits are ignored.

In both cases, the _N_ bit limitation applies.
Member

@binji binji Dec 8, 2016

An additional constraint (AIUI) is that the maximum length of a valid non-canonical LEB128 is equal to the maximum length of the canonical LEB128 taking into account all possible values. So, for example, varint32 has a max length of 5 bytes so a non-canonical varint32 cannot be 6 bytes long.

Member Author

Isn't that what this line says? Maybe I misunderstand what you're adding.

Member

Well, I interpret this as saying the maximum value must be a 32-bit value (say). So a 6-byte LEB128 that decodes to a value < 2**32 would be valid.

Member Author

Ah yeah you're totally right! Updating...

@binji
Member

binji commented Dec 9, 2016

Oh, I just noticed this actually is mentioned, including the maximum length:

> A LEB128 variable-length integer, limited to N bits (i.e., the values [0, 2^N-1]), represented by at most ceil(N/7) bytes that may contain padding 0x80 bytes.

> A Signed LEB128 variable-length integer, limited to N bits (i.e., the values [-2^(N-1), +2^(N-1)-1]), represented by at most ceil(N/7) bytes that may contain padding 0x80 or 0xFF bytes.
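
As a rough illustration of the quoted varuintN wording, here is a minimal Python sketch of a decoder that enforces both the ceil(N/7) byte limit and the N-bit value limit. The name `decode_varuint` and its error messages are assumptions made for this example; they are not taken from the spec or from any engine.

```python
import math


def decode_varuint(data, n_bits):
    """Decode an unsigned LEB128 value limited to n_bits.

    Returns (value, bytes_read). Hypothetical helper for illustration only.
    """
    max_bytes = math.ceil(n_bits / 7)
    result = 0
    for i in range(max_bytes):
        if i >= len(data):
            raise ValueError("truncated input")
        byte = data[i]
        # The low 7 bits carry payload, least-significant group first.
        result |= (byte & 0x7F) << (7 * i)
        if (byte & 0x80) == 0:
            # Last byte: the decoded value must still fit in n_bits.
            if result >> n_bits:
                raise ValueError("value does not fit in n_bits")
            return result, i + 1
    # Every allowed byte had its continuation bit set.
    raise ValueError("encoding longer than ceil(N/7) bytes")
```

Under this sketch, `b"\x81\x80\x80\x80\x00"` is accepted as a padded 5-byte varuint32 with value 1, while the 6-byte `b"\x81\x80\x80\x80\x80\x00"` is rejected even though it would also decode to 1.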

@rossberg
Member

rossberg commented Dec 9, 2016

Yes, as @binji points out, this is already stated. No need to add anything.

@jfbastien
Member Author

That's half of the non-canonical stuff. It's missing the other half. And the maximum length. I can put everything in one section if that's clearer.

@binji
Member

binji commented Dec 9, 2016

What other half? And it has the maximum length part too (ceil(N/7) bytes). The only thing I see that's missing is that the padding will actually be 0 or 0x7f for the last byte.

@jfbastien
Member Author

"non-relevant LEB128 bits (bits past the size) are ignored"

@binji
Member

binji commented Dec 9, 2016

Ah, but I think that's incorrect. If we want to extend a varint (from a varint32 to a varint64, say) in the future, then it's important that the bits past the size are a zero-extension (for LEB) or sign-extension (for SLEB) of the most significant bit.
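
To make the sign-extension requirement concrete, here is a minimal Python sketch of a signed decoder that rejects encodings whose bits past the size are not a sign-extension. The name `decode_varint` is an assumption for this example, and the range check is one possible way to express the rule, not the spec's wording.

```python
def decode_varint(data, n_bits):
    """Decode a signed LEB128 value limited to n_bits (illustrative sketch)."""
    max_bytes = -(-n_bits // 7)  # ceil(n_bits / 7)
    result = 0
    shift = 0
    for byte in data[:max_bytes]:
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            # Bit 6 of the last byte is the sign bit of the payload.
            if byte & 0x40:
                result -= 1 << shift
            # The value fits in n_bits exactly when every payload bit
            # past the size is a copy of the n_bits sign bit.
            limit = 1 << (n_bits - 1)
            if not -limit <= result < limit:
                raise ValueError("bits past the size are not a sign-extension")
            return result
    raise ValueError("truncated or over-long encoding")
```

With this check, `b"\xff\xff\xff\xff\x7f"` decodes to -1 as a varint32, whereas `b"\xff\xff\xff\xff\x0f"` is rejected because its final byte does not sign-extend bit 31.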

@jfbastien
Member Author

Is it incorrect? That's the main reason I opened this :)
The Wikipedia entry we refer to so authoritatively leads me to believe past-the-end bits can be anything!

@binji
Member

binji commented Dec 9, 2016

Hm, I don't read it that way:

> To encode an unsigned number using unsigned LEB128 first represent the number in binary. Then zero extend the number up to a multiple of 7 bits...

> A signed number is represented similarly, except that the two's complement number is sign extended up to a multiple of 7 bits...

Since it's extended up to a multiple of 7 bits, it should only have zeroes or ones past the size.
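
The encoding procedure quoted above can be sketched in Python as follows. The function names `encode_uleb128` and `encode_sleb128` are assumptions for this example; the sketch produces the shortest (canonical) encoding and relies on Python's arithmetic right shift for negative integers.

```python
def encode_uleb128(value):
    """Encode a non-negative integer as canonical unsigned LEB128."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more groups follow
        else:
            out.append(byte)
            return bytes(out)


def encode_sleb128(value):
    """Encode an integer as canonical signed LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7  # arithmetic shift keeps the sign
        sign_bit = bool(byte & 0x40)
        # Stop once the remaining bits are pure sign-extension.
        if (value == 0 and not sign_bit) or (value == -1 and sign_bit):
            out.append(byte)
            return bytes(out)
        out.append(byte | 0x80)
```

Because each 7-bit group is cut from a zero-extended or sign-extended value, any payload bits beyond the number's own width are all zeroes or all ones.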

Anyway, I think it's pretty important that the bits are not ignored. For example, in #895 you changed the resizable_limits flags from a varuint32 to a varuint1. If we ignore the bits past the size, then we can't safely extend the value.
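
A small Python sketch of the extension concern, contrasting a strict reader with a hypothetical lenient one that drops bits past the size. Both function names are assumptions for this example and do not describe any existing implementation.

```python
def _read(data, n_bits):
    """Shared loop: return the raw decoded payload (illustrative helper)."""
    max_bytes = -(-n_bits // 7)  # ceil(n_bits / 7)
    value = 0
    for i, byte in enumerate(data[:max_bytes]):
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value
    raise ValueError("truncated or over-long encoding")


def decode_strict(data, n_bits):
    """Reject any set bit that does not fit in n_bits."""
    value = _read(data, n_bits)
    if value >> n_bits:
        raise ValueError("bits set past the N-bit limit")
    return value


def decode_lenient(data, n_bits):
    """Hypothetical reading where bits past n_bits are silently ignored."""
    return _read(data, n_bits) & ((1 << n_bits) - 1)
```

Consider a field encoded as the single byte `0x03`. Read leniently as a varuint1 it yields 1, but if the field is later widened to a varuint32 the same byte yields 3, so the meaning of an existing binary changes. The strict reader rejects `0x03` as a varuint1 up front, which is what keeps widening the field safe.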

@lukewagner
Member

@jfbastien If I'm reading the varuintN and varintN sections correctly, they are already defining their respective types as a restriction of LEB128 and no bytes are ignored.

@jfbastien
Member Author

@binji @lukewagner are you saying that non-canonical LEB128 can only be extra long (either zero or sign extended, up to max size), and that insignificant bits are and should be disallowed?

@rossberg
Member

@jfbastien, that was the intention of the text, and it still reads that way to me.

@lukewagner
Member

Agreed, assuming "insignificant" means "bits that otherwise wouldn't fit in the target uintN".

@lukewagner
Member

@jfbastien So I think we can close this out now? Or perhaps you'd like to create a PR that clarifies the wording?

@jfbastien
Member Author

@lukewagner yeah I'll update the PR to clarify wording. Soon :)

@sunfishcode sunfishcode added this to the MVP milestone Jan 31, 2017
@sunfishcode
Member

As discussed above, this is already documented, so there's no need to add anything.

@sunfishcode sunfishcode deleted the jfbastien-patch-2 branch February 22, 2024 21:23