Closed
Author
|
I also have another question: why bother using a varint in the select lane frame if the maximum acceptable lane number is 255? |
Contributor
|
Hey, this looks like really great stuff! I'm swamped right now, but thanks so much for digging into the wire encoding. It looks like you've found a good bug, and are asking good questions. I'll try to get back to you soon. |
Contributor
|
@zpostfacto gentle ping on this :) |
zpostfacto added a commit that referenced this pull request on Apr 15, 2026
This addresses issue #352. The problem was more subtle than the original bug report. The decoder indeed had a bug, but everything worked because the encoder *also* had a bug!

The encoder always sends a lead byte of 0x98 (10011000), which incorrectly sets the w bit to 1; according to the (previous) spec, the packet number should then have been encoded using 32 bits. The decoder had a bug of its own: it checked the wrong bit, 0x40, when w is actually in bit 0x08. But this was dead code, because we only get into that branch `if ( ( nFrameType & 0xf0 ) == 0x90 )`, and in that case bit 0x40 cannot be set.

The change is subtle because we cannot just fix encoders/decoders to match the spec without risking breaking interop with existing code deployments. My solution:

- Change the spec, inverting the meaning of the w flag, to match the current encoding behaviour.
- Fix decoders to be compliant with this new spec.
- Bump the protocol version so that, in the future, encoders could choose to use 32-bit packet numbers safely, knowing whether or not the peer is capable of decoding them properly.

I did not change the encoder logic at this time. The need to encode 32 bits of the packet number is a theoretical possibility that might become necessary if the bandwidth x delay product gets very high. (When the number of packets in flight is high, the bottom 16 bits risk becoming ambiguous for identifying a packet.)
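The closing remark about the bottom 16 bits becoming ambiguous can be illustrated with a small sketch. It assumes the receiver expands a 16-bit wire value by picking the full packet number closest to the latest one it has seen; that strategy, the helper name `ExpandPktNum16`, and the sample numbers are assumptions made for illustration, not the library's actual decoder.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the library's decoder): expand the low 16 bits of a
// packet number, using the latest known full packet number as the reference.
constexpr std::int64_t ExpandPktNum16(std::int64_t nLatestPktNum, std::uint16_t nWireLow16) {
    // Distance from the reference to the wire value, modulo 2^16, in [0, 65535].
    const std::uint64_t nDiff =
        (static_cast<std::uint64_t>(nWireLow16) - static_cast<std::uint64_t>(nLatestPktNum)) & 0xffffu;
    // Interpret the upper half of that range as "behind the reference".
    const std::int64_t nDelta = nDiff >= 0x8000u
        ? static_cast<std::int64_t>(nDiff) - 0x10000
        : static_cast<std::int64_t>(nDiff);
    return nLatestPktNum + nDelta;
}

// Runs three illustrative cases; aborts via assert if any of them fails.
inline void DemoPktNumAmbiguity() {
    const std::int64_t nLatest = 100000;
    // Slightly ahead of, or slightly behind, the reference: recovered exactly.
    assert(ExpandPktNum16(nLatest, static_cast<std::uint16_t>(100010)) == 100010);
    assert(ExpandPktNum16(nLatest, static_cast<std::uint16_t>(99990)) == 99990);
    // 40000 packets ahead (more than 2^15): the expansion lands 65536 too low.
    assert(ExpandPktNum16(nLatest, static_cast<std::uint16_t>(140000)) == 140000 - 65536);
}
```

Under these assumptions, the expansion is only reliable while the true packet number stays within 2^15 of the reference, which is why a very large number of packets in flight could make the 32-bit encoding necessary.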
Contributor
|
Thanks for finding this bug. I was not able to take your PR exactly as is, because the fix ended up being more subtle in order to maintain interop with broken peers that had the bug. Sorry for the extreme delay in responding. Thanks again! |
zpostfacto added a commit that referenced this pull request on Apr 16, 2026
Same commit message as the Apr 15, 2026 commit above (cherry picked from commit d707afb)
There was a typo in the check for the size of `latest_received_pkt_num`. As the packet mask is `0b10010000`, `nFrameType & 0x40` will always be zero. I changed the mask to `0x10` according to the format documentation.
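The claim that `nFrameType & 0x40` is always zero inside the branch can be checked exhaustively over all 256 lead bytes. The following is a minimal sketch; the helper names `InBranch0x90`, `CountInBranchWithBit`, and `CheckBit0x40IsDead` are hypothetical, and only the branch condition and the bit values come from the discussion above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the branch condition quoted in the thread.
constexpr bool InBranch0x90(std::uint8_t nFrameType) {
    return (nFrameType & 0xf0) == 0x90;
}

// Counts lead bytes that enter the branch and also have the given bit set.
constexpr int CountInBranchWithBit(std::uint8_t nBit) {
    int nCount = 0;
    for (int b = 0; b <= 0xff; ++b) {
        const auto nFrameType = static_cast<std::uint8_t>(b);
        if (InBranch0x90(nFrameType) && (nFrameType & nBit) != 0) {
            ++nCount;
        }
    }
    return nCount;
}

// Runs the checks; aborts via assert if any of them fails.
inline void CheckBit0x40IsDead() {
    // No byte in 0x90..0x9f has bit 0x40 set, so a test on it can never fire.
    assert(CountInBranchWithBit(0x40) == 0);
    // Bit 0x08 does vary inside the branch: it is set for 0x98..0x9f.
    assert(CountInBranchWithBit(0x08) == 8);
}
```

Calling `CheckBit0x40IsDead()` from a test or from `main` exercises both assertions. The second one shows that bit 0x08, where the commit message places the w flag, is a bit that can actually distinguish lead bytes within this branch.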