Skip to content

varint decoding alignment for overlong encodings#2045

Closed
jmestwa-coder wants to merge 1 commit into
abseil:masterfrom
jmestwa-coder:varint-decoder-overlong-alignment
Closed

varint decoding alignment for overlong encodings#2045
jmestwa-coder wants to merge 1 commit into
abseil:masterfrom
jmestwa-coder:varint-decoder-overlong-alignment

Conversation

@jmestwa-coder
Copy link
Copy Markdown

Summary

Fix varint decoding where overlong encodings (>10 bytes) leave extra continuation bytes unconsumed, causing cursor misalignment.

Problem

When a varint exceeds the maximum size, the decoder stops early without consuming remaining bytes.
This misaligns the cursor and corrupts parsing of subsequent fields.

Fix

Consume remaining continuation bytes for overlong varints to keep the decoding cursor aligned.

Impact

  • Corrects parsing of fields following malformed varints
  • No change for valid inputs

copybara-service Bot pushed a commit that referenced this pull request May 5, 2026
for the benefit people using of AI to scan the code

This will hopefully prevent pull-requests like #2045 that see a local
problem with DecodeVarint() without understanding the larger picture.

See #2045

PiperOrigin-RevId: 910971345
Change-Id: I76e1524e4577799ae391716928f10b7fff82e112
@derekmauro
Copy link
Copy Markdown
Member

I'm 99% certain this pull request was generated by an AI. That is fine; a pull request should be judged on its merits, not its author.

The reason I suspect an AI is that it identified a local issue in a varint decoder, a niche topic that most human contributors would not even be familiar with. However, by failing to see the context, specifically that encoding and decoding occur in a closed loop, the AI missed that all inputs are trusted. Since we assume valid varints, we don't need a robust protocol buffer encoder/decoder implementation. Our simple version is sufficient, and anything more adds unnecessary complexity.

Even if the inputs were untrusted, this would represent a massive security vulnerability that this PR fails to address properly. The discrepancy between the sophisticated identification of the problem and the poor quality of the fix is a hallmark of AI. A human developer would likely have reported the issue rather than submitting a shallow fix.

A proper solution to untrusted inputs would require a fail-fast mechanism to signal errors to the caller. This would necessitate a significant refactoring of the DecodeVarint() function signature, which currently cannot signal failure. I have yet to see an AI initiate that kind of structural change. Instead, this PR applies an arbitrary rule for cursor movement upon encountering invalid data, which does nothing to mitigate the security and data corruption problem.

97aea89 adds a comment to this file to provide context for both human and AI contributors, hopefully preventing similar misunderstandings in the future.

@derekmauro derekmauro closed this May 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants