varint decoding alignment for overlong encodings#2045
Conversation
|
I'm 99% certain this pull request was generated by an AI. That is fine; a pull request should be judged on its merits, not its author. The reason I suspect an AI is that it identified a local issue in a varint decoder, a niche topic that most human contributors would not even be familiar with. However, by failing to see the context, specifically that encoding and decoding occur in a closed loop, the AI missed that all inputs are trusted. Since we assume valid varints, we don't need a robust protocol buffer encoder/decoder implementation. Our simple version is sufficient, and anything more adds unnecessary complexity. Even if the inputs were untrusted, this would represent a massive security vulnerability that this PR fails to address properly. The discrepancy between the sophisticated identification of the problem and the poor quality of the fix is a hallmark of AI. A human developer would likely have reported the issue rather than submitting a shallow fix. A proper solution to untrusted inputs would require a fail-fast mechanism to signal errors to the caller. This would necessitate a significant refactoring of the DecodeVarint() function signature, which currently cannot signal failure. I have yet to see an AI initiate that kind of structural change. Instead, this PR applies an arbitrary rule for cursor movement upon encountering invalid data, which does nothing to mitigate the security and data corruption problem. 97aea89 adds a comment to this file to provide context for both human and AI contributors, hopefully preventing similar misunderstandings in the future. |
Summary
Fix varint decoding where overlong encodings (>10 bytes) leave extra continuation bytes unconsumed, causing cursor misalignment.
Problem
When a varint exceeds the maximum size, the decoder stops early without consuming remaining bytes.
This misaligns the cursor and corrupts parsing of subsequent fields.
Fix
Consume remaining continuation bytes for overlong varints to keep the decoding cursor aligned.
Impact