Add byte-count limit to TCompactProtocol varint reader #3410
mhlakhani left a comment
nit: instead of 10 everywhere, should we use a named constant or a comment in the code? As it stands, this is kind of a magic number and hard to reason about.
f230b90 to 77b24c9
77b24c9 to ade316a
ade316a to 2a59422
Should be covered now.
Issue 1: Ruby C extension missing. `lib/rb/ext/compact_protocol.c:read_varint64` had an unbounded `while (true)` loop, and `read_varint32` just delegated to it. Both are now fixed with bounded `for` loops using

Issue 2: 5-byte limit for 32-bit not enforced. The description claimed ceil(32/7) = 5, but neither Ruby path enforced it. Fixed in both the C extension and pure Ruby: `read_varint32` now has its own 5-byte loop instead of delegating.

What changed:
Client: py,go,php,java,netstd,delphi,haxe,lua,swift,rb,dart,erl,rs

C++ and Node.js already enforce a 10-byte ceiling on varint reads. This change brings all remaining runtimes into parity by raising a protocol exception (INVALID_DATA) when more than 10 bytes are consumed without finding a terminating byte.

Java: both fast-path and slow-path varint readers are covered.

Erlang: read_varint/3 gains a when Count < 10 guard clause; call sites updated to propagate the error tuple rather than crashing on badmatch.

Rust: VarIntReader trait replaced with local bounded loops; zigzag decode moved to free functions independent of the integer_encoding crate.

Ruby: both the pure-Ruby implementation and the C extension are fixed; read_varint32 gains a separate 5-byte limit instead of delegating to read_varint64.

Named constants replace bare integer literals in all runtimes, with a comment explaining the derivation (ceil(64/7) = 10, ceil(32/7) = 5). Java, Ruby, and Rust enforce both the 5-byte (32-bit) and 10-byte (64-bit) limits; other runtimes enforce the 10-byte ceiling uniformly.

Tests added for py, go, php, java (all four original runtimes), plus rb, swift, dart, and rs which have existing unit-test infrastructure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
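The bounded loops and named constants described in the commit message can be sketched in Python roughly as follows. This is a minimal illustration, not the code from this PR: `MAX_VARINT32_BYTES`, `MAX_VARINT64_BYTES`, `VarintOverflow`, and `read_varint` are hypothetical names, and an `io.BytesIO` stands in for the Thrift transport.

```python
import io

# Each varint byte carries 7 payload bits, so the limits are
# ceil(64 / 7) = 10 and ceil(32 / 7) = 5 (integer ceiling division).
MAX_VARINT64_BYTES = -(-64 // 7)
MAX_VARINT32_BYTES = -(-32 // 7)


class VarintOverflow(Exception):
    """Hypothetical stand-in for TProtocolException(INVALID_DATA)."""


def read_varint(stream, max_bytes):
    """Read an unsigned varint, consuming at most max_bytes bytes."""
    result = 0
    for i in range(max_bytes):
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("stream ended inside a varint")
        byte = chunk[0]
        result |= (byte & 0x7F) << (7 * i)
        if byte & 0x80 == 0:  # bit 7 clear: this is the terminating byte
            return result
    # Every permitted byte had its continuation bit set.
    raise VarintOverflow("varint exceeds %d bytes" % max_bytes)


def read_varint32(stream):
    # Own 5-byte limit rather than delegating to the 64-bit reader.
    # Note: this sketch does not mask the result to 32 bits.
    return read_varint(stream, MAX_VARINT32_BYTES)


def read_varint64(stream):
    return read_varint(stream, MAX_VARINT64_BYTES)
```

With these definitions, `read_varint64(io.BytesIO(b"\xac\x02"))` returns 300, while a run of five bytes that all have bit 7 set makes `read_varint32` raise `VarintOverflow` instead of continuing to read.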
2a59422 to 2e38de7
## Summary

- Bumps three vulnerable transitive deps to their patched versions, closing four Dependabot alerts.
- Two `thrift` alerts (#223, #224) are left open: upstream has not published a patched version to crates.io. See research note below.

## Changes

| Manifest | Package | From | To | Advisory |
|---|---|---|---|---|
| `yarn.lock` | `ws` | 8.19.0 | 8.20.1 | GHSA-58qx-3vcg-4xpx (#255) |
| `yarn.lock` | `brace-expansion` | 5.0.5 | 5.0.6 | GHSA-jxxr-4gwj-5jf2 (#252) |
| `analytics-web-app/yarn.lock` | `brace-expansion` | 5.0.5 | 5.0.6 | GHSA-jxxr-4gwj-5jf2 (#251) |
| `python/micromegas/poetry.lock` | `idna` | 3.11 | 3.15 | GHSA-65pc-fj4g-8rjx (#253) |

## thrift alerts (#223, #224): left open

The `thrift` Rust crate on crates.io is stuck at 0.17.0 (Nov 2022). Apache Thrift has released through 0.23.0 on GitHub, including the fix for CVE-2026-43868 (PR apache/thrift#3410, merged Apr 2026), but no 0.18+ version has been published to crates.io; the Rust binding was effectively unmaintained (THRIFT-5917). Rust CI was re-enabled in apache/thrift#3417 (Apr 2026) and a release workflow exists (#3027), so a publish is plausibly near-term.

Even if we forked, `parquet 57.3.0` constrains to `thrift ^0.17`, so a 0.23.x cannot satisfy resolution without also patching parquet. arrow-rs is independently migrating off the `thrift` crate (custom parser shipped in v57 for reads; writes still pending).

Severity is medium (CVSS 5.3, DoS only, requires attacker-controlled parquet metadata). Per CLAUDE.md the alerts stay open until a real fix is available.

## Test plan

- [ ] `yarn install --immutable` clean at repo root
- [ ] `yarn install --immutable` clean in `analytics-web-app/`
- [ ] `poetry check` clean in `python/micromegas/`
- [ ] Dependabot alerts #251, #252, #253, #255 auto-close on merge
Summary
The varint reader in `TCompactProtocol` loops until it encounters a non-continuation byte (bit 7 clear), without bounding the number of bytes it will consume. The protobuf wire-format specification, which the compact protocol wire format is based on, defines a maximum of 10 bytes for a 64-bit varint (5 bytes for 32-bit). C++ and Node.js already enforce this limit and raise a protocol exception when it is exceeded.

This change extends that limit to all remaining runtimes:

- Python (`lib/py`): `readVarint()` uses a bounded `for` loop; raises `TProtocolException(INVALID_DATA)` on overflow.
- Go (`lib/go`): `readVarint64()` uses a bounded loop; returns `TProtocolException(INVALID_DATA)` on overflow.
- PHP (`lib/php`): `readVarint()` uses a bounded `while` loop; throws `TProtocolException(INVALID_DATA)` on overflow.
- Java (`lib/java`): slow-path `readVarint32()` (5-byte limit) and `readVarint64()` (10-byte limit) both throw `TProtocolException(INVALID_DATA)` on overflow.
- .NET (`lib/netstd`): `ReadVarInt32Async()` and `ReadVarInt64Async()` use bounded loops; throw `TProtocolException(INVALID_DATA)` on overflow.
- Delphi (`lib/delphi`): `ReadVarInt32()` and `ReadVarInt64()` use bounded loops; raise `TProtocolException(INVALID_DATA)` on overflow.
- Haxe (`lib/haxe`): `readVarint32()` and `readVarint64()` use bounded loops; throw `TProtocolException(INVALID_DATA)` on overflow.
- Lua (`lib/lua`): `readVarint()` uses a bounded loop; returns an error on overflow.
- Swift (`lib/swift`): `readVarInt()` uses a bounded loop; throws `TProtocolError(.invalidData)` on overflow.
- Ruby (`lib/rb`): both the pure-Ruby implementation and the C extension are fixed. `read_varint32()` gains a separate 5-byte limit; `read_varint64()` a 10-byte limit. Both raise `ProtocolException(INVALID_DATA)` on overflow.
- Dart (`lib/dart`): `_readVarInt32()` and `_readVarInt64()` use bounded `for` loops; throw `TProtocolError(INVALID_DATA)` on overflow.
- Erlang (`lib/erl`): `read_varint/3` gains a `when Count < 10` guard; call sites propagate the error tuple.
- Rust (`lib/rs`): `VarIntReader` trait replaced with local bounded loops for both 32-bit (5 bytes) and 64-bit (10 bytes); zigzag decode moved to free functions independent of the `integer_encoding` crate.

Named constants replace bare integer literals in all runtimes, with a comment explaining the derivation (`ceil(64/7) = 10`, `ceil(32/7) = 5`). Java, Ruby, and Rust enforce both the 5-byte (32-bit) and 10-byte (64-bit) limits separately; other runtimes enforce the 10-byte ceiling uniformly for both 32-bit and 64-bit reads.

Test plan
- `dotnet build` clean
- `bundle exec rspec spec/compact_protocol_spec.rb`: 4 new varint limit tests pass (23/23 total); pure-Ruby path exercised (C extension not compiled in Docker)
- `cargo test compact`: 75/75 pass including 3 new varint limit tests (`rust:latest` image, toolchain override required due to root `rust-toolchain` pinning 1.83.0)

🤖 Generated with Claude Code
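The varint limit tests themselves are not shown on this page. As a rough idea of the boundary inputs such tests could use, the sketch below encodes the largest 32-bit and 64-bit values and builds one over-long payload. The helper `encode_varint` and the constant names are hypothetical, chosen for illustration only; they are not taken from any Thrift runtime.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint, low bits first."""
    if value < 0:
        raise ValueError("varints encode non-negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(byte)  # bit 7 clear: terminating byte
            return bytes(out)


# Largest legal encodings: these sit exactly at the 5- and 10-byte limits,
# so a bounded reader should still accept them.
LARGEST_U32 = encode_varint(2**32 - 1)
LARGEST_U64 = encode_varint(2**64 - 1)

# Ten continuation bytes followed by a terminator: 11 bytes in total,
# one more than a 64-bit varint can need, so a bounded reader should reject it.
OVERLONG = b"\x80" * 10 + b"\x00"
```

The encoded lengths, `len(LARGEST_U32) == 5` and `len(LARGEST_U64) == 10`, match the `ceil(32/7)` and `ceil(64/7)` derivation, which is why inputs of these sizes are natural accept/reject boundaries.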