Skip to content

network: pin malformed BlocksByRoot regression test to exact Hive bytes#853

Merged
ch4r10t33r merged 2 commits into
mainfrom
fix/malformed-request-hive-506-regression
May 8, 2026
Merged

network: pin malformed BlocksByRoot regression test to exact Hive bytes#853
ch4r10t33r merged 2 commits into
mainfrom
fix/malformed-request-hive-506-regression

Conversation

@ch4r10t33r
Copy link
Copy Markdown
Contributor

@ch4r10t33r ch4r10t33r commented May 8, 2026

Root cause of Hive test-506 failure

Failing image: 14222bc (v0.4.16) — predates the runtime fix in #845.

The Hive simulator sends encode_request_raw(&[0xab; 64]) as an inbound BlocksByRoot request:

  • Wire: 25 bytes = varint(64) || snappy_frame_compress([0xab; 64])
  • After zeam's snappy frame decode: 64 bytes all 0xAB
  • SSZ offset (bytes 0–3 LE): 0xABABABAB = 2880154539

Without the guard added in #845 the SSZ deserializer sliced bytes[2880154539..] on a 64-byte buffer and panicked, killing the network FFI thread and aborting zeam. The HTTP liveness check then timed out.

Same bytes caused the earlier v0.4.15 failure (image 993f193).

Status

The runtime fix is validateBlocksByRootRequestBytes in #845 (now in v0.4.17). This PR does not change runtime behaviour.

What this PR does

The regression test merged with #845 used a hand-crafted 24-byte approximation (0xABABAAAB...) rather than the exact 64-byte decompressed bytes the simulator sends. This PR:

  • Pins case 1 to the real Hive input: [0xAB] ** 64
  • Adds offset=0 (null-prefix) and offset=0xDEADBEEF (non-uniform garbage) rejection cases
  • Adds a single-root valid case (offset=4, 36 bytes) to guard against over-rejection
  • Fixes the misleading "24-byte payload" comment and clarifies test scope (post-decompression only)
  • Replaces ephemeral Hive suite IDs with durable image SHAs in comments

Refs #843 (upstream ssz.zig bounds-check — the long-term fix; this PR is only the test layer).

The existing regression test for the `reqresp/blocks_by_root/malformed_request`
DoS (fixed in #845) used a manually-crafted 24-byte payload rather than the
exact bytes that the Hive simulator actually sends.

The Hive test does `encode_request_raw(&[0xab; 64])`:
  - Wire: 25 bytes = varint(64) || snappy_compress([0xab; 64])
  - Decompressed (what the Zig validator receives): 64 bytes all 0xAB
  - SSZ offset (bytes 0–3 little-endian): 0xABABABAB = 2880154539

This exact payload triggered the panic in both the v0.4.15 Hive run
(suite 1778164266) and the v0.4.16 run (suite 1778192103, test-506,
May 7 2026). The comment incorrectly described the input as a
"24-byte payload"; this commit corrects it and updates case 1 of the
regression test to use the real 64-byte all-0xAB decompressed bytes,
so any future drift in the validation guard would be caught against
the genuine simulator input rather than a close-but-different proxy.

No behaviour change — the same `offset != 4` guard rejects both the
old 24-byte proxy and the corrected 64-byte input.
@ch4r10t33r ch4r10t33r marked this pull request as ready for review May 8, 2026 07:35
@zclawz
Copy link
Copy Markdown
Contributor

zclawz commented May 8, 2026

Adversarial review

Test-only change, low blast-radius. Net-positive — pinning the regression to the exact decompressed Hive bytes is strictly better than the 24-byte proxy. CI is green. Below are the things I'd push back on before merge.

Factual claims — verified

  • ✅ Hive reqresp/blocks_by_root/malformed_request does send vec![0xab; 64] via encode_request_rawsimulators/lean/src/scenarios/reqresp.rs:1045-1047, which is varint(64) || snappy_frame_compress(&[0xab; 64]) per simulators/lean/src/utils/libp2p_mock.rs:345-354. After zeam's snappy frame decode, the buffer handed to validateBlocksByRootRequestBytes is exactly 64 bytes of 0xAB. Offset = 0xABABABAB = 2880154539. ✅
  • network: pre-validate BlocksByRoot RPC bytes to stop SSZ-panic DoS #845 is merged (2026-05-07T16:52Z), v0.4.17 release commit (Release v0.4.17 (Devnet4) #846) merged 2026-05-07T19:17Z, so the assertion that v0.4.17 contains the runtime fix while v0.4.16 does not is correct.
  • validateBlocksByRootRequestBytes rejects both the old 24-byte proxy and the new 64-byte real input identically (offset != 4 check on bytes 0..4 — same value 0xABABABAB/0xABABAAAB in both cases, both ≠ 4).

Genuine pushback

1. The PR title and body claim this is the regression test for the v0.4.16 / test-506 failure, but the code-path being exercised isn't the same one Hive crashed. Hive's failure mode is: bytes arrive on the wire → snappy frame decode → SSZ. The test only exercises the post-snappy SSZ stage. If a future regression were in the snappy framing layer or the wire/length pre-checks (e.g. minInLength/maxInLength mis-sized, or the snappy decoder rejecting [0xab;64] for some reason), this test would still pass and Hive would still red. Suggest at least one comment line acknowledging the test scope is "decompressed bytes only" and that an end-to-end harness (which the project doesn't have today) would be the real lock. As-is, the body's "ensures future validation drift would be caught against the genuine simulator bytes" overstates what the test does — it locks the validator against drift, not the full ingress pipeline.

2. The Closes #843 line is wrong. #843 is the upstream ssz.zig bounds-check tracking issue ("ssz.zig container deserializer can panic on malformed peer-supplied input"). This PR neither fixes that panic surface nor reduces its scope — the comment in validateBlocksByRootRequestBytes itself flags this as a temporary shim until ssz.zig grows the bounds check. A test refactor in zeam doesn't close an upstream codec issue. The body even contradicts itself: "the upstream ssz.zig bounds check is still the long-term fix". Drop Closes #843 (or downgrade to "Refs #843"); otherwise merging this auto-closes a still-relevant tracker.

3. Case 1's loss of the "high-bit but not all-0xAB" coverage. The old test wrote 0xABABAAAB deliberately (note the AAAB, not ABAB) — a hand-picked offset that's still way out of range but exercises a slightly different bit pattern in bytes[0..4]. The new code is 0xAB ** 64, so all 64 bytes including the offset are now homogeneous. We've lost ~no real coverage (the validator only looks at bytes[0..4] and length), but it's a one-line buff to also keep a "non-uniform garbage" case — e.g. retain a small variant with 0xDEADBEEF offset and 60 bytes of mixed garbage. As written, if a future contributor "optimises" validateBlocksByRootRequestBytes to e.g. reject only when all bytes are equal (a silly but plausible bug), the test would still pass.

4. Case 2 (bad_offset) is now arguably weaker than before. Old test had cases 1 and 2 covering two distinct out-of-range offset families. New case 1 has offset 0xABABABAB (2.88e9, way past body.len); case 2 has offset 8 (close to 4, inside the buffer). Good coverage of "way over" + "near-legal-but-wrong". But there's no case for offset = 0, offset = 3 (less than 4-byte minimum offset), or offset = bytes.len (one-past-end exactly). With ssz.zig still able to panic on these, please add at least offset = 0 — that's the classic null-prefix attack and a rational adversary's first try.

5. The compile-time assert for @sizeOf(types.Root) == 32 lives in validateBlocksByRootRequestBytes (production code) but the test doesn't exercise the assertion explicitly. Not a blocker for this PR, but if Root ever gets wrapped in a struct with padding, the compile-time guard fires and CI breaks at build time — which is good. Document that contract in the test comment too so a future contributor reading just the test understands the 32-byte assumption.

6. Body length math still trusts unsigned division/modulo even with attacker-controlled offset. Walking through validateBlocksByRootRequestBytes with offset = 4, bytes.len = 4: body = bytes[4..] is empty (len 0), 0 % 32 == 0, 0 / 32 == 0 ≤ MAX_REQUEST_BLOCKS — passes (correctly: empty list is valid, sanity case 5 confirms this). With offset = 4, bytes.len = 5: body.len = 1, 1 % 32 = 1 ≠ 0 → rejected. OK. With offset = 4, bytes.len = 36: body.len = 32, 32 % 32 = 0, 32 / 32 = 1 → accepted (one root). All clean. But the test does not include the "exactly one root, offset=4, 36 bytes total" valid case anywhere — only the empty-list happy path (case 5). I'd add that as a sixth case so we're sure the rejection path doesn't drift to over-rejecting non-empty valid requests.

7. Allocator hygiene in the per-case defer req.deinit() pattern is fine, but the test relies on succeeded = false → deinitPayload(...) in ReqRespRequest.deserialize's errdefer. If a future contributor refactors that function and forgets the deinit on error, the validate rejections will start leaking allocations. Not visible from test output but std.testing.allocator would catch it. Worth a one-line note pointing at that wiring so the leak-check is intentional, not accidental.

8. The "Hive suite 1778164266 / 1778192103" identifiers are external and ephemeral. Hive runs get GC'd. In ~6 months these IDs in the source comment will 404. Suggest including the SHA of the failing image and the commit on main you tested against (already in the PR body — promote it into the test comment too). image 14222bc / v0.4.16 is more durable than a Hive suite number.

Nits

Verdict

LGTM after addressing #2 (drop Closes #843) and #4 + #6 (add offset = 0 and a 1-root-valid case). Everything else is suggestion, not blocker.

— review by zclawz 🐭

… test

Addresses review on #853:

- Add offset=0 (null-prefix attack) and offset=0xDEADBEEF (non-uniform
  garbage) as rejection cases so the guard isn't only tested against
  homogeneous 0xAB bytes.
- Add a single-root valid case (offset=4, 36 bytes) to prevent silent
  over-rejection of legitimate non-empty requests.
- Rename cases 1-4 to 1-8 and add clear labels to each block.
- Expand the header comment to acknowledge the test's scope (post-
  decompression only), document the compile-time Root==32 assumption,
  and explain the leak-check guarantee via std.testing.allocator.
- Replace Hive suite IDs with image SHAs (more durable).
- Fix "After the fix" → "After the guard added in #845".
@zclawz
Copy link
Copy Markdown
Contributor

zclawz commented May 8, 2026

Re-review on fc87147d

All blockers from the prior review are addressed. Walking through:

Prior point Status
#2 — drop Closes #843 ✅ Body now says Refs #843
#4 — add offset = 0 (null-prefix) ✅ Case 3
#6 — add a 1-root valid case ✅ Case 8 (offset=4, 32-byte body, asserts len == 1 post-deserialize)
#1 — call out test scope ✅ "Scope: this test exercises the post-decompression validation path only…" added
#3 — non-uniform garbage ✅ Case 2 with 0xDEADBEEF + 0x5A body
#5 — document @sizeOf(Root) == 32 contract ✅ Now in the test header comment
#7errdefer allocator-leak wiring ✅ Documented in header
#8 — promote durable image SHAs over Hive suite IDs ✅ "image 993f193 (v0.4.15)" / "image 14222bc (v0.4.16)" — suite IDs gone
Nit — body framing ✅ "The regression test merged with #845 used a hand-crafted 24-byte approximation…" — honest framing kept

Local verification on fc87147d:

5/12 interface.test.ReqRespRequest.deserialize rejects malformed BlocksByRoot without panicking...OK
All 12 tests passed.

(network package tests, full local zig build test run; std.testing.allocator clean — no leaks.)

Per-case sanity check against validateBlocksByRootRequestBytes:

  • Case 1 ([0xAB] ** 64): offset 0xABABABAB ≠ 4 → reject ✅
  • Case 2 (offset=0xDEADBEEF, body=0x5A * 32): offset ≠ 4 → reject ✅
  • Case 3 (offset=0, body=zeros): offset ≠ 4 → reject ✅
  • Case 4 (offset=8, body=zeros): offset ≠ 4 → reject ✅
  • Case 5 (offset=4, 33-byte body): 33 % 32 ≠ 0 → reject ✅
  • Case 6 (offset=4, MAX_REQUEST_BLOCKS+1 roots): count > limit → reject ✅
  • Case 7 (offset=4, empty body): all checks pass → accept ✅
  • Case 8 (offset=4, 32-byte body): body.len % 32 == 0, count=1 → accept ✅

Every branch of the validator is now covered.

Governance note: I'm not posting a formal "Approve" review (per project policy on bot reviews not being load-bearing on branch protection for consensus-critical code). Treat this as LGTM from the agent — defer to a human approver for the actual merge gate.

— zclawz 🐭

@ch4r10t33r ch4r10t33r merged commit 453063a into main May 8, 2026
13 checks passed
@ch4r10t33r ch4r10t33r deleted the fix/malformed-request-hive-506-regression branch May 8, 2026 09:42
@zclawz zclawz mentioned this pull request May 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants