network: pin malformed BlocksByRoot regression test to exact Hive bytes #853
The existing regression test for the `reqresp/blocks_by_root/malformed_request` DoS (fixed in #845) used a manually crafted 24-byte payload rather than the exact bytes that the Hive simulator actually sends. The Hive test does `encode_request_raw(&[0xab; 64])`:

- Wire: 25 bytes = `varint(64) || snappy_compress([0xab; 64])`
- Decompressed (what the Zig validator receives): 64 bytes, all `0xAB`
- SSZ offset (bytes 0–3, little-endian): `0xABABABAB` = 2880154539

This exact payload triggered the panic in both the v0.4.15 Hive run (suite 1778164266) and the v0.4.16 run (suite 1778192103, test-506, May 7 2026).

The comment incorrectly described the input as a "24-byte payload". This commit corrects it and updates case 1 of the regression test to use the real 64-byte all-`0xAB` decompressed bytes, so any future drift in the validation guard is caught against the genuine simulator input rather than a close-but-different proxy.

No behaviour change: the same `offset != 4` guard rejects both the old 24-byte proxy and the corrected 64-byte input.
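The byte arithmetic above can be checked directly. The Zig sketch below is hypothetical test code, not the zeam test: `writeUvarint` and `firstOffset` are illustrative helpers, and the snappy stage is left out (the 25-byte wire length is taken from the PR, not recomputed here). It confirms that the length prefix for 64 decompressed bytes is a single varint byte and that bytes 0–3 of the all-`0xAB` payload decode to offset 2880154539, far past the end of the 64-byte buffer. It assumes Zig 0.12 or later, which uses the `.little` endianness tag.

```zig
const std = @import("std");

/// Illustrative helper (not from zeam): unsigned LEB128 varint, the encoding
/// used for the req/resp length prefix. Returns the number of bytes written.
fn writeUvarint(value: u64, out: []u8) usize {
    var v = value;
    var i: usize = 0;
    while (v >= 0x80) : (i += 1) {
        // Low 7 bits plus the continuation bit.
        out[i] = @as(u8, @truncate(v)) | 0x80;
        v >>= 7;
    }
    out[i] = @as(u8, @truncate(v));
    return i + 1;
}

/// Illustrative helper: reads the first SSZ offset (bytes 0..4, little-endian).
fn firstOffset(payload: []const u8) ?u32 {
    if (payload.len < 4) return null;
    return std.mem.readInt(u32, payload[0..4], .little);
}

test "Hive malformed BlocksByRoot payload: prefix and offset" {
    // Decompressed bytes the validator receives: 64 x 0xAB.
    const payload = [_]u8{0xAB} ** 64;

    // varint(64) fits in a single byte, 0x40.
    var prefix: [10]u8 = undefined;
    const n = writeUvarint(payload.len, &prefix);
    try std.testing.expectEqual(@as(usize, 1), n);
    try std.testing.expectEqual(@as(u8, 0x40), prefix[0]);

    // Bytes 0..4 as a little-endian u32: 0xABABABAB = 2880154539.
    const offset = firstOffset(&payload).?;
    try std.testing.expectEqual(@as(u32, 2880154539), offset);
    // The offset points far beyond the end of the 64-byte buffer.
    try std.testing.expect(@as(usize, offset) > payload.len);
}
```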
Adversarial review

Test-only change, low blast radius. Net-positive: pinning the regression to the exact decompressed Hive bytes is strictly better than the 24-byte proxy. CI is green. Below are the things I'd push back on before merge.

Factual claims: verified
Genuine pushback

1. The PR title and body claim this is the regression test for the v0.4.16 / test-506 failure, but the code path being exercised isn't the one Hive crashed on. Hive's failure mode is: bytes arrive on the wire → snappy frame decode → SSZ. The test only exercises the post-snappy SSZ stage. If a future regression were in the snappy framing layer or the wire/length pre-checks (e.g.
2. The
3. Case 1's loss of the "high-bit but not all-
4. Case 2 (
5. The compile-time assert for
6. Body length math still trusts unsigned division/modulo even with attacker-controlled offset. Walking through
7. Allocator hygiene in the per-case
8. The "Hive suite 1778164266 / 1778192103" identifiers are external and ephemeral. Hive runs get GC'd. In ~6 months these IDs in the source comment will 404. Suggest including the SHA of the failing image and the commit on

Nits
Verdict

LGTM after addressing #2 (drop

— review by zclawz 🐭
… test

Addresses review on #853:

- Add offset=0 (null-prefix attack) and offset=0xDEADBEEF (non-uniform garbage) as rejection cases so the guard isn't only tested against homogeneous 0xAB bytes.
- Add a single-root valid case (offset=4, 36 bytes total) to prevent silent over-rejection of legitimate non-empty requests.
- Renumber the cases from 1-4 to 1-8 and add a clear label to each block.
- Expand the header comment to acknowledge the test's scope (post-decompression only), document the compile-time `@sizeOf(Root) == 32` assumption, and explain the leak-check guarantee via `std.testing.allocator`.
- Replace Hive suite IDs with image SHAs (more durable).
- Fix "After the fix" → "After the guard added in #845".
Re-review on
| Prior point | Status |
|---|---|
| #2: drop `Closes #843` | ✅ Body now says `Refs #843` |
| #4: add offset = 0 (null-prefix) | ✅ Case 3 |
| #6: add a 1-root valid case | ✅ Case 8 (offset=4, 32-byte body, asserts `len == 1` post-deserialize) |
| #1: call out test scope | ✅ "Scope: this test exercises the post-decompression validation path only…" added |
| #3: non-uniform garbage | ✅ Case 2 with `0xDEADBEEF` + `0x5A` body |
| #5: document `@sizeOf(Root) == 32` contract | ✅ Now in the test header comment |
| #7: `errdefer` allocator-leak wiring | ✅ Documented in header |
| #8: promote durable image SHAs over Hive suite IDs | ✅ "image 993f193 (v0.4.15)" / "image 14222bc (v0.4.16)"; suite IDs gone |
| Nit: body framing | ✅ "The regression test merged with #845 used a hand-crafted 24-byte approximation…"; honest framing kept |
Local verification on fc87147d:
5/12 interface.test.ReqRespRequest.deserialize rejects malformed BlocksByRoot without panicking...OK
All 12 tests passed.
(network package tests, full local `zig build test` run; `std.testing.allocator` clean, no leaks.)
Per-case sanity check against `validateBlocksByRootRequestBytes`:

- Case 1 (`[0xAB] ** 64`): offset `0xABABABAB ≠ 4` → reject ✅
- Case 2 (`offset=0xDEADBEEF`, body=`0x5A * 32`): offset ≠ 4 → reject ✅
- Case 3 (`offset=0`, body=zeros): offset ≠ 4 → reject ✅
- Case 4 (`offset=8`, body=zeros): offset ≠ 4 → reject ✅
- Case 5 (`offset=4`, 33-byte body): `33 % 32 ≠ 0` → reject ✅
- Case 6 (`offset=4`, `MAX_REQUEST_BLOCKS+1` roots): count > limit → reject ✅
- Case 7 (`offset=4`, empty body): all checks pass → accept ✅
- Case 8 (`offset=4`, 32-byte body): `body.len % 32 == 0`, count = 1 → accept ✅
Every branch of the validator is now covered.
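The per-case walkthrough above maps onto three guards: offset, body alignment, and count. The following is a hypothetical Zig re-implementation written only to mirror that walkthrough; it is not the code of `validateBlocksByRootRequestBytes`, and `checkRootsRequest`, `frame` and the error names are illustrative. It assumes the request is an SSZ container whose only field is the roots list (so its single offset must be 4), that `Root` is 32 bytes (the documented `@sizeOf(Root) == 32` contract), and a placeholder `MAX_REQUEST_BLOCKS` of 1024; zeam's real limit may differ. The test drives all eight cases.

```zig
const std = @import("std");

// Assumption: 32-byte roots, matching the documented @sizeOf(Root) == 32 contract.
const Root = [32]u8;
comptime {
    std.debug.assert(@sizeOf(Root) == 32);
}

// Assumption: placeholder limit; zeam's actual MAX_REQUEST_BLOCKS may differ.
const MAX_REQUEST_BLOCKS: usize = 1024;

const RequestError = error{ TooShort, BadOffset, RaggedBody, TooManyRoots };

/// Hypothetical guard (not zeam's source). Assumes the request is an SSZ
/// container with one variable-size field, so its only offset must be 4.
/// Returns the number of roots on success.
fn checkRootsRequest(bytes: []const u8) RequestError!usize {
    if (bytes.len < 4) return error.TooShort;
    const offset = std.mem.readInt(u32, bytes[0..4], .little);
    if (offset != 4) return error.BadOffset;
    const body = bytes[4..];
    if (body.len % @sizeOf(Root) != 0) return error.RaggedBody;
    const count = body.len / @sizeOf(Root);
    if (count > MAX_REQUEST_BLOCKS) return error.TooManyRoots;
    return count;
}

/// Test helper: writes `offset || body` into `buf` and returns the used prefix.
fn frame(buf: []u8, offset: u32, body_len: usize, fill: u8) []const u8 {
    std.mem.writeInt(u32, buf[0..4], offset, .little);
    @memset(buf[4 .. 4 + body_len], fill);
    return buf[0 .. 4 + body_len];
}

test "cases 1-8 from the review walkthrough" {
    var buf: [4 + (MAX_REQUEST_BLOCKS + 1) * @sizeOf(Root)]u8 = undefined;
    const t = std.testing;

    const hive = [_]u8{0xAB} ** 64; // Case 1: exact Hive bytes
    try t.expectError(error.BadOffset, checkRootsRequest(&hive));
    try t.expectError(error.BadOffset, checkRootsRequest(frame(&buf, 0xDEADBEEF, 32, 0x5A))); // Case 2
    try t.expectError(error.BadOffset, checkRootsRequest(frame(&buf, 0, 32, 0))); // Case 3
    try t.expectError(error.BadOffset, checkRootsRequest(frame(&buf, 8, 32, 0))); // Case 4
    try t.expectError(error.RaggedBody, checkRootsRequest(frame(&buf, 4, 33, 0))); // Case 5
    try t.expectError(error.TooManyRoots, checkRootsRequest(frame(&buf, 4, (MAX_REQUEST_BLOCKS + 1) * 32, 0))); // Case 6
    try t.expectEqual(@as(usize, 0), try checkRootsRequest(frame(&buf, 4, 0, 0))); // Case 7
    try t.expectEqual(@as(usize, 1), try checkRootsRequest(frame(&buf, 4, 32, 0))); // Case 8
}
```

Keeping the root size and limit as named constants lets the test hit the `MAX_REQUEST_BLOCKS + 1` boundary without hard-coding byte counts.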
Governance note: I'm not posting a formal "Approve" review (per project policy on bot reviews not being load-bearing on branch protection for consensus-critical code). Treat this as LGTM from the agent — defer to a human approver for the actual merge gate.
— zclawz 🐭
Root cause of Hive test-506 failure
Failing image: `14222bc` (v0.4.16), which predates the runtime fix in #845.

The Hive simulator sends `encode_request_raw(&[0xab; 64])` as an inbound `BlocksByRoot` request:

- Wire: `varint(64) || snappy_frame_compress([0xab; 64])`
- Decompressed: 64 bytes, all `0xAB`
- SSZ offset: `0xABABABAB` = 2880154539

Without the guard added in #845, the SSZ deserializer sliced `bytes[2880154539..]` on a 64-byte buffer and panicked, killing the network FFI thread and aborting zeam. The HTTP liveness check then timed out.

The same bytes caused the earlier v0.4.15 failure (image `993f193`).

Status
The runtime fix is `validateBlocksByRootRequestBytes` in #845 (now in v0.4.17). This PR does not change runtime behaviour.

What this PR does
The regression test merged with #845 used a hand-crafted 24-byte approximation (`0xABABAAAB...`) rather than the exact 64-byte decompressed bytes the simulator sends. This PR:

- Pins case 1 to the exact decompressed bytes, `[0xAB] ** 64`
- Adds offset=0 (null-prefix) and `0xDEADBEEF` (non-uniform garbage) rejection cases

Refs #843 (upstream `ssz.zig` bounds check, the long-term fix; this PR only touches the test layer).
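The long-term fix referenced above belongs in the SSZ decoder itself: a variable-field offset has to be range-checked before it is used as a slice start. In Zig's Debug and ReleaseSafe modes, slicing `bytes[start..]` with `start > bytes.len` trips a safety check and panics (in ReleaseFast it is undefined behaviour), which matches the failure described for `bytes[2880154539..]`. The helper below is a hypothetical sketch of that check; `tailAtOffset` and its error name are illustrative and are not the `ssz.zig` API.

```zig
const std = @import("std");

/// Hypothetical bounds-checked accessor for an SSZ variable-size field.
/// Returns the tail of `bytes` starting at `offset`, or an error instead of
/// letting an out-of-range slice trip Zig's safety-check panic.
fn tailAtOffset(bytes: []const u8, offset: u32) error{OffsetOutOfBounds}![]const u8 {
    // Widen to usize so the comparison is against the real buffer length.
    const start: usize = offset;
    if (start > bytes.len) return error.OffsetOutOfBounds;
    return bytes[start..];
}

test "Hive offset is rejected instead of sliced" {
    const payload = [_]u8{0xAB} ** 64;
    const offset = std.mem.readInt(u32, payload[0..4], .little); // 2880154539
    try std.testing.expectError(error.OffsetOutOfBounds, tailAtOffset(&payload, offset));

    // A well-formed offset still yields the expected tail.
    const tail = try tailAtOffset(&payload, 4);
    try std.testing.expectEqual(@as(usize, 60), tail.len);
}
```

Returning an error from the decoder, rather than relying on each request type's pre-validation, means every SSZ container gets the check, not just `BlocksByRoot`.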