feat(bench): VM integration test infrastructure + UDP broadcast race fix by maxholman · Pull Request #28 · block65/wallhack

maxholman · 2026-02-22T06:44:17Z

Summary

QEMU VM integration test harness (bench/): two microVMs connected via virtio-net over a Unix socketpair. Entry VM runs wallhack entry, exit VM runs wallhack exit. The entry VM runs a deterministic payload transfer (TCP + UDP echo via the tunnel) and reports a JSON result token. The Python test runner (run_tests.py) orchestrates both VMs, drains their stdout into ring buffers, and checks for pass/fail tokens. Covers smoke (connectivity) and resilience (netem loss/delay) scenarios for QUIC and WebSocket transports.
UDP broadcast race fix (fix(transport): subscribe before spawn): broadcast::send() returns Err(SendError) when there are no active receivers — not just when receivers are lagging. The subscription was created inside the spawned run_send_responses task, after open_uni() returned. For WebSocket/yamux, open_uni() requires a round-trip through the yamux driver; a fast UDP echo could arrive before the subscription was established, killing run_udp_recv silently. Fix: subscribe before spawning and before open_uni(). Functions renamed run_data_out_{instructions,responses} → run_send_{instructions,responses} with signatures that accept a pre-created broadcast::Receiver<T>, making the subscribe-before-open contract compiler-enforced.
socat EOF datagram fix (fix(bench/vm)): socat -T 5 - UDP4:... sends a 0-byte UDP datagram when stdin reaches EOF. This races against the real echo response through the tunnel — the 0-byte echo arrives first, socat writes 0 bytes and exits, producing a SHA256 mismatch (e3b0c44 = hash of empty string). Replaced with nc -u -w 5 which has no EOF-signalling behaviour.
Debugging improvements: failure log tail bumped 50 → 150 lines; saved log directory path printed on failure; UDP empty-response error distinguishes 0-byte from corrupt with an actionable message.

Test plan

just bench::smoke — 2/2 scenarios pass (QUIC + WebSocket)
just bench::resilience — 6/6 scenarios (netem loss/delay combinations)
just check — fmt, clippy, cargo tests, smoke, resilience

Replace the netns/pytest/sudo/pyroute2 test suite with a self-contained QEMU VM pair architecture. The old suite required root, accumulated sudo prompts that silently blocked automated runs, and produced log noise that made CI debugging impossible.
New architecture:
- Two QEMU VMs per test scenario connected by a socketpair L2 link
- VMs are self-contained: kernel boots our PID 1 init script directly
- Configuration via kernel cmdline params (wallhack.role/scenario/transport/…)
- Results reported as WALLHACK_RESULT: JSON on the serial console
- Host runner (Python stdlib only, no venv) polls stdout ring buffers
- No SSH, no root, no sudo, no pytest, no pyroute2
New files:
- bench/vm/init.sh: VM PID 1 init script (exit + entry roles)
- bench/run_tests.py: smoke + resilience + debug-topology runner
- bench/run_benchmarks.py: benchmark runner (iperf3, min/median/max JSON)
- bench/setup-vm.sh: builds base.qcow2 via cloud-init (run once)
Justfile additions: smoke, resilience, benchmark, build-release, debug-topology, setup-vm, fetch-iperf3. just check now runs smoke + resilience alongside cargo test. just benchmark is separate (not in check).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
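To illustrate the result-token mechanism described above, here is a minimal sketch of how a host runner might scan buffered console lines for a WALLHACK_RESULT: line and decode its JSON payload. The function name find_result and the JSON field names (status, scenario) are hypothetical; only the WALLHACK_RESULT: prefix comes from the commit message.

```python
import json
from collections import deque

RESULT_PREFIX = "WALLHACK_RESULT:"  # prefix named in the commit message


def find_result(lines):
    """Return the decoded JSON payload of the first result line, or None."""
    for line in lines:
        idx = line.find(RESULT_PREFIX)
        if idx == -1:
            continue
        payload = line[idx + len(RESULT_PREFIX):].strip()
        try:
            return json.loads(payload)
        except json.JSONDecodeError:
            continue  # truncated or interleaved console output; keep scanning
    return None


# Hypothetical serial console output held in a small ring buffer.
console = deque(maxlen=4)
console.append("[boot] init started")
console.append('WALLHACK_RESULT: {"status": "pass", "scenario": "smoke"}')
result = find_result(console)
```

Skipping lines that fail to decode, rather than raising, keeps the scan robust when other console output is interleaved with the token.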

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Calling socket.abort() sends RST, which causes the peer to lose data still in its receive buffer — resulting in ECONNRESET even when the transfer completed successfully. Using socket.close() sends FIN and lets the four-way shutdown complete normally. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
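The FIN-based close described above can be sketched with plain Python sockets on loopback. This is an illustration under assumptions, not the code from this commit: the function name is hypothetical, and shutdown(SHUT_WR) stands in for the graceful socket.close() path. The receiver reads every byte and then sees a clean EOF.

```python
import socket


def transfer_with_graceful_close(payload: bytes) -> bytes:
    """Send payload over loopback TCP, close with FIN, return what the peer read."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        server.listen(1)
        with socket.create_connection(server.getsockname()) as client:
            conn, _ = server.accept()
            with conn:
                client.sendall(payload)
                # Half-close: sends FIN after the queued data instead of RST.
                client.shutdown(socket.SHUT_WR)
                chunks = []
                while True:
                    chunk = conn.recv(65536)
                    if not chunk:  # empty read means orderly EOF
                        break
                    chunks.append(chunk)
                return b"".join(chunks)


received = transfer_with_graceful_close(b"x" * 4096)
```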

UDP send_to and response-delivery messages were at TRACE level, making them invisible with --debug. Elevating them means the full UDP round-trip is visible in the ring buffer during integration test failures without needing --trace. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

getrandom(2) blocks until the CRNG has 256 bits of entropy, causing silent hangs in crypto startup on entropy-starved VMs. Probing /dev/random with O_NONBLOCK at startup surfaces the wait as a visible warning rather than an unexplained hang. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
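A minimal sketch of the non-blocking probe idea, written in Python for illustration (the project's own check lives in wallhack.rs and is not reproduced here). The function name entropy_ready is hypothetical, and the sketch assumes a Linux system where /dev/random exists.

```python
import os


def entropy_ready(path: str = "/dev/random") -> bool:
    """Return True if one byte can be read without blocking, False otherwise."""
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        os.read(fd, 1)
        return True
    except BlockingIOError:
        # EAGAIN: the read would block, so the pool is not ready yet.
        return False
    finally:
        os.close(fd)


if not entropy_ready():
    print("warning: entropy pool not ready; crypto startup may stall")
```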

- init.sh: replace urandom payloads with deterministic inputs (all-zero TCP, 'U'*64 UDP) so hash mismatches immediately reveal whether the response was empty or corrupted; add hexdump on UDP mismatch
- init.sh: replace sleep 2 with socat ,shut-down for half-close so the TCP echo round-trip completes without a mandatory 2s delay per test
- init.sh: remove shellcheck disables; use ${DEBUG:+"--debug"} for optional wallhack flags and ${DELAY:+...}/${LOSS:+...} for tc args
- init.sh: add socat-tcp log to _fail diagnostics; tee wallhack output to serial console and log file simultaneously
- run_tests.py / run_benchmarks.py: fix seen_count deque rotation bug where tokens could be missed once the ring buffer reached maxlen
- run_tests.py / run_benchmarks.py: suppress kernel boot noise with quiet loglevel=0 to keep the ring buffer useful
- bench.just: add iperf3 to build-initrd so benchmark scenario works
- bench/vm/kernel/: add Docker build context for static kernel + socat
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
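The seen_count rotation bug mentioned above can be sketched as follows. This is an assumed reconstruction of the failure mode, not the runner's actual code: once a deque reaches maxlen its length stops growing, so a reader that indexes by "lines seen so far" never sees anything new. Tracking the total number of lines ever appended is one way to avoid that; RingLog and naive_unseen are hypothetical names.

```python
from collections import deque


def naive_unseen(lines, seen_count):
    """Buggy variant: treats seen_count as an index into the deque."""
    return list(lines)[seen_count:]


class RingLog:
    """Ring buffer that also counts every line ever appended."""

    def __init__(self, maxlen):
        self.lines = deque(maxlen=maxlen)
        self.total = 0  # monotonically increasing, unaffected by rotation

    def append(self, line):
        self.lines.append(line)
        self.total += 1

    def unseen(self, seen_total):
        """Return (new_lines, new_seen_total) for a reader at seen_total."""
        missed = self.total - seen_total
        available = min(missed, len(self.lines))  # older lines may be gone
        start = len(self.lines) - available
        return list(self.lines)[start:], self.total


log = RingLog(maxlen=3)
for line in ["a", "b", "c"]:
    log.append(line)
first, seen = log.unseen(0)      # reader consumes a, b, c
log.append("TOKEN")              # buffer rotates: now b, c, TOKEN
fixed, seen = log.unseen(seen)   # the token is reported
buggy = naive_unseen(log.lines, 3)  # len is still 3, so nothing is reported
```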

Extracts bench and website targets into dedicated module files (bench/bench.just, website.just) so the root justfile stays focused on the top-level gate. Also fixes gitignore to cover python cache and the old root-level /bin/ artifact directory. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

bench/bin/iperf3 is obsolete (iperf3 now lives under bench/vm/staging/, covered by bench/.gitignore). bench/results duplicates the results/ entry already in bench/.gitignore. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

broadcast::send() returns Err(SendError) when there are no active receivers, not when receivers are lagging; this silently killed run_udp_recv on the exit side if a UDP echo arrived before the run_send_responses task had subscribed. The subscription was created inside the spawned task, after open_uni() returned. For WebSocket/yamux, open_uni() requires a round-trip through the yamux driver; a fast UDP echo from the far side of the tunnel could arrive and be sent to the broadcast channel before the subscription was established. Fix: subscribe() is called before the task is spawned and before open_uni(), so buffered messages are preserved even during stream setup. Also renames run_data_out_{instructions,responses} to run_send_{instructions,responses} and changes signatures to accept a pre-created broadcast::Receiver<T> instead of &broadcast::Sender<T>, making the subscribe-before-open contract explicit and compiler-enforced. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
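The ordering problem can be illustrated with a toy model in Python's asyncio. This is an analogy only: the real code uses a Rust broadcast channel, and every name below (Broadcast, NoReceivers, open_stream, forward, forward_late) is hypothetical. The toy channel mimics the one property that matters here, namely that send() fails when nobody has subscribed yet.

```python
import asyncio


class NoReceivers(Exception):
    """Raised by send() when nobody is subscribed (stand-in for SendError)."""


class Broadcast:
    """Toy broadcast channel: send() fails if there are no subscribers."""

    def __init__(self):
        self._queues = []

    def subscribe(self):
        queue = asyncio.Queue()
        self._queues.append(queue)
        return queue

    def send(self, item):
        if not self._queues:
            raise NoReceivers
        for queue in self._queues:
            queue.put_nowait(item)


async def open_stream():
    await asyncio.sleep(0)  # stands in for the stream-open round trip


async def forward_late(tx, out):
    await open_stream()
    rx = tx.subscribe()  # too late: send() may already have failed
    out.append(await rx.get())


async def forward(rx, out):
    await open_stream()
    out.append(await rx.get())


async def racy():
    tx, out = Broadcast(), []
    task = asyncio.create_task(forward_late(tx, out))
    try:
        tx.send(b"echo")  # the spawned task has not subscribed yet
        outcome = "sent"
    except NoReceivers:
        outcome = "no receivers"
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return outcome


async def fixed():
    tx, out = Broadcast(), []
    rx = tx.subscribe()  # subscribe before spawning and before open_stream()
    task = asyncio.create_task(forward(rx, out))
    tx.send(b"echo")  # buffered even though the task has not run yet
    await task
    return out


racy_outcome = asyncio.run(racy())
fixed_output = asyncio.run(fixed())
```

Passing the already-subscribed receiver into the task mirrors the signature change in the commit: a task that takes a receiver cannot be started without one.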

socat -T 5 - UDP4:... sends a 0-byte UDP datagram when stdin reaches EOF (after the payload file is fully read). This races against the real echo response arriving through the tunnel — if the 0-byte echo arrives first, socat writes 0 bytes to the output file and exits, producing a SHA256 mismatch (response_size=0, hash matches empty string). nc -u does not have this EOF-signalling behaviour; it reads stdin, sends it as a single datagram, then waits for the -w timeout before exiting. Both QUIC and WebSocket smoke tests pass reliably with nc. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… style
AGENTS.md: expand naming conventions section to cover domain logic vs transport layer distinction, prohibited directional terms, required vector-based terminology, and CLI consistency rules.
TODO.md: add backlog item for topology/directional naming refactor.
wallhack.rs: remove redundant info log on default startup; use let-else chain for entropy check (more idiomatic).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

run_tests.py:
- Print saved log directory path on failure so logs are immediately findable without having to browse bench/results/logs manually.
- Bump default failure tail from 50 → 150 lines; 50 was routinely missing the interesting early-boot context on init script failures.
init.sh:
- On UDP sha256 mismatch, distinguish empty response (0 bytes) from a genuine corrupt response with a specific, actionable error message. The original "UDP echo mismatch: expected=... got=e3b0c44..." output silently hid that the response was empty: e3b0c44 is the SHA256 of the empty string, which was opaque during debugging.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
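As a sketch of the empty-versus-corrupt distinction, the following Python mirrors the idea of the init.sh check (which is shell and is not reproduced here). The function name classify_echo and the message wording are hypothetical; the 'U'*64 payload follows the deterministic UDP input named in an earlier commit.

```python
import hashlib

# SHA256 of zero bytes; its hex digest starts with e3b0c44.
EMPTY_SHA256 = hashlib.sha256(b"").hexdigest()


def classify_echo(expected: bytes, response: bytes) -> str:
    """Compare an echo response against the payload and name the failure mode."""
    if hashlib.sha256(response).digest() == hashlib.sha256(expected).digest():
        return "ok"
    if len(response) == 0:
        return "empty response (0 bytes): the echo never arrived or was cut short"
    got = hashlib.sha256(response).hexdigest()
    return f"corrupt response ({len(response)} bytes): got={got}"


payload = b"U" * 64  # deterministic UDP payload
ok = classify_echo(payload, payload)
empty = classify_echo(payload, b"")
corrupt = classify_echo(payload, b"X" * 64)
```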

…ebsite.just
bridge.rs: replace |r| std::mem::discriminant(r) with std::mem::discriminant (clippy::redundant_closure).
website.just: add `set working-directory := "website"` so pnpm commands run from the package root; the module was executing from the repo root where no package.json exists, causing `just check` to fail.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Start entry VM first (it's the listener) and wait for WALLHACK_ENTRY_READY_MAGIC_TOKEN before starting exit. The exit node now connects on its first attempt, eliminating the ~500ms retry backoff that occurred when exit started while entry wasn't yet listening. WALLHACK_ENTRY_READY_MAGIC_TOKEN is emitted after wallhack binds its listen port, detected by polling /proc/net/{udp,tcp} at 100ms intervals, with no arbitrary sleeps.
Also:
- INITIAL_RETRY_DELAY: 1s → 50ms (exit + relay)
- Add RECONNECT_DELAY: 500ms for post-session drops with reset
- Remove unused _wait_for_host from init.sh
- Extract _qemu_base() to DRY up QEMU machine args in run_tests.py
- Add debug-shell subcommand: boots single interactive busybox VM via rdinit=/bin/sh for kernel/OS debugging
- gzip -1 for faster initrd packing
Result: wallhack handshake drops from ~1s → ~62ms (more than 10×). Smoke suite: 2/2 pass in ~24s wall clock (both transports).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
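The port-bound detection can be sketched in Python as below. In the PR the poll runs inside init.sh, so this is an illustration of the technique rather than the actual script. The helper names port_bound and wait_for_port and the sample table row are hypothetical; the sketch relies on the second column of /proc/net/udp and /proc/net/tcp being the local address as hex IP, a colon, and the hex port.

```python
import time


def port_bound(proc_net_text: str, port: int) -> bool:
    """Return True if any row's local address uses the given port."""
    for line in proc_net_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue
        _, _, hex_port = fields[1].rpartition(":")
        try:
            if int(hex_port, 16) == port:
                return True
        except ValueError:
            continue  # malformed row; ignore it
    return False


def wait_for_port(read_table, port, interval=0.1, timeout=5.0):
    """Poll read_table() every `interval` seconds until the port appears."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if port_bound(read_table(), port):
            return True
        time.sleep(interval)
    return False


# Made-up sample row: local address 0.0.0.0, port 0x1F90 (8080).
SAMPLE = (
    "  sl  local_address rem_address   st tx_queue rx_queue\n"
    "   0: 00000000:1F90 00000000:0000 07 00000000:00000000\n"
)
ready = wait_for_port(lambda: SAMPLE, 8080)
```

On a real system, read_table could be a function that returns the contents of /proc/net/udp.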

busybox nc -u -w N waits the full N-second inactivity timeout after receiving the echo reply, even though the response is already in hand. A 1s timeout is still generous for any realistic tunnel RTT (netem max is 100ms). Smoke suite: 2/2 pass in ~8s (was ~24s). Also add `just bench shell` recipe: boots a single interactive busybox VM via rdinit=/bin/sh for kernel/OS debugging. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

parse_listen_addr expanded bare ':port' to '[::]:port', so the display always claimed IPv6 regardless of intent. Two fixes:
1. Expand ':port' to '0.0.0.0:port', an explicit IPv4 wildcard that matches the common case and the exit node's connect behaviour. IPv6 listeners can still be configured explicitly via '[::]:port'.
2. Add local_addr() to the Server trait (impl via endpoint/listener) and use it for the "Listening on ..." log line, so the display reflects the address the OS actually bound rather than the pre-bind parsed argument.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

0.0.0.0 was a breaking change — [::] on Linux is dual-stack by default (IPV6_V6ONLY=0), accepting both IPv4 and IPv6 clients, which is the desired behaviour. The confusing display was the real issue, not the bind address. local_addr() now shows the actual OS-bound address ([::]:port) which is accurate. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add crates/cli/src/net.rs with SocketAddrExt trait (.bind_addr()) and shared parse_listen_addr() that probes IPv6 availability and falls back to 0.0.0.0 on IPv4-only kernels, and resolves hostnames via ToSocketAddrs
- Remove three duplicate parse_listen_addr() functions and bind_for() from exit.rs; all callers now use crate::net
- Fix all ClientConfig constructions in entry.rs and relay.rs to set bind: addr.bind_addr() instead of inheriting DEFAULT_BIND_ADDRESS (Ipv6Addr::UNSPECIFIED), which crashed on IPv4-only kernels
- All listen displays now use server.local_addr()? (actual bound address) instead of the pre-bind parsed address
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- bench/vm/kernel/Dockerfile: add -d IPV6 to build an IPv4-only kernel, add COPY of .config to final stage so kernel config is inspectable
- bench/vm/init.sh: explicitly bind entry on 0.0.0.0 for IPv4-only kernel, reduce _wait_for_tun timeout from 45s to 5s
- bench/run_tests.py: minor formatting fixes (black style), add panic=-1 to debug-shell kernel cmdline so crashes exit immediately
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replace the hardcoded DEFAULT_BIND_ADDRESS (Ipv6Addr::UNSPECIFIED) with a runtime probe using socket2: ask the kernel to create an AF_INET6 DGRAM socket — no port allocated, purely a capability check. Falls back to Ipv4Addr::UNSPECIFIED on IPv4-only kernels. ipv6_supported() is public in wallhack::client::config so cli::net can reuse it for parse_listen_addr() bare-port expansion, keeping the probe logic in one place. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
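The capability probe and the bare-port expansion can be sketched in Python as follows. The project implements this in Rust with socket2, so this is an illustration of the idea only. ipv6_supported echoes the name in the commit message, while expand_listen_addr is a hypothetical helper.

```python
import socket


def ipv6_supported() -> bool:
    """Try to create an AF_INET6 datagram socket; no bind, no port allocated."""
    try:
        probe = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    except OSError:
        return False  # kernel has no IPv6 support
    probe.close()
    return True


def expand_listen_addr(arg: str, have_ipv6: bool) -> str:
    """Expand a bare ':port' to a wildcard address; leave other forms alone."""
    if arg.startswith(":") and arg[1:].isdigit():
        wildcard = "[::]" if have_ipv6 else "0.0.0.0"
        return f"{wildcard}{arg}"
    return arg


listen = expand_listen_addr(":8080", ipv6_supported())
```

Passing the probe result in as a parameter keeps the expansion logic testable without depending on the host kernel.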

- bench/vm/kernel/Dockerfile: remove -d IPV6, restore default kernel (IPv6 available via olddefconfig default y)
- bench/vm/init.sh: revert 0.0.0.0 back to bare :port so parse_listen_addr probe mechanism is exercised in the bench environment
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

socat version bumps no longer invalidate the kernel layer cache (~5min build). Each now has its own Dockerfile and just recipe with independent caching. bench/vm/socat/Dockerfile uses a lighter apt install (no kernel build tools). build-initrd now depends on build-kernel and build-socat separately. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- bench/vm/socat/Dockerfile: remove default from ARG SOCAT_VERSION; version is required from caller (bench.just is sole source of truth), consistent with KERNEL_VERSION pattern
- client/config.rs: add #[must_use] and AF_INET6 backtick to ipv6_supported()
- exit.rs: remove unused Context import, backtick INITIAL_RETRY_DELAY in doc, drop redundant ..Default::default() where all fields are explicit
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Eliminate ~150 lines of duplicated code between run_tests.py and run_benchmarks.py by pulling shared constants, preflight, QEMU helpers, log drain, and wait utilities into a common module. qemu_cmd() is unified with optional netem/metric kwargs so both runners share a single implementation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Switch inter-VM networking from AF_UNIX socketpair + fd= to TCP listen/connect ports. The socketpair approach triggered a QEMU 10.2.0 assertion (net_fill_rstate: size == 0) under high-throughput iperf3 TCP load. TCP listen/connect avoids this code path entirely and matches the TASK.md spec.
- Replace ping-based latency with iperf3 --json mean_rtt. Busybox ping hung indefinitely waiting for late ICMP replies when packets were dropped. iperf3 TCP RTT (mean_rtt field, microseconds) is more reliable and consistent with the other benchmark scenarios.
- Fix iperf3 JSON parsing: iperf3 3.20 uses tab after colon ("field":\tvalue) not space, so -F': ' never split. Use match() to extract the first digit sequence on the matching line.
- Switch all throughput scenarios from plain text -f m (integer Mbps) to --json bits_per_second / 1e6 for sub-Mbps precision.
- Fix wallhack_version() parsing: --version output is multi-line; split()[-1] was grabbing the compiler date. Use split()[1] instead.
- Add docstring to qemu_base() documenting append vs extra parameters.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
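The separator problem in the JSON parsing fix can be illustrated in Python: matching any whitespace after the colon works whether the emitter writes a tab or a space. The PR's actual fix uses match() in the VM's shell tooling, so this is only a sketch of the same idea. The function name extract_number and the sample lines are made up; only the field names mean_rtt and bits_per_second come from the commit message.

```python
import re


def extract_number(text: str, field: str):
    """Return the first numeric value following "field": or None if absent."""
    pattern = re.compile(
        r'"' + re.escape(field) + r'":\s*(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)'
    )
    match = pattern.search(text)
    return float(match.group(1)) if match else None


# Made-up fragment using a tab after the colon, as described above.
sample = '\t"mean_rtt":\t412,\n\t"bits_per_second":\t943718400\n'

rtt_us = extract_number(sample, "mean_rtt")           # microseconds
mbps = extract_number(sample, "bits_per_second") / 1e6
```

Where the full document is available, decoding it with a JSON parser is an alternative that avoids pattern matching entirely.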

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Sweep of website/ deps to latest within ranges, plus a vite downgrade from 8 -> 7 to match astro's transitive vite (7.3.2) and avoid a rolldown regression with @tailwindcss/vite 4.2.4. Closes alerts #28 #29 #30 #31 #33 #34 #35 #36 #37 #38 #39 #40 #44 #48 covering vite, picomatch, postcss, yaml, astro, smol-toml.
- vite ^8.0.1 -> ^7.3.2 (drops the now-redundant vite 8 lineage; astro pulls 7.3.2 transitively, which is the patched version)
- astro 6.0.6 -> 6.2.2 (#44)
- @tailwindcss/vite 4.2.2 -> 4.2.4
- smol-toml: lockfile bump to 1.6.1 (#28)
- postcss: lockfile bump to 8.5.14 (#48)
- picomatch: lockfile bumps to 2.3.2 + 4.0.4 (#29 #30 #39 #40)
- yaml is now omitted entirely (it was an optional vite peer)
Verified: pnpm build succeeds; no @tailwindcss/vite peer-dep warnings.

maxholman and others added 28 commits February 21, 2026 11:31

chore(todo): mark bounded broadcast channels task complete

12ad533

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

docs: add task backlog for incremental improvements

906c7a5

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: remove stale bench entries from root gitignore

7a5fb36

bench/bin/iperf3 is obsolete (iperf3 now lives under bench/vm/staging/, covered by bench/.gitignore). bench/results duplicates the results/ entry already in bench/.gitignore. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

fix(bench): handle missing transport attr for debug-shell subcommand

68a4927

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

chore: add --version verbosity TODO item

de3211e

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

maxholman merged commit 1514b46 into main Feb 22, 2026
4 checks passed

maxholman mentioned this pull request May 6, 2026

fix(deps): clear 14 remaining dependabot advisories in website #108

Merged

4 tasks

maxholman merged 28 commits into main from feat/vm-integration-tests