feat(bench): VM integration test infrastructure + UDP broadcast race fix #28
Merged
Replace the netns/pytest/sudo/pyroute2 test suite with a self-contained QEMU VM pair architecture. The old suite required root, accumulated sudo prompts that silently blocked automated runs, and produced log noise that made CI debugging impossible.
New architecture:
- Two QEMU VMs per test scenario connected by a socketpair L2 link
- VMs are self-contained: kernel boots our PID 1 init script directly
- Configuration via kernel cmdline params (wallhack.role/scenario/transport/…)
- Results reported as WALLHACK_RESULT: JSON on the serial console
- Host runner (Python stdlib only, no venv) polls stdout ring buffers
- No SSH, no root, no sudo, no pytest, no pyroute2
New files:
- bench/vm/init.sh: VM PID 1 init script (exit + entry roles)
- bench/run_tests.py: smoke + resilience + debug-topology runner
- bench/run_benchmarks.py: benchmark runner (iperf3, min/median/max JSON)
- bench/setup-vm.sh: builds base.qcow2 via cloud-init (run once)
Justfile additions: smoke, resilience, benchmark, build-release, debug-topology, setup-vm, fetch-iperf3. just check now runs smoke + resilience alongside cargo test. just benchmark is separate (not in check).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
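The serial-console reporting described above could be consumed on the host roughly as follows. This is a minimal sketch, not the code in run_tests.py: the `find_result` helper and the `status` field are hypothetical, and it assumes the JSON payload follows the `WALLHACK_RESULT:` token on a single line.

```python
import json
from collections import deque

RESULT_TOKEN = "WALLHACK_RESULT:"


def find_result(lines):
    """Return the first decoded result payload found in `lines`, or None."""
    for line in lines:
        _, found, payload = line.partition(RESULT_TOKEN)
        if not found:
            continue
        try:
            return json.loads(payload)
        except json.JSONDecodeError:
            # Serial output can be cut off mid-line; keep scanning.
            continue
    return None


# A bounded ring buffer of console lines, as a host runner might keep per VM.
ring = deque(maxlen=4)
ring.append("booting")
ring.append('WALLHACK_RESULT: {"status": "pass"}')  # hypothetical payload
print(find_result(ring))
```

Keeping the token and the payload on one line means a partial read never yields a half-parsed result: the decode either succeeds or the line is skipped.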
Calling socket.abort() sends RST, which causes the peer to lose data still in its receive buffer — resulting in ECONNRESET even when the transfer completed successfully. Using socket.close() sends FIN and lets the four-way shutdown complete normally. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
UDP send_to and response-delivery messages were at TRACE level, making them invisible with --debug. Elevating them means the full UDP round-trip is visible in the ring buffer during integration test failures without needing --trace. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
getrandom(2) blocks until the CRNG has 256 bits of entropy, causing silent hangs in crypto startup on entropy-starved VMs. Probing /dev/random with O_NONBLOCK at startup surfaces the wait as a visible warning rather than an unexplained hang. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
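The probe can be illustrated in Python, although the change itself lives in the Rust binary. This is a sketch under assumptions: the function names and the warning text are hypothetical, and the `path` parameter exists only so the helper can be exercised against an ordinary file.

```python
import os
import sys


def entropy_ready(path="/dev/random"):
    """Return True if one byte can be read from `path` without blocking."""
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        return len(os.read(fd, 1)) == 1
    except BlockingIOError:
        # EAGAIN: the read would have blocked, so the pool is not ready yet.
        return False
    finally:
        os.close(fd)


def warn_if_starved(path="/dev/random"):
    """Print a visible warning instead of hanging silently later on."""
    if not entropy_ready(path):
        print("warning: entropy pool not ready; crypto startup may block",
              file=sys.stderr)
```

Because the descriptor is opened with `O_NONBLOCK`, the check itself returns immediately in either case, which is what makes it safe to run before any blocking crypto initialisation.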
- init.sh: replace urandom payloads with deterministic inputs (all-zero
TCP, 'U'*64 UDP) so hash mismatches immediately reveal whether the
response was empty or corrupted; add hexdump on UDP mismatch
- init.sh: replace sleep 2 with socat ,shut-down for half-close so the
TCP echo round-trip completes without a mandatory 2s delay per test
- init.sh: remove shellcheck disables — use ${DEBUG:+"--debug"} for
optional wallhack flags and ${DELAY:+...}/${LOSS:+...} for tc args
- init.sh: add socat-tcp log to _fail diagnostics; tee wallhack output
to serial console and log file simultaneously
- run_tests.py / run_benchmarks.py: fix seen_count deque rotation bug
where tokens could be missed once the ring buffer reached maxlen
- run_tests.py / run_benchmarks.py: suppress kernel boot noise with
quiet loglevel=0 to keep the ring buffer useful
- bench.just: add iperf3 to build-initrd so benchmark scenario works
- bench/vm/kernel/: add Docker build context for static kernel + socat
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
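The commit text does not show the fixed code, so the following is only one plausible shape of the seen_count problem and a fix for it. It assumes the runner used the deque length as its scan cursor; the `RingLog` class and its method names are hypothetical. Once a `deque(maxlen=N)` is full, its length stops growing, so a length-based cursor never sees new lines. Counting every append avoids that.

```python
from collections import deque


class RingLog:
    """Bounded log buffer that also counts every line ever appended."""

    def __init__(self, maxlen):
        self.lines = deque(maxlen=maxlen)
        self.total = 0  # monotonic; unaffected by rotation

    def append(self, line):
        self.lines.append(line)
        self.total += 1

    def unseen(self, seen_total):
        """Return (lines appended since seen_total, updated seen_total)."""
        missing = self.total - seen_total
        # If more lines arrived than the buffer holds, the slice returns
        # everything that survived rotation.
        fresh = list(self.lines)[-missing:] if missing > 0 else []
        return fresh, self.total


ring = RingLog(maxlen=3)
for text in ("a", "b", "c"):
    ring.append(text)
fresh, seen = ring.unseen(0)
ring.append("TOKEN")                    # buffer is full, so "a" is dropped
naive = list(ring.lines)[seen:]         # length-based cursor finds nothing
fresh_after, seen = ring.unseen(seen)   # counter-based cursor finds TOKEN
```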
Extracts bench and website targets into dedicated module files (bench/bench.just, website.just) so the root justfile stays focused on the top-level gate. Also fixes gitignore to cover python cache and the old root-level /bin/ artifact directory. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
bench/bin/iperf3 is obsolete (iperf3 now lives under bench/vm/staging/, covered by bench/.gitignore). bench/results duplicates the results/ entry already in bench/.gitignore. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Broadcast::send() returns Err(SendError) when there are no active
receivers, not when receivers are lagging — this silently killed
run_udp_recv on the exit side if a UDP echo arrived before the
run_send_responses task had subscribed.
The subscription was created inside the spawned task, after open_uni()
returned. For WebSocket/yamux, open_uni() requires a round-trip through
the yamux driver; a fast UDP echo from the far side of the tunnel could
arrive and be sent to the broadcast channel before the subscription was
established.
Fix: subscribe() is called before the task is spawned and before
open_uni(), so buffered messages are preserved even during stream setup.
Also renames run_data_out_{instructions,responses} to
run_send_{instructions,responses} and changes signatures to accept
a pre-created broadcast::Receiver<T> instead of &broadcast::Sender<T>,
making the subscribe-before-open contract explicit and compiler-enforced.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
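The ordering problem can be modelled without the Rust code, which this page does not show. The sketch below is a Python toy model, not tokio: the `Broadcast` class only imitates the one property that matters here (send fails when nobody has subscribed), and every name in it is hypothetical.

```python
import queue


class SendError(Exception):
    """Raised when a message is sent while there are no receivers."""


class Broadcast:
    """Toy channel: send() fails unless at least one receiver exists."""

    def __init__(self):
        self._receivers = []

    def subscribe(self):
        rx = queue.SimpleQueue()
        self._receivers.append(rx)
        return rx

    def send(self, msg):
        if not self._receivers:
            raise SendError(msg)
        for rx in self._receivers:
            rx.put(msg)
        return len(self._receivers)


def run_send_responses(rx, open_stream):
    """Accepts an existing receiver, so the caller must subscribe first."""
    stream = open_stream()  # stands in for the slow stream setup
    while not rx.empty():
        stream.append(rx.get())
    return stream


channel = Broadcast()
rx = channel.subscribe()            # subscribe before any stream setup
channel.send(b"early echo")         # arrives while the stream is still opening
delivered = run_send_responses(rx, list)
```

Passing the receiver in, rather than the sender, is what moves the ordering requirement from a comment into the function signature.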
socat -T 5 - UDP4:... sends a 0-byte UDP datagram when stdin reaches EOF (after the payload file is fully read). This races against the real echo response arriving through the tunnel — if the 0-byte echo arrives first, socat writes 0 bytes to the output file and exits, producing a SHA256 mismatch (response_size=0, hash matches empty string). nc -u does not have this EOF-signalling behaviour; it reads stdin, sends it as a single datagram, then waits for the -w timeout before exiting. Both QUIC and WebSocket smoke tests pass reliably with nc. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… style AGENTS.md: expand naming conventions section to cover domain logic vs transport layer distinction, prohibited directional terms, required vector-based terminology, and CLI consistency rules. TODO.md: add backlog item for topology/directional naming refactor. wallhack.rs: remove redundant info log on default startup; use let-else chain for entropy check (more idiomatic). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
run_tests.py:
- Print saved log directory path on failure so logs are immediately findable without having to browse bench/results/logs manually.
- Bump default failure tail from 50 → 150 lines; 50 was routinely missing the interesting early-boot context on init script failures.
init.sh:
- On UDP sha256 mismatch, distinguish empty response (0 bytes) from a genuine corrupt response with a specific, actionable error message. The original "UDP echo mismatch: expected=... got=e3b0c44..." output silently hid that the response was empty: e3b0c44 is the SHA256 of the empty string, which was opaque during debugging.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
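The empty-versus-corrupt distinction is easy to reproduce. The check in the PR is done in init.sh; this Python sketch only illustrates the idea, and the function name and message wording are hypothetical.

```python
import hashlib

# SHA256 of zero bytes; its hex digest begins with e3b0c44.
EMPTY_SHA256 = hashlib.sha256(b"").hexdigest()


def classify_udp_echo(sent: bytes, received: bytes) -> str:
    """Compare digests and say whether the reply was empty or corrupt."""
    expected = hashlib.sha256(sent).hexdigest()
    got = hashlib.sha256(received).hexdigest()
    if got == expected:
        return "ok"
    if got == EMPTY_SHA256:
        return f"empty response (0 bytes), expected {len(sent)} bytes"
    return f"corrupt response: expected={expected} got={got}"


payload = b"U" * 64  # the deterministic UDP payload described earlier
print(classify_udp_echo(payload, b""))
```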
…ebsite.just bridge.rs: replace |r| std::mem::discriminant(r) with std::mem::discriminant (clippy::redundant_closure). website.just: add `set working-directory := "website"` so pnpm commands run from the package root; the module was executing from the repo root where no package.json exists, causing `just check` to fail. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Start entry VM first (it's the listener) and wait for
WALLHACK_ENTRY_READY_MAGIC_TOKEN before starting exit. The exit node
now connects on its first attempt, eliminating the ~500ms retry backoff
that occurred when exit started while entry wasn't yet listening.
WALLHACK_ENTRY_READY_MAGIC_TOKEN is emitted after wallhack binds its
listen port, detected via /proc/net/{udp,tcp} poll at 100ms intervals —
no arbitrary sleeps.
Also:
- INITIAL_RETRY_DELAY: 1s → 50ms (exit + relay)
- Add RECONNECT_DELAY: 500ms for post-session drops with reset
- Remove unused _wait_for_host from init.sh
- Extract _qemu_base() to DRY up QEMU machine args in run_tests.py
- Add debug-shell subcommand: boots single interactive busybox VM
via rdinit=/bin/sh for kernel/OS debugging
- gzip -1 for faster initrd packing
Result: wallhack handshake drops from ~1s → ~62ms (roughly 16×).
Smoke suite: 2/2 pass in ~24s wall clock (both transports).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
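The readiness check above polls the kernel's socket tables instead of sleeping. In the PR this is done by init.sh inside the VM; the Python sketch below shows the same idea with hypothetical helper names, and the `read` parameter is there only so the loop can be exercised without a real /proc. It relies on the /proc/net/{udp,tcp} layout in which the second column is `ADDRESS:PORT` in hexadecimal.

```python
import time


def port_is_bound(proc_net_text: str, port: int) -> bool:
    """True if any socket in a /proc/net/{udp,tcp} dump uses `port` locally."""
    for line in proc_net_text.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 2:
            continue
        _, _, hex_port = fields[1].rpartition(":")
        try:
            if int(hex_port, 16) == port:
                return True
        except ValueError:
            continue
    return False


def _read_file(path):
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


def wait_for_port(port, paths=("/proc/net/udp", "/proc/net/tcp"),
                  interval=0.1, timeout=5.0, read=_read_file):
    """Poll every `interval` seconds until `port` is bound or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        if any(port_is_bound(read(path), port) for path in paths):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For TCP this matches any socket with that local port, not only listeners; a stricter check would also compare the state column.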
busybox nc -u -w N waits the full N-second inactivity timeout after receiving the echo reply, even though the response is already in hand. 1s is still generous for any realistic tunnel RTT (netem max is 100ms). Smoke suite: 2/2 pass in ~8s (was ~24s). Also add `just bench shell` recipe: boots a single interactive busybox VM via rdinit=/bin/sh for kernel/OS debugging. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parse_listen_addr expanded bare ':port' to '[::]:port', so the display always claimed IPv6 regardless of intent. Two fixes:
1. Expand ':port' to '0.0.0.0:port', an explicit IPv4 wildcard that matches the common case and the exit node's connect behaviour. IPv6 listeners can still be configured explicitly via '[::]:port'.
2. Add local_addr() to the Server trait (impl via endpoint/listener) and use it for the "Listening on ..." log line, so the display reflects the address the OS actually bound rather than the pre-bind parsed argument.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
0.0.0.0 was a breaking change — [::] on Linux is dual-stack by default (IPV6_V6ONLY=0), accepting both IPv4 and IPv6 clients, which is the desired behaviour. The confusing display was the real issue, not the bind address. local_addr() now shows the actual OS-bound address ([::]:port) which is accurate. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add crates/cli/src/net.rs with SocketAddrExt trait (.bind_addr()) and shared parse_listen_addr() that probes IPv6 availability and falls back to 0.0.0.0 on IPv4-only kernels, and resolves hostnames via ToSocketAddrs
- Remove three duplicate parse_listen_addr() functions and bind_for() from exit.rs; all callers now use crate::net
- Fix all ClientConfig constructions in entry.rs and relay.rs to set bind: addr.bind_addr() instead of inheriting DEFAULT_BIND_ADDRESS (Ipv6Addr::UNSPECIFIED), which crashed on IPv4-only kernels
- All listen displays now use server.local_addr()? (actual bound address) instead of the pre-bind parsed address
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- bench/vm/kernel/Dockerfile: add -d IPV6 to build an IPv4-only kernel, add COPY of .config to final stage so kernel config is inspectable
- bench/vm/init.sh: explicitly bind entry on 0.0.0.0 for IPv4-only kernel, reduce _wait_for_tun timeout from 45s to 5s
- bench/run_tests.py: minor formatting fixes (black style), add panic=-1 to debug-shell kernel cmdline so crashes exit immediately
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace the hardcoded DEFAULT_BIND_ADDRESS (Ipv6Addr::UNSPECIFIED) with a runtime probe using socket2: ask the kernel to create an AF_INET6 DGRAM socket — no port allocated, purely a capability check. Falls back to Ipv4Addr::UNSPECIFIED on IPv4-only kernels. ipv6_supported() is public in wallhack::client::config so cli::net can reuse it for parse_listen_addr() bare-port expansion, keeping the probe logic in one place. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
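The capability check translates directly to other socket APIs. Here is a minimal Python sketch of the same probe plus the bare-port expansion it feeds; the Rust code uses socket2 and is not shown on this page, so `expand_listen_addr` and its `ipv6` parameter are hypothetical stand-ins.

```python
import socket


def ipv6_supported() -> bool:
    """Ask the kernel for an AF_INET6 datagram socket; no port is bound."""
    try:
        sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    except OSError:
        return False
    sock.close()
    return True


def expand_listen_addr(arg: str, ipv6: bool) -> str:
    """Expand a bare ':port' to a wildcard address; leave anything else alone."""
    if arg.startswith(":") and arg[1:].isdigit():
        wildcard = "[::]" if ipv6 else "0.0.0.0"
        return f"{wildcard}{arg}"
    return arg


print(expand_listen_addr(":4433", ipv6_supported()))
```

Creating a socket without binding it keeps the probe free of side effects: nothing is reserved, and the result depends only on whether the address family exists in the running kernel.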
- bench/vm/kernel/Dockerfile: remove -d IPV6, restore default kernel (IPv6 available via olddefconfig default y)
- bench/vm/init.sh: revert 0.0.0.0 back to bare :port so parse_listen_addr probe mechanism is exercised in the bench environment
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
socat version bumps no longer invalidate the kernel layer cache (~5min build). Each now has its own Dockerfile and just recipe with independent caching. bench/vm/socat/Dockerfile uses a lighter apt install (no kernel build tools). build-initrd now depends on build-kernel and build-socat separately. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- bench/vm/socat/Dockerfile: remove default from ARG SOCAT_VERSION; version is required from caller (bench.just is sole source of truth), consistent with KERNEL_VERSION pattern
- client/config.rs: add #[must_use] and AF_INET6 backtick to ipv6_supported()
- exit.rs: remove unused Context import, backtick INITIAL_RETRY_DELAY in doc, drop redundant ..Default::default() where all fields are explicit
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Eliminate ~150 lines of duplicated code between run_tests.py and run_benchmarks.py by pulling shared constants, preflight, QEMU helpers, log drain, and wait utilities into a common module. qemu_cmd() is unified with optional netem/metric kwargs so both runners share a single implementation. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Switch inter-VM networking from AF_UNIX socketpair + fd= to TCP
listen/connect ports. The socketpair approach triggered a QEMU 10.2.0
assertion (net_fill_rstate: size == 0) under high-throughput iperf3
TCP load. TCP listen/connect avoids this code path entirely and matches
the TASK.md spec.
- Replace ping-based latency with iperf3 --json mean_rtt. Busybox ping
hung indefinitely waiting for late ICMP replies when packets were
dropped. iperf3 TCP RTT (mean_rtt field, microseconds) is more reliable
and consistent with the other benchmark scenarios.
- Fix iperf3 JSON parsing: iperf3 3.20 uses tab after colon
("field":\tvalue) not space, so -F': ' never split. Use match() to
extract the first digit sequence on the matching line.
- Switch all throughput scenarios from plain text -f m (integer Mbps) to
--json bits_per_second / 1e6 for sub-Mbps precision.
- Fix wallhack_version() parsing: --version output is multi-line;
split()[-1] was grabbing the compiler date. Use split()[1] instead.
- Add docstring to qemu_base() documenting append vs extra parameters.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
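The separator problem in the iperf3 parsing fix can be shown in a few lines. The real extraction runs inside the VM and is not shown on this page; this Python sketch uses a hypothetical `extract_number` helper and made-up sample values, and keeps only the field names mean_rtt and bits_per_second from the commit text.

```python
import re


def extract_number(text: str, field: str):
    """Return the first numeric value of "field", whatever follows the colon."""
    pattern = re.compile(
        r'"' + re.escape(field) + r'"\s*:\s*(\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)'
    )
    match = pattern.search(text)
    return float(match.group(1)) if match else None


def to_mbps(bits_per_second: float) -> float:
    """Convert bits per second to Mbps without rounding to an integer."""
    return bits_per_second / 1e6


# Tab after the colon, as described above; the values are invented.
sample = '\t\t"mean_rtt":\t1234,\n\t\t"bits_per_second":\t512345.5\n'

first_line = sample.splitlines()[0]
pieces = first_line.split(": ")  # a colon-space separator never matches here
rtt_us = extract_number(sample, "mean_rtt")
mbps = to_mbps(extract_number(sample, "bits_per_second"))
```

Where a full JSON parser is available, decoding the whole document sidesteps the whitespace question entirely; matching on the field name is the fallback for environments limited to line-oriented tools.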
maxholman
added a commit
that referenced
this pull request
May 6, 2026
Sweep of website/ deps to latest within ranges, plus a vite downgrade from 8 -> 7 to match astro's transitive vite (7.3.2) and avoid a rolldown regression with @tailwindcss/vite 4.2.4.
Closes alerts #28 #29 #30 #31 #33 #34 #35 #36 #37 #38 #39 #40 #44 #48 covering vite, picomatch, postcss, yaml, astro, smol-toml.
- vite ^8.0.1 -> ^7.3.2 (drops the now-redundant vite 8 lineage; astro pulls 7.3.2 transitively, which is the patched version)
- astro 6.0.6 -> 6.2.2 (#44)
- @tailwindcss/vite 4.2.2 -> 4.2.4
- smol-toml: lockfile bump to 1.6.1 (#28)
- postcss: lockfile bump to 8.5.14 (#48)
- picomatch: lockfile bumps to 2.3.2 + 4.0.4 (#29 #30 #39 #40)
- yaml is now omitted entirely (it was an optional vite peer)
Verified: pnpm build succeeds; no @tailwindcss/vite peer-dep warnings.
Summary
QEMU VM integration test harness (`bench/`): two microVMs connected via virtio-net over a Unix socketpair. Entry VM runs `wallhack entry`, exit VM runs `wallhack exit`. The entry VM runs a deterministic payload transfer (TCP + UDP echo via the tunnel) and reports a JSON result token. The Python test runner (`run_tests.py`) orchestrates both VMs, drains their stdout into ring buffers, and checks for pass/fail tokens. Covers smoke (connectivity) and resilience (netem loss/delay) scenarios for QUIC and WebSocket transports.

UDP broadcast race fix (`fix(transport): subscribe before spawn`): `broadcast::send()` returns `Err(SendError)` when there are no active receivers, not when receivers are lagging. The subscription was created inside the spawned `run_send_responses` task, after `open_uni()` returned. For WebSocket/yamux, `open_uni()` requires a round-trip through the yamux driver; a fast UDP echo could arrive before the subscription was established, killing `run_udp_recv` silently. Fix: subscribe before spawning and before `open_uni()`. Functions renamed `run_data_out_{instructions,responses}` → `run_send_{instructions,responses}` with signatures that accept a pre-created `broadcast::Receiver<T>`, making the subscribe-before-open contract compiler-enforced.

socat EOF datagram fix (`fix(bench/vm)`): `socat -T 5 - UDP4:...` sends a 0-byte UDP datagram when stdin reaches EOF. This races against the real echo response through the tunnel: if the 0-byte echo arrives first, socat writes 0 bytes and exits, producing a SHA256 mismatch (`e3b0c44` = hash of empty string). Replaced with `nc -u -w 5`, which has no EOF-signalling behaviour.

Debugging improvements: failure log tail bumped 50 → 150 lines; saved log directory path printed on failure; UDP empty-response error distinguishes 0-byte from corrupt with an actionable message.
Test plan
- `just bench::smoke`: 2/2 scenarios pass (QUIC + WebSocket)
- `just bench::resilience`: 6/6 scenarios (netem loss/delay combinations)
- `just check`: fmt, clippy, cargo tests, smoke, resilience