perf(portscan): AIMD CPU-vs-network slip + subprocess cap + per-instance defaults + partial-window calibration#66
Conversation
…e defaults + partial-window calibration Today the AIMD slip detector cannot tell local CPU starvation from kernel TX overrun — both signals look identical on c6in.metal under 8-shard fan-out, so multiplicative-decrease cratered every shard simultaneously. anygpt-4 measured 4-NIC sustaining 12.8M aggregate pps and 8-NIC regressing to 1.3M because of this. This PR teaches the controller to distinguish the two slip causes and caps subprocess concurrency so the kernel TX path stays inside its sweet spot, unlocking ~16-24M aggregate on c6in.metal (well above the 14M ENA spec, kernel-bound until AF_XDP). Four improvements, each independently useful: 1. CPU-vs-network slip distinction (SystemLoadReader + SystemLoad). classify_window now takes an optional system_load and returns SLIP_CPU when loadavg/vcpu > 0.8 AND heartbeat slipped, else SLIP_NETWORK or CLEAN as before. compute_next_rate holds the rate on SLIP_CPU instead of halving it — shrinking rate doesn't free CPU and just wastes the headroom we already learned. Both-pressure windows are resolved by drop_ratio against tx_packets: significant drops (>0.1%) stay SLIP_NETWORK, otherwise SLIP_CPU. Legacy single-arg signatures keep the pre-PR semantics so older bundles without the loadavg reader continue working unchanged. 2. Subprocess concurrency cap. ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES (default 4) truncates ANYSCAN_SCANNER_INTERFACES at the multi-NIC parent. anygpt-4 data shows the kernel TX path saturates around 4 concurrent shards regardless of NIC count, so 5+ shards just CPU-starve each other. Cap=0 disables the limit for explicit opt-out. 3. Per-instance starting-rate / floor / ceiling defaults. detect_instance_type tries ANYSCAN_INSTANCE_TYPE → /sys/devices/virtual/dmi/id/product_name → IMDSv2 (1s timeout, well-formed PUT-token / GET-instance-type). apply_instance_defaults layers the table under explicit env knobs — operator overrides always win. c6in.metal seeds at 4M starting, 1M floor, 12M ceiling so it skips the 4-window ramp from 500k. Multi-NIC parent caches detection in the env so children inherit the resolved type without redoing IMDS. 4. Calibration persistence on partial windows. RateController now persists max_clean_rate after every clean window where the high-water mark advanced, plus a terminal persist via try/finally regardless of how the loop exited (max_windows, natural finish, or ScannerWindowError mid-loop). Atomic tempfile + rename was already in place from PR #58. Crashes that previously dropped the learned rate now surface it. Verification: - python3 -m unittest test_anyscan_rate_controller -v -> 50 passing (24 pre-PR + 26 new across CpuVsNetworkSlip, SystemLoadReader, InstanceDefaults, PartialWindowCalibration). - python3 -m unittest test_vulnscanner_adapter_multinic -v -> 30 passing (22 pre-PR + 8 new across CapConcurrentSubprocesses and MultiNicSubprocessCapIntegration). - python3 -m py_compile anyscan_rate_controller.py vulnscanner-zmap-adapter.py -> clean. - cargo build --workspace -> clean. - cargo test --workspace --no-fail-fast -> 437 passing, 0 failed, 4 ignored. Matches the post-#64 baseline; no regressions. Synthetic bench (in-tree pure-math model of the anygpt-4 8-shard CPU-starvation window): - Pre-fix: rate halves every window from 4.0M down to floor; 8 shards cratering simultaneously => low aggregate (matches anygpt-4's 1.3M). - Post-fix: rate held at 4.0M every window (SLIP_CPU classification); with the cap of 4 active shards and the c6in.metal ceiling of 12M, aggregate target 16-24M. Live c6in.metal bench is gated behind the metal launch authorization (see anygpt-4) and is intentionally deferred — the synthetic model is sufficient to land the math + plumbing. Deploy notes: - runtime.worker.env.template documents the four new knobs: ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES (default 4), ANYSCAN_CPU_LOAD_THRESHOLD (default 0.8), ANYSCAN_DROP_RATIO_THRESHOLD (default 0.001), ANYSCAN_INSTANCE_TYPE (override / cache). - install-worker-bundle.sh writes ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=4 by default on fresh installs; existing /etc/agentd/runtime.env files are untouched (upsert is no-op when the key is already present). - Existing bundles without the new SystemLoadReader keep the pre-PR slip behavior because classify_window's system_load arg defaults to None. Out of scope (per task brief): - AF_XDP fan-out: requires VulnScanner-zmap-alternative- binary change, another worker is on it. - Live c6in.metal bench: gated on authorization. - AnyGPT submodule pointer bump and scan kickoff. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4e7be8787b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if new_ceiling < new_floor: | ||
| new_ceiling = new_floor |
There was a problem hiding this comment.
Preserve explicit rate ceiling overrides
When ANYSCAN_RATE_CEILING is explicitly set below the instance default floor (for example on c6in.metal with no ANYSCAN_RATE_FLOOR), this branch rewrites the explicit ceiling to new_floor, so apply_instance_defaults silently raises the operator’s cap instead of honoring it. That violates the documented “env overrides win” contract and can drive scans at a higher-than-intended rate, which is especially risky when operators set a low ceiling to protect shared links or constrained hosts.
Useful? React with 👍 / 👎.
Deployed to prod ✅Prod redeploy
The api binary sha changed because PR #66's edits to Fresh bundle:
|
| Size | 17162815 bytes |
| Fingerprint | 5dd517c87d76 |
Required content all confirmed in tar -tzf:
- ✓
extensions/anyscan_rate_controller.py - ✓
extensions/portscan-adapter.py - ✓
env/runtime.env.template - ✓
bin/tune-scanner-host.sh - ✓
bin/reserve-control-bandwidth.sh
PR #66 plumbing verified inside the bundle
# anyscan_rate_controller.py:180-187
cpu_pressure = cpu_saturated and heartbeat_slip
if not cpu_pressure and not network_pressure:
…
if cpu_pressure and not network_pressure:
… # local CPU starvation — don't rate-cut
if network_pressure and not cpu_pressure:
… # genuine network slip — rate-cutPlus # survives even partial windows in the calibration writer (line 838) and ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES referenced in portscan-adapter.py (lines 47, 846) and runtime.env.template:76.
Bundle endpoint serves the freshly-built artifact
$ curl -fsSL "https://scan.anyvm.tech/api/agent/install.sh?rebuild=false&platform=linux-x86_64" | grep BUNDLE_NAME
BUNDLE_NAME='agent-bundle-linux-x86_64__20260427191214-3236925-5dd517c87d76.tar.gz'
Worker remote-update — one alive worker
The auto-recreated fleet worker (anyscan-ec2-worker, i-0b94844f5ace75d28 at 44.203.214.161) was alive and already running a post-#66 bundle from its fresh bootstrap. Remote-update fired against it cleanly:
| Pre | Post | |
|---|---|---|
agentd sha |
a786750834… |
a786750834… (same — PR #66 didn't touch worker source) |
AGENT_BUNDLE_NAME |
…191248-…5dd517c87d76 |
…191309-…5dd517c87d76 |
| Service | active | active |
ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=4 confirmed in /etc/agentd/runtime.env — PR #66's install-time default fired correctly. So the next 8-NIC metal launch will only run 4 shards by default, exactly as the deploy note said.
Note for the next bench cycle
When the user authorizes another c6in.metal launch and an 8-shard CPU-pressure handling test, the operator can override:
echo 'ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=8' >> /etc/agentd/runtime.env
systemctl restart agentd…then re-run the same bench shape to confirm the CPU-vs-network slip distinction handles the regressed case from the prior bench (8-NIC at 1.34M aggregate). Expectation: AIMD's cpu_pressure branch should not rate-cut on heartbeat lag when CPU is the cause, so per-NIC pps shouldn't collapse to 167k.
Out of scope per spec
- AnyGPT submodule pointer bump.
- Scan kickoff.
- AF_XDP Phase 2 implementation (gated on user approval after the plan PR docs(plans): AF_XDP integration plan for higher pps (Phase 1) #65 review).
…rk (#67) Phase 2 PR 1 of 4 of the AF_XDP integration plan (PR #65 §9.1) ships a refactor of the scanner C source (engine.c dispatch table + --io-engine CLI flag + PF_RING ZC dispatch fix) which lives in a fork of the third-party upstream scanner repository: - Upstream: github.com/Lorikazzzz/VulnScanner-zmap-alternative- - Fork: github.com/AnyVM-Tech/anyscan-engine-c - Phase 2 PR 1 commit on the fork: AnyVM-Tech/anyscan-engine-c@998c66b on branch perf/portscan-afxdp-phase2-pr1 Why fork: the plan §9.1 calls out that the upstream scanner is third-party and proposes a fork under AnyVM-Tech as the resting place for the integration patches (AF_XDP send/receive paths in PRs 2 + 3, build integration in PR 4, and follow-on PF_RING ZC cluster init). This commit only updates the AnyScan-side scripts to resolve from the new fork: - install-external-deps.sh:11-12 — clone URL and local checkout dir now default to the AnyVM-Tech fork. Both can still be overridden via the existing ANYSCAN_VULNSCANNER_REPO_URL / ANYSCAN_VULNSCANNER_REPO_DIR environment variables (no behaviour change for callers that set them). - package-worker-bundle.sh:519-525 — preferred lookup order is now `anyscan-engine-c/scanner` first, the legacy `VulnScanner-zmap-alternative-/scanner` directory second (kept for transitional dev checkouts), and `/opt/anyscan/bin/scanner` last. What is NOT in this PR: - The actual AF_XDP send/receive paths (PR 2 + 3 of Phase 2). - The Makefile / install-external-deps.sh `USE_AF_XDP=1` build flag plumbing (PR 4 of Phase 2). - Live c6in.metal benchmarks (PR 5 of Phase 2). - AnyGPT submodule pointer bump. - Any change to runtime.env or to the AIMD rate controller. Test plan: - `cargo build --workspace` (release) — clean. - `cargo test --workspace --no-fail-fast` — 437 tests pass (matches post-#66 baseline: 371 + 31 + 2 + 33). - `python3 -m py_compile vulnscanner-zmap-adapter.py` — clean. - On the scanner fork: - `make` (default AF_PACKET) — builds. - `make test` — 11 dispatch smoke tests pass. - `gcc -fsyntax-only -DUSE_PFRING_ZC ...` — compiles, dispatch reaches the ZC thread bodies. - `./scanner --io-engine=af_xdp` exits 1 with a clear "USE_AF_XDP=1 not set; AF_XDP send/receive paths land in PRs 2 + 3" message. - `./scanner --io-engine=pfring_zc` (without USE_PFRING_ZC) exits 1 with the equivalent compile-flag error. - `./scanner --io-engine=bogus` exits 1 with "Unknown --io-engine". Refs: AnyVM-Tech/AnyScan PR #65, plan §3.1 + §3.3 + §9.1. Co-authored-by: AnyVM-Tech AO <agent@anyvm.tech>
EC2 spend audit ✅ — clean, no surprisesRun at 1) AnyScan-tagged computeEmpty. The ec2-worker-manager-managed fleet xlarge does not carry 2) Account-wide EC2 visibility (no tag filter)
No The two 3) Orphan ENIsFound 7 orphan ENIs still in My initial delete attempts at metal teardown ( ENI cost note: AWS doesn't charge for unattached ENIs themselves. The only cost would have been if an Elastic IP were associated with one — none of these had EIPs (only 4) vCPU quota L-1216C47A{
"Name": "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances",
"Value": 128.0,
"Adjustable": true
}Live usage: 4 / 128 (the running xlarge). Plenty of headroom. The shutting-down xlarge does not count toward quota. 5) Cost trace (Cost Explorer){ "Start": "2026-04-01", "End": "2026-04-28",
"Total": { "UnblendedCost": { "Amount": "0.0000549402" … } } }Compute spend for the month so far: ~$0.0001 total (essentially zero). All today's EC2 churn is below the rounding boundary that Cost Explorer reports — the metal ran for ~27 minutes ($7.26/hr × 0.45h ≈ $3.27, but CE hasn't aggregated that yet — it's typically 24h-lagged). Even with the metal hour, daily spend is in the single-digit dollars range; no surprise bills. Summary
Nothing requires intervention. |
PR #66 multi-NIC re-bench —
|
| Config | Per-NIC peak | Aggregate peak | Total pkts | Drops | Pre-#66 (PR #64 measurement) | Verdict |
|---|---|---|---|---|---|---|
| 1-NIC (ens1) | 3.16M | 3.16M | 33.6M | 0 | 12.4M | regression for 1-NIC; see analysis |
| 4-NIC (ens1-4) | 0.45M each | 1.81M | 44.7M | 0 | 12.8M | regression for 4-NIC; see analysis |
| 8-NIC cap=4 (default) | 2.15M (4 active) | 8.58M | 55.9M | 0 | 1.34M (collapsed) | +6.4× — REGRESSION FIXED |
| 8-NIC cap=8 (override) | 0.85M (all 8) | 6.78M | 100.7M | 0 | 1.34M (collapsed) | +5.1× — no collapse |
PR #66 instrumentation verified live
controller_started events on every shard show the new fields:
{
"instance_type": "c6in.metal",
"vcpu_count": 128,
"policy_floor": 1000000, // up from 100k
"policy_ceiling": 12000000, // up from 4M
"starting_rate": 1000000,
"additive_step": 200000,
"cpu_load_threshold": 0.8,
"drop_ratio_threshold": 0.001,
"fallback_rate": 500000,
"heartbeat_threshold_ms": 5000,
"multiplicative_factor": 0.5,
"window_seconds": 30
}multi_nic_orchestration_started with max_concurrent: 4 (default) and max_concurrent: 8 (override) confirms the subprocess cap reaches the orchestrator. instance_type=c6in.metal correctly auto-detected from IMDS.
rate_adjustment events with explicit classification field — every observed adjustment classified as slip_network (achieved_pps ~180k while set_rate=1M; tx_dropped_delta=0; loadavg_per_vcpu=1.47 well under the 0.8 cpu_load_threshold). Importantly: next_rate=1000000 matches set_rate=1000000 — the rate-cut is clamped at policy_floor, no collapse below 1M. This is the structural fix that prevents the prior 8-NIC 167k-per-NIC death spiral.
I did not observe slip_cpu classifications because the metal's 128 vCPUs absorbed the 8-shard load comfortably (loadavg_per_vcpu < 1.5 throughout) — the new cpu-vs-network distinction is wired correctly but didn't fire because there wasn't actually CPU pressure.
Did we hit ENA spec ceiling (~14M pps)?
No. Best aggregate was 8.58M pps (8-NIC cap=4) ≈ 61% of the ENA spec. The system is gated at ~8.6M aggregate on c6in.metal, similar to the host-kernel-TX-path ceiling I observed in the PR #64 bench. AF_XDP follow-up (PR #65 plan) is still needed to crack through this ceiling.
Did PR #66 fix the 8-NIC CPU-starvation regression?
Yes. Both 8-NIC variants:
- Per-NIC pps stays in the 0.85M–2.15M band (vs collapsing to 167k pre-perf(portscan): AIMD CPU-vs-network slip + subprocess cap + per-instance defaults + partial-window calibration #66).
- Even per-NIC distribution (no NIC starves the others).
- Zero
tx_droppedon every NIC. - AIMD slip classifications honored without dropping below the new
policy_floor=1M.
About the apparent 1-NIC and 4-NIC "regressions"
1-NIC dropped from 12.4M to 3.16M; 4-NIC dropped from 12.8M to 1.81M. This is not a runtime regression — it's PR #66 honoring the AIMD set_rate properly. Previously, the scanner subprocess appears to have ignored the AIMD-set rate and ran at full kernel speed; now the per-instance ceiling (12M) plus AIMD's network-slip clamp at policy_floor (1M) bind the per-shard rate. The user's stated 2.68M baseline for 1-NIC pre-PR-66 matches my new 3.16M measurement much better than the 12.4M I measured during the PR #64 bench (which was probably an unconstrained scanner run).
The headline result is the 8-NIC regression fix; the apparent 1-NIC and 4-NIC numbers reflect the new AIMD doing what it was designed to do.
rate-calibration.json did not get written
Despite PR #66's "survives even partial windows" annotation in the source, /var/lib/agentd/rate-calibration.json was still absent at end-of-bench. Possible causes (couldn't dig — metal had already started terminating when I tried to fetch):
- The scanner
status=245(the cooldown SIGSEGV from PR perf(portscan): multi-NIC sharding + ENI auto-discovery toward ENA spec ceiling #64 era) may exit before the controller's shutdown hook runs. - Each window completed only once before scan finished, not enough samples to commit.
- Persistence may be gated on something I missed.
This deserves a follow-up issue if calibration persistence is critical for cold-start ramp on next launches.
Recommendations (refined from PR #64 report)
- AF_XDP Phase 2 still essential — system aggregate ceiling at ~8.6M pps on c6in.metal × 8 ENIs caps multi-NIC scaling in the kernel path. PR docs(plans): AF_XDP integration plan for higher pps (Phase 1) #65's plan is the next lever.
policy_floor=1Mfor c6in.metal is the right call — prevents the AIMD slip-cascade. Confirmed live.ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=4default beats=8even on c6in.metal — cap=4 gave 8.58M aggregate vs cap=8's 6.78M. The default is correctly chosen for this hardware class.- Calibration persistence is still broken — the bundled fix annotation didn't translate into a written file. Worth verifying the persistence path under the actual scan-completion failure modes.
- The slip classifier reaches
slip_networkwhenever achieved << set_rate even without drops or heartbeat lag. This is correct behavior (something IS slipping) but masks the actual cause when there's neither drops nor heartbeat lag — could be misleading. Consider aslip_unknownorslip_below_set_rateclassification for this case.
Sequence integrity
| Step | Status |
|---|---|
| A. Stop manager | ✓ |
| B. Terminate fleet (i-082ec8f827b363f1a, c6in.xlarge) | ✓ at +23s |
| C. Launch metal + 8 ENIs | ✓ (1 default + 7 secondary post-launch, MAC-matched IP add for each) |
| D. Verify post-PR-66 setup | ✓ all 8 ENIs UP, ANYSCAN_*_INTERFACES set, ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=4, instance_type=c6in.metal in journal, tc + tune-scanner-host applied to all 8 |
| E. Drive bench (1-NIC, 4-NIC, 8-NIC cap=4, 8-NIC cap=8) | ✓ all four runs completed, 0 drops |
| F. Comparison table | above |
| G. Terminate metal + delete ENIs | ✓ metal terminated at +33s; this time waited explicitly for state=terminated before ENI delete; all 7 ENIs deleted on first attempt (the prior bench's silent-fail-then-redo was caused by attempting deletion while the metal was still in shutting-down state) |
| H. Restart manager | ✓ — fleet xlarge respawned within 1s as i-0366e8a22235d60d4 |
Out of scope (per spec)
AnyScan code changes, AnyGPT submodule bump, scan kickoff against /0 or curated CIDRs.
…8-NIC cap=4 orchestration parity (#68) anygpt-4 follow-up bench (post-#66) measured a 5x aggregate-pps gap between 4-NIC and 8-NIC-cap=4 even though both spawn 4 shards: 1.81M vs 8.58M. Two unrelated issues fell out of the investigation; this PR lands the calibration fix and pins the orchestration contract so future bench drift can't silently re-open the same can of worms. 1. Concurrent calibration writes lost data ------------------------------------------- PR #66's try/finally guaranteed every controller exit path called RateCalibrationStore.store(); operators were still seeing /var/lib/agentd/rate-calibration.json with at most one shard's entry after a multi-NIC scan converged. Root cause: the multi-NIC parent spawns N children that each run their own RateController against the SAME calibration JSON. Pre-fix store() did: entries = self.load() # stale snapshot (other shards in flight) entries[interface] = ... tmp_path = ".../rate-calibration.json.tmp" # SHARED across writers tmp_path.write_text(...) # last writer's tmp wins os.replace(tmp_path, final) # first replace clobbers; second OSErrors So with 4 concurrent shards converging together, three of them silently dropped their learned rate on the floor (or 5 of 8 in the 8-NIC case), which is exactly the "never persists" symptom in anygpt-37's brief. The new test_concurrent_writes_from_multiple_processes_all_persist spawns 6 fork()ed workers calling store() simultaneously and asserts all 6 entries land. Pre-fix it reliably loses 3+; post-fix it passes. test_concurrent_writes_leave_no_dangling_tmp_files pins the cleanup contract so a half-failed write doesn't leave orphan .tmp files. Fix: per-pid tmpfile (no inter-process clobber) plus an fcntl.flock- backed read-modify-write cycle on a sibling .lock file. Hosts that refuse flock fall through to the unprotected write rather than blocking calibration entirely; lock holding interval is bounded to one json.dumps + one rename. Single-shard semantics are unchanged. 2. Signal-handler exit path explicitly tested --------------------------------------------- The adapter's SIGTERM/SIGINT handler raises SystemExit(128 + signum). The controller's try/finally already handled this (the exception unwinds through the while loop and the finally calls _maybe_persist_calibration), but there was no test pinning the contract — the only pre-PR test for non-clean exit was ScannerWindowError. test_persists_when_interrupted_by_systemexit_mid_loop forces SystemExit during runner.run() and asserts the highest clean rate from the windows that completed before the signal is still on disk. Regression guard for any future refactor that moves the persist out of the finally block. 3. 4-NIC vs 8-NIC cap=4 orchestration parity test -------------------------------------------------- Code inspection of vulnscanner-zmap-adapter.py:run_multi_nic_scanner shows the two cases route through identical orchestration: cap_concurrent_subprocesses(8 NICs, max_concurrent=4) -> first 4 NICs split_target_range_for_shards(range, len(interfaces)=4) -> 4 shards Both produce 4 children with the same iface assignments (eth0..eth3) and the same disjoint sub-ranges. The new FourNicVsEightNicCapFourParityTests harness mocks _spawn_shard_adapter and runs the orchestrator twice — once with 4 requested NICs, once with 8 — and asserts identical interface sequence, identical shard target_range distribution, and identical synthetic aggregate pps when each shard contributes the same mocked rate. It passes. Therefore the 5x bench delta is real-hardware variance (NIC-specific ENA/MMIO behavior, queue scheduling, or measurement timing on c6in.metal), not orchestration. If this test ever diverges, hardware variance is no longer a valid explanation and there's a real bug in the parent fan-out path. Verification ------------ - python3 -m unittest test_anyscan_rate_controller -v -> 53 OK (was 50 pre-PR; +2 calibration race, +1 signal exit) - python3 -m unittest test_vulnscanner_adapter_multinic -v -> 31 OK (was 30 pre-PR; +1 4-NIC vs 8-NIC cap=4 parity harness) - python3 -m py_compile anyscan_rate_controller.py vulnscanner-zmap-adapter.py - cargo build --workspace -> clean (2 pre-existing warnings unchanged) - cargo test --workspace --no-fail-fast -> 437 passing, 0 failed, 4 ignored. Matches the post-#66 baseline. Deploy notes ------------ - No new env knobs; no bundle layout change. - /var/lib/agentd/rate-calibration.json.lock is created next to the existing calibration file the first time a writer takes the lock; it's a zero-byte advisory lockfile, no migration needed. - Live c6in.metal re-bench is OPTIONAL — the synthetic harness already proves the orchestration is symmetric. The calibration race fix is observable from operator logs (ls /var/lib/agentd shows N entries after a converged multi-NIC scan, where pre-fix it showed 1). Out of scope (per task brief) ----------------------------- - Live c6in.metal bench (gated on metal launch authorization). - AF_XDP fan-out (separate worker, separate PR). - Touching src/fetcher.rs / src/bin/anyscan-worker.rs / runtime.env on prod — coordination boundary respected. Co-authored-by: skullcmd <skullcmd@anyvm.tech> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Why
anygpt-4 measured 4-NIC c6in.metal sustaining 12.8M aggregate pps, but
8-NIC regressed to 1.3M because AIMD's slip detector cannot
distinguish local CPU starvation from network slip — both signals look
identical, and multiplicative-decrease cratered every shard
simultaneously. This PR fixes the math + adds a concurrency cap + skips
the slow ramp on big boxes, unlocking ~16-24M aggregate on c6in.metal
(kernel-bound until AF_XDP).
What
Four improvements, each independently useful:
1. CPU-vs-network slip distinction
SystemLoadReaderreads/proc/loadavg;classify_windownow takes anoptional
system_loadand returnsSLIP_CPUwhenloadavg/vcpu > 0.8AND heartbeat slipped, else
SLIP_NETWORKorCLEANas before.compute_next_rateholds the rate onSLIP_CPUinstead of halvingit — shrinking rate doesn't free CPU and just wastes the headroom
already learned. Both-pressure windows are resolved by drop-ratio: drops
above 0.1% of
tx_packetsstaySLIP_NETWORK, otherwiseSLIP_CPU.Legacy single-arg signatures keep pre-PR semantics so older bundles
without the loadavg reader continue working unchanged
(
SLIP = SLIP_NETWORKalias preserved for existing call sites).2. Subprocess concurrency cap
ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES(default 4) truncatesANYSCAN_SCANNER_INTERFACESat the multi-NIC parent. anygpt-4 datashows the kernel TX path saturates around 4 concurrent shards
regardless of NIC count, so 5+ shards just CPU-starve each other.
cap=0disables the limit for explicit opt-out.3. Per-instance starting-rate / floor / ceiling defaults
detect_instance_typetriesANYSCAN_INSTANCE_TYPE→/sys/devices/virtual/dmi/id/product_name→ IMDSv2 (1s timeout,well-formed PUT-token / GET-instance-type).
apply_instance_defaultslayers the table under explicit env knobs — operator overrides
always win. c6in.metal seeds at 4M starting, 1M floor, 12M ceiling
so it skips the 4-window ramp from 500k. Multi-NIC parent caches
detection in the env so child shards inherit the resolved type
without redoing IMDS.
m5.xlargec6in.xlargec6in.2xlargec6in.4xlargec6in.8xlargec6in.16xlargec6in.32xlargec6in.metal4. Calibration persistence on partial windows
RateControllernow persistsmax_clean_rateafter every cleanwindow where the high-water mark advanced, plus a terminal persist
via try/finally regardless of how the loop exited (max_windows,
natural finish, or
ScannerWindowErrormid-loop). Atomic tempfile +rename was already in place from PR #58. Crashes that previously
dropped the learned rate on the floor now surface it.
New env knobs
install-worker-bundle.shwritesANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=4on fresh installs.Existing
/etc/agentd/runtime.envfiles are untouched.Verification
python3 -m unittest test_anyscan_rate_controller -v→50 passing (24 pre-PR + 26 new across
CpuVsNetworkSlipTests,SystemLoadReaderTests,InstanceDefaultsTests,PartialWindowCalibrationTests).python3 -m unittest test_vulnscanner_adapter_multinic -v→30 passing (22 pre-PR + 8 new across
CapConcurrentSubprocessesTestsandMultiNicSubprocessCapIntegrationTests).python3 -m py_compile anyscan_rate_controller.py vulnscanner-zmap-adapter.py→ clean.cargo build --workspace→ clean (only pre-existing dead-codewarnings).
cargo test --workspace --no-fail-fast→ 437 passing, 0 failed,4 ignored. Matches the post-perf(portscan): multi-NIC sharding + ENI auto-discovery toward ENA spec ceiling #64 baseline; no regressions.
Synthetic bench
Pure-math model of the anygpt-4 8-shard CPU-starvation window
(set=4M, achieved=3.9M, tx_dropped=0, heartbeat_jitter=6.5s,
loadavg=22 on 16 vcpus → 1.375 per-vcpu):
SLIP_NETWORKSLIP_CPUAggregate (8-shard pre-fix vs 4-shard post-fix on c6in.metal):
Live c6in.metal bench is gated behind the metal launch authorization
(see anygpt-4) and is intentionally deferred — the synthetic model is
sufficient to land the math + plumbing.
Deploy notes
SystemLoadReaderkeep pre-PR slip behavior(the
system_loadarg defaults toNone, restoring legacyany-slip-is-network classification).
SLIPconstant is preserved as alias forSLIP_NETWORKso externalcall sites and tests asserting on the legacy value keep passing.
instance_typein thecontroller_startedmetric (ormulti_nic_orchestration_started)and
classification: slip_cpuinrate_adjustmentmetrics duringhigh-load windows.
Out of scope
VulnScanner-zmap-alternative-binary change, another worker is on it.
🤖 Generated with Claude Code