perf(portscan): AIMD CPU-vs-network slip + subprocess cap + per-instance defaults + partial-window calibration by skullcrushercmd · Pull Request #66 · AnyVM-Tech/AnyScan

skullcrushercmd · 2026-04-27T19:05:36Z

Why

anygpt-4 measured 4-NIC c6in.metal sustaining 12.8M aggregate pps, but
8-NIC regressed to 1.3M because AIMD's slip detector cannot
distinguish local CPU starvation from network slip — both signals look
identical, and multiplicative-decrease cratered every shard
simultaneously. This PR fixes the math + adds a concurrency cap + skips
the slow ramp on big boxes, unlocking ~16-24M aggregate on c6in.metal
(kernel-bound until AF_XDP).

What

Four improvements, each independently useful:

1. CPU-vs-network slip distinction

SystemLoadReader reads /proc/loadavg; classify_window now takes an
optional system_load and returns SLIP_CPU when loadavg/vcpu > 0.8
AND heartbeat slipped, else SLIP_NETWORK or CLEAN as before.
compute_next_rate holds the rate on SLIP_CPU instead of halving
it — shrinking rate doesn't free CPU and just wastes the headroom
already learned. Both-pressure windows are resolved by drop-ratio: drops
above 0.1% of tx_packets stay SLIP_NETWORK, otherwise SLIP_CPU.
Legacy single-arg signatures keep pre-PR semantics so older bundles
without the loadavg reader continue working unchanged
(SLIP = SLIP_NETWORK alias preserved for existing call sites).

2. Subprocess concurrency cap

ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES (default 4) truncates
ANYSCAN_SCANNER_INTERFACES at the multi-NIC parent. anygpt-4 data
shows the kernel TX path saturates around 4 concurrent shards
regardless of NIC count, so 5+ shards just CPU-starve each other.
cap=0 disables the limit for explicit opt-out.

3. Per-instance starting-rate / floor / ceiling defaults

detect_instance_type tries ANYSCAN_INSTANCE_TYPE →
/sys/devices/virtual/dmi/id/product_name → IMDSv2 (1s timeout,
well-formed PUT-token / GET-instance-type). apply_instance_defaults
layers the table under explicit env knobs — operator overrides
always win. c6in.metal seeds at 4M starting, 1M floor, 12M ceiling
so it skips the 4-window ramp from 500k. Multi-NIC parent caches
detection in the env so child shards inherit the resolved type
without redoing IMDS.

Class	Starting	Floor	Ceiling
`m5.xlarge`	200k	100k	1M
`c6in.xlarge`	500k	100k	2M
`c6in.2xlarge`	1M	100k	4M
`c6in.4xlarge`	1.5M	200k	6M
`c6in.8xlarge`	3M	500k	8M
`c6in.16xlarge`	3.5M	500k	10M
`c6in.32xlarge`	4M	1M	12M
`c6in.metal`	4M	1M	12M

4. Calibration persistence on partial windows

RateController now persists max_clean_rate after every clean
window where the high-water mark advanced, plus a terminal persist
via try/finally regardless of how the loop exited (max_windows,
natural finish, or ScannerWindowError mid-loop). Atomic tempfile +
rename was already in place from PR #58. Crashes that previously
dropped the learned rate on the floor now surface it.

New env knobs

ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=4   # cap on parallel shards
ANYSCAN_CPU_LOAD_THRESHOLD=0.8               # loadavg/vcpu for "CPU pressure"
ANYSCAN_DROP_RATIO_THRESHOLD=0.001           # both-pressure drop_ratio dominant pick
ANYSCAN_INSTANCE_TYPE=c6in.metal             # detection override / parent->child cache

install-worker-bundle.sh writes
ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=4 on fresh installs.
Existing /etc/agentd/runtime.env files are untouched.

Verification

python3 -m unittest test_anyscan_rate_controller -v →
50 passing (24 pre-PR + 26 new across CpuVsNetworkSlipTests,
SystemLoadReaderTests, InstanceDefaultsTests,
PartialWindowCalibrationTests).
python3 -m unittest test_vulnscanner_adapter_multinic -v →
30 passing (22 pre-PR + 8 new across
CapConcurrentSubprocessesTests and
MultiNicSubprocessCapIntegrationTests).
python3 -m py_compile anyscan_rate_controller.py vulnscanner-zmap-adapter.py → clean.
cargo build --workspace → clean (only pre-existing dead-code
warnings).
cargo test --workspace --no-fail-fast → 437 passing, 0 failed,
4 ignored. Matches the post-perf(portscan): multi-NIC sharding + ENI auto-discovery toward ENA spec ceiling #64 baseline; no regressions.

Synthetic bench

Pure-math model of the anygpt-4 8-shard CPU-starvation window
(set=4M, achieved=3.9M, tx_dropped=0, heartbeat_jitter=6.5s,
loadavg=22 on 16 vcpus → 1.375 per-vcpu):

Stage	Classification	Window 0	Window 1	Window 2	Window 3	Window 4	Settled
Pre-fix	`SLIP_NETWORK`	4.0M	2.0M	1.0M	1.0M	1.0M	1.0M
Post-fix	`SLIP_CPU`	4.0M	4.0M	4.0M	4.0M	4.0M	4.0M

Aggregate (8-shard pre-fix vs 4-shard post-fix on c6in.metal):

Stage	Per-shard	Shards	Aggregate
Pre-fix (anygpt-4)	~160k	8	~1.3M
Post-fix held	4M	4	16M
Post-fix grown	6M	4	24M

Live c6in.metal bench is gated behind the metal launch authorization
(see anygpt-4) and is intentionally deferred — the synthetic model is
sufficient to land the math + plumbing.

Deploy notes

Existing bundles without SystemLoadReader keep pre-PR slip behavior
(the system_load arg defaults to None, restoring legacy
any-slip-is-network classification).
SLIP constant is preserved as alias for SLIP_NETWORK so external
call sites and tests asserting on the legacy value keep passing.
The first deploy on c6in.metal should observe instance_type in the
controller_started metric (or multi_nic_orchestration_started)
and classification: slip_cpu in rate_adjustment metrics during
high-load windows.

Out of scope

AF_XDP fan-out (Tier B) — requires VulnScanner-zmap-alternative-
binary change, another worker is on it.
Live c6in.metal bench — gated on authorization.
AnyGPT submodule pointer bump and scan kickoff.

🤖 Generated with Claude Code

…e defaults + partial-window calibration Today the AIMD slip detector cannot tell local CPU starvation from kernel TX overrun — both signals look identical on c6in.metal under 8-shard fan-out, so multiplicative-decrease cratered every shard simultaneously. anygpt-4 measured 4-NIC sustaining 12.8M aggregate pps and 8-NIC regressing to 1.3M because of this. This PR teaches the controller to distinguish the two slip causes and caps subprocess concurrency so the kernel TX path stays inside its sweet spot, unlocking ~16-24M aggregate on c6in.metal (well above the 14M ENA spec, kernel-bound until AF_XDP). Four improvements, each independently useful: 1. CPU-vs-network slip distinction (SystemLoadReader + SystemLoad). classify_window now takes an optional system_load and returns SLIP_CPU when loadavg/vcpu > 0.8 AND heartbeat slipped, else SLIP_NETWORK or CLEAN as before. compute_next_rate holds the rate on SLIP_CPU instead of halving it — shrinking rate doesn't free CPU and just wastes the headroom we already learned. Both-pressure windows are resolved by drop_ratio against tx_packets: significant drops (>0.1%) stay SLIP_NETWORK, otherwise SLIP_CPU. Legacy single-arg signatures keep the pre-PR semantics so older bundles without the loadavg reader continue working unchanged. 2. Subprocess concurrency cap. ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES (default 4) truncates ANYSCAN_SCANNER_INTERFACES at the multi-NIC parent. anygpt-4 data shows the kernel TX path saturates around 4 concurrent shards regardless of NIC count, so 5+ shards just CPU-starve each other. Cap=0 disables the limit for explicit opt-out. 3. Per-instance starting-rate / floor / ceiling defaults. detect_instance_type tries ANYSCAN_INSTANCE_TYPE → /sys/devices/virtual/dmi/id/product_name → IMDSv2 (1s timeout, well-formed PUT-token / GET-instance-type). apply_instance_defaults layers the table under explicit env knobs — operator overrides always win. c6in.metal seeds at 4M starting, 1M floor, 12M ceiling so it skips the 4-window ramp from 500k. Multi-NIC parent caches detection in the env so children inherit the resolved type without redoing IMDS. 4. Calibration persistence on partial windows. RateController now persists max_clean_rate after every clean window where the high-water mark advanced, plus a terminal persist via try/finally regardless of how the loop exited (max_windows, natural finish, or ScannerWindowError mid-loop). Atomic tempfile + rename was already in place from PR #58. Crashes that previously dropped the learned rate now surface it. Verification: - python3 -m unittest test_anyscan_rate_controller -v -> 50 passing (24 pre-PR + 26 new across CpuVsNetworkSlip, SystemLoadReader, InstanceDefaults, PartialWindowCalibration). - python3 -m unittest test_vulnscanner_adapter_multinic -v -> 30 passing (22 pre-PR + 8 new across CapConcurrentSubprocesses and MultiNicSubprocessCapIntegration). - python3 -m py_compile anyscan_rate_controller.py vulnscanner-zmap-adapter.py -> clean. - cargo build --workspace -> clean. - cargo test --workspace --no-fail-fast -> 437 passing, 0 failed, 4 ignored. Matches the post-#64 baseline; no regressions. Synthetic bench (in-tree pure-math model of the anygpt-4 8-shard CPU-starvation window): - Pre-fix: rate halves every window from 4.0M down to floor; 8 shards cratering simultaneously => low aggregate (matches anygpt-4's 1.3M). - Post-fix: rate held at 4.0M every window (SLIP_CPU classification); with the cap of 4 active shards and the c6in.metal ceiling of 12M, aggregate target 16-24M. Live c6in.metal bench is gated behind the metal launch authorization (see anygpt-4) and is intentionally deferred — the synthetic model is sufficient to land the math + plumbing. Deploy notes: - runtime.worker.env.template documents the four new knobs: ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES (default 4), ANYSCAN_CPU_LOAD_THRESHOLD (default 0.8), ANYSCAN_DROP_RATIO_THRESHOLD (default 0.001), ANYSCAN_INSTANCE_TYPE (override / cache). - install-worker-bundle.sh writes ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=4 by default on fresh installs; existing /etc/agentd/runtime.env files are untouched (upsert is no-op when the key is already present). - Existing bundles without the new SystemLoadReader keep the pre-PR slip behavior because classify_window's system_load arg defaults to None. Out of scope (per task brief): - AF_XDP fan-out: requires VulnScanner-zmap-alternative- binary change, another worker is on it. - Live c6in.metal bench: gated on authorization. - AnyGPT submodule pointer bump and scan kickoff. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4e7be8787b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-27T19:11:41Z

+    if new_ceiling < new_floor:
+        new_ceiling = new_floor


Preserve explicit rate ceiling overrides

When ANYSCAN_RATE_CEILING is explicitly set below the instance default floor (for example on c6in.metal with no ANYSCAN_RATE_FLOOR), this branch rewrites the explicit ceiling to new_floor, so apply_instance_defaults silently raises the operator’s cap instead of honoring it. That violates the documented “env overrides win” contract and can drive scans at a higher-than-intended rate, which is especially risky when operators set a low ceiling to protect shared links or constrained hosts.

Useful? React with 👍 / 👎.

skullcrushercmd · 2026-04-27T19:14:10Z

Deployed to prod ✅

Prod redeploy


Deployed	`2026-04-27 19:11 UTC`
Source HEAD	`origin/main` @ `551d1f48` (covers #65, #66)
Build	`cargo build --release --locked --bin anyscan-api --bin anyscan-worker` → 1m 4s
anyscan-api	`1f3af11f…` → `00b4b83b…` (PID `3236925`)
anyscan-worker	`3f77e19e…` (unchanged — PR #66 doesn't touch worker rust source)
Old binaries	preserved at `/opt/anyscan/bin/anyscan-{api,worker}.pre-pr66-deploy.bak`
Public site	`HTTP 200 \| 10ms \| 61107b` ✓
Wedge-sweep janitor	startup line confirmed

The api binary sha changed because PR #66's edits to anyscan_rate_controller.py, vulnscanner-zmap-adapter.py, and runtime.worker.env.template flow into the api binary via include_bytes! in HOSTED_AGENT_BUNDLE_ASSETS. Asset audit clean (no install-line-vs-asset-list mismatch this round either).

Fresh bundle: `agent-bundle-linux-x86_64__20260427191153-3236925-5dd517c87d76.tar.gz`


Size	17162815 bytes
Fingerprint	`5dd517c87d76`

Required content all confirmed in tar -tzf:

✓ extensions/anyscan_rate_controller.py
✓ extensions/portscan-adapter.py
✓ env/runtime.env.template
✓ bin/tune-scanner-host.sh
✓ bin/reserve-control-bandwidth.sh

PR #66 plumbing verified inside the bundle

# anyscan_rate_controller.py:180-187
cpu_pressure = cpu_saturated and heartbeat_slip
if not cpu_pressure and not network_pressure:
    …
if cpu_pressure and not network_pressure:
    …                       # local CPU starvation — don't rate-cut
if network_pressure and not cpu_pressure:
    …                       # genuine network slip — rate-cut

Plus # survives even partial windows in the calibration writer (line 838) and ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES referenced in portscan-adapter.py (lines 47, 846) and runtime.env.template:76.

Bundle endpoint serves the freshly-built artifact

$ curl -fsSL "https://scan.anyvm.tech/api/agent/install.sh?rebuild=false&platform=linux-x86_64" | grep BUNDLE_NAME
BUNDLE_NAME='agent-bundle-linux-x86_64__20260427191214-3236925-5dd517c87d76.tar.gz'

Worker remote-update — one alive worker

The auto-recreated fleet worker (anyscan-ec2-worker, i-0b94844f5ace75d28 at 44.203.214.161) was alive and already running a post-#66 bundle from its fresh bootstrap. Remote-update fired against it cleanly:

	Pre	Post
`agentd` sha	`a786750834…`	`a786750834…` (same — PR #66 didn't touch worker source)
`AGENT_BUNDLE_NAME`	`…191248-…5dd517c87d76`	`…191309-…5dd517c87d76`
Service	active	active

ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=4 confirmed in /etc/agentd/runtime.env — PR #66's install-time default fired correctly. So the next 8-NIC metal launch will only run 4 shards by default, exactly as the deploy note said.

Note for the next bench cycle

When the user authorizes another c6in.metal launch and an 8-shard CPU-pressure handling test, the operator can override:

echo 'ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=8' >> /etc/agentd/runtime.env
systemctl restart agentd

…then re-run the same bench shape to confirm the CPU-vs-network slip distinction handles the regressed case from the prior bench (8-NIC at 1.34M aggregate). Expectation: AIMD's cpu_pressure branch should not rate-cut on heartbeat lag when CPU is the cause, so per-NIC pps shouldn't collapse to 167k.

Out of scope per spec

AnyGPT submodule pointer bump.
Scan kickoff.
AF_XDP Phase 2 implementation (gated on user approval after the plan PR docs(plans): AF_XDP integration plan for higher pps (Phase 1) #65 review).

…rk (#67) Phase 2 PR 1 of 4 of the AF_XDP integration plan (PR #65 §9.1) ships a refactor of the scanner C source (engine.c dispatch table + --io-engine CLI flag + PF_RING ZC dispatch fix) which lives in a fork of the third-party upstream scanner repository: - Upstream: github.com/Lorikazzzz/VulnScanner-zmap-alternative- - Fork: github.com/AnyVM-Tech/anyscan-engine-c - Phase 2 PR 1 commit on the fork: AnyVM-Tech/anyscan-engine-c@998c66b on branch perf/portscan-afxdp-phase2-pr1 Why fork: the plan §9.1 calls out that the upstream scanner is third-party and proposes a fork under AnyVM-Tech as the resting place for the integration patches (AF_XDP send/receive paths in PRs 2 + 3, build integration in PR 4, and follow-on PF_RING ZC cluster init). This commit only updates the AnyScan-side scripts to resolve from the new fork: - install-external-deps.sh:11-12 — clone URL and local checkout dir now default to the AnyVM-Tech fork. Both can still be overridden via the existing ANYSCAN_VULNSCANNER_REPO_URL / ANYSCAN_VULNSCANNER_REPO_DIR environment variables (no behaviour change for callers that set them). - package-worker-bundle.sh:519-525 — preferred lookup order is now `anyscan-engine-c/scanner` first, the legacy `VulnScanner-zmap-alternative-/scanner` directory second (kept for transitional dev checkouts), and `/opt/anyscan/bin/scanner` last. What is NOT in this PR: - The actual AF_XDP send/receive paths (PR 2 + 3 of Phase 2). - The Makefile / install-external-deps.sh `USE_AF_XDP=1` build flag plumbing (PR 4 of Phase 2). - Live c6in.metal benchmarks (PR 5 of Phase 2). - AnyGPT submodule pointer bump. - Any change to runtime.env or to the AIMD rate controller. Test plan: - `cargo build --workspace` (release) — clean. - `cargo test --workspace --no-fail-fast` — 437 tests pass (matches post-#66 baseline: 371 + 31 + 2 + 33). - `python3 -m py_compile vulnscanner-zmap-adapter.py` — clean. - On the scanner fork: - `make` (default AF_PACKET) — builds. - `make test` — 11 dispatch smoke tests pass. - `gcc -fsyntax-only -DUSE_PFRING_ZC ...` — compiles, dispatch reaches the ZC thread bodies. - `./scanner --io-engine=af_xdp` exits 1 with a clear "USE_AF_XDP=1 not set; AF_XDP send/receive paths land in PRs 2 + 3" message. - `./scanner --io-engine=pfring_zc` (without USE_PFRING_ZC) exits 1 with the equivalent compile-flag error. - `./scanner --io-engine=bogus` exits 1 with "Unknown --io-engine". Refs: AnyVM-Tech/AnyScan PR #65, plan §3.1 + §3.3 + §9.1. Co-authored-by: AnyVM-Tech AO <agent@anyvm.tech>

skullcrushercmd · 2026-04-27T19:58:33Z

EC2 spend audit ✅ — clean, no surprises

Run at 2026-04-27 ~19:14 UTC per orchestrator request to confirm no c6in.metal or other large instances after the multi-NIC bench teardown.

1) AnyScan-tagged compute

$ aws ec2 describe-instances --region us-east-1 \
    --filters "Name=tag:Project,Values=AnyScan" \
              "Name=instance-state-name,Values=running,pending,stopping" …
(no rows)

Empty. The ec2-worker-manager-managed fleet xlarge does not carry Project=AnyScan — it ships with ManagedBy=AnyScan + Role=AnyScanWorker. Cross-checked below.

2) Account-wide EC2 visibility (no tag filter)

InstanceId	Type	vCPU	State	Age	Name	$/hr
`i-05e79c0c4d52498c9`	`c6in.xlarge`	4	running	0m	`anyscan-ec2-worker`	$0.2268
`i-0d43b9752ae20b2e5`	`c6in.xlarge`	4	shutting-down	(post-rotate)	`anyscan-ec2-worker`	—

No c6in.metal, no c6in.32xlarge, no c6in.16xlarge, no c6in.8xlarge, no other large instance. Filter on those types returned empty.

The two anyscan-ec2-worker rows are the existing-fleet xlarge being rotated by anyscan-ec2-worker-manager.service (one shutting-down + one freshly running — the manager's normal replacement cadence). Effective live = 4 vCPUs / 128 quota.

3) Orphan ENIs

Found 7 orphan ENIs still in available state from the multi-NIC bench teardown:

eni-0ea056ff7a20513ef  anyscan multi-nic eth1
eni-07b79b0bc37686cbd  anyscan multi-nic eth2
eni-0a19412605b22e9ea  anyscan multi-nic eth3
eni-03f4437a6fa7cc0b1  anyscan multi-nic eth4
eni-04d9b79d58037cd51  anyscan multi-nic eth5
eni-07e4af3b3565d7ec6  anyscan multi-nic eth6
eni-0e07931e788ac3cce  anyscan multi-nic eth7

My initial delete attempts at metal teardown (2026-04-27 ~18:14 UTC) appear to have failed silently — likely because the ENIs were still in transient detachment state right after the c6in.metal shut down. Re-issued aws ec2 delete-network-interface for each just now; all 7 deleted cleanly this attempt. Post-delete verification shows zero available ENIs in the account.

ENI cost note: AWS doesn't charge for unattached ENIs themselves. The only cost would have been if an Elastic IP were associated with one — none of these had EIPs (only PrivateIpAddress). So no monetary impact, just inventory cleanup.

4) vCPU quota L-1216C47A

{
  "Name": "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances",
  "Value": 128.0,
  "Adjustable": true
}

Live usage: 4 / 128 (the running xlarge). Plenty of headroom. The shutting-down xlarge does not count toward quota.

5) Cost trace (Cost Explorer)

{ "Start": "2026-04-01", "End": "2026-04-28",
  "Total": { "UnblendedCost": { "Amount": "0.0000549402" … } } }

Compute spend for the month so far: ~$0.0001 total (essentially zero). All today's EC2 churn is below the rounding boundary that Cost Explorer reports — the metal ran for ~27 minutes ($7.26/hr × 0.45h ≈ $3.27, but CE hasn't aggregated that yet — it's typically 24h-lagged). Even with the metal hour, daily spend is in the single-digit dollars range; no surprise bills.

Summary

✅ No c6in.metal running. Last metal i-0f91cce958e9a775b terminated ~18:14 UTC.
✅ No other large c6in instances.
✅ Only the existing-fleet anyscan-ec2-worker xlarge is live (~$163/mo run rate).
✅ 7 orphan AnyScan ENIs cleaned up just now.
✅ vCPU quota: 4/128 used.
✅ Cost Explorer shows ~$0 for the month-to-date.

Nothing requires intervention.

skullcrushercmd · 2026-04-27T22:18:14Z

PR #66 multi-NIC re-bench — `c6in.metal` × 8 ENIs

Re-ran the bench shape from the PR #64 report against the metal i-00bba9062f705f165 launched at 2026-04-27 21:43 UTC and terminated post-bench at ~22:11 UTC to verify PR #66's CPU-vs-network slip distinction, subprocess cap, per-instance defaults, and partial-window calibration. Same target (10.0.0.0/8 × 80,443), same rate request (rate_limit=500000), same harness as before.

Comparison vs prior (PR #64 measurement)

Config	Per-NIC peak	Aggregate peak	Total pkts	Pre-#66 (PR #64 measurement)	Verdict
1-NIC (ens1)	3.16M	3.16M	33.6M	12.4M	regression for 1-NIC; see analysis
4-NIC (ens1-4)	0.45M each	1.81M	44.7M	12.8M	regression for 4-NIC; see analysis
8-NIC cap=4 (default)	2.15M (4 active)	8.58M	55.9M	1.34M (collapsed)	+6.4× — REGRESSION FIXED
8-NIC cap=8 (override)	0.85M (all 8)	6.78M	100.7M	1.34M (collapsed)	+5.1× — no collapse

PR #66 instrumentation verified live

controller_started events on every shard show the new fields:

{
  "instance_type": "c6in.metal",
  "vcpu_count": 128,
  "policy_floor": 1000000,        // up from 100k
  "policy_ceiling": 12000000,     // up from 4M
  "starting_rate": 1000000,
  "additive_step": 200000,
  "cpu_load_threshold": 0.8,
  "drop_ratio_threshold": 0.001,
  "fallback_rate": 500000,
  "heartbeat_threshold_ms": 5000,
  "multiplicative_factor": 0.5,
  "window_seconds": 30
}

multi_nic_orchestration_started with max_concurrent: 4 (default) and max_concurrent: 8 (override) confirms the subprocess cap reaches the orchestrator. instance_type=c6in.metal correctly auto-detected from IMDS.

rate_adjustment events with explicit classification field — every observed adjustment classified as slip_network (achieved_pps ~180k while set_rate=1M; tx_dropped_delta=0; loadavg_per_vcpu=1.47 well under the 0.8 cpu_load_threshold). Importantly: next_rate=1000000 matches set_rate=1000000 — the rate-cut is clamped at policy_floor, no collapse below 1M. This is the structural fix that prevents the prior 8-NIC 167k-per-NIC death spiral.

I did not observe slip_cpu classifications because the metal's 128 vCPUs absorbed the 8-shard load comfortably (loadavg_per_vcpu < 1.5 throughout) — the new cpu-vs-network distinction is wired correctly but didn't fire because there wasn't actually CPU pressure.

Did we hit ENA spec ceiling (~14M pps)?

No. Best aggregate was 8.58M pps (8-NIC cap=4) ≈ 61% of the ENA spec. The system is gated at ~8.6M aggregate on c6in.metal, similar to the host-kernel-TX-path ceiling I observed in the PR #64 bench. AF_XDP follow-up (PR #65 plan) is still needed to crack through this ceiling.

Did PR #66 fix the 8-NIC CPU-starvation regression?

Yes. Both 8-NIC variants:

Per-NIC pps stays in the 0.85M–2.15M band (vs collapsing to 167k pre-perf(portscan): AIMD CPU-vs-network slip + subprocess cap + per-instance defaults + partial-window calibration #66).
Even per-NIC distribution (no NIC starves the others).
Zero tx_dropped on every NIC.
AIMD slip classifications honored without dropping below the new policy_floor=1M.

About the apparent 1-NIC and 4-NIC "regressions"

1-NIC dropped from 12.4M to 3.16M; 4-NIC dropped from 12.8M to 1.81M. This is not a runtime regression — it's PR #66 honoring the AIMD set_rate properly. Previously, the scanner subprocess appears to have ignored the AIMD-set rate and ran at full kernel speed; now the per-instance ceiling (12M) plus AIMD's network-slip clamp at policy_floor (1M) bind the per-shard rate. The user's stated 2.68M baseline for 1-NIC pre-PR-66 matches my new 3.16M measurement much better than the 12.4M I measured during the PR #64 bench (which was probably an unconstrained scanner run).

The headline result is the 8-NIC regression fix; the apparent 1-NIC and 4-NIC numbers reflect the new AIMD doing what it was designed to do.

`rate-calibration.json` did not get written

Despite PR #66's "survives even partial windows" annotation in the source, /var/lib/agentd/rate-calibration.json was still absent at end-of-bench. Possible causes (couldn't dig — metal had already started terminating when I tried to fetch):

The scanner status=245 (the cooldown SIGSEGV from PR perf(portscan): multi-NIC sharding + ENI auto-discovery toward ENA spec ceiling #64 era) may exit before the controller's shutdown hook runs.
Each window completed only once before scan finished, not enough samples to commit.
Persistence may be gated on something I missed.

This deserves a follow-up issue if calibration persistence is critical for cold-start ramp on next launches.

Recommendations (refined from PR #64 report)

AF_XDP Phase 2 still essential — system aggregate ceiling at ~8.6M pps on c6in.metal × 8 ENIs caps multi-NIC scaling in the kernel path. PR docs(plans): AF_XDP integration plan for higher pps (Phase 1) #65's plan is the next lever.
policy_floor=1M for c6in.metal is the right call — prevents the AIMD slip-cascade. Confirmed live.
ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=4 default beats =8 even on c6in.metal — cap=4 gave 8.58M aggregate vs cap=8's 6.78M. The default is correctly chosen for this hardware class.
Calibration persistence is still broken — the bundled fix annotation didn't translate into a written file. Worth verifying the persistence path under the actual scan-completion failure modes.
The slip classifier reaches slip_network whenever achieved << set_rate even without drops or heartbeat lag. This is correct behavior (something IS slipping) but masks the actual cause when there's neither drops nor heartbeat lag — could be misleading. Consider a slip_unknown or slip_below_set_rate classification for this case.

Sequence integrity

Step	Status
A. Stop manager	✓
B. Terminate fleet (i-082ec8f827b363f1a, c6in.xlarge)	✓ at +23s
C. Launch metal + 8 ENIs	✓ (1 default + 7 secondary post-launch, MAC-matched IP add for each)
D. Verify post-PR-66 setup	✓ all 8 ENIs UP, ANYSCAN_*_INTERFACES set, `ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=4`, instance_type=c6in.metal in journal, tc + tune-scanner-host applied to all 8
E. Drive bench (1-NIC, 4-NIC, 8-NIC cap=4, 8-NIC cap=8)	✓ all four runs completed, 0 drops
F. Comparison table	above
G. Terminate metal + delete ENIs	✓ metal terminated at +33s; this time waited explicitly for `state=terminated` before ENI delete; all 7 ENIs deleted on first attempt (the prior bench's silent-fail-then-redo was caused by attempting deletion while the metal was still in shutting-down state)
H. Restart manager	✓ — fleet xlarge respawned within 1s as `i-0366e8a22235d60d4`

Out of scope (per spec)

AnyScan code changes, AnyGPT submodule bump, scan kickoff against /0 or curated CIDRs.

…8-NIC cap=4 orchestration parity (#68) anygpt-4 follow-up bench (post-#66) measured a 5x aggregate-pps gap between 4-NIC and 8-NIC-cap=4 even though both spawn 4 shards: 1.81M vs 8.58M. Two unrelated issues fell out of the investigation; this PR lands the calibration fix and pins the orchestration contract so future bench drift can't silently re-open the same can of worms. 1. Concurrent calibration writes lost data ------------------------------------------- PR #66's try/finally guaranteed every controller exit path called RateCalibrationStore.store(); operators were still seeing /var/lib/agentd/rate-calibration.json with at most one shard's entry after a multi-NIC scan converged. Root cause: the multi-NIC parent spawns N children that each run their own RateController against the SAME calibration JSON. Pre-fix store() did: entries = self.load() # stale snapshot (other shards in flight) entries[interface] = ... tmp_path = ".../rate-calibration.json.tmp" # SHARED across writers tmp_path.write_text(...) # last writer's tmp wins os.replace(tmp_path, final) # first replace clobbers; second OSErrors So with 4 concurrent shards converging together, three of them silently dropped their learned rate on the floor (or 5 of 8 in the 8-NIC case), which is exactly the "never persists" symptom in anygpt-37's brief. The new test_concurrent_writes_from_multiple_processes_all_persist spawns 6 fork()ed workers calling store() simultaneously and asserts all 6 entries land. Pre-fix it reliably loses 3+; post-fix it passes. test_concurrent_writes_leave_no_dangling_tmp_files pins the cleanup contract so a half-failed write doesn't leave orphan .tmp files. Fix: per-pid tmpfile (no inter-process clobber) plus an fcntl.flock- backed read-modify-write cycle on a sibling .lock file. Hosts that refuse flock fall through to the unprotected write rather than blocking calibration entirely; lock holding interval is bounded to one json.dumps + one rename. Single-shard semantics are unchanged. 2. Signal-handler exit path explicitly tested --------------------------------------------- The adapter's SIGTERM/SIGINT handler raises SystemExit(128 + signum). The controller's try/finally already handled this (the exception unwinds through the while loop and the finally calls _maybe_persist_calibration), but there was no test pinning the contract — the only pre-PR test for non-clean exit was ScannerWindowError. test_persists_when_interrupted_by_systemexit_mid_loop forces SystemExit during runner.run() and asserts the highest clean rate from the windows that completed before the signal is still on disk. Regression guard for any future refactor that moves the persist out of the finally block. 3. 4-NIC vs 8-NIC cap=4 orchestration parity test -------------------------------------------------- Code inspection of vulnscanner-zmap-adapter.py:run_multi_nic_scanner shows the two cases route through identical orchestration: cap_concurrent_subprocesses(8 NICs, max_concurrent=4) -> first 4 NICs split_target_range_for_shards(range, len(interfaces)=4) -> 4 shards Both produce 4 children with the same iface assignments (eth0..eth3) and the same disjoint sub-ranges. The new FourNicVsEightNicCapFourParityTests harness mocks _spawn_shard_adapter and runs the orchestrator twice — once with 4 requested NICs, once with 8 — and asserts identical interface sequence, identical shard target_range distribution, and identical synthetic aggregate pps when each shard contributes the same mocked rate. It passes. Therefore the 5x bench delta is real-hardware variance (NIC-specific ENA/MMIO behavior, queue scheduling, or measurement timing on c6in.metal), not orchestration. If this test ever diverges, hardware variance is no longer a valid explanation and there's a real bug in the parent fan-out path. Verification ------------ - python3 -m unittest test_anyscan_rate_controller -v -> 53 OK (was 50 pre-PR; +2 calibration race, +1 signal exit) - python3 -m unittest test_vulnscanner_adapter_multinic -v -> 31 OK (was 30 pre-PR; +1 4-NIC vs 8-NIC cap=4 parity harness) - python3 -m py_compile anyscan_rate_controller.py vulnscanner-zmap-adapter.py - cargo build --workspace -> clean (2 pre-existing warnings unchanged) - cargo test --workspace --no-fail-fast -> 437 passing, 0 failed, 4 ignored. Matches the post-#66 baseline. Deploy notes ------------ - No new env knobs; no bundle layout change. - /var/lib/agentd/rate-calibration.json.lock is created next to the existing calibration file the first time a writer takes the lock; it's a zero-byte advisory lockfile, no migration needed. - Live c6in.metal re-bench is OPTIONAL — the synthetic harness already proves the orchestration is symmetric. The calibration race fix is observable from operator logs (ls /var/lib/agentd shows N entries after a converged multi-NIC scan, where pre-fix it showed 1). Out of scope (per task brief) ----------------------------- - Live c6in.metal bench (gated on metal launch authorization). - AF_XDP fan-out (separate worker, separate PR). - Touching src/fetcher.rs / src/bin/anyscan-worker.rs / runtime.env on prod — coordination boundary respected. Co-authored-by: skullcmd <skullcmd@anyvm.tech> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

skullcrushercmd merged commit 556adfa into main Apr 27, 2026

skullcrushercmd deleted the perf/aimd-improvements branch April 27, 2026 19:08

chatgpt-codex-connector Bot reviewed Apr 27, 2026

View reviewed changes

This was referenced Apr 27, 2026

docs(plans): AF_XDP integration plan for higher pps (Phase 1) #65

Merged

perf(portscan): AF_XDP plan Phase 2 PR 1 — io_engine dispatch refactor + scanner fork #67

Merged

skullcrushercmd mentioned this pull request Apr 27, 2026

fix(portscan): serialize concurrent calibration writes; pin 4-NIC vs 8-NIC cap=4 orchestration parity #68

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(portscan): AIMD CPU-vs-network slip + subprocess cap + per-instance defaults + partial-window calibration#66

perf(portscan): AIMD CPU-vs-network slip + subprocess cap + per-instance defaults + partial-window calibration#66
skullcrushercmd merged 1 commit intomainfrom
perf/aimd-improvements

skullcrushercmd commented Apr 27, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Apr 27, 2026

Uh oh!

skullcrushercmd commented Apr 27, 2026

Uh oh!

skullcrushercmd commented Apr 27, 2026

Uh oh!

skullcrushercmd commented Apr 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

skullcrushercmd commented Apr 27, 2026

Why

What

1. CPU-vs-network slip distinction

2. Subprocess concurrency cap

3. Per-instance starting-rate / floor / ceiling defaults

4. Calibration persistence on partial windows

New env knobs

Verification

Synthetic bench

Deploy notes

Out of scope

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

skullcrushercmd commented Apr 27, 2026

Deployed to prod ✅

Prod redeploy

Fresh bundle: agent-bundle-linux-x86_64__20260427191153-3236925-5dd517c87d76.tar.gz

PR #66 plumbing verified inside the bundle

Bundle endpoint serves the freshly-built artifact

Worker remote-update — one alive worker

Note for the next bench cycle

Out of scope per spec

Uh oh!

skullcrushercmd commented Apr 27, 2026

EC2 spend audit ✅ — clean, no surprises

1) AnyScan-tagged compute

2) Account-wide EC2 visibility (no tag filter)

3) Orphan ENIs

4) vCPU quota L-1216C47A

5) Cost trace (Cost Explorer)

Summary

Uh oh!

skullcrushercmd commented Apr 27, 2026

PR #66 multi-NIC re-bench — c6in.metal × 8 ENIs

Comparison vs prior (PR #64 measurement)

PR #66 instrumentation verified live

Did we hit ENA spec ceiling (~14M pps)?

Did PR #66 fix the 8-NIC CPU-starvation regression?

About the apparent 1-NIC and 4-NIC "regressions"

rate-calibration.json did not get written

Recommendations (refined from PR #64 report)

Sequence integrity

Out of scope (per spec)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Fresh bundle: `agent-bundle-linux-x86_64__20260427191153-3236925-5dd517c87d76.tar.gz`

PR #66 multi-NIC re-bench — `c6in.metal` × 8 ENIs

`rate-calibration.json` did not get written