Skip to content

perf(portscan): AIMD CPU-vs-network slip + subprocess cap + per-instance defaults + partial-window calibration#66

Merged
skullcrushercmd merged 1 commit intomainfrom
perf/aimd-improvements
Apr 27, 2026
Merged

perf(portscan): AIMD CPU-vs-network slip + subprocess cap + per-instance defaults + partial-window calibration#66
skullcrushercmd merged 1 commit intomainfrom
perf/aimd-improvements

Conversation

@skullcrushercmd
Copy link
Copy Markdown
Contributor

Why

anygpt-4 measured 4-NIC c6in.metal sustaining 12.8M aggregate pps, but
8-NIC regressed to 1.3M because AIMD's slip detector cannot
distinguish local CPU starvation from network slip — both signals look
identical, and multiplicative-decrease cratered every shard
simultaneously. This PR fixes the math + adds a concurrency cap + skips
the slow ramp on big boxes, unlocking ~16-24M aggregate on c6in.metal
(kernel-bound until AF_XDP).

What

Four improvements, each independently useful:

1. CPU-vs-network slip distinction

SystemLoadReader reads /proc/loadavg; classify_window now takes an
optional system_load and returns SLIP_CPU when loadavg/vcpu > 0.8
AND heartbeat slipped, else SLIP_NETWORK or CLEAN as before.
compute_next_rate holds the rate on SLIP_CPU instead of halving
it — shrinking rate doesn't free CPU and just wastes the headroom
already learned. Both-pressure windows are resolved by drop-ratio: drops
above 0.1% of tx_packets stay SLIP_NETWORK, otherwise SLIP_CPU.
Legacy single-arg signatures keep pre-PR semantics so older bundles
without the loadavg reader continue working unchanged
(SLIP = SLIP_NETWORK alias preserved for existing call sites).

2. Subprocess concurrency cap

ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES (default 4) truncates
ANYSCAN_SCANNER_INTERFACES at the multi-NIC parent. anygpt-4 data
shows the kernel TX path saturates around 4 concurrent shards
regardless of NIC count, so 5+ shards just CPU-starve each other.
cap=0 disables the limit for explicit opt-out.

3. Per-instance starting-rate / floor / ceiling defaults

detect_instance_type tries ANYSCAN_INSTANCE_TYPE
/sys/devices/virtual/dmi/id/product_name → IMDSv2 (1s timeout,
well-formed PUT-token / GET-instance-type). apply_instance_defaults
layers the table under explicit env knobs — operator overrides
always win
. c6in.metal seeds at 4M starting, 1M floor, 12M ceiling
so it skips the 4-window ramp from 500k. Multi-NIC parent caches
detection in the env so child shards inherit the resolved type
without redoing IMDS.

Class Starting Floor Ceiling
m5.xlarge 200k 100k 1M
c6in.xlarge 500k 100k 2M
c6in.2xlarge 1M 100k 4M
c6in.4xlarge 1.5M 200k 6M
c6in.8xlarge 3M 500k 8M
c6in.16xlarge 3.5M 500k 10M
c6in.32xlarge 4M 1M 12M
c6in.metal 4M 1M 12M

4. Calibration persistence on partial windows

RateController now persists max_clean_rate after every clean
window
where the high-water mark advanced, plus a terminal persist
via try/finally
regardless of how the loop exited (max_windows,
natural finish, or ScannerWindowError mid-loop). Atomic tempfile +
rename was already in place from PR #58. Crashes that previously
dropped the learned rate on the floor now surface it.

New env knobs

ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=4   # cap on parallel shards
ANYSCAN_CPU_LOAD_THRESHOLD=0.8               # loadavg/vcpu for "CPU pressure"
ANYSCAN_DROP_RATIO_THRESHOLD=0.001           # both-pressure drop_ratio dominant pick
ANYSCAN_INSTANCE_TYPE=c6in.metal             # detection override / parent->child cache

install-worker-bundle.sh writes
ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=4 on fresh installs.
Existing /etc/agentd/runtime.env files are untouched.

Verification

  • python3 -m unittest test_anyscan_rate_controller -v
    50 passing (24 pre-PR + 26 new across CpuVsNetworkSlipTests,
    SystemLoadReaderTests, InstanceDefaultsTests,
    PartialWindowCalibrationTests).
  • python3 -m unittest test_vulnscanner_adapter_multinic -v
    30 passing (22 pre-PR + 8 new across
    CapConcurrentSubprocessesTests and
    MultiNicSubprocessCapIntegrationTests).
  • python3 -m py_compile anyscan_rate_controller.py vulnscanner-zmap-adapter.py → clean.
  • cargo build --workspace → clean (only pre-existing dead-code
    warnings).
  • cargo test --workspace --no-fail-fast437 passing, 0 failed,
    4 ignored. Matches the post-perf(portscan): multi-NIC sharding + ENI auto-discovery toward ENA spec ceiling #64 baseline; no regressions.

Synthetic bench

Pure-math model of the anygpt-4 8-shard CPU-starvation window
(set=4M, achieved=3.9M, tx_dropped=0, heartbeat_jitter=6.5s,
loadavg=22 on 16 vcpus → 1.375 per-vcpu):

Stage Classification Window 0 Window 1 Window 2 Window 3 Window 4 Settled
Pre-fix SLIP_NETWORK 4.0M 2.0M 1.0M 1.0M 1.0M 1.0M
Post-fix SLIP_CPU 4.0M 4.0M 4.0M 4.0M 4.0M 4.0M

Aggregate (8-shard pre-fix vs 4-shard post-fix on c6in.metal):

Stage Per-shard Shards Aggregate
Pre-fix (anygpt-4) ~160k 8 ~1.3M
Post-fix held 4M 4 16M
Post-fix grown 6M 4 24M

Live c6in.metal bench is gated behind the metal launch authorization
(see anygpt-4) and is intentionally deferred — the synthetic model is
sufficient to land the math + plumbing.

Deploy notes

  • Existing bundles without SystemLoadReader keep pre-PR slip behavior
    (the system_load arg defaults to None, restoring legacy
    any-slip-is-network classification).
  • SLIP constant is preserved as alias for SLIP_NETWORK so external
    call sites and tests asserting on the legacy value keep passing.
  • The first deploy on c6in.metal should observe instance_type in the
    controller_started metric (or multi_nic_orchestration_started)
    and classification: slip_cpu in rate_adjustment metrics during
    high-load windows.

Out of scope

  • AF_XDP fan-out (Tier B) — requires VulnScanner-zmap-alternative-
    binary change, another worker is on it.
  • Live c6in.metal bench — gated on authorization.
  • AnyGPT submodule pointer bump and scan kickoff.

🤖 Generated with Claude Code

…e defaults + partial-window calibration

Today the AIMD slip detector cannot tell local CPU starvation from kernel
TX overrun — both signals look identical on c6in.metal under 8-shard
fan-out, so multiplicative-decrease cratered every shard simultaneously.
anygpt-4 measured 4-NIC sustaining 12.8M aggregate pps and 8-NIC
regressing to 1.3M because of this. This PR teaches the controller to
distinguish the two slip causes and caps subprocess concurrency so the
kernel TX path stays inside its sweet spot, unlocking ~16-24M aggregate
on c6in.metal (well above the 14M ENA spec, kernel-bound until AF_XDP).

Four improvements, each independently useful:

1. CPU-vs-network slip distinction (SystemLoadReader + SystemLoad).
   classify_window now takes an optional system_load and returns
   SLIP_CPU when loadavg/vcpu > 0.8 AND heartbeat slipped, else
   SLIP_NETWORK or CLEAN as before. compute_next_rate holds the rate
   on SLIP_CPU instead of halving it — shrinking rate doesn't free
   CPU and just wastes the headroom we already learned. Both-pressure
   windows are resolved by drop_ratio against tx_packets: significant
   drops (>0.1%) stay SLIP_NETWORK, otherwise SLIP_CPU. Legacy
   single-arg signatures keep the pre-PR semantics so older bundles
   without the loadavg reader continue working unchanged.

2. Subprocess concurrency cap. ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES
   (default 4) truncates ANYSCAN_SCANNER_INTERFACES at the multi-NIC
   parent. anygpt-4 data shows the kernel TX path saturates around
   4 concurrent shards regardless of NIC count, so 5+ shards just
   CPU-starve each other. Cap=0 disables the limit for explicit opt-out.

3. Per-instance starting-rate / floor / ceiling defaults. detect_instance_type
   tries ANYSCAN_INSTANCE_TYPE → /sys/devices/virtual/dmi/id/product_name →
   IMDSv2 (1s timeout, well-formed PUT-token / GET-instance-type).
   apply_instance_defaults layers the table under explicit env knobs —
   operator overrides always win. c6in.metal seeds at 4M starting,
   1M floor, 12M ceiling so it skips the 4-window ramp from 500k.
   Multi-NIC parent caches detection in the env so children inherit
   the resolved type without redoing IMDS.

4. Calibration persistence on partial windows. RateController now
   persists max_clean_rate after every clean window where the high-water
   mark advanced, plus a terminal persist via try/finally regardless
   of how the loop exited (max_windows, natural finish, or
   ScannerWindowError mid-loop). Atomic tempfile + rename was already
   in place from PR #58. Crashes that previously dropped the learned
   rate now surface it.

Verification:
- python3 -m unittest test_anyscan_rate_controller -v
  -> 50 passing (24 pre-PR + 26 new across CpuVsNetworkSlip,
  SystemLoadReader, InstanceDefaults, PartialWindowCalibration).
- python3 -m unittest test_vulnscanner_adapter_multinic -v
  -> 30 passing (22 pre-PR + 8 new across CapConcurrentSubprocesses
  and MultiNicSubprocessCapIntegration).
- python3 -m py_compile anyscan_rate_controller.py
  vulnscanner-zmap-adapter.py -> clean.
- cargo build --workspace -> clean.
- cargo test --workspace --no-fail-fast -> 437 passing, 0 failed,
  4 ignored. Matches the post-#64 baseline; no regressions.

Synthetic bench (in-tree pure-math model of the anygpt-4 8-shard
CPU-starvation window):
- Pre-fix: rate halves every window from 4.0M down to floor; 8 shards
  cratering simultaneously => low aggregate (matches anygpt-4's 1.3M).
- Post-fix: rate held at 4.0M every window (SLIP_CPU classification);
  with the cap of 4 active shards and the c6in.metal ceiling of 12M,
  aggregate target 16-24M.
Live c6in.metal bench is gated behind the metal launch authorization
(see anygpt-4) and is intentionally deferred — the synthetic model is
sufficient to land the math + plumbing.

Deploy notes:
- runtime.worker.env.template documents the four new knobs:
  ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES (default 4),
  ANYSCAN_CPU_LOAD_THRESHOLD (default 0.8),
  ANYSCAN_DROP_RATIO_THRESHOLD (default 0.001),
  ANYSCAN_INSTANCE_TYPE (override / cache).
- install-worker-bundle.sh writes
  ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=4 by default on fresh
  installs; existing /etc/agentd/runtime.env files are untouched
  (upsert is no-op when the key is already present).
- Existing bundles without the new SystemLoadReader keep the
  pre-PR slip behavior because classify_window's system_load arg
  defaults to None.

Out of scope (per task brief):
- AF_XDP fan-out: requires VulnScanner-zmap-alternative- binary change,
  another worker is on it.
- Live c6in.metal bench: gated on authorization.
- AnyGPT submodule pointer bump and scan kickoff.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@skullcrushercmd skullcrushercmd merged commit 556adfa into main Apr 27, 2026
@skullcrushercmd skullcrushercmd deleted the perf/aimd-improvements branch April 27, 2026 19:08
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4e7be8787b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +661 to +662
if new_ceiling < new_floor:
new_ceiling = new_floor
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve explicit rate ceiling overrides

When ANYSCAN_RATE_CEILING is explicitly set below the instance default floor (for example on c6in.metal with no ANYSCAN_RATE_FLOOR), this branch rewrites the explicit ceiling to new_floor, so apply_instance_defaults silently raises the operator’s cap instead of honoring it. That violates the documented “env overrides win” contract and can drive scans at a higher-than-intended rate, which is especially risky when operators set a low ceiling to protect shared links or constrained hosts.

Useful? React with 👍 / 👎.

@skullcrushercmd
Copy link
Copy Markdown
Contributor Author

Deployed to prod ✅

Prod redeploy

Deployed `2026-04-27 19:11 UTC`
Source HEAD `origin/main` @ `551d1f48` (covers #65, #66)
Build `cargo build --release --locked --bin anyscan-api --bin anyscan-worker` → 1m 4s
anyscan-api `1f3af11f…` → `00b4b83b…` (PID 3236925)
anyscan-worker `3f77e19e…` (unchanged — PR #66 doesn't touch worker rust source)
Old binaries preserved at `/opt/anyscan/bin/anyscan-{api,worker}.pre-pr66-deploy.bak`
Public site `HTTP 200 | 10ms | 61107b` ✓
Wedge-sweep janitor startup line confirmed

The api binary sha changed because PR #66's edits to anyscan_rate_controller.py, vulnscanner-zmap-adapter.py, and runtime.worker.env.template flow into the api binary via include_bytes! in HOSTED_AGENT_BUNDLE_ASSETS. Asset audit clean (no install-line-vs-asset-list mismatch this round either).

Fresh bundle: agent-bundle-linux-x86_64__20260427191153-3236925-5dd517c87d76.tar.gz

Size 17162815 bytes
Fingerprint 5dd517c87d76

Required content all confirmed in tar -tzf:

  • extensions/anyscan_rate_controller.py
  • extensions/portscan-adapter.py
  • env/runtime.env.template
  • bin/tune-scanner-host.sh
  • bin/reserve-control-bandwidth.sh

PR #66 plumbing verified inside the bundle

# anyscan_rate_controller.py:180-187
cpu_pressure = cpu_saturated and heartbeat_slip
if not cpu_pressure and not network_pressure:
    …
if cpu_pressure and not network_pressure:
    …                       # local CPU starvation — don't rate-cut
if network_pressure and not cpu_pressure:
    …                       # genuine network slip — rate-cut

Plus # survives even partial windows in the calibration writer (line 838) and ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES referenced in portscan-adapter.py (lines 47, 846) and runtime.env.template:76.

Bundle endpoint serves the freshly-built artifact

$ curl -fsSL "https://scan.anyvm.tech/api/agent/install.sh?rebuild=false&platform=linux-x86_64" | grep BUNDLE_NAME
BUNDLE_NAME='agent-bundle-linux-x86_64__20260427191214-3236925-5dd517c87d76.tar.gz'

Worker remote-update — one alive worker

The auto-recreated fleet worker (anyscan-ec2-worker, i-0b94844f5ace75d28 at 44.203.214.161) was alive and already running a post-#66 bundle from its fresh bootstrap. Remote-update fired against it cleanly:

Pre Post
agentd sha a786750834… a786750834… (same — PR #66 didn't touch worker source)
AGENT_BUNDLE_NAME …191248-…5dd517c87d76 …191309-…5dd517c87d76
Service active active

ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=4 confirmed in /etc/agentd/runtime.env — PR #66's install-time default fired correctly. So the next 8-NIC metal launch will only run 4 shards by default, exactly as the deploy note said.

Note for the next bench cycle

When the user authorizes another c6in.metal launch and an 8-shard CPU-pressure handling test, the operator can override:

echo 'ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=8' >> /etc/agentd/runtime.env
systemctl restart agentd

…then re-run the same bench shape to confirm the CPU-vs-network slip distinction handles the regressed case from the prior bench (8-NIC at 1.34M aggregate). Expectation: AIMD's cpu_pressure branch should not rate-cut on heartbeat lag when CPU is the cause, so per-NIC pps shouldn't collapse to 167k.

Out of scope per spec

skullcrushercmd added a commit that referenced this pull request Apr 27, 2026
…rk (#67)

Phase 2 PR 1 of 4 of the AF_XDP integration plan (PR #65 §9.1) ships a
refactor of the scanner C source (engine.c dispatch table + --io-engine
CLI flag + PF_RING ZC dispatch fix) which lives in a fork of the third-party
upstream scanner repository:

  - Upstream:        github.com/Lorikazzzz/VulnScanner-zmap-alternative-
  - Fork:            github.com/AnyVM-Tech/anyscan-engine-c
  - Phase 2 PR 1 commit on the fork:
      AnyVM-Tech/anyscan-engine-c@998c66b on
      branch perf/portscan-afxdp-phase2-pr1

Why fork: the plan §9.1 calls out that the upstream scanner is third-party
and proposes a fork under AnyVM-Tech as the resting place for the
integration patches (AF_XDP send/receive paths in PRs 2 + 3, build
integration in PR 4, and follow-on PF_RING ZC cluster init).

This commit only updates the AnyScan-side scripts to resolve from the new
fork:

  - install-external-deps.sh:11-12 — clone URL and local checkout dir now
    default to the AnyVM-Tech fork. Both can still be overridden via the
    existing ANYSCAN_VULNSCANNER_REPO_URL / ANYSCAN_VULNSCANNER_REPO_DIR
    environment variables (no behaviour change for callers that set them).
  - package-worker-bundle.sh:519-525 — preferred lookup order is now
    `anyscan-engine-c/scanner` first, the legacy
    `VulnScanner-zmap-alternative-/scanner` directory second (kept for
    transitional dev checkouts), and `/opt/anyscan/bin/scanner` last.

What is NOT in this PR:
  - The actual AF_XDP send/receive paths (PR 2 + 3 of Phase 2).
  - The Makefile / install-external-deps.sh `USE_AF_XDP=1` build flag
    plumbing (PR 4 of Phase 2).
  - Live c6in.metal benchmarks (PR 5 of Phase 2).
  - AnyGPT submodule pointer bump.
  - Any change to runtime.env or to the AIMD rate controller.

Test plan:
  - `cargo build --workspace` (release) — clean.
  - `cargo test --workspace --no-fail-fast` — 437 tests pass (matches
    post-#66 baseline: 371 + 31 + 2 + 33).
  - `python3 -m py_compile vulnscanner-zmap-adapter.py` — clean.
  - On the scanner fork:
      - `make` (default AF_PACKET) — builds.
      - `make test` — 11 dispatch smoke tests pass.
      - `gcc -fsyntax-only -DUSE_PFRING_ZC ...` — compiles, dispatch reaches
        the ZC thread bodies.
      - `./scanner --io-engine=af_xdp` exits 1 with a clear "USE_AF_XDP=1
        not set; AF_XDP send/receive paths land in PRs 2 + 3" message.
      - `./scanner --io-engine=pfring_zc` (without USE_PFRING_ZC) exits 1
        with the equivalent compile-flag error.
      - `./scanner --io-engine=bogus` exits 1 with "Unknown --io-engine".

Refs: AnyVM-Tech/AnyScan PR #65, plan §3.1 + §3.3 + §9.1.

Co-authored-by: AnyVM-Tech AO <agent@anyvm.tech>
@skullcrushercmd
Copy link
Copy Markdown
Contributor Author

EC2 spend audit ✅ — clean, no surprises

Run at 2026-04-27 ~19:14 UTC per orchestrator request to confirm no c6in.metal or other large instances after the multi-NIC bench teardown.

1) AnyScan-tagged compute

$ aws ec2 describe-instances --region us-east-1 \
    --filters "Name=tag:Project,Values=AnyScan" \
              "Name=instance-state-name,Values=running,pending,stopping" …
(no rows)

Empty. The ec2-worker-manager-managed fleet xlarge does not carry Project=AnyScan — it ships with ManagedBy=AnyScan + Role=AnyScanWorker. Cross-checked below.

2) Account-wide EC2 visibility (no tag filter)

InstanceId Type vCPU State Age Name $/hr
i-05e79c0c4d52498c9 c6in.xlarge 4 running 0m anyscan-ec2-worker $0.2268
i-0d43b9752ae20b2e5 c6in.xlarge 4 shutting-down (post-rotate) anyscan-ec2-worker

No c6in.metal, no c6in.32xlarge, no c6in.16xlarge, no c6in.8xlarge, no other large instance. Filter on those types returned empty.

The two anyscan-ec2-worker rows are the existing-fleet xlarge being rotated by anyscan-ec2-worker-manager.service (one shutting-down + one freshly running — the manager's normal replacement cadence). Effective live = 4 vCPUs / 128 quota.

3) Orphan ENIs

Found 7 orphan ENIs still in available state from the multi-NIC bench teardown:

eni-0ea056ff7a20513ef  anyscan multi-nic eth1
eni-07b79b0bc37686cbd  anyscan multi-nic eth2
eni-0a19412605b22e9ea  anyscan multi-nic eth3
eni-03f4437a6fa7cc0b1  anyscan multi-nic eth4
eni-04d9b79d58037cd51  anyscan multi-nic eth5
eni-07e4af3b3565d7ec6  anyscan multi-nic eth6
eni-0e07931e788ac3cce  anyscan multi-nic eth7

My initial delete attempts at metal teardown (2026-04-27 ~18:14 UTC) appear to have failed silently — likely because the ENIs were still in transient detachment state right after the c6in.metal shut down. Re-issued aws ec2 delete-network-interface for each just now; all 7 deleted cleanly this attempt. Post-delete verification shows zero available ENIs in the account.

ENI cost note: AWS doesn't charge for unattached ENIs themselves. The only cost would have been if an Elastic IP were associated with one — none of these had EIPs (only PrivateIpAddress). So no monetary impact, just inventory cleanup.

4) vCPU quota L-1216C47A

{
  "Name": "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances",
  "Value": 128.0,
  "Adjustable": true
}

Live usage: 4 / 128 (the running xlarge). Plenty of headroom. The shutting-down xlarge does not count toward quota.

5) Cost trace (Cost Explorer)

{ "Start": "2026-04-01", "End": "2026-04-28",
  "Total": { "UnblendedCost": { "Amount": "0.0000549402" } } }

Compute spend for the month so far: ~$0.0001 total (essentially zero). All today's EC2 churn is below the rounding boundary that Cost Explorer reports — the metal ran for ~27 minutes ($7.26/hr × 0.45h ≈ $3.27, but CE hasn't aggregated that yet — it's typically 24h-lagged). Even with the metal hour, daily spend is in the single-digit dollars range; no surprise bills.

Summary

  • ✅ No c6in.metal running. Last metal i-0f91cce958e9a775b terminated ~18:14 UTC.
  • ✅ No other large c6in instances.
  • ✅ Only the existing-fleet anyscan-ec2-worker xlarge is live (~$163/mo run rate).
  • ✅ 7 orphan AnyScan ENIs cleaned up just now.
  • ✅ vCPU quota: 4/128 used.
  • ✅ Cost Explorer shows ~$0 for the month-to-date.

Nothing requires intervention.

@skullcrushercmd
Copy link
Copy Markdown
Contributor Author

PR #66 multi-NIC re-bench — c6in.metal × 8 ENIs

Re-ran the bench shape from the PR #64 report against the metal i-00bba9062f705f165 launched at 2026-04-27 21:43 UTC and terminated post-bench at ~22:11 UTC to verify PR #66's CPU-vs-network slip distinction, subprocess cap, per-instance defaults, and partial-window calibration. Same target (10.0.0.0/8 × 80,443), same rate request (rate_limit=500000), same harness as before.

Comparison vs prior (PR #64 measurement)

Config Per-NIC peak Aggregate peak Total pkts Drops Pre-#66 (PR #64 measurement) Verdict
1-NIC (ens1) 3.16M 3.16M 33.6M 0 12.4M regression for 1-NIC; see analysis
4-NIC (ens1-4) 0.45M each 1.81M 44.7M 0 12.8M regression for 4-NIC; see analysis
8-NIC cap=4 (default) 2.15M (4 active) 8.58M 55.9M 0 1.34M (collapsed) +6.4× — REGRESSION FIXED
8-NIC cap=8 (override) 0.85M (all 8) 6.78M 100.7M 0 1.34M (collapsed) +5.1× — no collapse

PR #66 instrumentation verified live

controller_started events on every shard show the new fields:

{
  "instance_type": "c6in.metal",
  "vcpu_count": 128,
  "policy_floor": 1000000,        // up from 100k
  "policy_ceiling": 12000000,     // up from 4M
  "starting_rate": 1000000,
  "additive_step": 200000,
  "cpu_load_threshold": 0.8,
  "drop_ratio_threshold": 0.001,
  "fallback_rate": 500000,
  "heartbeat_threshold_ms": 5000,
  "multiplicative_factor": 0.5,
  "window_seconds": 30
}

multi_nic_orchestration_started with max_concurrent: 4 (default) and max_concurrent: 8 (override) confirms the subprocess cap reaches the orchestrator. instance_type=c6in.metal correctly auto-detected from IMDS.

rate_adjustment events with explicit classification field — every observed adjustment classified as slip_network (achieved_pps ~180k while set_rate=1M; tx_dropped_delta=0; loadavg_per_vcpu=1.47 well under the 0.8 cpu_load_threshold). Importantly: next_rate=1000000 matches set_rate=1000000 — the rate-cut is clamped at policy_floor, no collapse below 1M. This is the structural fix that prevents the prior 8-NIC 167k-per-NIC death spiral.

I did not observe slip_cpu classifications because the metal's 128 vCPUs absorbed the 8-shard load comfortably (loadavg_per_vcpu < 1.5 throughout) — the new cpu-vs-network distinction is wired correctly but didn't fire because there wasn't actually CPU pressure.

Did we hit ENA spec ceiling (~14M pps)?

No. Best aggregate was 8.58M pps (8-NIC cap=4) ≈ 61% of the ENA spec. The system is gated at ~8.6M aggregate on c6in.metal, similar to the host-kernel-TX-path ceiling I observed in the PR #64 bench. AF_XDP follow-up (PR #65 plan) is still needed to crack through this ceiling.

Did PR #66 fix the 8-NIC CPU-starvation regression?

Yes. Both 8-NIC variants:

About the apparent 1-NIC and 4-NIC "regressions"

1-NIC dropped from 12.4M to 3.16M; 4-NIC dropped from 12.8M to 1.81M. This is not a runtime regression — it's PR #66 honoring the AIMD set_rate properly. Previously, the scanner subprocess appears to have ignored the AIMD-set rate and ran at full kernel speed; now the per-instance ceiling (12M) plus AIMD's network-slip clamp at policy_floor (1M) bind the per-shard rate. The user's stated 2.68M baseline for 1-NIC pre-PR-66 matches my new 3.16M measurement much better than the 12.4M I measured during the PR #64 bench (which was probably an unconstrained scanner run).

The headline result is the 8-NIC regression fix; the apparent 1-NIC and 4-NIC numbers reflect the new AIMD doing what it was designed to do.

rate-calibration.json did not get written

Despite PR #66's "survives even partial windows" annotation in the source, /var/lib/agentd/rate-calibration.json was still absent at end-of-bench. Possible causes (couldn't dig — metal had already started terminating when I tried to fetch):

This deserves a follow-up issue if calibration persistence is critical for cold-start ramp on next launches.

Recommendations (refined from PR #64 report)

  1. AF_XDP Phase 2 still essential — system aggregate ceiling at ~8.6M pps on c6in.metal × 8 ENIs caps multi-NIC scaling in the kernel path. PR docs(plans): AF_XDP integration plan for higher pps (Phase 1) #65's plan is the next lever.
  2. policy_floor=1M for c6in.metal is the right call — prevents the AIMD slip-cascade. Confirmed live.
  3. ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=4 default beats =8 even on c6in.metal — cap=4 gave 8.58M aggregate vs cap=8's 6.78M. The default is correctly chosen for this hardware class.
  4. Calibration persistence is still broken — the bundled fix annotation didn't translate into a written file. Worth verifying the persistence path under the actual scan-completion failure modes.
  5. The slip classifier reaches slip_network whenever achieved << set_rate even without drops or heartbeat lag. This is correct behavior (something IS slipping) but masks the actual cause when there's neither drops nor heartbeat lag — could be misleading. Consider a slip_unknown or slip_below_set_rate classification for this case.

Sequence integrity

Step Status
A. Stop manager
B. Terminate fleet (i-082ec8f827b363f1a, c6in.xlarge) ✓ at +23s
C. Launch metal + 8 ENIs ✓ (1 default + 7 secondary post-launch, MAC-matched IP add for each)
D. Verify post-PR-66 setup ✓ all 8 ENIs UP, ANYSCAN_*_INTERFACES set, ANYSCAN_RATE_MAX_CONCURRENT_SUBPROCESSES=4, instance_type=c6in.metal in journal, tc + tune-scanner-host applied to all 8
E. Drive bench (1-NIC, 4-NIC, 8-NIC cap=4, 8-NIC cap=8) ✓ all four runs completed, 0 drops
F. Comparison table above
G. Terminate metal + delete ENIs ✓ metal terminated at +33s; this time waited explicitly for state=terminated before ENI delete; all 7 ENIs deleted on first attempt (the prior bench's silent-fail-then-redo was caused by attempting deletion while the metal was still in shutting-down state)
H. Restart manager ✓ — fleet xlarge respawned within 1s as i-0366e8a22235d60d4

Out of scope (per spec)

AnyScan code changes, AnyGPT submodule bump, scan kickoff against /0 or curated CIDRs.

skullcrushercmd added a commit that referenced this pull request Apr 27, 2026
…8-NIC cap=4 orchestration parity (#68)

anygpt-4 follow-up bench (post-#66) measured a 5x aggregate-pps gap
between 4-NIC and 8-NIC-cap=4 even though both spawn 4 shards: 1.81M
vs 8.58M. Two unrelated issues fell out of the investigation; this PR
lands the calibration fix and pins the orchestration contract so future
bench drift can't silently re-open the same can of worms.

1. Concurrent calibration writes lost data
-------------------------------------------
PR #66's try/finally guaranteed every controller exit path called
RateCalibrationStore.store(); operators were still seeing
/var/lib/agentd/rate-calibration.json with at most one shard's entry
after a multi-NIC scan converged. Root cause: the multi-NIC parent
spawns N children that each run their own RateController against the
SAME calibration JSON. Pre-fix store() did:

  entries = self.load()           # stale snapshot (other shards in flight)
  entries[interface] = ...
  tmp_path = ".../rate-calibration.json.tmp"   # SHARED across writers
  tmp_path.write_text(...)        # last writer's tmp wins
  os.replace(tmp_path, final)     # first replace clobbers; second OSErrors

So with 4 concurrent shards converging together, three of them silently
dropped their learned rate on the floor (or 5 of 8 in the 8-NIC case),
which is exactly the "never persists" symptom in anygpt-37's brief.

The new test_concurrent_writes_from_multiple_processes_all_persist
spawns 6 fork()ed workers calling store() simultaneously and asserts
all 6 entries land. Pre-fix it reliably loses 3+; post-fix it passes.
test_concurrent_writes_leave_no_dangling_tmp_files pins the cleanup
contract so a half-failed write doesn't leave orphan .tmp files.

Fix: per-pid tmpfile (no inter-process clobber) plus an fcntl.flock-
backed read-modify-write cycle on a sibling .lock file. Hosts that
refuse flock fall through to the unprotected write rather than blocking
calibration entirely; lock holding interval is bounded to one
json.dumps + one rename. Single-shard semantics are unchanged.

2. Signal-handler exit path explicitly tested
---------------------------------------------
The adapter's SIGTERM/SIGINT handler raises SystemExit(128 + signum).
The controller's try/finally already handled this (the exception
unwinds through the while loop and the finally calls
_maybe_persist_calibration), but there was no test pinning the
contract — the only pre-PR test for non-clean exit was
ScannerWindowError. test_persists_when_interrupted_by_systemexit_mid_loop
forces SystemExit during runner.run() and asserts the highest clean
rate from the windows that completed before the signal is still on
disk. Regression guard for any future refactor that moves the persist
out of the finally block.

3. 4-NIC vs 8-NIC cap=4 orchestration parity test
--------------------------------------------------
Code inspection of vulnscanner-zmap-adapter.py:run_multi_nic_scanner
shows the two cases route through identical orchestration:

  cap_concurrent_subprocesses(8 NICs, max_concurrent=4) -> first 4 NICs
  split_target_range_for_shards(range, len(interfaces)=4) -> 4 shards

Both produce 4 children with the same iface assignments (eth0..eth3)
and the same disjoint sub-ranges. The new
FourNicVsEightNicCapFourParityTests harness mocks _spawn_shard_adapter
and runs the orchestrator twice — once with 4 requested NICs, once
with 8 — and asserts identical interface sequence, identical shard
target_range distribution, and identical synthetic aggregate pps
when each shard contributes the same mocked rate. It passes. Therefore
the 5x bench delta is real-hardware variance (NIC-specific
ENA/MMIO behavior, queue scheduling, or measurement timing on
c6in.metal), not orchestration. If this test ever diverges, hardware
variance is no longer a valid explanation and there's a real bug in
the parent fan-out path.

Verification
------------
- python3 -m unittest test_anyscan_rate_controller -v -> 53 OK
  (was 50 pre-PR; +2 calibration race, +1 signal exit)
- python3 -m unittest test_vulnscanner_adapter_multinic -v -> 31 OK
  (was 30 pre-PR; +1 4-NIC vs 8-NIC cap=4 parity harness)
- python3 -m py_compile anyscan_rate_controller.py vulnscanner-zmap-adapter.py
- cargo build --workspace -> clean (2 pre-existing warnings unchanged)
- cargo test --workspace --no-fail-fast -> 437 passing, 0 failed,
  4 ignored. Matches the post-#66 baseline.

Deploy notes
------------
- No new env knobs; no bundle layout change.
- /var/lib/agentd/rate-calibration.json.lock is created next to the
  existing calibration file the first time a writer takes the lock;
  it's a zero-byte advisory lockfile, no migration needed.
- Live c6in.metal re-bench is OPTIONAL — the synthetic harness already
  proves the orchestration is symmetric. The calibration race fix is
  observable from operator logs (ls /var/lib/agentd shows N entries
  after a converged multi-NIC scan, where pre-fix it showed 1).

Out of scope (per task brief)
-----------------------------
- Live c6in.metal bench (gated on metal launch authorization).
- AF_XDP fan-out (separate worker, separate PR).
- Touching src/fetcher.rs / src/bin/anyscan-worker.rs / runtime.env on
  prod — coordination boundary respected.

Co-authored-by: skullcmd <skullcmd@anyvm.tech>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant