feat(scan): k8s-native default probe rules #115

Merged
WZ merged 3 commits into main from feat/proactive-scan-sensible-defaults on Apr 22, 2026

@WZ WZ commented Apr 22, 2026

Stacked on #114. Merge chain: #108 → #112 → #113 → #114 → this.

What this PR does

Replaces the three default probe rules with k8s-native equivalents. The old defaults used service="{service}" label selectors that don't exist on most k8s Prometheus setups. Confirmed live: all three old defaults returned rawResultCount: 0 on every tested service.

Smoke-test findings that drove this

Full writeup in docs/plans/2026-04-22-scan-smoke-test-findings.md (local only). Highlights:

| Default rule (old) | rawResultCount against real Prom |
| --- | --- |
| up{service="stolon-proxy"} | 0 (no service= label in this stack) |
| http_requests_total{service="stolon-proxy"} | 0 (same) |
| http_request_duration_seconds_bucket{service="stolon-proxy"} | 0 (same) |

| New default rule | Tested value |
| --- | --- |
| kube_deployment_status_replicas_available{deployment="stolon-proxy"} | 2 |
| kube_statefulset_status_replicas_ready{statefulset="..."} | works per-service |
| kube_daemonset_status_number_ready{daemonset="..."} | works per-service |

The existing service-health-poller matches by the same label precedence (service-health-poller.ts:162-166 — deployment → statefulset → daemonset → job → instance). This PR brings the scan probe to the same schema.
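
The precedence above can be read as a first-match lookup over label keys. The sketch below is illustrative only: `pickWorkloadLabel` and the label-map shape are hypothetical names, and only the order (deployment → statefulset → daemonset → job → instance) comes from this PR.

```typescript
// Hypothetical sketch: pick the first label key present on a series,
// in the precedence order quoted from service-health-poller.ts.
const LABEL_PRECEDENCE = ["deployment", "statefulset", "daemonset", "job", "instance"] as const;

type LabelKey = (typeof LABEL_PRECEDENCE)[number];

function pickWorkloadLabel(
  labels: Record<string, string>,
): { key: LabelKey; value: string } | undefined {
  for (const key of LABEL_PRECEDENCE) {
    const value = labels[key];
    if (value !== undefined && value !== "") return { key, value };
  }
  return undefined; // none of the known keys is present: nothing to match on
}
```

A series carrying both `statefulset` and `instance` resolves to `statefulset`, because it appears earlier in the list.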

End-to-end dispatch was proven working: 3 scan-triggered investigations ran to completion, were persisted, and were scored via rca-eval --source scan. The 50/100 average on a synthetic trigger is expected: it is the "silent degradation" failure mode flagged in the design doc, which occurs when there is no real evidence to investigate.

Changes

src/config/schema.ts:

  • 3 new default rules, one per k8s workload type (Deployment / StatefulSet / DaemonSet).
  • consecutiveTicks: 3 across the board. Rolling deploys briefly drop _available below desired; with the 4h cron, three consecutive breaches (ticks at t=0, t=4h, t=8h) give an 8h filter window, long enough to ignore rollout churn, short enough to catch real outages on the next cadence.
  • Dropped the old error_rate and latency_p99 defaults. Both assumed labeled application-level HTTP metrics (http_requests_total / http_request_duration_seconds_bucket) that are too environment-specific. Operators with those metrics add them via Settings → Scan → Rules (Step 3 UI) or config.yaml override.
  • Comment above the default block explains the reasoning + points to the smoke-test findings.

src/config/schema.test.ts:

  • Assertion updated to match new defaults.
  • Added invariant check: all defaults must include {service} placeholder (regression-guard against someone accidentally committing a rule that can't substitute per-service).
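
A minimal sketch of the placeholder invariant, assuming a trimmed rule shape of `{ name, query }` (the real rule type carries more fields, such as the threshold and consecutiveTicks). The helper names `rulesMissingPlaceholder` and `renderQuery` are hypothetical; the rule names and metric selectors are the ones listed in this PR.

```typescript
// Assumed, trimmed rule shape for illustration only.
interface DefaultRule {
  name: string;
  query: string;
}

const PLACEHOLDER = "{service}";

const DEFAULT_RULES: DefaultRule[] = [
  { name: "deployment_availability", query: 'kube_deployment_status_replicas_available{deployment="{service}"}' },
  { name: "statefulset_availability", query: 'kube_statefulset_status_replicas_ready{statefulset="{service}"}' },
  { name: "daemonset_availability", query: 'kube_daemonset_status_number_ready{daemonset="{service}"}' },
];

// Regression guard: names of rules that could not be substituted per service.
function rulesMissingPlaceholder(rules: DefaultRule[]): string[] {
  return rules.filter((r) => !r.query.includes(PLACEHOLDER)).map((r) => r.name);
}

// Substitute every occurrence of the placeholder with the service name.
function renderQuery(rule: DefaultRule, service: string): string {
  return rule.query.split(PLACEHOLDER).join(service);
}
```

A test in this style would assert that `rulesMissingPlaceholder(DEFAULT_RULES)` is empty, so a default that hardcodes a service name fails the suite.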

What this doesn't do

  • Doesn't cover non-k8s environments. Bare-metal, ECS, Nomad stacks will need custom rules via the GUI or config.yaml override. Documented in the schema comment.
  • Doesn't re-add error_rate / latency_p99. These need labels that only exist in some stacks. Adding them back as defaults would just reproduce the same rawResultCount: 0 problem for most users. Left for operator-added via GUI.

Test plan

  • Type check + full suite — 1296/1296 passing on branch
  • Fresh install with no scan.probe.metrics in config.yaml → gets the 3 k8s defaults
  • Existing users with source: "gui" on rules → unaffected (DB override wins over config default, same pattern as scan.enabled etc.)
  • Existing users with custom scan.probe.metrics in config.yaml → unaffected (explicit override)
  • On a k8s Prometheus with kube-state-metrics: POST /api/scan/rules/test returns rawResultCount: 1 and a numeric value for at least one service per rule-type
  • On a non-k8s environment: rawResultCount: 0 on all three defaults → operator follows the GUI override path

Rollout

Zero runtime behavior change on merge for anyone who already has scan.enabled: true — they presumably have working rules already (either config.yaml override or DB/GUI override). Only config.yaml-default users see different defaults on next reload.

@WZ WZ force-pushed the feat/proactive-scan-lane-b-per-service branch from 8a0d233 to 05231c6 Compare April 22, 2026 15:25
@WZ WZ force-pushed the feat/proactive-scan-sensible-defaults branch from 51ca7ed to 79b11f6 Compare April 22, 2026 15:25
WZ added 3 commits April 22, 2026 09:50
Replace the three default probe rules in config.scan.probe.metrics. The old
defaults used 'service="{service}"' label selectors which don't exist on
most k8s Prometheus setups (kube-state-metrics labels are deployment= /
statefulset= / daemonset=). Confirmed by smoke test 2026-04-22: all three
old defaults returned rawResultCount:0 for every tested service in a live
k8s environment.

New defaults, one per workload type:
  deployment_availability   kube_deployment_status_replicas_available{deployment="{service}"} < 1
  statefulset_availability  kube_statefulset_status_replicas_ready{statefulset="{service}"} < 1
  daemonset_availability    kube_daemonset_status_number_ready{daemonset="{service}"} < 1

Each service matches exactly one (or none, for services not tracked by
kube-state-metrics). Non-matching rules return empty vector and score 0
harmlessly. Matches the label precedence the existing service-health-poller
already uses (service-health-poller.ts:162-166).

Dropped the old error_rate + latency_p99 defaults. Both assume labeled
application-level HTTP metrics (http_requests_total, http_request_duration_
seconds_bucket) which are too environment-specific to be useful defaults —
operators with those metrics add them via Settings → Scan → Rules (Step 3)
or config.yaml override.

consecutiveTicks bumped from 1/2 to 3 across the board. Rolling deploys
briefly drop _available below desired; 3 ticks on the default 4h cron
(12h) is long enough to filter any reasonable rollout window without
missing a real incident.

For non-k8s environments: override via Settings → Scan with queries that
match your label schema. The GUI rule editor from Step 3 is the supported
path for per-environment tuning.

Full test suite 1296/1296 green (schema.test.ts updated to assert the new
defaults + {service} placeholder presence + correct label selectors).

Findings writeup: docs/plans/2026-04-22-scan-smoke-test-findings.md (local).

Review-caught regression fix. The three k8s availability defaults would
trip on legitimate edge cases the previous iteration missed:

  - HPA min=0 / cron-style / paused deployments (spec=0, available=0):
    would false-positive on `available < 1`.
  - Arch-mismatched daemonsets (e.g. kube-flannel-ds-arm on amd64-only
    nodes): observed firing in the 2026-04-22 smoke test because
    desired_number_scheduled=0 and number_ready=0 both satisfy `< 1`.
  - StatefulSets scaled to 0 for maintenance: same pattern.

Fix: every default query is now ANDed against a "desired/spec > 0" guard.
When the workload is intentionally at zero, the guard's right side returns
empty, the `and` eliminates the left side, the query's total result is
empty, which the probe scores as NaN → no trip. When the guard is true,
the left side's value flows through unchanged for threshold evaluation.
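
The guard's effect can be modelled for a single workload as follows. This is a simplified sketch, not the probe's code: an instant vector is reduced to "one number or nothing", and `guardedValue` / `trips` are hypothetical names. The `< 1` threshold is the one from the first commit.

```typescript
// An instant-vector result for one workload: a value, or undefined for an empty vector.
type Sample = number | undefined;

// Models `available and (desired > 0)` for one label set.
function guardedValue(available: Sample, desired: Sample): number {
  const guardMatches = desired !== undefined && desired > 0; // right side of `and`
  if (available === undefined || !guardMatches) return NaN; // empty result is scored as NaN
  return available; // left side's value flows through unchanged
}

// Threshold evaluation: NaN < 1 is false, so an empty result never trips.
function trips(value: number, threshold = 1): boolean {
  return value < threshold;
}
```

With spec=0 and available=0 the result is NaN and nothing trips; with spec=2 and available=0 the value 0 flows through and the rule trips.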

PromQL changes (one per rule):
  deployment:   + and kube_deployment_spec_replicas{deployment="..."} > 0
  statefulset:  + and kube_statefulset_replicas{statefulset="..."} > 0
  daemonset:    + and kube_daemonset_status_desired_number_scheduled{daemonset="..."} > 0

Tested against live Prometheus: `stolon-proxy` (healthy, spec=2, avail=2)
still returns value=2 with the guarded query. rawResultCount=1.

Also: comment math fixed. Previous comment claimed "3 ticks × 4h = 12h"
for the hysteresis window. Correct math is 8h (ticks at t=0, t=4h, t=8h
= 3 consecutive breaches = trip on the 3rd). Updated.
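
The corrected arithmetic, as a small sketch. It assumes evenly spaced ticks and that the rule trips on the Nth consecutive breach; `hoursUntilTrip` is a hypothetical helper, not part of the codebase.

```typescript
// Time from the first breaching tick until the rule trips.
// N consecutive ticks span (N - 1) intervals, not N.
function hoursUntilTrip(consecutiveTicks: number, cronIntervalHours: number): number {
  if (!Number.isInteger(consecutiveTicks) || consecutiveTicks < 1) {
    throw new RangeError("consecutiveTicks must be a positive integer");
  }
  return (consecutiveTicks - 1) * cronIntervalHours;
}
```

For `consecutiveTicks = 3` on a 4h cron this gives 8, matching the ticks at t=0, t=4h, and t=8h.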

Test additions: regression assertions in schema.test.ts that lock the
guard clauses into the default rule schema. Any future change that
removes the `> 0` guard or the `kube_*_spec_*` / `_desired_*` metric
references will fail the suite. 1296/1296 green.

Net effect: operators running the default rules will get cleaner,
false-positive-free scans at the cost of NOT detecting "service intended
to run but spec=0 config drift." That's a config-management issue, not
a runtime anomaly, so it is out of scope for the scan probe.

Adds a quality gate to rca-eval. `--min-score N` exits non-zero when the
average total score falls below N, so operators can gate pre-ship checks
on RCA quality instead of eyeballing a table.

Example usage:
  npx tsx src/eval/rca-eval.ts --min-score 75
  npm run test:rca-eval          # min-score 20 (interim floor)
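
A rough sketch of the gate's decision, returning an exit code instead of exiting so it stays testable. `minScoreExitCode` is a hypothetical name, and treating an empty score list as a failure is an assumption; the commit does not say how that case is handled.

```typescript
// Returns 0 when the average total score meets the floor, 1 otherwise.
function minScoreExitCode(totalScores: number[], minScore: number): number {
  if (totalScores.length === 0) return 1; // assumption: no reports counts as a failure
  const sum = totalScores.reduce((acc, score) => acc + score, 0);
  const average = sum / totalScores.length;
  return average < minScore ? 1 : 0;
}
```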

Interim floor is 20 because current avg is 24/100 — the real issue is
that 77% of reports persist as empty stubs (0/0/0/0/0), not that
non-empty reports score poorly. Raise the floor once reliability work
lands. CI wiring intentionally deferred: ci.yml runs without dops.sqlite,
so a fixture DB is needed before this can become a hard CI check.

Surfaced by /autoplan CEO review — the RCA eval harness had no
programmatic gate, so quality regressions (like the current 67→24 drop)
had no way to fail a build.
@WZ WZ force-pushed the feat/proactive-scan-sensible-defaults branch from 1943508 to 08fb02c Compare April 22, 2026 16:51
@WZ WZ changed the base branch from feat/proactive-scan-lane-b-per-service to main April 22, 2026 16:51
@WZ WZ merged commit 4bb01f4 into main Apr 22, 2026
WZ added a commit that referenced this pull request Apr 23, 2026
* feat(scan): Slice A — schema + registry shape for discovery-owned probe rules

Prepares the scan pipeline for discovery-written probe rules. Schema adds
the per-rule `source` discriminator ("metrics" | "logs") and the per-service
`probeRules` field; registry persists a new top-level object shape
{services, globalProbeRules} with forward-compat reads of legacy flat-array
files. No runtime behavior change — Slice B (discovery prompt) and Slice C
(four-track probe evaluator) build on this.

Schema (src/config/schema.ts):
  - ProbeMetricRuleSchema gains `source: z.enum(["metrics","logs"]).default("metrics")`
    so the probe knows which MCP tool role to dispatch against. Default is
    "metrics" — every existing config and #115 k8s-native rule continues to
    route through the Prometheus tool unchanged.
  - ServiceSchema gains `probeRules: z.array(ProbeMetricRuleSchema).optional().default([])`.
    Populated by discovery in Slice B; empty today.
  - ProbeSchema gains `logsQueryTimeoutMs: z.number().default(10_000)`.
    Loki `count_over_time` over a 15m window regularly takes 5-20s; reusing
    the 3s Prometheus timeout would produce silent false negatives. Eng-review
    decision 2026-04-22.
  - ProbeMetricRuleSchema + ThresholdSchema moved above ServiceSchema so the
    probeRules field can reference them (a `const` schema must be declared
    before it is referenced, unless the reference is wrapped in `z.lazy`).

Registry (src/services/registry.ts):
  - Persisted shape inverted from ServiceConfig[] to RegistryFile
    {services, globalProbeRules}.
  - Forward-compat reader: legacy flat-array services.yaml files parse into
    {services: parsed, globalProbeRules: []} without migration. The first
    write upgrades the file on disk. Operators on old files continue to load.
  - Public API preserved: load()/save()/getVersion()/rollback()/listVersions()
    keep their signatures. No call site in routes.ts, agents.ts,
    scan-scheduler.ts, or service-health-poller.ts had to change.
  - save() internally reads current globalProbeRules and carries them forward
    on every write — routes.ts metadata edits, service renames, and rollback
    cannot silently clobber the discovery-written top-level rules. Eng-review
    decision 2026-04-22 (silent-clobber guard).
  - rollback() preserves CURRENT globalProbeRules (not historic ones). Historic
    snapshots from before Slice A have globalProbeRules=[]; rolling back to
    them would wipe the live rules.
  - New methods: loadGlobalRules(), saveGlobalRules(rules, source),
    loadAll() (atomic combined snapshot for the scan probe's per-tick read),
    saveAll(file, source) (atomic combined write — preferred for discovery,
    produces one version entry instead of two), getVersionFile(id) (historic
    snapshot including globals-as-of-that-version).
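
The forward-compat read and the silent-clobber guard can be sketched as two pure functions. The types are trimmed stand-ins and `normalizeRegistry` / `withServices` are hypothetical names; the real store also handles YAML parsing, versioning, and atomic writes, which are omitted here.

```typescript
// Trimmed stand-ins for the real types.
interface ServiceConfig { name: string }
interface ProbeRule { name: string; query: string }
interface RegistryFile { services: ServiceConfig[]; globalProbeRules: ProbeRule[] }

// Accept both the legacy flat array and the new object shape.
function normalizeRegistry(parsed: unknown): RegistryFile {
  if (Array.isArray(parsed)) {
    // Legacy services.yaml: a bare list of services, no globals yet.
    return { services: parsed as ServiceConfig[], globalProbeRules: [] };
  }
  if (parsed !== null && typeof parsed === "object") {
    const obj = parsed as Partial<RegistryFile>;
    return {
      services: Array.isArray(obj.services) ? obj.services : [],
      globalProbeRules: Array.isArray(obj.globalProbeRules) ? obj.globalProbeRules : [],
    };
  }
  return { services: [], globalProbeRules: [] };
}

// save()-style write: replace the services, carry the CURRENT globals forward.
function withServices(current: RegistryFile, services: ServiceConfig[]): RegistryFile {
  return { services, globalProbeRules: current.globalProbeRules };
}
```

Because `withServices` always copies the current globals, a services-only write (metadata edit, rename, rollback) cannot drop them.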

GUI validator (src/server/scan-rule-validator.ts):
  - RuleSchema gains `source: z.enum(["metrics","logs"]).default("metrics")`
    so GUI-authored rules parse into the canonical ProbeMetricRule shape.
    Log-source rules come from discovery in Slice B; the GUI editor stays
    metrics-only for v1.

Discovery types (src/types/discovery-types.ts):
  - ValidatedServiceConfigSchema and ServiceRegistryVersionSchema pre-thread
    probeRules so Slice B can populate the field without a second type-level
    change. Empty default today.

Tests (10 new, 3 updated):
  - schema.test.ts: source defaults, log-source accept, invalid source reject,
    logsQueryTimeoutMs default + validation, per-service probeRules parse.
  - registry.test.ts: legacy-shape read, saveGlobalRules round-trip,
    save()-preserves-globals (silent-clobber guard), rollback()-preserves-
    current-globals, saveAll() atomic write (one version entry), forward-
    compat loadAll() on legacy files, first-save upgrades on-disk shape.
  - scan-service-override.test.ts + scan-settings.test.ts: two shape
    assertions now include `source: "metrics"` where the validator defaults it.

Full suite: 1381/1381 pass in isolation per file. One rate-limit test flakes
under parallel load, passes standalone — not caused by this change.

Design doc: ~/.gstack/projects/WZ-dops-assistant/wli02-feat-llm-driven-probe-rules-design-20260422-continuation.md

* feat(discover): Slice B — LLM writes probe rules + discover-eval harness

Teaches the discovery agent to emit per-service probeRules and top-level
globalProbeRules, then plumbs them through validate → accept → registry
so they land in services.yaml. Adds discover-eval as a quality gate so a
prompt regression that stops emitting rules surfaces before it ships.

No runtime behavior change for the probe yet — Slice C (four-track probe
evaluator) is what actually reads the rules. This slice is data-producing
only: the fields show up in services.yaml, the probe still runs against
the hardcoded config.yaml defaults.

Prompt changes (src/agents/discover.ts):
  - Output shape: JSON OBJECT {services, globalProbeRules} instead of a
    bare services array. Bare-array form stays accepted for backward-compat
    with stacks that don't produce globals.
  - Per-service probeRules guidance: the agent writes `pod_restarts` for
    k8s workloads with a resolvable namespace, and `log_errors` for
    services with non-empty logLabels. Both are spec'd with canonical
    names, selectors, and thresholds so output stays consistent run to run.
  - globalProbeRules guidance: label-key introspection (majority-wins on
    app / service / job / deployment / statefulset / daemonset). Rewrites
    availability rules using the correct key for the stack. When the stack
    matches the hardcoded k8s defaults, globalProbeRules stays empty (no
    duplication).

Data flow (src/workflows/steps/discover.ts, src/workflows/discovery.ts,
src/server/agents.ts, src/server/ws-handler.ts, src/cli/commands/discover.tsx):
  - runDiscoverStep returns `{services, globalProbeRules}`; both legacy
    bare-array and new object-form agent outputs parse correctly.
  - runDiscovery threads the pair through validation.
  - IDiscoverAgent.discover() returns DiscoveryResult; accept() takes an
    optional third globalProbeRules arg.
  - MastraDiscoverAdapter.accept() routes to registryStore.saveAll() when
    globals are provided (one atomic version entry), else falls through to
    save() which preserves the file's current globals (silent-clobber
    guard from Slice A). Exactly the API shape Slice A's registry was
    designed to accept.
  - ws-handler caches the full DiscoveryResult in pendingDiscovery so the
    discover:accept handler can pull globalProbeRules server-side — the UI
    only echoes services over the wire.
  - cli discover.tsx holds globalProbeRules in state alongside services
    and passes both on accept.

Eval harness (src/eval/discover-eval.ts, src/eval/fixtures/discover-k8s-fixture.yaml):
  - Four 25-point dimensions, 100 total: globals present, per-service
    present (partial credit when minority of services have rules),
    PromQL parses, LogQL parses.
  - Lightweight parsers catch the failure modes an LLM actually produces:
    empty string, unbalanced braces/parens, YOUR_NAMESPACE / <namespace>
    placeholder tokens. Runtime NaN fail-safe in Slice C catches
    "parses but returns empty vector".
  - Fixture at src/eval/fixtures/discover-k8s-fixture.yaml represents
    expected well-formed output on an `app=`-keyed k8s stack. Scores
    100/100 today; any drift in scoring calibration surfaces as a test
    failure immediately.
  - npm run test:discover-eval — gates at --min-score 75 against the
    fixture. Wire up against a live services.yaml in CI once the agent
    has run on a real cluster.
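
A sketch of what a lightweight parse check of this kind might look like. It is not the harness's parser: `looksParseable` is a hypothetical name, the bracket check ignores quoting (a deliberate simplification), and including `[` `]` alongside braces and parens is an assumption.

```typescript
// Placeholder tokens named in the commit message.
const PLACEHOLDER_TOKENS = ["YOUR_NAMESPACE", "<namespace>"];

const OPENER_FOR = new Map<string, string>([
  [")", "("],
  ["}", "{"],
  ["]", "["],
]);

// Rejects empty queries, leftover placeholder tokens, and unbalanced brackets.
function looksParseable(query: string): boolean {
  const q = query.trim();
  if (q === "") return false;
  if (PLACEHOLDER_TOKENS.some((token) => q.includes(token))) return false;

  const stack: string[] = [];
  for (const ch of q) {
    if (ch === "(" || ch === "{" || ch === "[") {
      stack.push(ch);
    } else {
      const opener = OPENER_FOR.get(ch);
      // A closer must match the most recent unclosed opener.
      if (opener !== undefined && stack.pop() !== opener) return false;
    }
  }
  return stack.length === 0;
}
```

A check like this only catches malformed text; a query that parses but returns an empty vector still has to be caught at runtime, as the commit notes.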

Tests:
  - discovery.test.ts: new case covering the object-form agent output
    path; existing cases updated to assert {services, globalProbeRules}
    shape.
  - discover-eval.test.ts (new, 17 tests): PromQL/LogQL parsers, each
    dimension scorer, top-level eval against fixture + empty + legacy
    flat-array inputs.

Full suite: 1399/1400. One rate-limit flake under parallel load, passes
standalone — same pre-existing one from the Slice A run.

Design doc: ~/.gstack/projects/WZ-dops-assistant/wli02-feat-llm-driven-probe-rules-design-20260422-continuation.md

* feat(scan): Slice C — four-track probe evaluator + Loki metric-queryType

Makes the probe READ what Slice B's discovery agent writes. Replaces the
single-track config.yaml-only loop with a four-track evaluator that fans out
across global rules, per-service rules (metrics + logs), config defaults,
and a probe.logs fallback. Hysteresis state keys are origin-namespaced so
same-named rules on different tracks track independently.

This is the slice that actually changes runtime behavior. Stacks where
discovery has run will now evaluate the discovery-written rules; stacks
where it hasn't continue to fall through to PR #115's k8s-native defaults
(byte-identical regression-tested).

Probe (src/server/anomaly-probe.ts):
  - ProbeOptions gains `registryStore: ServiceRegistryStore` and
    `lokiDatasourceUid?: string`. The store is read once per tick via
    `loadAll()` for an atomic {services, globalProbeRules} snapshot —
    discovery running mid-tick can't produce a half-state.
  - New `RuleOrigin` type ("global" | "service" | "default" | "override" |
    "logs-fallback"). Threaded into ProbeHit and the state key.
  - `stateKey(service, origin, ruleName)` exported — namespaced so a
    global "availability" and a per-service "availability" with the same
    name share neither hysteresis nor a state-clearing pass.
  - Shared `withTimeoutAndAbort(tool, args, signal, ms)` helper extracted
    from executeInstant. Both metric and log executors use it. DRY win
    flagged in the code-quality review.
  - `executeInstantLogs(...)` calls the Loki MCP tool with
    `queryType: "metric"` so `count_over_time(...)` returns a scalar
    (vs `query_loki_logs`'s default which returns log lines). Uses
    probe.logsQueryTimeoutMs (10s default) — Prom's 3s timeout would
    silently NaN out on slow Loki clusters. NaN on any failure, never
    throws. If the tool rejects queryType:"metric" (older Grafana MCP),
    the error is logged once via withTimeoutAndAbort and every log-source
    rule scores NaN for the rest of the tick.
  - `findLogQueryTool(tools)` mirrors findMetricQueryTool: prefers
    `query_loki_logs`, falls back to generic log-query tool names.
  - `runProbe` rewritten as four-track evaluator. Per service:
    * Operator override (disabled or rules) wins. Marked origin:"override".
    * Tracks 1 vs 4 — globalProbeRules REPLACE config.yaml defaults when
      non-empty (origin:"global"); otherwise config defaults fire
      (origin:"default"). Track 4 stays as the ultimate fallback for
      stacks where discovery has never run.
    * Tracks 2 + 3 — per-service probeRules are ADDITIVE on top of the
      base. Discovery writes only the rules it has unique context for
      (pod_restarts with real namespace, log_errors with real Loki
      labels). origin:"service".
    * probe.logs generic fallback fires only when probe.logs.enabled,
      the service has logLabels, and no per-service log-source rule was
      written. origin:"logs-fallback". Threshold scales errorRateThreshold
      by window minutes (raw count = per-min × min).
  - Single shared mapWithConcurrency semaphore across all four tracks +
    fallback — probe.concurrency caps total in-flight queries per tick.
  - Service registry resolution and per-service config lookup are O(1) via
    a Map built once per tick.
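
The per-service track selection described above, reduced to a pure function. This is a sketch under assumptions: `planRules` and `PlanInput` are hypothetical, the rule shape is trimmed, the log fallback rule is assumed to be pre-built from logLabels, and an operator override is assumed to replace every other track.

```typescript
type RuleOrigin = "global" | "service" | "default" | "override" | "logs-fallback";

interface Rule { name: string; query: string; source: "metrics" | "logs" }
interface PlannedRule { rule: Rule; origin: RuleOrigin }

interface PlanInput {
  override?: { disabled?: boolean; rules?: Rule[] }; // operator override, if any
  globalRules: Rule[];  // track 1: discovery-written globalProbeRules
  serviceRules: Rule[]; // tracks 2 + 3: per-service probeRules (metrics + logs)
  defaultRules: Rule[]; // track 4: config.yaml defaults
  logsFallback?: Rule;  // assumed pre-built when probe.logs is enabled and logLabels exist
}

function planRules(input: PlanInput): PlannedRule[] {
  const tag = (origin: RuleOrigin) => (rule: Rule): PlannedRule => ({ rule, origin });

  // Assumption: an override replaces every other track for this service.
  if (input.override?.disabled) return [];
  if (input.override?.rules) return input.override.rules.map(tag("override"));

  // Globals REPLACE the config defaults when non-empty; otherwise defaults fire.
  const base = input.globalRules.length > 0
    ? input.globalRules.map(tag("global"))
    : input.defaultRules.map(tag("default"));

  // Per-service rules are additive on top of the base.
  const planned: PlannedRule[] = [...base, ...input.serviceRules.map(tag("service"))];

  // Generic log fallback only when no per-service log-source rule exists.
  const hasServiceLogRule = input.serviceRules.some((r) => r.source === "logs");
  if (input.logsFallback && !hasServiceLogRule) {
    planned.push({ rule: input.logsFallback, origin: "logs-fallback" });
  }
  return planned;
}
```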

Scheduler (src/server/scan-scheduler.ts):
  - Threads registryStore + lokiDatasourceUid through to runProbe.
  - New optional dep `getLokiDatasourceUid?: () => string | undefined` —
    when undefined, log-source rules score NaN, metrics-source rules
    continue. Wired live so operators can add Loki later without restart.
  - resetHysteresisForService updated for 3-part key: switched from
    `slice(0, lastIndexOf(":")) === service` to `startsWith(service+":")`.
    The previous parser would have silently failed to clear any
    Slice-C-format key (it would have compared "svc-a" to "svc-a:global").
    Bug never reached prod — caught in this slice's own development.
  - resetHysteresisForChangedRules works unchanged: rule name is still
    after the last colon in the new key format.
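
The key-parsing bug is easy to reproduce in isolation. In the sketch below only `stateKey` and its `service:origin:ruleName` format come from this commit; the other helper names are hypothetical.

```typescript
// Three-part, origin-namespaced key.
function stateKey(service: string, origin: string, ruleName: string): string {
  return `${service}:${origin}:${ruleName}`;
}

// Old check: correct for "service:rule", wrong for "service:origin:rule",
// because everything before the LAST colon is "service:origin".
function belongsToServiceOld(key: string, service: string): boolean {
  return key.slice(0, key.lastIndexOf(":")) === service;
}

// New check: prefix match including the delimiter, so "svc-a" does not match "svc-ab:...".
function belongsToService(key: string, service: string): boolean {
  return key.startsWith(service + ":");
}

// Rule-name extraction still works: the rule name sits after the last colon.
function ruleNameOf(key: string): string {
  return key.slice(key.lastIndexOf(":") + 1);
}
```

This relies on rule names containing no colon, which is the same constraint the rule validators enforce.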

Tests (src/server/anomaly-probe.test.ts):
  - All 14 existing runProbe call sites updated to pass `registryStore`
    via a small `fakeRegistryStore({...})` helper. State-key assertions
    use the new origin-namespaced helper `defaultKey(service, ruleName)`
    so the change is visible at the test layer too.
  - `mockLogsTools` added at module scope; getToolsByRole mock now routes
    by role so log-source tracks can supply their own tool surface.
  - Nine new four-track scenarios:
    * Track 1 replaces track 4 when global rules exist.
    * Track 4 regression: byte-identical to PR #115 when no globals.
    * Track 2 additive — global + per-service both fire.
    * Track 3 calls the LOGS tool (not metrics) with queryType:"metric"
      and the Loki UID.
    * Track 3 scores NaN with no logs tool wired.
    * probe.logs fallback fires from logLabels when no per-service
      log-source rule exists.
    * probe.logs fallback does NOT fire when one does.
    * Origin-namespaced state keys: same-named rules on different tracks
      keep independent counters.
    * Registry snapshot atomicity: loadAll() called exactly once per
      tick across N services.

Full suite: 1407/1407 (102/102 files green this run — no rate-limit flake).
Type check clean.

Design doc: ~/.gstack/projects/WZ-dops-assistant/wli02-feat-llm-driven-probe-rules-design-20260422-continuation.md

* fix(scan): address /review adversarial findings — validate LLM output, preserve globals on empty sweep, GC orphaned hysteresis

Three HIGH-priority findings from the pre-landing /review that both Claude
(subagent) and Codex adversarial passes converged on. Plus four mechanical
auto-fixes from the critical-pass.

## Mechanical auto-fixes

- `src/eval/rca-eval.test.ts` — ProbeHit literal missing the new required
  `origin` field. tsconfig.json excludes `*.test.ts` from tsc so the type
  error didn't surface; vitest passed because buildInvestigationMessage
  doesn't read `origin`. Fixed before the next consumer trips over it.
- `src/web/components/scan/types.ts` — comment claimed "keep in lockstep
  with ProbeMetricRule" but RuleDraft deliberately omits `source` (GUI
  editor is metrics-only in v1; validator defaults to "metrics" on PUT).
  Comment updated to reflect the actual contract.
- `src/server/anomaly-probe.ts buildGenericLogQLFromLabels` — escape
  backslash before double-quote so logLabel values containing `\` don't
  double-escape. k8s RFC 1123 label values never need this in practice,
  but the function takes an untrusted map and the fix is 3 characters.
- `src/server/scan-scheduler.ts:329` — inline comment said
  `"service:ruleName"` but Slice C changed the format to
  `"service:origin:ruleName"`. Parse still works (lastIndexOf gets the
  right colon), but the comment would mislead a future maintainer trying
  to extract the service prefix. Updated with an explicit warning not to
  use `slice(0, colonIdx)` for service extraction.
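
The escaping order from the third auto-fix, as a standalone sketch. `escapeLogQLValue` and `buildSelector` are hypothetical stand-ins for the internals of `buildGenericLogQLFromLabels`, whose actual body is not shown in this PR.

```typescript
// Escape backslashes FIRST, then double quotes. In the reverse order, the
// backslash added for each quote would itself be escaped a second time.
function escapeLogQLValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
}

// Build a stream selector such as {app="x", env="prod"} from a label map.
function buildSelector(labels: Record<string, string>): string {
  const parts = Object.entries(labels).map(
    ([key, value]) => `${key}="${escapeLogQLValue(value)}"`,
  );
  return `{${parts.join(", ")}}`;
}
```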

## Adversarial fix A — validate LLM-written rules before persistence

Cross-model HIGH (Claude F3 + Codex #3). runDiscoverStep was casting raw
JSON from the agent to ProbeMetricRule with `as`. Worst cases documented:
malformed threshold op silently never trips; rule name `db:slow` breaks
state-key parsing; `source: "log"` (typo) routes a LogQL query to the
Prometheus tool and NaNs — the operator sees a services.yaml that claims
log monitoring exists, but none runs.

Fix in `src/workflows/steps/discover.ts`:
- `ProbeMetricRuleSchema` exported from `src/config/schema.ts` so the
  discovery path can safeParse against the canonical shape.
- `validateDiscoveredRules(raw, source)` drops rules that fail shape
  validation (including invalid threshold op, missing query, wrong
  source value) AND rules whose name violates `^[^:]+$` (the state-key
  delimiter constraint scan-rule-validator already enforces on the GUI
  path). Structured warn log per dropped rule so operators can see what
  the LLM produced.
- `validateDiscoveredServices(raw)` applies the same guard to per-service
  `probeRules` before the service reaches the full-shape validator.
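
A dependency-free sketch of the drop-invalid-rules behaviour. The real code uses `ProbeMetricRuleSchema.safeParse`; here a hand-written check stands in for it, the rule shape is trimmed, and the set of threshold operators is an assumption, since this PR does not list them.

```typescript
type RuleSource = "metrics" | "logs";
interface Threshold { op: string; value: number }
interface ProbeRule { name: string; query: string; source: RuleSource; threshold: Threshold }

// Assumption: the operator set is illustrative, not taken from the schema.
const VALID_OPS = new Set(["<", "<=", ">", ">=", "==", "!="]);

// Returns the parsed rule, or undefined when it should be dropped.
function parseRule(raw: unknown): ProbeRule | undefined {
  if (raw === null || typeof raw !== "object") return undefined;
  const { name, query, source: rawSource, threshold } = raw as {
    name?: unknown; query?: unknown; source?: unknown; threshold?: unknown;
  };

  // Names must not contain ":" (the state-key delimiter).
  if (typeof name !== "string" || !/^[^:]+$/.test(name)) return undefined;
  if (typeof query !== "string" || query.trim() === "") return undefined;

  // source defaults to "metrics"; anything else (e.g. the typo "log") is rejected.
  let source: RuleSource;
  if (rawSource === undefined || rawSource === "metrics") source = "metrics";
  else if (rawSource === "logs") source = "logs";
  else return undefined;

  if (threshold === null || typeof threshold !== "object") return undefined;
  const { op, value } = threshold as { op?: unknown; value?: unknown };
  if (typeof op !== "string" || !VALID_OPS.has(op)) return undefined;
  if (typeof value !== "number" || Number.isNaN(value)) return undefined;

  return { name, query, source, threshold: { op, value } };
}

// Keep valid rules and collect the dropped entries so the caller can log them.
function validateDiscoveredRules(raw: unknown): { kept: ProbeRule[]; dropped: unknown[] } {
  const kept: ProbeRule[] = [];
  const dropped: unknown[] = [];
  for (const item of Array.isArray(raw) ? raw : []) {
    const rule = parseRule(item);
    if (rule) kept.push(rule);
    else dropped.push(item);
  }
  return { kept, dropped };
}
```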

The scan-rule-validator (GUI path) is explicitly scoped — reusing it
would have bundled its `.strict()` mode which rejects unknown keys and
would break when we add fields later. Parallel schema is the right
tradeoff.

## Adversarial fix B — globals survive empty service sweep

Codex HIGH #2. runDiscoverStep:283 required `parsed.services.length > 0`
to accept the object form, but runDiscovery:41 has an explicit
`complete-empty` phase that surfaces globalProbeRules when services is
empty. Today a transient empty discovery would silently wipe the learned
label-key override, pushing the probe back to config.yaml defaults.

Fix: the object-form branch now accepts when EITHER services OR
globalProbeRules is non-empty. A discovery run that only wrote globals
(label-key introspection succeeded, service sweep came back empty) is
a valid useful result.
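
The relaxed acceptance condition, sketched as a standalone parser. `parseObjectForm` is a hypothetical name and the element types are left as `unknown`, since per-item validation happens elsewhere.

```typescript
interface DiscoveryOutput { services: unknown[]; globalProbeRules: unknown[] }

// Accept the object form when EITHER list is non-empty.
// Returns undefined when the input is not an object or carries nothing useful.
function parseObjectForm(parsed: unknown): DiscoveryOutput | undefined {
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) return undefined;
  const { services, globalProbeRules } = parsed as {
    services?: unknown; globalProbeRules?: unknown;
  };
  const serviceList: unknown[] = Array.isArray(services) ? services : [];
  const globalList: unknown[] = Array.isArray(globalProbeRules) ? globalProbeRules : [];
  if (serviceList.length === 0 && globalList.length === 0) return undefined;
  return { services: serviceList, globalProbeRules: globalList };
}
```

Under the old condition, the globals-only case would have been rejected because `services` was empty.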

## Adversarial fix C — GC orphaned hysteresis state at tick start

Cross-model HIGH (Claude F1 + Codex #5). `resetHysteresisForChangedRules`
only ran on `PUT /api/scan/settings` (config.yaml reload). Discovery's
accept() path writes to services.yaml but never resets state. Old
counters under `{svc}:global:OLD_RULE_NAME` leaked forever after a
discovery-driven rule rename. Same pattern for removed per-service
rules and hidden services.

Fix in `runProbe` (`src/server/anomaly-probe.ts`): after building the
tick's task list, compute the set of active stateKeys, and drop every
consecutiveState entry not in that set. Covers every rule-change path —
discovery writes, config reloads, override toggles, hidden-service
changes — with one Map diff per tick. Debug-log the orphan count.
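
The per-tick garbage collection amounts to a set difference over state keys. A minimal sketch, assuming the hysteresis state is a `Map` from state key to consecutive-breach count; `gcOrphanedState` is a hypothetical name.

```typescript
// Drop every hysteresis entry whose key is not active this tick.
// Returns the number of orphans removed, e.g. for a debug log line.
function gcOrphanedState(state: Map<string, number>, activeKeys: Set<string>): number {
  let removed = 0;
  // Snapshot the keys first so deletion does not interact with iteration.
  for (const key of Array.from(state.keys())) {
    if (!activeKeys.has(key)) {
      state.delete(key);
      removed += 1;
    }
  }
  return removed;
}
```

Because the active set is rebuilt from the tick's task list, the same pass covers renamed rules, removed rules, and services that no longer appear at all.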

## Testing

- `discovery.test.ts`: +3 new tests exercising the three fixes
  (B: object form accepted with empty services + non-empty globals;
  A: rule with `db:slow` name dropped; A: rule with invalid threshold
  op dropped). Mock adapter extended with a per-test reply-override
  escape hatch.
- `anomaly-probe.test.ts`: +1 test pre-loading orphan state keys,
  running a tick with a different rule name, verifying both orphans
  were GC'd. Exercises both the "rule renamed" and "service removed"
  cases in one assertion.
- Full suite: 102/102 files green. Type check clean.

## Deferred (noted in the design doc "Not in Scope")

- File locking on registry writes — RMW race window is real but narrow;
  no existing writer uses locking. P2.
- Static query-cost bounds in discover-eval — runtime timeouts
  (queryTimeoutMs 3s, logsQueryTimeoutMs 10s) cap per-query wall-clock.
  P2 if prompt-driven DDoS ever surfaces.

Design doc: ~/.gstack/projects/WZ-dops-assistant/wli02-feat-llm-driven-probe-rules-design-20260422-continuation.md

* chore: bump version and changelog (v0.1.0.0)

MINOR bump — discovery-owned probe rules ship. Discovery agent writes
per-service `probeRules` and top-level `globalProbeRules` into services.yaml;
four-track probe evaluator reads them; new discover-eval harness gates
output quality.

Also syncs `package.json.version` (0.1.14, stale from old 3-digit scheme)
back in line with the VERSION file's 4-digit scheme.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: update CLAUDE.md for discovery-owned probe rules (v0.1.0.0)

- Add Where-to-Look entries for anomaly-probe, scan-scheduler, discover-eval
- Note services.yaml shape change ({services, globalProbeRules}, flat-array forward-compat)
- Document four-track probe evaluator + LLM output validation
- Add npm run test:discover-eval to commands
- Refresh test-file count (73 -> 100+)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@WZ WZ deleted the feat/proactive-scan-sensible-defaults branch April 25, 2026 03:28