fix(scan+discovery): four P1/P2 quick wins — terminal-emit, NaN log, budget warn, webhook hint#133
Merged
Conversation
…ng, budget warn, webhook hint
AP2 — src/workflows/discovery.ts: wrap the workflow body in try/finally so
the caller always receives a terminal phase ("complete" / "complete-empty"
/ "complete-validation-failed" / "complete-failed") even when the
validation step stalls or throws mid-flow. If validation fails, fall back
to the unverified discovered services tagged with confidence "unverified"
and a validationNotes explaining the fallback, so services.yaml is still
written instead of being lost to a thrown promise. Covers the "declared
likely fixed but never re-verified" gap from QA Issue #7.
AP4 — src/server/anomaly-probe.ts: log at INFO when a rule scores NaN
(empty vector, Loki datasource unwired, MCP timeout/error). The tick
summary already counts probeErrors / queriesEmpty in aggregate; this
per-rule line is what operators need to locate the specific broken rule.
AP6 — src/server/anomaly-probe.ts: WARN at the end of task build when a
tick generates more than 200 queries. Not a hard block; observability
only so operators notice cardinality growth (many services × many
per-service rules) before it floods the metrics backend.
AP9 — src/server/webhook-handler.ts: expand the 503 hint from "Set
webhook.secret in config.yaml and restart" to name the config section
and explicitly mention restarting the server, so operators hitting this
code path can act without a support loop.
All four changes are covered by unit tests alongside the affected files.
Follow-ups from /review on this branch:
- discovery.ts: factor terminal phase strings into an exported
TERMINAL_DISCOVERY_PHASES union/const so the WS bridge no longer
relies on a `startsWith("complete")` string-prefix convention. Move
the "complete-failed" assignment into an explicit outer catch (was
an initializer default). Pass the Error object (not message) to the
validation-failure WARN so the stack survives. Clarify the
validationNotes text ("validation did not complete") so it reads
correctly on the throw-after-start path.
- ws-handler.ts: swap `phase.startsWith("complete")` for the shared
TERMINAL_DISCOVERY_PHASES membership check.
- anomaly-probe.ts: tighten AP6 threshold comment (drop arithmetic-
wrong citation) and trim the AP4 NaN-log comment (remove
self-contradicting abort paragraph).
- tests: add a beforeEach reset for loggerCalls so future assertions
don't inherit prior-test noise; drop the dead "query budget"
predicate clause; add a boundary test at tasks.length === 200; add
a kind="error" path test for the AP4 NaN log; assert the
confidence="unverified" + validationNotes contract on the AP2
validation-throws fallback.
QA turned up a duplicate 503 response in `src/server/index.ts:438` registered when no webhook.secret is configured at all (vs the "secret set but wrong bearer" path the initial AP9 fix touched in webhook-handler.ts). The two bodies had drifted already; the fallback stub was still emitting the old terse hint. Extract WEBHOOK_NOT_CONFIGURED_BODY as a shared constant exported from webhook-handler.ts so both routes render the same operator-facing text. Existing webhook-handler.test.ts regex coverage still guards the contract; the index.ts stub now imports the constant directly so it cannot drift without the regex failing too.
4 tasks
WZ
added a commit
that referenced
this pull request
Apr 25, 2026
…budget warn, webhook hint (#133) * fix(scan+discovery): four P1/P2 quick wins — terminal-emit, NaN logging, budget warn, webhook hint AP2 — src/workflows/discovery.ts: wrap the workflow body in try/finally so the caller always receives a terminal phase ("complete" / "complete-empty" / "complete-validation-failed" / "complete-failed") even when the validation step stalls or throws mid-flow. If validation fails, fall back to the unverified discovered services tagged with confidence "unverified" and a validationNotes explaining the fallback, so services.yaml is still written instead of being lost to a thrown promise. Covers the "declared likely fixed but never re-verified" gap from QA Issue #7. AP4 — src/server/anomaly-probe.ts: log at INFO when a rule scores NaN (empty vector, Loki datasource unwired, MCP timeout/error). The tick summary already counts probeErrors / queriesEmpty in aggregate; this per-rule line is what operators need to locate the specific broken rule. AP6 — src/server/anomaly-probe.ts: WARN at the end of task build when a tick generates more than 200 queries. Not a hard block; observability only so operators notice cardinality growth (many services × many per-service rules) before it floods the metrics backend. AP9 — src/server/webhook-handler.ts: expand the 503 hint from "Set webhook.secret in config.yaml and restart" to name the config section and explicitly mention restarting the server, so operators hitting this code path can act without a support loop. All four changes are covered by unit tests alongside the affected files. * refactor: address /review findings on quick-wins PR Follow-ups from /review on this branch: - discovery.ts: factor terminal phase strings into an exported TERMINAL_DISCOVERY_PHASES union/const so the WS bridge no longer relies on a `startsWith("complete")` string-prefix convention. Move the "complete-failed" assignment into an explicit outer catch (was an initializer default). Pass the Error object (not message) to the validation-failure WARN so the stack survives. Clarify the validationNotes text ("validation did not complete") so it reads correctly on the throw-after-start path. - ws-handler.ts: swap `phase.startsWith("complete")` for the shared TERMINAL_DISCOVERY_PHASES membership check. - anomaly-probe.ts: tighten AP6 threshold comment (drop arithmetic- wrong citation) and trim the AP4 NaN-log comment (remove self-contradicting abort paragraph). - tests: add a beforeEach reset for loggerCalls so future assertions don't inherit prior-test noise; drop the dead "query budget" predicate clause; add a boundary test at tasks.length === 200; add a kind="error" path test for the AP4 NaN log; assert the confidence="unverified" + validationNotes contract on the AP2 validation-throws fallback. * fix(webhook): apply AP9 hint text to the unconfigured-fallback route too QA turned up a duplicate 503 response in `src/server/index.ts:438` registered when no webhook.secret is configured at all (vs the "secret set but wrong bearer" path the initial AP9 fix touched in webhook-handler.ts). The two bodies had drifted already; the fallback stub was still emitting the old terse hint. Extract WEBHOOK_NOT_CONFIGURED_BODY as a shared constant exported from webhook-handler.ts so both routes render the same operator-facing text. Existing webhook-handler.test.ts regex coverage still guards the contract; the index.ts stub now imports the constant directly so it cannot drift without the regex failing too.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Batch of four low-risk items from docs/TODOS.md (P1 and P2 S/XS effort) landed together.
src/workflows/discovery.ts: wrap the workflow in try/finally so the caller always receives a terminal phase. A validation-step stall/throw now falls back to the unverified discovered services (taggedconfidence: "unverified"with a validationNotes explanation) instead of losing them to a thrown promise. Closes the "declared likely fixed but never re-verified" gap from QA Issue fix: investigation robustness — 6/10 to 10/10 conclusive RCA #7.src/server/anomaly-probe.ts: log at INFO when a rule scores NaN (empty vector, Loki datasource unwired, MCP timeout/error). Aggregate tick counts already existed; this adds the per-rule line operators need to find the specific broken rule.src/server/anomaly-probe.ts: WARN when a tick generates more than 200 queries. Observability only — not a hard block. Catches cardinality growth (services × per-service rules) before it floods the metrics backend.src/server/webhook-handler.ts: expand the 503 hint from"Set webhook.secret in config.yaml and restart"to name the config section and explicitly say "restart the server" so operators can act without a support loop.All four changes are covered by unit tests alongside the affected files.
Test plan
npx vitest run src/workflows/discovery.test.ts— 12/12 pass (adds 2 AP2 tests)npx vitest run src/server/anomaly-probe.test.ts— 49/49 pass (adds 1 AP4 + 2 AP6 tests)npx vitest run src/server/webhook-handler.test.ts— 11/11 pass (AP9 regex tightened)npx tsc --noEmit— cleannpx vitest run— 1649/1650 pass; the single failure is a known-flaky rate-limit timing test (rate-limit.test.ts:19) that passes in isolation and is unrelated to this diff (last touched PR feat(llm-logger): call-start + first-chunk logging, fix discover tool tracking #85)Notes
"complete"phase is emitted at the same logical point the function used to return.webhookis actually a top-level config section, not nested underscan.🤖 Generated with Claude Code