feat: CLI, discovery, RCA pipeline, panel images, and investigation hardening #1
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Prompt now instructs LLM to use list_panels before get_panel_image to get correct panel IDs
- Prompt explicitly forbids base64/markdown image syntax in text
- AgentCore strips any base64 markdown images from response text
- Increased dev timeouts (llmCallMs 120s, toolExecutionMs 60s)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…or CLI
Config loader now supports ${VAR:-fallback} syntax. Slack env vars in
dev/config.yaml use :- so CLI mode works without Slack credentials.
Added GRAFANA_MCP_URL and CONFIG_PATH to dev/.env.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
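The `${VAR:-fallback}` interpolation described in this commit could look roughly like the sketch below. It is not the PR's loader: `interpolateEnv` and its regex are hypothetical, and it assumes shell-like semantics where an unset or empty variable takes the fallback. The fallback pattern excludes newlines, in line with the later single-line restriction.

```typescript
// Hypothetical helper: expands ${VAR} and ${VAR:-fallback} in a config string.
// Assumption: unset or empty variables take the fallback, as in POSIX shells.
const ENV_REF = /\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}\n]*))?\}/g;

export function interpolateEnv(
  text: string,
  env: Record<string, string | undefined>,
): string {
  return text.replace(ENV_REF, (_match: string, name: string, fallback?: string) => {
    const value = env[name];
    if (value !== undefined && value !== "") return value;
    // ":-" with nothing after it yields an empty-string fallback, not an error.
    if (fallback !== undefined) return fallback;
    throw new Error(`Missing required environment variable: ${name}`);
  });
}
```

In a real loader, `process.env` would be passed as `env`. A value such as `${SLACK_BOT_TOKEN:-}` (variable name illustrative) then resolves to an empty string instead of failing when Slack credentials are absent.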
…config.yaml
No more manual env var export needed. Just run: npm run cli
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bold, inline code, headers, numbered/bullet lists are now formatted with colors and styles instead of showing raw markdown syntax.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Auto-load dev/.env with override=true (no manual env export needed)
- Add markdown renderer for CLI output (bold, code, headers, lists)
- Add tool_choice=auto and debug logging to LLM client
- Disable TLS verification for self-signed certs
- Default to gpt-4o with OpenAI API
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tool calls are now persisted in the message history as dimmed text, visible after the response so you can see what tools the agent used.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enables tool calling with both gpt-4o and vLLM-hosted gpt-oss-120b,
which has a known bug with Chat Completions tool calling (vLLM #22578)
but works correctly via the Responses API.
- Rewrite doChat() to use openai.responses.create() instead of
openai.chat.completions.create()
- Add conversion helpers: convertToResponsesInput, convertTools,
convertResponseFormat
- Replace OpenAI.Chat types with local ResponseFormat type
- Simplify Message.tool_calls to { id, name, args } shape
- Update agent core.ts + investigation.ts message building
- Update all LLM tests with Responses API mock shapes
- Add unit tests for convertToResponsesInput
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
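For readers unfamiliar with the Responses API input format, the conversion this commit describes might be sketched as follows. The `Message` and `InputItem` types are simplified local assumptions, and the body is an illustration of the mapping (tool calls become `function_call` items, tool results become `function_call_output` items), not the PR's actual `convertToResponsesInput`.

```typescript
// Assumed local message shape, using the simplified { id, name, args } tool-call form.
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type Message =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content?: string; tool_calls?: ToolCall[] }
  | { role: "tool"; tool_call_id: string; content: string };

// Subset of Responses API input items used in this sketch.
type InputItem =
  | { role: "system" | "user" | "assistant"; content: string }
  | { type: "function_call"; call_id: string; name: string; arguments: string }
  | { type: "function_call_output"; call_id: string; output: string };

export function convertToResponsesInput(messages: Message[]): InputItem[] {
  const items: InputItem[] = [];
  for (const msg of messages) {
    if (msg.role === "tool") {
      // A tool result is linked back to its call through call_id.
      items.push({ type: "function_call_output", call_id: msg.tool_call_id, output: msg.content });
    } else if (msg.role === "assistant") {
      if (msg.content) items.push({ role: "assistant", content: msg.content });
      for (const call of msg.tool_calls ?? []) {
        // Arguments travel as a JSON string, not as an object.
        items.push({
          type: "function_call",
          call_id: call.id,
          name: call.name,
          arguments: JSON.stringify(call.args),
        });
      }
    } else {
      items.push({ role: msg.role, content: msg.content });
    }
  }
  return items;
}
```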
- Derive image file extension from mimeType instead of hardcoding .png
- Make base64 stripping regex more specific (require ;base64, prefix)
- Log warnings for unhandled response content/output types in LLM client
- Use process.cwd() for dotenv path, make configurable via DOTENV_PATH
- Restrict env var default value regex to single-line matches
- Improve discovery prompt to mention searching dashboards first
- Use stable React keys in Markdown renderer instead of index-only
- Log errors from execFile("open") instead of silently swallowing
- Show concise "Completed N tool calls" instead of repeating tool list
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
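As an illustration of the stricter base64 stripping mentioned in this list, a pattern that requires the `;base64,` marker might look like this. The regex and the `stripBase64Images` name are assumptions for the sketch; the PR's actual expression is not shown in this conversation.

```typescript
// Hypothetical pattern: markdown images whose URL is an inline base64 data URI.
// Requiring ";base64," avoids stripping ordinary image links.
const BASE64_IMAGE_MD =
  /!\[[^\]]*\]\(data:image\/[a-zA-Z0-9.+-]+;base64,[A-Za-z0-9+\/=\s]+\)/g;

export function stripBase64Images(text: string): string {
  return text.replace(BASE64_IMAGE_MD, "").trim();
}
```

An image reference such as `![cpu](https://grafana.example/render.png)` is left untouched because its URL is not a `data:` URI.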
- Rewrite parseInline() with character-scanning parser for robust handling of unclosed markers and multiple bold/code sections
- Document Message type's intentional divergence from OpenAI Chat types
- Document convertResponseFormat() maps to Responses API text config
- Add 15 tests for CLI pure functions: parseInline, formatRcaText, saveAndOpenImages
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add TokenUsage type to LlmResponse so callers can track input/output tokens per LLM call. Wire an onTokenUsage callback through AgentTask and InvestigationAgent so CLI or other interfaces can accumulate and display token counts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire onTokenUsage callback into CLI App to accumulate input/output tokens per request. Display totals alongside tool call count in the status line, e.g. "7 tool calls · 1234 tokens (800 in / 434 out)".
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
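The accumulation and status-line format from this commit can be expressed as two small pure functions. This is a sketch under assumptions: `addUsage` and `formatStatusLine` are hypothetical names, and only the `TokenUsage` field names are taken from the diff snippet later in this conversation.

```typescript
type TokenUsage = { inputTokens: number; outputTokens: number };

// Adds one LLM call's usage to the running totals for the current request.
export function addUsage(total: TokenUsage, next: TokenUsage): TokenUsage {
  return {
    inputTokens: total.inputTokens + next.inputTokens,
    outputTokens: total.outputTokens + next.outputTokens,
  };
}

// Builds the status line; returns "" when there is nothing to report.
export function formatStatusLine(toolCallCount: number, usage: TokenUsage): string {
  const parts: string[] = [];
  if (toolCallCount > 0) {
    parts.push(`${toolCallCount} tool call${toolCallCount === 1 ? "" : "s"}`);
  }
  const total = usage.inputTokens + usage.outputTokens;
  if (total > 0) {
    parts.push(`${total} tokens (${usage.inputTokens} in / ${usage.outputTokens} out)`);
  }
  return parts.join(" · ");
}
```

Keeping the formatting free of side effects also means it can be called outside any React state updater.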
Detect consecutive pipe-delimited lines as table blocks, compute max column widths, and pad cells for aligned output. Separator rows (|---|---|) are stripped. Inline formatting (bold, code) is preserved within cells.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
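A plain-string version of the table alignment described here is sketched below. It is an assumption-laden simplification: `alignTable` is a hypothetical name, the output is text rather than Ink components, and widths are measured on the raw cell text, so inline markers such as `**` count toward the width.

```typescript
const isTableRow = (line: string): boolean => /^\s*\|.*\|\s*$/.test(line);
const isSeparator = (cells: string[]): boolean => cells.every((c) => /^:?-+:?$/.test(c));

// Drops the outer pipes and splits the remainder into trimmed cells.
function splitCells(line: string): string[] {
  return line.trim().slice(1, -1).split("|").map((c) => c.trim());
}

export function alignTable(lines: string[]): string[] {
  const rows = lines
    .filter(isTableRow)
    .map(splitCells)
    .filter((cells) => !isSeparator(cells)); // strip |---|---| rows

  // Widest cell per column decides the padding.
  const widths: number[] = [];
  for (const row of rows) {
    row.forEach((cell, i) => {
      widths[i] = Math.max(widths[i] ?? 0, cell.length);
    });
  }
  return rows.map((row) =>
    row.map((cell, i) => cell.padEnd(widths[i] ?? 0)).join("  ").trimEnd(),
  );
}
```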
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…utput Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… maxIterations Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t from intent classifier
LLMs sometimes return non-breaking hyphens (U+2011) or en-dashes instead of ASCII hyphens, causing matchService to fail on names like "ingestion-server".
Also removes the 60+ service name list injection from IntentClassifier — the LLM was unreliable at exact matching from huge lists. matchService fuzzy logic handles lookup instead.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
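The hyphen normalization could be sketched as follows. `matchService` is named in the commit, but this body is an assumption: the actual fuzzy logic is not shown here, and the containment fallback below is only one plausible strategy.

```typescript
// Unicode dash variants that LLMs sometimes emit in place of "-" (U+002D):
// hyphen, non-breaking hyphen, figure dash, en dash, em dash, minus sign.
const DASH_VARIANTS = /[\u2010\u2011\u2012\u2013\u2014\u2212]/g;

export function normalizeServiceName(name: string): string {
  return name.replace(DASH_VARIANTS, "-").trim().toLowerCase();
}

export function matchService(query: string, services: string[]): string | undefined {
  const wanted = normalizeServiceName(query);
  if (wanted === "") return undefined;

  const exact = services.find((s) => normalizeServiceName(s) === wanted);
  if (exact !== undefined) return exact;

  // Assumed fallback: loose containment, accepted only when unambiguous.
  const partial = services.filter((s) => {
    const n = normalizeServiceName(s);
    return n.includes(wanted) || wanted.includes(n);
  });
  return partial.length === 1 ? partial[0] : undefined;
}
```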
Pull request overview
This PR merges several MVP features into the codebase: an Ink-based CLI interface, LLM-driven service discovery, richer RCA evidence (including panel screenshots), Slack image uploads, and a migration of the OpenAI client to the Responses API.
Changes:
- Added Ink/React CLI entrypoints and rendering utilities (markdown rendering, tool-call/token logging, image saving).
- Implemented DiscoveryAgent + config schema support and integrated optional discovery into startup flows.
- Extended RCA data types/prompts + MCP/agent pipelines to collect and deliver panel images, and switched the OpenAI client to the Responses API.
Reviewed changes
Copilot reviewed 39 out of 42 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tsconfig.json | Enables JSX compilation for Ink/React CLI components. |
| src/notifications/rca-blocks.test.ts | Updates RCA test fixture for new report fields. |
| src/mcp/client.ts | Changes tool call return type to include text + images. |
| src/mcp/client.test.ts | Updates tool-call tests and adds image-part coverage. |
| src/llm/openai.ts | Migrates OpenAI wrapper to Responses API and adds conversion helpers + usage tracking. |
| src/llm/openai.test.ts | Updates tests for Responses API output parsing and conversion helpers. |
| src/interfaces/slack.ts | Uploads tool-captured images to Slack threads; adds channelId plumbing. |
| src/interfaces/slack.test.ts | Adds/updates tests for Slack image uploads and new context shape. |
| src/interfaces/cli/cli-utils.test.ts | Adds unit tests for CLI markdown parsing, RCA formatting, and image saving behavior. |
| src/interfaces/cli/Markdown.tsx | Implements CLI markdown renderer (inline formatting + basic tables). |
| src/interfaces/cli/App.tsx | Implements Ink CLI REPL, tool-call logging, token usage display, RCA formatting, and image saving/opening. |
| src/index.ts | Integrates optional service discovery and uses merged services for scheduler/Slack bot. |
| src/discover.tsx | Adds an Ink-based “discover services and write config” entrypoint. |
| src/config/schema.ts | Adds discovery config schema + exported type. |
| src/config/schema.test.ts | Adds tests validating discovery schema defaults. |
| src/config/loader.ts | Adds support for ${VAR:-default} env var interpolation. |
| src/cli.tsx | Adds CLI entrypoint wiring MCP/LLM/agents/memory into Ink app. |
| src/agent/types.ts | Adds token/tool callbacks to AgentTask and images to AgentResult (+ ImageAttachment type). |
| src/agent/rca-types.ts | Extends RCA types with dashboard links, panel images, and richer log findings. |
| src/agent/rca-prompts.ts | Updates prompts + JSON schemas for enriched evidence and dashboard link extraction. |
| src/agent/rca-prompts.test.ts | Adds tests covering new RCA schema fields and prompt requirements. |
| src/agent/prompts.ts | Updates conversational system prompt to encourage panel image capture and forbid base64 in text. |
| src/agent/investigation.ts | Collects panel images during investigation phases; threads token usage reporting through phases. |
| src/agent/investigation.test.ts | Updates mocks/fixtures for ToolResult and new log findings fields. |
| src/agent/intent.ts | Adds service matching helper (including unicode hyphen normalization); updates classifier signature. |
| src/agent/intent.test.ts | Adds tests for matchService behavior. |
| src/agent/discovery.ts | Adds LLM-driven DiscoveryAgent that executes tool calls and returns ServiceConfig candidates. |
| src/agent/discovery.test.ts | Adds tests for discovery output parsing, exclusions, and token-usage callback. |
| src/agent/discovery-prompts.ts | Adds discovery prompt + schema and message builder. |
| src/agent/discovery-prompts.test.ts | Adds tests for discovery prompt/schema/message builder. |
| src/agent/core.ts | Collects tool-returned images, sanitizes tool outputs, strips base64 markdown from responses, and adds token/tool callbacks. |
| src/agent/core.test.ts | Updates mocks for ToolResult and adds tests for image capture + callback behavior. |
| package.json | Adds CLI/discover scripts and Ink/React/dotenv deps. |
| package-lock.json | Locks new dependency graph for CLI/Ink/React additions. |
| docs/plans/2026-03-02-rca-evidence-enrichment.md | Adds implementation plan for RCA evidence enrichment. |
| docs/plans/2026-03-02-rca-evidence-enrichment-design.md | Adds design doc for evidence enrichment scope/behavior. |
| docs/plans/2026-02-26-grafana-screenshots-plan.md | Adds implementation plan for Grafana screenshot delivery. |
| docs/plans/2026-02-26-grafana-screenshots-design.md | Adds design doc for screenshot pipeline and error handling. |
| docs/plans/2026-02-26-cli-mode-plan.md | Adds implementation plan for CLI mode. |
| docs/plans/2026-02-26-cli-mode-design.md | Adds design doc for CLI UX/architecture. |
| docs/architecture-overiew.md | Adds/updates architecture documentation (needs alignment with Responses API). |
| .gitignore | Ignores dev/ local config directory. |

    "dependencies": {
      "@inkjs/ui": "^2.0.0",
      "@modelcontextprotocol/sdk": "^1.0.0",
      "@slack/bolt": "^3.22.0",
      "dotenv": "^17.3.1",
      "ink": "^6.8.0",
      "node-cron": "^3.0.3",
      "openai": "^4.77.0",
      "pino": "^9.5.0",
      "prom-client": "^15.1.3",
      "react": "^19.2.4",
      "yaml": "^2.7.0",
      "zod": "^3.24.0"

ink-table is imported by the new CLI Markdown renderer (src/interfaces/cli/Markdown.tsx) but it is not listed in dependencies here. This will break fresh installs / CI builds that only rely on package.json. Add ink-table as a direct dependency (or remove the import/use).

    "cli": "NODE_TLS_REJECT_UNAUTHORIZED=0 NODE_NO_WARNINGS=1 tsx src/cli.tsx",
    "discover": "NODE_TLS_REJECT_UNAUTHORIZED=0 NODE_NO_WARNINGS=1 tsx src/discover.tsx"

The cli/discover scripts disable TLS verification via NODE_TLS_REJECT_UNAUTHORIZED=0. That makes it easy to accidentally run these entrypoints with insecure HTTPS handling and can mask certificate issues. Consider removing this flag, or gating it behind an explicit opt-in env var (and documenting why it’s needed).
Suggested change:

    - "cli": "NODE_TLS_REJECT_UNAUTHORIZED=0 NODE_NO_WARNINGS=1 tsx src/cli.tsx",
    - "discover": "NODE_TLS_REJECT_UNAUTHORIZED=0 NODE_NO_WARNINGS=1 tsx src/discover.tsx"
    + "cli": "NODE_NO_WARNINGS=1 tsx src/cli.tsx",
    + "discover": "NODE_NO_WARNINGS=1 tsx src/discover.tsx"


    setToolCalls((prev) => {
      const parts: string[] = [];
      if (prev.length > 0) {
        parts.push(`${prev.length} tool call${prev.length === 1 ? "" : "s"}`);
      }
      const { inputTokens, outputTokens } = tokenTotals.current;
      if (inputTokens > 0 || outputTokens > 0) {
        parts.push(`${inputTokens + outputTokens} tokens (${inputTokens} in / ${outputTokens} out)`);
      }
      if (parts.length > 0) {
        addMessage({ id: randomUUID(), role: "toolcalls", content: parts.join(" · ") });
      }
      return [];

Avoid side effects inside the setToolCalls state updater. The updater function should be pure, but it currently calls addMessage(...) while computing the summary, which can behave unpredictably with React batching/re-render behavior. Compute the summary outside the state setter (e.g., using a ref for the last toolCalls + token totals), then clear toolCalls in a separate setToolCalls([]) call.

    collectedImages.push({
      filename: `panel-${call.name}-${j}-${k}.png`,

Image attachments collected from MCP tool results always use a .png filename and don’t include any disambiguator for the outer i loop. This can (a) produce the wrong extension when mimeType isn’t image/png, and (b) overwrite earlier images if the same tool is called again in a later iteration. Derive the extension from mimeType (as AgentCore does) and include a unique suffix (e.g., i or a UUID) in the filename.
Suggested change:

    - collectedImages.push({
    -   filename: `panel-${call.name}-${j}-${k}.png`,
    + const ext =
    +   img.mimeType && img.mimeType.startsWith("image/")
    +     ? img.mimeType.substring("image/".length)
    +     : "bin";
    + collectedImages.push({
    +   filename: `panel-${call.name}-${i}-${j}-${k}.${ext}`,

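One caveat with taking the substring after `image/` is that types such as `image/svg+xml` or `image/jpeg` yield less conventional extensions. A lookup-table variant is sketched below; the table contents and the `extensionForMime` and `panelFilename` names are illustrative assumptions, not code from the PR.

```typescript
// Assumed mapping for common image types; anything else falls back to "bin".
const EXT_BY_MIME: Record<string, string> = {
  "image/png": "png",
  "image/jpeg": "jpg",
  "image/gif": "gif",
  "image/webp": "webp",
  "image/svg+xml": "svg",
};

export function extensionForMime(mimeType: string | undefined): string {
  if (!mimeType) return "bin";
  // Drop parameters such as "; charset=utf-8" before the lookup.
  const base = (mimeType.split(";")[0] ?? "").trim().toLowerCase();
  return EXT_BY_MIME[base] ?? "bin";
}

// i = loop iteration, j = tool call index, k = image index within the result.
export function panelFilename(
  toolName: string,
  i: number,
  j: number,
  k: number,
  mimeType?: string,
): string {
  return `panel-${toolName}-${i}-${j}-${k}.${extensionForMime(mimeType)}`;
}
```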

    const images: ImageContent[] = parts
      .filter((p) => p.type === "image")
      .map((p) => ({ mimeType: p.mimeType ?? "image/png", data: p.data ?? "" }));
    toolCallsTotal.inc({ tool: name, status: "success" });

callTool() currently includes image parts even when data is missing, mapping them to an empty string. Downstream code decodes img.data as base64 and may end up uploading/handling empty buffers. Filter out image parts with missing/empty data (and optionally validate base64) so images only contains usable entries.
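A filter along the lines this comment suggests might look like the sketch below. The `ContentPart` shape is an assumed simplification of MCP content parts, `extractImages` is a hypothetical name, and the base64 check is a cheap character-set test rather than a full decode.

```typescript
// Assumed, simplified shape of an MCP tool-result content part.
type ContentPart = { type: string; text?: string; mimeType?: string; data?: string };
type ImageContent = { mimeType: string; data: string };

// Cheap sanity check: base64 alphabet with up to two "=" padding characters.
const BASE64 = /^[A-Za-z0-9+\/]+={0,2}$/;

export function extractImages(parts: ContentPart[]): ImageContent[] {
  const images: ImageContent[] = [];
  for (const p of parts) {
    if (p.type !== "image") continue;
    const data = (p.data ?? "").replace(/\s+/g, "");
    // Skip parts with missing, empty, or clearly non-base64 payloads.
    if (data.length === 0 || !BASE64.test(data)) continue;
    images.push({ mimeType: p.mimeType ?? "image/png", data });
  }
  return images;
}
```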

      await mcp.disconnect();
      setStatus("done");
    } catch (err) {
      log(`Error: ${err instanceof Error ? err.message : String(err)}`);
      setStatus("error");

The MCP connection isn’t cleaned up on early cancellation or errors: if cancelled becomes true after mcp.connect() (or if any later step throws), mcp.disconnect() is skipped. Wrap the workflow in a try/finally (and/or track mcp in outer scope) so the MCP client is always disconnected, even on errors or when the component unmounts.
Suggested change:

    -   await mcp.disconnect();
    -   setStatus("done");
    - } catch (err) {
    -   log(`Error: ${err instanceof Error ? err.message : String(err)}`);
    -   setStatus("error");
    +   setStatus("done");
    + } catch (err) {
    +   log(`Error: ${err instanceof Error ? err.message : String(err)}`);
    +   setStatus("error");
    + } finally {
    +   try {
    +     await mcp.disconnect();
    +   } catch {
    +     // Ignore disconnect errors to avoid masking the original failure
    +   }


    A thin wrapper around the `openai` SDK. The single method `chat(messages, tools)` calls the OpenAI chat completions API and returns a typed discriminated union:

    - `{ type: "text", content: string }` — the LLM produced a final response
    - `{ type: "tool_calls", calls: ToolCall[] }` — the LLM wants to call one or more tools

    This shape is what the Agent Core loops on. The LLM client never loops itself — it makes exactly one API call per invocation.

    Guards are in place for two failure modes: an empty `choices` array (content filtering) and malformed JSON in tool call arguments.

This doc section still describes the LLM client as calling the Chat Completions API and mentions an empty choices array, but src/llm/openai.ts has been switched to the Responses API (response.output, input_tokens, etc.). Update the documentation here to match the current implementation to avoid misleading readers.
Suggested change:

    - A thin wrapper around the `openai` SDK. The single method `chat(messages, tools)` calls the OpenAI chat completions API and returns a typed discriminated union:
    - - `{ type: "text", content: string }` — the LLM produced a final response
    - - `{ type: "tool_calls", calls: ToolCall[] }` — the LLM wants to call one or more tools
    - This shape is what the Agent Core loops on. The LLM client never loops itself — it makes exactly one API call per invocation.
    - Guards are in place for two failure modes: an empty `choices` array (content filtering) and malformed JSON in tool call arguments.
    + A thin wrapper around the `openai` SDK. The single method `chat(messages, tools)` calls the OpenAI Responses API and returns a typed discriminated union:
    + - `{ type: "text", content: string }` — the LLM produced a final response
    + - `{ type: "tool_calls", calls: ToolCall[] }` — the LLM wants to call one or more tools
    + This shape is what the Agent Core loops on. The LLM client never loops itself — it makes exactly one API call per invocation.
    + Guards are in place for two failure modes: a missing or empty `response.output` (e.g. due to content filtering) and malformed JSON in tool call arguments.

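To make the two guards concrete, here is a sketch of parsing a Responses API `output` array into the discriminated union described in the doc text. `OutputItem` is an assumed, loosely typed subset of the real item types, and `parseResponseOutput` is a hypothetical name, not the wrapper's actual code.

```typescript
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type LlmResponse =
  | { type: "text"; content: string }
  | { type: "tool_calls"; calls: ToolCall[] };

// Assumed loose subset of Responses API output items.
type OutputItem = {
  type: string;
  call_id?: string;
  name?: string;
  arguments?: string;
  content?: Array<{ type: string; text?: string }>;
};

export function parseResponseOutput(output: OutputItem[] | null | undefined): LlmResponse {
  // Guard 1: missing or empty output (e.g. content filtering).
  if (!output || output.length === 0) {
    throw new Error("LLM returned no output");
  }
  const calls: ToolCall[] = [];
  let text = "";
  for (const item of output) {
    if (item.type === "function_call") {
      let args: Record<string, unknown>;
      try {
        args = JSON.parse(item.arguments ?? "{}");
      } catch {
        // Guard 2: malformed JSON in tool call arguments.
        throw new Error(`Malformed JSON in arguments for tool "${item.name ?? "?"}"`);
      }
      calls.push({ id: item.call_id ?? "", name: item.name ?? "", args });
    } else if (item.type === "message") {
      for (const part of item.content ?? []) {
        if (part.type === "output_text" && part.text) text += part.text;
      }
    }
  }
  return calls.length > 0 ? { type: "tool_calls", calls } : { type: "text", content: text };
}
```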
Integrates deterministic panel image capture, tool response truncation, iteration limit fix, and finish_reason detection from main into the feature/mvp branch with CLI, discovery, and RCA enrichment.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Phase 1 (anomaly detection) was using maxIterations=20 from config, causing "Phase did not complete within 20 iterations" timeouts that killed the entire investigation including panel capture
- All phases now capped at 8 iterations (6 tool-use + 2 forced-response)
- Removed duplicate onToolCall invocation in runPhase that reported each tool call twice in the UI
- Reduced MAX_TOOL_RESPONSE_CHARS from 4000 to 2000 to reduce token usage
- Improved deterministic panel capture: sort dashboards/panels by service relevance, extract time range from anomaly context, add debug logging
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Cap phase iterations at 5 (3 tool-use + 2 forced-response), down from 8
- Limit tool calls per iteration to 3 to prevent context bloat
- Add smart truncation for search_dashboards (uid+title only, cap 20)
- Reduce MAX_TOOL_RESPONSE_CHARS from 2000 to 1500
- Trim verbose phase prompts (~70% shorter) — removes redundant instructions that cost tokens on every LLM round-trip
- Target: ~30 tool calls, ~60K tokens (down from 89 calls, 206K tokens)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
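The truncation rules in this commit might be sketched as below. The constants come from the commit message; the `truncateToolResponse` name, the assumption that `search_dashboards` returns a JSON array, and the truncation marker are illustrative.

```typescript
const MAX_TOOL_RESPONSE_CHARS = 1500;
const MAX_DASHBOARDS = 20;

export function truncateToolResponse(toolName: string, raw: string): string {
  let text = raw;
  if (toolName === "search_dashboards") {
    try {
      const parsed: unknown = JSON.parse(raw);
      if (Array.isArray(parsed)) {
        // Keep only uid + title for at most MAX_DASHBOARDS entries.
        const slim = parsed
          .slice(0, MAX_DASHBOARDS)
          .map((d) => ({ uid: d?.uid, title: d?.title }));
        text = JSON.stringify(slim);
      }
    } catch {
      // Not JSON: fall through to the generic character cap.
    }
  }
  return text.length <= MAX_TOOL_RESPONSE_CHARS
    ? text
    : text.slice(0, MAX_TOOL_RESPONSE_CHARS) + " [truncated]";
}
```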
The 2-iteration forced-response window left only 3 tool-use iterations out of 5, causing Phase 1 to time out before gathering enough data. Also fixed a bug where tool calls were still executed during the forced-response window (tools withheld but calls still dispatched).
Now: 4 tool-use iterations + 1 forced-response = 5 total.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
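The iteration budget and the fixed dispatch bug can be illustrated with a stripped-down phase loop. This is a sketch, not the PR's `runPhase`: the `Llm` and `ToolRunner` signatures are hypothetical, message-history handling is omitted, and the per-iteration cap of 3 calls is taken from the previous commit.

```typescript
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type LlmResponse =
  | { type: "text"; content: string }
  | { type: "tool_calls"; calls: ToolCall[] };
// Hypothetical signatures: the LLM is told whether tools are offered this turn.
type Llm = (toolsEnabled: boolean) => Promise<LlmResponse>;
type ToolRunner = (call: ToolCall) => Promise<string>;

const MAX_CALLS_PER_ITERATION = 3;

export async function runPhase(
  llm: Llm,
  runTool: ToolRunner,
  toolIterations = 4,
  forcedIterations = 1,
): Promise<string> {
  const total = toolIterations + forcedIterations;
  for (let i = 0; i < total; i++) {
    const toolsEnabled = i < toolIterations;
    const res = await llm(toolsEnabled);
    if (res.type === "text") return res.content;
    // Forced-response window: never dispatch, even if the model asks for tools.
    if (!toolsEnabled) continue;
    for (const call of res.calls.slice(0, MAX_CALLS_PER_ITERATION)) {
      await runTool(call);
    }
  }
  throw new Error(`Phase did not complete within ${total} iterations`);
}
```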
Three issues fixed:
1. Panel images failed with "cannot unmarshal number into string": normalizeGrafanaTime was converting ISO dates to epoch ms numbers, but Grafana MCP Go server expects strings. Now returns string always.
2. Phases timing out despite having enough data: Added post-loop retry that tries up to 3 times with mock error responses when gpt-oss-120b keeps returning tool_calls despite empty tools array. Each retry adds tool error messages to convince the model to produce JSON.
3. Wasted iterations on list_datasources: Pre-fetch datasource UIDs once before phases start and inject them into all phase prompts. Saves 1-2 iterations per phase.
Phase 1 bumped to 7 iterations (6 tool + 1 forced) since anomaly detection needs discovery + data querying.
Verified: 23 tool calls, 88K tokens, 3 panel images captured.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
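The first fix comes down to never handing a number to a field the Go server unmarshals as a string. A sketch of a string-only normalizer follows; `normalizeGrafanaTime` is named in the commit, but this body, including the choice to emit epoch milliseconds as a string and to pass relative expressions through, is an assumption.

```typescript
export function normalizeGrafanaTime(value: string | number | Date): string {
  if (value instanceof Date) return String(value.getTime());
  if (typeof value === "number") return String(Math.trunc(value));

  const trimmed = value.trim();
  // Relative expressions ("now", "now-1h") and numeric strings pass through.
  if (trimmed.startsWith("now") || /^\d+$/.test(trimmed)) return trimmed;

  // ISO dates become epoch milliseconds, but as a string, never a number.
  const ms = Date.parse(trimmed);
  return Number.isNaN(ms) ? trimmed : String(ms);
}
```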
…in RCA
Panel capture now extracts dashboard/panel name hints from the user message (e.g. "Ingestion Log Rate in Ingestion monitor") and weights them 3x higher than service name tokens when ranking dashboards and panels. This ensures the correct panels are captured instead of unrelated ones.
Also instructs the RCA synthesis prompt to avoid markdown tables (which render poorly in the terminal CLI) and use bullet lists instead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
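The 3x weighting could be implemented as a simple token-overlap score. In this sketch only the 3:1 ratio comes from the commit message; the tokenizer, the function names, and the scoring scheme are assumptions.

```typescript
const HINT_WEIGHT = 3; // tokens from dashboard/panel hints in the user message
const SERVICE_WEIGHT = 1; // tokens from the service name

const tokenize = (s: string): string[] =>
  s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);

export function scoreTitle(
  title: string,
  hintTokens: string[],
  serviceTokens: string[],
): number {
  const words = new Set(tokenize(title));
  let score = 0;
  for (const t of hintTokens) if (words.has(t)) score += HINT_WEIGHT;
  for (const t of serviceTokens) if (words.has(t)) score += SERVICE_WEIGHT;
  return score;
}

// Returns a new array sorted by descending relevance; input is not mutated.
export function rankByRelevance<T extends { title: string }>(
  items: T[],
  hint: string,
  service: string,
): T[] {
  const h = tokenize(hint);
  const s = tokenize(service);
  return [...items].sort(
    (a, b) => scoreTitle(b.title, h, s) - scoreTitle(a.title, h, s),
  );
}
```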
Instead of showing "No matching service found" error and stopping, unmatched investigation intents now fall through to the conversational agent which can use MCP tools to answer infrastructure queries directly.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: CLI, discovery, RCA pipeline, panel images, and investigation hardening
* refactor(scan): remove Status section from Settings tab

  Status display (next run / last run / ticking) and "Scan now" trigger now live in the Operation Desk view. Keeping them here duplicated the same data and forced a 10s polling interval that the Settings tab doesn't need.

  Removes the status fetch, the polling interval + its cleanup, the trigger handler + timers, and the Field / formatTimestamp helpers that were only used here.

  Addresses finding #2 from /design-review on Settings → Scan.

* style(scan): plain-language copy on the Settings tab

  Rewrites every helper sentence to avoid internal jargon ("probe", "tick", "trip", "PromQL") that operators don't need to learn to read the settings page. "Cron expression" label becomes "Schedule" (the expression itself is still shown). Helper text is lifted from 10px mono to 12px regular so it's actually readable. The top-of-page blurb now leads with what the feature does for the user, not how it works internally.

  Addresses finding #3 from /design-review on Settings → Scan.

* feat(scan): cron preset shortcuts on Settings tab

  Adds a row of pill buttons above the schedule input (Every 15 min, Hourly, Every 4 hours, Daily, Weekly) that fill the cron field in one click. The pill matching the current value is highlighted so the user can see which preset their schedule matches, if any. Typing a custom expression still works and simply leaves no pill highlighted.

  Addresses finding #4 from /design-review on Settings → Scan.

* feat(scan): default timezone to the browser's timezone

  When the user hasn't saved a GUI timezone yet (settings.source.timezone is still "config", i.e. the UTC default from config.yaml), pre-fill the form with Intl.DateTimeFormat().resolvedOptions().timeZone. Marks the form dirty so the Save button lights up and the user can persist it with one click. Users who explicitly want UTC can clear the field and save; a user who has already saved a specific timezone keeps theirs.

  Addresses finding #5 from /design-review on Settings → Scan.

* style(scan): tidy per-rule editor layout

  Collapses the three header actions (move up, move down, remove) from three differently-styled bordered buttons into a single horizontal icon strip with subtle hover backgrounds. Each icon is a 28px square, no per-button border, destructive hover only on the × button. The rule card now has one clean header row instead of two ragged columns.

  Other touch-ups in the same editor:
  - "Consecutive ticks" label → "Scans in a row" (matches the vocabulary used elsewhere on the tab)
  - Drops the redundant helper under the ticks input (label is clear)
  - Rewrites the query helper to drop "Probe" jargon
  - Remove-confirm dialog uses plain language instead of "hysteresis state"

  Addresses finding #1 from /design-review on Settings → Scan.

* style(design): dark-theme Add recipient modal

  Align the email recipient editor modal with the dark-theme design tokens used by EmailRecipientsSection. Swap the white card for bg-card, adopt the mono uppercase label/title pattern, use bg-background/40 inputs with border-border/40, accent-primary radios/checkboxes, and replace raw button elements with the shared Button (outline Cancel, primary Save). Backdrop gains backdrop-blur and proper dialog ARIA.

* chore: bump version and changelog (v0.1.2.0)

  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
…#189)

* feat(webhooks): backend foundations for Settings → Alert Webhooks tab

  Three changes that PR 2 (the UI) depends on, none of them user-visible on their own.

  1. Extract `processFiringAlert()` from `createWebhookHandler` so the post-auth pipeline (service match → dedup → run) can be driven by internal callers without synthesizing a fake Request/Response. The HTTP handler keeps its bearer auth + response shaping above this; the shared core emits `alert_received` events on every path it takes.
  2. Emit `alert_received` events on dedup-skipped and concurrency-skipped paths in addition to accepted alerts. Pre-fix the eventLog only recorded accepts, which made the upcoming activity log silent on exactly the cases an operator needs to see ("Grafana keeps retrying but I don't see anything come through"). Every event now carries `meta.deliveryStatus` (`investigated` / `deduplicated` / `concurrency_skipped` / `no_service_match`) so the UI can render the outcome correctly. Existing accepted-alert assertions still pass.
  3. Add `min(16)` to `WebhookSchema.tokens` value so the masking format (`first4…last4`) can't ever leak the whole token. Legacy `webhook.secret` keeps no floor for back-compat; sub-16 secrets degrade to a placeholder string in the GUI rather than masking.

  Plus two new read-only endpoints, both stack-scoped via the existing X-Stack-Id middleware:
  - `GET /api/webhooks/info` — masked tokens, severity → template map, dedup window, max concurrent, AND the service-label contract (`["service","service_name","app","job","deployment"]`). Surfacing the contract kills the #1 self-service failure mode (operator labels alerts with `team` instead of `service` and gets a silent 422).
  - `GET /api/webhooks/recent` — last 20 webhook deliveries from the persistent `events` DB table, filtered by `kind=alert_received` and `meta.source=alertmanager`. DB-backed (not the 200-entry in-memory ring) so the activity log doesn't drop entries on busy stacks.

  Tests:
  - `webhook-handler.test.ts` — 4 new cases for the dedup / concurrency / no_service_match emission paths, plus deliveryStatus assertion on the existing accepted-alert test.
  - `routes.webhooks.test.ts` (new) — masking edge cases (long token, legacy secret, short legacy secret), stack scoping, contract surfacing, recent filtering + ordering + cap + cross-source isolation.

  Tab UI, test endpoint, reveal-token endpoint, snippet generator, and loopback test ship in PR 2 alongside the AlertWebhooksTab component.

* fix(webhooks): preserve delivery audit events
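The reasoning behind the 16-character floor is that `first4…last4` reveals 8 characters, so any shorter token would expose half or more of itself. A sketch of that masking follows, with a hypothetical `maskToken` name and an illustrative placeholder string (the commit does not give the actual placeholder text).

```typescript
const MIN_MASKABLE_LENGTH = 16;
const PLACEHOLDER = "(hidden)"; // illustrative; the real placeholder is not shown in the commit

export function maskToken(token: string): string {
  // Short legacy secrets are never partially revealed.
  if (token.length < MIN_MASKABLE_LENGTH) return PLACEHOLDER;
  return `${token.slice(0, 4)}…${token.slice(-4)}`;
}
```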
The gap-#1 change over-corrected: gating on loopOutcome made the button appear on clean-confirm / no-rule-out investigations, where clicking dead-ends in 'the loop ruled nothing out … nothing to re-examine'. Deep mode re-examines RULED-OUT causes, so the honest gate is ruledOut.length > 0 — investigations with nothing to resurrect simply don't offer it (no dead-end button). (The earlier 'no button at all' was viewing pre-loop investigations, which correctly have no rule-outs either.)
…-out causes (#231) * feat(workflows): deep-mode orchestrator core (Step 3) Pure runDeepMode: re-examines the hypothesis loop's top ruled-out hypotheses with deeper injected evidence + the same discriminating keystone, resurrecting any the loop dismissed prematurely. Read-only by construction; gatherDeepEvidence injected so the control flow is unit-testable without LLM/MCP. 5 tests, tsc clean. Trigger/runner wiring + report/UI surface follow in subsequent commits. * feat(server): deep-mode on-demand trigger wiring (Step 3) deep_mode_investigate WS message (distinct from the deep_investigate chat msg) -> handler loads a completed investigation's stored hypotheses+ruledOut, rejoins predictions (matchRuledOutToPredictions), runs the deeper read-only re-query via deepModeReexamine (reuses investigation providers+model + hypothesis-requery), persists deepMode onto the report, streams progress. - rca-types: DeepModeReexamination/DeepModeReport + report.deepMode field - agents.ts: deepModeReexamine closure on createMastraAdapters - ws-types: deep_mode_investigate client msg + deep_mode:* server msgs - sanitize: DeepModeInvestigateMessageSchema; demo-mode blocks it - ws-handler: validation + dispatch + handleDeepModeInvestigate (read-only) - 3 new matchRuledOutToPredictions tests; tsc + deep-mode/sanitize/ws-handler green Next: deep-from-start flag (workflow tail) + React UI. * feat(web): deep-mode UI — Deep investigate button + report section (Step 3) - 'Deep investigate' action on a completed investigation (shows only when the loop left ruled-out causes); sends deep_mode_investigate, reflects running/error state. - InvestigationPane handles deep_mode:started/complete/error → updates the report in place. - RcaReport renders a 'Deep Mode' section: resurrected-candidate (warning, a dismissed cause came back) vs rule-outs-confirmed (reassurance), with the per-hypothesis prior→deep verdict. - 3 render tests; tsc + clean web build green. 
Completes the on-demand trigger path end-to-end. Remaining: deep-from-start flag. * feat(web): surface hypothesis-loop status (N>1) in the GUI Two operator-visible signals for whether the synthesis loop is active: 1. Settings → LLM: read-only 'Hypothesis Loop' indicator — 'ON · N rounds' (teal) vs 'OFF · single-pass'. Surfaced via the existing /api/stacks/:id/llm/settings view (synthesisLoopRounds added). Deployment- level proof without firing an investigation. 2. Investigation metadata panel: a 'loop' MetaRow (outcome + ranked count) + a 'deep mode' MetaRow, shown only when those ran — per-investigation proof that synthesis wasn't single-pass. DESIGN.md-consistent (mono labels, restrained teal, rounded-sm). tsc + llm-settings/SettingsPage tests + clean web build green. * chore(release): bump VERSION 0.4.4.3 → 0.4.4.4 (deep-mode preview) Preview build off feat/deep-mode so deep mode + the N>1 GUI indicators can be deployed/verified before #231 merges. Distinct tag from the released 0.4.4.3 (clean-main build). Not tagged :latest. * feat(server): stream deep-mode progress as Console thinking Deep mode jumped straight from button to result — the deep_mode:tool_call events were emitted but unrendered. Now route step-by-step progress through the chat reasoning channel (chat:stream_start → chat:stream_delta reasoning → chat:stream_end) the deep_investigate follow-up already uses, so it shows as a live 'thinking' block in the Console + a plain-language summary message. - deepModeReexamine: onProgress callback — announces each hypothesis as the loop reaches it ('↪ testing: X'), each re-query tool call, and the per- hypothesis verdict (resurrected vs still-ruled-out). - handleDeepModeInvestigate: streams onProgress/onToolCall as reasoning deltas; ends with an outcome summary. deep_mode:started/complete/error still drive the button + report update. - ws-types: dropped the now-unused deep_mode:tool_call. Reuses ChatPane's existing reasoning display — no new client rendering. 
tsc + ws-handler tests + clean web build green.

* fix(web): show Deep investigate whenever the loop ran (not only on rule-outs)

Gap: the button gated on ruledOut.length>0, so a cleanly-confirmed investigation (0 rule-outs), or any report where rule-outs weren't persisted, showed no button at all. Now gate on loopOutcome present (the Step 2 loop ran → deep mode is applicable). The handler treats 'loop ran but ruled nothing out' as a calm Console message, not a red error; genuine single-pass (no hypotheses) stays an error.

* feat(workflows): live 'Testing hypotheses' progress in the Step 2 loop

Gap: the Approach-A draft showed a live Testing-hypotheses feed (rank → test leader → rule-out) but the loop ran silently inside Synthesis; only the final report showed rule-outs. Now runHypothesisLoop emits onRound progress events (ranking / testing / verdict); synthesis maps them to onIteration under the Synthesis phase, so the PhaseStepper streams them live while investigating, e.g. 'Testing H1/3: …', 'Ruled out H1: … (absent)', 'Confirmed H2: …'. Reuses the existing iteration-event rendering; no UI change. onRound is optional → loop stays pure; existing tests unaffected. +1 progress test.

tsc + loop/deep-mode/investigation tests green.

* feat(server): deep mode digs into a wider window than the loop

Closes the 'deeper' gap: deep mode re-queried the same incident window the loop used, so it was 're-examine', not 'escalate'. Now widenTimeRange expands the window each side by max(duration, 30min) (~3x, centered) for the deep re-query, surfacing precursors/aftermath the narrow synthesis window missed.

- widenTimeRange (pure, defensive: only widens parseable ISO/epoch; passes Grafana relative ranges + undefined through unchanged). 4 tests.
- deepModeReexamine queries the widened window but keeps the ORIGINAL incident onset as the change-in-window anchor.

Note: cross-service following (the other 'deeper' dimension) still pending; needs dependency topology.
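The widening rule above (pad each side by max(duration, 30min), pass anything unparseable through unchanged) can be sketched as follows. This is an illustration under assumptions, not the repository's widenTimeRange: the `TimeRange` shape, the helper names, and the choice to treat numeric bounds as epoch milliseconds are all hypothetical.

```typescript
// Hypothetical types: the real time-range shape is not shown in the commit.
type Bound = string | number;
interface TimeRange {
  from: Bound;
  to: Bound;
}

const MIN_PAD_MS = 30 * 60 * 1000; // the 30min floor from the commit message

// Strict ISO 8601 date-time prefix, so relative expressions such as
// "now-1h" are rejected before they ever reach Date.parse.
const ISO_PREFIX = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}/;

// Returns epoch milliseconds, or undefined when the bound is not an
// absolute timestamp (assumption: numeric bounds are epoch ms).
function toEpochMs(bound: Bound): number | undefined {
  if (typeof bound === "number") return Number.isFinite(bound) ? bound : undefined;
  if (/^\d+$/.test(bound)) return Number(bound); // epoch ms passed as a string
  if (!ISO_PREFIX.test(bound)) return undefined;
  const ms = Date.parse(bound);
  return Number.isNaN(ms) ? undefined : ms;
}

// Formats the widened bound in the same representation as the original.
function formatLike(ms: number, original: Bound): Bound {
  if (typeof original === "number") return ms;
  return /^\d+$/.test(original) ? String(ms) : new Date(ms).toISOString();
}

// Pure: never mutates its input; returns the input unchanged when either
// bound is relative, unparseable, or the range is inverted.
function widenTimeRange(range: TimeRange | undefined): TimeRange | undefined {
  if (range === undefined) return undefined;
  const from = toEpochMs(range.from);
  const to = toEpochMs(range.to);
  if (from === undefined || to === undefined || to < from) return range;
  const pad = Math.max(to - from, MIN_PAD_MS);
  return {
    from: formatLike(from - pad, range.from),
    to: formatLike(to + pad, range.to),
  };
}

// Usage: a 10-minute incident window is shorter than the floor, so each
// side is padded by 30 minutes rather than by the window's own duration.
const widened = widenTimeRange({
  from: "2024-01-01T10:00:00.000Z",
  to: "2024-01-01T10:10:00.000Z",
});
console.log(widened);
```

For windows of 30 minutes or longer the pad equals the duration, which gives the roughly 3x centered window the commit describes; shorter windows grow by more than 3x because of the floor.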
tsc + deep-mode tests green.

* feat: deep-from-start - auto-chain deep mode after interactive investigations

The 2nd of the two trigger paths. New config flag agent.deepModeOnComplete: when on, an interactive (chat-dispatched) investigation that ran the loop and ruled causes out automatically chains the deep re-examination on completion. No second click; the result streams + lands in one pass.

- config: agent.deepModeOnComplete (Zod, default off)
- ws-handler: extracted runDeepModeStreamed() (shared by the on-demand trigger + the new chain); chat dispatch captures the report and chains when enabled
- Settings → LLM: read-only 'Deep Mode — auto' indicator (ON / OFF · on-demand) alongside the Hypothesis Loop one, via the llm/settings view

Scope: wired for the interactive path (where streaming matters); headless (webhook/poller) chaining is a follow-up.

tsc + llm-settings/ws-handler/config tests + clean web build green.

* fix(web): only offer Deep investigate when the loop ruled causes out

The gap-#1 change over-corrected: gating on loopOutcome made the button appear on clean-confirm / no-rule-out investigations, where clicking dead-ends in 'the loop ruled nothing out … nothing to re-examine'. Deep mode re-examines RULED-OUT causes, so the honest gate is ruledOut.length > 0; investigations with nothing to resurrect simply don't offer it (no dead-end button). (The earlier 'no button at all' was viewing pre-loop investigations, which correctly have no rule-outs either.)

* feat: broaden deep mode - resurrect rule-outs OR refute the confirmed cause

Deep mode was only useful when the loop ruled causes out, so the button kept being absent (pre-loop / clean confirms) or dead-ending. Now it re-examines the loop's conclusion by STANDING and always does something useful:

- ruled-out causes -> try to RESURRECT (deeper evidence now satisfies?)
- the confirmed cause (no rule-outs) -> try to REFUTE (deeper evidence drops support?)
- deep-mode.ts: runDeepMode takes ReexamineTarget[] (priorStanding); flip logic per standing; outcomes resurrected-candidate | confirmation-shaken | holds | nothing-to-examine. buildReexamineTargets() picks resurrect vs refute mode.
- rca-types/agents/ws-handler/RcaReport/InvestigationPane threaded through; button gates on loopOutcome again (always useful now); summaries + metadata row + report section handle shaken/holds; deep-from-start chains on any loop run.
- tests rewritten for the standing API + refute mode + buildReexamineTargets.

tsc + deep-mode/RcaReport/ws-handler/investigation tests + clean build green.

* feat(web): dedicated structured deep-mode stream (colored, grouped, expanded)

The deep-mode progress was piped through the chat 'thinking' block: collapsed by default, plain mono, no grouping. Replaced with a dedicated agent stream matching the design:

- structured AgentStreamEvent (verb/target/status/indent) replaces plain text; deepModeReexamine emits onStep events, ws-handler streams deep_mode:step + a final stats footer (examined/tools/resurrected/shaken/elapsed).
- new DeepModeStream component: status icons (◉/✓/✗), coral verbs, info-blue query targets, indented sub-steps with a left rail, always expanded.
- InvestigationPane accumulates steps + renders it above the report.
- dropped the chat:stream routing for deep mode (no more collapsed plain block).

tsc + ws-handler/deep-mode tests + clean web build green.

* feat: gate deep mode behind deepModeEnabled (default off) + plain-English copy

Deep mode (Step 3) is hidden from users until the autonomous orchestrator lands. Today's bounded re-examination only re-judges the existing RCA's hypotheses; it can't investigate freely for the real cause.
Until it can, expose it only behind config.agent.deepModeEnabled (default false):

- server injects window.__DEEP_MODE_ENABLED__ only when enabled; the 'Deep investigate' button is gated on it (hidden by default)
- deep_mode_investigate WS handler + deep-from-start chain reject when off
- reworded the deep-mode stream + report copy to plain English (lead-with-takeaway: 'Probably not the cause: … — the evidence that would confirm it isn't there')
- schema test locks the ships-OFF default
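The default-off gate in the bullets above can be sketched roughly as below. The commits say the real config is a Zod schema; this sketch swaps in a hand-rolled parser so it stays dependency-free. Only the flag names `deepModeEnabled` and `deepModeOnComplete` and the `window.__DEEP_MODE_ENABLED__` global come from the commit messages. `AgentConfig`, `parseAgentConfig`, `deepModeBootstrap`, `checkDeepModeGate`, and `shouldChainDeepMode` are hypothetical names.

```typescript
// Hypothetical config slice: only the two flag names come from the commits.
interface AgentConfig {
  deepModeEnabled: boolean;
  deepModeOnComplete: boolean;
}

// Hand-rolled stand-in for the Zod schema (an assumption, not the real code):
// anything other than a literal `true` resolves to off, so the feature
// ships disabled unless explicitly turned on.
function parseAgentConfig(raw: unknown): AgentConfig {
  const obj: Record<string, unknown> =
    typeof raw === "object" && raw !== null ? (raw as Record<string, unknown>) : {};
  return {
    deepModeEnabled: obj["deepModeEnabled"] === true,
    deepModeOnComplete: obj["deepModeOnComplete"] === true,
  };
}

// Server side: emit the client flag only when enabled, so the button stays
// hidden by default because the global is simply absent.
function deepModeBootstrap(cfg: AgentConfig): string {
  return cfg.deepModeEnabled ? "window.__DEEP_MODE_ENABLED__ = true;" : "";
}

type GateResult = { ok: true } | { ok: false; reason: string };

// Handler side: reject on-demand requests when the flag is off, independent
// of whatever the client claims.
function checkDeepModeGate(cfg: AgentConfig): GateResult {
  return cfg.deepModeEnabled
    ? { ok: true }
    : { ok: false, reason: "deep mode is disabled (agent.deepModeEnabled is off)" };
}

// Deep-from-start chain: needs the master gate, the auto-chain flag, and a
// completed loop run.
function shouldChainDeepMode(cfg: AgentConfig, loopRan: boolean): boolean {
  return cfg.deepModeEnabled && cfg.deepModeOnComplete && loopRan;
}

// Usage: an empty config resolves to everything off.
const defaults = parseAgentConfig({});
console.log(defaults, checkDeepModeGate(defaults));
```

Checking the flag on the server as well as hiding the button means a stale or hand-crafted client message cannot trigger deep mode while it is switched off.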
Summary
Full-stack MVP of the dops-assistant DevOps investigation agent:
- DiscoveryAgent that probes Grafana to find monitored services

Test plan
- npx vitest run: 248 tests pass
- npx tsc --noEmit: types clean
- npm run investigate -- --runs 1 --query "...": end-to-end investigation produces relevant panel images

🤖 Generated with Claude Code