feat: CLI, discovery, RCA pipeline, panel images, and investigation hardening #1

Merged
WZ merged 53 commits into main from feature/mvp on Mar 5, 2026

Conversation

@WZ
Owner

@WZ WZ commented Mar 3, 2026

Summary

Full-stack MVP of the dops-assistant DevOps investigation agent:

  • CLI interface — Ink React terminal UI with markdown rendering, input history, token usage display, and panel image viewing
  • Service discovery — LLM-driven DiscoveryAgent that probes Grafana to find monitored services
  • 5-phase RCA pipeline — Anomaly detection → metric/log/infra analysis (parallel) → synthesis, with structured JSON outputs
  • Panel image capture — Deterministic capture from Grafana dashboards using user query hints (dashboard/panel name extraction weighted 3x) for relevance
  • Investigation hardening — Phase iteration caps, forced-response windows, post-loop retry (3x with mock error injection), pre-fetched datasource UIDs
  • Token optimization — Smart tool response truncation, 3 tool calls/iteration cap, trimmed prompts (~60% reduction from 206K to ~88K tokens)
  • MCP improvements — HTTP transport support, Grafana time range normalization (epoch ms strings for Go server), image extraction from tool results
  • LLM client — Switched from Chat Completions to OpenAI Responses API with token usage surfacing
  • Slack integration — Chart image uploads, RCA report formatting
  • Output quality — RCA synthesis avoids markdown tables (terminal-friendly), correct panel images based on user query context

Test plan

  • npx vitest run — 248 tests pass
  • npx tsc --noEmit — types clean
  • npm run investigate -- --runs 1 --query "..." — end-to-end investigation produces relevant panel images
  • Verify panel images match user-specified dashboard/panel when query includes hints like "(Panel Name in Dashboard Name)"
  • Verify RCA report renders cleanly in terminal without broken table formatting

🤖 Generated with Claude Code

Wilson Li and others added 30 commits February 26, 2026 00:52
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Prompt now instructs LLM to use list_panels before get_panel_image
  to get correct panel IDs
- Prompt explicitly forbids base64/markdown image syntax in text
- AgentCore strips any ![...](data:image/...) from response text
- Increased dev timeouts (llmCallMs 120s, toolExecutionMs 60s)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…or CLI

Config loader now supports ${VAR:-fallback} syntax. Slack env vars in
dev/config.yaml use :- so CLI mode works without Slack credentials.
Added GRAFANA_MCP_URL and CONFIG_PATH to dev/.env.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
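
The `${VAR:-fallback}` interpolation described in this commit could be resolved roughly as follows. This is a minimal sketch, not the project's loader: `interpolateEnv` and `ENV_PATTERN` are hypothetical names, and the shell-style rule (fallback applies when the variable is unset or empty) is an assumption.

```typescript
// Hypothetical helper; the real loader in src/config/loader.ts may differ.
// Matches ${NAME} and ${NAME:-default}; the default is limited to a single line.
const ENV_PATTERN = /\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}\n]*))?\}/g;

function interpolateEnv(
  text: string,
  env: Record<string, string | undefined>,
): string {
  return text.replace(ENV_PATTERN, (_match, name: string, fallback?: string) => {
    const value = env[name];
    // Shell-style ":-": use the fallback when the variable is unset or empty.
    if (value !== undefined && value !== "") return value;
    if (fallback !== undefined) return fallback;
    throw new Error(`Missing environment variable: ${name}`);
  });
}
```

With an empty default such as `${SLACK_BOT_TOKEN:-}` (a made-up variable name), the value resolves to an empty string instead of failing, which is what lets CLI mode start without Slack credentials.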
…config.yaml

No more manual env var export needed. Just run: npm run cli

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Bold, inline code, headers, numbered/bullet lists are now formatted
with colors and styles instead of showing raw markdown syntax.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Auto-load dev/.env with override=true (no manual env export needed)
- Add markdown renderer for CLI output (bold, code, headers, lists)
- Add tool_choice=auto and debug logging to LLM client
- Disable TLS verification for self-signed certs
- Default to gpt-4o with OpenAI API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tool calls are now persisted in the message history as dimmed text,
visible after the response so you can see what tools the agent used.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Enables tool calling with both gpt-4o and vLLM-hosted gpt-oss-120b,
which has a known bug with Chat Completions tool calling (vLLM #22578)
but works correctly via the Responses API.

- Rewrite doChat() to use openai.responses.create() instead of
  openai.chat.completions.create()
- Add conversion helpers: convertToResponsesInput, convertTools,
  convertResponseFormat
- Replace OpenAI.Chat types with local ResponseFormat type
- Simplify Message.tool_calls to { id, name, args } shape
- Update agent core.ts + investigation.ts message building
- Update all LLM tests with Responses API mock shapes
- Add unit tests for convertToResponsesInput

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
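
For readers unfamiliar with the migration, the conversion from a chat-style history to Responses API input items might look roughly like this. It is a sketch under stated assumptions: the `Message` and `InputItem` types are simplified local stand-ins, `toResponsesInput` is a hypothetical name, and the actual `convertToResponsesInput` in the PR may handle more cases.

```typescript
// Local, simplified types; assumptions for illustration only.
type ToolCall = { id: string; name: string; args: Record<string, unknown> };
type Message =
  | { role: "system" | "user"; content: string }
  | { role: "assistant"; content: string; tool_calls?: ToolCall[] }
  | { role: "tool"; content: string; tool_call_id: string };

type InputItem =
  | { role: "system" | "user" | "assistant"; content: string }
  | { type: "function_call"; call_id: string; name: string; arguments: string }
  | { type: "function_call_output"; call_id: string; output: string };

function toResponsesInput(messages: Message[]): InputItem[] {
  const items: InputItem[] = [];
  for (const msg of messages) {
    if (msg.role === "tool") {
      // Tool results become function_call_output items keyed by call_id.
      items.push({ type: "function_call_output", call_id: msg.tool_call_id, output: msg.content });
      continue;
    }
    if (msg.content) items.push({ role: msg.role, content: msg.content });
    if (msg.role === "assistant") {
      // Each requested tool call becomes its own function_call item.
      for (const call of msg.tool_calls ?? []) {
        items.push({
          type: "function_call",
          call_id: call.id,
          name: call.name,
          arguments: JSON.stringify(call.args),
        });
      }
    }
  }
  return items;
}
```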
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Derive image file extension from mimeType instead of hardcoding .png
- Make base64 stripping regex more specific (require ;base64, prefix)
- Log warnings for unhandled response content/output types in LLM client
- Use process.cwd() for dotenv path, make configurable via DOTENV_PATH
- Restrict env var default value regex to single-line matches
- Improve discovery prompt to mention searching dashboards first
- Use stable React keys in Markdown renderer instead of index-only
- Log errors from execFile("open") instead of silently swallowing
- Show concise "Completed N tool calls" instead of repeating tool list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
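
Two of the review fixes above (the stricter base64 regex and the extension derived from `mimeType`) can be illustrated with a short sketch. The names `stripBase64Images` and `extensionForMime`, the exact regex, and the MIME table are assumptions for illustration, not the code in the PR.

```typescript
// Matches markdown images whose target is an inline base64 data URI,
// e.g. ![chart](data:image/png;base64,iVBORw0...). Requiring ";base64,"
// avoids touching ordinary image links.
const BASE64_IMAGE_MD = /!\[[^\]]*\]\(data:image\/[a-zA-Z0-9.+-]+;base64,[^)]*\)/g;

function stripBase64Images(text: string): string {
  return text.replace(BASE64_IMAGE_MD, "").trim();
}

// Assumed mapping; anything unrecognised falls back to "bin".
function extensionForMime(mimeType: string): string {
  const known: Record<string, string> = {
    "image/png": "png",
    "image/jpeg": "jpg",
    "image/gif": "gif",
    "image/webp": "webp",
    "image/svg+xml": "svg",
  };
  return known[mimeType.toLowerCase()] ?? "bin";
}
```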
- Rewrite parseInline() with character-scanning parser for robust
  handling of unclosed markers and multiple bold/code sections
- Document Message type's intentional divergence from OpenAI Chat types
- Document convertResponseFormat() maps to Responses API text config
- Add 15 tests for CLI pure functions: parseInline, formatRcaText,
  saveAndOpenImages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add TokenUsage type to LlmResponse so callers can track input/output
tokens per LLM call. Wire an onTokenUsage callback through AgentTask
and InvestigationAgent so CLI or other interfaces can accumulate and
display token counts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Wire onTokenUsage callback into CLI App to accumulate input/output
tokens per request. Display totals alongside tool call count in the
status line, e.g. "7 tool calls · 1234 tokens (800 in / 434 out)".

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Detect consecutive pipe-delimited lines as table blocks, compute
max column widths, and pad cells for aligned output. Separator
rows (|---|---|) are stripped. Inline formatting (bold, code) is
preserved within cells.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
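
The table alignment described in this commit (drop separator rows, compute per-column maximum widths, pad cells) could be sketched as below. The function names are hypothetical and inline formatting is left out; the real renderer in `Markdown.tsx` may differ.

```typescript
// Split a pipe-delimited row into trimmed cells, ignoring the outer pipes.
function splitRow(line: string): string[] {
  return line.trim().replace(/^\|/, "").replace(/\|$/, "").split("|").map((c) => c.trim());
}

// A separator row consists only of dashes, colons, pipes and spaces, e.g. |---|:--|.
function isSeparatorRow(line: string): boolean {
  return /^\s*\|?[\s:|-]+\|?\s*$/.test(line) && line.includes("-");
}

function alignTable(lines: string[]): string[] {
  const rows = lines.filter((l) => !isSeparatorRow(l)).map(splitRow);
  const columnCount = Math.max(0, ...rows.map((r) => r.length));
  // Width of each column is the longest cell found in that column.
  const widths = Array.from({ length: columnCount }, (_, i) =>
    Math.max(...rows.map((r) => (r[i] ?? "").length)),
  );
  return rows.map((r) =>
    widths.map((w, i) => (r[i] ?? "").padEnd(w)).join("  ").trimEnd(),
  );
}
```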
Wilson Li and others added 5 commits March 2, 2026 23:22
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…utput

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… maxIterations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t from intent classifier

LLMs sometimes return non-breaking hyphens (U+2011) or en-dashes instead of
ASCII hyphens, causing matchService to fail on names like "ingestion-server".

Also removes the 60+ service name list injection from IntentClassifier — the
LLM was unreliable at exact matching from huge lists. matchService fuzzy logic
handles lookup instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
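
A minimal sketch of the dash normalisation described above, assuming a simple exact-match lookup after normalisation. `normalizeServiceName` and `findService` are hypothetical names; the fuzzy logic of the real `matchService` is not reproduced here.

```typescript
// Unicode dash variants that LLMs sometimes emit in place of "-":
// U+2010 hyphen, U+2011 non-breaking hyphen, U+2012 figure dash,
// U+2013 en dash, U+2014 em dash, U+2212 minus sign.
const DASH_VARIANTS = /[\u2010\u2011\u2012\u2013\u2014\u2212]/g;

function normalizeServiceName(name: string): string {
  return name.replace(DASH_VARIANTS, "-").trim().toLowerCase();
}

// Hypothetical exact-match lookup after normalisation.
function findService(candidate: string, services: string[]): string | undefined {
  const wanted = normalizeServiceName(candidate);
  return services.find((s) => normalizeServiceName(s) === wanted);
}
```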
Copilot AI review requested due to automatic review settings March 3, 2026 22:11

Copilot AI left a comment

Pull request overview

This PR merges several MVP features into the codebase: an Ink-based CLI interface, LLM-driven service discovery, richer RCA evidence (including panel screenshots), Slack image uploads, and a migration of the OpenAI client to the Responses API.

Changes:

  • Added Ink/React CLI entrypoints and rendering utilities (markdown rendering, tool-call/token logging, image saving).
  • Implemented DiscoveryAgent + config schema support and integrated optional discovery into startup flows.
  • Extended RCA data types/prompts + MCP/agent pipelines to collect and deliver panel images, and switched the OpenAI client to the Responses API.

Reviewed changes

Copilot reviewed 39 out of 42 changed files in this pull request and generated 7 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tsconfig.json | Enables JSX compilation for Ink/React CLI components. |
| src/notifications/rca-blocks.test.ts | Updates RCA test fixture for new report fields. |
| src/mcp/client.ts | Changes tool call return type to include text + images. |
| src/mcp/client.test.ts | Updates tool-call tests and adds image-part coverage. |
| src/llm/openai.ts | Migrates OpenAI wrapper to Responses API and adds conversion helpers + usage tracking. |
| src/llm/openai.test.ts | Updates tests for Responses API output parsing and conversion helpers. |
| src/interfaces/slack.ts | Uploads tool-captured images to Slack threads; adds channelId plumbing. |
| src/interfaces/slack.test.ts | Adds/updates tests for Slack image uploads and new context shape. |
| src/interfaces/cli/cli-utils.test.ts | Adds unit tests for CLI markdown parsing, RCA formatting, and image saving behavior. |
| src/interfaces/cli/Markdown.tsx | Implements CLI markdown renderer (inline formatting + basic tables). |
| src/interfaces/cli/App.tsx | Implements Ink CLI REPL, tool-call logging, token usage display, RCA formatting, and image saving/opening. |
| src/index.ts | Integrates optional service discovery and uses merged services for scheduler/Slack bot. |
| src/discover.tsx | Adds an Ink-based “discover services and write config” entrypoint. |
| src/config/schema.ts | Adds discovery config schema + exported type. |
| src/config/schema.test.ts | Adds tests validating discovery schema defaults. |
| src/config/loader.ts | Adds support for ${VAR:-default} env var interpolation. |
| src/cli.tsx | Adds CLI entrypoint wiring MCP/LLM/agents/memory into Ink app. |
| src/agent/types.ts | Adds token/tool callbacks to AgentTask and images to AgentResult (+ ImageAttachment type). |
| src/agent/rca-types.ts | Extends RCA types with dashboard links, panel images, and richer log findings. |
| src/agent/rca-prompts.ts | Updates prompts + JSON schemas for enriched evidence and dashboard link extraction. |
| src/agent/rca-prompts.test.ts | Adds tests covering new RCA schema fields and prompt requirements. |
| src/agent/prompts.ts | Updates conversational system prompt to encourage panel image capture and forbid base64 in text. |
| src/agent/investigation.ts | Collects panel images during investigation phases; threads token usage reporting through phases. |
| src/agent/investigation.test.ts | Updates mocks/fixtures for ToolResult and new log findings fields. |
| src/agent/intent.ts | Adds service matching helper (including unicode hyphen normalization); updates classifier signature. |
| src/agent/intent.test.ts | Adds tests for matchService behavior. |
| src/agent/discovery.ts | Adds LLM-driven DiscoveryAgent that executes tool calls and returns ServiceConfig candidates. |
| src/agent/discovery.test.ts | Adds tests for discovery output parsing, exclusions, and token-usage callback. |
| src/agent/discovery-prompts.ts | Adds discovery prompt + schema and message builder. |
| src/agent/discovery-prompts.test.ts | Adds tests for discovery prompt/schema/message builder. |
| src/agent/core.ts | Collects tool-returned images, sanitizes tool outputs, strips base64 markdown from responses, and adds token/tool callbacks. |
| src/agent/core.test.ts | Updates mocks for ToolResult and adds tests for image capture + callback behavior. |
| package.json | Adds CLI/discover scripts and Ink/React/dotenv deps. |
| package-lock.json | Locks new dependency graph for CLI/Ink/React additions. |
| docs/plans/2026-03-02-rca-evidence-enrichment.md | Adds implementation plan for RCA evidence enrichment. |
| docs/plans/2026-03-02-rca-evidence-enrichment-design.md | Adds design doc for evidence enrichment scope/behavior. |
| docs/plans/2026-02-26-grafana-screenshots-plan.md | Adds implementation plan for Grafana screenshot delivery. |
| docs/plans/2026-02-26-grafana-screenshots-design.md | Adds design doc for screenshot pipeline and error handling. |
| docs/plans/2026-02-26-cli-mode-plan.md | Adds implementation plan for CLI mode. |
| docs/plans/2026-02-26-cli-mode-design.md | Adds design doc for CLI UX/architecture. |
| docs/architecture-overiew.md | Adds/updates architecture documentation (needs alignment with Responses API). |
| .gitignore | Ignores dev/ local config directory. |

Comment thread package.json
Comment on lines 16 to 28
"dependencies": {
"@inkjs/ui": "^2.0.0",
"@modelcontextprotocol/sdk": "^1.0.0",
"@slack/bolt": "^3.22.0",
"dotenv": "^17.3.1",
"ink": "^6.8.0",
"node-cron": "^3.0.3",
"openai": "^4.77.0",
"pino": "^9.5.0",
"prom-client": "^15.1.3",
"react": "^19.2.4",
"yaml": "^2.7.0",
"zod": "^3.24.0"

Copilot AI Mar 3, 2026

ink-table is imported by the new CLI Markdown renderer (src/interfaces/cli/Markdown.tsx) but it is not listed in dependencies here. This will break fresh installs / CI builds that only rely on package.json. Add ink-table as a direct dependency (or remove the import/use).

Comment thread package.json Outdated
Comment on lines +13 to +14
"cli": "NODE_TLS_REJECT_UNAUTHORIZED=0 NODE_NO_WARNINGS=1 tsx src/cli.tsx",
"discover": "NODE_TLS_REJECT_UNAUTHORIZED=0 NODE_NO_WARNINGS=1 tsx src/discover.tsx"

Copilot AI Mar 3, 2026

The cli/discover scripts disable TLS verification via NODE_TLS_REJECT_UNAUTHORIZED=0. That makes it easy to accidentally run these entrypoints with insecure HTTPS handling and can mask certificate issues. Consider removing this flag, or gating it behind an explicit opt-in env var (and documenting why it’s needed).

Suggested change
- "cli": "NODE_TLS_REJECT_UNAUTHORIZED=0 NODE_NO_WARNINGS=1 tsx src/cli.tsx",
- "discover": "NODE_TLS_REJECT_UNAUTHORIZED=0 NODE_NO_WARNINGS=1 tsx src/discover.tsx"
+ "cli": "NODE_NO_WARNINGS=1 tsx src/cli.tsx",
+ "discover": "NODE_NO_WARNINGS=1 tsx src/discover.tsx"

Comment on lines +241 to +253
setToolCalls((prev) => {
  const parts: string[] = [];
  if (prev.length > 0) {
    parts.push(`${prev.length} tool call${prev.length === 1 ? "" : "s"}`);
  }
  const { inputTokens, outputTokens } = tokenTotals.current;
  if (inputTokens > 0 || outputTokens > 0) {
    parts.push(`${inputTokens + outputTokens} tokens (${inputTokens} in / ${outputTokens} out)`);
  }
  if (parts.length > 0) {
    addMessage({ id: randomUUID(), role: "toolcalls", content: parts.join(" · ") });
  }
  return [];

Copilot AI Mar 3, 2026

Avoid side effects inside the setToolCalls state updater. The updater function should be pure, but it currently calls addMessage(...) while computing the summary, which can behave unpredictably with React batching/re-render behavior. Compute the summary outside the state setter (e.g., using a ref for the last toolCalls + token totals), then clear toolCalls in a separate setToolCalls([]) call.
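
One way to apply this suggestion is to move the string building into a pure function and call `addMessage` outside the updater. The sketch below assumes a hypothetical `toolCallsRef` that mirrors the `toolCalls` state; `buildSummary` and `TokenTotals` are illustrative names, not code from the PR.

```typescript
type TokenTotals = { inputTokens: number; outputTokens: number };

// Pure function: builds the status line without touching React state.
function buildSummary(toolCallCount: number, totals: TokenTotals): string | null {
  const parts: string[] = [];
  if (toolCallCount > 0) {
    parts.push(`${toolCallCount} tool call${toolCallCount === 1 ? "" : "s"}`);
  }
  const { inputTokens, outputTokens } = totals;
  if (inputTokens > 0 || outputTokens > 0) {
    parts.push(`${inputTokens + outputTokens} tokens (${inputTokens} in / ${outputTokens} out)`);
  }
  return parts.length > 0 ? parts.join(" · ") : null;
}

// Inside the component, assuming a hypothetical toolCallsRef that mirrors state:
//   const summary = buildSummary(toolCallsRef.current.length, tokenTotals.current);
//   if (summary) addMessage({ id: randomUUID(), role: "toolcalls", content: summary });
//   setToolCalls([]);
```

Keeping the updater free of side effects matters because React may invoke updater functions more than once (for example in Strict Mode during development), which would otherwise add the summary message twice.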

Comment thread src/agent/investigation.ts Outdated
Comment on lines +175 to +176
collectedImages.push({
filename: `panel-${call.name}-${j}-${k}.png`,

Copilot AI Mar 3, 2026

Image attachments collected from MCP tool results always use a .png filename and don’t include any disambiguator for the outer i loop. This can (a) produce the wrong extension when mimeType isn’t image/png, and (b) overwrite earlier images if the same tool is called again in a later iteration. Derive the extension from mimeType (as AgentCore does) and include a unique suffix (e.g., i or a UUID) in the filename.

Suggested change
- collectedImages.push({
-   filename: `panel-${call.name}-${j}-${k}.png`,
+ const ext =
+   img.mimeType && img.mimeType.startsWith("image/")
+     ? img.mimeType.substring("image/".length)
+     : "bin";
+ collectedImages.push({
+   filename: `panel-${call.name}-${i}-${j}-${k}.${ext}`,

Comment thread src/mcp/client.ts Outdated
Comment on lines 120 to 123
const images: ImageContent[] = parts
  .filter((p) => p.type === "image")
  .map((p) => ({ mimeType: p.mimeType ?? "image/png", data: p.data ?? "" }));
toolCallsTotal.inc({ tool: name, status: "success" });

Copilot AI Mar 3, 2026

callTool() currently includes image parts even when data is missing, mapping them to an empty string. Downstream code decodes img.data as base64 and may end up uploading/handling empty buffers. Filter out image parts with missing/empty data (and optionally validate base64) so images only contains usable entries.
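
A possible shape for that filter is sketched below. `ContentPart`, `extractImages`, and the loose base64 check are assumptions for illustration; the MCP SDK's actual content types are richer than this.

```typescript
type ContentPart = { type: string; text?: string; mimeType?: string; data?: string };
type ImageContent = { mimeType: string; data: string };

// Loose structural check for standard base64 (not base64url); an assumption
// about what the MCP server returns.
const BASE64_PATTERN = /^[A-Za-z0-9+/]+={0,2}$/;

function extractImages(parts: ContentPart[]): ImageContent[] {
  const images: ImageContent[] = [];
  for (const part of parts) {
    if (part.type !== "image") continue;
    const data = part.data?.trim() ?? "";
    // Skip parts with no payload or with an obviously invalid encoding.
    if (data.length === 0 || data.length % 4 !== 0 || !BASE64_PATTERN.test(data)) continue;
    images.push({ mimeType: part.mimeType ?? "image/png", data });
  }
  return images;
}
```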

Comment thread src/discover.tsx
Comment on lines +90 to +94
  await mcp.disconnect();
  setStatus("done");
} catch (err) {
  log(`Error: ${err instanceof Error ? err.message : String(err)}`);
  setStatus("error");

Copilot AI Mar 3, 2026

The MCP connection isn’t cleaned up on early cancellation or errors: if cancelled becomes true after mcp.connect() (or if any later step throws), mcp.disconnect() is skipped. Wrap the workflow in a try/finally (and/or track mcp in outer scope) so the MCP client is always disconnected, even on errors or when the component unmounts.

Suggested change
-   await mcp.disconnect();
-   setStatus("done");
- } catch (err) {
-   log(`Error: ${err instanceof Error ? err.message : String(err)}`);
-   setStatus("error");
+   setStatus("done");
+ } catch (err) {
+   log(`Error: ${err instanceof Error ? err.message : String(err)}`);
+   setStatus("error");
+ } finally {
+   try {
+     await mcp.disconnect();
+   } catch {
+     // Ignore disconnect errors to avoid masking the original failure
+   }

Comment on lines +56 to +63
A thin wrapper around the `openai` SDK. The single method `chat(messages, tools)` calls the OpenAI chat completions API and returns a typed discriminated union:

- `{ type: "text", content: string }` — the LLM produced a final response
- `{ type: "tool_calls", calls: ToolCall[] }` — the LLM wants to call one or more tools

This shape is what the Agent Core loops on. The LLM client never loops itself — it makes exactly one API call per invocation.

Guards are in place for two failure modes: an empty `choices` array (content filtering) and malformed JSON in tool call arguments.

Copilot AI Mar 3, 2026

This doc section still describes the LLM client as calling the Chat Completions API and mentions an empty choices array, but src/llm/openai.ts has been switched to the Responses API (response.output, input_tokens, etc.). Update the documentation here to match the current implementation to avoid misleading readers.

Suggested change
A thin wrapper around the `openai` SDK. The single method `chat(messages, tools)` calls the OpenAI chat completions API and returns a typed discriminated union:
- `{ type: "text", content: string }` — the LLM produced a final response
- `{ type: "tool_calls", calls: ToolCall[] }` — the LLM wants to call one or more tools
This shape is what the Agent Core loops on. The LLM client never loops itself — it makes exactly one API call per invocation.
Guards are in place for two failure modes: an empty `choices` array (content filtering) and malformed JSON in tool call arguments.
A thin wrapper around the `openai` SDK. The single method `chat(messages, tools)` calls the OpenAI Responses API and returns a typed discriminated union:
- `{ type: "text", content: string }` — the LLM produced a final response
- `{ type: "tool_calls", calls: ToolCall[] }` — the LLM wants to call one or more tools
This shape is what the Agent Core loops on. The LLM client never loops itself — it makes exactly one API call per invocation.
Guards are in place for two failure modes: a missing or empty `response.output` (e.g. due to content filtering) and malformed JSON in tool call arguments.

Wilson Li and others added 2 commits March 3, 2026 14:56
Integrates deterministic panel image capture, tool response truncation,
iteration limit fix, and finish_reason detection from main into the
feature/mvp branch with CLI, discovery, and RCA enrichment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Phase 1 (anomaly detection) was using maxIterations=20 from config,
  causing "Phase did not complete within 20 iterations" timeouts that
  killed the entire investigation including panel capture
- All phases now capped at 8 iterations (6 tool-use + 2 forced-response)
- Removed duplicate onToolCall invocation in runPhase that reported
  each tool call twice in the UI
- Reduced MAX_TOOL_RESPONSE_CHARS from 4000 to 2000 to reduce token usage
- Improved deterministic panel capture: sort dashboards/panels by service
  relevance, extract time range from anomaly context, add debug logging

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@WZ WZ changed the title from "Merge feature/mvp: CLI, discovery, RCA evidence, panel images" to "feat: CLI, discovery, RCA pipeline, panel images, and investigation hardening" on Mar 3, 2026
Wilson Li and others added 5 commits March 3, 2026 16:04
- Cap phase iterations at 5 (3 tool-use + 2 forced-response), down from 8
- Limit tool calls per iteration to 3 to prevent context bloat
- Add smart truncation for search_dashboards (uid+title only, cap 20)
- Reduce MAX_TOOL_RESPONSE_CHARS from 2000 to 1500
- Trim verbose phase prompts (~70% shorter) — removes redundant
  instructions that cost tokens on every LLM round-trip
- Target: ~30 tool calls, ~60K tokens (down from 89 calls, 206K tokens)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
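
The truncation rules in this commit could be sketched as follows. The constants come from the commit message; the function names, the truncation marker, and the assumption that `search_dashboards` returns a JSON array of objects with `uid` and `title` are illustrative.

```typescript
const MAX_TOOL_RESPONSE_CHARS = 1500; // value quoted in the commit message
const MAX_DASHBOARDS = 20;

// Generic fallback: hard cut with a marker so the model knows text is missing.
function truncateText(text: string, limit = MAX_TOOL_RESPONSE_CHARS): string {
  if (text.length <= limit) return text;
  return `${text.slice(0, limit)}\n[truncated ${text.length - limit} chars]`;
}

// Tool-aware path: keep only uid and title from search_dashboards results.
function truncateToolResponse(toolName: string, raw: string): string {
  if (toolName === "search_dashboards") {
    try {
      const parsed: unknown = JSON.parse(raw);
      if (Array.isArray(parsed)) {
        const slim = parsed
          .slice(0, MAX_DASHBOARDS)
          .map((d: { uid?: unknown; title?: unknown }) => ({ uid: d.uid, title: d.title }));
        return truncateText(JSON.stringify(slim));
      }
    } catch {
      // Not JSON (or unexpected shape); fall through to the generic cut.
    }
  }
  return truncateText(raw);
}
```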
The 2-iteration forced-response window left only 3 tool-use iterations
out of 5, causing Phase 1 to time out before gathering enough data.
Also fixed a bug where tool calls were still executed during the
forced-response window (tools withheld but calls still dispatched).

Now: 4 tool-use iterations + 1 forced-response = 5 total.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
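
The iteration budget described here (4 tool-use iterations plus 1 forced-response iteration) can be expressed as a small planning function. This is a sketch of the control flow only: `planPhase`, `PhasePlan`, and `capToolCalls` are hypothetical names, and the actual `runPhase` loop is not shown in this thread.

```typescript
type PhasePlan = { iteration: number; toolsOffered: boolean; executeToolCalls: boolean };

// Hypothetical helper: the last `forcedWindow` iterations withhold tools and
// also refuse to dispatch any tool calls the model still emits.
function planPhase(maxIterations: number, forcedWindow: number): PhasePlan[] {
  if (forcedWindow < 0 || forcedWindow > maxIterations) {
    throw new RangeError("forcedWindow must be between 0 and maxIterations");
  }
  const toolUseIterations = maxIterations - forcedWindow;
  return Array.from({ length: maxIterations }, (_, i) => {
    const forced = i >= toolUseIterations;
    return { iteration: i + 1, toolsOffered: !forced, executeToolCalls: !forced };
  });
}

// Caps tool calls per iteration (3 in the earlier commit) to limit context growth.
function capToolCalls<T>(calls: T[], maxPerIteration = 3): T[] {
  return calls.slice(0, maxPerIteration);
}
```

Tying `executeToolCalls` to the same flag as `toolsOffered` reflects the bug fix in this commit: withholding tools is not enough if calls returned by the model are still dispatched.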
Three issues fixed:

1. Panel images failed with "cannot unmarshal number into string":
   normalizeGrafanaTime was converting ISO dates to epoch ms numbers,
   but Grafana MCP Go server expects strings. Now returns string always.

2. Phases timing out despite having enough data:
   Added post-loop retry that tries up to 3 times with mock error
   responses when gpt-oss-120b keeps returning tool_calls despite
   empty tools array. Each retry adds tool error messages to convince
   the model to produce JSON.

3. Wasted iterations on list_datasources:
   Pre-fetch datasource UIDs once before phases start and inject them
   into all phase prompts. Saves 1-2 iterations per phase.

Phase 1 bumped to 7 iterations (6 tool + 1 forced) since anomaly
detection needs discovery + data querying.

Verified: 23 tool calls, 88K tokens, 3 panel images captured.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
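
The first fix above (always returning a string) might look like the sketch below. `toGrafanaTimeString` is a hypothetical stand-in for the PR's `normalizeGrafanaTime`, and passing relative expressions such as `now-1h` through unchanged is an assumption about what the server accepts.

```typescript
// Returns epoch milliseconds as a string, never as a number, because the
// Go server described in the commit unmarshals this field into a string.
function toGrafanaTimeString(value: string | number | Date): string {
  if (value instanceof Date || typeof value === "number") {
    const ms = value instanceof Date ? value.getTime() : value;
    if (!Number.isFinite(ms)) throw new Error("Invalid time value");
    return String(Math.trunc(ms));
  }
  const trimmed = value.trim();
  if (/^now([-+/].*)?$/.test(trimmed)) return trimmed; // relative time (assumed passthrough)
  if (/^\d+$/.test(trimmed)) return trimmed; // already epoch digits
  const parsed = Date.parse(trimmed);
  if (Number.isNaN(parsed)) throw new Error(`Unrecognised time value: ${value}`);
  return String(parsed);
}
```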
…in RCA

Panel capture now extracts dashboard/panel name hints from the user message
(e.g. "Ingestion Log Rate in Ingestion monitor") and weights them 3x higher
than service name tokens when ranking dashboards and panels. This ensures
the correct panels are captured instead of unrelated ones.

Also instructs the RCA synthesis prompt to avoid markdown tables (which
render poorly in the terminal CLI) and use bullet lists instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
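
The 3x hint weighting can be illustrated with a simple token-overlap score. This is a sketch under assumptions: the tokenizer, the scoring formula, and the names `scoreTitle` and `rankTitles` are hypothetical, and only the 3:1 weight ratio comes from the commit message.

```typescript
const HINT_WEIGHT = 3; // from the commit message: hints count 3x
const SERVICE_WEIGHT = 1;

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Score a dashboard or panel title by how many hint and service tokens it contains.
function scoreTitle(title: string, hintTokens: string[], serviceTokens: string[]): number {
  const words = new Set(tokenize(title));
  const count = (tokens: string[]) => tokens.filter((t) => words.has(t)).length;
  return HINT_WEIGHT * count(hintTokens) + SERVICE_WEIGHT * count(serviceTokens);
}

function rankTitles(titles: string[], hint: string, service: string): string[] {
  const hintTokens = tokenize(hint);
  const serviceTokens = tokenize(service);
  // Copy before sorting so the caller's array is left untouched.
  return [...titles].sort(
    (a, b) => scoreTitle(b, hintTokens, serviceTokens) - scoreTitle(a, hintTokens, serviceTokens),
  );
}
```

With the hint "Ingestion Log Rate" and the service "ingestion-server", a panel titled "Ingestion Log Rate" outranks "Ingestion server overview" even though the latter matches more service tokens.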
Instead of showing "No matching service found" error and stopping,
unmatched investigation intents now fall through to the conversational
agent which can use MCP tools to answer infrastructure queries directly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@WZ WZ merged commit 818fcc2 into main Mar 5, 2026
1 check passed
WZ added a commit that referenced this pull request Apr 2, 2026
feat: CLI, discovery, RCA pipeline, panel images, and investigation hardening
WZ added a commit that referenced this pull request Apr 23, 2026
* refactor(scan): remove Status section from Settings tab

Status display (next run / last run / ticking) and "Scan now" trigger now
live in the Operation Desk view. Keeping them here duplicated the same
data and forced a 10s polling interval that the Settings tab doesn't need.

Removes the status fetch, the polling interval + its cleanup, the trigger
handler + timers, and the Field / formatTimestamp helpers that were only
used here.

Addresses finding #2 from /design-review on Settings → Scan.

* style(scan): plain-language copy on the Settings tab

Rewrites every helper sentence to avoid internal jargon ("probe", "tick",
"trip", "PromQL") that operators don't need to learn to read the settings
page. "Cron expression" label becomes "Schedule" (the expression itself is
still shown). Helper text is lifted from 10px mono to 12px regular so it's
actually readable. The top-of-page blurb now leads with what the feature
does for the user, not how it works internally.

Addresses finding #3 from /design-review on Settings → Scan.

* feat(scan): cron preset shortcuts on Settings tab

Adds a row of pill buttons above the schedule input (Every 15 min, Hourly,
Every 4 hours, Daily, Weekly) that fill the cron field in one click. The
pill matching the current value is highlighted so the user can see which
preset their schedule matches, if any. Typing a custom expression still
works and simply leaves no pill highlighted.

Addresses finding #4 from /design-review on Settings → Scan.

* feat(scan): default timezone to the browser's timezone

When the user hasn't saved a GUI timezone yet (settings.source.timezone
is still "config", i.e. the UTC default from config.yaml), pre-fill the
form with Intl.DateTimeFormat().resolvedOptions().timeZone. Marks the
form dirty so the Save button lights up and the user can persist it with
one click.

Users who explicitly want UTC can clear the field and save; a user who
has already saved a specific timezone keeps theirs.

Addresses finding #5 from /design-review on Settings → Scan.

* style(scan): tidy per-rule editor layout

Collapses the three header actions (move up, move down, remove) from
three differently-styled bordered buttons into a single horizontal icon
strip with subtle hover backgrounds. Each icon is a 28px square, no
per-button border, destructive hover only on the × button. The rule
card now has one clean header row instead of two ragged columns.

Other touch-ups in the same editor:
- "Consecutive ticks" label → "Scans in a row" (matches the vocabulary
  used elsewhere on the tab)
- Drops the redundant helper under the ticks input (label is clear)
- Rewrites the query helper to drop "Probe" jargon
- Remove-confirm dialog uses plain language instead of "hysteresis state"

Addresses finding #1 from /design-review on Settings → Scan.

* style(design): dark-theme Add recipient modal

Align the email recipient editor modal with the dark-theme design tokens
used by EmailRecipientsSection. Swap the white card for bg-card, adopt the
mono uppercase label/title pattern, use bg-background/40 inputs with
border-border/40, accent-primary radios/checkboxes, and replace raw button
elements with the shared Button (outline Cancel, primary Save). Backdrop
gains backdrop-blur and proper dialog ARIA.

* chore: bump version and changelog (v0.1.2.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
WZ added a commit that referenced this pull request May 7, 2026
…#189)

* feat(webhooks): backend foundations for Settings → Alert Webhooks tab

Three changes that PR 2 (the UI) depends on, none of them user-visible
on their own.

1. Extract `processFiringAlert()` from `createWebhookHandler` so the
   post-auth pipeline (service match → dedup → run) can be driven by
   internal callers without synthesizing a fake Request/Response. The
   HTTP handler keeps its bearer auth + response shaping above this; the
   shared core emits `alert_received` events on every path it takes.

2. Emit `alert_received` events on dedup-skipped and concurrency-skipped
   paths in addition to accepted alerts. Pre-fix the eventLog only
   recorded accepts, which made the upcoming activity log silent on
   exactly the cases an operator needs to see ("Grafana keeps retrying
   but I don't see anything come through"). Every event now carries
   `meta.deliveryStatus` (`investigated` / `deduplicated` /
   `concurrency_skipped` / `no_service_match`) so the UI can render the
   outcome correctly. Existing accepted-alert assertions still pass.

3. Add `min(16)` to `WebhookSchema.tokens` value so the masking format
   (`first4…last4`) can't ever leak the whole token. Legacy
   `webhook.secret` keeps no floor for back-compat; sub-16 secrets
   degrade to a placeholder string in the GUI rather than masking.
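
   The masking rule in item 3 could be sketched as below. `maskToken` and the placeholder text are hypothetical; only the `first4…last4` format and the 16-character floor come from the commit message.

```typescript
const MIN_TOKEN_LENGTH = 16; // floor from the commit message
const PLACEHOLDER = "(hidden)"; // hypothetical placeholder text

// Shows the first and last four characters only. Anything shorter than the
// floor is replaced outright, since first4 + last4 would expose too much of it.
function maskToken(token: string): string {
  if (token.length < MIN_TOKEN_LENGTH) return PLACEHOLDER;
  return `${token.slice(0, 4)}…${token.slice(-4)}`;
}
```

   With a 16-character minimum, at most half of any token is ever displayed; an 8-character secret masked the same way would be shown in full, which is why short legacy secrets get the placeholder instead.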

Plus two new read-only endpoints, both stack-scoped via the existing
X-Stack-Id middleware:

- `GET /api/webhooks/info` — masked tokens, severity → template map,
  dedup window, max concurrent, AND the service-label contract
  (`["service","service_name","app","job","deployment"]`). Surfacing
  the contract kills the #1 self-service failure mode (operator labels
  alerts with `team` instead of `service` and gets a silent 422).

- `GET /api/webhooks/recent` — last 20 webhook deliveries from the
  persistent `events` DB table, filtered by `kind=alert_received` and
  `meta.source=alertmanager`. DB-backed (not the 200-entry in-memory
  ring) so the activity log doesn't drop entries on busy stacks.

Tests:
- `webhook-handler.test.ts` — 4 new cases for the dedup / concurrency /
  no_service_match emission paths, plus deliveryStatus assertion on the
  existing accepted-alert test.
- `routes.webhooks.test.ts` (new) — masking edge cases (long token,
  legacy secret, short legacy secret), stack scoping, contract
  surfacing, recent filtering + ordering + cap + cross-source isolation.

Tab UI, test endpoint, reveal-token endpoint, snippet generator, and
loopback test ship in PR 2 alongside the AlertWebhooksTab component.

* fix(webhooks): preserve delivery audit events
WZ added a commit that referenced this pull request Jun 1, 2026
The gap-#1 change over-corrected: gating on loopOutcome made the button appear
on clean-confirm / no-rule-out investigations, where clicking dead-ends in
'the loop ruled nothing out … nothing to re-examine'. Deep mode re-examines
RULED-OUT causes, so the honest gate is ruledOut.length > 0 — investigations
with nothing to resurrect simply don't offer it (no dead-end button).

(The earlier 'no button at all' was viewing pre-loop investigations, which
correctly have no rule-outs either.)
WZ added a commit that referenced this pull request Jun 2, 2026
…-out causes (#231)

* feat(workflows): deep-mode orchestrator core (Step 3)

Pure runDeepMode: re-examines the hypothesis loop's top ruled-out
hypotheses with deeper injected evidence + the same discriminating
keystone, resurrecting any the loop dismissed prematurely. Read-only by
construction; gatherDeepEvidence injected so the control flow is
unit-testable without LLM/MCP. 5 tests, tsc clean.

Trigger/runner wiring + report/UI surface follow in subsequent commits.

* feat(server): deep-mode on-demand trigger wiring (Step 3)

deep_mode_investigate WS message (distinct from the deep_investigate chat
msg) -> handler loads a completed investigation's stored hypotheses+ruledOut,
rejoins predictions (matchRuledOutToPredictions), runs the deeper read-only
re-query via deepModeReexamine (reuses investigation providers+model +
hypothesis-requery), persists deepMode onto the report, streams progress.

- rca-types: DeepModeReexamination/DeepModeReport + report.deepMode field
- agents.ts: deepModeReexamine closure on createMastraAdapters
- ws-types: deep_mode_investigate client msg + deep_mode:* server msgs
- sanitize: DeepModeInvestigateMessageSchema; demo-mode blocks it
- ws-handler: validation + dispatch + handleDeepModeInvestigate (read-only)
- 3 new matchRuledOutToPredictions tests; tsc + deep-mode/sanitize/ws-handler green

Next: deep-from-start flag (workflow tail) + React UI.

* feat(web): deep-mode UI — Deep investigate button + report section (Step 3)

- 'Deep investigate' action on a completed investigation (shows only when
  the loop left ruled-out causes); sends deep_mode_investigate, reflects
  running/error state.
- InvestigationPane handles deep_mode:started/complete/error → updates the
  report in place.
- RcaReport renders a 'Deep Mode' section: resurrected-candidate (warning,
  a dismissed cause came back) vs rule-outs-confirmed (reassurance), with the
  per-hypothesis prior→deep verdict.
- 3 render tests; tsc + clean web build green.

Completes the on-demand trigger path end-to-end. Remaining: deep-from-start flag.

* feat(web): surface hypothesis-loop status (N>1) in the GUI

Two operator-visible signals for whether the synthesis loop is active:

1. Settings → LLM: read-only 'Hypothesis Loop' indicator — 'ON · N rounds'
   (teal) vs 'OFF · single-pass'. Surfaced via the existing
   /api/stacks/:id/llm/settings view (synthesisLoopRounds added). Deployment-
   level proof without firing an investigation.
2. Investigation metadata panel: a 'loop' MetaRow (outcome + ranked count) +
   a 'deep mode' MetaRow, shown only when those ran — per-investigation proof
   that synthesis wasn't single-pass.

DESIGN.md-consistent (mono labels, restrained teal, rounded-sm). tsc +
llm-settings/SettingsPage tests + clean web build green.

* chore(release): bump VERSION 0.4.4.3 → 0.4.4.4 (deep-mode preview)

Preview build off feat/deep-mode so deep mode + the N>1 GUI indicators can
be deployed/verified before #231 merges. Distinct tag from the released
0.4.4.3 (clean-main build). Not tagged :latest.

* feat(server): stream deep-mode progress as Console thinking

Deep mode jumped straight from button to result — the deep_mode:tool_call
events were emitted but unrendered. Now route step-by-step progress through
the chat reasoning channel (chat:stream_start → chat:stream_delta reasoning →
chat:stream_end) the deep_investigate follow-up already uses, so it shows as
a live 'thinking' block in the Console + a plain-language summary message.

- deepModeReexamine: onProgress callback — announces each hypothesis as the
  loop reaches it ('↪ testing: X'), each re-query tool call, and the per-
  hypothesis verdict (resurrected vs still-ruled-out).
- handleDeepModeInvestigate: streams onProgress/onToolCall as reasoning deltas;
  ends with an outcome summary. deep_mode:started/complete/error still drive
  the button + report update.
- ws-types: dropped the now-unused deep_mode:tool_call.

Reuses ChatPane's existing reasoning display — no new client rendering.
tsc + ws-handler tests + clean web build green.

* fix(web): show Deep investigate whenever the loop ran (not only on rule-outs)

Gap: the button gated on ruledOut.length>0, so a cleanly-confirmed
investigation (0 rule-outs) — or any report where rule-outs weren't
persisted — showed no button at all. Now gate on loopOutcome present (the
Step 2 loop ran → deep mode is applicable). The handler treats 'loop ran but
ruled nothing out' as a calm Console message, not a red error; genuine
single-pass (no hypotheses) stays an error.

* feat(workflows): live 'Testing hypotheses' progress in the Step 2 loop

Gap: the Approach-A draft showed a live Testing-hypotheses feed (rank → test
leader → rule-out) but the loop ran silently inside Synthesis — only the final
report showed rule-outs. Now runHypothesisLoop emits onRound progress events
(ranking / testing / verdict); synthesis maps them to onIteration under the
Synthesis phase, so the PhaseStepper streams them live while investigating —
e.g. 'Testing H1/3: …', 'Ruled out H1: … (absent)', 'Confirmed H2: …'.

Reuses the existing iteration-event rendering — no UI change. onRound is
optional → loop stays pure; existing tests unaffected. +1 progress test.
tsc + loop/deep-mode/investigation tests green.
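
A rough sketch of how round events could be mapped to the progress strings quoted above. The event shape (`RoundEventSketch`) and the 'Ranked N hypotheses' wording are assumptions; only the three event kinds and the 'Testing' / 'Ruled out' / 'Confirmed' message formats come from the commit text.

```typescript
// Hypothetical event union for the three kinds named above.
type RoundEventSketch =
  | { kind: "ranking"; count: number }
  | { kind: "testing"; index: number; total: number; title: string }
  | { kind: "verdict"; index: number; title: string; confirmed: boolean; reason?: string };

// Pure formatter: turns one event into the line a stepper could display.
function formatRoundEvent(e: RoundEventSketch): string {
  switch (e.kind) {
    case "ranking":
      return `Ranked ${e.count} hypotheses`;
    case "testing":
      return `Testing H${e.index}/${e.total}: ${e.title}`;
    case "verdict": {
      if (e.confirmed) return `Confirmed H${e.index}: ${e.title}`;
      const suffix = e.reason ? ` (${e.reason})` : "";
      return `Ruled out H${e.index}: ${e.title}${suffix}`;
    }
  }
}
```

Keeping the formatter separate from the loop is one way to leave the loop itself free of presentation concerns, in line with the optional-callback design.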

* feat(server): deep mode digs into a wider window than the loop

Closes the 'deeper' gap: deep mode re-queried the same incident window the
loop used, so it was 're-examine', not 'escalate'. Now widenTimeRange expands
the window each side by max(duration, 30min) (3x and centered for windows of
30min or longer, proportionally wider for shorter ones) for the deep
re-query, surfacing precursors/aftermath the narrow synthesis window missed.

- widenTimeRange (pure, defensive: only widens parseable ISO/epoch; passes
  Grafana relative ranges + undefined through unchanged). 4 tests.
- deepModeReexamine queries the widened window but keeps the ORIGINAL incident
  onset as the change-in-window anchor.

Note: cross-service following (the other 'deeper' dimension) still pending —
needs dependency topology. tsc + deep-mode tests green.
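
The widening rule can be sketched as follows. This is an illustration under stated assumptions, not the shipped `widenTimeRange`: the `TimeRangeSketch` shape, treating all-digit strings as epoch milliseconds, and always returning ISO strings are guesses; the max(duration, 30min) padding and the pass-through of relative or missing ranges come from the description above.

```typescript
interface TimeRangeSketch {
  from: string;
  to: string;
}

const MIN_PAD_MS = 30 * 60 * 1000;

// Accepts an epoch-milliseconds string or an ISO-8601 timestamp.
// Anything else (e.g. Grafana relative values such as "now-1h") is unparseable.
function parseInstant(value: string): number | undefined {
  if (/^\d+$/.test(value)) return Number(value);
  if (!/^\d{4}-\d{2}-\d{2}T/.test(value)) return undefined;
  const ms = Date.parse(value);
  return Number.isNaN(ms) ? undefined : ms;
}

// Pads the window on each side by max(duration, 30min); leaves the input
// untouched when it is undefined, relative, or otherwise unparseable.
function widenTimeRangeSketch(range: TimeRangeSketch | undefined): TimeRangeSketch | undefined {
  if (range === undefined) return undefined;
  const from = parseInstant(range.from);
  const to = parseInstant(range.to);
  if (from === undefined || to === undefined || to < from) return range;
  const pad = Math.max(to - from, MIN_PAD_MS);
  return {
    from: new Date(from - pad).toISOString(),
    to: new Date(to + pad).toISOString(),
  };
}
```

For a one-hour incident window the result spans three hours; for a ten-minute window the 30-minute floor applies and the result spans seventy minutes.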

* feat: deep-from-start — auto-chain deep mode after interactive investigations

The 2nd of the two trigger paths. New config flag agent.deepModeOnComplete:
when on, an interactive (chat-dispatched) investigation that ran the loop and
ruled causes out automatically chains the deep re-examination on completion —
no second click; the result streams + lands in one pass.

- config: agent.deepModeOnComplete (Zod, default off)
- ws-handler: extracted runDeepModeStreamed() (shared by the on-demand trigger
  + the new chain); chat dispatch captures the report and chains when enabled
- Settings → LLM: read-only 'Deep Mode — auto' indicator (ON / OFF · on-demand)
  alongside the Hypothesis Loop one, via the llm/settings view

Scope: wired for the interactive path (where streaming matters); headless
(webhook/poller) chaining is a follow-up. tsc + llm-settings/ws-handler/config
tests + clean web build green.

* fix(web): only offer Deep investigate when the loop ruled causes out

The gap-#1 change over-corrected: gating on loopOutcome made the button appear
on clean-confirm / no-rule-out investigations, where clicking dead-ends in
'the loop ruled nothing out … nothing to re-examine'. Deep mode re-examines
RULED-OUT causes, so the honest gate is ruledOut.length > 0 — investigations
with nothing to resurrect simply don't offer it (no dead-end button).

(The earlier 'no button at all' report came from viewing pre-loop
investigations, which correctly have no rule-outs either.)

* feat: broaden deep mode — resurrect rule-outs OR refute the confirmed cause

Deep mode was only useful when the loop ruled causes out, so the button kept
being absent (pre-loop / clean confirms) or dead-ending. Now it re-examines the
loop's conclusion by STANDING and always does something useful:
- ruled-out causes -> try to RESURRECT (deeper evidence now satisfies?)
- the confirmed cause (no rule-outs) -> try to REFUTE (deeper evidence drops support?)

- deep-mode.ts: runDeepMode takes ReexamineTarget[] (priorStanding); flip logic
  per standing; outcomes resurrected-candidate | confirmation-shaken | holds |
  nothing-to-examine. buildReexamineTargets() picks resurrect vs refute mode.
- rca-types/agents/ws-handler/RcaReport/InvestigationPane threaded through;
  button gates on loopOutcome again (always useful now); summaries + metadata
  row + report section handle shaken/holds; deep-from-start chains on any loop run.
- tests rewritten for the standing API + refute mode + buildReexamineTargets.
  tsc + deep-mode/RcaReport/ws-handler/investigation tests + clean build green.
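
The resurrect-versus-refute split can be sketched like this. The outcome names are the ones listed above; everything else (type names, the synchronous `deepEvidenceSupports` callback standing in for the injected evidence gatherer, and returning on the first resurrected hypothesis) is an assumption made for illustration.

```typescript
type StandingSketch = "ruled-out" | "confirmed";
type DeepOutcomeSketch =
  | "resurrected-candidate"
  | "confirmation-shaken"
  | "holds"
  | "nothing-to-examine";

interface ReexamineTargetSketch {
  hypothesis: string;
  priorStanding: StandingSketch;
}

// Resurrect mode when the loop ruled causes out; otherwise refute mode
// against the confirmed cause; otherwise nothing to re-examine.
function buildTargetsSketch(ruledOut: string[], confirmed: string | undefined): ReexamineTargetSketch[] {
  if (ruledOut.length > 0) {
    return ruledOut.map((h): ReexamineTargetSketch => ({ hypothesis: h, priorStanding: "ruled-out" }));
  }
  return confirmed === undefined ? [] : [{ hypothesis: confirmed, priorStanding: "confirmed" }];
}

// A standing "flips" when deeper evidence contradicts the loop's verdict.
function judgeSketch(
  targets: ReexamineTargetSketch[],
  deepEvidenceSupports: (hypothesis: string) => boolean,
): DeepOutcomeSketch {
  if (targets.length === 0) return "nothing-to-examine";
  for (const t of targets) {
    const supported = deepEvidenceSupports(t.hypothesis);
    if (t.priorStanding === "ruled-out" && supported) return "resurrected-candidate";
    if (t.priorStanding === "confirmed" && !supported) return "confirmation-shaken";
  }
  return "holds";
}
```

Injecting the evidence check as a parameter mirrors the testability argument made for `gatherDeepEvidence`: the control flow can be exercised without an LLM or MCP server.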

* feat(web): dedicated structured deep-mode stream (colored, grouped, expanded)

The deep-mode progress was piped through the chat 'thinking' block — collapsed
by default, plain mono, no grouping. Replaced with a dedicated agent stream
matching the design:
- structured AgentStreamEvent (verb/target/status/indent) replaces plain text;
  deepModeReexamine emits onStep events, ws-handler streams deep_mode:step + a
  final stats footer (examined/tools/resurrected/shaken/elapsed).
- new DeepModeStream component: status icons (◉/✓/✗), coral verbs, info-blue
  query targets, indented sub-steps with a left rail, always expanded.
- InvestigationPane accumulates steps + renders it above the report.
- dropped the chat:stream routing for deep mode (no more collapsed plain block).

tsc + ws-handler/deep-mode tests + clean web build green.
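
As a loose illustration of the structured event, the sketch below renders one step as a text line. The four fields (verb, target, status, indent) and the three icons are from the commit text; the status names, the icon-to-status mapping, and the two-space indent are assumptions, and the real component renders styled React elements rather than strings.

```typescript
// Hypothetical status names; the commit only lists the icons.
type StepStatusSketch = "running" | "ok" | "failed";

interface AgentStreamEventSketch {
  verb: string;
  target: string;
  status: StepStatusSketch;
  indent: number;
}

const STATUS_ICON: Record<StepStatusSketch, string> = {
  running: "◉",
  ok: "✓",
  failed: "✗",
};

// Renders a step as "<indent><icon> <verb> <target>".
function renderStepSketch(e: AgentStreamEventSketch): string {
  const pad = "  ".repeat(Math.max(0, e.indent));
  return `${pad}${STATUS_ICON[e.status]} ${e.verb} ${e.target}`;
}
```

Carrying verb, target, and status as separate fields is what lets a client colour and group them independently, which plain progress text cannot support.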

* feat: gate deep mode behind deepModeEnabled (default off) + plain-English copy

Deep mode (Step 3) is hidden from users until the autonomous orchestrator
lands. Today's bounded re-examination only re-judges the existing RCA's
hypotheses; it can't investigate freely for the real cause. Until it can,
expose it only behind config.agent.deepModeEnabled (default false):

- server injects window.__DEEP_MODE_ENABLED__ only when enabled; the
  'Deep investigate' button is gated on it (hidden by default)
- deep_mode_investigate WS handler + deep-from-start chain reject when off
- reworded the deep-mode stream + report copy to plain English
  (lead-with-takeaway: 'Probably not the cause: … — the evidence that
  would confirm it isn't there')
- schema test locks the ships-OFF default
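
The gating described above might reduce to two small predicates. The flag names `deepModeEnabled` and `deepModeOnComplete` appear in this PR; the interface and function names are hypothetical, and the real handler surely checks more than this.

```typescript
// Subset of the agent config relevant to deep mode; both flags default off.
interface AgentFlagsSketch {
  deepModeEnabled?: boolean;
  deepModeOnComplete?: boolean;
}

// On-demand trigger: rejected unless the master flag is explicitly on.
function canRunDeepMode(agent: AgentFlagsSketch): boolean {
  return agent.deepModeEnabled === true;
}

// Deep-from-start chain: needs the master flag, the auto-chain flag,
// and an investigation in which the hypothesis loop actually ran.
function shouldChainDeepMode(agent: AgentFlagsSketch, loopRan: boolean): boolean {
  return canRunDeepMode(agent) && agent.deepModeOnComplete === true && loopRan;
}
```

Comparing against `true` rather than relying on truthiness keeps a missing or malformed flag on the ships-OFF side.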