feat: deterministic intent routing & robust service matching #6
…iciency
- Add LLM-based time range extraction (handles any date format including unicode dashes, natural language like "March 3" or "this Thursday")
- Add timeRangeFrom/timeRangeTo to AnomalyAssessment schema for LLM extraction
- Pre-populate real dashboard URLs from GRAFANA_URL instead of LLM hallucination
- Pass raw metric tool data to synthesis for data-grounded analysis
- Strengthen synthesis evidence requirements (3-5 items per category with examples)
- Add verified-counts-only instruction to log correlation prompt
- Increase log sampleLines from 3→5 and add lastSeen to condensed logs
- Reduce evidence iterations from 7→6 (4 productive + 2 wind-down)
- Refactor: simplify extractTimeRange to ISO-only fallback (LLM handles rest)
- Refactor: replace 150-word STOP_WORDS + extractQueryKeywords with simple tokenizer
- Refactor: simplify suggestStepSeconds to formula-based (durationSec/100)
- Refactor: simplify toRfc3339Window, remove unused Grafana relative parsing
- Add toolData to PhaseResult for raw evidence pass-through
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…y to README Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add keyword and symptom-based fast-paths to bypass LLM for intent classification (9/10 accuracy, 8/10 deterministic). Add alias resolution and tiebreaker logic to matchServiceFromText for reliable service matching (kafka→brokers, clickhouse→ch-clickhouse). Bump evidence/synthesis timeouts and maxOutputTokens for slow models. Re-enable list_prometheus_metric_metadata. 231 tests passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
This PR improves reliability and determinism of the CLI investigation flow by introducing deterministic intent routing/service matching and hardening the RCA investigation pipeline (timeouts, planning/reflection phases, prefetching evidence context, and better truncation/JSON recovery), plus UI/UX upgrades for the CLI input and report formatting.
Changes:
- Add deterministic intent routing fast-paths and strengthen service matching (aliases + token scoring).
- Harden the investigation pipeline with planning/reflection phases, longer timeouts, evidence prefetching, and improved tool-output compaction/JSON recovery.
- Improve CLI UX (custom text input editing, routing feedback, richer RCA formatting) and MCP/LLM robustness (tool-arg coercion, hallucinated tool-call handling).
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/mcp/client.ts | Adds schema-based tool-argument coercion before MCP tool calls. |
| src/mcp/client.test.ts | Adds unit tests for argument coercion and adjusts existing assertion. |
| src/llm/openai.ts | Adds per-call timeout override and changes handling for truncated outputs / hallucinated function calls. |
| src/llm/openai.test.ts | Updates tests to reflect tool-call parsing behavior when tools are provided. |
| src/interfaces/cli/CliTextInput.tsx | Introduces a custom Ink input component with readline-like editing. |
| src/interfaces/cli/App.tsx | Switches to IntentRouter, improves RCA rendering, adds history/abort UX, and routes via service matching from text. |
| src/interfaces/cli/cli-utils.test.ts | Updates RCA text formatting expectations for new sections/headers. |
| src/index.ts | Renames IntentClassifier usage to IntentRouter. |
| src/cli.tsx | Preserves user LOG_LEVEL and converts static imports to dynamic imports to avoid logger init ordering issues. |
| src/agent/types.ts | Extends AnomalyAssessment with extracted time range fields. |
| src/agent/rca-types.ts | Introduces structured findings types and expands RcaReport fields (impact/trigger/timeline/etc.). |
| src/agent/rca-prompts.ts | Refactors prompts into builders, adds planning/reflection schemas, and strengthens synthesis + intent prompts. |
| src/agent/rca-prompts.test.ts | Updates tests for prompt builders and new JSON schemas. |
| src/agent/prompts.ts | Adds explicit time-window extraction requirement to anomaly assessment prompt/schema. |
| src/agent/investigation.ts | Major pipeline hardening: planning + reflection phases, prefetching, timeline build, truncation improvements, and retries. |
| src/agent/investigation.test.ts | Updates and expands tests to cover planning/reflection/timeline/severity validation/time-range parsing. |
| src/agent/intent.ts | Replaces IntentClassifier with IntentRouter and adds deterministic fast-paths + alias-aware matching. |
| src/agent/intent.test.ts | Updates tests for new routing behavior and expanded service matching logic. |
| src/agent/core.ts | Enhances create_temp_panel tool to respect time range and optionally datasource UID. |
| package-lock.json | Adds a bin entry for a CLI executable. |
| docs/plans/2026-03-08-agent-robustness.md | Adds an implementation plan doc for robustness improvements. |
| README.md | Expands documentation with pipeline details, examples, and CLI shortcuts. |
| CLAUDE.md | Adds contributor guidance (security + repo conventions). |
Comments suppressed due to low confidence (1)
src/llm/openai.ts:336
- When `tools.length === 0`, this code intentionally ignores any `function_call` items as hallucinations, but it can then fall through and return `{ type: "text", content: "" }` if the response contained only function calls (no `message` output). That leads to silent empty output rather than an error, which can break callers that expect real text/JSON. Consider treating this case as "no output" (throw) or at least warning+throw when `textContent` is empty after filtering hallucinated function calls (unless `status === "incomplete"` and you have a different recovery path).
  if (functionCalls.length > 0) {
    // When no tools were provided, any function_call items are hallucinations
    // (e.g. "<|constrain|>json"). Ignore them and fall through to text handling.
    if (tools.length > 0) {
      return {
        type: "tool_calls",
        usage,
        calls: functionCalls.map((fc) => {
          let args: Record<string, unknown>;
          try {
            args = JSON.parse(fc.arguments) as Record<string, unknown>;
          } catch {
            throw new Error(
              `Failed to parse tool arguments for "${fc.name}": ${fc.arguments}`,
            );
          }
          return { id: fc.id, name: fc.name, args };
        }),
      };
    } else {
      logger.warn(
        { hallucinated: functionCalls.map((fc) => fc.name) },
        "Ignoring hallucinated function calls (no tools were provided)",
      );
    }
  }
  if (response.output.length === 0 && response.status !== "incomplete") {
    throw new Error(
      "LLM returned no output (possible content filter or API error)",
    );
  }
  return { type: "text", content: textContent, usage };
}
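One way to act on this suppressed comment is to fail loudly when filtering hallucinated function calls leaves no text. The sketch below is a standalone illustration, not the code in src/llm/openai.ts: `finalizeTextOutput`, its parameters, and the error wording are hypothetical, and the status is typed as a plain string for simplicity.

```typescript
// Hypothetical helper: decides what to return once hallucinated
// function calls have been filtered out of a response.
interface TextResult {
  type: "text";
  content: string;
}

function finalizeTextOutput(
  textContent: string,
  ignoredFunctionCalls: number,
  status: string,
): TextResult {
  // An "incomplete" response is assumed to have its own truncation-recovery
  // path, so empty text is only treated as fatal for other statuses.
  if (textContent.trim() === "" && status !== "incomplete") {
    const detail =
      ignoredFunctionCalls > 0
        ? `only ${ignoredFunctionCalls} hallucinated function call(s) were returned`
        : "possible content filter or API error";
    throw new Error(`LLM returned no usable text output (${detail})`);
  }
  return { type: "text", content: textContent };
}
```

Throwing here keeps callers that parse JSON from receiving an empty string and failing later with a less specific error.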
/**
 * Extract a Grafana-compatible time range from the anomaly description.
 * Looks for date/time references; defaults to last 24h if none found.
 * Uses Grafana's day-rounding syntax (/d) to produce precise day boundaries.
 */
private extractTimeRange(anomalySummary: string, userMessage?: string): { from: string; to: string } {
  const text = `${anomalySummary} ${userMessage ?? ""}`;

  // Try to find an ISO-ish date or "March 2nd" style references
extractTimeRange(anomalySummary: string, userMessage?: string): { from: string; to: string } {
  // Lightweight fallback for when LLM extraction fails or is skipped.
  // Only handles ISO dates — all other formats are handled by extractTimeRangeViaLlm.
  const text = `${anomalySummary} ${userMessage ?? ""}`.replace(/[\u2010-\u2015\u2212]/g, "-");
  const dateMatch = text.match(/(\d{4}-\d{2}-\d{2})/);
  if (dateMatch) {
    // Found a date like 2026-03-02 — show that full day
    return { from: `${dateMatch[1]}T00:00:00`, to: `${dateMatch[1]}T23:59:59` };
    return { from: `${dateMatch[1]}T00:00:00Z`, to: `${dateMatch[1]}T23:59:59Z` };
  }
  return { from: "now-6h", to: "now" };
}
The docstring says this uses Grafana’s day-rounding syntax (/d) to produce precise day boundaries, but the implementation only returns ISO day boundaries (...T00:00:00Z → ...T23:59:59Z) or now-6h/now. Either update the implementation to actually use /d (if that’s desired/required elsewhere), or adjust the comment so it reflects the current behavior.
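If the current behavior is the intended one, the lighter fix is to correct the comment. A minimal sketch, assuming the ISO day boundaries and the 6-hour default are what the author wants; the body follows the updated lines of the hunk, written as a free function so it can run on its own:

```typescript
/**
 * Lightweight fallback time-range extraction.
 * Finds the first ISO date (YYYY-MM-DD) in the text and returns that full
 * UTC day; otherwise defaults to the last 6 hours in Grafana relative syntax.
 */
function extractTimeRange(
  anomalySummary: string,
  userMessage?: string,
): { from: string; to: string } {
  // Normalize Unicode hyphens and dashes (U+2010..U+2015, U+2212) to ASCII "-"
  // so dates pasted from rich text still match the ISO pattern.
  const text = `${anomalySummary} ${userMessage ?? ""}`.replace(/[\u2010-\u2015\u2212]/g, "-");
  const dateMatch = text.match(/(\d{4}-\d{2}-\d{2})/);
  if (dateMatch) {
    return { from: `${dateMatch[1]}T00:00:00Z`, to: `${dateMatch[1]}T23:59:59Z` };
  }
  return { from: "now-6h", to: "now" };
}
```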
private toRfc3339Window(window: { from: string; to: string }): { startRfc3339: string; endRfc3339: string } {
  const resolve = (expr: string): string => {
    if (/^\d{4}-\d{2}-\d{2}/.test(expr)) return new Date(expr).toISOString();
    const m = expr.match(/^now(?:-(\d+)([dhm]))?/);
    if (m) {
      const d = new Date();
      if (m[1] && m[2]) {
        const n = parseInt(m[1], 10);
        if (m[2] === "d") d.setDate(d.getDate() - n);
        else if (m[2] === "h") d.setHours(d.getHours() - n);
        else d.setMinutes(d.getMinutes() - n);
      }
      return d.toISOString();
    }
    return new Date(Date.now() - 7 * 86400000).toISOString();
toRfc3339Window() parses relative expressions with ^now(?:-(\d+)([dhm]))?, which has no end anchor and no w unit. For common Grafana forms like now-1d/d (day-rounded), which you already mention supporting elsewhere, it matches only the now-1d prefix and silently drops the rounding; for now-1w it matches only now and resolves to the current time. Any expression that starts with neither an ISO date nor now falls back to a 7-day window, which can make Loki probe queries unexpectedly expensive and broaden the selector search. Consider accepting an optional /d suffix (and possibly w as well, since suggestStepSeconds() supports it).
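A possible shape for that change is sketched below. `resolveRelative` and `UNIT_MS` are hypothetical names, and the sketch makes three simplifying assumptions: the pattern is anchored with `$` so unsupported forms are rejected instead of partially matched, units are fixed millisecond spans rather than local calendar arithmetic, and `/d` always rounds down to the start of the UTC day (Grafana itself rounds relative to the dashboard time zone and treats the `to` bound differently).

```typescript
// Fixed-length units in milliseconds (a simplification: no DST handling).
const UNIT_MS: Record<string, number> = {
  m: 60000,
  h: 3600000,
  d: 86400000,
  w: 7 * 86400000,
};

// Hypothetical standalone resolver; `now` is injectable so results are reproducible.
function resolveRelative(expr: string, now: Date = new Date()): string | undefined {
  // Accepts "now", "now-6h", "now-2w", "now/d", and "now-1d/d".
  const m = expr.match(/^now(?:-(\d+)([mhdw]))?(\/d)?$/);
  if (!m) return undefined; // let the caller pick a narrow default
  let ms = now.getTime();
  if (m[1] && m[2]) ms -= parseInt(m[1], 10) * (UNIT_MS[m[2]] ?? 0);
  if (m[3]) {
    // Round down to midnight UTC of the resulting day.
    ms = Math.floor(ms / 86400000) * 86400000;
  }
  return new Date(ms).toISOString();
}
```

Returning `undefined` for unrecognized input leaves the choice of fallback window to the caller, which avoids widening a Loki query to seven days by accident.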
const abortRef = useRef<AbortController | null>(null);
const lastEscRef = useRef(0);
const threadId = "cli-session";

useInput((_input, key) => {
useInput((input, key) => {
  if (key.escape && isThinking) {
    const now = Date.now();
    if (now - lastEscRef.current < 500) {
      abortRef.current?.abort();
      lastEscRef.current = 0;
The Esc-Esc "abort" flow only aborts a local AbortController, but that signal is never passed into router.route, investigationAgent.investigate, agent.chat, or any underlying LLM/MCP calls. As a result, the user sees "Query aborted" but the expensive background work still continues to completion. Either plumb AbortSignal through the agent/LLM/MCP layers (and actually cancel in-flight requests), or remove/rename this behavior so it doesn’t imply cancellation.
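Plumbing the signal through could look roughly like the sketch below. It is an assumption-laden illustration, not the project's API: `Phase`, `ensureNotAborted`, and `runPhases` are hypothetical stand-ins for the router, investigation, and chat layers. The idea is that every layer receives the same `AbortSignal`, checks it before starting new work, and forwards it to whatever performs the request (for example the `signal` option of `fetch`).

```typescript
// Hypothetical phase signature: each phase receives the shared signal so it
// can forward it to its own LLM/MCP requests.
type Phase = (signal: AbortSignal) => Promise<string>;

// Throws as soon as the user has requested cancellation.
function ensureNotAborted(signal: AbortSignal): void {
  if (signal.aborted) throw new Error("Query aborted");
}

// Runs phases in order and stops before starting another one after an abort.
async function runPhases(phases: Phase[], signal: AbortSignal): Promise<string[]> {
  const results: string[] = [];
  for (const phase of phases) {
    ensureNotAborted(signal);
    results.push(await phase(signal));
  }
  return results;
}
```

In the CLI, the controller stored in `abortRef` would be created per query and its `signal` passed into the call that starts the investigation, so the Esc-Esc handler cancels real work instead of only updating the UI.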
  dispatch({ type: "move-cursor-left" });
} else if (key.rightArrow) {
  dispatch({ type: "move-cursor-right" });
} else if (key.backspace || key.delete) {
key.delete (forward delete) is handled the same as backspace here, which deletes the character before the cursor. That makes the Delete key behave incorrectly. Consider introducing a separate action for forward-delete that removes the character at cursorOffset (if any) while keeping cursorOffset unchanged.
} else if (key.backspace || key.delete) {
} else if (key.delete) {
  // Forward delete: remove character at cursor without moving cursor back
  if (state.cursorOffset < state.value.length) {
    dispatch({ type: "move-cursor-right" });
    dispatch({ type: "delete" });
  }
} else if (key.backspace) {
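The separate forward-delete action mentioned in the comment could be modeled as below. This is a sketch of a reducer with the same `value`/`cursorOffset` state, assuming the existing `delete` action behaves like backspace as the review describes; the `delete-forward` action name is hypothetical, and offsets are plain string indices, so multi-code-unit characters are not handled.

```typescript
interface InputState {
  value: string;
  cursorOffset: number;
}

type InputAction =
  | { type: "move-cursor-left" }
  | { type: "move-cursor-right" }
  | { type: "delete" } // backspace: removes the character before the cursor
  | { type: "delete-forward" }; // hypothetical: removes the character at the cursor

function reducer(state: InputState, action: InputAction): InputState {
  const { value, cursorOffset } = state;
  switch (action.type) {
    case "move-cursor-left":
      return { value, cursorOffset: Math.max(0, cursorOffset - 1) };
    case "move-cursor-right":
      return { value, cursorOffset: Math.min(value.length, cursorOffset + 1) };
    case "delete":
      if (cursorOffset === 0) return state;
      return {
        value: value.slice(0, cursorOffset - 1) + value.slice(cursorOffset),
        cursorOffset: cursorOffset - 1,
      };
    case "delete-forward":
      // Nothing to remove when the cursor sits at the end of the input.
      if (cursorOffset >= value.length) return state;
      return {
        value: value.slice(0, cursorOffset) + value.slice(cursorOffset + 1),
        cursorOffset,
      };
  }
}
```

A single action avoids the intermediate state produced by dispatching `move-cursor-right` followed by `delete`, and the key handler reduces to `dispatch({ type: "delete-forward" })` for `key.delete`.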
feat: deterministic intent routing & robust service matching
Summary
- Keyword fast-path (`investigate`, `diagnose`, `rca`, `troubleshoot`, `root cause`, `postmortem`) and symptom+service fast-path (`drop`/`slow`/`errors`/`check` + recognized service) to bypass the LLM for intent classification. Result: 9/10 accuracy, 8/10 fully deterministic (0ms, no LLM call).
- `matchService` and `matchServiceFromText` now resolve common LLM shorthand: `kafka`→`kafka-brokers`, `clickhouse`→`ch-clickhouse`, `postgres`→`stolon-proxy`, `redis`→`cache-redis-ha`.
- `list_prometheus_metric_metadata`: Was excluded, causing 3-4 hallucinated calls per run and wasting iterations.
Test plan
- `npx vitest run`
- `npx tsc --noEmit`
- `npm run cli` → "investigate ingestion rate drop on 2026-03-03"

🤖 Generated with Claude Code
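The fast-paths and alias resolution described in the summary can be sketched as follows. Only the keyword list, the symptom words, and the four alias pairs come from the description; the function names, the `investigate` intent label, the tokenizer, and the return shape are assumptions, and the real `IntentRouter` also applies token scoring and tiebreakers that are not reproduced here.

```typescript
// Alias pairs quoted in the summary; everything else is illustrative.
const SERVICE_ALIASES = new Map<string, string>([
  ["kafka", "kafka-brokers"],
  ["clickhouse", "ch-clickhouse"],
  ["postgres", "stolon-proxy"],
  ["redis", "cache-redis-ha"],
]);

const INVESTIGATE_KEYWORDS = ["investigate", "diagnose", "rca", "troubleshoot", "root cause", "postmortem"];
const SYMPTOM_WORDS = ["drop", "slow", "errors", "check"];

// Returns the first token that is (or aliases to) a known service name.
function resolveService(text: string, knownServices: string[]): string | undefined {
  const tokens = text.toLowerCase().split(/[^a-z0-9-]+/).filter(Boolean);
  for (const token of tokens) {
    const canonical = SERVICE_ALIASES.get(token) ?? token;
    if (knownServices.includes(canonical)) return canonical;
  }
  return undefined;
}

type Route = { intent: "investigate"; service: string | undefined } | null;

// Deterministic routing: returns null when no fast-path applies, in which
// case the caller would fall back to LLM classification.
function fastPathRoute(text: string, knownServices: string[]): Route {
  const lower = text.toLowerCase();
  const service = resolveService(lower, knownServices);
  const hasWord = (w: string): boolean => new RegExp(`\\b${w}\\b`).test(lower);
  if (INVESTIGATE_KEYWORDS.some(hasWord)) return { intent: "investigate", service };
  if (service !== undefined && SYMPTOM_WORDS.some(hasWord)) return { intent: "investigate", service };
  return null;
}
```

For example, `fastPathRoute("kafka is slow", ["kafka-brokers"])` routes without an LLM call because the alias resolves to a known service and a symptom word is present, while a symptom word with no recognized service falls through to the classifier.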