feat: deterministic intent routing & robust service matching by WZ · Pull Request #6 · WZ/dops-assistant

WZ · 2026-03-08T09:09:48Z

Summary

Deterministic intent routing: Add keyword fast-path (investigate, diagnose, rca, troubleshoot, root cause, postmortem) and symptom+service fast-path (drop/slow/errors/check + recognized service) to bypass the LLM for intent classification. Result: 9/10 accuracy, 8/10 fully deterministic (0ms, no LLM call).
Alias resolution in service matching: matchService and matchServiceFromText now resolve common LLM shorthand — kafka → kafka-brokers, clickhouse → ch-clickhouse, postgres → stolon-proxy, redis → cache-redis-ha.
Tiebreaker logic: When multiple services score equally on token overlap, prefer the shorter (more specific) service name.
Strengthened intent classifier prompt: 8 few-shot examples, structured classification rules, "when in doubt → investigate" bias.
Investigation pipeline reliability: Evidence timeout 120s→180s, synthesis/reflection timeout 120s→240s, planning maxOutputTokens 128→2048, reflection maxOutputTokens 4096→8192.
Re-enabled list_prometheus_metric_metadata: Was excluded, causing 3-4 hallucinated calls per run wasting iterations.
E2E validated: 3 full investigations against live Grafana MCP — ingestion spike, data-server slow queries, ClickHouse connection errors — all produced actionable RCA reports with evidence, timelines, and dashboard links.

Test plan

231 unit tests passing (npx vitest run)
TypeScript compiles clean (npx tsc --noEmit)
Phase 1: Intent classification 9/10 accuracy, service matching 9/10
Phase 2: Full investigation feat: CLI, discovery, RCA pipeline, panel images, and investigation hardening #1 (ingestion) — full quality, 0 issues
Phase 2: Full investigation feat: multi-MCP provider abstraction #8 (data-server CH errors) — full quality, 0 issues
Manual CLI test: npm run cli → "investigate ingestion rate drop on 2026-03-03"

🤖 Generated with Claude Code

…iciency - Add LLM-based time range extraction (handles any date format including unicode dashes, natural language like "March 3" or "this Thursday") - Add timeRangeFrom/timeRangeTo to AnomalyAssessment schema for LLM extraction - Pre-populate real dashboard URLs from GRAFANA_URL instead of LLM hallucination - Pass raw metric tool data to synthesis for data-grounded analysis - Strengthen synthesis evidence requirements (3-5 items per category with examples) - Add verified-counts-only instruction to log correlation prompt - Increase log sampleLines from 3→5 and add lastSeen to condensed logs - Reduce evidence iterations from 7→6 (4 productive + 2 wind-down) - Refactor: simplify extractTimeRange to ISO-only fallback (LLM handles rest) - Refactor: replace 150-word STOP_WORDS + extractQueryKeywords with simple tokenizer - Refactor: simplify suggestStepSeconds to formula-based (durationSec/100) - Refactor: simplify toRfc3339Window, remove unused Grafana relative parsing - Add toolData to PhaseResult for raw evidence pass-through Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…y to README Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add keyword and symptom-based fast-paths to bypass LLM for intent classification (9/10 accuracy, 8/10 deterministic). Add alias resolution and tiebreaker logic to matchServiceFromText for reliable service matching (kafka→brokers, clickhouse→ch-clickhouse). Bump evidence/synthesis timeouts and maxOutputTokens for slow models. Re-enable list_prometheus_metric_metadata. 231 tests passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

This PR improves reliability and determinism of the CLI investigation flow by introducing deterministic intent routing/service matching and hardening the RCA investigation pipeline (timeouts, planning/reflection phases, prefetching evidence context, and better truncation/JSON recovery), plus UI/UX upgrades for the CLI input and report formatting.

Changes:

Add deterministic intent routing fast-paths and strengthen service matching (aliases + token scoring).
Harden the investigation pipeline with planning/reflection phases, longer timeouts, evidence prefetching, and improved tool-output compaction/JSON recovery.
Improve CLI UX (custom text input editing, routing feedback, richer RCA formatting) and MCP/LLM robustness (tool-arg coercion, hallucinated tool-call handling).

Reviewed changes

Copilot reviewed 22 out of 23 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
src/mcp/client.ts	Adds schema-based tool-argument coercion before MCP tool calls.
src/mcp/client.test.ts	Adds unit tests for argument coercion and adjusts existing assertion.
src/llm/openai.ts	Adds per-call timeout override and changes handling for truncated outputs / hallucinated function calls.
src/llm/openai.test.ts	Updates tests to reflect tool-call parsing behavior when tools are provided.
src/interfaces/cli/CliTextInput.tsx	Introduces a custom Ink input component with readline-like editing.
src/interfaces/cli/App.tsx	Switches to IntentRouter, improves RCA rendering, adds history/abort UX, and routes via service matching from text.
src/interfaces/cli/cli-utils.test.ts	Updates RCA text formatting expectations for new sections/headers.
src/index.ts	Renames IntentClassifier usage to IntentRouter.
src/cli.tsx	Preserves user LOG_LEVEL and converts static imports to dynamic imports to avoid logger init ordering issues.
src/agent/types.ts	Extends AnomalyAssessment with extracted time range fields.
src/agent/rca-types.ts	Introduces structured findings types and expands RcaReport fields (impact/trigger/timeline/etc.).
src/agent/rca-prompts.ts	Refactors prompts into builders, adds planning/reflection schemas, and strengthens synthesis + intent prompts.
src/agent/rca-prompts.test.ts	Updates tests for prompt builders and new JSON schemas.
src/agent/prompts.ts	Adds explicit time-window extraction requirement to anomaly assessment prompt/schema.
src/agent/investigation.ts	Major pipeline hardening: planning + reflection phases, prefetching, timeline build, truncation improvements, and retries.
src/agent/investigation.test.ts	Updates and expands tests to cover planning/reflection/timeline/severity validation/time-range parsing.
src/agent/intent.ts	Replaces IntentClassifier with IntentRouter and adds deterministic fast-paths + alias-aware matching.
src/agent/intent.test.ts	Updates tests for new routing behavior and expanded service matching logic.
src/agent/core.ts	Enhances create_temp_panel tool to respect time range and optionally datasource UID.
package-lock.json	Adds a bin entry for a CLI executable.
docs/plans/2026-03-08-agent-robustness.md	Adds an implementation plan doc for robustness improvements.
README.md	Expands documentation with pipeline details, examples, and CLI shortcuts.
CLAUDE.md	Adds contributor guidance (security + repo conventions).

Comments suppressed due to low confidence (1)

src/llm/openai.ts:336

When tools.length === 0, this code intentionally ignores any function_call items as hallucinations, but it can then fall through and return { type: "text", content: "" } if the response contained only function calls (no message output). That leads to silent empty output rather than an error, which can break callers that expect real text/JSON. Consider treating this case as "no output" (throw) or at least warning+throw when textContent is empty after filtering hallucinated function calls (unless status === "incomplete" and you have a different recovery path).

    if (functionCalls.length > 0) {
      // When no tools were provided, any function_call items are hallucinations
      // (e.g. "<|constrain|>json"). Ignore them and fall through to text handling.
      if (tools.length > 0) {
        return {
          type: "tool_calls",
          usage,
          calls: functionCalls.map((fc) => {
            let args: Record<string, unknown>;
            try {
              args = JSON.parse(fc.arguments) as Record<string, unknown>;
            } catch {
              throw new Error(
                `Failed to parse tool arguments for "${fc.name}": ${fc.arguments}`,
              );
            }
            return { id: fc.id, name: fc.name, args };
          }),
        };
      } else {
        logger.warn(
          { hallucinated: functionCalls.map((fc) => fc.name) },
          "Ignoring hallucinated function calls (no tools were provided)",
        );
      }
    }

    if (response.output.length === 0 && response.status !== "incomplete") {
      throw new Error(
        "LLM returned no output (possible content filter or API error)",
      );
    }

    return { type: "text", content: textContent, usage };
  }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-03-08T09:14:16Z

  /**
   * Extract a Grafana-compatible time range from the anomaly description.
-   * Looks for date/time references; defaults to last 24h if none found.
+   * Uses Grafana's day-rounding syntax (/d) to produce precise day boundaries.
   */
-  private extractTimeRange(anomalySummary: string, userMessage?: string): { from: string; to: string } {
-    const text = `${anomalySummary} ${userMessage ?? ""}`;
-
-    // Try to find an ISO-ish date or "March 2nd" style references
+  extractTimeRange(anomalySummary: string, userMessage?: string): { from: string; to: string } {
+    // Lightweight fallback for when LLM extraction fails or is skipped.
+    // Only handles ISO dates — all other formats are handled by extractTimeRangeViaLlm.
+    const text = `${anomalySummary} ${userMessage ?? ""}`.replace(/[\u2010-\u2015\u2212]/g, "-");
    const dateMatch = text.match(/(\d{4}-\d{2}-\d{2})/);
    if (dateMatch) {
-      // Found a date like 2026-03-02 — show that full day
-      return { from: `${dateMatch[1]}T00:00:00`, to: `${dateMatch[1]}T23:59:59` };
+      return { from: `${dateMatch[1]}T00:00:00Z`, to: `${dateMatch[1]}T23:59:59Z` };
+    }
+    return { from: "now-6h", to: "now" };
+  }


The docstring says this uses Grafana’s day-rounding syntax (/d) to produce precise day boundaries, but the implementation only returns ISO day boundaries (...T00:00:00Z → ...T23:59:59Z) or now-6h/now. Either update the implementation to actually use /d (if that’s desired/required elsewhere), or adjust the comment so it reflects the current behavior.

Copilot · 2026-03-08T09:14:16Z

+  private toRfc3339Window(window: { from: string; to: string }): { startRfc3339: string; endRfc3339: string } {
+    const resolve = (expr: string): string => {
+      if (/^\d{4}-\d{2}-\d{2}/.test(expr)) return new Date(expr).toISOString();
+      const m = expr.match(/^now(?:-(\d+)([dhm]))?/);
+      if (m) {
+        const d = new Date();
+        if (m[1] && m[2]) {
+          const n = parseInt(m[1], 10);
+          if (m[2] === "d") d.setDate(d.getDate() - n);
+          else if (m[2] === "h") d.setHours(d.getHours() - n);
+          else d.setMinutes(d.getMinutes() - n);
+        }
+        return d.toISOString();
+      }
+      return new Date(Date.now() - 7 * 86400000).toISOString();


toRfc3339Window() parses relative expressions with ^now(?:-(\d+)([dhm]))?, but won’t match common Grafana forms like now-1d/d (day-rounded) that you already mention supporting elsewhere. In that case it falls back to a 7-day window, which can make Loki probe queries unexpectedly expensive and broaden the selector search. Consider accepting an optional /d suffix (and possibly w as well, since suggestStepSeconds() supports it).

Copilot · 2026-03-08T09:14:16Z

+  const abortRef = useRef<AbortController | null>(null);
+  const lastEscRef = useRef(0);
  const threadId = "cli-session";

-  useInput((_input, key) => {
+  useInput((input, key) => {
+    if (key.escape && isThinking) {
+      const now = Date.now();
+      if (now - lastEscRef.current < 500) {
+        abortRef.current?.abort();
+        lastEscRef.current = 0;


The Esc-Esc "abort" flow only aborts a local AbortController, but that signal is never passed into router.route, investigationAgent.investigate, agent.chat, or any underlying LLM/MCP calls. As a result, the user sees "Query aborted" but the expensive background work still continues to completion. Either plumb AbortSignal through the agent/LLM/MCP layers (and actually cancel in-flight requests), or remove/rename this behavior so it doesn’t imply cancellation.

Copilot · 2026-03-08T09:14:17Z

+        dispatch({ type: "move-cursor-left" });
+      } else if (key.rightArrow) {
+        dispatch({ type: "move-cursor-right" });
+      } else if (key.backspace || key.delete) {


key.delete (forward delete) is handled the same as backspace here, which deletes the character before the cursor. That makes the Delete key behave incorrectly. Consider introducing a separate action for forward-delete that removes the character at cursorOffset (if any) while keeping cursorOffset unchanged.

Suggested change

} else if (key.backspace || key.delete) {

} else if (key.delete) {

// Forward delete: remove character at cursor without moving cursor back

if (state.cursorOffset < state.value.length) {

dispatch({ type: "move-cursor-right" });

dispatch({ type: "delete" });

}

} else if (key.backspace) {

feat: deterministic intent routing & robust service matching

Wilson Li and others added 4 commits March 7, 2026 15:36

docs: add detailed investigation pipeline diagram and LLM call summar…

eb2cb5a

…y to README Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

docs: explain discovery vs pre-fetch relationship in README

4ba125f

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI review requested due to automatic review settings March 8, 2026 09:09

Copilot started reviewing on behalf of WZ March 8, 2026 09:10 View session

WZ merged commit 54c0b27 into main Mar 8, 2026
3 checks passed

Copilot AI reviewed Mar 8, 2026

View reviewed changes

WZ added a commit that referenced this pull request Apr 2, 2026

Merge pull request #6 from WZ/fix/pr5-review-comments

11ba2b8

feat: deterministic intent routing & robust service matching

This was referenced Apr 17, 2026

fix(qa): resolve 5 P3 + 3 P4 findings from /qa sweep #94

Merged

fix(router): preserve :name param on service-card SPA nav (QA #13) #100

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: deterministic intent routing & robust service matching#6

feat: deterministic intent routing & robust service matching#6
WZ merged 4 commits into
mainfrom
fix/pr5-review-comments

WZ commented Mar 8, 2026 •

edited

Loading

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Mar 8, 2026

Uh oh!

Copilot AI Mar 8, 2026

Uh oh!

Copilot AI Mar 8, 2026

Uh oh!

Copilot AI Mar 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-      } else if (key.backspace || key.delete) {
+      } else if (key.delete) {
+        // Forward delete: remove character at cursor without moving cursor back
+        if (state.cursorOffset < state.value.length) {
+          dispatch({ type: "move-cursor-right" });
+          dispatch({ type: "delete" });
+        }
+      } else if (key.backspace) {

Conversation

WZ commented Mar 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 8, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

WZ commented Mar 8, 2026 •

edited

Loading