
feat: deterministic intent routing & robust service matching#6

Merged
WZ merged 4 commits into main from fix/pr5-review-comments on Mar 8, 2026

WZ commented on Mar 8, 2026

Summary

  • Deterministic intent routing: Add keyword fast-path (investigate, diagnose, rca, troubleshoot, root cause, postmortem) and symptom+service fast-path (drop/slow/errors/check + recognized service) to bypass the LLM for intent classification. Result: 9/10 accuracy, 8/10 fully deterministic (0ms, no LLM call).
  • Alias resolution in service matching: matchService and matchServiceFromText now resolve common LLM shorthand: kafka → kafka-brokers, clickhouse → ch-clickhouse, postgres → stolon-proxy, redis → cache-redis-ha.
  • Tiebreaker logic: When multiple services score equally on token overlap, prefer the shorter (more specific) service name.
  • Strengthened intent classifier prompt: 8 few-shot examples, structured classification rules, "when in doubt → investigate" bias.
  • Investigation pipeline reliability: Evidence timeout 120s→180s, synthesis/reflection timeout 120s→240s, planning maxOutputTokens 128→2048, reflection maxOutputTokens 4096→8192.
  • Re-enabled list_prometheus_metric_metadata: It had been excluded, which caused 3-4 hallucinated calls per run and wasted iterations.
  • E2E validated: 3 full investigations against live Grafana MCP — ingestion spike, data-server slow queries, ClickHouse connection errors — all produced actionable RCA reports with evidence, timelines, and dashboard links.
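The two fast-paths in the first bullet can be pictured as a pure function that runs before any LLM call and either returns an intent or defers. This is a minimal sketch under assumptions: `fastPathIntent` and its signature are hypothetical, the keyword and symptom lists are copied from the summary, and the real IntentRouter may tokenize, resolve service aliases, or shape its result differently.

```typescript
// Hypothetical fast-path router: the word lists mirror the PR summary, but the
// function name, signature, and matching rules are assumptions.
const INVESTIGATE_KEYWORDS = ["investigate", "diagnose", "rca", "troubleshoot", "root cause", "postmortem"];
const SYMPTOM_WORDS = ["drop", "slow", "errors", "check"];

/**
 * Returns "investigate" when a deterministic fast-path fires, or null to
 * signal that the caller should fall back to the LLM intent classifier.
 */
function fastPathIntent(message: string, knownServices: string[]): "investigate" | null {
  const text = message.toLowerCase();

  // Keyword fast-path: whole-word match, so "rca" does not fire inside "circa".
  if (INVESTIGATE_KEYWORDS.some((kw) => new RegExp(`\\b${kw}\\b`).test(text))) {
    return "investigate";
  }

  // Symptom + service fast-path: both a symptom word and a recognized service must appear.
  const tokens = new Set(text.split(/[^a-z0-9-]+/).filter(Boolean));
  const hasSymptom = SYMPTOM_WORDS.some((w) => tokens.has(w));
  const hasService = knownServices.some((s) => tokens.has(s.toLowerCase()));
  return hasSymptom && hasService ? "investigate" : null;
}
```

Returning null instead of a default intent keeps the LLM classifier as the only fallback, so messages like "slow day today" (a symptom but no service) still go through the model.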

Test plan

🤖 Generated with Claude Code

Wilson Li and others added 4 commits March 7, 2026 15:36
…iciency

- Add LLM-based time range extraction (handles any date format including
  unicode dashes, natural language like "March 3" or "this Thursday")
- Add timeRangeFrom/timeRangeTo to AnomalyAssessment schema for LLM extraction
- Pre-populate real dashboard URLs from GRAFANA_URL instead of LLM hallucination
- Pass raw metric tool data to synthesis for data-grounded analysis
- Strengthen synthesis evidence requirements (3-5 items per category with examples)
- Add verified-counts-only instruction to log correlation prompt
- Increase log sampleLines from 3→5 and add lastSeen to condensed logs
- Reduce evidence iterations from 7→6 (4 productive + 2 wind-down)
- Refactor: simplify extractTimeRange to ISO-only fallback (LLM handles rest)
- Refactor: replace 150-word STOP_WORDS + extractQueryKeywords with simple tokenizer
- Refactor: simplify suggestStepSeconds to formula-based (durationSec/100)
- Refactor: simplify toRfc3339Window, remove unused Grafana relative parsing
- Add toolData to PhaseResult for raw evidence pass-through

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
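The formula-based `suggestStepSeconds` in this commit (durationSec/100) amounts to targeting roughly 100 data points per query, whatever the window length. A hedged sketch: the `(fromMs, toMs)` signature, rounding up, and the 1-second floor are assumptions, not the commit's code.

```typescript
// Hypothetical helper: aims for ~100 points per query window (durationSec / 100).
// Rounding up and the 1-second floor are assumptions added for safety.
function suggestStepSeconds(fromMs: number, toMs: number): number {
  const durationSec = Math.max(0, (toMs - fromMs) / 1000);
  return Math.max(1, Math.ceil(durationSec / 100));
}
```

With this rule a 6h window gets a 216s step and a 7d window a 6048s step, so the point count stays roughly constant as the range grows.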
…y to README

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add keyword and symptom-based fast-paths to bypass LLM for intent
classification (9/10 accuracy, 8/10 deterministic). Add alias resolution
and tiebreaker logic to matchServiceFromText for reliable service matching
(kafka→brokers, clickhouse→ch-clickhouse). Bump evidence/synthesis
timeouts and maxOutputTokens for slow models. Re-enable
list_prometheus_metric_metadata. 231 tests passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
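Alias resolution followed by token-overlap scoring with a shorter-name tiebreaker, as described in this commit, might look like the sketch below. The alias pairs come from the PR summary; the tokenizer, the scoring, and the `matchServiceFromText(text, services)` signature are assumptions rather than the code in src/agent/intent.ts.

```typescript
// Hypothetical sketch: alias pairs from the PR summary; everything else is assumed.
const SERVICE_ALIASES = new Map<string, string>([
  ["kafka", "kafka-brokers"],
  ["clickhouse", "ch-clickhouse"],
  ["postgres", "stolon-proxy"],
  ["redis", "cache-redis-ha"],
]);

function tokenize(s: string): string[] {
  return s.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

function matchServiceFromText(text: string, services: string[]): string | null {
  const words = tokenize(text);

  // 1. Alias resolution: map common shorthand to the canonical service name.
  for (const w of words) {
    const target = SERVICE_ALIASES.get(w);
    if (target !== undefined && services.includes(target)) return target;
  }

  // 2. Token-overlap scoring against each service name's tokens.
  const wordSet = new Set(words);
  let best: string | null = null;
  let bestScore = 0;
  for (const svc of services) {
    const score = tokenize(svc).filter((t) => wordSet.has(t)).length;
    // 3. Tiebreaker: on equal (non-zero) scores, prefer the shorter service name.
    if (score > bestScore || (score === bestScore && best !== null && svc.length < best.length)) {
      best = svc;
      bestScore = score;
    }
  }
  return best;
}
```

For "brokers are lagging" against `kafka-brokers-exporter` and `kafka-brokers`, both score 1 and the tiebreaker picks `kafka-brokers` regardless of list order.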
Copilot AI review requested due to automatic review settings March 8, 2026 09:09
WZ merged commit 54c0b27 into main on Mar 8, 2026
3 checks passed

Copilot AI left a comment


Pull request overview

This PR improves reliability and determinism of the CLI investigation flow by introducing deterministic intent routing/service matching and hardening the RCA investigation pipeline (timeouts, planning/reflection phases, prefetching evidence context, and better truncation/JSON recovery), plus UI/UX upgrades for the CLI input and report formatting.

Changes:

  • Add deterministic intent routing fast-paths and strengthen service matching (aliases + token scoring).
  • Harden the investigation pipeline with planning/reflection phases, longer timeouts, evidence prefetching, and improved tool-output compaction/JSON recovery.
  • Improve CLI UX (custom text input editing, routing feedback, richer RCA formatting) and MCP/LLM robustness (tool-arg coercion, hallucinated tool-call handling).

Reviewed changes

Copilot reviewed 22 out of 23 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/mcp/client.ts Adds schema-based tool-argument coercion before MCP tool calls.
src/mcp/client.test.ts Adds unit tests for argument coercion and adjusts existing assertion.
src/llm/openai.ts Adds per-call timeout override and changes handling for truncated outputs / hallucinated function calls.
src/llm/openai.test.ts Updates tests to reflect tool-call parsing behavior when tools are provided.
src/interfaces/cli/CliTextInput.tsx Introduces a custom Ink input component with readline-like editing.
src/interfaces/cli/App.tsx Switches to IntentRouter, improves RCA rendering, adds history/abort UX, and routes via service matching from text.
src/interfaces/cli/cli-utils.test.ts Updates RCA text formatting expectations for new sections/headers.
src/index.ts Renames IntentClassifier usage to IntentRouter.
src/cli.tsx Preserves user LOG_LEVEL and converts static imports to dynamic imports to avoid logger init ordering issues.
src/agent/types.ts Extends AnomalyAssessment with extracted time range fields.
src/agent/rca-types.ts Introduces structured findings types and expands RcaReport fields (impact/trigger/timeline/etc.).
src/agent/rca-prompts.ts Refactors prompts into builders, adds planning/reflection schemas, and strengthens synthesis + intent prompts.
src/agent/rca-prompts.test.ts Updates tests for prompt builders and new JSON schemas.
src/agent/prompts.ts Adds explicit time-window extraction requirement to anomaly assessment prompt/schema.
src/agent/investigation.ts Major pipeline hardening: planning + reflection phases, prefetching, timeline build, truncation improvements, and retries.
src/agent/investigation.test.ts Updates and expands tests to cover planning/reflection/timeline/severity validation/time-range parsing.
src/agent/intent.ts Replaces IntentClassifier with IntentRouter and adds deterministic fast-paths + alias-aware matching.
src/agent/intent.test.ts Updates tests for new routing behavior and expanded service matching logic.
src/agent/core.ts Enhances create_temp_panel tool to respect time range and optionally datasource UID.
package-lock.json Adds a bin entry for a CLI executable.
docs/plans/2026-03-08-agent-robustness.md Adds an implementation plan doc for robustness improvements.
README.md Expands documentation with pipeline details, examples, and CLI shortcuts.
CLAUDE.md Adds contributor guidance (security + repo conventions).
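The src/mcp/client.ts entry mentions schema-based tool-argument coercion. A minimal sketch of the idea, assuming the tool's input schema is a JSON-Schema object with a `properties` map and that the model sometimes sends numbers, booleans, or JSON arrays/objects as strings; `coerceArgs` and the schema types are hypothetical, not the PR's implementation.

```typescript
// Hypothetical sketch: coerce string values to the type declared in a
// JSON-Schema-like "properties" map. Names and shapes are assumptions.
type PropSchema = { type?: string };
type ToolSchema = { properties?: Record<string, PropSchema> };

function coerceArgs(args: Record<string, unknown>, schema: ToolSchema): Record<string, unknown> {
  const out: Record<string, unknown> = { ...args };
  for (const [key, prop] of Object.entries(schema.properties ?? {})) {
    const v = out[key];
    if (typeof v !== "string") continue; // only string values are candidates

    if (prop.type === "number" || prop.type === "integer") {
      const n = Number(v);
      const ok = v.trim() !== "" && (prop.type === "integer" ? Number.isInteger(n) : Number.isFinite(n));
      if (ok) out[key] = n;
    } else if (prop.type === "boolean" && (v === "true" || v === "false")) {
      out[key] = v === "true";
    } else if ((prop.type === "array" || prop.type === "object") && /^[\[{]/.test(v.trim())) {
      try {
        out[key] = JSON.parse(v);
      } catch {
        // Leave the original string if it is not valid JSON.
      }
    }
  }
  return out;
}
```

Values that cannot be coerced are passed through unchanged, so the MCP server still reports its own validation error instead of receiving a silently altered argument.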
Comments suppressed due to low confidence (1)

src/llm/openai.ts:336

  • When tools.length === 0, this code intentionally ignores any function_call items as hallucinations, but it can then fall through and return { type: "text", content: "" } if the response contained only function calls (no message output). That leads to silent empty output rather than an error, which can break callers that expect real text/JSON. Consider treating this case as "no output" (throw) or at least warning+throw when textContent is empty after filtering hallucinated function calls (unless status === "incomplete" and you have a different recovery path).
    if (functionCalls.length > 0) {
      // When no tools were provided, any function_call items are hallucinations
      // (e.g. "<|constrain|>json"). Ignore them and fall through to text handling.
      if (tools.length > 0) {
        return {
          type: "tool_calls",
          usage,
          calls: functionCalls.map((fc) => {
            let args: Record<string, unknown>;
            try {
              args = JSON.parse(fc.arguments) as Record<string, unknown>;
            } catch {
              throw new Error(
                `Failed to parse tool arguments for "${fc.name}": ${fc.arguments}`,
              );
            }
            return { id: fc.id, name: fc.name, args };
          }),
        };
      } else {
        logger.warn(
          { hallucinated: functionCalls.map((fc) => fc.name) },
          "Ignoring hallucinated function calls (no tools were provided)",
        );
      }
    }

    if (response.output.length === 0 && response.status !== "incomplete") {
      throw new Error(
        "LLM returned no output (possible content filter or API error)",
      );
    }

    return { type: "text", content: textContent, usage };
  }
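The guard the reviewer proposes can be isolated as a small helper that throws when, after hallucinated function calls are ignored, nothing usable is left. A sketch under assumptions: `finalizeTextResult` and its parameters are hypothetical stand-ins for the provider types in src/llm/openai.ts, and `"incomplete"` responses are deliberately left to the existing truncation-recovery path.

```typescript
// Hypothetical helper: name, parameters, and result type are simplified stand-ins.
type TextResult = { type: "text"; content: string };

function finalizeTextResult(textContent: string, status: string, ignoredCalls: string[]): TextResult {
  // "incomplete" responses are left to the existing truncation-recovery path.
  if (textContent.trim() === "" && status !== "incomplete") {
    const detail = ignoredCalls.length > 0 ? ` (ignored hallucinated calls: ${ignoredCalls.join(", ")})` : "";
    throw new Error(`LLM returned no usable text output${detail}`);
  }
  return { type: "text", content: textContent };
}
```

In the excerpted method, this check would sit just before the final text return, with `usage` carried through as before.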


Comment on lines 826 to +839
  /**
   * Extract a Grafana-compatible time range from the anomaly description.
   * Looks for date/time references; defaults to last 24h if none found.
   * Uses Grafana's day-rounding syntax (/d) to produce precise day boundaries.
   */
- private extractTimeRange(anomalySummary: string, userMessage?: string): { from: string; to: string } {
-   const text = `${anomalySummary} ${userMessage ?? ""}`;
-
-   // Try to find an ISO-ish date or "March 2nd" style references
+ extractTimeRange(anomalySummary: string, userMessage?: string): { from: string; to: string } {
+   // Lightweight fallback for when LLM extraction fails or is skipped.
+   // Only handles ISO dates — all other formats are handled by extractTimeRangeViaLlm.
+   const text = `${anomalySummary} ${userMessage ?? ""}`.replace(/[\u2010-\u2015\u2212]/g, "-");
    const dateMatch = text.match(/(\d{4}-\d{2}-\d{2})/);
    if (dateMatch) {
      // Found a date like 2026-03-02 — show that full day
-     return { from: `${dateMatch[1]}T00:00:00`, to: `${dateMatch[1]}T23:59:59` };
+     return { from: `${dateMatch[1]}T00:00:00Z`, to: `${dateMatch[1]}T23:59:59Z` };
    }
    return { from: "now-6h", to: "now" };
  }

Copilot AI Mar 8, 2026


The docstring says this uses Grafana’s day-rounding syntax (/d) to produce precise day boundaries, but the implementation only returns ISO day boundaries (...T00:00:00Z...T23:59:59Z) or now-6h/now. Either update the implementation to actually use /d (if that’s desired/required elsewhere), or adjust the comment so it reflects the current behavior.

Copilot uses AI. Check for mistakes.
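If the ISO-only fallback is the intended behavior (the first commit describes it that way), the simpler fix is a docstring that matches. The old docstring's "defaults to last 24h" is also out of date, since the code returns now-6h. A sketch, written as a standalone function so it can be tested, reusing the logic from the diff above with only the comment changed:

```typescript
/**
 * Lightweight fallback used when LLM time-range extraction fails or is skipped.
 * If the text contains an ISO date (YYYY-MM-DD), returns that UTC day's bounds as
 * ISO-8601 strings; otherwise returns the relative Grafana range now-6h to now.
 * Unicode hyphens and dashes (U+2010 to U+2015) and the minus sign (U+2212) are
 * normalized to "-" first. No Grafana "/d" rounding is involved.
 */
function extractTimeRange(anomalySummary: string, userMessage?: string): { from: string; to: string } {
  const text = `${anomalySummary} ${userMessage ?? ""}`.replace(/[\u2010-\u2015\u2212]/g, "-");
  const dateMatch = text.match(/(\d{4}-\d{2}-\d{2})/);
  if (dateMatch) {
    return { from: `${dateMatch[1]}T00:00:00Z`, to: `${dateMatch[1]}T23:59:59Z` };
  }
  return { from: "now-6h", to: "now" };
}
```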
Comment on lines +879 to +893
private toRfc3339Window(window: { from: string; to: string }): { startRfc3339: string; endRfc3339: string } {
const resolve = (expr: string): string => {
if (/^\d{4}-\d{2}-\d{2}/.test(expr)) return new Date(expr).toISOString();
const m = expr.match(/^now(?:-(\d+)([dhm]))?/);
if (m) {
const d = new Date();
if (m[1] && m[2]) {
const n = parseInt(m[1], 10);
if (m[2] === "d") d.setDate(d.getDate() - n);
else if (m[2] === "h") d.setHours(d.getHours() - n);
else d.setMinutes(d.getMinutes() - n);
}
return d.toISOString();
}
return new Date(Date.now() - 7 * 86400000).toISOString();

Copilot AI Mar 8, 2026


toRfc3339Window() parses relative expressions with ^now(?:-(\d+)([dhm]))?, but won’t match common Grafana forms like now-1d/d (day-rounded) that you already mention supporting elsewhere. In that case it falls back to a 7-day window, which can make Loki probe queries unexpectedly expensive and broaden the selector search. Consider accepting an optional /d suffix (and possibly w as well, since suggestStepSeconds() supports it).
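One detail worth checking against the excerpt: the pattern has no `$` anchor, so `now-1d/d` actually matches as `now-1d` (the `/d` is silently ignored) and `now-1w` matches only the leading `now`, resolving to the current instant rather than reaching the 7-day fallback. A minimal sketch of a stricter resolver that accepts `w` and an optional `/d` suffix; `resolveGrafanaTime`, the `isEnd` flag, and rounding in UTC are assumptions (Grafana rounds `/d` in the dashboard's time zone, rounding a range start down and a range end up).

```typescript
// Hypothetical resolver: function name, parameters, and UTC day rounding are assumptions.
const UNIT_MS: Record<string, number> = { m: 60_000, h: 3_600_000, d: 86_400_000, w: 604_800_000 };

/**
 * Resolve an ISO timestamp or a Grafana-style relative expression
 * (now, now-6h, now-1w, now-1d/d) to a Date. "/d" rounds to the start of the
 * UTC day for a range start and to the end of that day for a range end.
 * Returns null for anything unrecognized so the caller chooses the fallback.
 */
function resolveGrafanaTime(expr: string, isEnd = false, now: Date = new Date()): Date | null {
  const s = expr.trim();
  if (/^\d{4}-\d{2}-\d{2}/.test(s)) {
    const d = new Date(s);
    return Number.isNaN(d.getTime()) ? null : d;
  }
  // Anchored at both ends so "now-1w" or "now-1d/d" cannot partially match.
  const m = s.match(/^now(?:-(\d+)([mhdw]))?(\/d)?$/);
  if (!m) return null;
  let t = now.getTime();
  if (m[1] !== undefined && m[2] !== undefined) t -= parseInt(m[1], 10) * (UNIT_MS[m[2]] ?? 0);
  const d = new Date(t);
  if (m[3] !== undefined) {
    if (isEnd) d.setUTCHours(23, 59, 59, 999);
    else d.setUTCHours(0, 0, 0, 0);
  }
  return d;
}
```

Returning null instead of a fixed 7-day window lets the caller decide whether an unrecognized expression should widen the Loki probe or fail fast.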

Comment on lines +196 to +205
const abortRef = useRef<AbortController | null>(null);
const lastEscRef = useRef(0);
const threadId = "cli-session";

useInput((_input, key) => {
useInput((input, key) => {
if (key.escape && isThinking) {
const now = Date.now();
if (now - lastEscRef.current < 500) {
abortRef.current?.abort();
lastEscRef.current = 0;

Copilot AI Mar 8, 2026


The Esc-Esc "abort" flow only aborts a local AbortController, but that signal is never passed into router.route, investigationAgent.investigate, agent.chat, or any underlying LLM/MCP calls. As a result, the user sees "Query aborted" but the expensive background work still continues to completion. Either plumb AbortSignal through the agent/LLM/MCP layers (and actually cancel in-flight requests), or remove/rename this behavior so it doesn’t imply cancellation.
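Plumbing the signal through means every async layer accepts an `AbortSignal` and either checks it between phases or forwards it to the underlying request. A minimal sketch: `Step`, `investigate`, and `abortableDelay` are hypothetical stand-ins for the router, agent, and LLM/MCP calls; a real step would forward `signal` to `fetch()` or the SDK call rather than to a timer. In App.tsx this could mean creating `abortRef.current = new AbortController()` before each query and passing `abortRef.current.signal` down.

```typescript
// Hypothetical sketch: these names stand in for the real agent/LLM/MCP APIs.
type Step = (question: string, signal: AbortSignal) => Promise<string>;

function abortError(): Error {
  const err = new Error("Aborted by user");
  err.name = "AbortError";
  return err;
}

// An abortable wait standing in for an in-flight request.
function abortableDelay(ms: number, signal: AbortSignal): Promise<void> {
  return new Promise((resolve, reject) => {
    if (signal.aborted) return reject(abortError());
    const onAbort = () => {
      clearTimeout(timer); // cancel the pending work, not just the UI state
      reject(abortError());
    };
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve();
    }, ms);
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

async function investigate(question: string, steps: Step[], signal: AbortSignal): Promise<string[]> {
  const results: string[] = [];
  for (const step of steps) {
    // Check between phases so an abort stops the pipeline promptly.
    if (signal.aborted) throw abortError();
    // Each step receives the signal so it can cancel its own in-flight work.
    results.push(await step(question, signal));
  }
  return results;
}
```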

dispatch({ type: "move-cursor-left" });
} else if (key.rightArrow) {
dispatch({ type: "move-cursor-right" });
} else if (key.backspace || key.delete) {

Copilot AI Mar 8, 2026


key.delete (forward delete) is handled the same as backspace here, which deletes the character before the cursor. That makes the Delete key behave incorrectly. Consider introducing a separate action for forward-delete that removes the character at cursorOffset (if any) while keeping cursorOffset unchanged.

Suggested change
- } else if (key.backspace || key.delete) {
+ } else if (key.delete) {
+   // Forward delete: remove character at cursor without moving cursor back
+   if (state.cursorOffset < state.value.length) {
+     dispatch({ type: "move-cursor-right" });
+     dispatch({ type: "delete" });
+   }
+ } else if (key.backspace) {
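The reviewer's text also suggests a dedicated forward-delete action instead of chaining two dispatches. A sketch of such a reducer, assuming a `{ value, cursorOffset }` state; the `"delete-forward"` action and `inputReducer` are hypothetical, and only `move-cursor-left`, `move-cursor-right`, and `delete` appear in the excerpt. One caveat to verify first: on some terminals Ink may report the Backspace key as `key.delete`, which could be why the two branches were merged originally.

```typescript
// Hypothetical reducer for a single-line text input; "delete-forward" is an assumed action.
type InputState = { value: string; cursorOffset: number };
type InputAction =
  | { type: "move-cursor-left" }
  | { type: "move-cursor-right" }
  | { type: "delete" } // Backspace: remove the character before the cursor
  | { type: "delete-forward" }; // Delete: remove the character at the cursor

function inputReducer(state: InputState, action: InputAction): InputState {
  const { value, cursorOffset } = state;
  switch (action.type) {
    case "move-cursor-left":
      return { value, cursorOffset: Math.max(0, cursorOffset - 1) };
    case "move-cursor-right":
      return { value, cursorOffset: Math.min(value.length, cursorOffset + 1) };
    case "delete":
      if (cursorOffset === 0) return state;
      return { value: value.slice(0, cursorOffset - 1) + value.slice(cursorOffset), cursorOffset: cursorOffset - 1 };
    case "delete-forward":
      if (cursorOffset >= value.length) return state;
      // Cursor stays put; the text after it shifts left by one.
      return { value: value.slice(0, cursorOffset) + value.slice(cursorOffset + 1), cursorOffset };
  }
}
```

With a single action, the update is one atomic state transition rather than two queued dispatches that must run in order.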

WZ added a commit that referenced this pull request Apr 2, 2026
feat: deterministic intent routing & robust service matching