feat(dashboard): root cause explorer — trace-driven failure diagnosis #786

@christso

Objective

Add a dashboard view that helps users diagnose why tests fail, not just that they failed. It combines trace data, failure clustering, and git correlation to surface root causes and suggest fixes.

Architecture Boundary

external-first — dashboard analysis layer. Reads existing trace.jsonl and results.jsonl data. Does not modify the eval engine.

What this enables

Currently, debugging a failed eval requires manually reading trace files and comparing runs. The root cause explorer automates this:

  • Failure clustering: Group similar failures across tests and runs by error pattern
  • Trace filtering: Filter traces by tool, error type, latency, token usage
  • Git correlation: Link score changes to specific commits
  • Fix suggestions: Based on failure patterns, suggest prompt/logic adjustments
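
As a rough illustration of the trace filtering item, the sketch below parses JSONL text and applies optional filters. The field names (`tool`, `error_type`, `latency_ms`, `tokens`) are assumptions made for the example; the actual trace.jsonl schema is not specified in this issue and may differ.

```python
import json


def parse_jsonl(text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def filter_traces(entries, tool=None, error_type=None,
                  min_latency_ms=None, min_tokens=None):
    """Keep entries that satisfy every filter that was provided.

    Field names are hypothetical; adapt them to the real trace schema.
    """
    kept = []
    for entry in entries:
        if tool is not None and entry.get("tool") != tool:
            continue
        if error_type is not None and entry.get("error_type") != error_type:
            continue
        # Missing or null numeric fields are treated as 0.
        if min_latency_ms is not None and (entry.get("latency_ms") or 0) < min_latency_ms:
            continue
        if min_tokens is not None and (entry.get("tokens") or 0) < min_tokens:
            continue
        kept.append(entry)
    return kept


# Made-up sample data for illustration only.
sample = "\n".join([
    '{"tool": "search", "error_type": "timeout", "latency_ms": 9000, "tokens": 120}',
    '{"tool": "search", "error_type": null, "latency_ms": 300, "tokens": 80}',
    '{"tool": "calc", "error_type": "timeout", "latency_ms": 7000, "tokens": 40}',
])
entries = parse_jsonl(sample)
slow_search = filter_traces(entries, tool="search", min_latency_ms=1000)
```

Filters left as `None` are ignored, so the same function can back a UI where each filter control is optional.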

Proposed views

Failure Overview

  • Failure heatmap: tests × runs, colored by score (green/yellow/red)
  • Top failure clusters with frequency and affected tests
  • Score change timeline with git commit annotations
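
One possible way to derive the heatmap grid from result records is sketched below. The record fields (`test_id`, `run_id`, `score`), the 0 to 1 score range, and the color thresholds are all assumptions for illustration; none of them are defined in this issue.

```python
def score_color(score, green_at=0.8, yellow_at=0.5):
    """Map a score to a heatmap color. Thresholds are illustrative assumptions."""
    if score is None:
        return "gray"  # no result for this test in this run
    if score >= green_at:
        return "green"
    if score >= yellow_at:
        return "yellow"
    return "red"


def build_heatmap(records):
    """Return (test_ids, run_ids, grid); grid[i][j] colors test_ids[i] in run_ids[j].

    Tests and runs keep the order in which they first appear in `records`.
    """
    tests, runs, scores = {}, {}, {}
    for rec in records:
        tests.setdefault(rec["test_id"], None)
        runs.setdefault(rec["run_id"], None)
        scores[(rec["test_id"], rec["run_id"])] = rec["score"]
    test_ids, run_ids = list(tests), list(runs)
    grid = [[score_color(scores.get((t, r))) for r in run_ids] for t in test_ids]
    return test_ids, run_ids, grid


# Made-up records for illustration only.
records = [
    {"test_id": "a", "run_id": "r1", "score": 0.9},
    {"test_id": "a", "run_id": "r2", "score": 0.6},
    {"test_id": "b", "run_id": "r2", "score": 0.1},
]
test_ids, run_ids, grid = build_heatmap(records)
```

Cells with no result are marked gray rather than red, so that older runs missing a test do not read as failures.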

Failure Cluster Detail

  • Similar failures grouped by error pattern (e.g., "tool not found", "timeout", "wrong format")
  • Representative traces for each cluster
  • Frequency trend: is this cluster growing or shrinking?
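
The simplest clustering option, string matching, could look roughly like the sketch below: error messages are normalized into a signature by masking quoted strings and numbers, failures are grouped by signature, and a coarse trend is computed per cluster. The failure fields (`test_id`, `run_id`, `error`) and the half-versus-half trend rule are assumptions for the example, not part of the proposal.

```python
import re
from collections import defaultdict


def error_signature(message):
    """Normalize an error message so variable parts do not split clusters."""
    sig = message.lower()
    sig = re.sub(r"'[^']*'|\"[^\"]*\"", "<str>", sig)  # quoted values
    sig = re.sub(r"\b0x[0-9a-f]+\b", "<hex>", sig)      # hex ids
    sig = re.sub(r"\d+(\.\d+)?", "<num>", sig)          # numbers
    return re.sub(r"\s+", " ", sig).strip()


def cluster_failures(failures):
    """Group failure records (hypothetical shape) by error signature."""
    clusters = defaultdict(list)
    for failure in failures:
        clusters[error_signature(failure["error"])].append(failure)
    return dict(clusters)


def cluster_trend(cluster, run_order):
    """Compare failure counts in the earlier vs. later half of the runs."""
    counts = [sum(1 for f in cluster if f["run_id"] == run) for run in run_order]
    mid = len(counts) // 2
    early, late = sum(counts[:mid]), sum(counts[len(counts) - mid:])
    if late > early:
        return "growing"
    if late < early:
        return "shrinking"
    return "stable"


# Made-up failures for illustration only.
failures = [
    {"test_id": "t1", "run_id": "r1", "error": "Tool 'search_web' not found"},
    {"test_id": "t2", "run_id": "r2", "error": "Tool 'calc' not found"},
    {"test_id": "t1", "run_id": "r2", "error": "Timeout after 30s"},
    {"test_id": "t3", "run_id": "r2", "error": 'tool "fetch" not found'},
]
clusters = cluster_failures(failures)
trend = cluster_trend(clusters["tool <str> not found"], ["r1", "r2"])
```

This approach is cheap and deterministic, but messages containing apostrophes or free-form text may be over- or under-merged; the embedding-based or LLM-assisted options would trade that for extra cost.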

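Trace Drill-Down

  • Filter traces by tool, error type, latency, token usage
  • Side-by-side trace comparison that highlights divergence points
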
Git Correlation

  • Score timeline with commit markers
  • Click commit → see which tests regressed
  • Diff view: changed files that correlate with score drops
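
The "click commit, see which tests regressed" interaction could be backed by something like the sketch below. It assumes runs are already in chronological order, that each run records one commit hash and a per-test score map, and that a drop of at least `min_drop` counts as a regression. All of these, including the commit hashes in the sample, are illustrative assumptions.

```python
def regressions_by_commit(runs, min_drop=0.1):
    """Map each commit to the tests whose score dropped versus the previous run.

    `runs` is a chronological list of dicts with hypothetical keys
    'commit' (str) and 'scores' (test_id -> score). Assumes one run per commit.
    """
    result = {}
    for prev, curr in zip(runs, runs[1:]):
        regressed = []
        for test_id, score in curr["scores"].items():
            before = prev["scores"].get(test_id)
            # Tests that are new in this run have no baseline and are skipped.
            if before is not None and before - score >= min_drop:
                regressed.append((test_id, before, score))
        if regressed:
            result[curr["commit"]] = sorted(regressed)
    return result


# Made-up runs and commit hashes for illustration only.
runs = [
    {"commit": "abc1234", "scores": {"t1": 0.9, "t2": 0.8}},
    {"commit": "def5678", "scores": {"t1": 0.4, "t2": 0.8}},
    {"commit": "0a1b2c3", "scores": {"t1": 0.4, "t2": 0.3, "t3": 1.0}},
]
regressions = regressions_by_commit(runs)
```

Comparing only adjacent runs keeps the attribution simple; a score drop is attributed to the commit of the run where it first appears, which is a correlation and not proof that the commit caused it.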

Design Latitude

  • Clustering algorithm (simple string matching, embedding-based, or LLM-assisted)
  • Whether fix suggestions use LLM or pattern matching
  • How to handle missing trace data (older runs without traces)
  • Git integration depth (just commit hashes vs. full diff display)
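
For the pattern-matching variant of fix suggestions, a minimal sketch might be a small rule table mapping error patterns to advice. The patterns and the suggestion texts below are made up for illustration; a real rule set would come from the failure clusters actually observed.

```python
import re

# Illustrative rules only: both the patterns and the advice are assumptions.
SUGGESTION_RULES = [
    (re.compile(r"tool .*not found", re.IGNORECASE),
     "Check that the tool name in the prompt matches a registered tool."),
    (re.compile(r"time(d)? ?out", re.IGNORECASE),
     "Consider raising the timeout or reducing the work done per step."),
    (re.compile(r"wrong format|invalid json|parse error", re.IGNORECASE),
     "Add an explicit output-format example to the prompt."),
]


def suggest_fixes(error_message):
    """Return the advice of every rule whose pattern matches the message."""
    return [advice for pattern, advice in SUGGESTION_RULES
            if pattern.search(error_message)]


suggestions = suggest_fixes("Tool 'search_web' not found")
```

A rule table like this is transparent and cheap to run; an LLM-based variant could cover unseen patterns but would need its own evaluation.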

Acceptance Signals

  • Failures are clustered by error pattern across tests and runs
  • Users can filter traces by tool, error type, latency
  • Side-by-side trace comparison highlights divergence points
  • Score timeline shows git commit correlation
  • At least basic fix suggestions based on common failure patterns
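
The side-by-side comparison signal needs some notion of a divergence point. One simple interpretation, sketched below, is the first step index at which two traces differ. The step shape (`tool`, `status`) and the choice to compare only those two fields are assumptions for the example.

```python
from itertools import zip_longest


def step_key(step):
    """Reduce a trace step (hypothetical shape) to the fields being compared."""
    return (step.get("tool"), step.get("status"))


def first_divergence(trace_a, trace_b):
    """Return the index of the first differing step, or None if traces match.

    A trace that ends early diverges at the index where it runs out of steps.
    """
    for index, (a, b) in enumerate(zip_longest(trace_a, trace_b)):
        if a is None or b is None or step_key(a) != step_key(b):
            return index
    return None


# Made-up traces for illustration only.
passing = [
    {"tool": "search", "status": "ok"},
    {"tool": "fetch", "status": "ok"},
    {"tool": "answer", "status": "ok"},
]
failing = [
    {"tool": "search", "status": "ok"},
    {"tool": "fetch", "status": "error"},
]
divergence = first_divergence(passing, failing)
```

A positional comparison like this is easy to explain in the UI, though traces with inserted or reordered steps would probably need a sequence alignment (for example via `difflib.SequenceMatcher`) instead.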

Non-Goals

  • Automated fix application (suggest only)
  • Custom clustering model training
  • Integration with external error tracking (Sentry, etc.)

Dependencies

Research source

  • melagiri/code-insights — pattern detection, friction point identification, root cause analysis across sessions

Metadata

  • Labels: enhancement (New feature or request), wui (Relates to the browser dashboard / web UI runtime)
  • Status: Backlog