fix: knowledge_miss_rate 1.0 on SODA sessions — ground truth matching broken #183

@decko

Description

Summary

knowledge_miss_rate returns 1.0 (100% covered) on SODA sessions, which is misleading. The metric claims "the KB covered everything the agent failed on", but that "knowledge base" is synthesized from the session's own outputs rather than from actual reference docs, so the findings are being matched against the session itself.

Root Cause

Two separate problems:

Problem 1: Synthesized context creates false positives (knowledge metrics)

The session-schema adapter synthesizes knowledge_context from structured phase outputs (approach, code_area, files, plan tasks, etc.) whenever none is explicitly provided (session_schema.py:109-129). The legacy word-overlap check in miss_rate.py:129 counts a finding as covered if it shares even a single token with the context (if issue_words & knowledge_words). Against a broad synthesized context that contains all of the session's own domain terminology, that threshold is trivially satisfied.

Result: every finding is marked as "covered" → knowledge_miss_rate = 1.0.
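To make the failure mode concrete, here is a minimal sketch comparing the legacy single-token check with a 3-token threshold. The tokenizer, the synthesized context string, and the findings are all made up for illustration; miss_rate.py may tokenize differently (for example, by dropping stop words).

```python
import re


def tokenize(text: str) -> set[str]:
    # Hypothetical tokenizer: lowercase runs of letters, digits, and underscores.
    # miss_rate.py may tokenize differently.
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def is_covered(finding: str, knowledge: str, min_shared: int) -> bool:
    # min_shared=1 behaves like the legacy `if issue_words & knowledge_words`.
    return len(tokenize(finding) & tokenize(knowledge)) >= min_shared


# Made-up context of the kind synthesized from a session's own phase outputs.
synthesized = (
    "approach: refactor the auth middleware; code_area: auth; "
    "files: src/auth/middleware.py, src/auth/tokens.py; task: add token refresh"
)

# Made-up findings the agent failed on.
findings = [
    "token refresh fails on expired sessions",
    "middleware swallows exceptions",
    "missing rate limiting on the login endpoint",
]

legacy = [is_covered(f, synthesized, min_shared=1) for f in findings]
strict = [is_covered(f, synthesized, min_shared=3) for f in findings]

legacy_rate = sum(legacy) / len(findings)
strict_rate = sum(strict) / len(findings)
print(f"legacy: {legacy_rate:.2f}, strict: {strict_rate:.2f}")  # legacy: 1.00, strict: 0.00
```

The third finding counts as covered under the legacy check only because both strings contain "the", which is exactly the kind of incidental overlap that drives the rate to 1.0.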

Fix: When knowledge_context is synthesized (not explicitly provided), do one of the following:

  • Skip the knowledge metric entirely and return None (conservative)
  • Use a stricter matching threshold (e.g., require 3+ shared tokens like word_match does)
  • Add a context_source: Literal["explicit", "synthesized"] field to PhaseResult so the metric can distinguish
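A minimal sketch of how the first and third options could combine: tag each phase's context with a context_source, ignore synthesized context when computing the metric, and return None when nothing trustworthy remains. The PhaseResult below is a hypothetical stand-in with only the fields this sketch needs (the real model in src/raki/model/phases.py may be a Pydantic model with more fields), and docs_text is an assumed parameter holding whatever would be loaded from --docs-path. The min_shared threshold defaults to the legacy value of 1 so explicit-context behaviour is unchanged; raising it to 3 would apply the second option as well.

```python
import re
from dataclasses import dataclass
from typing import Literal, Optional

ContextSource = Literal["explicit", "synthesized"]


@dataclass
class PhaseResult:
    # Hypothetical minimal stand-in for raki's PhaseResult.
    knowledge_context: Optional[str] = None
    context_source: ContextSource = "explicit"


def _tokens(text: str) -> set[str]:
    # Hypothetical tokenizer; the real one in miss_rate.py may differ.
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def gather_knowledge(phases: list[PhaseResult], docs_text: Optional[str]) -> Optional[str]:
    """Collect only context that is independent of the session's own outputs."""
    parts = [
        p.knowledge_context
        for p in phases
        if p.knowledge_context and p.context_source == "explicit"
    ]
    if docs_text:
        parts.append(docs_text)
    return "\n".join(parts) if parts else None


def knowledge_miss_rate(
    phases: list[PhaseResult],
    findings: list[str],
    docs_text: Optional[str] = None,
    min_shared: int = 1,  # legacy threshold; 3 would match word_match
) -> Optional[float]:
    """Fraction of findings the knowledge covered, or None if no trustworthy knowledge."""
    knowledge = gather_knowledge(phases, docs_text)
    if knowledge is None or not findings:
        return None
    kb = _tokens(knowledge)
    covered = sum(1 for f in findings if len(_tokens(f) & kb) >= min_shared)
    return covered / len(findings)
```

With only synthesized context and no docs, the function returns None instead of 1.0, which is what the first acceptance criterion asks for.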

Problem 2: Alcove adapter never sets output_structured (ground truth matching)

AlcoveAdapter._build_phases() never passes output_structured to PhaseResult() (alcove.py, lines 398-438). As a result, _extract_domains() in matcher.py always returns an empty set, no ground truth entries match, and ground truth matching is broken for every Alcove/bridge session.

Fix: Populate output_structured from the phase's JSON content in the Alcove adapter.
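A minimal sketch of the population step, assuming each Alcove phase carries its JSON as a string. parse_structured_output is a hypothetical helper name, and extract_domains is a hypothetical simplification of matcher._extract_domains that reads only code_area; the real function may look at other keys.

```python
import json
from typing import Any, Optional


def parse_structured_output(content: Optional[str]) -> Optional[dict[str, Any]]:
    """Best-effort parse of a phase's JSON content for output_structured.

    Returns None for empty, non-JSON, or non-object content, so phases
    without JSON keep today's behaviour instead of raising.
    """
    if not content:
        return None
    try:
        data = json.loads(content)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


def extract_domains(output_structured: Optional[dict[str, Any]]) -> set[str]:
    """Hypothetical simplification of matcher._extract_domains (code_area only)."""
    if not output_structured:
        return set()  # the current Alcove behaviour: nothing to match on
    code_area = output_structured.get("code_area")
    if isinstance(code_area, str):
        return {code_area}
    if isinstance(code_area, list):
        return {str(area) for area in code_area}
    return set()
```

Inside AlcoveAdapter._build_phases(), the PhaseResult(...) call would then gain something like output_structured=parse_structured_output(raw_content), where raw_content stands for whichever variable holds the phase's JSON text.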

Files to Change

Problem 1 (knowledge metrics):

  • src/raki/metrics/knowledge/miss_rate.py — skip or gate when context is synthesized
  • src/raki/metrics/knowledge/gap_rate.py — same treatment
  • src/raki/metrics/knowledge/_common.py — add context_source awareness to extract_knowledge_context()
  • src/raki/adapters/session_schema.py — tag synthesized context with source marker
  • src/raki/model/phases.py — add context_source field to PhaseResult (if needed)

Problem 2 (ground truth matching):

  • src/raki/adapters/alcove.py — populate output_structured in _build_phases()
  • tests/test_adapters.py — test that output_structured is populated

Acceptance Criteria

  • knowledge_miss_rate returns None (not 1.0) when only synthesized context is available and no --docs-path is provided
  • knowledge_miss_rate works correctly when explicit knowledge_context or --docs-path is provided
  • Alcove adapter populates output_structured on phases
  • Ground truth matching works for Alcove/bridge sessions that have code_area in their phases
  • Test coverage for both fixes with SODA-format and Alcove-format fixtures

Metadata

Assignees

Labels

No labels

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests