Summary
knowledge_miss_rate returns 1.0 (100% covered) on SODA sessions, which is misleading. The metric reports "the KB covered everything the agent failed on" — but the "knowledge base" is just synthesized from the session's own outputs, not actual reference docs.
Root Cause
Two separate problems:
Problem 1: Synthesized context creates false positives (knowledge metrics)
The session-schema adapter synthesizes knowledge_context from structured phase outputs (approach, code_area, files, plan tasks, etc.) when none is explicitly provided (session_schema.py:109-129). The legacy word-overlap check in miss_rate.py:129 uses a threshold of 1 shared token (if issue_words & knowledge_words), which is trivially easy to satisfy against a broad synthesized context containing all of the session's own domain terminology.
Result: every finding is marked as "covered" → knowledge_miss_rate = 1.0.
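To illustrate why a one-token threshold is so easy to satisfy, here is a minimal sketch. It is not the code in miss_rate.py: the tokens() and covered() helpers, the regex tokenizer, and the sample strings are all assumptions made for the example.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a stand-in for whatever tokenizer the metric uses."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def covered(issue: str, knowledge: str, min_shared: int = 1) -> bool:
    """True when the issue shares at least min_shared tokens with the context."""
    return len(tokens(issue) & tokens(knowledge)) >= min_shared


# Hypothetical context synthesized from the session's own phase outputs.
synthesized = "refactor the retry handler in http client; files: client.py, retry.py"

# A finding that the context does not actually document.
finding = "timeout is never propagated to the retry loop"

print(covered(finding, synthesized, min_shared=1))  # True: shares "the" and "retry"
print(covered(finding, synthesized, min_shared=3))  # False: only 2 shared tokens
```

With a threshold of 1, a stop word such as "the" is enough to mark the finding as covered, which is consistent with every finding matching a broad synthesized context.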
Fix: When knowledge_context is synthesized (not explicitly provided), either:
- Skip the knowledge metric entirely and return None (conservative)
- Use a stricter matching threshold (e.g., require 3+ shared tokens like word_match does)
- Add a context_source: Literal["explicit", "synthesized"] field to PhaseResult so the metric can distinguish
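The first and third options could be combined roughly as follows. This is a sketch under stated assumptions, not the actual raki model: the PhaseResult below is a simplified stand-in, the "explicit" default for context_source is a guess at a backward-compatible choice, and gated_miss_rate() and its docs_provided flag are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Literal, Optional, Sequence


@dataclass
class PhaseResult:
    """Simplified stand-in for the real model; only the relevant fields."""
    name: str
    knowledge_context: Optional[str] = None
    # Assumed default: existing callers that pass a context are treated as explicit.
    context_source: Literal["explicit", "synthesized"] = "explicit"


def gated_miss_rate(
    phases: Sequence[PhaseResult],
    score: Callable[[Sequence[PhaseResult]], float],
    docs_provided: bool = False,
) -> Optional[float]:
    """Return None when only synthesized context exists and no docs were supplied."""
    has_explicit = any(
        p.knowledge_context and p.context_source == "explicit" for p in phases
    )
    if not has_explicit and not docs_provided:
        return None  # nothing trustworthy to compare against
    return score(phases)


synth_only = [PhaseResult("plan", "approach: retry handler", "synthesized")]
with_docs = [PhaseResult("plan", "See the retry policy reference", "explicit")]

print(gated_miss_rate(synth_only, score=lambda _: 1.0))  # None
print(gated_miss_rate(with_docs, score=lambda _: 0.5))   # 0.5
```

Passing the scoring function in as a parameter keeps the gate independent of whichever matching strategy the metric ends up using.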
Problem 2: Alcove adapter never sets output_structured (ground truth matching)
AlcoveAdapter._build_phases() never passes output_structured to PhaseResult() (lines 398-438). This means _extract_domains() in matcher.py always returns an empty set → no ground truth entries match → ground truth matching is broken for all Alcove/bridge sessions.
Fix: Populate output_structured from the phase's JSON content in the Alcove adapter.
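One way the JSON content might be turned into output_structured is sketched below. The parse_structured() helper is hypothetical, and how _build_phases() actually obtains the raw phase content is not shown in this issue, so that part is left as a comment.

```python
import json
from typing import Any, Optional


def parse_structured(content: Optional[str]) -> Optional[dict[str, Any]]:
    """Return the phase content as a dict if it is a JSON object, else None."""
    try:
        parsed = json.loads(content)
    except (json.JSONDecodeError, TypeError):
        # Not JSON (or no content at all): leave output_structured unset.
        return None
    # Only JSON objects can carry keys such as code_area.
    return parsed if isinstance(parsed, dict) else None


# Inside the adapter this might be used as (assumed call shape):
#   PhaseResult(..., output_structured=parse_structured(raw_content))

print(parse_structured('{"code_area": "auth", "files": ["login.py"]}'))
print(parse_structured("plain text output"))
```

Returning None for non-object content keeps the current behavior for phases that have no structured output, so only phases with real JSON gain domains to match on.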
Files to Change
Problem 1 (knowledge metrics):
src/raki/metrics/knowledge/miss_rate.py — skip or gate when context is synthesized
src/raki/metrics/knowledge/gap_rate.py — same treatment
src/raki/metrics/knowledge/_common.py — add context_source awareness to extract_knowledge_context()
src/raki/adapters/session_schema.py — tag synthesized context with source marker
src/raki/model/phases.py — add context_source field to PhaseResult (if needed)
Problem 2 (ground truth matching):
src/raki/adapters/alcove.py — populate output_structured in _build_phases()
tests/test_adapters.py — test that output_structured is populated
Acceptance Criteria
- knowledge_miss_rate returns None (not 1.0) when only synthesized context is available and no --docs-path is provided
- knowledge_miss_rate works correctly when explicit knowledge_context or --docs-path is provided
- output_structured on phases
- code_area in their phases