Skip to content

feat(report): show metric context in score cards — thresholds, breakdowns, tooltips#301

Merged
decko merged 4 commits into
mainfrom
soda/289
May 19, 2026
Merged

feat(report): show metric context in score cards — thresholds, breakdowns, tooltips#301
decko merged 4 commits into
mainfrom
soda/289

Conversation

@decko
Copy link
Copy Markdown
Owner

@decko decko commented May 19, 2026

Summary

Adds metric context to score cards in the HTML report, making it easier for users to interpret scores without external reference.

Changes

  • Score card metric context — cards display a static threshold hint (≥80 % = calibrated · <80 % = miscalibrated); file_prediction_accuracy cards show live micro precision and recall values read from report.metric_details with null guards.
  • Score chip hover tooltips — per-session drill-down <span class="score-chip"> elements now carry title= attributes populated from the metric metadata registry, so hovering reveals the metric description.
  • Collapse empty Retrieval Quality section — when no retrieval metrics are present the full category-section div is dropped and only a single footnote line is rendered: Retrieval metrics omitted — run with --judge to enable LLM-backed evaluation.
  • Towncrier fragmentchanges/289.feature added.

Acceptance Criteria

  • triage_calibration score card shows threshold hint text
  • file_prediction_accuracy score card shows micro precision and recall when available
  • Score chip tooltips populated from metric metadata registry
  • Empty Retrieval Quality section replaced by a footnote-only note (no empty category-section div rendered)
  • All new behaviour covered by tests (5 new tests added)
  • Towncrier fragment present at changes/289.feature

Review Results

Specialist Verdict Findings
General reviewer ✅ pass 0 findings

Refs #289


Assisted-by: Claude Opus 4.6 (1M context) noreply@anthropic.com
Assigned-by: decko

decko added 4 commits May 19, 2026 16:35
…prediction_accuracy

- Add .metric-context CSS class for muted sub-text in score cards
- Show static threshold hint (>=80% = calibrated) for triage_calibration
- Show micro precision/recall breakdown for file_prediction_accuracy when
  metric_details are available, with null guards
- 2 new tests covering both metric-context blocks
…drill-down

- Add title= attribute to .score-chip <span> sourcing the metric
  description from metric_metadata
- Tooltips are visible on hover in all browsers natively
- 1 new test verifying title= attribute and description text
…ly note

- Merge the two fallback {% elif %} / {% else %} branches into a single
  {% else %} that renders only a <p class="footnote"> guidance note
- No category-section div is rendered when retrieval metrics are absent,
  regardless of whether has_retrieval is True or False
- Update test_no_retrieval_metrics_shows_empty_state: assert footnote class
  and 'Retrieval Quality' absent (was: assert empty-state and old text)
- Update test_footnote_when_no_retrieval: assert footnote class (was: assert
  one of two legacy text strings)
- 2 new tests verifying no category-section div in both fallback cases
@decko decko added the ai-assisted Implemented with AI assistance label May 19, 2026
@decko decko merged commit ed2ab2a into main May 19, 2026
4 checks passed
@decko decko deleted the soda/289 branch May 19, 2026 20:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai-assisted Implemented with AI assistance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant