
feat: distinguish agent model from judge model in reports #179

@decko

Summary

Reports should track both:

  • Judge model: what evaluated the session (from MetricConfig — llm_provider + llm_model)
  • Agent model: what ran the session (from SessionMeta.model_id)
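
A minimal sketch of how the two identities could be kept separate in report code. `MetricConfig` and `SessionMeta` below are stand-ins that carry only the fields named in this issue (`llm_provider`, `llm_model`, `model_id`). The real raki classes have more fields. `ReportModels` and `from_sources` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetricConfig:
    # Stand-in: only the judge fields mentioned in the issue
    llm_provider: str
    llm_model: str


@dataclass(frozen=True)
class SessionMeta:
    # Stand-in: only the agent field mentioned in the issue
    model_id: Optional[str] = None


@dataclass(frozen=True)
class ReportModels:
    """Hypothetical pairing of the judge model with the agent model."""
    judge: str            # "provider/model" of the evaluator (assumed format)
    agent: Optional[str]  # model that ran the session; None if not recorded

    @classmethod
    def from_sources(cls, config: MetricConfig, session: SessionMeta) -> "ReportModels":
        # Judge identity comes from the metric config, agent identity from the session
        return cls(judge=f"{config.llm_provider}/{config.llm_model}", agent=session.model_id)
```

Keeping them as separate fields lets a diff vary one while holding the other fixed.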

Currently, both adapters populate model_id and it is stored in the report data (sample_results[].sample.session.model_id), but it does not appear in any report output: it is not shown in the CLI summary, the HTML report, or the diff.
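
Because the value is already in the report data, surfacing it mostly means reading it back out. The sketch below assumes the report has been loaded as plain JSON-like dicts with the shape `sample_results[].sample.session.model_id`. `agent_model_ids` is a hypothetical helper, not an existing raki function.

```python
from typing import Any, Mapping, Optional


def agent_model_ids(report: Mapping[str, Any]) -> list[Optional[str]]:
    """Return the agent model id for each sample, in report order.

    Assumes JSON-like report data shaped as
    sample_results[].sample.session.model_id; any missing level yields None.
    """
    ids: list[Optional[str]] = []
    for result in report.get("sample_results") or []:
        # Tolerate missing or null "sample"/"session" objects
        sample = result.get("sample") or {}
        session = sample.get("session") or {}
        ids.append(session.get("model_id"))
    return ids
```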

Use Case

When diffing runs that used different agent models but the same judge, users need to know which model produced better work — not just that scores changed.
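
The sketch below shows one possible shape for the diff line. It assumes each side has already been reduced to its per-sample agent model ids. `agent_model_diff` is a hypothetical name. It returns `None` when both runs used the same set of agent models, so the diff can omit the line.

```python
from typing import Iterable, Optional


def _format(models: list[str]) -> str:
    # An empty list means no sample recorded a model id
    return ", ".join(models) if models else "unknown"


def agent_model_diff(base: Iterable[Optional[str]],
                     head: Iterable[Optional[str]]) -> Optional[str]:
    """Hypothetical: describe how the agent model(s) changed between two runs."""
    # Collapse per-sample ids to sorted unique lists, ignoring missing ids
    base_set = sorted({m for m in base if m})
    head_set = sorted({m for m in head if m})
    if base_set == head_set:
        return None  # same agent model(s): nothing to report
    return f"Agent model: {_format(base_set)} -> {_format(head_set)}"
```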

Locations to Change

Data (already exists — no changes needed)

  • src/raki/model/dataset.py:20 (SessionMeta.model_id field)
  • src/raki/adapters/session_schema.py — extracts model_id from meta.json
  • src/raki/adapters/alcove.py — extracts model_id from system entry

Report output (needs additions)

  • src/raki/report/cli_summary.py — show agent model in CLI summary header
  • src/raki/report/html_report.py — show agent model in HTML report header
  • src/raki/report/diff.py — surface agent model difference when comparing reports (alongside existing judge config comparison at lines 83-84)
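
This sketch shows how the CLI and HTML headers could share one source of rows. The function names, the `provider/model` formatting of the judge, and the `<dl>` markup are assumptions for illustration. They do not reflect the existing layout of `cli_summary.py` or `html_report.py`.

```python
import html
from typing import Iterable, Optional


def model_header_rows(llm_provider: str, llm_model: str,
                      agent_model_ids: Iterable[Optional[str]]) -> list[tuple[str, str]]:
    """Hypothetical: build (label, value) rows shared by the CLI and HTML headers."""
    agents = sorted({m for m in agent_model_ids if m})
    return [
        ("Judge model", f"{llm_provider}/{llm_model}"),
        ("Agent model", ", ".join(agents) if agents else "unknown"),
    ]


def render_cli(rows: list[tuple[str, str]]) -> str:
    # Left-align labels so the values line up in a terminal
    width = max(len(label) for label, _ in rows)
    return "\n".join(f"{label:<{width}}  {value}" for label, value in rows)


def render_html(rows: list[tuple[str, str]]) -> str:
    # Escape everything; model ids come from session files and are untrusted text
    items = "".join(
        f"<dt>{html.escape(label)}</dt><dd>{html.escape(value)}</dd>"
        for label, value in rows
    )
    return f"<dl>{items}</dl>"
```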

Open question

  • If a report evaluates multiple sessions with different agent models, how should they be displayed? Options: show the unique set, show the model per sample, or require a single (homogeneous) agent model per report.
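
The sketch below puts the three options side by side. `display_agent_models` and its `mode` values are hypothetical, and a missing `model_id` is shown as `unknown`.

```python
from collections import Counter
from typing import Literal, Optional, Sequence

Mode = Literal["unique", "per_sample", "homogeneous"]


def display_agent_models(ids: Sequence[Optional[str]], mode: Mode) -> str:
    """Hypothetical: render a report's agent models under each proposed option."""
    labels = [m if m else "unknown" for m in ids]
    if mode == "unique":
        # Option 1: unique set, with the number of samples per model
        counts = Counter(labels)
        return ", ".join(f"{m} ({n})" for m, n in sorted(counts.items()))
    if mode == "per_sample":
        # Option 2: one line per sample, in report order
        return "\n".join(f"sample {i}: {m}" for i, m in enumerate(labels))
    if mode == "homogeneous":
        # Option 3: reject reports that mix agent models
        unique = set(labels)
        if len(unique) > 1:
            raise ValueError(f"report mixes agent models: {sorted(unique)}")
        return unique.pop() if unique else "unknown"
    raise ValueError(f"unknown mode: {mode!r}")
```

With counts, the unique-set form keeps the header to one line even for large mixed reports. The homogeneous mode moves the problem to report generation time instead of display time.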

Depends On

#173 (judge config in report) — already shipped (PR #188).

Metadata

  • Labels: enhancement (New feature or request)
  • Projects: none
  • Relationships: none yet
  • Development: no branches or pull requests