Summary
Reports should track both:
- Judge model: what evaluated the session (from MetricConfig: llm_provider + llm_model)
- Agent model: what ran the session (from SessionMeta.model_id)
Currently model_id is populated by both adapters and stored in report data (sample_results[].sample.session.model_id), but it is invisible in all report output — not shown in CLI summary, HTML report, or diff.
Use Case
When diffing runs that used different agent models but the same judge, users need to know which model produced better work — not just that scores changed.
Locations to Change
Data (already exists — no changes needed)
src/raki/model/dataset.py:20 — SessionMeta.model_id field
src/raki/adapters/session_schema.py — extracts model_id from meta.json
src/raki/adapters/alcove.py — extracts model_id from system entry
Report output (needs additions)
src/raki/report/cli_summary.py — show agent model in CLI summary header
src/raki/report/html_report.py — show agent model in HTML report header
src/raki/report/diff.py — surface agent model difference when comparing reports (alongside existing judge config comparison at lines 83-84)
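One possible shape for the diff change, as a minimal sketch only. It assumes each report is available as a plain dict that mirrors the sample_results[].sample.session.model_id path mentioned above. The helper names agent_models and describe_agent_model_change are hypothetical and do not exist in raki today.

```python
from __future__ import annotations

from typing import Any


def agent_models(report: dict[str, Any]) -> set[str]:
    """Collect the distinct agent model ids found in a report dict.

    Assumes the layout sample_results[].sample.session.model_id.
    Samples without a model_id are skipped.
    """
    models: set[str] = set()
    for result in report.get("sample_results", []):
        model_id = result.get("sample", {}).get("session", {}).get("model_id")
        if model_id:
            models.add(model_id)
    return models


def describe_agent_model_change(
    baseline: dict[str, Any], candidate: dict[str, Any]
) -> str | None:
    """Return a one-line note when the agent models differ, else None."""
    before = agent_models(baseline)
    after = agent_models(candidate)
    if before == after:
        return None

    def fmt(models: set[str]) -> str:
        # Sort for stable output; fall back to "unknown" for an empty set.
        return ", ".join(sorted(models)) or "unknown"

    return f"Agent model: {fmt(before)} -> {fmt(after)}"
```

The returned line could be printed next to the existing judge config comparison, so a reader sees both the judge and the agent before interpreting score deltas.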
Open question
- If a report evaluates multiple sessions with different agent models, how should the agent model be displayed? Options: show the unique set, show it per sample, or require a homogeneous agent model per report.
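To make the "unique set" option concrete, here is a rough sketch of a header label built from the per-session model ids. The function name agent_model_label and the "model (count)" format are assumptions for illustration, not a decided design.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def agent_model_label(model_ids: Iterable[str | None]) -> str:
    """Build a header label from the model_id of every session in a report.

    A homogeneous report yields just the model name. A mixed report yields
    each model with its session count. Missing ids count as "unknown".
    """
    counts = Counter(model_id or "unknown" for model_id in model_ids)
    if not counts:
        return "unknown"
    if len(counts) == 1:
        return next(iter(counts))
    # Sort by model name so the label is stable across runs.
    return ", ".join(f"{model} ({n})" for model, n in sorted(counts.items()))
```

With this approach a single-model report keeps a clean header, while a mixed report stays readable without a per-sample column. The per-sample and homogeneous-only options would need different handling.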
Depends On
#173 (judge config in report) — already shipped (PR #188).