Summary
The report JSON's config section stores llm_model but not llm_provider, temperature, or max_tokens. This is the first half of the judge config work — adding the fields and serializing them.
The --diff warning comparison is handled separately in #187.
Scope (~60K token budget)
| File | Change | Lines |
|---|---|---|
| src/raki/metrics/protocol.py | Add max_tokens: int \| None = None to MetricConfig | ~5 |
| src/raki/model/report.py | Add llm_provider, llm_temperature, llm_max_tokens fields to EvalReport | ~10 |
| src/raki/metrics/engine.py | Serialize all judge fields into report.config dict | ~10 |
| src/raki/report/json_report.py | No changes needed (uses model_dump) | 0 |
| tests/test_report.py | Config serialization tests, backward compat | ~60 |
| tests/test_cli.py | Verify JSON output includes new fields | ~30 |
Current State
"config": {
"llm_model": "claude-sonnet-4-6",
"metrics": [...],
"skip_llm": false
}
Target State
"config": {
"llm_provider": "vertex-anthropic",
"llm_model": "claude-sonnet-4-6",
"llm_temperature": 0.0,
"llm_max_tokens": 4096,
"metrics": [...],
"skip_llm": false
}
Backward Compatibility
Old reports missing llm_provider/llm_temperature/llm_max_tokens load without error — fields default to None.
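Acceptance Criteria
- max_tokens: int | None = None added to MetricConfig in protocol.py
- llm_provider: str | None, llm_temperature: float | None, llm_max_tokens: int | None fields on EvalReport (or serialized into config dict)
- MetricsEngine.run() populates all four judge fields (llm_provider, llm_model, llm_temperature, llm_max_tokens) into report config when LLM is used
- With skip_llm=True, judge fields are None in report config
- Old reports load via load_json_report() without error (missing fields default to None)
- Any duplicate llm_model key is removed, if needed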
Implementation Plan
Task 1: Add max_tokens to MetricConfig
Files: src/raki/metrics/protocol.py, tests/test_report.py
- Write failing test: MetricConfig(max_tokens=4096) → config.max_tokens == 4096
- Add max_tokens: int | None = None to MetricConfig
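A minimal sketch of Task 1, under stated assumptions: MetricConfig is modeled here as a plain dataclass, and the three fields other than max_tokens are guesses at what already exists. The real class in src/raki/metrics/protocol.py may be a Pydantic model with different fields, so treat this as an illustration of the test-first step rather than the actual definition.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MetricConfig:
    """Stand-in for the real MetricConfig; only judge-related fields are shown."""

    llm_provider: str | None = None  # assumed existing field
    llm_model: str | None = None  # assumed existing field
    temperature: float | None = None  # assumed existing field
    max_tokens: int | None = None  # new field added by Task 1


def test_max_tokens_is_stored() -> None:
    # The failing test from the plan: passes once the field exists.
    config = MetricConfig(max_tokens=4096)
    assert config.max_tokens == 4096


def test_max_tokens_defaults_to_none() -> None:
    # Existing callers that never pass max_tokens keep working.
    assert MetricConfig().max_tokens is None
```

Defaulting to None rather than a number keeps "not configured" distinguishable from an explicit limit when the value is later written into the report.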
Task 2: Serialize judge fields in engine
Files: src/raki/metrics/engine.py, tests/test_report.py
- Write failing test: run engine with requires_llm=True metric → report config has llm_provider, llm_model, llm_temperature, llm_max_tokens
- Write failing test: run engine with skip_llm=True → report config has None for all judge fields
- Update MetricsEngine.run() to populate config dict with all four fields from self._config
- Remove any duplicate llm_model key
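The serialization step in Task 2 could look roughly like the sketch below. Note the assumptions: build_report_config is a hypothetical helper (the plan only says MetricsEngine.run() populates the dict from self._config), JudgeConfig is a stand-in for whatever self._config holds, and its attribute names are guesses.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class JudgeConfig:
    """Hypothetical stand-in for the engine's self._config."""

    llm_provider: str | None = None
    llm_model: str | None = None
    temperature: float | None = None
    max_tokens: int | None = None


def build_report_config(
    config: JudgeConfig, metric_names: list[str], skip_llm: bool
) -> dict[str, Any]:
    """Build the report's config dict; judge fields are None when the LLM is skipped."""
    use_llm = not skip_llm
    # Each key is written exactly once in this literal, so llm_model
    # cannot end up duplicated.
    return {
        "llm_provider": config.llm_provider if use_llm else None,
        "llm_model": config.llm_model if use_llm else None,
        "llm_temperature": config.temperature if use_llm else None,
        "llm_max_tokens": config.max_tokens if use_llm else None,
        "metrics": list(metric_names),
        "skip_llm": skip_llm,
    }
```

Called with the values from Target State and skip_llm=False, this yields the four judge keys populated; with skip_llm=True the same four keys are present but set to None, which matches the second failing test in the list above.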
Task 3: Backward compatibility
Files: tests/test_report.py
- Write failing test: load a JSON fixture without judge fields → no error, missing fields are None
- Verify load_json_report() handles missing fields gracefully (Pydantic defaults)
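The backward-compatibility test can be sketched as follows. This is a stdlib stand-in only: the plan relies on Pydantic defaults inside load_json_report(), whereas ReportConfig and load_report_config here are hypothetical names that imitate the same "missing field becomes None" behavior with a dataclass.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class ReportConfig:
    """Hypothetical stand-in for the config portion of EvalReport."""

    llm_provider: str | None = None
    llm_model: str | None = None
    llm_temperature: float | None = None
    llm_max_tokens: int | None = None
    metrics: list[str] = field(default_factory=list)
    skip_llm: bool = False


def load_report_config(raw_json: str) -> ReportConfig:
    """Load the config section, letting absent keys fall back to their defaults."""
    data: dict[str, Any] = json.loads(raw_json)["config"]
    known = {f.name for f in fields(ReportConfig)}
    return ReportConfig(**{k: v for k, v in data.items() if k in known})


# Fixture in the pre-change shape: llm_model only, no other judge fields.
OLD_REPORT = (
    '{"config": {"llm_model": "claude-sonnet-4-6", "metrics": [], "skip_llm": false}}'
)


def test_old_report_loads_with_none_defaults() -> None:
    config = load_report_config(OLD_REPORT)
    assert config.llm_model == "claude-sonnet-4-6"
    assert config.llm_provider is None
    assert config.llm_temperature is None
    assert config.llm_max_tokens is None
```

In the real code the fixture would be a full report file passed to load_json_report(), and the assertions would target wherever the new fields end up living (on EvalReport or inside its config dict).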
Task 4: Verification
uv run pytest tests/test_report.py -v
uv run pytest tests/ -v -m "not slow" — no regressions
uv run ruff check src/ tests/ && uv run ruff format src/ tests/
uv run ty check src/raki/