feat: serialize judge config fields into report JSON

## Summary

The report JSON's config section stores `llm_model` but not `llm_provider`, `temperature`, or `max_tokens`. This issue covers the first half of the judge config work: adding the missing fields and serializing them.

The `--diff` warning comparison is handled separately in #187.

## Scope (~60K token budget)

| File | Change | Lines |
|------|--------|-------|
| `src/raki/metrics/protocol.py` | Add `max_tokens: int \| None = None` to `MetricConfig` | ~5 |
| `src/raki/model/report.py` | Add `llm_provider`, `llm_temperature`, `llm_max_tokens` fields to `EvalReport` | ~10 |
| `src/raki/metrics/engine.py` | Serialize all judge fields into `report.config` dict | ~10 |
| `src/raki/report/json_report.py` | No changes needed (uses `model_dump`) | 0 |
| `tests/test_report.py` | Config serialization tests, backward compat | ~60 |
| `tests/test_cli.py` | Verify JSON output includes new fields | ~30 |

## Current State

```json
"config": {
  "llm_model": "claude-sonnet-4-6",
  "metrics": [...],
  "skip_llm": false
}
```

## Target State

```json
"config": {
  "llm_provider": "vertex-anthropic",
  "llm_model": "claude-sonnet-4-6",
  "llm_temperature": 0.0,
  "llm_max_tokens": 4096,
  "metrics": [...],
  "skip_llm": false
}
```

## Backward Compatibility

Old reports that lack `llm_provider`, `llm_temperature`, or `llm_max_tokens` load without error; the missing fields default to `None`.


## Acceptance Criteria

- [ ] `max_tokens: int | None = None` added to `MetricConfig` in `protocol.py`
- [ ] `llm_provider: str | None`, `llm_temperature: float | None`, `llm_max_tokens: int | None` fields on `EvalReport` (or serialized into `config` dict)
- [ ] `MetricsEngine.run()` populates all four judge fields (`llm_provider`, `llm_model`, `llm_temperature`, `llm_max_tokens`) into report config when LLM is used
- [ ] When `skip_llm=True`, judge fields are None in report config
- [ ] Old JSON reports without these fields load via `load_json_report()` without error (default to None)
- [ ] No duplicate keys in config dict (consolidate `llm_model` if needed)
- [ ] Tests: (a) engine sets all judge fields when LLM used, (b) engine sets None when skip_llm, (c) old report without fields loads cleanly, (d) roundtrip JSON serialization preserves new fields
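One way to check the roundtrip and duplicate-key criteria together is to re-parse the serialized config with a hook that rejects repeated keys. The sketch below uses only the standard `json` module; `reject_duplicate_keys` and `roundtrip` are hypothetical helper names, not existing functions in the codebase.

```python
import json
from typing import Any


def reject_duplicate_keys(pairs: list[tuple[str, Any]]) -> dict[str, Any]:
    """object_pairs_hook that raises if a JSON object repeats a key."""
    seen: dict[str, Any] = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key in report JSON: {key!r}")
        seen[key] = value
    return seen


def roundtrip(config: dict[str, Any]) -> dict[str, Any]:
    """Serialize a config dict to JSON and parse it back, checking for duplicates."""
    return json.loads(json.dumps(config), object_pairs_hook=reject_duplicate_keys)


# Values taken from the Target State example; the metrics list is left empty here.
target_config = {
    "llm_provider": "vertex-anthropic",
    "llm_model": "claude-sonnet-4-6",
    "llm_temperature": 0.0,
    "llm_max_tokens": 4096,
    "metrics": [],
    "skip_llm": False,
}
restored = roundtrip(target_config)
```

A plain Python dict cannot hold duplicate keys, so the hook mainly guards against JSON text that was assembled or edited outside the serializer.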



## Implementation Plan

### Task 1: Add max_tokens to MetricConfig

**Files:** `src/raki/metrics/protocol.py`, `tests/test_report.py`

1. Write failing test: `MetricConfig(max_tokens=4096)` → `config.max_tokens == 4096`
2. Add `max_tokens: int | None = None` to `MetricConfig`

### Task 2: Serialize judge fields in engine

**Files:** `src/raki/metrics/engine.py`, `tests/test_report.py`

1. Write failing test: run engine with `requires_llm=True` metric → report config has `llm_provider`, `llm_model`, `llm_temperature`, `llm_max_tokens`
2. Write failing test: run engine with `skip_llm=True` → report config has None for all judge fields
3. Update `MetricsEngine.run()` to populate config dict with all four fields from `self._config`
4. Remove any duplicate `llm_model` key

### Task 3: Backward compatibility

**Files:** `tests/test_report.py`

1. Write failing test: load a JSON fixture without judge fields → no error, missing fields are None
2. Verify `load_json_report()` handles missing fields gracefully (Pydantic defaults)

### Task 4: Verification

1. `uv run pytest tests/test_report.py -v`
2. `uv run pytest tests/ -v -m "not slow"` — no regressions
3. `uv run ruff check src/ tests/ && uv run ruff format src/ tests/`
4. `uv run ty check src/raki/`
