feat: track judge cost per report #174

## Summary

RAKI doesn't track what the LLM judge calls cost when generating a report. Users can't answer "how much did this evaluation cost?" without checking their provider billing.

## Current State

- `JudgeLogger` (`src/raki/metrics/ragas/llm_setup.py:141-171`) logs metric/input/score/reason per call — **no token counts, no cost**
- LLM clients created via `create_ragas_llm()` using `AsyncAnthropicVertex`, `AsyncAnthropic`, or `genai.Client`
- Token usage from judge calls is completely opaque — Ragas/instructor discard the raw API response and return only parsed Pydantic models
- The Anthropic SDK returns `usage.input_tokens` + `usage.output_tokens` on every `Message` response

## Implementation Approach

### 1. Token Accumulator (cross-cutting, engine-level)

```python
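from dataclasses import dataclass


@dataclass
class TokenAccumulator:
    """Running totals across every judge call in one engine run."""

    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0  # number of responses that reported usage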

Owned by `MetricsEngine.run()`, not per-metric. Created once, injected into `create_ragas_llm()` via `MetricConfig`, read after all metrics complete.
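
The ownership flow above could look roughly like the sketch below. `MetricConfig`, `fake_metric`, and `run_engine` are simplified stand-ins invented for illustration, not RAKI's actual classes; only the `token_accumulator` field name comes from this issue.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TokenAccumulator:
    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0


@dataclass
class MetricConfig:  # hypothetical stand-in for the real MetricConfig
    token_accumulator: Optional[TokenAccumulator] = None


def fake_metric(config: MetricConfig) -> float:
    # Stand-in for a metric whose single judge call used 100 input / 20 output tokens.
    if config.token_accumulator is not None:
        config.token_accumulator.input_tokens += 100
        config.token_accumulator.output_tokens += 20
        config.token_accumulator.calls += 1
    return 1.0


def run_engine(metrics) -> dict:
    accumulator = TokenAccumulator()  # created once per run, shared by all metrics
    config = MetricConfig(token_accumulator=accumulator)
    scores = [metric(config) for metric in metrics]
    # Totals are read only after every metric has finished.
    judge_cost = {**asdict(accumulator), "total_usd": None}
    return {"scores": scores, "judge_cost": judge_cost}


report_config = run_engine([fake_metric, fake_metric])
```

Because the metrics only ever see the accumulator through the config, none of them needs to know that token tracking exists.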

### 2. Client monkey-patch (not a proxy class)

Patch `client.messages.create` in-place before passing to `llm_factory()`. This preserves client identity (instructor does structural checks on the type). The monkey-patch sits below both Ragas and instructor — transparent to both.

```python
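import functools


def patch_client_for_token_tracking(client, accumulator: TokenAccumulator) -> None:
    """Wrap `client.messages.create` in place so each response's usage is tallied."""
    original_create = client.messages.create

    @functools.wraps(original_create)
    async def tracked_create(*args, **kwargs):
        response = await original_create(*args, **kwargs)
        # Responses without a usage object (e.g. streams) are passed through uncounted.
        usage = getattr(response, "usage", None)
        if usage is not None:
            accumulator.input_tokens += usage.input_tokens
            accumulator.output_tokens += usage.output_tokens
            accumulator.calls += 1
        return response

    # Replace the method on this instance only; the client's type is unchanged.
    client.messages.create = tracked_create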

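No lock is needed as long as all judge calls run on one event loop: asyncio schedules coroutines cooperatively on a single thread, and the `+=` updates run after the `await` with no suspension point between them, so no other coroutine can interleave.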
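
A quick way to check the no-lock claim is to run many concurrent calls through a patched fake client and compare totals. The sketch below is self-contained; `FakeMessages` is a hypothetical stand-in for the SDK's `client.messages`, and the token counts are arbitrary.

```python
import asyncio
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class TokenAccumulator:
    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0


class FakeMessages:
    """Hypothetical stand-in for `client.messages` on an async Anthropic client."""

    async def create(self, **kwargs):
        await asyncio.sleep(0)  # yield to the event loop, as a network call would
        return SimpleNamespace(usage=SimpleNamespace(input_tokens=10, output_tokens=2))


def patch_client_for_token_tracking(client, accumulator):
    original_create = client.messages.create

    async def tracked_create(*args, **kwargs):
        response = await original_create(*args, **kwargs)
        usage = getattr(response, "usage", None)
        if usage is not None:
            # No await between these statements, so no other coroutine can interleave.
            accumulator.input_tokens += usage.input_tokens
            accumulator.output_tokens += usage.output_tokens
            accumulator.calls += 1
        return response

    client.messages.create = tracked_create


async def main() -> TokenAccumulator:
    client = SimpleNamespace(messages=FakeMessages())
    accumulator = TokenAccumulator()
    patch_client_for_token_tracking(client, accumulator)
    # Fire 50 overlapping calls on one event loop.
    await asyncio.gather(*(client.messages.create(model="m") for _ in range(50)))
    return accumulator


totals = asyncio.run(main())
```

The same shape works as a unit test for the client patch. It does not cover the case where judge calls are dispatched from multiple threads, which would need a lock.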

### 3. Report output

Add to `EvalReport` config dict in `MetricsEngine.run()`:

```json
"judge_cost": {
  "input_tokens": 15000,
  "output_tokens": 3000,
  "calls": 24,
  "total_usd": null
}
```

`total_usd` is computed in the report layer using a pricing lookup (tokens × model price). Set to `null` if pricing for the model is unknown.
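
A minimal sketch of the pricing lookup, assuming prices are keyed by model name and quoted per million tokens. The table, the model name, and the function name `compute_total_usd` are hypothetical, and the prices are placeholders rather than real provider rates.

```python
from typing import Optional

# Placeholder prices in USD per million tokens; NOT real provider pricing.
PRICING_USD_PER_MTOK = {
    "example-model": {"input": 1.00, "output": 5.00},
}


def compute_total_usd(model: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    price = PRICING_USD_PER_MTOK.get(model)
    if price is None:
        return None  # unknown pricing: report null rather than guess
    cost = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
    return round(cost, 6)


known = compute_total_usd("example-model", 15000, 3000)
unknown = compute_total_usd("unlisted-model", 15000, 3000)
```

Returning `None` for an unlisted model maps directly onto `"total_usd": null` in the report, and keeping the table in the report layer means the metrics layer only ever deals in token counts.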

### 4. Google provider

For `genai.Client`, add a separate patch function targeting the equivalent method surface. Both patch functions write to the same `TokenAccumulator`, which is provider-agnostic.
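
One possible shape for the Google patch is sketched below. It assumes the judge path goes through `client.aio.models.generate_content` and that responses expose `usage_metadata.prompt_token_count` and `usage_metadata.candidates_token_count`; both assumptions should be verified against the installed `google-genai` version and against the method Ragas/instructor actually call. The function name and the fake client are hypothetical.

```python
import asyncio
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class TokenAccumulator:
    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0


def patch_genai_client_for_token_tracking(client, accumulator):
    # Assumption: async judge calls go through client.aio.models.generate_content.
    original = client.aio.models.generate_content

    async def tracked_generate_content(*args, **kwargs):
        response = await original(*args, **kwargs)
        usage = getattr(response, "usage_metadata", None)
        if usage is not None:
            # Treat missing or None counts as zero instead of raising.
            accumulator.input_tokens += getattr(usage, "prompt_token_count", 0) or 0
            accumulator.output_tokens += getattr(usage, "candidates_token_count", 0) or 0
            accumulator.calls += 1
        return response

    client.aio.models.generate_content = tracked_generate_content


async def fake_generate_content(**kwargs):
    # Stand-in response carrying only the fields this sketch reads.
    usage = SimpleNamespace(prompt_token_count=7, candidates_token_count=3)
    return SimpleNamespace(usage_metadata=usage)


async def main() -> TokenAccumulator:
    models = SimpleNamespace(generate_content=fake_generate_content)
    client = SimpleNamespace(aio=SimpleNamespace(models=models))
    accumulator = TokenAccumulator()
    patch_genai_client_for_token_tracking(client, accumulator)
    await client.aio.models.generate_content(model="m", contents="hi")
    await client.aio.models.generate_content(model="m", contents="hi")
    return accumulator


google_totals = asyncio.run(main())
```

The accumulator fields stay the same as for Anthropic, so the report layer does not need to know which provider produced the counts.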

## Files to Change

- `src/raki/metrics/ragas/llm_setup.py` — add `TokenAccumulator`, `patch_client_for_token_tracking()`, apply patch in `create_ragas_llm()`
- `src/raki/metrics/protocol.py` — add `token_accumulator: TokenAccumulator | None = None` to `MetricConfig`
- `src/raki/metrics/engine.py` — create accumulator, inject into config, read totals into report config
- `src/raki/report/cli_summary.py` — display judge cost in CLI summary
- `src/raki/report/html_report.py` — display judge cost in HTML report
- Tests for: accumulator, client patch, engine aggregation, report display

## What NOT to do

- Don't thread tokens through `ScoringState` or `MetricResult` — token tracking is a cross-cutting concern, not per-metric
- Don't subclass or `__getattr__`-proxy the client — breaks instructor's structural checks
- Don't estimate tokens post-hoc from logged text — inaccurate, can't capture output tokens
- Don't put USD pricing logic in the metrics layer — belongs in report layer
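
To illustrate the second point, the sketch below shows why a `__getattr__` proxy fails a type check that an in-place patch passes. It assumes the check behaves like `isinstance`; `RealClient`, `ProxyClient`, and `accepts_client` are hypothetical stand-ins, not instructor or SDK code.

```python
class Messages:
    def create(self):
        return "response"


class RealClient:  # hypothetical stand-in for an SDK client class
    def __init__(self):
        self.messages = Messages()


class ProxyClient:
    """Forwards attribute access to the wrapped client, but is a different type."""

    def __init__(self, inner):
        self._inner = inner

    def __getattr__(self, name):
        return getattr(self._inner, name)


def accepts_client(client) -> bool:
    # Stand-in for a library that validates the client by its type.
    return isinstance(client, RealClient)


proxied = ProxyClient(RealClient())

patched = RealClient()
original_create = patched.messages.create
patched.messages.create = lambda: original_create()  # in-place patch, same object and type

proxy_ok = accepts_client(proxied)
patch_ok = accepts_client(patched)
```

The proxy still forwards `proxied.messages.create()` correctly, but it is rejected before any call is made, whereas the patched instance remains a `RealClient`.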

## Depends On

#182 (shared scoring loop) — already shipped (PR #201).
