feat(experiments): wire eval_model config field to evaluator construction #2124

Merged
bug-ops merged 1 commit into main from feat-issue-2113-wire-eval-model-to-evaluator on Mar 22, 2026
@bug-ops bug-ops commented Mar 22, 2026

Closes #2113

Summary

  • When experiments.eval_model is set (e.g. openai/gpt-4o, claude/claude-opus-4-6), a dedicated judge provider is created for the evaluator so the judge is independent from the agent under test
  • Falls back to the agent's primary provider when eval_model is not configured (existing behavior preserved, no breaking changes)
  • Wired in both paths: agent loop (/experiment start) and standalone CLI (--experiment-run)
  • Removes the TODO(#eval-model) comment from experiment_cmd.rs
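
The fallback rule in the summary can be sketched as follows. This is a minimal illustration, not the code from this PR: `Provider` is a hypothetical stand-in for `AnyProvider`, and `judge_provider` is an assumed helper name that does not appear in the diff.

```rust
/// Hypothetical stand-in for the real provider type (the PR calls it AnyProvider).
#[derive(Debug, Clone, PartialEq)]
struct Provider {
    name: String,
}

/// Pick the provider the evaluator should use as judge.
/// Assumed helper for illustration: prefer the dedicated eval provider when
/// one was built from experiments.eval_model, otherwise fall back to the
/// agent's primary provider.
fn judge_provider<'a>(eval: Option<&'a Provider>, primary: &'a Provider) -> &'a Provider {
    eval.unwrap_or(primary)
}
```

With `eval_model` unset the function returns the primary provider, which matches the "existing behavior preserved" claim above.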

Changes

  • ExperimentState: new eval_provider: Option<AnyProvider> field (feature-gated)
  • AppBuilder::build_eval_provider(): creates provider from eval_model spec using existing create_summary_provider (supports ollama/<model>, claude, openai, compatible/<name>)
  • Agent::with_eval_provider(): builder method to inject the eval provider
  • runner.rs: wired in agent build path and run_experiment_session
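
A rough sketch of how these pieces could fit together, under stated assumptions: `Provider`, `parse_spec`, and `build_agent` are hypothetical names, the `backend/model` split is only inferred from the spec examples in this PR, and the real `create_summary_provider` and `Agent::with_eval_provider` signatures are not shown here and may differ.

```rust
/// Hypothetical stand-in for AnyProvider: just records what the spec asked for.
#[derive(Debug, Clone, PartialEq)]
struct Provider {
    backend: String,
    model: Option<String>,
}

/// Assumed parsing rule: "backend/model" or a bare "backend".
/// The real create_summary_provider may parse specs differently.
fn parse_spec(spec: &str) -> Provider {
    match spec.split_once('/') {
        Some((backend, model)) => Provider {
            backend: backend.to_string(),
            model: Some(model.to_string()),
        },
        None => Provider {
            backend: spec.to_string(),
            model: None,
        },
    }
}

/// Simplified agent holding only the optional eval provider.
#[derive(Debug, Default)]
struct Agent {
    eval_provider: Option<Provider>,
}

impl Agent {
    /// Builder-style injection, mirroring the method name from the PR
    /// (the signature here is an assumption).
    fn with_eval_provider(mut self, provider: Provider) -> Self {
        self.eval_provider = Some(provider);
        self
    }
}

/// Hypothetical wiring: inject a judge only when eval_model is configured.
fn build_agent(eval_model: Option<&str>) -> Agent {
    let agent = Agent::default();
    match eval_model {
        Some(spec) => agent.with_eval_provider(parse_spec(spec)),
        None => agent,
    }
}
```

Keeping the field as an `Option` lets the unconfigured case stay a no-op, so callers that never set `eval_model` go through the same build path as before.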

Test plan

  • All 6366 existing tests pass (cargo nextest run --workspace --features full --lib --bins)
  • cargo clippy --workspace --features full -- -D warnings clean
  • cargo +nightly fmt --check clean
  • Feature is enabled = false by default, so there is no production impact

@bug-ops bug-ops enabled auto-merge (squash) March 22, 2026 13:38
@github-actions github-actions bot added the labels enhancement (New feature or request), size/M (Medium PR, 51-200 lines), documentation (Improvements or additions to documentation), rust (Rust code changes), and core (zeph-core crate), and removed the size/M label on Mar 22, 2026
…tion (#2113)

When experiments.eval_model is set, create a dedicated judge provider so
the evaluator is independent from the agent under test. Falls back to the
primary provider when eval_model is unset (existing behavior preserved).

- Add eval_provider field to ExperimentState (feature-gated)
- Add AppBuilder::build_eval_provider() using create_summary_provider
- Add Agent::with_eval_provider() builder method
- Wire eval provider in both agent (/experiment start) and --experiment-run paths
- Remove the TODO(#eval-model) comment from experiment_cmd.rs
@bug-ops bug-ops force-pushed the feat-issue-2113-wire-eval-model-to-evaluator branch from ede64f7 to f0004d0 Compare March 22, 2026 13:43
@github-actions github-actions bot added the size/M Medium PR (51-200 lines) label Mar 22, 2026
@bug-ops bug-ops merged commit f6bc748 into main Mar 22, 2026
25 checks passed
@bug-ops bug-ops deleted the feat-issue-2113-wire-eval-model-to-evaluator branch March 22, 2026 13:51