fix(experiments): apply generation overrides to subject provider before evaluation #1427

Merged
bug-ops merged 3 commits into main from experiment-engine-variation on Mar 9, 2026

bug-ops (Owner) commented Mar 9, 2026

Summary

  • Fixes #1407 (Experiment engine: variations not applied to subject provider): the experiment engine was evaluating every variation with the same unmodified subject provider, producing delta=0.0 for every variation
  • Adds GenerationOverrides struct to zeph-llm and with_generation_overrides() builder to all provider types and AnyProvider
  • Engine now creates a cloned+patched provider per variation before calling evaluator.evaluate()
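
A minimal sketch of the clone-and-patch flow described above, not the code from this PR. Only `GenerationOverrides`, its field names, and `with_generation_overrides()` come from the description; the field types, `SketchProvider`, `evaluate`, and `run_variations` are hypothetical stand-ins chosen for illustration.

```rust
// Field names follow the PR description; the types are assumptions.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct GenerationOverrides {
    pub temperature: Option<f64>,
    pub top_p: Option<f64>,
    pub top_k: Option<usize>,
    pub frequency_penalty: Option<f64>,
    pub presence_penalty: Option<f64>,
}

// Hypothetical provider used only for this sketch.
#[derive(Clone, Debug)]
pub struct SketchProvider {
    base_temperature: f64,
    generation_overrides: Option<GenerationOverrides>,
}

impl SketchProvider {
    pub fn new(base_temperature: f64) -> Self {
        Self { base_temperature, generation_overrides: None }
    }

    // Builder: consumes the provider and returns it with overrides attached.
    #[must_use]
    pub fn with_generation_overrides(mut self, overrides: GenerationOverrides) -> Self {
        self.generation_overrides = Some(overrides);
        self
    }

    // An override wins over the provider default when it is set.
    pub fn effective_temperature(&self) -> f64 {
        self.generation_overrides
            .as_ref()
            .and_then(|o| o.temperature)
            .unwrap_or(self.base_temperature)
    }
}

// Stand-in for evaluator.evaluate(): the "score" is simply the temperature
// the provider would send, so a patched provider yields a different score.
pub fn evaluate(provider: &SketchProvider) -> f64 {
    provider.effective_temperature()
}

// Evaluate each variation against a cloned and patched copy of the subject,
// leaving the original subject untouched, and return the deltas to baseline.
pub fn run_variations(subject: &SketchProvider, variations: &[GenerationOverrides]) -> Vec<f64> {
    let baseline = evaluate(subject);
    variations
        .iter()
        .map(|v| {
            let patched = subject.clone().with_generation_overrides(v.clone());
            evaluate(&patched) - baseline
        })
        .collect()
}
```

Passing `&self.subject` straight to the evaluator on every iteration, as the engine did before this fix, would make every delta in this sketch zero, which is the symptom reported in #1407.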

Changes

  • zeph-llm/src/provider.rs: new GenerationOverrides struct (temperature, top_p, top_k, frequency_penalty, presence_penalty)
  • zeph-llm/src/{ollama,claude,openai,compatible,mock}.rs: generation_overrides field + with_generation_overrides() builder; overrides applied in chat(), chat_stream(), chat_with_tools(), chat_typed() paths
  • zeph-llm/src/any.rs: with_generation_overrides() dispatches to each concrete provider variant; the Orchestrator and Router variants log a warning and return self unchanged
  • zeph-core/src/experiments/snapshot.rs: removes local GenerationOverrides, re-exports from zeph_llm
  • zeph-core/src/experiments/engine.rs: line 249 creates patched provider with overrides before each evaluator.evaluate() call; removes stale TODO comment
  • Claude temperature field corrected from Option<u32> (thousandths) to Option<f64> across all request structs
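
The `AnyProvider` dispatch listed above could look roughly like the following sketch. The variant payloads (`Leaf`, `Vec<Leaf>`), the trimmed one-field `GenerationOverrides`, and the use of `eprintln!` in place of the project's real logging are all assumptions made to keep the example self-contained; only the variant names and the "warn and return self unchanged" behavior come from the description.

```rust
// Trimmed to one field for brevity; the real struct has five.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct GenerationOverrides {
    pub temperature: Option<f64>,
}

// Hypothetical concrete provider that stores the overrides it was given.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct Leaf {
    pub generation_overrides: Option<GenerationOverrides>,
}

impl Leaf {
    #[must_use]
    pub fn with_generation_overrides(mut self, overrides: GenerationOverrides) -> Self {
        self.generation_overrides = Some(overrides);
        self
    }
}

// Payload types are placeholders; only the variant names follow the PR.
#[derive(Clone, Debug, PartialEq)]
pub enum AnyProvider {
    Ollama(Leaf),
    Claude(Leaf),
    Orchestrator(Vec<Leaf>),
    Router(Vec<Leaf>),
}

impl AnyProvider {
    #[must_use]
    pub fn with_generation_overrides(self, overrides: GenerationOverrides) -> Self {
        match self {
            // Concrete providers forward to their own builder.
            Self::Ollama(p) => Self::Ollama(p.with_generation_overrides(overrides)),
            Self::Claude(p) => Self::Claude(p.with_generation_overrides(overrides)),
            // Composite providers are returned as-is after a warning.
            other @ (Self::Orchestrator(_) | Self::Router(_)) => {
                eprintln!("warning: generation overrides ignored for composite provider");
                other
            }
        }
    }
}
```

Returning `self` unchanged keeps the builder total over all variants, at the cost that an experiment run against an Orchestrator or Router subject would still see identical scores for every variation.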

Test plan

  • cargo +nightly fmt --check — clean
  • cargo clippy --workspace --features full -- -D warnings — clean
  • cargo nextest run --config-file .github/nextest.toml --workspace --features full --lib --bins — 4860 passed (+3 new unit tests for with_generation_overrides state propagation)

bug-ops added 2 commits March 9, 2026 07:04
…re evaluation (#1407)

The experiment engine was calling evaluator.evaluate(&self.subject) with
the same unmodified provider on every iteration, producing delta=0.0 for
all variations.

Add GenerationOverrides to zeph-llm and implement with_generation_overrides()
on all provider types (Ollama, Claude, OpenAI, Compatible, Mock) and
AnyProvider. Apply overrides in chat(), chat_stream(), chat_with_tools(),
and chat_typed() paths. Remove local GenerationOverrides from snapshot.rs
and re-export from zeph-llm. Engine now creates a cloned+patched provider
per variation before evaluation.

- Claude temperature field changed from Option<u32> to Option<f64>
- Ollama: frequency_penalty/presence_penalty silently dropped (unsupported
  by ModelOptions; documented with comment and debug log)
- Orchestrator/Router variants warn and return self unchanged
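
As a rough illustration of the Ollama note above, the sketch below maps overrides onto an options type that has no penalty fields and reports which fields were dropped. `SketchModelOptions` and `to_ollama_options` are hypothetical names, not the actual `ModelOptions` API, and the field types are assumptions.

```rust
// Field names follow the PR description; the types are assumptions.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct GenerationOverrides {
    pub temperature: Option<f64>,
    pub top_p: Option<f64>,
    pub top_k: Option<usize>,
    pub frequency_penalty: Option<f64>,
    pub presence_penalty: Option<f64>,
}

// Hypothetical stand-in for the Ollama options type: no penalty fields.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct SketchModelOptions {
    pub temperature: Option<f64>,
    pub top_p: Option<f64>,
    pub top_k: Option<usize>,
}

// Copy the supported fields and collect the names of any unsupported fields
// that were set, so the caller can emit a debug log for them.
pub fn to_ollama_options(
    overrides: &GenerationOverrides,
) -> (SketchModelOptions, Vec<&'static str>) {
    let mut dropped = Vec::new();
    if overrides.frequency_penalty.is_some() {
        dropped.push("frequency_penalty");
    }
    if overrides.presence_penalty.is_some() {
        dropped.push("presence_penalty");
    }
    let options = SketchModelOptions {
        temperature: overrides.temperature,
        top_p: overrides.top_p,
        top_k: overrides.top_k,
    };
    (options, dropped)
}
```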
@github-actions github-actions bot added the bug, documentation, llm, size/L, rust, and core labels and removed the bug label Mar 9, 2026
@bug-ops bug-ops enabled auto-merge (squash) March 9, 2026 06:05
@github-actions github-actions bot added the bug Something isn't working label Mar 9, 2026
@bug-ops bug-ops merged commit 2b6eb38 into main Mar 9, 2026
25 checks passed
@bug-ops bug-ops deleted the experiment-engine-variation branch March 9, 2026 06:26

Labels

  • bug: Something isn't working
  • core: zeph-core crate
  • documentation: Improvements or additions to documentation
  • llm: zeph-llm crate (Ollama, Claude)
  • rust: Rust code changes
  • size/L: Large PR (201-500 lines)

Development

Successfully merging this pull request may close these issues.

Experiment engine: variations not applied to subject provider

1 participant