feat(router): RAPS Bayesian reputation scoring (#1886) #2091
Merged
Add per-provider quality reputation tracking to AgentRouter using Beta distributions (RAPS: Reputation-Adjusted Provider Selection).

Key design decisions:
- Tool execution quality outcomes (InvalidParams only) shift routing scores separately from API availability, tracked in Beta(alpha, beta) per provider
- Session-level decay (default 0.95) shrinks evidence toward the uniform prior, preventing stale observations from permanently biasing routing
- Minimum observation threshold (default 5) gates all routing influence until enough data is accumulated
- Cascade strategy is a no-op: reputation is not used for fixed cost tiers

Architecture critique fixes (all 3 CRIT issues resolved):
- CRIT-1: reputation factor computed per provider (each has its own Beta mean), not a single shared sample that cancels in the argmax comparison
- CRIT-2: EMA blending uses the multiplicative formula `ema_score * (1 + weight * (rep_factor - 0.5) * 2)`, bounded proportionally to the existing EMA score rather than an unbounded additive term
- CRIT-3: Thompson Sampling priors shifted by quality reputation parameters before sampling via shift_thompson_priors() + select_with_priors(), preserving single-distribution sampling guarantees (no convex combination of two samples)

Implementation:
- New module: crates/zeph-llm/src/router/reputation.rs, containing ReputationTracker, ReputationEntry (embeds BetaDist from thompson.rs), apply_decay(), prune(), atomic save/load with 0o600 permissions, 28 unit tests
- RouterProvider: reputation/reputation_state_path/reputation_weight fields, with_reputation() builder, record_quality_outcome(), save_reputation_state(), reputation_stats(); last_active_provider tracking for correct attribution
- ThompsonState: select_with_priors() for shifted-prior sampling, get_distribution() accessor
- LlmProvider trait: record_quality_outcome() default no-op
- AnyProvider: delegates record_quality_outcome() and save_reputation_state()
- native.rs: InvalidParams classified as quality failure; success recorded; network/transient/timeout errors excluded from quality signal
- bootstrap/provider.rs: apply_reputation_if_enabled() wires config to router, skips Cascade strategy
- Config: [llm.router.reputation] section with enabled, decay_factor, weight, min_observations, state_path
- State persisted to ~/.config/zeph/router_reputation_state.json

+28 tests (reputation.rs), total: 6321
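The reputation mechanics described in the commit message can be illustrated with a small, self-contained sketch. It is not the code from `reputation.rs`: the names `ReputationSketch`, `blend_ema`, and `shift_priors` are hypothetical, and two details are assumptions rather than facts from the PR, namely that decay scales the evidence above the uniform Beta(1, 1) prior, and that the Thompson prior shift adds `weight` times that same evidence to the availability parameters.

```rust
/// Hypothetical stand-in for the PR's `ReputationEntry`: a Beta(alpha, beta)
/// belief that a provider produces valid tool calls.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ReputationSketch {
    pub alpha: f64,
    pub beta: f64,
}

impl Default for ReputationSketch {
    /// Uniform prior Beta(1, 1), as stated in the PR summary.
    fn default() -> Self {
        Self { alpha: 1.0, beta: 1.0 }
    }
}

impl ReputationSketch {
    /// Record one quality outcome: a success bumps alpha, a failure bumps beta.
    pub fn record(&mut self, success: bool) {
        if success {
            self.alpha += 1.0;
        } else {
            self.beta += 1.0;
        }
    }

    /// Mean of the Beta distribution, used as the reputation factor.
    pub fn mean(&self) -> f64 {
        self.alpha / (self.alpha + self.beta)
    }

    /// Evidence accumulated beyond the uniform prior.
    pub fn observations(&self) -> f64 {
        (self.alpha - 1.0) + (self.beta - 1.0)
    }

    /// Assumed decay rule: shrink evidence toward Beta(1, 1) by `decay_factor`.
    pub fn apply_decay(&mut self, decay_factor: f64) {
        self.alpha = 1.0 + (self.alpha - 1.0) * decay_factor;
        self.beta = 1.0 + (self.beta - 1.0) * decay_factor;
    }
}

/// Multiplicative EMA blend from CRIT-2, gated by the observation threshold.
/// Below the threshold the EMA score is returned unchanged.
pub fn blend_ema(ema_score: f64, rep: &ReputationSketch, weight: f64, min_observations: u32) -> f64 {
    if rep.observations() < f64::from(min_observations) {
        return ema_score;
    }
    ema_score * (1.0 + weight * (rep.mean() - 0.5) * 2.0)
}

/// Assumed form of the CRIT-3 prior shift: add weighted quality evidence to the
/// Thompson (alpha, beta) pair so that a single Beta distribution is sampled.
pub fn shift_priors(
    thompson: (f64, f64),
    rep: &ReputationSketch,
    weight: f64,
    min_observations: u32,
) -> (f64, f64) {
    if rep.observations() < f64::from(min_observations) {
        return thompson;
    }
    (
        thompson.0 + weight * (rep.alpha - 1.0),
        thompson.1 + weight * (rep.beta - 1.0),
    )
}
```

Because the blend multiplies the EMA score by a factor in `[1 - weight, 1 + weight]`, a provider with a reputation mean of exactly 0.5 keeps its score, and a `weight` of at most 1 keeps the adjusted score non-negative.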
Summary
Adds per-provider quality reputation tracking to `AgentRouter` using Beta distributions (RAPS: Reputation-Adjusted Provider Selection). Reputation tracks tool execution quality outcomes (invalid argument failures) separately from API availability, and adjusts routing scores over time to prefer providers that produce valid tool calls.

- Per-provider `Beta(alpha, beta)` quality distributions; default uniform prior (1,1)
- Session-level decay (`decay_factor = 0.95`) shrinks evidence toward the prior on each load
- Minimum observation threshold (`min_observations = 5`) gates routing influence
- Only `InvalidParams` tool errors count as quality failures; network/transient/timeout errors are excluded

Architecture critique fixes
Three critical math errors from architecture review are resolved:
- CRIT-1: `ema_reputation_factor()` computes each provider's own Beta mean independently, which makes the argmax comparison meaningful
- CRIT-2: EMA blending uses `ema_score * (1 + weight * (rep_factor - 0.5) * 2)`; the score is bounded proportionally to the existing EMA value and is neutral at `rep_factor = 0.5`
- CRIT-3: `shift_thompson_priors()` adds weighted quality evidence into the Thompson Beta params before sampling via `select_with_priors()`; this preserves the single-distribution sampling property, with no convex blend

Config
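As a rough illustration of the `[llm.router.reputation]` section, the five keys named in the commit message could map onto a struct like the one below. This is a sketch, not the PR's `ReputationConfig`: the name `ReputationConfigSketch`, the field types, the `enabled = false` and `weight = 0.3` defaults, and the `validate()` helper are all assumptions. Only the `decay_factor = 0.95` and `min_observations = 5` defaults come from the description.

```rust
use std::path::PathBuf;

/// Hypothetical mirror of the `[llm.router.reputation]` config section.
#[derive(Debug, Clone, PartialEq)]
pub struct ReputationConfigSketch {
    /// Whether reputation influences routing (default here is an assumption).
    pub enabled: bool,
    /// Session-level decay applied to accumulated evidence (0.95 per the PR).
    pub decay_factor: f64,
    /// Strength of the reputation adjustment (0.3 is a placeholder, not from the PR).
    pub weight: f64,
    /// Observations required before reputation affects routing (5 per the PR).
    pub min_observations: u32,
    /// Optional override for the persisted state file; `None` means "use the default path".
    pub state_path: Option<PathBuf>,
}

impl Default for ReputationConfigSketch {
    fn default() -> Self {
        Self {
            enabled: false,
            decay_factor: 0.95,
            weight: 0.3,
            min_observations: 5,
            state_path: None,
        }
    }
}

impl ReputationConfigSketch {
    /// Assumed sanity checks: decay in (0, 1] and weight in [0, 1], so the
    /// multiplicative EMA factor `1 + weight * (rep - 0.5) * 2` stays non-negative.
    pub fn validate(&self) -> Result<(), String> {
        if !(self.decay_factor > 0.0 && self.decay_factor <= 1.0) {
            return Err(format!("decay_factor must be in (0, 1], got {}", self.decay_factor));
        }
        if !(0.0..=1.0).contains(&self.weight) {
            return Err(format!("weight must be in [0, 1], got {}", self.weight));
        }
        Ok(())
    }
}
```

Rejecting a `weight` above 1 is a design choice of this sketch: with the CRIT-2 formula, a larger weight could drive the blended score below zero for a provider whose reputation mean is near 0.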
Files changed
| File | Changes |
| --- | --- |
| `crates/zeph-llm/src/router/reputation.rs` | `ReputationTracker`, decay, prune, save/load, 28 tests |
| `crates/zeph-llm/src/router/mod.rs` | `with_reputation()`, `record_quality_outcome()`, EMA/Thompson blending |
| `crates/zeph-llm/src/router/thompson.rs` | `select_with_priors()`, `get_distribution()` |
| `crates/zeph-llm/src/any.rs` | `record_quality_outcome()` and `save_reputation_state()` |
| `crates/zeph-llm/src/provider.rs` | `record_quality_outcome()` in trait |
| `crates/zeph-core/src/config/providers.rs` | `ReputationConfig` struct, `RouterConfig.reputation` field |
| `crates/zeph-core/src/bootstrap/provider.rs` | `apply_reputation_if_enabled()` |
| `crates/zeph-core/src/agent/tool_execution/native.rs` | |
| `src/init.rs` | `reputation: None` in wizard `RouterConfig` literal |
| `CHANGELOG.md` | |

Tests
+28 unit tests in `reputation.rs`. Total: 6327 passed, 15 skipped.

Test plan
- `cargo +nightly fmt --check`: pass
- `cargo clippy --workspace --features full -- -D warnings`: pass
- `cargo nextest run --config-file .github/nextest.toml --workspace --features full --lib --bins`: 6327 passed