feat(router): RAPS Bayesian reputation scoring (#1886) #2091

Merged
bug-ops merged 3 commits into main from feat-issue-1886-raps-bayesian-reputation
Mar 21, 2026

bug-ops commented Mar 21, 2026

Summary

Adds per-provider quality reputation tracking to AgentRouter using Beta distributions (RAPS — Reputation-Adjusted Provider Selection). Reputation tracks tool execution quality outcomes (invalid argument failures) separately from API availability, and adjusts routing scores over time to prefer providers that produce valid tool calls.

  • Per-provider Beta(alpha, beta) quality distributions; default uniform prior (1,1)
  • Session-level decay (decay_factor = 0.95) shrinks evidence toward prior on each load
  • Minimum observation threshold (min_observations = 5) gates routing influence
  • Cascade strategy is fully excluded: no quality outcomes are collected for it, so it pays no mutex overhead
  • Only InvalidParams tool errors count as quality failures; network/transient/timeout errors excluded
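The bullets above can be sketched as a small Beta tracker. This is a minimal illustration, not the code in reputation.rs: the names `ToolErrorKind`, `quality_signal`, and `QualityBeta` are hypothetical, and the decay rule (scaling the evidence above the uniform prior by `decay_factor`) is an assumption about what "shrinks evidence toward prior" means.

```rust
/// Hypothetical error kinds; only `InvalidParams` is named in the PR.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ToolErrorKind {
    InvalidParams,
    Network,
    Transient,
    Timeout,
}

/// Maps a tool result to a quality signal.
/// Some(true) = success, Some(false) = quality failure, None = ignored.
pub fn quality_signal(result: Result<(), ToolErrorKind>) -> Option<bool> {
    match result {
        Ok(()) => Some(true),
        Err(ToolErrorKind::InvalidParams) => Some(false),
        // Network, transient, and timeout errors say nothing about quality.
        Err(_) => None,
    }
}

/// Per-provider Beta(alpha, beta) quality distribution (illustrative).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct QualityBeta {
    pub alpha: f64,
    pub beta: f64,
}

impl Default for QualityBeta {
    /// Uniform prior Beta(1, 1).
    fn default() -> Self {
        Self { alpha: 1.0, beta: 1.0 }
    }
}

impl QualityBeta {
    /// Adds one observation: success bumps alpha, failure bumps beta.
    pub fn record(&mut self, success: bool) {
        if success {
            self.alpha += 1.0;
        } else {
            self.beta += 1.0;
        }
    }

    /// Evidence accumulated beyond the uniform prior.
    pub fn observations(&self) -> f64 {
        (self.alpha - 1.0) + (self.beta - 1.0)
    }

    /// Posterior mean alpha / (alpha + beta).
    pub fn mean(&self) -> f64 {
        self.alpha / (self.alpha + self.beta)
    }

    /// Assumed decay rule: scale only the evidence above the prior.
    pub fn apply_decay(&mut self, decay_factor: f64) {
        self.alpha = 1.0 + decay_factor * (self.alpha - 1.0);
        self.beta = 1.0 + decay_factor * (self.beta - 1.0);
    }

    /// Returns the mean only once enough evidence exists; None means
    /// reputation should not influence routing yet.
    pub fn factor(&self, min_observations: f64) -> Option<f64> {
        if self.observations() >= min_observations {
            Some(self.mean())
        } else {
            None
        }
    }
}
```

Under these assumptions, a provider with four successes and one InvalidParams failure has exactly five observations and a mean of 5/7. One decay pass at 0.95 leaves 4.75 observations, which falls back below the min_observations = 5 gate until new outcomes arrive.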

Architecture critique fixes

Three critical math errors from architecture review are resolved:

  • CRIT-1 (per-provider sampling): ema_reputation_factor() computes each provider's own Beta mean independently — meaningful argmax comparison
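  • CRIT-2 (unbounded EMA): the blend now uses the multiplicative formula ema_score * (1 + weight * (rep_factor - 0.5) * 2); because rep_factor is a Beta mean in [0, 1], the adjusted score stays within ±weight (as a fraction) of the existing EMA value and is neutral at rep_factor = 0.5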
  • CRIT-3 (Thompson guarantees): shift_thompson_priors() adds weighted quality evidence into Thompson Beta params before sampling via select_with_priors() — preserves single-distribution sampling property, no convex blend
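A rough sketch of the three fixes follows. The function names `adjust_ema`, `pick_provider`, and `shift_priors` are hypothetical stand-ins, not the PR's API. The EMA formula is the one quoted above; the prior-shift rule (adding `weight` times the quality evidence beyond the uniform prior) is only an assumed reading of shift_thompson_priors(). Sampling from the shifted Beta is left out.

```rust
/// Multiplicative adjustment quoted in CRIT-2.
/// With rep_factor in [0, 1] the multiplier lies in [1 - weight, 1 + weight].
pub fn adjust_ema(ema_score: f64, weight: f64, rep_factor: f64) -> f64 {
    ema_score * (1.0 + weight * (rep_factor - 0.5) * 2.0)
}

/// CRIT-1: each provider is scored with its own reputation factor, so the
/// factor does not cancel out of the argmax. Returns the winning index.
pub fn pick_provider(ema: &[f64], rep: &[f64], weight: f64) -> Option<usize> {
    ema.iter()
        .zip(rep)
        .map(|(e, r)| adjust_ema(*e, weight, *r))
        .enumerate()
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}

/// CRIT-3 (assumed form): fold weighted quality evidence into the Thompson
/// Beta parameters so that a single distribution is sampled afterwards.
/// Both tuples are (alpha, beta); the quality prior is taken to be Beta(1, 1).
pub fn shift_priors(thompson: (f64, f64), quality: (f64, f64), weight: f64) -> (f64, f64) {
    let (t_alpha, t_beta) = thompson;
    let (q_alpha, q_beta) = quality;
    (
        t_alpha + weight * (q_alpha - 1.0),
        t_beta + weight * (q_beta - 1.0),
    )
}
```

With weight = 0.3, a provider with EMA 0.8 and reputation 0.2 scores 0.656, while one with EMA 0.75 and reputation 0.9 scores 0.93, so the second provider wins despite its lower raw EMA. A reputation of exactly 0.5 leaves the EMA score unchanged.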

Config

[llm.router.reputation]
enabled = true
decay_factor = 0.95   # (0.0, 1.0], lower = faster forgetting
weight = 0.3          # [0.0, 1.0], blend strength
min_observations = 5  # gate: minimum quality events before routing is affected
# state_path = "~/.config/zeph/router_reputation_state.json"
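The ranges in the comments above suggest a validation step. The sketch below is illustrative only: `ReputationSettings` is a hypothetical stand-in rather than the PR's ReputationConfig, and whether the real struct validates these bounds is not stated in this PR.

```rust
/// Hypothetical mirror of the [llm.router.reputation] keys.
#[derive(Debug, Clone, PartialEq)]
pub struct ReputationSettings {
    pub enabled: bool,
    pub decay_factor: f64,
    pub weight: f64,
    pub min_observations: u64,
    pub state_path: Option<String>,
}

impl ReputationSettings {
    /// Values copied from the example config; state_path is left unset
    /// because it is commented out there.
    pub fn from_example() -> Self {
        Self {
            enabled: true,
            decay_factor: 0.95,
            weight: 0.3,
            min_observations: 5,
            state_path: None,
        }
    }

    /// Checks the documented ranges: decay_factor in (0.0, 1.0] and
    /// weight in [0.0, 1.0]. NaN fails both checks.
    pub fn validate(&self) -> Result<(), String> {
        if !(self.decay_factor > 0.0 && self.decay_factor <= 1.0) {
            return Err(format!(
                "decay_factor must be in (0.0, 1.0], got {}",
                self.decay_factor
            ));
        }
        if !(0.0..=1.0).contains(&self.weight) {
            return Err(format!("weight must be in [0.0, 1.0], got {}", self.weight));
        }
        Ok(())
    }
}
```

Note that decay_factor = 1.0 is inside the allowed range and would disable forgetting entirely, while 0.0 is excluded.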

Files changed

| File | Change |
|---|---|
| crates/zeph-llm/src/router/reputation.rs | New: ReputationTracker, decay, prune, save/load, 28 tests |
| crates/zeph-llm/src/router/mod.rs | Reputation fields, with_reputation(), record_quality_outcome(), EMA/Thompson blending |
| crates/zeph-llm/src/router/thompson.rs | select_with_priors(), get_distribution() |
| crates/zeph-llm/src/any.rs | Delegate record_quality_outcome() and save_reputation_state() |
| crates/zeph-llm/src/provider.rs | Default no-op record_quality_outcome() in trait |
| crates/zeph-core/src/config/providers.rs | ReputationConfig struct, RouterConfig.reputation field |
| crates/zeph-core/src/bootstrap/provider.rs | apply_reputation_if_enabled() |
| crates/zeph-core/src/agent/tool_execution/native.rs | Quality outcome recording after tool execution |
| src/init.rs | reputation: None in wizard RouterConfig literal |
| CHANGELOG.md | Unreleased entry |

Tests

+28 unit tests in reputation.rs. Total: 6327 passed, 15 skipped.

Test plan

  • cargo +nightly fmt --check — pass
  • cargo clippy --workspace --features full -- -D warnings — pass
  • cargo nextest run --config-file .github/nextest.toml --workspace --features full --lib --bins — 6327 passed
  • Merged with main, no conflicts remaining

bug-ops added 2 commits March 21, 2026 23:46
Add per-provider quality reputation tracking to AgentRouter using Beta
distributions (RAPS — Reputation-Adjusted Provider Selection).

Key design decisions:
- Tool execution quality outcomes (InvalidParams only) shift routing scores
  separately from API availability, tracked in Beta(alpha, beta) per provider
- Session-level decay (default 0.95) shrinks evidence toward uniform prior,
  preventing stale observations from permanently biasing routing
- Minimum observation threshold (default 5) gates all routing influence until
  enough data is accumulated
- Cascade strategy is a no-op — reputation not used for fixed cost tiers

Architecture critique fixes (all 3 CRIT issues resolved):
- CRIT-1: reputation factor computed per-provider (each has its own Beta mean),
  not a single shared sample that cancels in argmax comparison
- CRIT-2: EMA blending uses multiplicative formula
  `ema_score * (1 + weight * (rep_factor - 0.5) * 2)`, bounded proportionally
  to existing EMA score rather than unbounded additive term
- CRIT-3: Thompson Sampling priors shifted by quality reputation parameters
  before sampling via shift_thompson_priors() + select_with_priors(), preserving
  single-distribution sampling guarantees (no convex combination of two samples)

Implementation:
- New module: crates/zeph-llm/src/router/reputation.rs with ReputationTracker,
  ReputationEntry (embeds BetaDist from thompson.rs), apply_decay(), prune(),
  atomic save/load with 0o600 permissions, 28 unit tests
- RouterProvider: reputation/reputation_state_path/reputation_weight fields,
  with_reputation() builder, record_quality_outcome(), save_reputation_state(),
  reputation_stats(); last_active_provider tracking for correct attribution
- ThompsonState: select_with_priors() for shifted-prior sampling,
  get_distribution() accessor
- LlmProvider trait: record_quality_outcome() default no-op
- AnyProvider: delegates record_quality_outcome() and save_reputation_state()
- native.rs: InvalidParams classified as quality failure; success recorded;
  network/transient/timeout errors excluded from quality signal
- bootstrap/provider.rs: apply_reputation_if_enabled() wires config to router,
  skips Cascade strategy
- Config: [llm.router.reputation] section — enabled, decay_factor, weight,
  min_observations, state_path
- State persisted to ~/.config/zeph/router_reputation_state.json

+28 tests (reputation.rs), total: 6321
github-actions bot added the documentation, llm, rust, core, enhancement, and size/XL labels Mar 21, 2026
bug-ops enabled auto-merge (squash) March 21, 2026 22:51
bug-ops merged commit 2c4eba4 into main Mar 21, 2026
25 checks passed
bug-ops deleted the feat-issue-1886-raps-bayesian-reputation branch March 21, 2026 23:03

Development

Successfully merging this pull request may close these issues.

research(routing): RAPS Bayesian reputation scoring for AgentRouter robustness
