feat(ranking): BM25 context relevance + Thompson sampling + unified ranker#91
…anker

Upgrades rule injection ranking to address two issues surfaced by the 2026-04 autoresearch synthesis (sec 5):

- **BM25 over (category + description + tags)** replaces substring keyword overlap as the context-relevance signal. Uses the pure-Python `bm25s` package, gated behind `try/except ImportError` so the SDK stays zero-required-deps; it falls back cleanly to the existing keyword scorer when bm25s is unavailable. Added as a `ranking` optional extra and rolled into the `all` group.
- **Thompson sampling over (alpha, beta) posteriors** as an opt-in mode (`GRADATA_THOMPSON_RANKING=1`). When enabled, the confidence term is replaced by `p ~ Beta(alpha, beta_param)` sampled via stdlib `random.betavariate`, giving exploration weight to newly graduated PATTERN-tier rules with uncertain posteriors. Deterministic within a session via the new `session_seed` argument: the same seed yields the same top-K across invocations, and different seeds diverge as expected.
- **Unified ranking paths**. `inject_brain_rules.py` (SessionStart) and `agent_precontext.py` (PreToolUse/Agent) now both call `rule_ranker.rank_rules`, so the algorithm that gets ablation-tested is the algorithm that ships. The qmd wiki-category signal is preserved as an optional `wiki_boost: dict[str, float]` input instead of a hard pre-filter, so BM25 can still rescue strong cross-category matches.
- Kept the `_score` shim in `inject_brain_rules.py` for back-compat with existing tests; did not touch `rule_engine.apply_rules` to avoid churn on its many callers.

Tests: 14 new cases in `tests/test_ranking_v2.py` covering BM25 win over irrelevant rules, runtime fallback when bm25s is monkeypatched out, Thompson determinism under seed, seed divergence across runs, guards on bad Beta params, empty / single-rule / missing-Beta inputs, max-K respect, and wiki_boost routing. Full suite: 2575 passed, 24 skipped.
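For readers who want a concrete picture of the Thompson mode, here is a minimal sketch of seeded Beta sampling using only the stdlib. It is not the code from `rule_ranker`: the function name `thompson_confidence`, the rule-dict keys (`id`, `alpha`, `beta_param`, `confidence`), and the choice to fall back to a stored point estimate on bad parameters are all assumptions made for illustration.

```python
import random


def thompson_confidence(rules, session_seed):
    """Sample one confidence value per rule from Beta(alpha, beta_param).

    Hypothetical helper: key names and fallback behaviour are assumptions.
    """
    # A private RNG keeps draws reproducible for a given seed without
    # touching the global random state.
    rng = random.Random(session_seed)
    sampled = {}
    # Iterate in a fixed order so the draws do not depend on input order.
    for rule in sorted(rules, key=lambda r: r["id"]):
        alpha = rule.get("alpha")
        beta_param = rule.get("beta_param")
        valid = (
            isinstance(alpha, (int, float))
            and isinstance(beta_param, (int, float))
            and alpha > 0
            and beta_param > 0
        )
        if valid:
            # random.betavariate requires both parameters to be > 0.
            sampled[rule["id"]] = rng.betavariate(alpha, beta_param)
        else:
            # Guard on bad or missing Beta params: reuse the point estimate.
            sampled[rule["id"]] = float(rule.get("confidence", 0.5))
    return sampled
```

Calling it twice with the same `session_seed` returns identical samples, which is the property the determinism tests above rely on; a different seed gives a different draw for any rule with a valid posterior.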
📝 Walkthrough

Adds BM25-backed context relevance and optional Thompson sampling to rule/lesson ranking, exposes a new optional dependency group
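To make "BM25-backed context relevance" concrete, the sketch below hand-rolls an Okapi BM25 score over each rule's category, description, and tags, then normalizes to [0, 1]. This is deliberately not the `bm25s` package and not the PR's implementation; the function names, the tokenizer, the `k1`/`b` defaults, and the non-negative IDF variant are assumptions chosen to keep the example self-contained.

```python
import math
import re
from collections import Counter


def _tokens(text):
    # Simplistic tokenizer (assumption): lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def rule_text(rule):
    # Concatenate the three fields the ranker indexes.
    tags = " ".join(rule.get("tags", []))
    return " ".join([rule.get("category", ""), rule.get("description", ""), tags])


def bm25_scores(rules, context_keywords, k1=1.5, b=0.75):
    """Return one BM25 score per rule, scaled so the best match is 1.0."""
    docs = [_tokens(rule_text(r)) for r in rules]
    if not docs:
        return []
    avgdl = sum(len(d) for d in docs) / len(docs) or 1.0
    # Document frequency: number of rules containing each term.
    df = Counter(term for d in docs for term in set(d))
    query = {t for kw in context_keywords for t in _tokens(kw)}
    raw = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Non-negative IDF variant: rare terms weigh more.
            idf = math.log(1 + (len(docs) - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        raw.append(score)
    top = max(raw)
    return [s / top if top > 0 else 0.0 for s in raw]
```

Unlike substring overlap, this weighting discounts terms that appear in most rules and dampens repeated hits in long descriptions, which is the usual motivation for preferring BM25 as a relevance signal.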
Sequence Diagram(s)

sequenceDiagram
participant Hook as Agent/Brain Hook
participant Adapter as Lesson→Rule Adapter
participant Ranker as rank_rules
participant BM25 as BM25 Scorer
participant Sampler as Thompson Sampler
participant Results as Ranked Rules
Hook->>Adapter: provide lessons + metadata (agent_type, session_number, session_id)
Adapter->>Ranker: call rank_rules(rule_dicts, context_keywords, wiki_boost, session_seed, max_rules)
Ranker->>BM25: request relevance scores for context (if bm25s present)
alt BM25 available
BM25-->>Ranker: normalized scores per rule
else
Ranker->>Ranker: compute keyword hit-ratio fallback
end
Ranker->>Sampler: if Thompson enabled, sample Beta(alpha,beta_param) per rule using session_seed
Sampler-->>Ranker: sampled confidence values
Ranker->>Ranker: combine context score + wiki_boost + confidence -> composite score
Ranker-->>Results: sort by composite score and truncate to max_rules
Results-->>Hook: return ranked lessons via stored _lesson field
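
The last two ranker steps in the diagram (combine, then sort and truncate) can be sketched as follows. The weights, the additive treatment of `wiki_boost`, keying the boost by rule category, and the function name `rank_by_composite` are all assumptions for illustration; the actual weights used by `rank_rules` are not shown on this page.

```python
def rank_by_composite(
    rules,
    context_scores,
    confidences,
    wiki_boost=None,
    max_rules=5,
    w_context=0.5,      # hypothetical weight, not taken from the PR
    w_confidence=0.5,   # hypothetical weight, not taken from the PR
):
    """Combine per-rule signals into one score, sort, and keep the top K."""
    wiki_boost = wiki_boost or {}
    scored = []
    for rule, ctx, conf in zip(rules, context_scores, confidences):
        # Assumption: the boost is looked up by category and added on top,
        # so it nudges the ranking instead of filtering rules out.
        boost = wiki_boost.get(rule.get("category", ""), 0.0)
        composite = w_context * ctx + w_confidence * conf + boost
        scored.append((composite, rule))
    # list.sort is stable, so ties keep their input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rule for _, rule in scored[:max_rules]]
```

Treating the wiki signal as an additive term is what lets a strong context match from another category still outrank a boosted rule, which a hard pre-filter would not allow.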
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Previous comprehensions `[rd.get("_lesson") for rd in ranked if ... is not None]`
don't narrow types for pyright: `.get()` returns Optional, and the predicate
makes a second `.get()` call that pyright can't tie back to the first. Switch
to an explicit loop with a local variable so the narrowing sticks.

Closes 8 reportOptionalMemberAccess errors that failed CI on PR #91.
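
A small illustration of the narrowing fix described in this commit message. The `Lesson` class and the function name `lessons_from_ranked` are stand-ins invented for the example; only the `_lesson` key and the before/after pattern come from the commit message.

```python
from __future__ import annotations


class Lesson:
    """Stand-in for the real lesson type (hypothetical)."""

    def __init__(self, description: str) -> None:
        self.description = description


def lessons_from_ranked(ranked: list[dict[str, Lesson]]) -> list[Lesson]:
    # Before: the element expression and the predicate are two separate
    # .get() calls, so the checker still infers list[Lesson | None]:
    #   [rd.get("_lesson") for rd in ranked if rd.get("_lesson") is not None]
    lessons: list[Lesson] = []
    for rd in ranked:
        lesson = rd.get("_lesson")  # inferred as Lesson | None
        if lesson is not None:
            # The local variable is narrowed to Lesson inside this branch.
            lessons.append(lesson)
    return lessons
```

Runtime behaviour is the same as the comprehension; the change only gives the type checker a single variable to narrow.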
Summary
Addresses the two ranking upgrades called out in the 2026-04 autoresearch synthesis (section 5) that were unblocked by the `rules.injected` emit in #86:
Design
BM25
Thompson sampling
Unified ranking
Weights (unchanged)
Test plan
What I did NOT do
Generated with Gradata