feat(ranking): BM25 context relevance + Thompson sampling + unified ranker by Gradata · Pull Request #91 · Gradata/gradata

Gradata · 2026-04-15T21:08:06Z

Summary

Addresses the two ranking upgrades called out in the 2026-04 autoresearch synthesis (section 5) that were unblocked by the `rules.injected` emit in #86:

BM25 over (category + description + tags) replaces the substring keyword-overlap scorer in `rule_ranker.py` as the context-relevance signal. Expected +2-5% on injection relevance.
Thompson sampling over existing Beta(alpha, beta) posteriors as an opt-in mode. Solves cold-start: newly graduated PATTERN-tier rules get exploration weight instead of being buried under older RULE-tier posteriors.
Unifies the three ranking paths (`inject_brain_rules.py`, `agent_precontext.py`, `rule_ranker.py`) so the algorithm that ships is the one that ablation tests exercise.

Design

BM25

Uses `bm25s` - pure-Python, single-import, zero C extensions.
Added as `ranking` optional extra in `pyproject.toml` and rolled into `all`. Not a required dep - the SDK stays zero-required-deps.
Gated behind `try/except ImportError`; when `bm25s` is absent at import or call time, `_context_component` falls back to the legacy substring-overlap scorer. Covered by a monkeypatched fallback test.
Corpus: `category + description + tags` per rule. Query: `task_type + context_keywords`. Scores are max-normalized to [0, 1] and plugged into the existing 20% context weight slot.
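To make the corpus/query shape and the max-normalization step concrete, here is a minimal sketch. It does not call `bm25s`; it uses a small hand-written Okapi BM25 as a stand-in so the example runs with the stdlib only. The helper names (`rule_text`, `bm25_scores`, `max_normalize`), the whitespace tokenizer, and the `k1`/`b` values are assumptions for illustration, not the code in `rule_ranker.py`.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: simple lowercase whitespace tokenizer.
    return text.lower().split()


def rule_text(rule: dict) -> str:
    # Corpus document per rule: category + description + tags.
    return " ".join([rule["category"], rule["description"], *rule["tags"]])


def bm25_scores(corpus: list[str], query: str, k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Stand-in Okapi BM25 scorer (not the bm25s implementation)."""
    docs = [tokenize(d) for d in corpus]
    n = len(docs)
    if n == 0:
        return []
    avg_len = (sum(len(d) for d in docs) / n) or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1.0 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def max_normalize(scores: list[float]) -> list[float]:
    # Scale so the best match is 1.0; all-zero input stays all-zero.
    top = max(scores, default=0.0)
    if top <= 0.0:
        return [0.0] * len(scores)
    return [s / top for s in scores]


rules = [
    {"category": "sql", "description": "always parameterize database queries",
     "tags": ["security", "database"]},
    {"category": "style", "description": "prefer short commit messages",
     "tags": ["git"]},
]
# Query: task_type + context_keywords.
query = " ".join(["database", "queries", "migration"])
normalized = max_normalize(bm25_scores([rule_text(r) for r in rules], query))
print(normalized)  # [1.0, 0.0]
```

The normalized value is what would occupy the 20% context slot; a rule with no query-term overlap contributes nothing there.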

Thompson sampling

Opt-in via `GRADATA_THOMPSON_RANKING=1` (default off - preserves current ranker behavior).
When on, the 25% confidence weight slot uses `p ~ Beta(alpha, beta_param)` instead of the mean confidence.
Uses stdlib `random.betavariate` - no numpy dep added.
New `session_seed` argument on `rank_rules` makes sampling deterministic within a session. Same seed -> same top-K; different seeds -> different orderings (validated by tests).
Hardens against malformed Beta params (zero / negative alpha or beta_param are clamped to 1e-3).

Unified ranking

`inject_brain_rules.py` and `agent_precontext.py` now both call `rule_ranker.rank_rules`. The linear `state_bonus + conf_norm + conf` scorer and the sub-agent `_relevance_score` helper are replaced.
The qmd wiki category match is preserved as an optional `wiki_boost: dict[str, float]` signal fed into the context component (+0.3 by default), not a hard pre-filter. BM25 can now rescue strong cross-category matches the wiki missed.
Back-compat `_score` shim kept in `inject_brain_rules.py` so existing tests / external callers don't break.
Did not touch `rule_engine.apply_rules` - it has many callers and changing its signature was out of scope. If future work routes `apply_rules` through the unified ranker too, it should be a separate PR that handles the public API with care.

Weights (unchanged)

30% scope match
25% confidence (or Beta-sampled p when Thompson is on)
20% context relevance (BM25 normalized when bm25s available, keyword fallback otherwise)
15% recency
10% fire count
±0.10 effectiveness bonus
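As a rough illustration of how these weights combine, the sketch below computes a composite score from per-component values in [0, 1]. The component names, the assumption that each component is already normalized, and the clamping of the effectiveness term are illustrative; only the weights themselves come from the list above.

```python
WEIGHTS = {
    "scope": 0.30,
    "confidence": 0.25,   # mean confidence, or Beta-sampled p under Thompson
    "context": 0.20,      # BM25-normalized, or keyword fallback
    "recency": 0.15,
    "fire_count": 0.10,
}
MAX_EFFECTIVENESS_BONUS = 0.10


def composite_score(components: dict[str, float], effectiveness: float = 0.0) -> float:
    # Weighted sum of components; missing components count as 0.0.
    base = sum(weight * components.get(name, 0.0) for name, weight in WEIGHTS.items())
    # Assumption: the effectiveness term is a signed value clamped to +/-0.10.
    bonus = max(-MAX_EFFECTIVENESS_BONUS, min(MAX_EFFECTIVENESS_BONUS, effectiveness))
    return base + bonus


strong = composite_score(
    {"scope": 1.0, "confidence": 0.8, "context": 0.6, "recency": 0.5, "fire_count": 0.2},
    effectiveness=0.05,
)
# 0.30 + 0.20 + 0.12 + 0.075 + 0.02 + 0.05 = 0.765 (up to float rounding)
```

Because the weights sum to 1.0, the score stays in roughly [-0.10, 1.10] once the bonus is applied.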

Test plan

What I did NOT do

`rule_engine.apply_rules` - out of scope per task instructions; has many callers and changing its signature is higher risk than warranted here.
Removing the keyword fallback - shipped additive as instructed.
numpy dependency - Thompson uses `random.betavariate` from stdlib.

Generated with Gradata

…anker

Upgrades rule injection ranking to address two issues surfaced by the 2026-04 autoresearch synthesis (sec 5):

- **BM25 over (category + description + tags)** replaces substring keyword overlap as the context-relevance signal. Uses the pure-Python `bm25s` package, gated behind `try/except ImportError` so the SDK stays zero-required-deps - falls back cleanly to the existing keyword scorer when bm25s is unavailable. Added as a `ranking` optional extra and rolled into the `all` group.
- **Thompson sampling over (alpha, beta) posteriors** as an opt-in mode (`GRADATA_THOMPSON_RANKING=1`). When enabled, the confidence term is replaced by `p ~ Beta(alpha, beta_param)` sampled via stdlib `random.betavariate`, giving exploration weight to newly graduated PATTERN-tier rules with uncertain posteriors. Deterministic within a session via the new `session_seed` argument - the same seed yields the same top-K across invocations, and different seeds diverge as expected.
- **Unified ranking paths**. `inject_brain_rules.py` (SessionStart) and `agent_precontext.py` (PreToolUse/Agent) now both call `rule_ranker.rank_rules`, so the algorithm that ablation tests exercise is the algorithm that ships. The qmd wiki-category signal is preserved as an optional `wiki_boost: dict[str, float]` input instead of a hard pre-filter - BM25 can still rescue strong cross-category matches.
- Kept `_score` shim in `inject_brain_rules.py` for back-compat with existing tests; did not touch `rule_engine.apply_rules` to avoid churn on its many callers.

Tests: 14 new cases in `tests/test_ranking_v2.py` covering BM25 win over irrelevant rules, runtime fallback when bm25s is monkeypatched out, Thompson determinism under seed, seed divergence across runs, guards on bad Beta params, empty / single-rule / missing-Beta inputs, max-K respect, and wiki_boost routing. Full suite: 2575 passed, 24 skipped.

greptile-apps

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

coderabbitai · 2026-04-15T21:08:18Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: b52d9fd5-8d3b-4520-aaf3-0042c7e35e33

📥 Commits

Reviewing files that changed from the base of the PR and between 08673ef and 49f2d1f.

📒 Files selected for processing (2)

src/gradata/hooks/agent_precontext.py
src/gradata/hooks/inject_brain_rules.py

📝 Walkthrough

BM25-based context relevance: rank_rules() uses optional bm25s (bm25s>=0.2.0) to score context (category+description+tags) with a keyword-overlap fallback when bm25s is unavailable.
Thompson sampling: opt-in via GRADATA_THOMPSON_RANKING=1 to sample confidence from Beta(alpha, beta_param) (deterministic within a session via session_seed); beta params clamped to 1e-3; no numpy dependency added.
Unified ranking: inject_brain_rules.py and agent_precontext.py now call rank_rules() so both code paths use the same ranking algorithm.
Wiki boost refactor: wiki category matching changed from a hard pre-filter to an optional wiki_boost dict fed into the context component (default +0.3).
API additions (non-breaking): rank_rules(...) accepts new optional parameters wiki_boost: dict[str, float] | None and session_seed: int | None.
Dependency change: pyproject.toml adds an optional ranking extras group with bm25s and includes it in the all extras; import guarded so runtime fallback preserves behavior.
Tests: new tests (tests/test_ranking_v2.py) cover BM25 behavior and fallback, Thompson sampling determinism and robustness, ordering/max_rules/edge cases, and wiki_boost; test run: 2575 passed, 24 skipped.
Backward compatibility: a _score shim retained in inject_brain_rules.py; existing ranking weights unchanged.

Walkthrough

Adds BM25-backed context relevance and optional Thompson-sampling to rule/lesson ranking, exposes a new optional dependency group ranking for bm25s>=0.2.0, and integrates the enhanced rank_rules into agent precontext and brain-rules injection with deterministic per-session seeding.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Dependencies `pyproject.toml` | Add optional dependency group `ranking = ["bm25s>=0.2.0"]` and include it in the `all` extras. |
| Rule Ranker Core `src/gradata/rules/rule_ranker.py` | Expanded `rank_rules()` to use BM25 (optional) for context relevance with a keyword fallback, added `wiki_boost` and `session_seed` params, and optional Thompson Beta sampling (controlled by env flag) with deterministic RNG. |
| Hook Integrations `src/gradata/hooks/agent_precontext.py`, `src/gradata/hooks/inject_brain_rules.py` | Replaced local heuristic ranking with `rank_rules()` calls. Added adapters to convert lessons→rule dicts, build `context_keywords` and `wiki_boost`, and compute deterministic `session_seed`; ranked outputs re-map to original lesson objects. |
| Tests `tests/test_ranking_v2.py` | New test suite covering BM25 relevance (with conditional skip/fallback), Thompson sampling determinism/seed divergence and edge cases, `wiki_boost` behavior, ordering invariants, and `max_rules` truncation. |

Sequence Diagram(s)

sequenceDiagram
    participant Hook as Agent/Brain Hook
    participant Adapter as Lesson→Rule Adapter
    participant Ranker as rank_rules
    participant BM25 as BM25 Scorer
    participant Sampler as Thompson Sampler
    participant Results as Ranked Rules

    Hook->>Adapter: provide lessons + metadata (agent_type, session_number, session_id)
    Adapter->>Ranker: call rank_rules(rule_dicts, context_keywords, wiki_boost, session_seed, max_rules)
    Ranker->>BM25: request relevance scores for context (if bm25s present)
    alt BM25 available
        BM25-->>Ranker: normalized scores per rule
    else
        Ranker->>Ranker: compute keyword hit-ratio fallback
    end
    Ranker->>Sampler: if Thompson enabled, sample Beta(alpha,beta_param) per rule using session_seed
    Sampler-->>Ranker: sampled confidence values
    Ranker->>Ranker: combine context score + wiki_boost + confidence -> composite score
    Ranker-->>Results: sort by composite score and truncate to max_rules
    Results-->>Hook: return ranked lessons via stored _lesson field

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

feat(scoped-brains): Brain.scope(domain) + sub-agent inheritance (Phase 2) #78: Modifies lesson-selection logic in src/gradata/hooks/agent_precontext.py, directly related to the refactor replacing local ranking with rank_rules.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 44.83%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately summarizes the three main changes: BM25 context relevance, Thompson sampling, and unified ranker across multiple modules. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, covering the motivation, design, implementation details, test coverage, and intentional scope boundaries. |


The previous comprehensions `[rd.get("_lesson") for rd in ranked if ... is not None]` don't narrow types for pyright: `.get()` returns an Optional, and the predicate makes a second `.get()` call that pyright can't tie back to the first. Switch to an explicit loop with a local variable so the narrowing sticks. Closes 8 `reportOptionalMemberAccess` errors that failed CI on PR #91.
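A small sketch of the pattern this commit describes. The `Lesson` class and the function name are hypothetical stand-ins; only the `_lesson` key and the loop-with-local-variable approach come from the commit message.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Lesson:
    # Hypothetical stand-in for the real lesson object.
    description: str


def lessons_from_ranked(ranked: list[dict[str, Any]]) -> list[Lesson]:
    """Re-map ranked rule dicts to their original lesson objects."""
    lessons: list[Lesson] = []
    for rd in ranked:
        # One .get() call bound to a local: the `is not None` check below
        # narrows this variable, so later attribute access type-checks.
        lesson: Optional[Lesson] = rd.get("_lesson")
        if lesson is not None:
            lessons.append(lesson)
    return lessons


ranked = [
    {"_lesson": Lesson("parameterize queries"), "score": 0.9},
    {"score": 0.4},  # no lesson attached, skipped
    {"_lesson": Lesson("short commit messages"), "score": 0.2},
]
lessons = lessons_from_ranked(ranked)
print([lesson.description for lesson in lessons])
# ['parameterize queries', 'short commit messages']
```

The runtime behavior matches the comprehension; the difference is only that the type checker can now follow the None check.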

greptile-apps Bot reviewed Apr 15, 2026

coderabbitai Bot added the feature label Apr 15, 2026

Gradata merged commit 8d8b5aa into main Apr 15, 2026
16 checks passed

coderabbitai Bot mentioned this pull request Apr 17, 2026

feat(jit,graduation): BM25 for JIT ranking + raise Beta LB default to 0.85 #101

Merged

3 tasks

Gradata deleted the feat/ranking-bm25-thompson branch April 17, 2026 19:50

Gradata merged 2 commits into main from feat/ranking-bm25-thompson
