v0.6.0
Balanced log-odds fusion for hybrid search
`balanced_log_odds_fusion()` combines Bayesian BM25 probabilities with dense cosine similarities. It maps both signals into logit (log-odds) space and min-max normalizes each one there. Normalizing equalizes the two signals' voting power. It keeps the heavy-tailed sparse logits produced by unwrapping the sigmoid from drowning out the dense signal, while preserving the document-length and term-frequency priors carried by the BM25 probabilities.
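The fusion described above can be illustrated with a minimal NumPy sketch. It is not the library's implementation. Every function here is a hypothetical stand-in for the real `bayesian_bm25` functions (`logit`, `cosine_to_probability`, `_clamp_probability`), and several details are assumptions: `cosine_to_probability` is taken to be the linear map (cos + 1) / 2, the clamp margin `EPS` is a guess, and `weight` is assumed to scale the sparse side.

```python
import numpy as np

EPS = 1e-6  # assumed clamp margin; the library's _clamp_probability may use another value


def clamp_probability(p: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for _clamp_probability: keep p strictly inside (0, 1)."""
    return np.clip(p, EPS, 1.0 - EPS)


def logit(p: np.ndarray) -> np.ndarray:
    """Log-odds log(p / (1 - p)), the inverse of the sigmoid."""
    p = clamp_probability(np.asarray(p, dtype=float))
    return np.log(p) - np.log1p(-p)


def cosine_to_probability(cos: np.ndarray) -> np.ndarray:
    """Assumed linear map from [-1, 1] to [0, 1]; the library's mapping may differ."""
    return (np.asarray(cos, dtype=float) + 1.0) / 2.0


def min_max(x: np.ndarray) -> np.ndarray:
    """Rescale to [0, 1]; a constant vector maps to all zeros."""
    lo, hi = x.min(), x.max()
    if hi - lo == 0:
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)


def balanced_log_odds_fusion_sketch(bm25_probs, cosines, weight=0.5):
    """Fuse per-candidate sparse probabilities and dense cosines in logit space.

    Assumption: `weight` scales the sparse signal and (1 - weight) the dense one.
    """
    sparse_logits = logit(bm25_probs)
    dense_logits = logit(cosine_to_probability(cosines))
    return weight * min_max(sparse_logits) + (1.0 - weight) * min_max(dense_logits)


# Toy candidates: doc 0 has a near-certain BM25 probability, doc 1 the best cosine
bm25 = np.array([0.999999, 0.6, 0.1])
cos = np.array([0.2, 0.8, 0.5])

# A raw log-odds sum lets doc 0's heavy-tailed sparse logit dominate the ranking
raw = logit(bm25) + logit(cosine_to_probability(cos))

# After per-signal min-max normalization, the dense signal can move doc 1 to the top
fused = balanced_log_odds_fusion_sketch(bm25, cos)
```

Because min-max rescaling is monotonic within each signal, the sparse ordering (and whatever priors it encodes) survives; only its scale relative to the dense signal changes.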
BEIR hybrid search benchmark
Evaluated on 5 BEIR datasets using a retrieve-then-evaluate protocol (top-1000 candidates per signal, union of candidates, scoring with pytrec_eval). Dense encoder: all-MiniLM-L6-v2. Scores are NDCG@10 (%).
| Method | ArguAna | FiQA | NFCorpus | SciDocs | SciFact | Average |
|---|---|---|---|---|---|---|
| BM25 | 36.16 | 25.32 | 31.85 | 15.65 | 67.91 | 35.38 |
| Dense | 36.98 | 36.87 | 31.59 | 21.64 | 64.51 | 38.32 |
| Convex | 40.03 | 37.10 | 35.61 | 19.65 | 73.38 | 41.15 |
| RRF | 39.61 | 36.85 | 34.43 | 20.09 | 71.43 | 40.48 |
| Bayesian-Balanced | 37.28 | 40.57 | 35.63 | 21.55 | 71.75 | 41.36 |
| LO-Local | 39.63 | 37.20 | 34.10 | 19.50 | 73.81 | 40.85 |
| Bayesian-LogOdds | 37.17 | 33.12 | 35.25 | 18.52 | 72.25 | 39.26 |
Bayesian-Balanced leads in average NDCG@10 (41.36%), MAP@10 (30.23%), and Recall@10 (49.92%).
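The candidate-pooling step of the retrieve-then-evaluate protocol can be sketched as follows. This is an illustration under assumptions, not code from `benchmarks/hybrid_beir.py`. `build_run` and `fused_score` are hypothetical names, and the sketch assumes every pooled document can be scored by the fusion method. The output uses the `{query_id: {doc_id: score}}` run layout that pytrec_eval's evaluator consumes.

```python
from typing import Callable, Dict, List


def build_run(
    query_id: str,
    sparse_ranked: List[str],
    dense_ranked: List[str],
    fused_score: Callable[[str], float],
    k: int = 1000,
) -> Dict[str, Dict[str, float]]:
    """Pool the top-k doc IDs from each signal, score the union, return a run dict."""
    # dict.fromkeys de-duplicates while keeping first-seen order
    candidates = list(dict.fromkeys(sparse_ranked[:k] + dense_ranked[:k]))
    # {query_id: {doc_id: score}} is the run layout pytrec_eval evaluates
    return {query_id: {doc_id: float(fused_score(doc_id)) for doc_id in candidates}}


# Toy example with hypothetical fused scores
toy_scores = {"d1": 0.2, "d2": 0.9, "d3": 0.5}
run = build_run("q1", ["d1", "d2"], ["d2", "d3"], toy_scores.__getitem__)
```

Pooling the union means a document surfaced by only one retriever can still be reranked by the fused score.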
New features
- `balanced_log_odds_fusion()` in `bayesian_bm25.fusion`
  - Accepts a `weight` parameter for asymmetric signal weighting (default 0.5)
  - Composes existing library functions (`logit`, `cosine_to_probability`, `_clamp_probability`)
- BEIR hybrid search benchmark (`benchmarks/hybrid_beir.py`)
  - 9 fusion methods: BM25, Dense, Convex, RRF, Bayesian-OR, Bayesian-LogOdds, LO-Local, Bayesian-LO-BR, Bayesian-Balanced
  - Snowball English stemmer + stop word removal (matching the official BEIR BM25 baseline)
  - Embedding cache (`.npz`) to skip re-encoding across runs
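Two of the baseline fusion methods in the benchmark, RRF and Convex, can be sketched for comparison. These are textbook forms, not the benchmark script's code. The RRF constant `k = 60` is the commonly used default, and the Convex baseline is assumed to combine min-max-normalized scores with `alpha = 0.5`. Both settings are assumptions, since the actual values are not stated here.

```python
from typing import Dict, List


def rrf(rankings: List[List[str]], k: int = 60) -> Dict[str, float]:
    """Reciprocal rank fusion: sum 1 / (k + rank) over lists, ranks starting at 1."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


def convex(
    sparse: Dict[str, float], dense: Dict[str, float], alpha: float = 0.5
) -> Dict[str, float]:
    """alpha * minmax(sparse) + (1 - alpha) * minmax(dense); expects non-empty inputs.

    A document missing from one signal contributes 0 for that signal (an assumption).
    """

    def min_max(s: Dict[str, float]) -> Dict[str, float]:
        lo, hi = min(s.values()), max(s.values())
        span = hi - lo
        return {d: (v - lo) / span if span else 0.0 for d, v in s.items()}

    ns, nd = min_max(sparse), min_max(dense)
    docs = set(ns) | set(nd)
    return {d: alpha * ns.get(d, 0.0) + (1 - alpha) * nd.get(d, 0.0) for d in docs}


# Toy usage: "b" appears in both rankings, so RRF ranks it first
rrf_scores = rrf([["a", "b"], ["b", "c"]])
convex_scores = convex({"a": 10.0, "b": 0.0}, {"a": 0.0, "b": 1.0})
```

RRF uses only ranks, so it ignores score magnitudes entirely; Convex uses magnitudes but on raw scores rather than in logit space.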
Install

```bash
pip install bayesian-bm25==0.6.0
```