v0.9.0
What's New
Ten new features closing the remaining gaps from both papers.
Gating Functions (Paper 2, Theorems 6.7.6 / 6.8.1)
- GELU gating -- `gating="gelu"` in `log_odds_conjunction`: Bayesian expected signal under Gaussian noise model, implemented as `logit * sigmoid(1.702 * logit)`
- Generalized Swish -- `gating_beta` parameter controls gate sharpness: `beta -> 0` gives `x/2`, `beta = 1` is standard Swish, `beta -> inf` approaches ReLU
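The two gates can be checked numerically with a minimal standalone sketch. It uses only the formulas quoted above; the helper names `sigmoid`, `swish_gate`, and `gelu_gate` are illustrative and are not the library's API.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def swish_gate(logit: float, beta: float = 1.0) -> float:
    """Generalized Swish: logit * sigmoid(beta * logit)."""
    return logit * sigmoid(beta * logit)


def gelu_gate(logit: float) -> float:
    """GELU via the sigmoid approximation quoted in the notes (beta = 1.702)."""
    return swish_gate(logit, beta=1.702)


x = 2.0
small_beta = swish_gate(x, beta=1e-9)  # gate is ~0.5, so the result is close to x / 2
large_beta = swish_gate(x, beta=1e3)   # gate saturates at 1, close to relu(x) = x
negative = swish_gate(-x, beta=1e3)    # gate saturates at 0, close to relu(-x) = 0
```

A small `beta` makes the gate nearly constant at 0.5, while a large `beta` turns it into a hard step, which is why the two limits are `x/2` and ReLU.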
Multi-Head Attention (Paper 2, Remark 8.6)
- `MultiHeadAttentionLogOddsWeights` -- multiple independent attention heads with different random initializations, averaged in log-odds space for more robust fusion
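As a rough illustration of what averaging in log-odds space means, the sketch below takes probabilities that several heads might output for one document and averages their logits. How each head computes its output is not described in these notes, so the per-head values and the function name `average_heads_log_odds` are assumptions made for the example.

```python
import math

EPS = 1e-12  # assumed clamp to keep logit() finite at 0 and 1


def logit(p: float) -> float:
    p = min(max(p, EPS), 1.0 - EPS)
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def average_heads_log_odds(head_probs):
    """Average per-head probabilities in log-odds space, then map back."""
    mean_logit = sum(logit(p) for p in head_probs) / len(head_probs)
    return sigmoid(mean_logit)


# Hypothetical outputs of three heads for the same document.
fused = average_heads_log_odds([0.9, 0.6, 0.8])
```

Averaging logits rather than probabilities treats evidence for and against relevance symmetrically: heads at 0.2 and 0.8 cancel to exactly 0.5.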
Exact Attention Pruning (Paper 2, Theorem 8.7.1)
- `compute_upper_bounds()` and `prune()` on both `AttentionLogOddsWeights` and `MultiHeadAttentionLogOddsWeights` -- safely eliminate candidates whose upper bound is below a threshold
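The safety argument behind bound-based pruning can be sketched without the library. In the example below, the function `prune_candidates`, the document IDs, and all numbers are hypothetical; the only property relied on is that each exact score never exceeds its upper bound.

```python
from typing import Dict, List


def prune_candidates(upper_bounds: Dict[str, float], threshold: float) -> List[str]:
    """Keep only candidates whose upper bound can still reach the threshold."""
    return [doc_id for doc_id, ub in upper_bounds.items() if ub >= threshold]


# Illustrative numbers: every exact score is <= its upper bound.
exact = {"d1": 0.91, "d2": 0.40, "d3": 0.75, "d4": 0.10}
bounds = {"d1": 0.95, "d2": 0.55, "d3": 0.80, "d4": 0.30}

threshold = 0.75  # e.g. the current k-th best exact score, with k = 2
survivors = prune_candidates(bounds, threshold)

# Ranking only the survivors yields the same top-2 as ranking everything.
top2_full = sorted(exact, key=exact.get, reverse=True)[:2]
top2_pruned = sorted(survivors, key=exact.get, reverse=True)[:2]
```

A candidate whose bound is below the threshold cannot score above it, so dropping it never changes the top-k result.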
Neural Score Calibration (Paper 1, Section 12.2 #5)
- `PlattCalibrator` -- sigmoid calibration `P = sigmoid(a * score + b)` with BCE gradient descent
- `IsotonicCalibrator` -- non-parametric monotone calibration via PAVA (numpy-only)
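Both calibration techniques can be sketched in a few lines of plain Python. This is not the library's implementation: `fit_platt` and `pava` are hypothetical helpers, and the learning rate, epoch count, and toy data are assumptions chosen for the example.

```python
import math


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_platt(scores, labels, lr=0.1, epochs=2000):
    """Fit P = sigmoid(a * score + b) by gradient descent on mean BCE."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # derivative of BCE w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def pava(values):
    """Pool Adjacent Violators: least-squares non-decreasing fit."""
    blocks = []  # each block holds [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


# Toy data: raw scores sorted ascending, with binary relevance labels.
scores = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
labels = [0, 0, 1, 0, 1, 1]

a, b = fit_platt(scores, labels)
p_low = sigmoid(a * scores[0] + b)
p_high = sigmoid(a * scores[-1] + b)
isotonic = pava(labels)  # labels must be ordered by ascending score
```

Platt scaling assumes a sigmoid-shaped relationship between score and relevance; the isotonic fit only assumes monotonicity, at the cost of a step-shaped output.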
External Prior Features (Paper 1, Section 12.2 #6)
- `prior_fn` on `BayesianProbabilityTransform` -- custom callable `(score, tf, doc_len_ratio) -> float` replaces composite prior
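A callable matching the `(score, tf, doc_len_ratio) -> float` signature might look like the sketch below. The function `length_aware_prior`, its formula, and the clamping range are all hypothetical; only the signature comes from these notes.

```python
def length_aware_prior(score: float, tf: float, doc_len_ratio: float) -> float:
    """Hypothetical prior favouring term-rich documents of typical length.

    `score` is accepted to match the signature but is unused in this example.
    """
    tf_term = tf / (tf + 1.0)  # saturating term-frequency signal in [0, 1)
    len_penalty = 1.0 / (1.0 + abs(doc_len_ratio - 1.0))  # peaks at average length
    prior = 0.1 + 0.8 * tf_term * len_penalty
    return min(max(prior, 0.05), 0.95)  # assumed clamp away from 0 and 1


# Per the notes, such a callable is passed as `prior_fn` to
# BayesianProbabilityTransform. The remaining constructor arguments are not
# shown in this document, so no library call is made here.
typical = length_aware_prior(score=5.0, tf=3.0, doc_len_ratio=1.0)
no_match = length_aware_prior(score=0.0, tf=0.0, doc_len_ratio=1.0)
very_long = length_aware_prior(score=5.0, tf=3.0, doc_len_ratio=4.0)
```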
Temporal Dynamics (Paper 1, Section 12.2 #3)
- `TemporalBayesianTransform` -- exponential decay weights recent observations more heavily in `fit()`, enabling adaptation to non-stationary relevance patterns
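The effect of exponential decay on a fitted statistic can be shown with a minimal sketch. The half-life parameterisation and the helper names `decay_weights` and `weighted_rate` are assumptions for the example; the notes do not say how the library parameterises the decay.

```python
def decay_weights(ages, half_life):
    """Weight 1.0 for the newest observation, halving every `half_life` units of age."""
    return [0.5 ** (age / half_life) for age in ages]


def weighted_rate(labels, weights):
    """Weighted fraction of relevant observations."""
    return sum(w * y for w, y in zip(weights, labels)) / sum(weights)


# Oldest observation first: relevance only appears in the two newest entries.
ages = [4.0, 3.0, 2.0, 1.0, 0.0]
labels = [0, 0, 0, 1, 1]

weights = decay_weights(ages, half_life=1.0)
plain = sum(labels) / len(labels)       # treats all observations equally
recent = weighted_rate(labels, weights)  # dominated by the newest observations
```

With these numbers the unweighted rate is 0.4, while the decayed estimate is pulled toward the recent relevant observations.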
BMW Block-Max Upper Bounds (Paper 1, Section 6.2)
- `BlockMaxIndex` -- block-level BM25 upper bounds that are tighter than global WAND bounds for more aggressive safe pruning
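Why block-level bounds are tighter than a single global bound can be seen in a toy posting list. The sketch below is independent of `BlockMaxIndex`; the function `block_max_bounds`, the scores, the block size, and the threshold are all made up for illustration.

```python
def block_max_bounds(scores, block_size):
    """Maximum score within each consecutive block of a posting list."""
    return [
        max(scores[i:i + block_size])
        for i in range(0, len(scores), block_size)
    ]


# Hypothetical per-document BM25 contributions of one term, in doc-ID order.
scores = [1.2, 0.4, 0.9, 3.1, 0.2, 0.3, 0.5, 0.1]

global_bound = max(scores)                   # single bound for the whole list
block_bounds = block_max_bounds(scores, 4)   # one bound per block of 4 postings

threshold = 1.0  # score a document must exceed to enter the current top-k
skippable = [i for i, ub in enumerate(block_bounds) if ub < threshold]
```

The global bound of 3.1 never falls below the threshold, so it allows no skipping. The second block's bound of 0.5 does, so that entire block can be skipped without scoring its documents.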
Fusion Class Enhancements
- `base_rate` parameter on `LearnableLogOddsWeights` and `AttentionLogOddsWeights` -- adds `logit(base_rate)` as constant additive bias in log-odds space
- Vectorized forward pass for `AttentionLogOddsWeights` -- replaces per-row loop with numpy broadcast
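The role of `logit(base_rate)` as an additive bias can be sketched with a plain weighted log-odds sum. The function `fuse_log_odds` and its fixed weights are assumptions for the example and stand in for whatever weighting the fusion classes learn.

```python
import math

EPS = 1e-12  # assumed clamp to keep logit() finite at 0 and 1


def logit(p: float) -> float:
    p = min(max(p, EPS), 1.0 - EPS)
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fuse_log_odds(probs, weights, base_rate=None):
    """Weighted sum of signal log-odds, optionally shifted by logit(base_rate)."""
    z = sum(w * logit(p) for w, p in zip(weights, probs))
    if base_rate is not None:
        z += logit(base_rate)  # constant bias, independent of the signals
    return sigmoid(z)


probs = [0.7, 0.6]    # hypothetical per-signal relevance probabilities
weights = [0.5, 0.5]  # hypothetical fixed weights

no_bias = fuse_log_odds(probs, weights)
neutral = fuse_log_odds(probs, weights, base_rate=0.5)  # logit(0.5) = 0, no shift
rare = fuse_log_odds(probs, weights, base_rate=0.1)     # negative shift
```

A base rate below 0.5 lowers every fused probability by the same amount in log-odds space, which matches the intuition that relevant documents are rare.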
BEIR Benchmark (NDCG@10, 22 methods)
| Method | ArguAna | FiQA | NFCorpus | SciDocs | SciFact | Average |
|---|---|---|---|---|---|---|
| BM25 | 36.16 | 25.32 | 31.85 | 15.65 | 67.91 | 35.38 |
| Convex | 40.03 | 37.10 | 35.61 | 19.65 | 73.38 | 41.15 |
| Bayesian-Balanced | 37.27 | 40.59 | 35.73 | 21.40 | 72.47 | 41.50 |
| Attn-NR | 37.21 | 40.43 | 35.43 | 21.91 | 73.22 | 41.64 |
| MH-NR (new) | 37.13 | 39.08 | 35.72 | 21.78 | 70.60 | 40.86 |
| Multi-Head (new) | 37.04 | 39.27 | 34.32 | 21.17 | 70.28 | 40.42 |
Attn-NR remains the top zero-shot method on average NDCG@10 (41.64, +6.26 over BM25). MH-NR is the best new v0.9.0 method (40.86, +5.48 over BM25).
Install
pip install bayesian-bm25==0.9.0
Full Changelog: https://github.com/cognica-io/bayesian-bm25/blob/main/HISTORY.md#090-2026-03-14