v0.3.0
What's New
Hybrid Text + Vector Search
New cosine_to_probability() function (Definition 7.1.2) maps cosine similarity scores from [-1, 1] to calibrated probabilities in (0, 1), enabling direct fusion with BM25 probabilities via log-odds conjunction.
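These notes do not reproduce Definition 7.1.2, so the following is only a minimal sketch of one plausible mapping. It assumes an affine rescaling of [-1, 1] onto [0, 1], then clips the result into the open interval so that the logit stays finite during fusion. The name cosine_to_probability_sketch and the eps parameter are hypothetical and not part of the library.

```python
import numpy as np

def cosine_to_probability_sketch(cos_sim, eps=1e-6):
    """Hypothetical stand-in, NOT the library function.

    Assumes an affine map (1 + cos) / 2; the real Definition 7.1.2 may differ.
    """
    cos_sim = np.asarray(cos_sim, dtype=float)
    probs = (1.0 + cos_sim) / 2.0
    # Clip away exact 0 and 1 so logit(p) is finite in log-odds fusion.
    return np.clip(probs, eps, 1.0 - eps)

probs = cosine_to_probability_sketch([-1.0, 0.0, 0.92, 1.0])
```

Under this assumption a cosine of 0 maps to 0.5, and the endpoints -1 and 1 land just inside (0, 1) rather than on the boundary.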
Weighted Log-Odds Conjunction (Log-OP)
log_odds_conjunction() now accepts an optional weights parameter for per-signal reliability weighting (Theorem 8.3, Remark 8.4). The Log-OP formulation sigma(sum(w_i * logit(P_i))) allows assigning different reliability weights to BM25 vs vector search signals during fusion.
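The quoted formula sigma(sum(w_i * logit(P_i))) can be sketched directly in NumPy. This is an illustration of the formula only: the helper names (logit, sigmoid, weighted_log_op) and the eps clipping are assumptions, and the library's log_odds_conjunction() may apply additional scaling or validation that is not shown here.

```python
import numpy as np

def logit(p):
    # log(p / (1 - p)), written with log1p for accuracy near p = 0
    return np.log(p) - np.log1p(-p)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def weighted_log_op(probs, weights, eps=1e-12):
    """Hypothetical helper: sigma(sum_i w_i * logit(P_i)) over the last axis."""
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    weights = np.asarray(weights, dtype=float)
    return sigmoid(np.sum(weights * logit(probs), axis=-1))

# Rows are documents; columns are (BM25 probability, vector probability).
stacked = np.array([[0.85, 0.96], [0.60, 0.675], [0.40, 0.85]])
fused = weighted_log_op(stacked, [0.6, 0.4])
```

When the weights sum to 1, the fused log-odds is a weighted average of the input log-odds, so each fused probability lies between the two input probabilities for that document.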
WAND Upper Bound for Document Pruning
BayesianProbabilityTransform.wand_upper_bound() computes the tightest safe Bayesian probability upper bound (Theorem 6.1.2) for WAND-style top-k pruning. Uses p_max=0.9 from the composite prior bound (Theorem 4.2.4) and supports base-rate-aware bounds for tighter pruning.
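The idea behind such a bound can be sketched as follows. The sketch rests on several assumptions: the likelihood is taken to be a sigmoid of the BM25 score with hypothetical parameters alpha and beta, and the posterior is the standard two-hypothesis Bayes update. Since that posterior is increasing in both the likelihood and the prior, plugging in a term's maximum BM25 score and the prior cap p_max=0.9 gives a value that no document can exceed. The function names are illustrative, base-rate-aware bounds are omitted, and this is not the library's implementation of Theorem 6.1.2.

```python
import numpy as np

def posterior(likelihood, prior):
    """Two-hypothesis Bayes update: L*p / (L*p + (1 - L)*(1 - p))."""
    num = likelihood * prior
    return num / (num + (1.0 - likelihood) * (1.0 - prior))

def wand_upper_bound_sketch(bm25_upper, alpha, beta, p_max=0.9):
    """Hypothetical sketch, NOT the library method.

    Assumes a sigmoid likelihood sigma(alpha * (score - beta)) with alpha > 0.
    """
    # Largest likelihood any document can reach, given the BM25 upper bound.
    l_max = 1.0 / (1.0 + np.exp(-alpha * (bm25_upper - beta)))
    # Combine with the largest admissible prior to bound the posterior.
    return posterior(l_max, p_max)

bound = wand_upper_bound_sketch(bm25_upper=7.5, alpha=1.0, beta=5.0)
```

In a WAND-style loop, a candidate whose bound falls below the current top-k threshold could be skipped without computing its exact posterior.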
Prior-Aware Training Modes (C1/C2/C3)
fit() and update() now accept a mode parameter supporting three training conditions from Algorithm 8.3.1:
- "balanced" (C1, default) -- train on sigmoid likelihood (existing behavior)
- "prior_aware" (C2) -- train on full Bayesian posterior with chain-rule gradients through dP/dL
- "prior_free" (C3) -- train on likelihood, inference uses prior=0.5
Usage
import numpy as np
from bayesian_bm25 import cosine_to_probability, log_odds_conjunction
# BM25 probabilities (from Bayesian BM25)
bm25_probs = np.array([0.85, 0.60, 0.40])
# Vector search cosine similarities -> probabilities
cosine_scores = np.array([0.92, 0.35, 0.70])
vector_probs = cosine_to_probability(cosine_scores)
# Fuse with reliability weights (BM25 weight=0.6, vector weight=0.4)
stacked = np.stack([bm25_probs, vector_probs], axis=-1)
fused = log_odds_conjunction(stacked, weights=np.array([0.6, 0.4]))
Changes
- Add cosine_to_probability() for cosine similarity to probability conversion (Definition 7.1.2)
- Add weights parameter to log_odds_conjunction() for Log-OP formulation (Theorem 8.3, Remark 8.4)
- Add BayesianProbabilityTransform.wand_upper_bound() for safe WAND document pruning (Theorem 6.1.2)
- Add mode parameter to fit() and update() for prior-aware (C2) and prior-free (C3) training (Algorithm 8.3.1)
- Add weighted fusion benchmark (benchmarks/weighted_fusion.py)
- Add WAND upper bound tightness benchmark (benchmarks/wand_upper_bound.py)
- Extend base rate benchmark with Platt scaling, min-max normalization, and C2/C3 training mode comparisons
Full Changelog: v0.2.0...v0.3.0
PyPI: https://pypi.org/project/bayesian-bm25/0.3.0/