v0.3.0

@jaepil released this 20 Feb 13:24
· 54 commits to main since this release
dea9e18

What's New

Hybrid Text + Vector Search

New cosine_to_probability() function (Definition 7.1.2) maps cosine similarity scores from [-1, 1] to calibrated probabilities in (0, 1), enabling direct fusion with BM25 probabilities via log-odds conjunction.
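To make the idea concrete, here is a minimal sketch of one way such a mapping could work. It assumes a linear rescale of [-1, 1] onto [0, 1] followed by clipping into the open interval; the function name cosine_to_probability_sketch and the eps parameter are hypothetical, and the library's Definition 7.1.2 may differ in its details.

```python
import numpy as np


def cosine_to_probability_sketch(scores, eps=1e-6):
    """Illustrative mapping only (an assumption, not the library's code).

    Rescales cosine similarity from [-1, 1] to [0, 1] linearly, then clips
    into (0, 1) so that a later logit() stays finite.
    """
    scores = np.asarray(scores, dtype=float)
    return np.clip((1.0 + scores) / 2.0, eps, 1.0 - eps)


# Endpoints are pulled slightly inside (0, 1); 0.0 maps to 0.5.
probs = cosine_to_probability_sketch([-1.0, 0.0, 0.92, 1.0])
```

Keeping the output strictly inside (0, 1) matters because the fusion step works in log-odds, where 0 and 1 would map to infinities.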

Weighted Log-Odds Conjunction (Log-OP)

log_odds_conjunction() now accepts an optional weights parameter for per-signal reliability weighting (Theorem 8.3, Remark 8.4). The Log-OP formulation sigma(sum(w_i * logit(P_i))) allows assigning different reliability weights to BM25 vs vector search signals during fusion.
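The formula quoted above can be sketched in plain NumPy as follows. This is only an illustration of sigma(sum(w_i * logit(P_i))); the helper names and the eps clipping are assumptions, and the library's log_odds_conjunction() may add validation or scaling that is not shown here.

```python
import numpy as np


def logit(p):
    """Log-odds of a probability in (0, 1)."""
    return np.log(p) - np.log1p(-p)


def sigmoid(x):
    """Logistic function, the inverse of logit."""
    return 1.0 / (1.0 + np.exp(-x))


def weighted_log_odds_sketch(probs, weights, eps=1e-10):
    """Hypothetical helper: sigma(sum_i w_i * logit(P_i)) over the last axis."""
    # Clip so that probabilities of exactly 0 or 1 do not produce infinities.
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    weights = np.asarray(weights, dtype=float)
    return sigmoid(np.sum(weights * logit(probs), axis=-1))


# Each row holds one document's signals: [BM25 probability, vector probability].
signals = np.array([[0.85, 0.96], [0.60, 0.675], [0.40, 0.85]])
fused = weighted_log_odds_sketch(signals, np.array([0.6, 0.4]))
```

Two sanity checks follow directly from the formula: a weight vector of [1, 0] returns the first signal unchanged, and signals that all equal 0.5 fuse to 0.5 regardless of the weights.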

WAND Upper Bound for Document Pruning

BayesianProbabilityTransform.wand_upper_bound() computes the tightest safe Bayesian probability upper bound (Theorem 6.1.2) for WAND-style top-k pruning. It uses p_max=0.9 from the composite prior bound (Theorem 4.2.4) and also supports base-rate-aware bounds, which allow tighter pruning.

Prior-Aware Training Modes (C1/C2/C3)

fit() and update() now accept a mode parameter supporting three training conditions from Algorithm 8.3.1:

  • "balanced" (C1, default) -- train on sigmoid likelihood (existing behavior)
  • "prior_aware" (C2) -- train on full Bayesian posterior with chain-rule gradients through dP/dL
  • "prior_free" (C3) -- train on likelihood; inference uses prior=0.5

Usage

import numpy as np
from bayesian_bm25 import cosine_to_probability, log_odds_conjunction

# BM25 probabilities (from Bayesian BM25)
bm25_probs = np.array([0.85, 0.60, 0.40])

# Vector search cosine similarities -> probabilities
cosine_scores = np.array([0.92, 0.35, 0.70])
vector_probs = cosine_to_probability(cosine_scores)

# Fuse with reliability weights (BM25 weight=0.6, vector weight=0.4)
stacked = np.stack([bm25_probs, vector_probs], axis=-1)
fused = log_odds_conjunction(stacked, weights=np.array([0.6, 0.4]))

Changes

  • Add cosine_to_probability() for cosine similarity to probability conversion (Definition 7.1.2)
  • Add weights parameter to log_odds_conjunction() for Log-OP formulation (Theorem 8.3, Remark 8.4)
  • Add BayesianProbabilityTransform.wand_upper_bound() for safe WAND document pruning (Theorem 6.1.2)
  • Add mode parameter to fit() and update() for prior-aware (C2) and prior-free (C3) training (Algorithm 8.3.1)
  • Add weighted fusion benchmark (benchmarks/weighted_fusion.py)
  • Add WAND upper bound tightness benchmark (benchmarks/wand_upper_bound.py)
  • Extend base rate benchmark with Platt scaling, min-max normalization, and C2/C3 training mode comparisons

Full Changelog: v0.2.0...v0.3.0
PyPI: https://pypi.org/project/bayesian-bm25/0.3.0/