v0.11.0
What's New
VectorProbabilityTransform (Paper 3, Theorem 3.1.1)
Replaces naive (1 + cos) / 2 conversion with a likelihood ratio framework for vector similarity calibration:
P(R|d) = sigmoid(log(f_R(d) / f_G(d)) + logit(P_base))
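The formula above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the library's implementation: the function names `posterior_relevance` and `naive_cosine_probability` are hypothetical, and the densities `f_r` and `f_g` are assumed to have already been evaluated at the distance `d`.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logit(p):
    # Inverse of sigmoid; p must lie strictly between 0 and 1.
    return math.log(p / (1.0 - p))


def posterior_relevance(f_r, f_g, base_rate):
    # P(R|d) = sigmoid(log(f_R(d) / f_G(d)) + logit(P_base))
    # f_r, f_g: density values at distance d (hypothetical inputs).
    return sigmoid(math.log(f_r / f_g) + logit(base_rate))


def naive_cosine_probability(cos_sim):
    # The (1 + cos) / 2 mapping that the likelihood ratio form replaces.
    return (1.0 + cos_sim) / 2.0
```

When `f_r == f_g` the distance carries no evidence and the posterior falls back to `base_rate`. A density ratio of 9 with a base rate of 0.1 cancels the prior odds of 1/9 and yields a posterior of 0.5.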
- VectorProbabilityTransform -- calibrates vector distances into probabilities via density ratio estimation
- fit_background(): estimate background Gaussian (mu_G, sigma_G) from a corpus sample
- calibrate(): full pipeline with auto-routing between KDE and GMM density estimation
- estimate_kde(): weighted Gaussian KDE for relevant-document density f_R (Section 4.3)
- estimate_gmm(): two-component GMM-EM with fixed background component (Algorithm 5.3.1)
- log_density_ratio(): log(f_R(d) / f_G(d)) vector evidence (Definition 3.2.1)
- Gap detection (Strategy 4.6.1): dual-threshold semantic cliff detection
- Auto-routing: gap + K >= 50 uses KDE, gap + K < 50 uses GMM, smooth distributions route by available weights
- Supports base_rate for a corpus-level relevance prior
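To make the density ratio idea concrete, here is a rough standalone sketch of a weighted Gaussian KDE for f_R compared against a single background Gaussian f_G. It does not call the package: every function name below is hypothetical, the fixed `bandwidth` parameter and the population-variance estimate are assumptions, and the release notes do not say how the library chooses either.

```python
import math


def gaussian_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) evaluated at x.
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_background_gaussian(distances):
    # Estimate (mu_G, sigma_G) from a sample of corpus distances.
    # Assumption: plain population mean and standard deviation.
    n = len(distances)
    mu = sum(distances) / n
    var = sum((d - mu) ** 2 for d in distances) / n
    return mu, math.sqrt(var)


def weighted_kde(d, samples, weights, bandwidth):
    # Weighted Gaussian KDE: normalized sum of kernels centered on samples.
    total = sum(weights)
    kernels = (w * gaussian_pdf(d, s, bandwidth) for s, w in zip(samples, weights))
    return sum(kernels) / total


def log_density_ratio_sketch(d, samples, weights, bandwidth, mu_g, sigma_g):
    # log(f_R(d) / f_G(d)), computed as a difference of logs.
    f_r = weighted_kde(d, samples, weights, bandwidth)
    f_g = gaussian_pdf(d, mu_g, sigma_g)
    return math.log(f_r) - math.log(f_g)
```

A distance close to the relevant samples gives a positive log ratio, and a distance near the background mean gives a negative one. Far out in the tails the kernel values can underflow to zero, so a production version would likely evaluate the densities in log space.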
Index-Aware Density Priors (Paper 3, Strategy 4.6.2)
- ivf_density_prior() -- sigmoid(gamma * (cell_population / avg_population - 1))
- knn_density_prior() -- sigmoid(gamma * (global_median_kth / kth_distance - 1))
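Both priors are direct transcriptions of the formulas listed above, so a small sketch is enough to show their behavior. The function names carry a `_sketch` suffix to make clear they are not the library's signatures, and the argument order and the default `gamma=1.0` are assumptions.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def ivf_density_prior_sketch(cell_population, avg_population, gamma=1.0):
    # sigmoid(gamma * (cell_population / avg_population - 1))
    # Cells more populated than average score above 0.5.
    return sigmoid(gamma * (cell_population / avg_population - 1.0))


def knn_density_prior_sketch(kth_distance, global_median_kth, gamma=1.0):
    # sigmoid(gamma * (global_median_kth / kth_distance - 1))
    # A k-th neighbor closer than the global median scores above 0.5.
    return sigmoid(gamma * (global_median_kth / kth_distance - 1.0))
```

Both functions return exactly 0.5 at the neutral point (an average-sized cell, or a k-th neighbor distance equal to the global median), and `gamma` controls how quickly the prior moves away from 0.5 as local density deviates from that point.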
FusionDebugger Extensions
- Extended VectorSignalTrace with calibrated vector fields: distance, f_R, f_G, log_density_ratio, calibration_method
- New trace_calibrated_vector() method for tracing VPT-calibrated signals with full density ratio diagnostics
BEIR Benchmark (26 zero-shot methods)
Three new vector calibration methods added to benchmarks/hybrid_beir.py:
| Method | Description |
|---|---|
| Bayesian-Vector-Balanced | VPT-calibrated dense + BM25 via balanced_log_odds_fusion |
| Bayesian-Vector-Softplus | VPT-calibrated dense + BM25 via softplus-gated log_odds_conjunction |
| Bayesian-Vector-Attn | VPT-calibrated dense + attention with logit normalization + 7 features |
Additional benchmark fixes:
- Fix log-odds fusion functions to return probabilities via sigmoid where the pipeline stays in logit space
- Remove raw log-odds methods from CALIBRATION_METHODS to fix Brier > 1.0 bug
- Rename all benchmark methods with consistent Bayesian- prefix
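The Brier > 1.0 symptom is easy to reproduce in isolation. The Brier score is the mean squared difference between predicted probabilities and binary labels, so it stays within [0, 1] only when the predictions are probabilities. The sketch below uses made-up scores and a hypothetical `brier_score` helper rather than the benchmark script's code.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def brier_score(predictions, labels):
    # Mean squared error between predictions and 0/1 labels.
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)


labels = [1, 0, 1, 0]
log_odds = [3.0, -2.5, 1.5, -4.0]  # illustrative raw log-odds scores

# Scoring raw log-odds as if they were probabilities breaks the [0, 1] bound.
raw = brier_score(log_odds, labels)  # (4 + 6.25 + 0.25 + 16) / 4 = 6.625

# Mapping through sigmoid first restores a valid Brier score.
calibrated = brier_score([sigmoid(z) for z in log_odds], labels)
```

Here every score has the right sign for its label, yet the raw variant reports 6.625 because the squared error grows with the magnitude of the log-odds. This is why methods that emit raw log-odds do not belong in a calibration comparison.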
Install
pip install bayesian-bm25==0.11.0
Full Changelog: v0.10.0...v0.11.0