v0.11.0

@jaepil released this 17 Mar 13:37
be2ef53

What's New

VectorProbabilityTransform (Paper 3, Theorem 3.1.1)

Replaces the naive (1 + cos) / 2 conversion with a likelihood-ratio framework for calibrating vector similarity:

P(R|d) = sigmoid(log(f_R(d) / f_G(d)) + logit(P_base))
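
As a rough illustration of the formula above (a standalone sketch, not the library's implementation), the posterior can be computed from any pair of density values. The helper names and the Gaussian parameters below are assumptions chosen for the example; only the formula itself comes from the release notes.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def gaussian_pdf(d: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at d."""
    z = (d - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def posterior_relevance(f_r: float, f_g: float, p_base: float) -> float:
    """P(R|d) = sigmoid(log(f_R(d) / f_G(d)) + logit(P_base))."""
    return sigmoid(math.log(f_r / f_g) + logit(p_base))


# Hypothetical densities over distance: relevant documents cluster near 0.3,
# the background corpus near 0.9. These numbers are illustrative only.
d = 0.35
p_example = posterior_relevance(
    f_r=gaussian_pdf(d, mu=0.3, sigma=0.1),
    f_g=gaussian_pdf(d, mu=0.9, sigma=0.15),
    p_base=0.01,
)
print(f"P(R | d={d}) = {p_example:.3f}")
```

When the two densities are equal, the log ratio is zero and the posterior falls back to P_base, which is the behavior one would expect from a distance that carries no evidence.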
  • VectorProbabilityTransform -- calibrates vector distances into probabilities via density ratio estimation
    • fit_background(): estimate background Gaussian (mu_G, sigma_G) from a corpus sample
    • calibrate(): full pipeline with auto-routing between KDE and GMM density estimation
    • estimate_kde(): weighted Gaussian KDE for relevant-document density f_R (Section 4.3)
    • estimate_gmm(): two-component GMM-EM with fixed background component (Algorithm 5.3.1)
    • log_density_ratio(): log(f_R(d) / f_G(d)) vector evidence (Definition 3.2.1)
    • Gap detection (Strategy 4.6.1): dual-threshold semantic cliff detection
    • Auto-routing: when a gap is detected, K >= 50 uses KDE and K < 50 uses GMM; smooth distributions (no gap) route by available weights
    • Supports base_rate for corpus-level relevance prior
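
To make the "two-component GMM-EM with fixed background component" idea concrete, here is a minimal NumPy sketch. It assumes a one-dimensional Gaussian for each component and keeps (mu_G, sigma_G) frozen while EM updates only the relevant component and the mixing weight. The function name, initialization, and stopping rule are assumptions for illustration and are not taken from the library or from Algorithm 5.3.1.

```python
import numpy as np


def normal_pdf(d, mu, sigma):
    """Density of N(mu, sigma^2), evaluated elementwise."""
    z = (d - mu) / sigma
    return np.exp(-0.5 * z * z) / (sigma * np.sqrt(2.0 * np.pi))


def fit_fixed_background_gmm(d, mu_g, sigma_g, n_iter=200, tol=1e-8):
    """EM for a mixture pi_r * N(mu_r, sigma_r) + (1 - pi_r) * N(mu_g, sigma_g).

    The background component (mu_g, sigma_g) is never updated.
    Returns (mu_r, sigma_r, pi_r).
    """
    d = np.asarray(d, dtype=float)
    # Assumed initialization: start the relevant component at the
    # low-distance end of the sample.
    mu_r = float(np.percentile(d, 10))
    sigma_r = max(float(np.std(d)) / 2.0, 1e-3)
    pi_r = 0.5
    f_g = normal_pdf(d, mu_g, sigma_g)  # fixed, so computed once

    for _ in range(n_iter):
        # E-step: responsibility of the relevant component for each point.
        num = pi_r * normal_pdf(d, mu_r, sigma_r)
        resp = num / (num + (1.0 - pi_r) * f_g + 1e-300)
        total = resp.sum()
        if total < 1e-12:
            break  # relevant component has collapsed; keep last estimate

        # M-step: update only the relevant component and the mixing weight.
        new_mu = float((resp * d).sum() / total)
        new_var = float((resp * (d - new_mu) ** 2).sum() / total)
        new_sigma = max(np.sqrt(new_var), 1e-3)
        new_pi = float(np.clip(total / d.size, 1e-6, 1.0 - 1e-6))

        converged = (
            abs(new_mu - mu_r) < tol
            and abs(new_sigma - sigma_r) < tol
            and abs(new_pi - pi_r) < tol
        )
        mu_r, sigma_r, pi_r = new_mu, float(new_sigma), new_pi
        if converged:
            break
    return mu_r, sigma_r, pi_r


# Synthetic distances: 10% "relevant" near 0.4, 90% background near 1.0.
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(0.4, 0.05, 100), rng.normal(1.0, 0.1, 900)])
mu_r, sigma_r, pi_r = fit_fixed_background_gmm(sample, mu_g=1.0, sigma_g=0.1)
print(mu_r, sigma_r, pi_r)
```

Freezing the background component means the few top-K distances only have to explain the relevant mode, which is presumably why a GMM of this shape is preferred when K is small.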

Index-Aware Density Priors (Paper 3, Strategy 4.6.2)

  • ivf_density_prior() -- sigmoid(gamma * (cell_population / avg_population - 1))
  • knn_density_prior() -- sigmoid(gamma * (global_median_kth / kth_distance - 1))
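
The two prior formulas above can be written out directly. The sketch below is standalone and uses `_sketch` suffixes to make clear it is not the library code; the argument order and the default gamma of 1.0 are assumptions.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def ivf_density_prior_sketch(cell_population: float,
                             avg_population: float,
                             gamma: float = 1.0) -> float:
    """sigmoid(gamma * (cell_population / avg_population - 1))."""
    return sigmoid(gamma * (cell_population / avg_population - 1.0))


def knn_density_prior_sketch(kth_distance: float,
                             global_median_kth: float,
                             gamma: float = 1.0) -> float:
    """sigmoid(gamma * (global_median_kth / kth_distance - 1))."""
    return sigmoid(gamma * (global_median_kth / kth_distance - 1.0))


# An IVF cell with exactly average population gives a neutral prior of 0.5;
# a cell twice as populated as average gives a prior above 0.5.
print(ivf_density_prior_sketch(100, 100))
print(ivf_density_prior_sketch(200, 100))
# A k-th neighbor closer than the global median also pushes the prior up.
print(knn_density_prior_sketch(kth_distance=0.2, global_median_kth=0.4))
```

Both priors are centered so that an "average" region maps to 0.5, and gamma controls how sharply denser or sparser regions move away from that neutral value.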

FusionDebugger Extensions

  • Extended VectorSignalTrace with calibrated vector fields: distance, f_R, f_G, log_density_ratio, calibration_method
  • New trace_calibrated_vector() method for tracing VPT-calibrated signals with full density ratio diagnostics

BEIR Benchmark (26 zero-shot methods)

Three new vector calibration methods added to benchmarks/hybrid_beir.py:

| Method | Description |
| --- | --- |
| Bayesian-Vector-Balanced | VPT-calibrated dense + BM25 via balanced_log_odds_fusion |
| Bayesian-Vector-Softplus | VPT-calibrated dense + BM25 via softplus-gated log_odds_conjunction |
| Bayesian-Vector-Attn | VPT-calibrated dense + attention with logit normalization + 7 features |

Additional benchmark fixes:

  • Fix log-odds fusion functions to apply a sigmoid and return probabilities in cases where the pipeline previously stayed in logit space
  • Remove raw log-odds methods from CALIBRATION_METHODS to fix the bug where Brier scores exceeded 1.0
  • Rename all benchmark methods with a consistent Bayesian- prefix
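
A short standalone example shows why scoring raw log-odds produces a Brier score above 1.0. The Brier score is the mean squared difference between predictions and 0/1 labels, so it is bounded by 1 only when predictions lie in [0, 1]. The numbers below are made up for illustration and are not benchmark outputs.

```python
import numpy as np


def brier_score(pred, labels) -> float:
    """Mean squared error between predictions and binary labels."""
    pred = np.asarray(pred, dtype=float)
    labels = np.asarray(labels, dtype=float)
    return float(np.mean((pred - labels) ** 2))


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-np.asarray(x, dtype=float)))


# Illustrative log-odds scores and relevance labels.
log_odds = np.array([3.0, -2.5, 4.0, -3.0])
labels = np.array([1, 0, 1, 0])

raw = brier_score(log_odds, labels)                  # unbounded inputs
calibrated = brier_score(sigmoid(log_odds), labels)  # inputs in (0, 1)
print(raw, calibrated)
```

Mapping the scores through a sigmoid before evaluation keeps every squared error at or below 1, so the averaged score stays in the valid range.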

Install

pip install bayesian-bm25==0.11.0

Full Changelog: v0.10.0...v0.11.0