Improve BayesianScoreQuery and LogOddsFusionQuery with base rate prior, weighted Log-OP, and parameter estimation #15948
Open
jaepil wants to merge 2 commits into apache:main from
…ybrid search
- Add BayesianScoreEstimator for auto-estimating sigmoid calibration parameters
- Add base rate prior support to BayesianScoreQuery for log-odds shifting
- Add per-signal weights to LogOddsFusionQuery for weighted Logarithmic Opinion Pooling
- Add logit normalization support to LogOddsFusionScorer
- Add comprehensive tests for BayesianScoreQuery and LogOddsFusionQuery
Summary
Follow-up to #15827. This PR extends BayesianScoreQuery and LogOddsFusionQuery with three improvements:
sigmoid(alpha * (score - beta) + logit(baseRate)), improving calibration for rare-relevance corpora
Algorithm Details
BayesianScoreEstimator
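As a rough illustration of the estimation rule this section gives (beta = median(scores), alpha = 1 / std(scores)), the sketch below computes both values from an array of sampled scores. The class and method names are hypothetical and are not the PR's BayesianScoreEstimator API. How the pseudo-queries are sampled is not shown, and the use of the population standard deviation is an assumption.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not the PR's BayesianScoreEstimator. */
final class SigmoidParamSketch {

  /** beta = median(scores): the score that lands on the sigmoid midpoint. */
  static double estimateBeta(double[] scores) {
    if (scores.length == 0) {
      throw new IllegalArgumentException("scores must not be empty");
    }
    double[] sorted = scores.clone(); // do not reorder the caller's array
    Arrays.sort(sorted);
    int mid = sorted.length / 2;
    return (sorted.length % 2 == 1) ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
  }

  /** alpha = 1 / std(scores); population standard deviation is assumed here. */
  static double estimateAlpha(double[] scores) {
    if (scores.length == 0) {
      throw new IllegalArgumentException("scores must not be empty");
    }
    double mean = 0.0;
    for (double s : scores) {
      mean += s;
    }
    mean /= scores.length;
    double variance = 0.0;
    for (double s : scores) {
      variance += (s - mean) * (s - mean);
    }
    variance /= scores.length;
    double std = Math.sqrt(variance);
    if (std == 0.0) {
      throw new IllegalArgumentException("scores have zero variance; alpha is undefined");
    }
    return 1.0 / std;
  }
}
```

Using the median rather than the mean for beta keeps the midpoint robust to a few very high-scoring outliers, which is presumably why the PR chose it.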
Estimates BayesianScoreQuery parameters from corpus statistics via pseudo-query sampling: beta = median(scores), alpha = 1 / std(scores)
[1e-6, 0.5]
Base Rate Prior
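For orientation, the calibration form quoted in the Summary, sigmoid(alpha * (score - beta) + logit(baseRate)), can be sketched in a few lines of plain Java. The class and method names below are hypothetical, and this is not the code in BayesianScoreQuery; it only shows how the prior enters as an additive log-odds offset.

```java
/** Hypothetical illustration of a base rate prior as a log-odds shift; not the PR's code. */
final class BaseRatePriorSketch {

  /** logit(p) = log(p / (1 - p)), defined for 0 < p < 1. */
  static double logit(double p) {
    if (!(p > 0.0 && p < 1.0)) {
      throw new IllegalArgumentException("p must be in (0, 1), got " + p);
    }
    return Math.log(p / (1.0 - p));
  }

  static double sigmoid(double x) {
    return 1.0 / (1.0 + Math.exp(-x));
  }

  /** sigmoid(alpha * (score - beta) + logit(baseRate)). */
  static double posterior(double score, double alpha, double beta, double baseRate) {
    return sigmoid(alpha * (score - beta) + logit(baseRate));
  }
}
```

With this form, a document scoring exactly beta receives a posterior equal to the base rate itself, and a base rate of 0.5 contributes no offset. Because the offset is the same constant for every document in a query, it shifts calibrated values but does not reorder them.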
When a base rate r is set on BayesianScoreQuery, the posterior is computed as:
sigmoid(alpha * (score - beta) + logit(r))
where logit(r) = log(r / (1 - r)). This shifts scores down for rare-relevance corpora (e.g., r = 0.01 adds a logit offset of about -4.6), improving calibration without changing ranking order within a single query.
Weighted Log-OP
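As background for this section, a textbook weighted logarithmic opinion pool sums the per-signal log-odds with their weights and maps the result back through a sigmoid. The sketch below assumes that standard form. The exact formula in LogOddsFusionScorer (including softplus gating and logitMin/logitMax normalization) is not reproduced here, and all names are hypothetical.

```java
/** Hypothetical sketch of a standard weighted Log-OP; not the PR's LogOddsFusionScorer. */
final class WeightedLogOpSketch {

  static double logit(double p) {
    if (!(p > 0.0 && p < 1.0)) {
      throw new IllegalArgumentException("p must be in (0, 1), got " + p);
    }
    return Math.log(p / (1.0 - p));
  }

  static double sigmoid(double x) {
    return 1.0 / (1.0 + Math.exp(-x));
  }

  /** Enforces the stated constraint: weights are non-negative and sum to 1. */
  static void checkWeights(double[] weights) {
    double sum = 0.0;
    for (double w : weights) {
      if (!(w >= 0.0)) { // also rejects NaN
        throw new IllegalArgumentException("weights must be non-negative, got " + w);
      }
      sum += w;
    }
    if (Math.abs(sum - 1.0) > 1e-9) { // tolerance is an assumption of this sketch
      throw new IllegalArgumentException("weights must sum to 1, got " + sum);
    }
  }

  /** Fused probability = sigmoid(sum_i weights[i] * logit(probs[i])). */
  static double fuse(double[] probs, double[] weights) {
    if (probs.length != weights.length || probs.length == 0) {
      throw new IllegalArgumentException("probs and weights must be non-empty and equal length");
    }
    checkWeights(weights);
    double pooledLogit = 0.0;
    for (int i = 0; i < probs.length; i++) {
      pooledLogit += weights[i] * logit(probs[i]);
    }
    return sigmoid(pooledLogit);
  }
}
```

With uniform weights 1/n this reduces to the mean of the log-odds, which matches the "uniform mean" behavior the section describes as the unweighted case.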
When per-signal weights are provided to LogOddsFusionQuery, the scoring formula changes from a uniform mean to a weighted sum. Weights must be non-negative and sum to 1. Optional per-signal logit normalization bounds (logitMin, logitMax) enable min-max normalization as an alternative to softplus gating, which is useful when learned signal scales differ significantly.
New Files
BayesianScoreEstimator.java
Modified Files
BayesianScoreQuery.java
LogOddsFusionQuery.java
LogOddsFusionScorer.java
TestBayesianScoreQuery.java
TestLogOddsFusionQuery.java
Test Coverage (23 new tests)
BayesianScoreQuery base rate (7 tests)
BayesianScoreEstimator (4 tests)
LogOddsFusionQuery weighted fusion (10 tests)
LogOddsFusionQuery logit normalization (2 tests)
Test plan
./gradlew tidy passes (google-java-format via Spotless)
./gradlew :lucene:core:compileJava :lucene:core:compileTestJava passes
TestBayesianScoreQuery and TestLogOddsFusionQuery