-
Notifications
You must be signed in to change notification settings - Fork 988
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-10480: Use BMM scorer for 2 clauses disjunction #972
LUCENE-10480: Use BMM scorer for 2 clauses disjunction #972
Conversation
Hi @jpountz , I have adapted the original BMM PR #101 to the latest codebase and run further experiments on using it for 2 clauses disjunction. The results look both encouraging and strange :D When I run
However, when I run all the tasks,
This seems to suggest tasks run may interfere with each other as opposed to independent? Do you have any suggestion where I can look into next to confirm the performance impact of this change ? |
My best guess would be that you are seeing different results mostly because luceneutil picks random queries, and the run that only had disjunctions picked queries that happened to like your change better than the run that included all tasks? These are very impressive speedups indeed! |
Thanks @jpountz for looking into this! I did further experiments on this and the result seems to suggest it may be caused by bug / caching in the util or lucene itself. What I did was I first only kept 1 query per pure disjunction task, and remove the rest of the tasks like below.
and got this result:
However, when I added back the rest of the tasks but still kept 1 query for each of the three disjunction tasks, I got vastly different results:
I've attached the tasks file for reference here as well wikimedium.10M.nostopwords.tasks.txt @mikemccand , do you have any suggestion where this discrepancy might be coming from? I'll continue to run experiments as well to see if I can pinpoint the issue. |
The fact that queries perform slower in general in your first benchmark run makes me wonder if this could be due to insufficient warmup time. The default task repeat count of 20 might be too low for these queries that are very good at skipping irrelevant documents. Maybe try passing |
@zacharymorn FYI I played with a slightly different approach that implements BMM as a bulk scorer instead of a scorer, which I was hoping would help with making bookkeeping more lightweight: https://github.com/jpountz/lucene/tree/maxscore. It could be interesting to compare with your implementation. One optimization it has that seemed to help that your scorer doesn't have is to check for every non-essential scorer whether the score obtained so far plus the sum of max scores of non essential scorers that haven't been checked yet is still competitive. I got the following results on one run on wikimedium10m:
|
Thanks @jpountz for the suggestion and also providing the bulk scorer implementation! The result looks pretty impressive as well! I just tried |
Alright. As it turns out, the reason I'm getting vastly different performance results as I change tasks file here #972 (comment) is that, I have previously configured Here are the results with only 3 disjunction tasks with 3 queries each, but with different number of search threads:
Just curious @jpountz , were you using the default |
Hi @jpountz, I've taken some ideas from your bulk scorer implementation and was able to simplify my code as well as to boost the performance when under default
For
I implemented similar logic to move the next least contributing essential scorer into non-essential scorer list when minCompetitiveScore increased, I feel the effect would be similar? In terms of next steps, I'm wondering if there's a preference between bulk scorer and scorer implementations when performance improvement is similar (maybe one type of scorer can be used in more places) ? I've learnt a lot from both PRs already and either one looks like a good improvement for disjunction queries! |
Indeed, sorry I had misread your code!
No, it shouldn't matter. Bulk scorers sometimes help yield better performance because it's easier for them to amortize computation across docs, but if they don't yield better performance, there's no point in using a bulk scorer instead of a regular scorer. I agree that it looks like a great speedup, we should get this in! The benchmark only tests performance of top-level disjunctions of term queries that have two clauses. I'd be curious to get performance numbers for queries like the below ones to see if we need to fine-tune a bit more when this new scorer gets used. Note that I don't think we need to get the performance better for all these queries to merge the change, we could start by only using this new scorer for the (common) case of a top-level disjunction of 2 term queries, and later see if this scorer can handle more disjunctions.
|
No worry, thanks still for the suggestion!
Ok I see, makes sense.
Sounds good! I have run these queries through benchmark and the results look somewhat consistent:
Looks like we may need to restrict the scorer to only term queries, or improve it for phrase queries? |
For Modified:
Baseline:
|
|
||
/** Scorer implementing Block-Max Maxscore algorithm */ | ||
public class BlockMaxMaxscoreScorer extends Scorer { | ||
private final ScoreMode scoreMode; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't seem to actually need the scoreMode
, let's remove it and make sure Boolean2ScorerSupplier
only uses this scorer for TOP_SCORES?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Boolean2ScorerSupplier
already checks for scoreMode == ScoreMode.TOP_SCORES
before using this scorer.
} else if (top.doc > upTo) { | ||
target = upTo + 1; | ||
} else { | ||
float matchedMaxScoreSum = nonEssentialMaxScoreSum; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other boolean queries sum up scores into a double
to reduce the impact of rounding on accuracy, though this only helps when there are 3 clauses or more, otherwise summing up into a double doesn't yield better accuracy. Can you either add a comment that this would need to be changed if we were to support more than two clauses with this scorer, or replace it with a double?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see. I have replaced it with a double.
matchedMaxScoreSum += w.scorer.score(); | ||
} | ||
|
||
if (matchedMaxScoreSum < minCompetitiveScore) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look correct to me. The overall score could still be above the minCompetitiveScore
after summing up scores of the non-essential clauses? Should it be something like below:
if (matchedMaxScoreSum + nonEssentialMaxScoreSum < minCompetitiveScore) {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm matchedMaxScoreSum
was actually initialized with nonEssentialMaxScoreSum
here , so nonEssentialMaxScoreSum
was already included in the calculation? Maybe the naming of the variable could be confusing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying, I was confused indeed by the name, which suggested to me that this score only included clauses that we "matched" so far.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup sorry about the confusion! I've renamed the variable to docScoreUpperBound
.
upTo = -1; | ||
for (DisiWrapper w : allScorers) { | ||
upTo = Math.max(w.scorer.advanceShallow(Math.max(w.doc, target)), upTo); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a comment that taking the max
is probably a good approach for 2 clauses but might need more thinking if we want this scorer to perform well with more than 2 clauses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return new TwoPhaseIterator(approximation) { | ||
@Override | ||
public boolean matches() throws IOException { | ||
return score() >= minCompetitiveScore; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementing matches()
like that has the good side effect that we only pass good matches to the collector, but the bad side-effect that scores are computed twice for competitive hits. I wonder if we should try to cache the score so that it's not computed another time by the collector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good catch. I've updated it to cache the calculation.
@Override | ||
public float matchCost() { | ||
// maximum number of scorer that matches() might advance | ||
return allScorers.length - essentialsScorers.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
matchCost
is expected to be fixed, we shouldn't try to adjust it based on the number of essential scorers, see e.g. how ConjunctionDISI
uses it to sort two-phase iterators up-front.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I've updated it to use the length of all scorers to over-estimate the cost, since either essential or non-essential list may change in length.
Here are the latest benchmark results after the update:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks great. I left some comments that mostly relate to making this scorer work with more than 2 clauses.
With this change, I suspect that some scorers created in TestWANDScorer
would now use your new BlockMaxMaxScoreScorer
, which is going to decrease the coverage of WANDScorer. Can we somehow make sure that TestWANDScorer
always gets a WANDScorer
? E.g. I spotted this query under TestWANDScorer#testBasics
which likely uses your now scorer:
// test a filtered disjunction
query =
new BooleanQuery.Builder()
.add(
new BooleanQuery.Builder()
.add(
new BoostQuery(
new ConstantScoreQuery(new TermQuery(new Term("foo", "A"))), 2),
Occur.SHOULD)
.add(new ConstantScoreQuery(new TermQuery(new Term("foo", "B"))), Occur.SHOULD)
.build(),
Occur.MUST)
.add(new TermQuery(new Term("foo", "C")), Occur.FILTER)
.build();
lucene/core/src/java/org/apache/lucene/search/BlockMaxMaxscoreScorer.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/BlockMaxMaxscoreScorer.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/BlockMaxMaxscoreScorer.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/BlockMaxMaxscoreScorer.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/BlockMaxMaxscoreScorer.java
Outdated
Show resolved
Hide resolved
assert target <= upTo; | ||
|
||
for (DisiWrapper w : allScorers) { | ||
if (w.doc <= upTo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think w.doc
can ever be greater than upTo
given how upTo
is computed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is true that w.doc
is always less than or equal to upTo
since we use Math.max
to compute upTo
, but I'm also wondering if we should keep this branch still (with additional comment) since the way upTo
gets computed may change in the future for more clauses, and this branch might be easy to miss? On the other hand, I'm also fine with removing it for now and just update the comment above to mention handling this branch as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a slight preference for the second approach (remove plus a comment) so that our test coverage build doesn't show up this branch as a branch that is never tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I have removed that branch and also changed the if
condition to be an assertion to help catch future changes.
// the "Optimizing Top-k Document Retrieval Strategies for Block-Max Indexes" paper. | ||
nonEssentialMaxScoreSum = 0; | ||
for (DisiWrapper w : allScorers) { | ||
if (nonEssentialMaxScoreSum + w.maxScoreFloat < minCompetitiveScore) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (nonEssentialMaxScoreSum + w.maxScoreFloat < minCompetitiveScore) { | |
if (maxScoreSumPropagator.scoreSumUpperBound(nonEssentialMaxScoreSum + w.maxScoreFloat) < minCompetitiveScore) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
|
||
for (DisiWrapper w = essentialsScorers.topList(); w != null; w = w.next) { | ||
docScoreUpperBound += w.scorer.score(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could probably cache the score that is computed here so that we don't re-compute it later in score(), I'm happy to leave it for a follow-up, it's up to you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just gave it a try, but it seems caching a partial score from essential list here may require more changes / additional data structure inside score()
, as this scorer no longer maintains the non-essential list explicitly, but only remembers sum of maxScore from non-essential list, so it will need to differentiate scorers from non-essential list when using the cached essential list score. I think I will prefer to have a follow-up issue on this ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've created a follow-up issue for this https://issues.apache.org/jira/browse/LUCENE-10636.
@@ -106,7 +106,7 @@ static long scaleMaxScore(float maxScore, int scalingFactor) { | |||
* Scale min competitive scores the same way as max scores but this time by rounding down in order | |||
* to make sure that we do not miss any matches. | |||
*/ | |||
private static long scaleMinScore(float minScore, int scalingFactor) { | |||
static long scaleMinScore(float minScore, int scalingFactor) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't seem to need to make this method pkg-private anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
Yeah this is a good question. In my newly added tests I have used something like this to confirm it's testing the right scorer, but I'm not totally happy about this approach myself :
One alternative approach could be instantiating On the other hand, if we don't plan on initiating |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of creating WANDScorer more explicitly in tests. It doesn't look easy though and this change is already great so I wonder if we should keep it for a follow-up.
I reviewed the change and left some very minor comments but it looks great to me overall. Let's get it in.
import java.util.List; | ||
|
||
/** Scorer implementing Block-Max Maxscore algorithm */ | ||
public class BlockMaxMaxscoreScorer extends Scorer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it be pkg-private?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
// sum of max scores of scorers in nonEssentialScorers list | ||
private double nonEssentialMaxScoreSum; | ||
|
||
private long cost; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's make it final
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
|
||
for (DisiWrapper w = essentialsScorers.topList(); w != null; w = w.next) { | ||
docScoreUpperBound += w.scorer.score(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
assert target <= upTo; | ||
|
||
for (DisiWrapper w : allScorers) { | ||
if (w.doc <= upTo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a slight preference for the second approach (remove plus a comment) so that our test coverage build doesn't show up this branch as a branch that is never tested.
@@ -118,6 +118,21 @@ private Scorer getInternal(long leadCost) throws IOException { | |||
leadCost); | |||
} | |||
|
|||
// pure two terms disjunction | |||
if (scoreMode == ScoreMode.TOP_SCORES | |||
&& minShouldMatch == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minShouldMatch = 1
also qualifies
&& minShouldMatch == 0 | |
&& minShouldMatch <= 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
@@ -39,6 +39,9 @@ public class DisiWrapper { | |||
// For WANDScorer | |||
long maxScore; | |||
|
|||
// For BlockMaxMaxscoreScorer | |||
float maxScoreFloat; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should call this one maxScore
and rename the other one scaledMaxScore
or something like that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I like this idea. Updated.
Sounds good. I've created this follow-up issue https://issues.apache.org/jira/browse/LUCENE-10635 .
Awesome, thanks for all the review and feedback @jpountz, I really appreciate it! Iterating on the solution and seeing it improved each time is a lot of fun and I enjoy this process a lot! |
(cherry picked from commit 503ec55)
Thanks again @jpountz ! I've created the above PR to backport these changes to |
(cherry picked from commit 503ec55)
Description (or a Jira issue link if you have one)
Use Block-Max-Maxscore algorithm for 2 clauses disjunction. Adapted from PR #101