-
Notifications
You must be signed in to change notification settings - Fork 982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run top-level conjunctions of term queries with a specialized BulkScorer. #12382
Run top-level conjunctions of term queries with a specialized BulkScorer. #12382
Conversation
…rer. This implements a specialized `BlockMaxConjunctionBulkScorer`, which is really the same as `BlockMaxConjunctionScorer`, but as a `BulkScorer` instead of a `Scorer`. Also it doesn't support two-phase iterators in order to focus on the common case when queries, such as term queries, do not have two-phase iterators. If a clause has a two-phase iterator, it will keep running as a `BlockMaxConjunctionScorer` wrapped in a `DefaultBulkScorer`.
The speedup is not as high as I had hoped, so I'm leaning towards not merging this PR. Maybe someone figures out how to make this code run faster!
|
Reopening as I'm now seeing speedups. It's possible it's related to other changes that happened since last time I looked, or to the specific tasks that get picked by luceneutil. Here's the output of a run on wikibigall:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The numbers are impressive. I have some code design/flow comments.
lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/BlockMaxConjunctionBulkScorer.java
Outdated
Show resolved
Hide resolved
lucene/core/src/java/org/apache/lucene/search/BooleanWeight.java
Outdated
Show resolved
Hide resolved
I agree... it makes me suspicous. I'll verify my |
I doubt this is your problem, but I have made the mistake in the past of benchmarking the incorrect commit because either:
|
Running with taskCountPerCat=5 and taskRepeatCount=50 gave very similar results:
I did a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
FWIW I ran the benchmark from https://tantivy-search.github.io/bench/ and also observed a speedup on conjunctions, so I think that the speedup is indeed real. |
…rer. (#12382) This implements a specialized `BlockMaxConjunctionBulkScorer`, which is really the same as `BlockMaxConjunctionScorer`, but as a `BulkScorer` instead of a `Scorer`. Also it doesn't support two-phase iterators in order to focus on the common case when queries, such as term queries, do not have two-phase iterators. If a clause has a two-phase iterator, it will keep running as a `BlockMaxConjunctionScorer` wrapped in a `DefaultBulkScorer`.
This gave a good speedup to AndHighHigh on nightlies: +15%, but AndHighMed had a small drop: -1.5%. It still looks positive overall as it's making the slow query (AndHighHigh, 22 QPS) faster and the fast query (AndHighMed, 51 QPS) slightly slower but I'm still a bit puzzled why I'm seeing such different numbers on my local benchmark and on nightlies. |
git bisect has identified Example of failure...
git bisect log...
The week ending on 2023-09-15, EDIT: Updated misleading final paragraph because i realized i had misread the calendar |
Thanks Hoss! I had missed this failure, it looks like a real one. I'm looking. |
I just pushed a fix: 3f81f2f. |
PR apache#12382 added a bulk scorer for top-k hits on conjunctions that yielded a significant speedup (annotation [FP](http://people.apache.org/~mikemccand/lucenebench/AndHighHigh.html)). This change proposes a similar change for exhaustive collection of conjunctive queries, e.g. for counting, faceting, etc.
PR #12382 added a bulk scorer for top-k hits on conjunctions that yielded a significant speedup (annotation [FP](http://people.apache.org/~mikemccand/lucenebench/AndHighHigh.html)). This change proposes a similar change for exhaustive collection of conjunctive queries, e.g. for counting, faceting, etc.
PR #12382 added a bulk scorer for top-k hits on conjunctions that yielded a significant speedup (annotation [FP](http://people.apache.org/~mikemccand/lucenebench/AndHighHigh.html)). This change proposes a similar change for exhaustive collection of conjunctive queries, e.g. for counting, faceting, etc.
This implements a specialized
BlockMaxConjunctionBulkScorer
, which is similar toBlockMaxConjunctionScorer
with the following differences:BulkScorer
instead of aScorer
(ie. for top-level conjunctions only).