-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-9402: Let MultiCollector handle minCompetitiveScore #1567
Conversation
// Ignore calls to setMinCompetitiveScore so that if we wrap two | ||
// collectors and one of them wants to skip low-scoring hits, then | ||
// the other collector still sees all hits. We could try to reconcile | ||
// min scores and take the maximum min score across collectors, but | ||
// this is very unlikely to be helpful in practice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should fix this comment
#1566 contains the Solr related changes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the way that MinCompetitiveScoreAwareScorable
relies on indices (idx
) could interfere badly with how we remove leaf collectors when there is a CollectionTerminatedException
. The easier way to fix this would be to change the existing handling of CollectionTerminatedException
to no longer compact the array of leaf collectors but just leave null
s and make collect
skip over null
s?
this.collectors = collectors.toArray(new LeafCollector[collectors.size()]); | ||
this.cacheScores = cacheScores; | ||
this.numCollectors = this.collectors.length; | ||
this.skipNonCompetitiveScores = skipNonCompetitive; | ||
this.minScores = new float[this.skipNonCompetitiveScores ? this.numCollectors : 0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make the array null when skipNonCompetitiveScores
is false?
this.minScores = new float[this.skipNonCompetitiveScores ? this.numCollectors : 0]; | |
this.minScores = this.skipNonCompetitiveScores ? new float[ this.numCollectors] : null; |
Directory dir = newDirectory(); | ||
RandomIndexWriter iw = new RandomIndexWriter(random(), dir); | ||
iw.addDocument(new Document()); | ||
iw.commit(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to commit, the follow-up call to getReader() creates a NRT segment anyway
Ah! good Catch, I missed that completely. I'll fix. |
if (skipNonCompetitiveScores) { | ||
for (int i = 0; i < collectors.length; ++i) { | ||
final LeafCollector c = collectors[i]; | ||
assert c != null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that this assertion is right, the collector could be null if the collector already threw a CollectionTerminatedException? (we don't disallow calling setCollector
after collection started)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm I had the null check before, but I thought setScorer
had to only be called before the collect
calls because of the javadoc: Called before successive calls to {@link #collect(int)}.
. I'll put the null checks back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we have at least BooleanScorer that would call setScorer multiple times, since setScorer is called at the beginning of BulkScorer#score
and BooleanScorer
calls BulkScorer#score
on ranges of doc IDs.
assertTrue("c2's collect should be called", c2.collectCalled); | ||
c1.collectCalled = false; | ||
c2.collectCalled = false; | ||
lc.collect(1); // OK, but c1 should terminate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe create a new variant of this test that calls setScorer again after this collect call?
* upstream/master: (218 commits) LUCENE-9412 Do not validate jenkins HTTPS cert LUCENE-8962: add ability to selectively merge on commit (apache#1552) Replace DWPT.DocState with simple method parameters (apache#1594) LUCENE-9402: Let MultiCollector handle minCompetitiveScore (apache#1567) SOLR-14574: Fix or suppress warnings in solr/core/src/test (part 2) SOLR-14561 CoreAdminAPI's parameters instanceDir and dataDir are now validated (apache#1572) SOLR-14532: Add *.iml files to gitignore SOLR-14577: Return BAD REQUEST when field is missing in terms QP (apache#1588) SOLR-14574: Fix or suppress warnings in solr/core/src/test (part 1) remove debug code LUCENE-9408: roll back only called once enforcement LUCENE-8962: Allow waiting for all merges in a merge spec (apache#1585) SOLR-14572 document missing SearchComponents (apache#1581) LUCENE-9359: Avoid test failures when the extra file is a dir. SOLR-14573: Fix or suppress warnings in solrj/src/test LUCENE-9353: Move terms metadata to its own file. (apache#1473) Cleanup TermsHashPerField (apache#1573) SOLR-14558: Record all log lines in SolrLogPostTool (apache#1570) LUCENE-9404: simplify checksum calculation of ByteBuffersIndexOutput LUCENE-9403: tune BufferedChecksum.DEFAULT_BUFFERSIZE ...
Here is an idea to make MultiCollector be able to handle
minCompetitiveScore
. Looking at this comment in the code:This implementation tries to make it so that all collectors can see all the hits and only allow skipping if all collectors set a min competitive score. The value passed to the inner scorer is the minimum among all collectors (instead of the maximum as the comment suggests).