-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-8464: Implement ConstantScoreScorer#setMinCompetitiveScore #495
Conversation
@jpountz PR is ready for review, thank you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments but the pr looks good overall. Thanks @cbismuth !
@@ -58,6 +58,14 @@ public float getMaxScore(int upTo) throws IOException { | |||
return score; | |||
} | |||
|
|||
@Override | |||
public void setMinCompetitiveScore(float minScore) throws IOException { | |||
float minScoreDown = Math.nextDown(minScore); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we tie-break on doc id and collect in doc id order I don't think you need to take the previous float here.
The TopScoreCollector
sets the minimum score as the next float after the score of the worst top document so you should not need to change the value here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this, the TestPrefixRandom#testPrefixes
test fails with score = 1.0
and minScore = 1.0000001
, so documents are not collected. What do you think of it?
java.lang.AssertionError: Collected: 21 but scorer: -1
at __randomizedtesting.SeedInfo.seed([C3B30196BD4F380C:91676D9F44FA34A5]:0)
at org.apache.lucene.search.AssertingLeafCollector.collect(AssertingLeafCollector.java:48)
at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:263)
at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:214)
at org.apache.lucene.search.AssertingBulkScorer.score(AssertingBulkScorer.java:81)
at org.apache.lucene.search.AssertingBulkScorer.score(AssertingBulkScorer.java:65)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:652)
at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:72)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:567)
at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:419)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:430)
at org.apache.lucene.search.TestPrefixRandom.assertSame(TestPrefixRandom.java:135)
at org.apache.lucene.search.TestPrefixRandom.testPrefixes(TestPrefixRandom.java:125)
[...]
NOTE: reproduce with: ant test -Dtestcase=TestPrefixRandom -Dtests.method=testPrefixes -Dtests.seed=C3B30196BD4F380C -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=lv-LV -Dtests.timezone=Asia/Oral -Dtests.asserts=true -Dtests.file.encoding=UTF-8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a tiny debugging session, it seems we're passing through the AssertingScorable#setMinCompetitiveScore
which not takes the previous float.
lucene-solr/lucene/test-framework/src/java/org/apache/lucene/search/AssertingScorable.java
Lines 41 to 44 in a9acdfd
@Override | |
public void setMinCompetitiveScore(float minScore) throws IOException { | |
in.setMinCompetitiveScore(minScore); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is because you need to add an indirection in ConstantScoreScorer#iterator
. Something like:
@Override
public DocIdSetIterator iterator() {
// add indirection so that if 'it' is updated then it will
// be taken into account
return new DocIdSetIterator() {
@Override
public int nextDoc() throws IOException {
return doc = disi.nextDoc();
}
@Override
public int docID() {
return doc;
}
@Override
public long cost() {
return disi.cost();
}
@Override
public int advance(int target) throws IOException {
return doc = disi.advance(target);
}
};
}
You replace disi
with an empty one when the min score is greater than the query score so you cannot return it directly there. You also need to save the current doc to make sure that you don't lost it when you reset the disi.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And for the record, here is where we set the minimum score to the next float. The AssertingScorable
just delegate the call to the underlying scorer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for your time, these are my Lucene baby steps. I'm currently debugging to understand why we need to add an indirection in ConstantScoreScorer#iterator
.
I understand that we need to save the current doc to not lose it. But even when I don't save it, the indirection you suggest make the test pass and I don't understand why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that we need to save the current doc to not lose it. But even when I don't save it, the indirection you suggest make the test pass and I don't understand why.
That's not related to this test. When minCompetitiveScore
is called the doc id iterator should stay on the current doc and only return NO_MORE_DOCS when it is advanced (nextDoc or advance). You can add a test to check this by calling minCompetitiveScore
in the test and then check that the iterator still return the current doc until it is advanced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thank you, I'll add an assertion in the dedicated test 👍
TestConstantScoreScorerIndex() throws IOException { | ||
directory = newDirectory(); | ||
|
||
writer = new RandomIndexWriter(random(), directory); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need to use a merger that retains doc id order. Something like:
RandomIndexWriter w = new RandomIndexWriter(random(), dir, newIndexWriterConfig().setMergePolicy(
// retain doc id order
newLogMergePolicy(random().nextBoolean())));
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 0ad2f78, thank you.
doc.add(newTextField(FIELD, VALUE, Field.Store.YES)); | ||
writer.addDocument(doc); | ||
} | ||
reader = writer.getReader(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The random index writer can commit segments randomly so you need to force merge to 1 segments to ensure that all documents are in a single segment:
writer.forceMerge(1);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 8d7d5ae, thank you.
Could you please explain me why docs have to be in a single segment to ensure this test passes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because you check only one segment in the test:
lucene-solr/lucene/core/src/test/org/apache/lucene/search/TestConstantScoreScorer.java
Line 94 in 8d7d5ae
LeafReaderContext context = searcher.getIndexReader().leaves().get(0); |
... which is fine if you force merge to 1 segment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, I understand, thank you 👍
Thanks a lot @jimczi for the very quick and very helpful review, I've made changes to improve this PR. |
Thanks a lot @jimczi, I've understood my mistakes 👍 PR is ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please ignore my previous comment as some tests are broken when the scorer has an approximation (see here and here).
Good catch. You also need to set the indirection in the approximation of the TwoPhaseIterator
.
Something like:
public final class ConstantScoreScorer extends Scorer {
private static class DocIdSetIteratorWrapper extends DocIdSetIterator {
DocIdSetIterator delegate;
int doc = -1;
DocIdSetIteratorWrapper(DocIdSetIterator delegate) {
this.delegate = delegate;
}
@Override
public int docID() {
return doc;
}
@Override
public int nextDoc() throws IOException {
return doc = delegate.nextDoc();
}
@Override
public int advance(int target) throws IOException {
return doc = delegate.advance(target);
}
@Override
public long cost() {
return delegate.cost();
}
}
private final float score;
private final TwoPhaseIterator twoPhaseIterator;
private DocIdSetIterator disi;
private DocIdSetIteratorWrapper approx;
/** Constructor based on a {@link DocIdSetIterator} which will be used to
* drive iteration. Two phase iteration will not be supported.
* @param weight the parent weight
* @param score the score to return on each document
* @param disi the iterator that defines matching documents */
public ConstantScoreScorer(Weight weight, float score, DocIdSetIterator disi) {
super(weight);
this.score = score;
this.twoPhaseIterator = null;
// we need to wrap the iterator to be able to reset it if minimum score is set
this.approx = new DocIdSetIteratorWrapper(disi);
this.disi = approx;
}
/** Constructor based on a {@link TwoPhaseIterator}. In that case the
* {@link Scorer} will support two-phase iteration.
* @param weight the parent weight
* @param score the score to return on each document
* @param twoPhaseIterator the iterator that defines matching documents */
public ConstantScoreScorer(Weight weight, float score, TwoPhaseIterator twoPhaseIterator) {
super(weight);
this.score = score;
// we need to wrap the approximation to be able to reset it if minimum score is set
this.approx = new DocIdSetIteratorWrapper(twoPhaseIterator.approximation());
this.twoPhaseIterator = new TwoPhaseIterator(approx) {
@Override
public boolean matches() throws IOException {
return twoPhaseIterator.matches();
}
@Override
public float matchCost() {
return twoPhaseIterator.matchCost();
}
};
this.disi = TwoPhaseIterator.asDocIdSetIterator(twoPhaseIterator);
}
@Override
public float getMaxScore(int upTo) throws IOException {
return score;
}
@Override
public DocIdSetIterator iterator() {
return disi;
}
@Override
public TwoPhaseIterator twoPhaseIterator() {
return twoPhaseIterator;
}
@Override
public int docID() {
return disi.docID();
}
@Override
public float score() throws IOException {
return score;
}
@Override
public void setMinCompetitiveScore(float minScore) throws IOException {
if (minScore > score) {
approx.delegate = DocIdSetIterator.empty();
}
}
@@ -70,7 +98,7 @@ public TwoPhaseIterator twoPhaseIterator() { | |||
|
|||
@Override | |||
public int docID() { | |||
return disi.docID(); | |||
return twoPhaseIterator != null ? twoPhaseIterator.approximation().docID() : doc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would introduce introduce a discrepancy between the docID returned by the scorer and the one returned by the iterator. You need to apply the indirection to the approximation of the twoPhaseIterator
as explained here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it broke some tests.
Great! Exactly what I did except declaring a static inner class, I’ll declare it 👍 thanks! |
Hi @jimczi, PR is up-to-date and ready for review. All Lucene tests are green on my side. As you'll see, I'd like to suggest a slightly different implementation to keep |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you'll see, I'd like to suggest a slightly different implementation to keep approximation class attribute null when no two phase iterator has been passed to the ConstantScoreScorer constructor. Feel free to let me now if you prefer to keep your proposal, thanks a lot.
I don't mind changing my proposal but your implementation doesn't work. If you have a twoPhaseIterator
you just need to set its internal approximation to an empty iterator. That's what the wrapping of the TwoPhaseIterator
does, it adds an indirection to its approximation in order to be able to reset the iterator.
Thank you @jimczi. I've added parametrized tests with phrase and term queries to check the two constructors of the I hope it's a good one! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left more comments ;)
We should also ensure is that change doesn't affect queries that don't use the max score optimization (ScoreMode.COMPLETE
or ScoreMode.COMPLETE_NO_SCORES
).
Can you wrap the iterator (and the twoPhaseIterator) only if the scoreMode
is set to ScoreMode.TOP_SCORES
? This requires to change the constructors:
public ConstantScoreScorer(Weight weight, float score, DocIdSetIterator disi, ScoreMode scoreMode) {
super(weight);
this.score = score;
this.approximation = scoreMode == ScoreMode.TOP_SCORES ? new DocIdSetIteratorWrapper(disi) : disi;
this.twoPhaseIterator = null;
this.disi = disi;
this.disi = this.approximation;
}
but this change should ensure that we'll not slow down the constant score query when we cannot apply the max score optimization.
@@ -70,7 +120,7 @@ public TwoPhaseIterator twoPhaseIterator() { | |||
|
|||
@Override | |||
public int docID() { | |||
return disi.docID(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can keep disi.docID()
here and move doc
to the DocIdSetIteratorWrapper
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally, fixed in 114013f, thanks 👍
|
||
public class TestConstantScoreScorer extends LuceneTestCase { | ||
private static final String FIELD = "f"; | ||
private static final String[] VALUES = new String[]{"foo", "bar", "foo bar", "azerty"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add documents that match foo
and bar
but not as a phrase query to test the two phase matching ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! Done in faa0c77.
private static final String FIELD = "f"; | ||
private static final String[] VALUES = new String[]{"foo", "bar", "foo bar", "azerty"}; | ||
|
||
@ParametersFactory(argumentFormatting = "query=%s") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make explicit test case instead ? We don't have permutations to tests here so being explicit should simplify the reading ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's part of faa0c77.
I'll default the score mode to EDIT: I should have defaulted new parameter to the raw string |
Lucene tests are all green on my side, I'll launch Solr ones later this day. I'm not at ease with these two changes here and there. I'm not sure I can easily retrieve the score mode from the What do you think about it? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cbismuth . I left more comments but I think we're getting close
@@ -766,7 +766,7 @@ public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOExcepti | |||
return new ScorerSupplier() { | |||
@Override | |||
public Scorer get(long LeadCost) throws IOException { | |||
return new ConstantScoreScorer(CachingWrapperWeight.this, 0f, disi); | |||
return new ConstantScoreScorer(CachingWrapperWeight.this, 0f, ScoreMode.TOP_SCORES, disi); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the cached version so the score is never needed in this case. ScoreMode.COMPLETE_NO_SCORES
should be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, query caching is related to filters, so no score, thanks 👍
Fixed in d114c02.
@@ -844,7 +844,7 @@ public BulkScorer bulkScorer(LeafReaderContext context) throws IOException { | |||
return null; | |||
} | |||
|
|||
return new DefaultBulkScorer(new ConstantScoreScorer(this, 0f, disi)); | |||
return new DefaultBulkScorer(new ConstantScoreScorer(this, 0f, ScoreMode.TOP_SCORES, disi)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, part of d114c02.
@@ -234,12 +234,12 @@ public void testConjunction() throws IOException { | |||
case 0: | |||
// simple iterator | |||
sets[i] = set; | |||
iterators[i] = new ConstantScoreScorer(new FakeWeight(), 0f, anonymizeIterator(new BitDocIdSet(set).iterator())); | |||
iterators[i] = new ConstantScoreScorer(new FakeWeight(), 0f, ScoreMode.TOP_SCORES, anonymizeIterator(new BitDocIdSet(set).iterator())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd use SearchMode.COMPLETE_NO_SCORES
to make sure that the expected behavior in the test doesn't change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally, allow to visit all results and score isn't used in test 👍
Fixed in d5c82d8.
@@ -1903,9 +1903,6 @@ public void testRangeOptimizesIfAllPointsMatch() throws IOException { | |||
upperBound[i] = value[i] + random().nextInt(1); | |||
} | |||
Query query = IntPoint.newRangeQuery("point", lowerBound, upperBound); | |||
Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1); | |||
Scorer scorer = weight.scorer(searcher.getIndexReader().leaves().get(0)); | |||
assertEquals(DocIdSetIterator.all(1).getClass(), scorer.iterator().getClass()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The iterator is not wrapped when the score mode is set to COMPLETE_NO_SCORES
so you don't need to change this assertion anymore ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! That's good news, thanks! Fixed in 352dce1.
Thanks @jimczi, I'll rerun all Lucene tests to ensure nothing is broken and I'll let you know. |
Ant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
ConstantScoreScorer constantScoreScorer(Query query, float score, ScoreMode scoreMode) throws IOException { | ||
IndexSearcher searcher = newSearcher(reader); | ||
Weight weight = searcher.createWeight(new ConstantScoreQuery(query), scoreMode, 1); | ||
LeafReaderContext context = searcher.getIndexReader().leaves().get(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's assert that leaves.size() is 1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, fixed in c5a8317, thanks 👍
* Now support the HEAD verb by doing normal GET and then dropping the response body. Backported from apache/solr@d60458a
See LUCENE-8464.