Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LUCENE-8464: Implement ConstantScoreScorer#setMinCompetitiveScore #495

Closed
wants to merge 39 commits into from
Closed

LUCENE-8464: Implement ConstantScoreScorer#setMinCompetitiveScore #495

wants to merge 39 commits into from

Conversation

cbismuth
Copy link
Contributor

@cbismuth cbismuth commented Nov 8, 2018

We should make it so the iterator returns NO_MORE_DOCS after setMinCompetitiveScore is called with a value that is greater than the constant score.

See LUCENE-8464.

@cbismuth
Copy link
Contributor Author

cbismuth commented Nov 8, 2018

@jpountz PR is ready for review, thank you.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments but the pr looks good overall. Thanks @cbismuth !

@@ -58,6 +58,14 @@ public float getMaxScore(int upTo) throws IOException {
return score;
}

@Override
public void setMinCompetitiveScore(float minScore) throws IOException {
float minScoreDown = Math.nextDown(minScore);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we tie-break on doc id and collect in doc id order I don't think you need to take the previous float here.
The TopScoreCollector sets the minimum score as the next float after the score of the worst top document so you should not need to change the value here.

Copy link
Contributor Author

@cbismuth cbismuth Nov 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this, the TestPrefixRandom#testPrefixes test fails with score = 1.0 and minScore = 1.0000001, so documents are not collected. What do you think of it?

java.lang.AssertionError: Collected: 21 but scorer: -1

	at __randomizedtesting.SeedInfo.seed([C3B30196BD4F380C:91676D9F44FA34A5]:0)
	at org.apache.lucene.search.AssertingLeafCollector.collect(AssertingLeafCollector.java:48)
	at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:263)
	at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:214)
	at org.apache.lucene.search.AssertingBulkScorer.score(AssertingBulkScorer.java:81)
	at org.apache.lucene.search.AssertingBulkScorer.score(AssertingBulkScorer.java:65)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:652)
	at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:72)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:567)
	at org.apache.lucene.search.IndexSearcher.searchAfter(IndexSearcher.java:419)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:430)
	at org.apache.lucene.search.TestPrefixRandom.assertSame(TestPrefixRandom.java:135)
	at org.apache.lucene.search.TestPrefixRandom.testPrefixes(TestPrefixRandom.java:125)
	[...]

NOTE: reproduce with: ant test  -Dtestcase=TestPrefixRandom -Dtests.method=testPrefixes -Dtests.seed=C3B30196BD4F380C -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=lv-LV -Dtests.timezone=Asia/Oral -Dtests.asserts=true -Dtests.file.encoding=UTF-8

Copy link
Contributor Author

@cbismuth cbismuth Nov 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a tiny debugging session, it seems we're passing through the AssertingScorable#setMinCompetitiveScore which not takes the previous float.

@Override
public void setMinCompetitiveScore(float minScore) throws IOException {
in.setMinCompetitiveScore(minScore);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is because you need to add an indirection in ConstantScoreScorer#iterator. Something like:

 @Override
  public DocIdSetIterator iterator() {
    // add indirection so that if 'it' is updated then it will
    // be taken into account
    return new DocIdSetIterator() {
      @Override
      public int nextDoc() throws IOException {
        return doc = disi.nextDoc();
      }

      @Override
      public int docID() {
        return doc;
      }

      @Override
      public long cost() {
        return disi.cost();
      }

      @Override
      public int advance(int target) throws IOException {
        return doc = disi.advance(target);
      }
    };
  }

You replace disi with an empty one when the min score is greater than the query score so you cannot return it directly there. You also need to save the current doc to make sure that you don't lost it when you reset the disi.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And for the record, here is where we set the minimum score to the next float. The AssertingScorable just delegate the call to the underlying scorer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your time, these are my Lucene baby steps. I'm currently debugging to understand why we need to add an indirection in ConstantScoreScorer#iterator.

I understand that we need to save the current doc to not lose it. But even when I don't save it, the indirection you suggest make the test pass and I don't understand why.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that we need to save the current doc to not lose it. But even when I don't save it, the indirection you suggest make the test pass and I don't understand why.

That's not related to this test. When minCompetitiveScore is called the doc id iterator should stay on the current doc and only return NO_MORE_DOCS when it is advanced (nextDoc or advance). You can add a test to check this by calling minCompetitiveScore in the test and then check that the iterator still return the current doc until it is advanced.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thank you, I'll add an assertion in the dedicated test 👍

TestConstantScoreScorerIndex() throws IOException {
directory = newDirectory();

writer = new RandomIndexWriter(random(), directory);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to use a merger that retains doc id order. Something like:

RandomIndexWriter w = new RandomIndexWriter(random(), dir, newIndexWriterConfig().setMergePolicy(
        // retain doc id order
        newLogMergePolicy(random().nextBoolean())));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 0ad2f78, thank you.

doc.add(newTextField(FIELD, VALUE, Field.Store.YES));
writer.addDocument(doc);
}
reader = writer.getReader();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The random index writer can commit segments randomly so you need to force merge to 1 segments to ensure that all documents are in a single segment:

writer.forceMerge(1);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 8d7d5ae, thank you.

Could you please explain me why docs have to be in a single segment to ensure this test passes?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because you check only one segment in the test:

LeafReaderContext context = searcher.getIndexReader().leaves().get(0);

... which is fine if you force merge to 1 segment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes, I understand, thank you 👍

@cbismuth
Copy link
Contributor Author

cbismuth commented Nov 9, 2018

Thanks a lot @jimczi for the very quick and very helpful review, I've made changes to improve this PR.

@cbismuth
Copy link
Contributor Author

cbismuth commented Nov 9, 2018

Thanks a lot @jimczi, I've understood my mistakes 👍 PR is ready for review.

@cbismuth
Copy link
Contributor Author

cbismuth commented Nov 9, 2018

Please ignore my previous comment as some tests are broken when the scorer has an approximation (see here and here).

I'm working on fixing them.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please ignore my previous comment as some tests are broken when the scorer has an approximation (see here and here).

Good catch. You also need to set the indirection in the approximation of the TwoPhaseIterator.
Something like:

public final class ConstantScoreScorer extends Scorer {
  private static class DocIdSetIteratorWrapper extends DocIdSetIterator {
    DocIdSetIterator delegate;
    int doc = -1;

    DocIdSetIteratorWrapper(DocIdSetIterator delegate) {
      this.delegate = delegate;
    }

    @Override
    public int docID() {
      return doc;
    }

    @Override
    public int nextDoc() throws IOException {
      return doc = delegate.nextDoc();
    }

    @Override
    public int advance(int target) throws IOException {
      return doc = delegate.advance(target);
    }

    @Override
    public long cost() {
      return delegate.cost();
    }
  }

  private final float score;
  private final TwoPhaseIterator twoPhaseIterator;
  private DocIdSetIterator disi;
  private DocIdSetIteratorWrapper approx;

  /** Constructor based on a {@link DocIdSetIterator} which will be used to
   *  drive iteration. Two phase iteration will not be supported.
   *  @param weight the parent weight
   *  @param score the score to return on each document
   *  @param disi the iterator that defines matching documents */
  public ConstantScoreScorer(Weight weight, float score, DocIdSetIterator disi) {
    super(weight);
    this.score = score;
    this.twoPhaseIterator = null;
    // we need to wrap the iterator to be able to reset it if minimum score is set
    this.approx = new DocIdSetIteratorWrapper(disi);
    this.disi = approx;
  }

  /** Constructor based on a {@link TwoPhaseIterator}. In that case the
   *  {@link Scorer} will support two-phase iteration.
   *  @param weight the parent weight
   *  @param score the score to return on each document
   *  @param twoPhaseIterator the iterator that defines matching documents */
  public ConstantScoreScorer(Weight weight, float score, TwoPhaseIterator twoPhaseIterator) {
    super(weight);
    this.score = score;
    // we need to wrap the approximation to be able to reset it if minimum score is set
    this.approx = new DocIdSetIteratorWrapper(twoPhaseIterator.approximation());
    this.twoPhaseIterator = new TwoPhaseIterator(approx) {
      @Override
      public boolean matches() throws IOException {
        return twoPhaseIterator.matches();
      }

      @Override
      public float matchCost() {
        return twoPhaseIterator.matchCost();
      }
    };
    this.disi = TwoPhaseIterator.asDocIdSetIterator(twoPhaseIterator);
  }

  @Override
  public float getMaxScore(int upTo) throws IOException {
    return score;
  }

  @Override
  public DocIdSetIterator iterator() {
    return disi;
  }

  @Override
  public TwoPhaseIterator twoPhaseIterator() {
    return twoPhaseIterator;
  }

  @Override
  public int docID() {
    return disi.docID();
  }

  @Override
  public float score() throws IOException {
    return score;
  }

  @Override
  public void setMinCompetitiveScore(float minScore) throws IOException {
    if (minScore > score) {
      approx.delegate = DocIdSetIterator.empty();
    }
  }

@@ -70,7 +98,7 @@ public TwoPhaseIterator twoPhaseIterator() {

@Override
public int docID() {
return disi.docID();
return twoPhaseIterator != null ? twoPhaseIterator.approximation().docID() : doc;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would introduce introduce a discrepancy between the docID returned by the scorer and the one returned by the iterator. You need to apply the indirection to the approximation of the twoPhaseIterator as explained here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it broke some tests.

@cbismuth
Copy link
Contributor Author

cbismuth commented Nov 9, 2018

Great! Exactly what I did except declaring a static inner class, I’ll declare it 👍 thanks!

@cbismuth
Copy link
Contributor Author

cbismuth commented Nov 9, 2018

Well you're right, without a static wrapper (see 19e156c) I can't assign approximation to DocIdSetIterator.empty() after setting minCompetitiveScore.

I'll update this PR with your suggestions and let you know, thanks a lot @jimczi for your help!

@cbismuth
Copy link
Contributor Author

Hi @jimczi, PR is up-to-date and ready for review. All Lucene tests are green on my side.

As you'll see, I'd like to suggest a slightly different implementation to keep approximation class attribute null when no two phase iterator has been passed to the ConstantScoreScorer constructor. Feel free to let me now if you prefer to keep your proposal, thanks a lot.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you'll see, I'd like to suggest a slightly different implementation to keep approximation class attribute null when no two phase iterator has been passed to the ConstantScoreScorer constructor. Feel free to let me now if you prefer to keep your proposal, thanks a lot.

I don't mind changing my proposal but your implementation doesn't work. If you have a twoPhaseIterator you just need to set its internal approximation to an empty iterator. That's what the wrapping of the TwoPhaseIterator does, it adds an indirection to its approximation in order to be able to reset the iterator.

@cbismuth
Copy link
Contributor Author

Thank you @jimczi.

I've added parametrized tests with phrase and term queries to check the two constructors of the ConstantScoreScorer class.
And I also came back to your original proposal (with this.disi = this.approximation) as it's the best fit to reset the iterator.

I hope it's a good one!

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left more comments ;)
We should also ensure is that change doesn't affect queries that don't use the max score optimization (ScoreMode.COMPLETE or ScoreMode.COMPLETE_NO_SCORES).
Can you wrap the iterator (and the twoPhaseIterator) only if the scoreMode is set to ScoreMode.TOP_SCORES ? This requires to change the constructors:

public ConstantScoreScorer(Weight weight, float score, DocIdSetIterator disi, ScoreMode scoreMode) {
    super(weight);
    this.score = score;
    this.approximation = scoreMode == ScoreMode.TOP_SCORES ? new DocIdSetIteratorWrapper(disi) : disi;
    this.twoPhaseIterator = null;
    this.disi = disi;
    this.disi = this.approximation;
  }

but this change should ensure that we'll not slow down the constant score query when we cannot apply the max score optimization.

@@ -70,7 +120,7 @@ public TwoPhaseIterator twoPhaseIterator() {

@Override
public int docID() {
return disi.docID();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can keep disi.docID() here and move doc to the DocIdSetIteratorWrapper ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally, fixed in 114013f, thanks 👍


public class TestConstantScoreScorer extends LuceneTestCase {
private static final String FIELD = "f";
private static final String[] VALUES = new String[]{"foo", "bar", "foo bar", "azerty"};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add documents that match foo and bar but not as a phrase query to test the two phase matching ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Done in faa0c77.

private static final String FIELD = "f";
private static final String[] VALUES = new String[]{"foo", "bar", "foo bar", "azerty"};

@ParametersFactory(argumentFormatting = "query=%s")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make explicit test case instead ? We don't have permutations to tests here so being explicit should simplify the reading ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's part of faa0c77.

@cbismuth
Copy link
Contributor Author

cbismuth commented Nov 14, 2018

I'll default the score mode to ScoreMode.TOP_SCORES in all ~45 constructor callers. And then, I'll try to change it to a more sensible value based on the caller context.

EDIT: I should have defaulted new parameter to the raw string scoreMode it would have save me some time 😅 as naming guidelines are good all over the code base 👍

@cbismuth
Copy link
Contributor Author

Lucene tests are all green on my side, I'll launch Solr ones later this day.

I'm not at ease with these two changes here and there. I'm not sure I can easily retrieve the score mode from the CachingWrapperWeight class, therefore I passed ScoreMode.TOP_SCORE to not prevent from setting min competitive score.

What do you think about it?

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cbismuth . I left more comments but I think we're getting close

@@ -766,7 +766,7 @@ public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOExcepti
return new ScorerSupplier() {
@Override
public Scorer get(long LeadCost) throws IOException {
return new ConstantScoreScorer(CachingWrapperWeight.this, 0f, disi);
return new ConstantScoreScorer(CachingWrapperWeight.this, 0f, ScoreMode.TOP_SCORES, disi);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the cached version so the score is never needed in this case. ScoreMode.COMPLETE_NO_SCORES should be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, query caching is related to filters, so no score, thanks 👍

Fixed in d114c02.

@@ -844,7 +844,7 @@ public BulkScorer bulkScorer(LeafReaderContext context) throws IOException {
return null;
}

return new DefaultBulkScorer(new ConstantScoreScorer(this, 0f, disi));
return new DefaultBulkScorer(new ConstantScoreScorer(this, 0f, ScoreMode.TOP_SCORES, disi));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, part of d114c02.

@@ -234,12 +234,12 @@ public void testConjunction() throws IOException {
case 0:
// simple iterator
sets[i] = set;
iterators[i] = new ConstantScoreScorer(new FakeWeight(), 0f, anonymizeIterator(new BitDocIdSet(set).iterator()));
iterators[i] = new ConstantScoreScorer(new FakeWeight(), 0f, ScoreMode.TOP_SCORES, anonymizeIterator(new BitDocIdSet(set).iterator()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use SearchMode.COMPLETE_NO_SCORES to make sure that the expected behavior in the test doesn't change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally, allow to visit all results and score isn't used in test 👍

Fixed in d5c82d8.

@@ -1903,9 +1903,6 @@ public void testRangeOptimizesIfAllPointsMatch() throws IOException {
upperBound[i] = value[i] + random().nextInt(1);
}
Query query = IntPoint.newRangeQuery("point", lowerBound, upperBound);
Weight weight = searcher.createWeight(query, ScoreMode.COMPLETE_NO_SCORES, 1);
Scorer scorer = weight.scorer(searcher.getIndexReader().leaves().get(0));
assertEquals(DocIdSetIterator.all(1).getClass(), scorer.iterator().getClass());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The iterator is not wrapped when the score mode is set to COMPLETE_NO_SCORES so you don't need to change this assertion anymore ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! That's good news, thanks! Fixed in 352dce1.

@cbismuth
Copy link
Contributor Author

Thanks @jimczi, I'll rerun all Lucene tests to ensure nothing is broken and I'll let you know.

@cbismuth
Copy link
Contributor Author

Ant precommit task and Lucene test are all green.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

ConstantScoreScorer constantScoreScorer(Query query, float score, ScoreMode scoreMode) throws IOException {
IndexSearcher searcher = newSearcher(reader);
Weight weight = searcher.createWeight(new ConstantScoreQuery(query), scoreMode, 1);
LeafReaderContext context = searcher.getIndexReader().leaves().get(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's assert that leaves.size() is 1?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, fixed in c5a8317, thanks 👍

@cbismuth cbismuth closed this Nov 20, 2018
@cbismuth cbismuth deleted the LUCENE-8464 branch April 3, 2019 21:11
@cbismuth cbismuth restored the LUCENE-8464 branch April 3, 2019 21:11
@cbismuth cbismuth deleted the LUCENE-8464 branch April 3, 2019 21:12
janhoy pushed a commit to janhoy/lucene-solr that referenced this pull request May 12, 2022
* Now support the HEAD verb by doing normal GET and then dropping the response body.

Backported from apache/solr@d60458a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants