Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to Lucene QueryRescorer #7707

Closed
wants to merge 10 commits into from

Conversation

mikemccand
Copy link
Contributor

Simon started this with #6232 and I just simplified it a bit ...

One functional change I did was: if a first pass hit did not "participate" in rescoring (because it wasn't in the top window hits), I multiply its score by rescore.queryWeight() matching what we do when it did participate but the 2nd pass query didn't match it. I'm not sure that's "correct" but I think it's "more correct" than just using the raw first pass score ...

s1monw and others added 7 commits July 9, 2014 10:27
A query rescorer was added to lucene that basically does the same as our
rescorer. In-fact we designed it to work pretty similar. This commit
cuts over to the lucene version shipped with 4.8

Closes elastic#5922
ContextIndexSearcher searcher = context.searcher();
if (sourceExplanation == null) {
// this should not happen but just in case
return new ComplexExplanation(false, 0.0f, "nothing matched");
}
// TODO: this isn't right? I.e., we are incorrectly pretending all first pass hits were rescored? If the requested docID was
// beyond the top rescoreContext.window() in the first pass hits, we don't rescore it now?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 We should try to fix it (probably in another change, it doesn't look new?)

@jpountz
Copy link
Contributor

jpountz commented Oct 6, 2014

@mikemccand I tried to review this change and left some questions, but I'm not familiar with rescoring at all so they might be naive

@jpountz jpountz removed the review label Oct 6, 2014
…for pagination and when rescoring penalizes top hits
@mikemccand
Copy link
Contributor Author

I updated the docs to add caution about not changing the rescore window nor size, when stepping through pages, since that may "shift" the results.

I also updated the test cases to 1) test that pagination works (passing non-zero from indeed does what's expected), and 2) test the strange-yet-possible case where the rescorer causes results to have worse scores than non-rescored hits.

I think it's ready...

@@ -19,6 +19,11 @@ alternative rescorers may be made available, for example, a pair-wise rescorer.
<<search-request-search-type,`search_type`>> is set
to `scan` or `count`.

*Note:* when exposing pagination to your users, you should not change
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI using NOTE: instead of *Note:* would make it an admonition paragraph, which is probably what you want? https://github.com/elasticsearch/docs#admonition-paragraphs

@jpountz
Copy link
Contributor

jpountz commented Oct 17, 2014

+1 to push. I left comments since the rescoring behaviour is still obscure to me but I agree this change makes things better!

@jpountz jpountz removed the review label Oct 17, 2014
@mikemccand
Copy link
Contributor Author

Thanks @jpountz ... I improved the docs: I cutover to admonition paragraph (thanks for the pointer!) and softened it to say it's only window_size that must not be changed across page requests.

The "new score is lower than old score" problem, I think we should not try to tackle here: it's tricky, and it may be some use cases want to do this.

I also added another TODO, that it should really be ScoreMode that handles the "2nd pass query did't match" case ... right now we effectively hardwire to ScoreMode.Total.

I'll open a separate issue about explain ignoring window_size...

mikemccand added a commit that referenced this pull request Oct 18, 2014
This is functionally equivalent to before, so there should be no
user-visible impact, except I added a NOTE in the docs warning about
the interaction of pagination and rescoring.

Closes #6232

Closes #7707
@clintongormley clintongormley added the :Search/Search Search-related issues that do not fall into other categories label Mar 19, 2015
@clintongormley clintongormley changed the title Core: switch to Lucene QueryRescorer Switch to Lucene QueryRescorer Jun 7, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories v1.5.0 v2.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants