New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to Lucene QueryRescorer #7707
Conversation
A query rescorer was added to lucene that basically does the same as our rescorer. In-fact we designed it to work pretty similar. This commit cuts over to the lucene version shipped with 4.8 Closes elastic#5922
…hits are scored to be more consistent
ContextIndexSearcher searcher = context.searcher(); | ||
if (sourceExplanation == null) { | ||
// this should not happen but just in case | ||
return new ComplexExplanation(false, 0.0f, "nothing matched"); | ||
} | ||
// TODO: this isn't right? I.e., we are incorrectly pretending all first pass hits were rescored? If the requested docID was | ||
// beyond the top rescoreContext.window() in the first pass hits, we don't rescore it now? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 We should try to fix it (probably in another change, it doesn't look new?)
@mikemccand I tried to review this change and left some questions, but I'm not familiar with rescoring at all so they might be naive |
…for pagination and when rescoring penalizes top hits
I updated the docs to add caution about not changing the rescore window nor size, when stepping through pages, since that may "shift" the results. I also updated the test cases to 1) test that pagination works (passing non-zero from indeed does what's expected), and 2) test the strange-yet-possible case where the rescorer causes results to have worse scores than non-rescored hits. I think it's ready... |
@@ -19,6 +19,11 @@ alternative rescorers may be made available, for example, a pair-wise rescorer. | |||
<<search-request-search-type,`search_type`>> is set | |||
to `scan` or `count`. | |||
|
|||
*Note:* when exposing pagination to your users, you should not change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI using NOTE:
instead of *Note:*
would make it an admonition paragraph, which is probably what you want? https://github.com/elasticsearch/docs#admonition-paragraphs
+1 to push. I left comments since the rescoring behaviour is still obscure to me but I agree this change makes things better! |
Thanks @jpountz ... I improved the docs: I cutover to admonition paragraph (thanks for the pointer!) and softened it to say it's only window_size that must not be changed across page requests. The "new score is lower than old score" problem, I think we should not try to tackle here: it's tricky, and it may be some use cases want to do this. I also added another TODO, that it should really be ScoreMode that handles the "2nd pass query did't match" case ... right now we effectively hardwire to ScoreMode.Total. I'll open a separate issue about explain ignoring window_size... |
Simon started this with #6232 and I just simplified it a bit ...
One functional change I did was: if a first pass hit did not "participate" in rescoring (because it wasn't in the top window hits), I multiply its score by rescore.queryWeight() matching what we do when it did participate but the 2nd pass query didn't match it. I'm not sure that's "correct" but I think it's "more correct" than just using the raw first pass score ...