Support multiple rescores #4749

Closed

nik9000
Member

@nik9000 nik9000 commented Jan 16, 2014

Support multiple rescores

Detects if rescores arrive as an array instead of a plain object. If so
then parse each element of the array as a separate rescore to be executed
one after another. It looks like this:

   "rescore" : [ {
      "window_size" : 100,
      "query" : {
         "rescore_query" : {
            "match" : {
               "field1" : {
                  "query" : "the quick brown",
                  "type" : "phrase",
                  "slop" : 2
               }
            }
         },
         "query_weight" : 0.7,
         "rescore_query_weight" : 1.2
      }
   }, {
      "window_size" : 10,
      "query" : {
         "score_mode": "multiply",
         "rescore_query" : {
            "function_score" : {
               "script_score": {
                  "script": "log10(doc['numeric'].value + 2)"
               }
            }
         }
      }
   } ]

Rescores as a single object are still supported.
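
A minimal sketch of the object-or-array detection described above, in Python rather than the Java of the actual patch. The name `normalize_rescores` is hypothetical and not part of Elasticsearch; it only illustrates how a single object can be treated as a one-element list so both request shapes follow the same path.

```python
def normalize_rescores(search_body):
    """Return the rescore definitions of a search body as a list, in execution order."""
    rescore = search_body.get("rescore")
    if rescore is None:
        return []                 # no rescoring requested
    if isinstance(rescore, dict):
        return [rescore]          # original form: a single plain object
    if isinstance(rescore, list):
        return list(rescore)      # new form: one element per rescore
    raise ValueError("rescore must be an object or an array of objects")


# Illustrative request bodies (assumed shapes, trimmed for brevity).
single = {"rescore": {"window_size": 100, "query": {"rescore_query": {"match_all": {}}}}}
multiple = {"rescore": [{"window_size": 100}, {"window_size": 10}]}

single_rescores = normalize_rescores(single)
multiple_rescores = normalize_rescores(multiple)
```

Here `single_rescores` holds one definition and `multiple_rescores` holds two, in the order they would run.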

Also add documentation on score_mode when adding documentation about multiple
rescores.

Closes #4748
Closes #4742

@nik9000
Member Author

nik9000 commented Jan 16, 2014

This isn't quite ready but it is worth reviewing I think. TODO:

  1. Test the explanation of multiple rescores.
  2. Documentation.
  3. REST testing?

@nik9000
Member Author

nik9000 commented Jan 16, 2014

Added test for explanation. It caught that I was building the explanations backwards so the first rescore looked like it processed output from the second when in fact the opposite is true.

@nik9000
Member Author

nik9000 commented Jan 16, 2014

Wait! I had it backwards! The code was right and the test was wrong. Fixed.

@nik9000
Member Author

nik9000 commented Jan 16, 2014

Added documentation and, while I was in there, documentation for score_mode.

@nik9000
Member Author

nik9000 commented Jan 16, 2014

I don't think I have anything to do for the REST tests or the REST spec, because the spec just describes the body of the search request as "The search definition using the Query DSL", which I think the asciidoc covers.

linearly to produce the final `_score` for each document. The relative
importance of the original query and of the rescore query can be
controlled with the `query_weight` and `rescore_query_weight`
By defaulthe scores from the original query and the rescore query are
Contributor

space missing between 'default' and 'the'

@jpountz
Contributor

jpountz commented Jan 16, 2014

This looks good in general. However, it seems to me that the second rescorer would be applied to all top docs rather than only the first 10? QueryRescorer.rescore rescores all top docs, which was probably OK when there could be only a single rescorer, but now that there can be several, I think the TopDocsFilter should take the window size into account.

@s1monw
Contributor

s1monw commented Jan 16, 2014

cool stuff I like the feature!

@nik9000
Member Author

nik9000 commented Jan 16, 2014

@jpountz I'll have a look at that in a bit. I thought I had a test that checked that if the second window is smaller than the first, and the first doesn't pull the match into the second window, then the second one doesn't see it.

@nik9000
Member Author

nik9000 commented Jan 16, 2014

@jpountz you were right of course. My test was actually backwards. It was making sure that the second rescore took effect when I wanted the opposite.

I've pushed a fix. I also did some reworking on QueryRescorer#rescore because it was a little twisted. The only real change is that TopDocsFilter now takes a maximum number of docs to filter and I set it to the rescore window. I also set the maximum number of docs returned by the searcher to the rescore window rather than the size of topdocs.
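
To illustrate the behavior being fixed here, a rough Python sketch of window-limited sequential rescoring. It is a simplified model, not the QueryRescorer code: the helper names `rescore_window` and `run_rescores` and the sample scores are invented for illustration, and it assumes that hits outside the window keep their order and score.

```python
def rescore_window(hits, window_size, rescore_fn):
    """Rescore only the first `window_size` hits; the tail keeps its order and scores."""
    head = [(doc, rescore_fn(doc, score)) for doc, score in hits[:window_size]]
    head.sort(key=lambda hit: hit[1], reverse=True)   # re-rank inside the window only
    return head + hits[window_size:]


def run_rescores(hits, rescores):
    """Apply (window_size, rescore_fn) pairs one after another."""
    for window_size, rescore_fn in rescores:
        hits = rescore_window(hits, window_size, rescore_fn)
    return hits


hits = [("a", 3.0), ("b", 2.0), ("c", 1.0)]     # (doc id, score), best first
rescores = [
    (2, lambda doc, score: score + 5.0 if doc == "b" else score),  # promotes "b"
    (1, lambda doc, score: score * 10.0),                          # window of one doc
]
result = run_rescores(hits, rescores)
```

In this model the second rescore only touches "b", the single document the first rescore left at the top; "a" and "c" fall outside its window and keep their scores.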

@nik9000
Member Author

nik9000 commented Jan 23, 2014

Rebased. Is there anything else I should change in this?

@ghost ghost assigned jpountz Jan 23, 2014
@jpountz
Contributor

jpountz commented Jan 23, 2014

This looks very good. My only concern right now is about the client API. Now that it is possible to have several rescorers per request, it feels wrong to me to have the setRescoreWindow method on SearchRequestBuilder. @s1monw what do you think?

Something that would be nice also would be to validate that rescore window sizes are in strictly descending order. Otherwise applying a rescorer that is followed by a rescorer with a greater window size would be useless I think?

@s1monw
Contributor

s1monw commented Jan 23, 2014

IMO the rescore window should be max(default_window_size, max_rescorer_window), where we can use setRescoreWindow as a default so we don't need to specify it everywhere? And I agree we should sort the rescorers by window size!

@nik9000
Member Author

nik9000 commented Jan 23, 2014

Something that would be nice also would be to validate that rescore window sizes are in strictly descending order. Otherwise applying a rescorer that is followed by a rescorer with a greater window size would be useless I think?

I was thinking it might be nice to have a multiply rescore with a big window after a total rescore with a smaller window. The multiply must come after so you multiply the totalled score. The total has a smaller window because it is more expensive than the multiply.

setRescoreWindow as a default

I'll make this change and see what it looks like.

@jpountz
Contributor

jpountz commented Jan 23, 2014

I was thinking it might be nice to have a multiply rescore with a big window after a total rescore with a smaller window. The multiply must come after so you multiply the totalled score. The total has a smaller window because it is more expensive than the multiply.

Agreed, let's not check the window sizes in order to allow for this kind of usage.
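
The use case above can be sketched in Python as follows. This is an illustration only: `combine` and `rescore` are hypothetical helpers, the scores are made up, and the weighting formula (each score multiplied by its weight before being added or multiplied) is an assumption about how `total` and `multiply` combine scores, not a copy of the Elasticsearch implementation.

```python
def combine(score_mode, query_score, rescore_score,
            query_weight=1.0, rescore_query_weight=1.0):
    """Combine the current score with the rescore query's score."""
    primary = query_weight * query_score
    secondary = rescore_query_weight * rescore_score
    if score_mode == "total":
        return primary + secondary
    if score_mode == "multiply":
        return primary * secondary
    raise ValueError("unsupported score_mode in this sketch: " + score_mode)


def rescore(hits, window_size, score_mode, rescore_scores):
    """Rescore the first `window_size` hits and re-rank them; leave the tail alone."""
    head = [(doc, combine(score_mode, score, rescore_scores[doc]))
            for doc, score in hits[:window_size]]
    head.sort(key=lambda hit: hit[1], reverse=True)
    return head + hits[window_size:]


hits = [("a", 4.0), ("b", 3.0), ("c", 2.0), ("d", 1.0)]
phrase = {"a": 0.0, "b": 2.0}                       # expensive rescore, window of 2
boost = {"a": 1.0, "b": 1.0, "c": 3.0, "d": 1.0}    # cheap multiplier, window of 4

after_total = rescore(hits, 2, "total", phrase)
after_multiply = rescore(after_total, 4, "multiply", boost)
```

With these numbers the larger second window lets "c" overtake the documents that the first rescore totalled, which a strict descending-window check would have ruled out.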

@nik9000
Member Author

nik9000 commented Jan 23, 2014

Added another commit to make setRescoreWindow set a default and to use set/add instead of set/next. If you like it I'll squash the changes together. I didn't want to have to do git backflips to get back to the old api in case this one doesn't make sense.
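
A rough Python model of the set/add style with a default window, as described in this comment. Everything here is hypothetical: the class `SearchRequestSketch` and its methods only mirror the idea, not the actual SearchRequestBuilder API, and it assumes the default window applies to any rescorer added without an explicit window.

```python
class SearchRequestSketch:
    """Toy stand-in for a search request builder; not the real client API."""

    def __init__(self):
        self.default_window = None
        self.rescorers = []          # list of (rescorer body, explicit window or None)

    def set_rescore_window(self, window):
        self.default_window = window             # default for rescorers without a window
        return self

    def set_rescorer(self, rescorer, window=None):
        self.rescorers = [(rescorer, window)]    # "set" replaces any earlier rescorers
        return self

    def add_rescorer(self, rescorer, window=None):
        self.rescorers.append((rescorer, window))  # "add" appends, preserving order
        return self

    def to_body(self):
        entries = []
        for rescorer, window in self.rescorers:
            entry = {"query": rescorer}
            effective = window if window is not None else self.default_window
            if effective is not None:
                entry["window_size"] = effective
            entries.append(entry)
        return {"rescore": entries}


request = (SearchRequestSketch()
           .set_rescore_window(50)
           .set_rescorer({"rescore_query": {"match_all": {}}})
           .add_rescorer({"score_mode": "multiply"}, window=10))
body = request.to_body()
```

In this model the first rescorer picks up the default window of 50 and the second keeps its explicit window of 10.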

@jpountz
Contributor

jpountz commented Jan 23, 2014

Looks good to me. I'm going to merge this PR if there are no objections.

@nik9000
Member Author

nik9000 commented Jan 23, 2014

Cool. Want me to squash the commits or will you handle it?

@s1monw
Contributor

s1monw commented Jan 23, 2014

LGTM - would be awesome if we could have an issue for this as well to mark the versions etc. otherwise +1 to the feature thanks nik!

@nik9000
Member Author

nik9000 commented Jan 23, 2014

awesome if we could have an issue for this as well to mark the versions

Is #4748 what you need?

@s1monw
Contributor

s1monw commented Jan 23, 2014

yeah @jpountz made me see it too :) sorry for the noise! ;)

@jpountz
Contributor

jpountz commented Jan 23, 2014

Want me to squash the commits or will you handle it?

Actually, what would be nice would be to split the change into one commit for the documentation of score_mode, which I'll merge into 1.0, 1.x and master, and the rest, which I'll merge into 1.x and master.

Detects if rescores arrive as an array instead of a plain object.  If so
then parse each element of the array as a separate rescore to be executed
one after another.  It looks like this:
   "rescore" : [ {
      "window_size" : 100,
      "query" : {
         "rescore_query" : {
            "match" : {
               "field1" : {
                  "query" : "the quick brown",
                  "type" : "phrase",
                  "slop" : 2
               }
            }
         },
         "query_weight" : 0.7,
         "rescore_query_weight" : 1.2
      }
   }, {
      "window_size" : 10,
      "query" : {
         "score_mode": "multiply",
         "rescore_query" : {
            "function_score" : {
               "script_score": {
                  "script": "log10(doc['numeric'].value + 2)"
               }
            }
         }
      }
   } ]

Rescores as a single object are still supported.

Closes elastic#4748
@nik9000
Member Author

nik9000 commented Jan 23, 2014

Done.

@jpountz
Contributor

jpountz commented Jan 23, 2014

Merged, thanks again Nik!

@jpountz jpountz closed this Jan 23, 2014
Successfully merging this pull request may close these issues.

Sequential rescores
[docs] Rescore's score_mode not documented
3 participants