Function Score: Refactor RandomScoreFunction to be consistent, and return values in range [0.0, 1.0] #7446
…urn values in range [0.0, 1.0]
RandomScoreFunction previously relied on the order in which documents were iterated from Lucene. This caused the ordering to change, even with the same seed, if documents moved to different segments. With this change, a murmur32 hash of the _uid for each document is used as the "random" value. Also, the hash is adjusted to return only values between 0.0 and 1.0, so that it is easier to fit into users' scoring models. closes elastic#6907
@@ -160,7 +160,7 @@

 # Path to log files:
 #
-#path.logs: /path/to/logs
+path.logs: logs
leftover?
Doh! Yes, will remove.
Sorry about that, @jpountz. I have removed those extraneous changes.
@@ -28,7 +28,7 @@
  */
 public class RandomScoreFunctionBuilder implements ScoreFunctionBuilder {

-    private Long seed = null;
+    private Integer seed = null;
Is there a particular reason (which I obviously fail to see :-) why this is an object and not a regular int?
I think it is to be able to do if (seed != null) in the toXContent method?
Makes sense, I rest my case. I'm not sure the no-args constructor makes sense, though: if that one were gone, the seed would never be empty.
I just went with what was there. I think having the option to not supply the seed (i.e. you don't care about reproducing, you just want some randomness) is a good option to keep.
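To illustrate the point discussed above, here is a minimal sketch of a builder whose seed is optional. It is not the actual RandomScoreFunctionBuilder: the class name RandomScoreBuilderSketch and the toJson method are hypothetical stand-ins (the real class writes through toXContent), and the JSON is assembled by hand only to keep the example self-contained.

```java
// Hypothetical sketch, not the real Elasticsearch builder.
final class RandomScoreBuilderSketch {

    // Boxed Integer so that null can mean "no seed supplied".
    private Integer seed = null;

    // No-args constructor: the caller just wants some randomness.
    RandomScoreBuilderSketch() {
    }

    // Optional: supply a seed when reproducible ordering is wanted.
    RandomScoreBuilderSketch seed(int seed) {
        this.seed = seed;
        return this;
    }

    // Stand-in for toXContent: the seed field is emitted only when set.
    String toJson() {
        StringBuilder sb = new StringBuilder("{\"random_score\":{");
        if (seed != null) {
            sb.append("\"seed\":").append(seed);
        }
        return sb.append("}}").toString();
    }
}
```

With a primitive int there would be no way to tell "seed is 0" apart from "no seed given", which is why the null check needs the boxed type.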
Is there a reason this was changed from a long to an int? In 1.4 I can no longer use this function because the seed I was using matches data that is only 64 bits.
The diff looks good to me, but I'm wondering whether we should enable doc values by default on …
@jpountz That's the assumption that I was waiting on this fix for all this time. However, this still brings improvements that I believe are important to get into master (2 bug fixes and a simplification of return values). The caveat, of course, is the cost of pulling …
Then let's add some documentation to make clear that this feature relies on fielddata of the …
@jpountz I modified the docs for random_score a little to mention …
This is what I had in mind. Maybe also put a warning that it will load field data for the …
…urn values in range [0.0, 1.0]
RandomScoreFunction previously relied on the order in which documents were iterated from Lucene. This caused the ordering to change, even with the same seed, if documents moved to different segments. With this change, a murmur32 hash of the _uid for each document is used as the "random" value. Also, the hash is adjusted to return only values between 0.0 and 1.0, so that it is easier to fit into users' scoring models. closes #6907, #7446
RandomScoreFunction previously relied on the order the documents were
iterated in from Lucene. This caused changes in ordering, with the same
seed, if documents moved to different segments. With this change, a
murmur32 hash of the _uid for each document is used as the "random"
value. Also, the hash is adjusted so as to only return values between
0.0 and 1.0 to enable easier manipulation to fit into users' scoring
models.
closes #6907
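
The approach described in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: it assumes "murmur32" means the 32-bit MurmurHash3 variant, that the seed is passed as the hash seed, and that the hash is mapped into the unit interval by keeping its low 24 bits. The class name StableRandomScore, the method names, and the sample uids are all hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a per-document score derived from a hash of the uid,
// so it does not depend on segment layout or iteration order.
final class StableRandomScore {

    // MurmurHash3 (x86, 32-bit) over a byte array.
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int roundedEnd = data.length & 0xfffffffc; // length rounded down to 4-byte blocks

        // Body: mix each little-endian 4-byte block into the state.
        for (int i = 0; i < roundedEnd; i += 4) {
            int k1 = (data[i] & 0xff) | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16) | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: mix in the remaining 1 to 3 bytes, if any.
        int k1 = 0;
        switch (data.length & 0x03) {
            case 3:
                k1 = (data[roundedEnd + 2] & 0xff) << 16;
                // fall through
            case 2:
                k1 |= (data[roundedEnd + 1] & 0xff) << 8;
                // fall through
            case 1:
                k1 |= (data[roundedEnd] & 0xff);
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
        }

        // Finalization: fold in the length and avalanche the bits.
        h1 ^= data.length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // Assumption: keep the low 24 bits (exactly representable in a float)
    // and divide by 2^24, which yields a value in [0.0, 1.0).
    static float score(String uid, int seed) {
        int hash = murmur3_32(uid.getBytes(StandardCharsets.UTF_8), seed);
        return (hash & 0x00FFFFFF) / (float) (1 << 24);
    }

    public static void main(String[] args) {
        int seed = 42; // hypothetical seed
        for (String uid : new String[] {"type#1", "type#2", "type#3"}) {
            System.out.println(uid + " -> " + score(uid, seed));
        }
    }
}
```

Because the score is a pure function of the uid and the seed, the same document gets the same value no matter which segment it lives in. Note that this particular mapping never returns exactly 1.0.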