Function Score: Refactor RandomScoreFunction to be consistent, and return values in range [0.0, 1.0] #7446

rjernst · 2014-08-25T23:27:34Z

RandomScoreFunction previously relied on the order the documents were
iterated in from Lucene. This caused changes in ordering, with the same
seed, if documents moved to different segments. With this change, a
murmur32 hash of the _uid for each document is used as the "random"
value. Also, the hash is adjusted so as to only return values between
0.0 and 1.0 to enable easier manipulation to fit into users' scoring
models.

closes #6907
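
To illustrate the idea described above, here is a minimal sketch, not the code from this PR. It assumes the score is derived by hashing each document's uid with the seed using a standard MurmurHash3 x86 32-bit hash, then scaling the low 24 bits of the hash into a float. The class name `UidRandomScoreSketch`, the `score` helper, the 24-bit scaling, and the sample uids are all assumptions made for illustration. Because the value depends only on the uid and the seed, it does not change when documents are visited in a different order.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class UidRandomScoreSketch {

    /** Standard MurmurHash3 x86 32-bit over a byte array. */
    public static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int roundedEnd = data.length & 0xfffffffc; // largest multiple of 4 <= length

        // Body: mix one little-endian 4-byte block at a time.
        for (int i = 0; i < roundedEnd; i += 4) {
            int k1 = (data[i] & 0xff)
                    | ((data[i + 1] & 0xff) << 8)
                    | ((data[i + 2] & 0xff) << 16)
                    | (data[i + 3] << 24);
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: mix the remaining 1 to 3 bytes (intentional fall-through).
        int k1 = 0;
        switch (data.length & 0x03) {
            case 3:
                k1 = (data[roundedEnd + 2] & 0xff) << 16;
            case 2:
                k1 |= (data[roundedEnd + 1] & 0xff) << 8;
            case 1:
                k1 |= (data[roundedEnd] & 0xff);
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
        }

        // Finalization: force all bits of the hash to avalanche.
        h1 ^= data.length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    /** Hypothetical helper: maps (uid, seed) to a float in [0.0, 1.0). */
    public static float score(String uid, int seed) {
        int hash = murmur3_32(uid.getBytes(StandardCharsets.UTF_8), seed);
        // Assumption: keep the low 24 bits (a float has a 24-bit significand) and scale.
        return (hash & 0x00FFFFFF) / (float) (1 << 24);
    }

    public static void main(String[] args) {
        int seed = 42;
        // The same (made-up) uids, visited in two different orders.
        List<String> orderA = Arrays.asList("doc#1", "doc#2", "doc#3");
        List<String> orderB = Arrays.asList("doc#3", "doc#1", "doc#2");
        for (String uid : orderA) {
            System.out.println("A: " + uid + " -> " + score(uid, seed));
        }
        for (String uid : orderB) {
            System.out.println("B: " + uid + " -> " + score(uid, seed));
        }
    }
}
```

With this scaling the result is always at least 0.0 and strictly below 1.0, which falls inside the range stated in the title. Each uid prints the same value in both passes, which is the consistency property the description is after.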


jpountz · 2014-08-25T23:30:24Z

config/elasticsearch.yml

@@ -160,7 +160,7 @@

 # Path to log files:
 #
-#path.logs: /path/to/logs
+path.logs: logs


Doh! Yes, will remove.

rjernst · 2014-08-25T23:48:31Z

Sorry about that, @jpountz. I have removed those extraneous changes.

spinscale · 2014-08-26T06:42:08Z

...main/java/org/elasticsearch/index/query/functionscore/random/RandomScoreFunctionBuilder.java

@@ -28,7 +28,7 @@
 */
 public class RandomScoreFunctionBuilder implements ScoreFunctionBuilder {

-    private Long seed = null;
+    private Integer seed = null;


Is there a particular reason (which I obviously fail to see :-) why this is an object and not a regular int?

I think it is to be able to do if (seed != null) in the toXContent method?

Makes sense, I rest my case. I'm not sure the no-args constructor makes sense, though: if that one were gone, the seed would never be empty.

I just went with what was there. I think having the option to not supply the seed (i.e. you don't care about reproducing, you just want some randomness) is a good option to keep.
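
The point about the boxed type can be sketched as follows. This is a simplified stand-in, not the actual `RandomScoreFunctionBuilder`: the class `RandomScoreBuilderSketch` and its `toJson` method are hypothetical, and the real builder writes through `toXContent` rather than building a string. It only shows why an `Integer` that may be `null` lets the builder omit the seed entirely when none was supplied.

```java
public class RandomScoreBuilderSketch {

    // Boxed so that "no seed supplied" (null) is distinguishable from every int value.
    private Integer seed = null;

    /** No-args constructor: the caller does not care about reproducibility. */
    public RandomScoreBuilderSketch() {
    }

    /** Sets an explicit seed so that results can be reproduced. */
    public RandomScoreBuilderSketch seed(int seed) {
        this.seed = seed;
        return this;
    }

    /** Hypothetical stand-in for toXContent: emits the seed only when one was set. */
    public String toJson() {
        StringBuilder sb = new StringBuilder("{\"random_score\":{");
        if (seed != null) {
            sb.append("\"seed\":").append(seed);
        }
        sb.append("}}");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(new RandomScoreBuilderSketch().toJson());
        System.out.println(new RandomScoreBuilderSketch().seed(7).toJson());
    }
}
```

With a primitive `int` there would be no value left over to mean "unset", so the builder would either always emit a seed or need a separate flag.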

Is there a reason this was changed from a long to an int? In 1.4 I can no longer use this function because the seed I was using matches data that is only 64 bits.

@harmsk This was due to using a 32 bit hash. However, it was fixed later in #8311 so that longs (as well as strings) work.

jpountz · 2014-08-26T08:15:42Z

The diff looks good to me, but I'm wondering whether we should enable doc values by default on _uid before merging such a change?

rjernst · 2014-08-26T14:54:49Z

@jpountz That's the assumption that I was waiting on this fix for all this time. However, this still brings improvements that I believe are important to get into master (2 bug fixes and a simplification of return values). The caveat of course is the cost of pulling _uid into field data, but I think that is just something to be improved upon in the future, rather than continuing with the current broken behavior.

jpountz · 2014-08-26T16:58:42Z

Then let's add some documentation to make clear that this feature relies on fielddata of the _uid field?

rjernst · 2014-08-27T03:16:01Z

@jpountz I modified the docs for random_score a little to mention uid. Let me know if that is what you had in mind, or if you wanted something more.

jpountz · 2014-08-27T11:37:27Z

This is what I had in mind. Maybe also put a warning that it will load field data for the _uid field (which can be very memory-intensive given that all values are unique). Otherwise LGTM (feel free to push without asking for further review from my end).

…urn values in rang [0.0, 1.0] RandomScoreFunction previously relied on the order the documents were iterated in from Lucene. This caused changes in ordering, with the same seed, if documents moved to different segments. With this change, a murmur32 hash of the _uid for each document is used as the "random" value. Also, the hash is adjusted so as to only return values between 0.0 and 1.0 to enable easier manipulation to fit into users' scoring models. closes #6907, #7446

rjernst mentioned this pull request Aug 25, 2014

Random score order changes on doc updates #6907

Closed

rjernst added bug and removed enhancement labels Aug 25, 2014

jpountz reviewed Aug 25, 2014

rjernst added 3 commits August 25, 2014 16:42

Remove unintended random changes

1f5884a

Remove unintended random changes

bc9991e

Remove unintended random changes

8557046

spinscale reviewed Aug 26, 2014

jpountz removed the review label Aug 26, 2014

Update random_score docs to mention _uid field is used

91b8e6d

rjernst added the review label Aug 27, 2014

jpountz removed the review label Aug 27, 2014

rjernst closed this Aug 27, 2014

clintongormley changed the title ~~FunctionScore: Refactor RandomScoreFunction to be consistent, and return values in range [0.0, 1.0]~~ Function Score: Refactor RandomScoreFunction to be consistent, and return values in range [0.0, 1.0] Sep 11, 2014

clintongormley added the >breaking label Oct 31, 2014

rjernst deleted the pr/6907 branch January 21, 2015 23:22

rjernst mentioned this pull request Feb 17, 2015

RandomScoreFunction generates numbers > 1.0 for 1.2 & 1.3 #9734

Closed

clintongormley added the :Query DSL label Jun 6, 2015

clintongormley added :Search/Search and removed :Query DSL labels Feb 14, 2018


harmsk Nov 17, 2014

rjernst Nov 18, 2014
