Fixing questionable PNRG behavior #5613

pickypg · 2014-03-31T04:37:53Z

This strengthens and simplifies the PNRG used by random_score by more closely mirroring the Random.nextFloat() method, rather than a mix of that, nextInt and nextDouble. The docBase or docId are no longer used as they were biasing the result (particularly if it was 0, which consistently made it the highest scoring result in tests), which partially defeats the purpose of random scoring.

…fy its behavior. Closes elastic#5454 and elastic#5578

uboness · 2014-04-01T10:51:19Z

well, the reason why we added the doc id is to have better support for consistent pagination, so it guarantees consistent random number generation on a consistent point of view over the indexes.

pickypg · 2014-04-01T16:12:00Z

But, if you are allowing multiple indexes to invoke it, then isn't it going
to inherently be inconsistent because you will possibly not pick up where
you left off, thereby restarting further down the seed?

Personally, I think that having only the seed gives it a more consistent
flow, and the introducing the docBase + docId just adds the potential for
a strong bias in the case of 0, which must be substituted somehow, and
incidental bias thereafter, generally increasing as the sum grows as it
increases the likelyhood that the nextFloat's former xor operation (rand ^ seed) will have a much larger or much smaller number given a seed with
similar bits. I definitely noticed this behavior while adding the unit
tests, which led me to my current "fix".

On Tue, Apr 1, 2014 at 6:51 AM, uboness notifications@github.com wrote:

well, the reason why we added the doc id is to have better support for
consistent pagination, so it guarantees consistent random number generation
on a consistent point of view over the indexes.

Reply to this email directly or view it on GitHubhttps://github.com//pull/5613#issuecomment-39192401
.

pickypg · 2014-04-01T17:08:49Z

Thinking it through some more, the docId does provide something if two indexes invoke it at the same time since there is no synchronization of the seed update (thus returning the same value for them both without it). However, as long as they are separate invocations, then it's still a random result to both (ironically even if it repeats for one of them) without it, and I am still not convinced that the bias of the docId is superior.

Of course, I am not an expert on random number theory, so take that with a grain of salt.

uboness · 2014-04-01T20:56:28Z

yes, the idea initially was the be consistent with pagination across indices (where the starts at 0 for each index). But thinking about it more, indeed there are too many "if"s here... we can remove the id and stick to the normal PRNG implementation, and who ever wants consistent pagination can just use seed+scroll.

btw, the seed is already biased here as the original seed that the user sends (or the default current time millis) is merged with the shard id (which is a must so each shard will generate different random number per doc)

I'll work on your PR, thx!

pickypg · 2014-04-01T21:03:06Z

Works for me. The shard id modification of the seed makes perfect sense because otherwise there will be "unexpected" collisions within the merged results.

s1monw · 2014-04-02T09:05:21Z

@uboness I labeled this issue and assigned it to you - please feel free to re-label

uboness · 2014-04-02T09:09:02Z

thx @s1monw

Strengthening pseudo random number generator and adding tests to veri…

5693826

…fy its behavior. Closes elastic#5454 and elastic#5578

s1monw added v2.0.0 labels Apr 2, 2014

s1monw assigned uboness Apr 2, 2014

s1monw added the bug label Apr 2, 2014

jpountz removed the v1.1.1 label Apr 14, 2014

pickypg closed this Apr 28, 2014

pickypg deleted the feature/prng-no-doc-id-5578 branch April 28, 2014 17:46

clintongormley added v2.0.0-beta1 v1.2.0 :Query DSL and removed v1.2.0 v2.0.0-beta1 labels Jun 7, 2015

clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Query DSL labels Feb 14, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fixing questionable PNRG behavior #5613

Fixing questionable PNRG behavior #5613

pickypg commented Mar 31, 2014

uboness commented Apr 1, 2014

pickypg commented Apr 1, 2014

pickypg commented Apr 1, 2014

uboness commented Apr 1, 2014

pickypg commented Apr 1, 2014

s1monw commented Apr 2, 2014

uboness commented Apr 2, 2014

Fixing questionable PNRG behavior #5613

Fixing questionable PNRG behavior #5613

Conversation

pickypg commented Mar 31, 2014

uboness commented Apr 1, 2014

pickypg commented Apr 1, 2014

pickypg commented Apr 1, 2014

uboness commented Apr 1, 2014

pickypg commented Apr 1, 2014

s1monw commented Apr 2, 2014

uboness commented Apr 2, 2014