Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent behaviour of randomSample depending on whether a random number generator is specified #9591

Open
dlangBugzillaToGithub opened this issue Jun 14, 2012 · 6 comments

Comments

@dlangBugzillaToGithub
Copy link

joseph.wakeling (@WebDrake) reported this on 2012-06-14T12:27:53Z

Transfered from https://issues.dlang.org/show_bug.cgi?id=8247

CC List

Description

Created attachment 1116
Working minimal example illustrating the inconsistencies described.

The randomSample function in std.random can be called with or without
specifying a random number generator to use.

If no RNG is specified, then each lazy evaluation of the sample evaluates
differently, i.e. if you do

   sample1 = randomSample(iota(0, 100), 5);
   writeln(sample1);
   writeln(sample1);
   writeln(sample1);

you will get 3 different samples.

Conversely, if a random number generator is specified, you will get 3 times the
same result:

   sample2 = randomSample(iota(0, 100), 5, Random(unpredictableSeed));
   writeln(sample2);
   writeln(sample2);
   writeln(sample2);

Note that the seeding of the RNG is important, because if an already-existing
RNG is provided to create multiple different samples, they will evaluate
identically, e.g.

   sample3 = randomSample(iota(0, 100), 5, rndGen);
   writeln(sample3);
   sample4 = randomSample(iota(0, 100), 5, rndGen);
   writeln(sample4);
   sample5 = randomSample(iota(0, 100), 5, rndGen);
   writeln(sample5);

... will produce the same output 3 times.  This happens because the RNG passed
to randomSample is copied rather than used by reference.

These inconsistencies lead to a lot of potential confusion and sources of bugs.
 So, first of all, we need a firm decision on how the lazy evaluation of
RandomSample should behave -- should it

  (1) always evaluate to the same sample, or

  (2) always evaluate to a different sample?

... and depending on the answer, we then need to address how to specify and
seed an RNG for RandomSample.

!!!There are attachements in the bugzilla issue that have not been copied over!!!

@dlangBugzillaToGithub
Copy link
Author

joseph.wakeling commented on 2012-06-14T12:35:44Z

Online discussion on this: http://forum.dlang.org/thread/4FD735EB.70404@webdrake.net

@dlangBugzillaToGithub
Copy link
Author

jens.k.mueller commented on 2012-06-14T13:41:13Z

I opt for the returning the same sample (option 1). I want the sample to stay the same.

@dlangBugzillaToGithub
Copy link
Author

issues.dlang (@jmdavis) commented on 2012-06-14T15:27:33Z

If you want randomSample to be consistent as to which you get, it needs to be made to handle both reference and value type random number generating ranges identically, since they could be either. At present, all of those in std.random are value types, which is actually a problem in general. They really should reference types. But regardless of which they are, there's nothing stopping someone from implementing either a value type or reference type range which is a random number generator, in which case you'll get inconsistent behavior if randomSample doesn't code for using both by using save where appropriate.

@dlangBugzillaToGithub
Copy link
Author

jens.k.mueller commented on 2012-06-15T00:54:20Z

Now I see why you want to pass RNG by reference. Because you may want that two functions share the same generator.
But then I would go with passing them all by reference for consistency reasons. And all functions have as default argument rndGen() which could be renamed to defaultRNG().
randomShuffle is already doing it this way. Though I don't see why it sets the template argument RandomGen to Random by default. This should be inferred automatically by the default argument rndGen() anyway.
So randomCover and randomSample should follow the same approach.

I do not see why one needs to pass a RNG by value then. Admittedly I have never used std.random. So I may have wrong use cases in mind.
But having a thread local RNG that is used by default should be okay.

@Jonathan
Why should a RNG type have reference semantics? I think it's fine to pass them by reference where needed.

@dlangBugzillaToGithub
Copy link
Author

issues.dlang (@jmdavis) commented on 2012-06-15T01:05:40Z

> Why should a RNG type have reference semantics? I think it's fine to pass them
by reference where needed.

Because it makes no sense for it to have value semantics. Take this for example

auto func(R)(R r)
{
 r.popFront();
 auto var1 = r.front;
 ///...
}

func(generator);
generator.popFront();
auto var2 = generator.front;

Both var1 and var2 will have the exact same value. This is an easy mistake to make, and since random number generators are supposed to be returning random numbers, having them return the _same_ number after popFront has been called is definitely problematic.

By making them reference types, the only time that you get the same number multiple times in a row is when you do it on purpose (e.g. by storing the value of front or by calling save on the range).

There's a discussion on this in issue# 7067.

@dlangBugzillaToGithub
Copy link
Author

jens.k.mueller commented on 2012-06-15T01:22:31Z

(In reply to comment #5)
> > Why should a RNG type have reference semantics? I think it's fine to pass them
> by reference where needed.
> 
> Because it makes no sense for it to have value semantics. Take this for example
> 
> auto func(R)(R r)
> {
>  r.popFront();
>  auto var1 = r.front;
>  ///...
> }
> 
> func(generator);
> generator.popFront();
> auto var2 = generator.front;
> 
> Both var1 and var2 will have the exact same value. This is an easy mistake to
> make, and since random number generators are supposed to be returning random
> numbers, having them return the _same_ number after popFront has been called is
> definitely problematic.
> 
> By making them reference types, the only time that you get the same number
> multiple times in a row is when you do it on purpose (e.g. by storing the value
> of front or by calling save on the range).
> 
> There's a discussion on this in issue# 7067.

I see. Thanks.
Since passing around RNGs should be by default by reference RNGs should be reference types. Otherwise everybody writing own functions accepting RNGs has to use ref which is error-prone.
Using ref when passing RNGs in std.random won't solve this general design issue.

@LightBender LightBender removed the P3 label Dec 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants