Fix issue 5849 - Add a dice range type to std.random #1702

Zshazz · 2013-11-18T22:37:08Z

Fix for:
https://d.puremagic.com/issues/show_bug.cgi?id=5849

This pull provides a range interface to simulate dice rolls much like dice. The difference is that this diceRange can do some preprocessing to the proportions vector to make it much faster to do repeated rolls with the same die. The algorithm used is simply a cumulative sum array with binary search, so the implementation is straightforward.

A benefit from this type of implementation is that it is essentially equivalent to the current dice implementation when doubles are used. One of the test cases in the unittest covers that. This should allow a user to nearly drop-in replace dice calls and get precisely the same results if needed.

Some internals are exposed in the form of template parameters which can allow a user to optimize the dice rolls for their particular data.

See also (for a use case):
http://forum.dlang.org/thread/prkthuzsmbppjltvohdh@forum.dlang.org?page=5

monarchdodra · 2013-11-19T09:27:06Z

std/random.d

+        && isImplicitlyConvertible!(ElementType!InRange, StorageType))
+    in
+    {
+        enforce(!proportions.empty, "Proportions cannot be empty");


Preconditions should always assert or throw an Error, not Exception.

monarchdodra · 2013-11-19T09:34:00Z

Quick question: How does your DiceRange compare to dice for a single roll?

Is there any "initialization overhead"? If not, would it be reasonable to implement dice in terms of DiceRange (eg: auto dice(Args)(Args args){return diceRange(args).front();}).

Zshazz · 2013-11-19T12:57:04Z

They're both O(n). Both must sum up the entire array. The only difference is that diceRange will binary search the result after summing everything up. So while dice is ~(n + n/2) time, diceRange is ~(n + log n) time (which really doesn't matter that much).

Tests:
Under Windows 7, using -release -inline -O, and averaged over 1,000+ runs.

For proportions that are appropriate for dice (30, I can't think of a reason to only roll a die once with more than 30 proportions):
0.006612 ms for diceRange and 0.006503 ms for dice

For abnormally high proportions (500_000):
4.558 ms for diceRange and 11.503 ms for dice

So, indeed, it could be appropriate to implement dice in terms of DiceRange from a performance perspective.

However, dice and diceRange have incompatible argument orders (diceRange puts proportions first in order to allow UFCS chaining), so it would be slightly more complicated than what you've shown for that reason. In addition, diceRange's output will be different than dice when given integrals. diceRange will match data types internally to avoid the loss of precision that can occur with floating point types. This will drastically change the output because it will then use the integral version of uniform instead of the floating point version. However, it's possible to alleviate that concern by directly instantiating DiceRange.

Another problem is that dice does not enforce that its rnd parameter is a uniform RNG. So either we'd have to change the template constraint for dice or DiceRange/diceRange to make them compatible. I don't think anyone is using dice with something other than an RNG, though.

So, if we were to change it, it would need to be something like:

size_t dice(R, Range)(ref R rnd, Range proportions)
if (isInputRange!Range && isNumeric!(ElementType!Range) && !isArray!Range
    && isUniformRNG!R)
{
    return DiceRange!(SearchPolicy.binarySearch, double, Rng)(rnd, proportions).front;
}

to ensure that it doesn't change the current behavior (probably would set the StorageType to real for reals, though, so that would make it a bit more complicated). A unittest could be written to enforce that it doesn't change the result of a few example test cases (although I wouldn't keep the unittest around longer than it ensure they return the same results).

Of course, if it doesn't matter if it changes the results slightly (as long as they're still correct), then it'd be preferable to just forward to diceRange for simplicity's sake.

monarchdodra · 2013-11-22T21:27:27Z

@WebDrake , pinging you for your opinion about this.

WebDrake · 2013-11-23T11:45:54Z

General thoughts:

Could call the struct and helper functions RandomDie and randomDie respectively -- matches the naming conventions for the other ranges that wrap an RNG (RandomCover, RandomSample, etc.). You could just call them Die and die but that seems a bit short and the latter could be easy to mistype for dice. If we weren't dealing with existing code, I'd suggest that the range and helper functions be called Dice and dice and the individual-roll-of-a-die function be called diceRoll.
the use of a pointer _rng* to the internal RNG will fail in the general case. Consider e.g.

auto foo()
{
    Random rng;  // will be deleted on exit of this scope
    return diceRange([0.5, 0.5], rng); // when we try and use this, it'll fall over
}

This is an unsolvable problem until we implement the long-desired RNGs-as-reference-types solution.

As things stand we should not reimplement dice() in terms of the dice range, because of the above issue. But that's the only reason. We should however consider tweaking dice() to also use binary search.
Because of the no-reference-type-RNG issue, there will be a problem with initializing front in the constructor (you won't see it with the code as-is but you will see it with the code as it needs to be to avoid the pointer-to-deallocated-memory problem). You'll run into the same-front-across-multiple-instances issue that we saw in RandomSample and RandomCover. The only way round this that I can see is to have checks in front as to whether an initial value has been generated.
The arguments for the constructor should match that for dice(), which I don't think were chosen by accident. The design for dice() allows you to call it in different ways:

dice(rng, 5, 5, 10, 5);    // proportions as individual parameters
dice(rng, [5, 5, 10, 5]);   // proportions as elements of an input range

A dice range should be similarly flexible. N.B. for std.random2 we might like to implement a design principle, "RNG comes first in the parameter list if you pass it at all" :-)

Bottom line: ANY functionality like this that wraps an underlying RNG instance is going to be crippled until we do std.random2. This looks like very nice code and it makes me feel very bad that we are going to have to ask for it to be made inferior in order for it to actually work with the existing std.random.

Nice side-benefit: the design principle at work here is going to be very useful in my Dgraph library. Quite a few algorithms for growing complex networks revolve around selecting nodes with a probability proportional to their degree. So thank you, Zshazz! :-)

Zshazz · 2013-11-23T19:32:51Z

Thanks for your comments.
Just to make sure I got everything:

Change the name to randomDie (definitely agree here)
Change _rng to value type instead of pointer. A bit of work will need to be done to use the default rng without copying it, but it shouldn't be a problem. I'll just use some static ifs and directly use it when no Rng type is provided.
I'll create a new PR for a dice with binary search shortly. Do I need to open a bug report for the enhancement prior to submitting the PR? (I don't think there is a "dice is not as fast as it could be" bug report)
Change the argument order to match the current dice. I don't particularly agree with this change (I'll do it anyway, however). I actually had it the way you suggested at first (as the forum post I linked shows) but changed it after reading bearophile's bug report and thinking it over some. It may be consistent with dice (and allow for individual parameters) but it's inconsistent with other ranges and it'll interfere with UFCS chaining as shown in the examples in the docs. It'll also introduce inconsistency in user code because many might (probably will) use UFCS when no rng is given but be unable to when an rng is provided. I'll still do it, but I just thought it was something to mention.

Anyway, I'm glad there will be some benefit to this. I know I'll be using it for a few things on occasion (I've used dice in the past in cases where this would have been more appropriate)

Using "RandomDie" instead of DiceRange makes it more consistent with the names of other ranges wrapping RNG (such as RandomCover and RandomSample) in std.random.

InputTypes, a list of all valid input types, was used in two different tests but was not shared. The refactoring moved it up a scope and allowed it to be shared instead of duplicated. A test to check that dice rolls didn't always return the same was added. This was something that was missing previously and might be useful to assure the validity of future changes.

This change makes randomDie consistent with other random ranges w.r.t. storing the rng as a value instead of as a pointer. This will prevent issues arising in code such as this: auto foo() { Random rng; // will be deleted on exit of this scope return randomDie([0.5, 0.5], rng); // using later will error } Additionally, the documentation has been updated to display a warning similar to the warnings given to randomCover and randomSample to alert users of the common pitfall of using value-based rngs. Finally, a test assuring that using the default rng is safe from this pitfall has been added.

Changing the order of the arguments makes randomDice more consistent with dice and allows randomDice to have a typesafe variadic form, such as allowing for: auto list = ["somewhat common", "very common", "rare"]; auto choices = randomDie(25, 50, 1).map!(e => list[e]) .take(1000).array(); The documentation has been updated to reflect the changes and the new feature and a unittest has been written to confirm its functionality.

Zshazz · 2013-11-24T08:41:43Z

std/random.d

+                    static assert(0);
+            }
+
+            e = accumulator;


This fails when storage type is immutable or const real (which can happen when an immutable or const real range is provided with the way that RandomDieStorageFor works). Will fix & improve tests in next commit push.

I might actually change StorageType to be immutable by default and just use assumeUnique when actually stored (since there is no reason to keep the cumulative sum array mutable after it's made).

Previously, a bug concerning immutable reals slipped through the cracks. The unittests now test against const and immutable types to cover those kinds of cases. Example of code that would fail to compile: immutable real[] props = [0.0, 1.0]; auto dicer = randomDie(props); To make the unittest pass, a simple fix was also implemented in the way that the StorageType is chosen by default.

WebDrake · 2013-11-24T11:59:06Z

Change _rng to value type instead of pointer. A bit of work will need to be done to use the default rng without copying it, but it shouldn't be a problem. I'll just use some static ifs and directly use it when no Rng type is provided.

Yup, you can see exactly how to do this in RandomCover and RandomSample. It's an annoyance but necessary with the current std.random.

I'll create a new PR for a dice with binary search shortly. Do I need to open a bug report for the enhancement prior to submitting the PR? (I don't think there is a "dice is not as fast as it could be" bug report)

You don't need to but it's a good idea as the issue report is used to generate the change list for new releases.

Change the argument order to match the current dice. I don't particularly agree with this change (I'll do it anyway, however). I actually had it the way you suggested at first (as the forum post I linked shows) but changed it after reading bearophile's bug report and thinking it over some. It may be consistent with dice (and allow for individual parameters) but it's inconsistent with other ranges and it'll interfere with UFCS chaining as shown in the examples in the docs.

I hadn't particularly thought it through along those lines, so if you want to change it back to allow for UFCS chaining etc., I'd be happy with that.

This makes it so that randomDie explicitly supports Ranges that have qualified ElementTypes (e.g. immutable, const, shared). In addition it changes the way that RandomDie stores the cumulative array internally to be immutable (since it will never change after being made). This change would make it so that qualified StorageTypes are equivalent to their unqualified forms, so to make that clear only unqualified StorageTypes are accepted. The documentation and unittests for RandomDie have been updated to reflect this change.

This removes the variadic argument feature and restores the original order of the arguments to improve the usage of randomDie as a range. This allows for consistent UFCS usage of randomDie.

WebDrake · 2014-03-17T21:35:30Z

A note here -- this range is effectively identical to what in Boost/C++11 is called discrete_distribution.

This is adapted from Chris Cain's RandomDie submission to Phobos in dlang/phobos#1702 The alternative name is chosen by analogy to Boost.Random and the C++11 standard.

quickfur · 2014-07-17T16:56:48Z

ping
Any updates since last comments?

mihails-strasuns · 2014-07-17T20:30:04Z

Looks to me like all comments has been addressed. @WebDrake I will trust your judgement, is it ready to merge?

WebDrake · 2014-07-17T21:51:45Z

I have mixed feelings. To be honest I think that I feel uncomfortable about the idea of introducing new RNG-wrapping ranges to std.random because of the known issues when they are initialized with specified RNG instances (the whole copying-RNG-by-value thing).

Let me remind myself of the details and compare to what's currently in hap.random. I don't know how motivated @Zshazz is to do any more work on this, but if we're going to merge this it might be worth pulling in a few of the tweaks I made (including naming it DiscreteDistribution).

mihails-strasuns · 2014-07-17T22:03:17Z

because of the known issues when they are initialized with specified RNG instances (the whole copying-RNG-by-value thing

This shouldn't bother you. As long as it is consistent with existing std.random stuff and useful it is worth merging until hap can be reviewed as possible replacement.

WebDrake · 2014-07-17T22:03:34Z

Should add that I'm also happy to do that work myself -- I'll see if I can give it some time at the weekend.

WebDrake · 2014-07-17T22:08:11Z

This shouldn't bother you. As long as it is consistent with existing std.random stuff and useful

OK, but it will have to be carefully documented with warnings about its points of failure. As I said, I'll see if I can take a look at the weekend and propose some revisions.

quickfur · 2014-08-05T17:44:44Z

ping @WebDrake

WebDrake · 2014-08-11T23:17:16Z

I'm really sorry for the delay, it's been a very busy period. Will try and get onto this soon.

andralex · 2014-10-19T15:50:17Z

ping

mihails-strasuns · 2014-11-11T18:59:48Z

Considering @WebDrake won't be available for a long time and @Zshazz is not responding to pings I am closing this. Feel free to rescue steal the PR.

Fix issue 5849 - Add a dice range type to std.random

bba61c6

monarchdodra reviewed Nov 19, 2013
View reviewed changes

Zshazz added 2 commits November 22, 2013 13:46

Change precondition of std.random.DiceRange constructor to assert

9ae63ba

Fix type in internal documentation for diceRange

7e7a931

Zshazz added 4 commits November 23, 2013 18:34

Change name of DiceRange to RandomDie

e812a95

Using "RandomDie" instead of DiceRange makes it more consistent with the names of other ranges wrapping RNG (such as RandomCover and RandomSample) in std.random.

Zshazz reviewed Nov 24, 2013
View reviewed changes

Zshazz added 2 commits November 24, 2013 17:10

Change back ordering of arguments for RandomDie

33e6265

This removes the variadic argument feature and restores the original order of the arguments to improve the usage of randomDie as a range. This allows for consistent UFCS usage of randomDie.

Zshazz mentioned this pull request Nov 25, 2013

Fix issue 11597 - Speed up std.random.dice #1716

Closed

mihails-strasuns added the needs review label Jul 29, 2014

mihails-strasuns closed this Nov 11, 2014

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix issue 5849 - Add a dice range type to std.random #1702

Fix issue 5849 - Add a dice range type to std.random #1702

Zshazz commented Nov 18, 2013

monarchdodra Nov 19, 2013

monarchdodra commented Nov 19, 2013

Zshazz commented Nov 19, 2013

monarchdodra commented Nov 22, 2013

WebDrake commented Nov 23, 2013

Zshazz commented Nov 23, 2013

Zshazz Nov 24, 2013

WebDrake commented Nov 24, 2013

WebDrake commented Mar 17, 2014

quickfur commented Jul 17, 2014

mihails-strasuns commented Jul 17, 2014

WebDrake commented Jul 17, 2014

mihails-strasuns commented Jul 17, 2014

WebDrake commented Jul 17, 2014

WebDrake commented Jul 17, 2014

quickfur commented Aug 5, 2014

WebDrake commented Aug 11, 2014

andralex commented Oct 19, 2014

mihails-strasuns commented Nov 11, 2014

Fix issue 5849 - Add a dice range type to std.random #1702

Fix issue 5849 - Add a dice range type to std.random #1702

Conversation

Zshazz commented Nov 18, 2013

monarchdodra Nov 19, 2013

Choose a reason for hiding this comment

monarchdodra commented Nov 19, 2013

Zshazz commented Nov 19, 2013

monarchdodra commented Nov 22, 2013

WebDrake commented Nov 23, 2013

Zshazz commented Nov 23, 2013

Zshazz Nov 24, 2013

Choose a reason for hiding this comment

WebDrake commented Nov 24, 2013

WebDrake commented Mar 17, 2014

quickfur commented Jul 17, 2014

mihails-strasuns commented Jul 17, 2014

WebDrake commented Jul 17, 2014

mihails-strasuns commented Jul 17, 2014

WebDrake commented Jul 17, 2014

WebDrake commented Jul 17, 2014

quickfur commented Aug 5, 2014

WebDrake commented Aug 11, 2014

andralex commented Oct 19, 2014

mihails-strasuns commented Nov 11, 2014