Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issue 5849 - Add a dice range type to std.random #1702

Closed
wants to merge 10 commits into from
Closed

Fix issue 5849 - Add a dice range type to std.random #1702

wants to merge 10 commits into from

Conversation

Zshazz
Copy link
Contributor

@Zshazz Zshazz commented Nov 18, 2013

Fix for:
https://d.puremagic.com/issues/show_bug.cgi?id=5849

This pull provides a range interface to simulate dice rolls much like dice. The difference is that this diceRange can do some preprocessing to the proportions vector to make it much faster to do repeated rolls with the same die. The algorithm used is simply a cumulative sum array with binary search, so the implementation is straightforward.

A benefit from this type of implementation is that it is essentially equivalent to the current dice implementation when doubles are used. One of the test cases in the unittest covers that. This should allow a user to nearly drop-in replace dice calls and get precisely the same results if needed.

Some internals are exposed in the form of template parameters which can allow a user to optimize the dice rolls for their particular data.

See also (for a use case):
http://forum.dlang.org/thread/prkthuzsmbppjltvohdh@forum.dlang.org?page=5

&& isImplicitlyConvertible!(ElementType!InRange, StorageType))
in
{
enforce(!proportions.empty, "Proportions cannot be empty");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preconditions should always assert or throw an Error, not Exception.

@monarchdodra
Copy link
Collaborator

Quick question: How does your DiceRange compare to dice for a single roll?

Is there any "initialization overhead"? If not, would it be reasonable to implement dice in terms of DiceRange (eg: auto dice(Args)(Args args){return diceRange(args).front();}).

@Zshazz
Copy link
Contributor Author

Zshazz commented Nov 19, 2013

They're both O(n). Both must sum up the entire array. The only difference is that diceRange will binary search the result after summing everything up. So while dice is ~(n + n/2) time, diceRange is ~(n + log n) time (which really doesn't matter that much).

Tests:
Under Windows 7, using -release -inline -O, and averaged over 1,000+ runs.

For proportions that are appropriate for dice (30, I can't think of a reason to only roll a die once with more than 30 proportions):
0.006612 ms for diceRange and 0.006503 ms for dice

For abnormally high proportions (500_000):
4.558 ms for diceRange and 11.503 ms for dice

So, indeed, it could be appropriate to implement dice in terms of DiceRange from a performance perspective.

However, dice and diceRange have incompatible argument orders (diceRange puts proportions first in order to allow UFCS chaining), so it would be slightly more complicated than what you've shown for that reason. In addition, diceRange's output will be different than dice when given integrals. diceRange will match data types internally to avoid the loss of precision that can occur with floating point types. This will drastically change the output because it will then use the integral version of uniform instead of the floating point version. However, it's possible to alleviate that concern by directly instantiating DiceRange.

Another problem is that dice does not enforce that its rnd parameter is a uniform RNG. So either we'd have to change the template constraint for dice or DiceRange/diceRange to make them compatible. I don't think anyone is using dice with something other than an RNG, though.

So, if we were to change it, it would need to be something like:

size_t dice(R, Range)(ref R rnd, Range proportions)
if (isInputRange!Range && isNumeric!(ElementType!Range) && !isArray!Range
    && isUniformRNG!R)
{
    return DiceRange!(SearchPolicy.binarySearch, double, Rng)(rnd, proportions).front;
}

to ensure that it doesn't change the current behavior (probably would set the StorageType to real for reals, though, so that would make it a bit more complicated). A unittest could be written to enforce that it doesn't change the result of a few example test cases (although I wouldn't keep the unittest around longer than it ensure they return the same results).

Of course, if it doesn't matter if it changes the results slightly (as long as they're still correct), then it'd be preferable to just forward to diceRange for simplicity's sake.

@monarchdodra
Copy link
Collaborator

@WebDrake , pinging you for your opinion about this.

@WebDrake
Copy link
Contributor

General thoughts:

  • Could call the struct and helper functions RandomDie and randomDie respectively -- matches the naming conventions for the other ranges that wrap an RNG (RandomCover, RandomSample, etc.). You could just call them Die and die but that seems a bit short and the latter could be easy to mistype for dice. If we weren't dealing with existing code, I'd suggest that the range and helper functions be called Dice and dice and the individual-roll-of-a-die function be called diceRoll.
  • the use of a pointer _rng* to the internal RNG will fail in the general case. Consider e.g.
auto foo()
{
    Random rng;  // will be deleted on exit of this scope
    return diceRange([0.5, 0.5], rng); // when we try and use this, it'll fall over
}

This is an unsolvable problem until we implement the long-desired RNGs-as-reference-types solution.

  • As things stand we should not reimplement dice() in terms of the dice range, because of the above issue. But that's the only reason. We should however consider tweaking dice() to also use binary search.
  • Because of the no-reference-type-RNG issue, there will be a problem with initializing front in the constructor (you won't see it with the code as-is but you will see it with the code as it needs to be to avoid the pointer-to-deallocated-memory problem). You'll run into the same-front-across-multiple-instances issue that we saw in RandomSample and RandomCover. The only way round this that I can see is to have checks in front as to whether an initial value has been generated.
  • The arguments for the constructor should match that for dice(), which I don't think were chosen by accident. The design for dice() allows you to call it in different ways:
dice(rng, 5, 5, 10, 5);    // proportions as individual parameters
dice(rng, [5, 5, 10, 5]);   // proportions as elements of an input range

A dice range should be similarly flexible. N.B. for std.random2 we might like to implement a design principle, "RNG comes first in the parameter list if you pass it at all" :-)

Bottom line: ANY functionality like this that wraps an underlying RNG instance is going to be crippled until we do std.random2. This looks like very nice code and it makes me feel very bad that we are going to have to ask for it to be made inferior in order for it to actually work with the existing std.random.

Nice side-benefit: the design principle at work here is going to be very useful in my Dgraph library. Quite a few algorithms for growing complex networks revolve around selecting nodes with a probability proportional to their degree. So thank you, Zshazz! :-)

@Zshazz
Copy link
Contributor Author

Zshazz commented Nov 23, 2013

Thanks for your comments.
Just to make sure I got everything:

  • Change the name to randomDie (definitely agree here)
  • Change _rng to value type instead of pointer. A bit of work will need to be done to use the default rng without copying it, but it shouldn't be a problem. I'll just use some static ifs and directly use it when no Rng type is provided.
  • I'll create a new PR for a dice with binary search shortly. Do I need to open a bug report for the enhancement prior to submitting the PR? (I don't think there is a "dice is not as fast as it could be" bug report)
  • Change the argument order to match the current dice. I don't particularly agree with this change (I'll do it anyway, however). I actually had it the way you suggested at first (as the forum post I linked shows) but changed it after reading bearophile's bug report and thinking it over some. It may be consistent with dice (and allow for individual parameters) but it's inconsistent with other ranges and it'll interfere with UFCS chaining as shown in the examples in the docs. It'll also introduce inconsistency in user code because many might (probably will) use UFCS when no rng is given but be unable to when an rng is provided. I'll still do it, but I just thought it was something to mention.

Anyway, I'm glad there will be some benefit to this. I know I'll be using it for a few things on occasion (I've used dice in the past in cases where this would have been more appropriate)

Using "RandomDie" instead of DiceRange makes it more consistent with
the names of other ranges wrapping RNG (such as RandomCover and
RandomSample) in std.random.
InputTypes, a list of all valid input types, was used in two different
tests but was not shared. The refactoring moved it up a scope and
allowed it to be shared instead of duplicated.

A test to check that dice rolls didn't always return the same was
added. This was something that was missing previously and might be
useful to assure the validity of future changes.
This change makes randomDie consistent with other random ranges w.r.t.
storing the rng as a value instead of as a pointer. This will prevent
issues arising in code such as this:

    auto foo()
    {
        Random rng;  // will be deleted on exit of this scope
        return randomDie([0.5, 0.5], rng); // using later will error
    }

Additionally, the documentation has been updated to display a warning
similar to the warnings given to randomCover and randomSample to alert
users of the common pitfall of using value-based rngs.

Finally, a test assuring that using the default rng is safe from this
pitfall has been added.
Changing the order of the arguments makes randomDice more consistent
with dice and allows randomDice to have a typesafe variadic form,
such as allowing for:

    auto list = ["somewhat common", "very common", "rare"];
    auto choices = randomDie(25, 50, 1).map!(e => list[e])
                        .take(1000).array();

The documentation has been updated to reflect the changes and the new
feature and a unittest has been written to confirm its functionality.
static assert(0);
}

e = accumulator;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails when storage type is immutable or const real (which can happen when an immutable or const real range is provided with the way that RandomDieStorageFor works). Will fix & improve tests in next commit push.

I might actually change StorageType to be immutable by default and just use assumeUnique when actually stored (since there is no reason to keep the cumulative sum array mutable after it's made).

Previously, a bug concerning immutable reals slipped through the
cracks. The unittests now test against const and immutable types to
cover those kinds of cases. Example of code that would fail to compile:

    immutable real[] props = [0.0, 1.0];
    auto dicer = randomDie(props);

To make the unittest pass, a simple fix was also implemented in the
way that the StorageType is chosen by default.
@WebDrake
Copy link
Contributor

Change _rng to value type instead of pointer. A bit of work will need to be done to use the default rng without copying it, but it shouldn't be a problem. I'll just use some static ifs and directly use it when no Rng type is provided.

Yup, you can see exactly how to do this in RandomCover and RandomSample. It's an annoyance but necessary with the current std.random.

I'll create a new PR for a dice with binary search shortly. Do I need to open a bug report for the enhancement prior to submitting the PR? (I don't think there is a "dice is not as fast as it could be" bug report)

You don't need to but it's a good idea as the issue report is used to generate the change list for new releases.

Change the argument order to match the current dice. I don't particularly agree with this change (I'll do it anyway, however). I actually had it the way you suggested at first (as the forum post I linked shows) but changed it after reading bearophile's bug report and thinking it over some. It may be consistent with dice (and allow for individual parameters) but it's inconsistent with other ranges and it'll interfere with UFCS chaining as shown in the examples in the docs.

I hadn't particularly thought it through along those lines, so if you want to change it back to allow for UFCS chaining etc., I'd be happy with that.

This makes it so that randomDie explicitly supports Ranges that have
qualified ElementTypes (e.g. immutable, const, shared). In addition it
changes the way that RandomDie stores the cumulative array internally
to be immutable (since it will never change after being made).

This change would make it so that qualified StorageTypes are equivalent
to their unqualified forms, so to make that clear only unqualified
StorageTypes are accepted. The documentation and unittests for
RandomDie have been updated to reflect this change.
This removes the variadic argument feature and restores the original
order of the arguments to improve the usage of randomDie as a range.
This allows for consistent UFCS usage of randomDie.
@WebDrake
Copy link
Contributor

A note here -- this range is effectively identical to what in Boost/C++11 is called discrete_distribution.

WebDrake added a commit to WebDrake/hap that referenced this pull request Mar 18, 2014
This is adapted from Chris Cain's RandomDie submission to Phobos
in dlang/phobos#1702

The alternative name is chosen by analogy to Boost.Random and the
C++11 standard.
WebDrake added a commit to WebDrake/hap that referenced this pull request Mar 19, 2014
This is adapted from Chris Cain's RandomDie submission to Phobos
in dlang/phobos#1702

The alternative name is chosen by analogy to Boost.Random and the
C++11 standard.
@quickfur
Copy link
Member

ping
Any updates since last comments?

@mihails-strasuns
Copy link

Looks to me like all comments has been addressed. @WebDrake I will trust your judgement, is it ready to merge?

@WebDrake
Copy link
Contributor

I have mixed feelings. To be honest I think that I feel uncomfortable about the idea of introducing new RNG-wrapping ranges to std.random because of the known issues when they are initialized with specified RNG instances (the whole copying-RNG-by-value thing).

Let me remind myself of the details and compare to what's currently in hap.random. I don't know how motivated @Zshazz is to do any more work on this, but if we're going to merge this it might be worth pulling in a few of the tweaks I made (including naming it DiscreteDistribution).

@mihails-strasuns
Copy link

because of the known issues when they are initialized with specified RNG instances (the whole copying-RNG-by-value thing

This shouldn't bother you. As long as it is consistent with existing std.random stuff and useful it is worth merging until hap can be reviewed as possible replacement.

@WebDrake
Copy link
Contributor

Should add that I'm also happy to do that work myself -- I'll see if I can give it some time at the weekend.

@WebDrake
Copy link
Contributor

This shouldn't bother you. As long as it is consistent with existing std.random stuff and useful

OK, but it will have to be carefully documented with warnings about its points of failure. As I said, I'll see if I can take a look at the weekend and propose some revisions.

@quickfur
Copy link
Member

quickfur commented Aug 5, 2014

ping @WebDrake

@WebDrake
Copy link
Contributor

I'm really sorry for the delay, it's been a very busy period. Will try and get onto this soon.

@andralex
Copy link
Member

ping

@mihails-strasuns
Copy link

Considering @WebDrake won't be available for a long time and @Zshazz is not responding to pings I am closing this. Feel free to rescue steal the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants