Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std.random.dice is better as a range #9901

Open
dlangBugzillaToGithub opened this issue Apr 16, 2011 · 11 comments
Open

std.random.dice is better as a range #9901

dlangBugzillaToGithub opened this issue Apr 16, 2011 · 11 comments

Comments

@dlangBugzillaToGithub
Copy link

bearophile_hugs reported this on 2011-04-16T18:06:59Z

Transfered from https://issues.dlang.org/show_bug.cgi?id=5849

CC List

Description

When I have to use std.random.dice the frequencies don't change often, but I usually have to compute many random values efficiently. Calling dice() many times is not efficient, because it performs some preprocessing of the given frequencies.

So I suggest to replace the function dice(xxx) with a range that generates the results.

So the current usage of:
dice(70, 20, 10)
gets replaced by:
dice(70, 20, 10).front

And you are able to write:

take(dice(70, 20, 10), 5)

The good thing of this generator is its performance compared to the function. See the performance difference between the two following implementations (5.3 seconds for the first version and 0.7 seconds for the second one).

------------------------

import std.stdio, std.random, std.string;

void main() {
  enum int N = 10_000_000;
  enum pr = [1/5., 1/6., 1/7., 1/8., 1/9., 1/10., 1/11., 1759/27720.];

  double[pr.length] counts = 0.0;
  foreach (i; 0 .. N)
    counts[dice(pr)]++;

  foreach (i, p; pr)
    writefln("%.8f   %.8f", p, counts[i] / N);
}

------------------------

import std.stdio, std.random, std.string;

void main() {
  enum int N = 10_000_000;
  enum pr = [1/5., 1/6., 1/7., 1/8., 1/9., 1/10., 1/11., 1759/27720.];
  double[pr.length] cumulatives = pr[];
  foreach (i, ref c; cumulatives[1 .. $-1])
    c += cumulatives[i];
  cumulatives[$-1] = 1.0;

  double[pr.length] counts = 0.0;
  auto rnd = Xorshift(unpredictableSeed());
  foreach (i; 0 .. N) {
    double rnum = rnd.front() / cast(double)typeof(rnd.front()).max;
    rnd.popFront();
    int j;
    for ( ; rnum > cumulatives[j]; j++) {}
    counts[j]++;
  }

  foreach (i, p; pr)
    writefln("%.8f   %.8f", p, counts[i] / N);
}

-------------------------

See also some other improvements I have suggested for std.random.

!!!There are attachements in the bugzilla issue that have not been copied over!!!

@dlangBugzillaToGithub
Copy link
Author

bearophile_hugs commented on 2011-04-24T14:26:40Z

See also issue 5883

@dlangBugzillaToGithub
Copy link
Author

bearophile_hugs commented on 2011-12-29T15:37:50Z

Created attachment 1060
First draft of a fast dice range

D code adapted from Java code:
http://www.keithschwarz.com/interesting/code/?dir=alias-method

http://www.keithschwarz.com/darts-dice-coins/

http://www.reddit.com/r/programming/comments/nu78s/darts_dice_and_coins_a_beautiful_algorithm_for/

@dlangBugzillaToGithub
Copy link
Author

andrei (@andralex) commented on 2011-12-29T16:00:18Z

You may want to package contributions as pull requests. Thanks!

@dlangBugzillaToGithub
Copy link
Author

bearophile_hugs commented on 2012-01-04T14:58:04Z

Created attachment 1066
Version 0.5 of the fast dice

> You may want to package contributions as pull requests. Thanks!

Thank you. I think the current code is not yet fit for a pull request, there are some things to be done/solved first:

- Create statistical unittests.
- Is popLast() overkill/not useful enough?
- Is the Range correct now?
- Is uint[] enough for mAlias?
- ddoc comments are missing still.
- Currently the Dice constructor lacks a pre-condition: I think the input
  probabilities must sum to 1.0. But is this necessary? An alternative
  design is to accept any array of numbers (integers too), and normalize
  their sum to 1.0 inside the Dice constructor itself.

@dlangBugzillaToGithub
Copy link
Author

bearophile_hugs commented on 2012-05-28T19:19:55Z

See also:
https://github.com/D-Programming-Language/phobos/pull/553

@dlangBugzillaToGithub
Copy link
Author

bearophile_hugs commented on 2012-07-01T22:09:25Z

Probably partially done here:

https://github.com/D-Programming-Language/phobos/pull/553

https://github.com/D-Programming-Language/phobos/commit/a47707a19796c41a9cb77ae9ae09605afd56fa6f

@dlangBugzillaToGithub
Copy link
Author

bearophile_hugs commented on 2013-08-31T17:44:25Z

If we don't want to break the API of dice() making it a range, a simple solution is to call the dice range "diceRange" and keep both (but perhaps dice becomes a small function that returns diceRange.front).

Another interesting suggestion: in my code I've seen that most times I want to use a dice/diceRange I have to map its result on the items of an array:


import std.stdio, std.random;
void main() {
    immutable probabilities = [0.2, 0.6, 0.1, 0.1];
    immutable values = "ACGT";
    foreach (_; 0 .. 10)
        values[probabilities.dice].write;
}


That outputs something similar to:
CGCCAAACCC


So an improvement for diceRange() is to accept as first argument the range of probabilities, and as second optional argument a range of items. If the second argument is not given, then it generates integers as dice(). If the second argument is given, then it yields the items, according to their probabilities. So that program becomes:


import std.stdio, std.random, std.range;
void main() {
    immutable probabilities = [0.2, 0.6, 0.1, 0.1];
    immutable values = "ACGT";
    probabilities.diceRange(values).take(10).writeln;
}


This is quite handy.

@dlangBugzillaToGithub
Copy link
Author

bearophile_hugs commented on 2013-08-31T17:47:46Z

(In reply to comment #7)

> import std.stdio, std.random, std.range;
> void main() {
>     immutable probabilities = [0.2, 0.6, 0.1, 0.1];
>     immutable values = "ACGT";
>     probabilities.diceRange(values).take(10).writeln;
> }
> 
> 
> This is quite handy.

If the second optional argument is not supported, then the code becomes:


import std.stdio, std.random, std.range;
void main() {
    immutable probabilities = [0.2, 0.6, 0.1, 0.1];
    immutable values = "ACGT";
    probabilities.diceRange.map!(i => values[i]).take(10).writeln;
}


But I am not sure it's strictly equivalent.

@dlangBugzillaToGithub
Copy link
Author

bearophile_hugs commented on 2013-08-31T19:47:44Z

Also, why is the random generator engine the first optional argument of dice(), while it's the last for uniform()?

@dlangBugzillaToGithub
Copy link
Author

bearophile_hugs commented on 2013-11-22T04:43:56Z

See also:
http://www.keithschwarz.com/interesting/code/?dir=alias-method

Once paid the time to create the range, all the successive popFront() become just something like this (where for speed nextDouble is not a call to uniform(), but it's more like uniform01):

int column = random.nextInt(probability.length);
boolean coinToss = random.nextDouble() < probability[column];
return coinToss ? column : alias[column];

@dlangBugzillaToGithub
Copy link
Author

greensunny12 commented on 2018-03-31T15:28:02Z

FYI: Here's my implementation of the alias method for mir:

https://github.com/libmir/mir/blob/da76cf406d06957e472b9ba90b4c90b917480cb9/source/mir/random/discrete.d

(it's Boost-licensed)

@LightBender LightBender removed the P4 label Dec 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants