Fix issue 11598 - uniform could be faster for integrals #1717


Merged
merged 5 commits into from Dec 16, 2013

Conversation

Zshazz
Contributor

@Zshazz Zshazz commented Nov 25, 2013

See:
https://d.puremagic.com/issues/show_bug.cgi?id=11598
http://forum.dlang.org/thread/jnu1an$rjr$1@digitalmars.com

These changes retain the uniform quality of this function while, in the common case, cutting the number of division operations in half.

Additionally, extra unittests were added to ensure that all of the bounds are tested properly.

Example benchmark:

import std.stdio, std.datetime, std.random, std.range, std.algorithm;
alias RngType = Xorshift;

RngType rnd;
int[] arr;

void main() {
    rnd = RngType(328374);
    arr = iota(100_000).array();
    auto bench = benchmark!(shuffleTest)(100);
    writeln(bench[].map!(e => e.msecs), " ms");
}

void shuffleTest() {
    auto cpRnd = rnd.save();
    arr.randomShuffle(cpRnd);
}

Results on my machine:
Old uniform (using Xorshift): 686ms
New uniform (using Xorshift): 451ms
New uniform (using Mt19937): 597ms

So, as you can see, the new uniform is significantly faster than the old one. In code that uses the integral uniform heavily (for instance, randomShuffle), it's possible to use the higher-quality Mt19937 and still beat the old uniform paired with the much quicker Xorshift.

@WebDrake
Contributor

WebDrake commented Dec 1, 2013

Please write up some test code (not unittests, which would be too heavy; just test code for the sake of this pull request) to simulate a few different cases of rolling uniform(0, n) a sufficient number of times (say, n * 10 million or n * 100 million) to confirm that each rollable value has no systematic bias.

Do it for n = 2, 3, ..., 10, 20, 50, 100, and post the results here and in the bug report.

Sorry if it seems like I'm being fussy or doubting you, but there have been bugs in the past in std.random that would never have got there if the developer responsible had done a simple test like this before submitting a pull request. Obviously serious statistical tests of randomness are a bit too heavy for now, but this I think is reasonable to ask.

@Zshazz
Contributor Author

Zshazz commented Dec 1, 2013

It's a fair request to have documented rigor for such an important function in std.random. I'll document my entire process here.

For tests doing what you describe, refer to this gist:

https://gist.github.com/Zshazz/1bffebf9cb7cb97cfabc

One thing to note, knowing the implementation of this change (a bit of information on the idea behind the implementation is at the end of this post), is that this wouldn't be sufficient for detecting all of the potential problems. For instance, if you used a bad uniform implementation that just computed rnum % upperDist (using the notation of this implementation), then the distribution error over uniform(0, 100) would be undetectable (even with a 32-bit rnum).

In that case, there would be ((2^32) / 100) + 1 buckets (around 43 million), and the last bucket would only cover values up to ((2^32) % 100) - 1, which is 95. So 96-99 would each be off by about 1 in 43 million rolls. That's not really detectable visually; it would look like background noise, explainable by the nature of random rolls.

So, let's define test1, a test devised to guard against that particular type of error. That's easy enough: just pick your numbers so as to minimize the number of buckets while still having some elements missing from the last bucket. Picking upperDist = ((2^32) / 4) * 3 results in two buckets, with about 2/3 of the numbers in that range not represented in the second bucket. Consequently, this makes it very easy to count and very easy to tell that something is off. Roll 100_000 times or so, count how many of those rolls fall into the first third of the range, and count how many fall into the last third. If the counts differ (and they will in the case of badUniform), then you know there's an issue.
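As a sketch (an assumed reconstruction, not the gist's exact code), test1 might look like this in D; badUniform here is plain modulus with no reroll, so the first third of the range is hit roughly twice as often as the last third:

```d
import std.random, std.stdio, std.typecons;

// Sketch of test1 (assumed reconstruction of the gist). With badUniform
// (plain modulus, no reroll), values below 2^30 are produced by two raw
// buckets instead of one, so the first third of the range is roughly
// twice as likely as the last third.
Tuple!(size_t, size_t) test1()
{
    enum uint upperDist = (uint.max / 4 + 1) * 3; // == (2^32 / 4) * 3
    enum uint third = upperDist / 3;              // == 2^30
    auto gen = Xorshift(11598);
    size_t firstThird, lastThird;
    foreach (i; 0 .. 100_000)
    {
        immutable roll = gen.front % upperDist; // badUniform: no reroll
        gen.popFront();
        if (roll < third) ++firstThird;
        else if (roll >= upperDist - third) ++lastThird;
    }
    return tuple(firstThird, lastThird);
}

void main()
{
    immutable r = test1();
    writeln("first third: ", r[0], ", last third: ", r[1]);
}
```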

Now, let's define test2 to answer the following question: what if the reroll condition was chosen in such a way that it was off by one? We'd have a very tough time figuring that out, since even with 32-bit numbers a count array holding all numbers in that space would be too massive, and thoroughly analyzing that data would be even more of a challenge. So, to be fully rigorous, test2 slightly modifies the types internal to the new uniform implementation so that rnum is generated using ubytes, which makes the output easy to analyze. (By default it uses typeof(unsigned(b - lower)), which is minimally a uint; that prevents excessive and expensive rerolls when smaller types are used.) This does not change the overall algorithm, so the results still apply to the full-sized version of the new uniform. test2 then tries several i in uniform(0, i) such that i is around 128 (the exact middle of the range of 8-bit integers) to make sure everything behaves properly around that critical point.
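In the same spirit, the 8-bit space is small enough to check exhaustively (an assumed reconstruction, not the gist's exact code): for every upperDist, walk every possible 8-bit rnum, discard the ones the reroll condition rejects, and verify that each output value is produced by the same number of raw draws:

```d
import std.algorithm;

// Exhaustive 8-bit analogue of test2 (assumed reconstruction). After
// discarding draws that land in the incomplete last bucket, every output
// in [0 .. upperDist) must be reachable from equally many raw values.
bool noBias()
{
    foreach (uint upperDist; 1 .. 256)
    {
        auto counts = new size_t[upperDist];
        foreach (uint rnum; 0 .. 256)
        {
            immutable offset = rnum % upperDist;
            immutable bucketFront = rnum - offset;
            if (bucketFront > ubyte.max - (upperDist - 1))
                continue; // incomplete last bucket: would be rerolled
            ++counts[offset];
        }
        if (!counts.all!(c => c == counts[0]))
            return false; // some output is over- or under-represented
    }
    return true;
}

void main()
{
    assert(noBias());
}
```

An off-by-one reroll condition (e.g. `>=` instead of `>`, or forgetting the `- 1`) makes this check fail immediately for some upperDist near the bucket boundaries.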

Here is the code and results from running the code for these two tests using the new uniform and a bad uniform:

https://gist.github.com/Zshazz/450f57a5374bf0b80fb5


So, all that said, these tests aren't necessarily going to cover every possible statistical question this new uniform could raise, either. Hence a general argument for correctness is probably a better approach. For a complete description, I refer to the document on my Dropbox:

https://dl.dropboxusercontent.com/u/2206555/uniformUpgrade.pdf

Because it's possible (and, eventually, likely) that this document will no longer appear on my Dropbox, I'll reproduce the most important information here (not exactly what is in the document, but close enough):

The modulus operator maps an integer to a small, finite space. For instance, x % 3 will map whatever x is into the range [0 .. 3). 0 maps to 0, 1 maps to 1, 2 maps to 2, 3 maps to 0, and so on infinitely. As long as the integer is uniformly chosen from the infinite space of all non-negative integers then x % 3 will uniformly fall into that range.

(Non-negative is important in this case because some definitions of modulus, namely the one used in computers generally, map negative numbers differently to (-3 .. 0]. uniform does not use negative number modulus, thus we can safely ignore that fact.)

The issue with computers is that integers have a finite space they must fit in, and our uniformly chosen random number is picked in that finite space. So, that method is not sufficient. You can look at it as the integer space being divided into "buckets" and every bucket after the first bucket maps directly into that first bucket. [0, 1, 2], [3, 4, 5], ... When integers are finite, then the last bucket has the chance to be "incomplete": [uint.max - 3, uint.max - 2, uint.max - 1], [uint.max] ... (Uh oh, the last bucket only has 1!). The issue here is that every bucket maps completely to the first bucket except for that last one. The last one doesn't have corresponding mappings to 1 or 2, in this case, which makes it unfair.

So, the answer is to simply "reroll" if you're in that last bucket, since it's the only unfair one. Eventually you'll roll into a fair bucket. Simply, instead of the meaning of the last bucket being "maps to [0]", it changes to "maps to [0, 1, 2]", which is precisely what we want.

To generalize, upperDist represents the size of our buckets (and, thus, the exclusive upper bound for our desired uniform number). rnum is a uniformly random number picked from the space of integers that a computer can hold (we'll say UpperType represents that type).

We'll first try to do the mapping into the first bucket by doing offset = rnum % upperDist. We can figure out the position of the front of the bucket we're in by bucketFront = rnum - offset.

If we start at UpperType.max and walk backwards upperDist - 1 spaces, then the space we land on is the last acceptable position where a full bucket can fit:

   bucketFront     UpperType.max
      v                 v
[..., 0, 1, 2, ..., upperDist - 1]
      ^~~ upperDist - 1 ~~^

If the bucket starts any later, then it must have lost at least one number and
at least that number won't be represented fairly.

                bucketFront     UpperType.max
                     v                v
[..., upperDist - 1, 0, 1, 2, ..., upperDist - 2]
          ^~~~~~~~ upperDist - 1 ~~~~~~~^

Hence, our condition to reroll is
bucketFront > (UpperType.max - (upperDist - 1))
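Put together, the scheme above amounts to a rejection loop like the following (a simplified sketch under the assumption UpperType is uint; not the exact Phobos code):

```d
import std.random;

// Simplified sketch of the reroll scheme described above (not the exact
// Phobos implementation). Returns a uniform value in [0 .. upperDist)
// from raw 32-bit draws, rerolling draws from the incomplete last bucket.
uint boundedUniform(Rng)(ref Rng gen, uint upperDist)
{
    assert(upperDist != 0);
    uint offset;
    for (;;)
    {
        immutable rnum = gen.front;            // raw draw over all of uint
        gen.popFront();
        offset = rnum - (bucketFront: rnum, 0)[1]; // placeholder, see below
        offset = rnum % upperDist;             // position inside the bucket
        immutable bucketFront = rnum - offset; // start of the bucket
        // Accept unless we landed in the incomplete last bucket.
        if (bucketFront <= uint.max - (upperDist - 1))
            break;
    }
    return offset;
}

void main()
{
    auto gen = Xorshift(42);
    foreach (i; 0 .. 1000)
        assert(boundedUniform(gen, 10) < 10);
}
```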


OK, I think that about covers everything I've done on this. I hope that wasn't too much, but I did want to document the work put into developing this.

@braddr
Member

braddr commented Dec 1, 2013

The comment history of a GitHub pull request is a horrible place for documentation. It belongs as ddoc comments within the source itself so that it becomes part of the generated documentation, or as a separate article referenced by the source documentation.

@Zshazz
Contributor Author

Zshazz commented Dec 2, 2013

Should that really be part of the ddoc? It's more implementation information, not necessarily something you need to know in order to use the function. I could add it as internal documentation; that probably wouldn't be a bad idea, since for the original uniform I wasn't entirely sure what the mental model was.

@WebDrake
Contributor

WebDrake commented Dec 2, 2013

I think internal documentation is fine.

assert(upperDist != 0);
if (upperDist == 1) return lower;
Collaborator

Is this test really necessary? Seems like a slowdown for the general case?

Contributor Author

Not really. Without the check, the math still works. I think the only reason I had that line is because it was in the previous uniform (but that's not a good reason to keep it). I'll remove it in the next push. Removing it gives a consistent ~3% performance boost in the benchmark from the introductory post of the PR (with Xorshift), which is certainly nice.

@monarchdodra
Collaborator

@WebDrake : LGTM, mostly, but I'm no PRNG expert. Does this also look good to you?

A special case was checked for that only occurs when calling uniform with a range of exactly one element. The check was not necessary to keep the math correct, and it represented a slight slowdown in the common case for no apparent benefit.
@WebDrake
Contributor

WebDrake commented Dec 3, 2013

@monarchdodra @Zshazz Can I have 24 hours to really go over it in detail? @Zshazz any reference works you can point me to that are relevant to your implementation here?

I'm sure the code is fine, I just want to make sure that I really, really understand it first.

@Zshazz
Contributor Author

Zshazz commented Dec 3, 2013

I have no problem with you taking some time on it (I recognize that this needs very careful consideration). I don't have any reference material for it, but I don't think I'm particularly clever, so something similar probably exists out there. I'll look around and see if I can find anything. For now the best I can do is recommend you read uniformUpgrade.pdf / the documentation above, though I'm certainly not the best writer. If I find anything similar, I'll post it here.

@Zshazz
Contributor Author

Zshazz commented Dec 3, 2013

Well, here's something interesting:
http://docs.oracle.com/javase/7/docs/api/java/util/Random.html#nextInt(int)

Java uses this same technique. The difference is that Java's version relies on overflow, whereas the method here doesn't depend on overflow behavior. It doesn't help with understanding, but it provides some precedent for this approach. Java also special-cases powers of two for a bit of a performance boost in that circumstance.
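For what it's worth, the power-of-two special case Java applies could be expressed like this in D (hypothetical sketch; this optimization is not part of the patch). When upperDist is a power of two, every bucket is complete, so masking the raw draw suffices and no reroll can ever occur:

```d
import std.random;

// Hypothetical power-of-two fast path (illustrative only; not part of
// this patch). Masking is fair because upperDist divides 2^32 evenly,
// leaving no incomplete last bucket.
uint pow2Uniform(Rng)(ref Rng gen, uint upperDist)
{
    assert(upperDist != 0 && (upperDist & (upperDist - 1)) == 0);
    immutable r = gen.front & (upperDist - 1);
    gen.popFront();
    return r;
}

void main()
{
    auto gen = Xorshift(7);
    foreach (i; 0 .. 1000)
        assert(pow2Uniform(gen, 16) < 16);
}
```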

@monarchdodra
Collaborator

@WebDrake : LGTM. Anything to add?

@WebDrake
Contributor

I didn't manage to make the time to go over it in depth as I'd wanted, but I have no reason to hold it up. So let's go; if I later come up with an objection (unlikely!), we can always revert or revise.

monarchdodra added a commit that referenced this pull request Dec 16, 2013
Fix issue 11598 - uniform could be faster for integrals
@monarchdodra monarchdodra merged commit fc48d56 into dlang:master Dec 16, 2013
@monarchdodra
Collaborator

Thank you for this commit.

@Zshazz Zshazz deleted the issue_11598 branch December 17, 2013 03:56
@WebDrake
Contributor

Just to note, this patch is responsible for an observable change in the sequences of integers generated by calls to uniform. Try running:

import std.algorithm, std.random, std.range, std.stdio;

void main()
{
    rndGen.seed(12);
    iota(0, 10).map!(a => uniform(0, 10, rndGen)).writeln;
}

With the earlier uniform, this generates [1, 4, 7, 8, 2, 0, 5, 7, 0, 3]. With the new one provided in this patch, it generates [3, 3, 2, 7, 3, 6, 9, 3, 8, 0].

With (admittedly fairly basic and ad-hoc) tests I don't see any sign of bias in the new uniform, but it does produce strongly distinct results from the earlier one.

@WebDrake
Contributor

WebDrake commented Jun 1, 2014

More seriously, I discovered this uniform will fail if an open lower bound is requested with character types or with integral types smaller than int. Try running:

void uniformTest(T)()
{
    import std.random;

    // These always work
    auto u1 = uniform!"[)"(T(0), T(10));
    auto u2 = uniform!"[]"(T(0), T(10));

    // These may fail for char types and integrals smaller than int
    auto u3 = uniform!"()"(T(0), T(10));
    auto u4 = uniform!"(]"(T(0), T(10));
}

void main()
{
    import std.typetuple;

    foreach (T; TypeTuple!(real, double, float, ulong, long, uint, int,
                           ushort, short, ubyte, byte, wchar, dchar, char))
    {
        uniformTest!T();
    }
}

I'm writing up a bug report on this now.

@WebDrake
Contributor

WebDrake commented Jun 1, 2014

Update: it looks like it's a general problem of how lower bounds have been dealt with, not specific to this particular implementation of uniform. So, this patch didn't introduce the problem.

WebDrake added a commit to WebDrake/hap that referenced this pull request Jun 2, 2014
This corresponds to the changes made in phobos pull request
dlang/phobos#1717