Fix issue 11598 - uniform could be faster for integrals #1717


Merged
merged 5 commits into from Dec 16, 2013

Conversation

Zshazz
Contributor

@Zshazz Zshazz commented Nov 25, 2013

See:
https://d.puremagic.com/issues/show_bug.cgi?id=11598
http://forum.dlang.org/thread/jnu1an$rjr$1@digitalmars.com

These changes retain the uniform quality of this function while, in the common case, cutting the number of division operations in half.

Additionally, extra unittests were added to ensure that all of the bounds are tested properly.

Example benchmark:

import std.stdio, std.datetime, std.random, std.range, std.algorithm;
alias RngType = Xorshift;

RngType rnd;
int[] arr;

void main() {
    rnd = RngType(328374);
    arr = iota(100_000).array();
    auto bench = benchmark!(shuffleTest)(100);
    writeln(bench[].map!(e => e.msecs), " ms");
}

void shuffleTest() {
    auto cpRnd = rnd.save();
    arr.randomShuffle(cpRnd);
}

Results on my machine:
Old uniform (using Xorshift): 686ms
New uniform (using Xorshift): 451ms
New uniform (using Mt19937): 597ms

So, as you can see, the new uniform is significantly faster than the old one. In code that uses the integral uniform heavily (for instance, randomShuffle), it's possible to use the higher-quality Mt19937 and still beat the old uniform paired with the much quicker Xorshift.

@WebDrake
Contributor

WebDrake commented Dec 1, 2013

Please write up some test code (not unittests, which would be too heavy; just test code for the sake of this pull request) to simulate a few different cases of rolling uniform(0, n) a sufficient number of times (say, n * 10 million or n * 100 million) to confirm that each rollable value has no systematic bias.

Do it for n = 2, 3, ..., 10, 20, 50, 100, and post the results here and in the bug report.

Sorry if it seems like I'm being fussy or doubting you, but there have been bugs in the past in std.random that would never have got there if the developer responsible had done a simple test like this before submitting a pull request. Obviously serious statistical tests of randomness are a bit too heavy for now, but this I think is reasonable to ask.

@Zshazz
Contributor Author

Zshazz commented Dec 1, 2013

It's a fair request to have documented rigor for such an important function in std.random. I'll document my entire process here.

For tests doing what you describe, refer to this gist:

https://gist.github.com/Zshazz/1bffebf9cb7cb97cfabc

One thing to note, knowing the implementation of this change (a bit of information on the idea behind the implementation is at the end of this post), is that this wouldn't be sufficient for detecting all of the potential problems. For instance, if you used a bad uniform implementation that just computed rnum % upperDist (using the notation of this implementation), then the distribution error over uniform(0, 100) would be undetectable (even with a 32-bit rnum).

In that case, there would be ((2^32) / 100) + 1 buckets (around 43 million), and the last bucket would only cover values up to ((2^32) % 100) - 1, which is 95. So 96-99 would each be off by about 1 in 43 million rolls. That's not really detectable visually; it would look like background noise, explainable by the nature of random rolls.

So, let's define test1, a test devised to guard against that particular type of error. That's easy enough: just pick your numbers so as to minimize the number of buckets while still having some elements missing from the last bucket. Picking upperDist = ((2^32) / 4) * 3 results in two buckets, with about 2/3 of the numbers in that range not represented in the second bucket. Consequently, this makes it very easy to count and very easy to tell that something is off. Roll 100_000 times or so, count how many of those rolls fall into the first third of the range, and count how many fall into the last third. If the counts differ (and they will in the case of badUniform), then you know there's an issue.
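As a sketch (an assumed reconstruction, not the gist's exact code), test1 might look like this in D; badUniform here is plain modulus with no reroll, so the first third of the range is hit roughly twice as often as the last third:

```d
import std.random, std.stdio, std.typecons;

// Sketch of test1 (assumed reconstruction of the gist). With badUniform
// (plain modulus, no reroll), values below 2^30 are produced by two raw
// buckets instead of one, so the first third of the range is roughly
// twice as likely as the last third.
Tuple!(size_t, size_t) test1()
{
    enum uint upperDist = (uint.max / 4 + 1) * 3; // == (2^32 / 4) * 3
    enum uint third = upperDist / 3;              // == 2^30
    auto gen = Xorshift(11598);
    size_t firstThird, lastThird;
    foreach (i; 0 .. 100_000)
    {
        immutable roll = gen.front % upperDist; // badUniform: no reroll
        gen.popFront();
        if (roll < third) ++firstThird;
        else if (roll >= upperDist - third) ++lastThird;
    }
    return tuple(firstThird, lastThird);
}

void main()
{
    immutable r = test1();
    writeln("first third: ", r[0], ", last third: ", r[1]);
}
```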

Now, let's define test2 to answer the following question: what if the reroll condition was chosen in such a way that it was off by one? We'd have a very tough time figuring that out, since even with 32-bit numbers a count array holding all numbers in that space would be too massive, and thoroughly analyzing that data would be even more of a challenge. So, to be fully rigorous, test2 slightly modifies the types internal to the new uniform implementation so that rnum is generated using ubytes, which makes the output easy to analyze. (By default it uses typeof(unsigned(b - lower)), which is minimally a uint; that prevents excessive and expensive rerolls when smaller types are used.) This does not change the overall algorithm, so the results still apply to the full-sized version of the new uniform. test2 then tries several i in uniform(0, i) such that i is around 128 (the exact middle of the range of 8-bit integers) to make sure everything behaves properly around that critical point.
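In the same spirit, the 8-bit space is small enough to check exhaustively (an assumed reconstruction, not the gist's exact code): for every upperDist, walk every possible 8-bit rnum, discard the ones the reroll condition rejects, and verify that each output value is produced by the same number of raw draws:

```d
import std.algorithm;

// Exhaustive 8-bit analogue of test2 (assumed reconstruction). After
// discarding draws that land in the incomplete last bucket, every output
// in [0 .. upperDist) must be reachable from equally many raw values.
bool noBias()
{
    foreach (uint upperDist; 1 .. 256)
    {
        auto counts = new size_t[upperDist];
        foreach (uint rnum; 0 .. 256)
        {
            immutable offset = rnum % upperDist;
            immutable bucketFront = rnum - offset;
            if (bucketFront > ubyte.max - (upperDist - 1))
                continue; // incomplete last bucket: would be rerolled
            ++counts[offset];
        }
        if (!counts.all!(c => c == counts[0]))
            return false; // some output is over- or under-represented
    }
    return true;
}

void main()
{
    assert(noBias());
}
```

An off-by-one reroll condition (e.g. `>=` instead of `>`, or forgetting the `- 1`) makes this check fail immediately for some upperDist near the bucket boundaries.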

Here is the code and results from running the code for these two tests using the new uniform and a bad uniform:

https://gist.github.com/Zshazz/450f57a5374bf0b80fb5


So, all that said, these tests aren't necessarily going to cover every possible statistical question this new uniform could raise, either. Hence a general argument for correctness is probably a better approach. For a complete description, I refer to the document on my Dropbox:

https://dl.dropboxusercontent.com/u/2206555/uniformUpgrade.pdf

Because it's possible (and, eventually, likely) that this document will no longer appear on my Dropbox, I'll reproduce the most important information here (not exactly what is in the document, but close enough):

The modulus operator maps an integer to a small, finite space. For instance, x % 3 will map whatever x is into the range [0 .. 3). 0 maps to 0, 1 maps to 1, 2 maps to 2, 3 maps to 0, and so on infinitely. As long as the integer is uniformly chosen from the infinite space of all non-negative integers then x % 3 will uniformly fall into that range.

(Non-negative is important in this case because some definitions of modulus, namely the one used in computers generally, map negative numbers differently to (-3 .. 0]. uniform does not use negative number modulus, thus we can safely ignore that fact.)

The issue with computers is that integers have a finite space they must fit in, and our uniformly chosen random number is picked in that finite space. So, that method is not sufficient. You can look at it as the integer space being divided into "buckets" and every bucket after the first bucket maps directly into that first bucket. [0, 1, 2], [3, 4, 5], ... When integers are finite, then the last bucket has the chance to be "incomplete": [uint.max - 3, uint.max - 2, uint.max - 1], [uint.max] ... (Uh oh, the last bucket only has 1!). The issue here is that every bucket maps completely to the first bucket except for that last one. The last one doesn't have corresponding mappings to 1 or 2, in this case, which makes it unfair.

So, the answer is to simply "reroll" if you're in that last bucket, since it's the only unfair one. Eventually you'll roll into a fair bucket. Simply, instead of the meaning of the last bucket being "maps to [0]", it changes to "maps to [0, 1, 2]", which is precisely what we want.

To generalize, upperDist represents the size of our buckets (and, thus, the exclusive upper bound for our desired uniform number). rnum is a uniformly random number picked from the space of integers that a computer can hold (we'll say UpperType represents that type).

We'll first try to do the mapping into the first bucket by doing offset = rnum % upperDist. We can figure out the position of the front of the bucket we're in by bucketFront = rnum - offset.

If we start at UpperType.max and walk backwards upperDist - 1 spaces, then the space we land on is the last acceptable position where a full bucket can fit:

   bucketFront     UpperType.max
      v                 v
[..., 0, 1, 2, ..., upperDist - 1]
      ^~~ upperDist - 1 ~~^

If the bucket starts any later, then it must have lost at least one number and
at least that number won't be represented fairly.

                bucketFront     UpperType.max
                     v                v
[..., upperDist - 1, 0, 1, 2, ..., upperDist - 2]
          ^~~~~~~~ upperDist - 1 ~~~~~~~^

Hence, our condition to reroll is
bucketFront > (UpperType.max - (upperDist - 1))
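Put together, the scheme above amounts to a rejection loop like the following (a simplified sketch under the assumption UpperType is uint; not the exact Phobos code):

```d
import std.random;

// Simplified sketch of the reroll scheme described above (not the exact
// Phobos implementation). Returns a uniform value in [0 .. upperDist)
// from raw 32-bit draws, rerolling draws from the incomplete last bucket.
uint boundedUniform(Rng)(ref Rng gen, uint upperDist)
{
    assert(upperDist != 0);
    uint offset;
    for (;;)
    {
        immutable rnum = gen.front;            // raw draw over all of uint
        gen.popFront();
        offset = rnum - (bucketFront: rnum, 0)[1]; // placeholder, see below
        offset = rnum % upperDist;             // position inside the bucket
        immutable bucketFront = rnum - offset; // start of the bucket
        // Accept unless we landed in the incomplete last bucket.
        if (bucketFront <= uint.max - (upperDist - 1))
            break;
    }
    return offset;
}

void main()
{
    auto gen = Xorshift(42);
    foreach (i; 0 .. 1000)
        assert(boundedUniform(gen, 10) < 10);
}
```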


OK, I think that about covers everything I've done on this. I hope that wasn't too much, but I did want to document the work put into developing this.

@braddr
Member

braddr commented Dec 1, 2013

The comment history of a GitHub pull request is a horrible place for documentation. It belongs as ddoc comments within the source itself so that it becomes part of the generated documentation, or as a separate article referenced by the source documentation.

@Zshazz
Contributor Author

Zshazz commented Dec 2, 2013

Should that really be part of the ddoc? It's more implementation information, not necessarily something you need to know in order to use the function. I could add it as internal documentation; that probably wouldn't be a bad idea, since for the original uniform I wasn't entirely sure what the mental model was.

@WebDrake
Contributor

WebDrake commented Dec 2, 2013

I think internal documentation is fine.

assert(upperDist != 0);
if (upperDist == 1) return lower;
Collaborator

Is this test really necessary? Seems like a slowdown for the general case?

Contributor Author

Not really. Without the check, the math still works. I think the only reason I had that line is because it was in the previous uniform (but that's not a good reason to keep it). I'll remove it in the next push. Removing it gives a consistent ~3% performance boost in the benchmark from the introductory post of the PR (with Xorshift), which is certainly nice.

@monarchdodra
Collaborator

@WebDrake : LGTM, mostly, but I'm no PRNG expert. Does this also look good to you?

A special case was checked for that only occurs when calling uniform with a range of exactly one element. The check was not necessary to keep the math correct, and it represented a slight slowdown in the common case for no apparent benefit.
@WebDrake
Contributor

WebDrake commented Dec 3, 2013

@monarchdodra @Zshazz Can I have 24 hours to really go over it in detail? @Zshazz any reference works you can point me to that are relevant to your implementation here?

I'm sure the code is fine, I just want to make sure that I really, really understand it first.

@Zshazz
Contributor Author

Zshazz commented Dec 3, 2013

I have no problem with you taking some time on it (I recognize that this needs very careful consideration). I don't have any reference material for it, but I don't think I'm particularly clever, so something similar probably exists out there. I'll look around and see if I can find anything. For now the best I can do is recommend you read uniformUpgrade.pdf / the documentation above, though I'm certainly not the best writer. If I find anything similar, I'll post it here.

@Zshazz
Contributor Author

Zshazz commented Dec 3, 2013

Well, here's something interesting:
http://docs.oracle.com/javase/7/docs/api/java/util/Random.html#nextInt(int)

Java uses this same technique. The difference is that Java's version relies on overflow, whereas the method here doesn't depend on overflow behavior. It doesn't help with understanding, but it provides some precedent for this approach. Java also special-cases powers of two for a bit of a performance boost in that circumstance.
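For what it's worth, the power-of-two special case Java applies could be expressed like this in D (hypothetical sketch; this optimization is not part of the patch). When upperDist is a power of two, every bucket is complete, so masking the raw draw suffices and no reroll can ever occur:

```d
import std.random;

// Hypothetical power-of-two fast path (illustrative only; not part of
// this patch). Masking is fair because upperDist divides 2^32 evenly,
// leaving no incomplete last bucket.
uint pow2Uniform(Rng)(ref Rng gen, uint upperDist)
{
    assert(upperDist != 0 && (upperDist & (upperDist - 1)) == 0);
    immutable r = gen.front & (upperDist - 1);
    gen.popFront();
    return r;
}

void main()
{
    auto gen = Xorshift(7);
    foreach (i; 0 .. 1000)
        assert(pow2Uniform(gen, 16) < 16);
}
```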

@monarchdodra
Collaborator

@WebDrake : LGTM. Anything to add?

@WebDrake
Contributor

I didn't manage to make the time to go over it in depth as I'd wanted, but I have no reason to hold it up. So let's go; if I later come up with an objection (unlikely!), we can always revert or revise.

monarchdodra added a commit that referenced this pull request Dec 16, 2013
Fix issue 11598 - uniform could be faster for integrals
@monarchdodra monarchdodra merged commit fc48d56 into dlang:master Dec 16, 2013
@monarchdodra
Collaborator

Thank you for this commit.

@Zshazz Zshazz deleted the issue_11598 branch December 17, 2013 03:56
@WebDrake
Contributor

Just to note, this patch is responsible for an observable change in the sequences of integers generated by calls to uniform. Try running:

import std.algorithm, std.random, std.range, std.stdio;

void main()
{
    rndGen.seed(12);
    iota(0, 10).map!(a => uniform(0, 10, rndGen)).writeln;
}

With the earlier uniform, this generates [1, 4, 7, 8, 2, 0, 5, 7, 0, 3]. With the new one provided in this patch, it generates [3, 3, 2, 7, 3, 6, 9, 3, 8, 0].

With (admittedly fairly basic and ad-hoc) tests I don't see any sign of bias in the new uniform, but it does produce strongly distinct results from the earlier one.

@WebDrake
Contributor

WebDrake commented Jun 1, 2014

More seriously, I discovered this uniform will fail if an open lower bound is requested with character types or with integral types smaller than int. Try running:

void uniformTest(T)()
{
    import std.random;

    // These always work
    auto u1 = uniform!"[)"(T(0), T(10));
    auto u2 = uniform!"[]"(T(0), T(10));

    // These may fail for char types and integrals smaller than int
    auto u3 = uniform!"()"(T(0), T(10));
    auto u4 = uniform!"(]"(T(0), T(10));
}

void main()
{
    import std.typetuple;

    foreach (T; TypeTuple!(real, double, float, ulong, long, uint, int,
                           ushort, short, ubyte, byte, wchar, dchar, char))
    {
        uniformTest!T();
    }
}

I'm writing up a bug report on this now.

@WebDrake
Contributor

WebDrake commented Jun 1, 2014

Update: it looks like it's a general problem of how lower bounds have been dealt with, not specific to this particular implementation of uniform. So, this patch didn't introduce the problem.

WebDrake added a commit to WebDrake/hap that referenced this pull request Jun 2, 2014
This corresponds to the changes made in phobos pull request
dlang/phobos#1717