RFC: Performance updates for randn() to randmtzig.c. #1372

Closed
wants to merge 5 commits into master from vs/randn

Conversation

8 participants
Owner

ViralBShah commented Oct 12, 2012

I have modified randmtzig.c to use dSFMT's double precision random numbers. It ends up using the randomness from the 52-bit mantissa. The code is changed to reflect this slightly lower precision.

This makes individual calls to randn() about 25% faster, and avoids the use of dsfmt_genrand_uint32, as discussed in #1348.

I request a thorough review of this code, to see if I have messed up anything. I have verified that mean() and std() are correct, but it would be great if we can come up with a few more rigorous tests in test/rng.jl.

@ViralBShah ViralBShah Performance updates for randn() to randmtzig.c.
I have modified randmtzig.c to use dSFMT's double precision random
numbers. It ends up using the randomness from the 52-bit Mantissa.
The code is changed to reflect this slightly lower precision.

This makes individual calls to randn() about 25% faster.
a9656a1
Owner

ViralBShah commented Oct 12, 2012

# master
julia> gc();@time for i=1:100000000; randn(); end;
elapsed time: 0.9869461059570312 seconds

# vs/randn branch
julia> gc(); @time for i=1:100000000; randn(); end;
elapsed time: 0.7849278450012207 seconds
Owner

StefanKarpinski commented Oct 12, 2012

This looks good from a moderately casual inspection. This is the kind of code where it's hard to tell whether anything is wrong just by reading through it, I'm afraid.

Owner

ViralBShah commented Oct 12, 2012

Yes, it took me quite a while to figure out, and I have tried to keep it as close to the original as possible. It would help to put some tests in place that exercise the RNG.

Owner

StefanKarpinski commented Oct 12, 2012

RNG testing is super hard but we should definitely do that.

Member

johnmyleswhite commented Oct 12, 2012

Here are my (poorly ordered) notes and code for using STS to test a Julia RNG:

https://github.com/johnmyleswhite/RNGTesting/blob/master/NOTES.md

I'm at a conference right now, so I can't help edit this code to test Viral's change, but it should be easy. You can use something like src/generate_julia_bits.jl to create a text file with random bits. Then you can use STS as described to test those bits.
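[ed: a minimal sketch of such a bit dump, assuming the ASCII 0/1 input format that STS accepts; the file name and draw count are illustrative, and which mantissa bits are actually safe to test is discussed further down the thread:]

io = open("random_bits.txt", "w")
for i = 1:10_000
    print(io, bits(rand())[13:64])   # drop sign/exponent, keep the 52 mantissa bits
end
close(io)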

Owner

ViralBShah commented Oct 12, 2012

I did plot a histogram, and it does come out looking right.

Owner

ViralBShah commented Oct 12, 2012

@johnmyleswhite I thought the DieHard suite was for uniformly distributed random numbers. Would it work for normally distributed random numbers too?

If we lose 1 bit, why is this divided by 4? And is losing that bit really ok?

I was wondering that myself, but on the other hand it seems to me that selecting a random float between 0 and 1 only requires 52 bits.

Owner

JeffBezanson replied Oct 12, 2012

But this algorithm steals 1 bit for the sign.

Right, that's why it's concerning. I completely naïvely feel like 53 bits shouldn't be enough entropy for random Gaussians given their greater range than random uniforms.

Owner

JeffBezanson replied Oct 12, 2012

There must be a way to use the exponent bits, though of course by themselves they aren't distributed right.

Owner

ViralBShah replied Oct 12, 2012

Actually, it was originally 53 bits + 1 bit for the sign, which is why there was randi54(). Hence the division by 4. I don't think losing those bits is a problem, because the algorithm generates 64 bits of randomness and throws away 10 of them. In fact, implementations are often 32-bit. Also, the bits for idx end up being used twice, which isn't the case in ziggurat.jl in test/perf2. The exponent bits do not have any randomness, given that dsfmt generates double precision numbers in (0,1) or (1,2).
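[ed: to make the bit accounting concrete, a schematic of the fast path of the published Marsaglia-Tsang ziggurat; random_uint64, w, and k are illustrative names, not the randmtzig source:]

int64_t r   = (int64_t) random_uint64() >> 10;  /* signed 54-bit draw: 53 bits + sign */
int     idx = (int) (r & 0xff);    /* the low 8 bits select the rectangle...    */
double  x   = r * w[idx];          /* ...and the same r, low bits included,
                                      also scales into the sample value */
if ((r < 0 ? -r : r) < k[idx])
    return x;                      /* common fast path; edge/tail handling omitted */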

Member

johnmyleswhite commented Oct 13, 2012

@ViralBShah Sorry for the confusion. I had lost context and forgot that you needed a test suite for a Gaussian variable. I need to confirm the following claim by looking at Devroye, but I believe it should be possible to map every output from randn() through the inverse CDF of the standard Gaussian distribution and then use the uniform variable tests to confirm that you're producing valid output. Since every distribution is uniquely determined by its CDF, and the quantiles of any random stream should be uniformly distributed, this should work unless numerical precision issues in the calculation of the inverse CDF artificially corrupt the results. Will get back to you.

Member

johnmyleswhite commented Oct 13, 2012

Devroye didn't have an obvious answer, but this document confirms that the inverse CDF method is a valid way to test an RNG for Gaussian variables: http://www.cse.cuhk.edu.hk/~phwl/mt/public/archives/papers/grng_acmcs07.pdf
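[ed: a minimal sketch of the transform being described, with the standard normal CDF written via erfc from Base; the draw count is arbitrary:]

phi(x) = 0.5 * erfc(-x / sqrt(2))        # standard normal CDF
u = [phi(randn()) for i = 1:1_000_000]   # should behave as U(0,1) under the uniform tests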

Member

pao commented Oct 13, 2012

This seems reasonable, since inverting the CDF is the naive method for generating random variates for a given distribution from a uniform [0,1) random variate. Since we trust the entropy source (dsfmt) and presumably trust our inverse CDF, that should verify that we properly used that entropy to create normally-distributed variates.

Owner

ViralBShah commented Oct 14, 2012

From the paper:

It is shown that the Wallace method is the fastest, but can suffer from correlation problems; the Ziggurat method, the second in speed, is about 33% slower than the Wallace method but does not suffer from correlation problems. Thus, when maintaining extremely high statistical quality is the first priority, and subject to that constraint, speed is also desired, the Ziggurat method will often be the most appropriate choice. If the quality requirements are not so stringent but speed is essential then Wallace may be appropriate.

So our choice of the Ziggurat method is the right one.

Owner

ViralBShah commented Oct 14, 2012

It's worth noting that if I use clang with -O4 in standalone mode, it is twice as fast as with -O3. Link-time optimization kicks in at -O4, but it does not help when the code is built as a shared library loaded by julia.

Owner

ViralBShah commented Oct 14, 2012

@johnmyleswhite Would it be sufficient to apply the function Phi(x) from this paper (referenced from the one you posted) and then apply RNGTesting?

http://www.jstatsoft.org/v11/a05/paper

It would be great to also see what @dmbates has to say.
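[ed: for reference, a sketch of the Phi(x) routine along the lines of Marsaglia (2004); check the paper for the exact published version:]

/* Taylor-series evaluation of the standard normal CDF;
   0.91893853320467274178 is log(sqrt(2*pi)) */
double Phi(double x)
{
    long double s = x, t = 0, b = x, q = x * x, i = 1;
    while (s != t)
        s = (t = s) + (b *= q / (i += 2));
    return 0.5 + s * exp(-0.5 * q - 0.91893853320467274178L);
}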

ViralBShah added some commits Oct 12, 2012

@ViralBShah ViralBShah Performance updates for randn() to randmtzig.c.
I have modified randmtzig.c to use dSFMT's double precision random
numbers. It ends up using the randomness from the 52-bit Mantissa.
The code is changed to reflect this slightly lower precision.

This makes individual calls to randn() about 25% faster.

Use 64-bit ints for randmtzig fill functions.
Minor cleanups to randmtzig.c.

Minor changes to randmtzig.c for clarity.
acf88a9
@ViralBShah ViralBShah Merge branch 'vs/randn' of github.com:JuliaLang/julia into vs/randn 23bde72
Member

johnmyleswhite commented Oct 14, 2012

I'd really like to have @dmbates's input on this as well, but I would say that it will be sufficient to use Phi(x) from the Marsaglia 2004 paper to map into the uniform distribution for testing. Part of me suspects that using cPhi(x) when Phi(x) is near 1 might be important, but I worry that inconsistencies between the two implementations will cause the RNG to fail unfairly.

Looking back at RNGTesting, I realize that it has a bug in it that leads it to essentially only test whether the median is correctly placed or not. I'll revise it now to estimate things correctly for all the bits in a random number.

Owner

ViralBShah commented Oct 14, 2012

@johnmyleswhite Let me know when RNGTesting is ready. We should add Phi(x) and cPhi(x) from the Marsaglia 2004 paper to RNGTesting as well.

Owner

ViralBShah commented Oct 14, 2012

I did do some quick checks on the output of Phi(randn(x)), and the histogram does look ok.

Member

johnmyleswhite commented Oct 14, 2012

I'm working on it now, but it should be usable for your purposes already, since I'm only changing the code that generates random bits, not the code that tests those bits (the testing code comes from STS directly). If you can dump out the output of Phi(randn(x)) as a long bitstring file, you can test that directly using STS.

I'm working on creating that bitstring file. I realized that I did something stupid the last time I wrote this, so now I'm using your approach and only working with the 52-bit mantissa.

Member

johnmyleswhite commented Oct 14, 2012

I think I'm confused about how to access the raw bits of a Julian Float64. If I call bits(rand()), I should have the sign bit as the first bit, the exponent as the next 11 bits and then the mantissa, right? If that were correct, then rand() may be slightly corrupt because the last few bits are not distributed 50/50 as 0's and 1's. I think that may be required in order to force the results into the interval [0, 1), but it would be good to hear from others who are more familiar with details of floating point representations. Whatever the source of this discrepancy, we can only test a subset of the bits from calls to rand().

In order to use STS, we need to provide a bitstream in which every bit is a random 0/1 variable with probability 0.5 of coming up 1. This stream can be handled as ASCII (which is what I've done so far) or as binary in which every single bit is random. Either way, we cannot naively dump floating point numbers into a file that STS will read because so many of the bits are clearly not random. I am still trying to determine which bits are plausibly random.

Member

andreasnoack commented Oct 14, 2012

Have you considered

http://www.iro.umontreal.ca/~simardr/testu01/

It takes bits and uniformly distributed floats, and hence you only need to pass Phi(x) to the test battery. You can run the small battery on saved values, but I think the easiest way is to test randmtzig_randn() in pure C.

I am running the medium-sized test right now on the old randn and it seems to work fine. Also, it looks like Marsaglia's cPhi, which handles outcomes that essentially never happen, is not really needed. My code is

#include "bbattery.h"
#include "dsfmt-2.2/dSFMT.c"
#include "randmtzig.c"

double pnorm5(double, double, double, int, int);
double randntorand(void)
{
    return pnorm5(randmtzig_randn(), 0.0, 1.0, 1, 0);
}

int main (void)
{
    dsfmt_gv_init_gen_rand(117);
    randmtzig_create_ziggurat_tables();
    unif01_Gen *gen;
    // swrite_Basic = FALSE;
    gen = unif01_CreateExternGen01 ("randmtzig_randn", randntorand);
    bbattery_Crush (gen);
    return 0;
}

and I guess the test would fail if R's CDF were not precise enough. It takes some hours to run the medium test, but I can try the new implementation of randmtzig_randn() soon and report the result as soon as it arrives.

[pao: code formatting]
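[ed: for anyone reproducing this, a build line along these lines should work; the library names testu01, probdist, and mylib come with a standard TestU01 install, and pnorm5 is from R's standalone math library (libRmath); adjust include and library paths for your system:]

cc -O2 driver.c -ltestu01 -lprobdist -lmylib -lRmath -lm -o crush_randn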

Member

johnmyleswhite commented Oct 14, 2012

I'll try that. Last time I looked into it, I couldn't get it to build on my machine.

Owner

ViralBShah commented Oct 14, 2012

I would have thought that using the last 52 bits from calls to rand() should be all you need.

-viral


Member

johnmyleswhite commented Oct 14, 2012

Try this code. On my machine, the last four bits are clearly corrupt.

# Estimate, column by column, the fraction of 1s at each of the
# 64 bit positions of rand(); unbiased bits should average ~0.5.
all_bits = zeros(Int, 250_000, 64)

for i in 1:250_000
  all_bits[i, :] = map(b -> int(b), split(bits(rand()), ""))
end

print(join(mean(all_bits, 1), "\n"))
Member

andreasnoack commented Oct 14, 2012

The old randmtzig_randn() seems to work. I'll try the new one now.

========= Summary results of Crush =========

Version: TestU01 1.2.3
Generator: randmtzig_randn
Number of statistics: 144
Total CPU time: 01:52:08.64

All tests were passed

Member

andreasnoack commented Oct 14, 2012

The new one passed the small test. I'll continue with the medium sized.

========= Summary results of SmallCrush =========

Version: TestU01 1.2.3
Generator: randmtzig_randn
Number of statistics: 15
Total CPU time: 00:00:32.03

All tests were passed

Member

johnmyleswhite commented Oct 14, 2012

Great. Thanks so much for doing this!

Owner

ViralBShah commented Oct 14, 2012

Could you create a repository on github with the testu01 and the driver code? It will be useful to replicate results on a few machines, and also going forward. If the new one works, then this can be merged.

-viral


Member

andreasnoack commented Oct 14, 2012

I'll try, but I have never written a shell script or a Makefile before, so I guess it will take some time.

Owner

ViralBShah commented Oct 14, 2012

If you can create a repo and the compilation instructions, I can provide the Makefile.

-viral


Owner

ViralBShah commented Oct 14, 2012

@johnmyleswhite - Even the top bits of the mantissa look a bit strange, but I do not know enough to say much. dsfmt does pass the crush tests, so perhaps I am misunderstanding something. I need to read the paper.

-viral


Member

johnmyleswhite commented Oct 14, 2012

Agreed. As long as we're passing the Crush tests, I suspect the "anomalies" we see are an artifact of how floats map into bit patterns.

Member

andreasnoack commented Oct 14, 2012

Here are the files and some bullets explaining what I did.

https://github.com/andreasnoackjensen/rngtest.jl

Member

andreasnoack commented Oct 14, 2012

More than 12 minutes faster and no tests failed.

========= Summary results of Crush =========

Version: TestU01 1.2.3
Generator: randmtzig_randn
Number of statistics: 144
Total CPU time: 01:39:39.27

All tests were passed

Owner

ViralBShah commented Oct 14, 2012

Thanks everyone.

This branch is a bit messed up, so I am just going to commit afresh.

-viral


@ViralBShah ViralBShah closed this Oct 14, 2012

Owner

ViralBShah commented Oct 16, 2012

Another noteworthy comment from the "Normal Random Number Generators" paper:

The Ziggurat method passes all tests in single-precision, but in double-precision, fails a single test, the Collisions test. This is because in the published version of the Ziggurat algorithm [Marsaglia and Tsang 2000] the same random value is used both to select the rectangle and to provide the sample, so there is a correlation between the low bits of the sample used to select the rectangle, and the absolute magnitude of the sample. Using an independent random number generator to select the rectangle fixes this minor problem. For example, the eight least significant bits of the floating point fraction can be replaced using bit-wise operations, requiring one extra 32-bit uniform random integer for every four generated Gaussian random samples.
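[ed: a sketch of that fix with illustrative names, not the actual randmtzig code: one extra 32-bit draw serves four Gaussian samples, so the rectangle index no longer reuses the sample's own low bits:]

static uint32_t spare;       /* the extra 32-bit uniform, consumed 8 bits at a time */
static int      nspare = 0;

double gaussian_draw(void)
{
    if (nspare == 0) {
        spare  = random_uint32();   /* hypothetical 32-bit uniform source */
        nspare = 4;
    }
    int idx = spare & 0xff;         /* independent bits pick the rectangle */
    spare >>= 8;
    nspare--;
    /* ... continue with the usual ziggurat accept/reject for step idx,
       drawing the sample magnitude from a fresh uniform ... */
    return ziggurat_step(idx);      /* placeholder for the standard steps */
}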

Owner

ViralBShah commented Oct 16, 2012

According to the Cleve's Corner article, which is now 10 years old:

http://www.mathworks.in/company/newsletters/news_notes/clevescorner/spring01_cleve.html

MATLAB's built-in function randn uses two unsigned 32-bit integers, one for the seed in the uniform generator and one as the shift register for the integer generator. The code that processes points outside the core of the ziggurat accesses these two generators independently, so the period of the overall generator is about 2^64. At 10 million normal deviates per second, randn will take over 58,000 years before it repeats.
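[ed: a quick sanity check of that figure, assuming the 2^64 period filled in above:]

julia> 2.0^64 / (1e7 * 365.25 * 86400)   # years to exhaust the period at 10^7 deviates/sec
58454.2...                               # consistent with "over 58,000 years"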

Owner

ViralBShah commented Oct 16, 2012

It is also worthwhile to see this paper in this context: An Improved Ziggurat Method to Generate Normal Random Samples

http://www.doornik.com/research/ziggurat.pdf

Member

johnmyleswhite commented Oct 16, 2012

Did we pass the Collision test?

Owner

ViralBShah commented Oct 16, 2012

We should expect the collision test to fail, since this code does reuse the bits.

-viral

Member

johnmyleswhite commented Oct 16, 2012

Any sense of how much performance we would lose from employing an additional source of bits, as in Doornik's paper?

Member

andreasnoack commented Oct 19, 2012

A follow-up on this one. BigCrush is done, and it failed one test, but only slightly, so I ran that specific test twice with new seeds without failure. Hence I think we can trust the normal random numbers in Julia as they are. Julia uses a different RNG than Doornik, which might explain the different results.

========= Summary results of BigCrush =========

 Version:          TestU01 1.2.3
 Generator:        randmtzig_randn
 Number of statistics:  160
 Total CPU time:   18:37:07.44
 The following tests gave p-values outside [0.001, 0.9990]:
 (eps  means a value < 1.0e-300):
 (eps1 means a value < 1.0e-15):

       Test                          p-value
 ----------------------------------------------
  9  CollisionOver, t = 14           1.1e-4
 ----------------------------------------------
 All other tests were passed

***********************************************************
Test smarsa_CollisionOver calling smultin_MultinomialOver

***********************************************************
HOST = ubuntu, Linux

randmtzig_randn


smultin_MultinomialOver test:
-----------------------------------------------
   N = 30,  n = 20000000,  r =  0,   d =    8,   t = 14,
       Sparse =   TRUE

       GenerCell = smultin_GenerCellSerial
       Number of cells = d^t =      4398046511104
       Expected number per cell =  1 /  219902.33
       EColl = n^2 / (2k) =  45.47473509
       Hashing =   TRUE

       Collision test

       CollisionOver:   density = n / k =  1 /  219902.33
       Expected number of collisions = Mu =      45.47


-----------------------------------------------
Results of CollisionOver test:

POISSON approximation                 :
Expected number of collisions = N*Mu  :     1364.24
Observed number of collisions         :     1409
p-value of test                       :    0.12

-----------------------------
Total number of cells containing j balls

  j =  0                              :  131940795334529
  j =  1                              :        599997182
  j =  2                              :             1409
  j =  3                              :                0
  j =  4                              :                0
  j =  5                              :                0

-----------------------------------------------
CPU time used                    :  00:05:33.49


***********************************************************
Test smarsa_CollisionOver calling smultin_MultinomialOver

***********************************************************
HOST = ubuntu, Linux

randmtzig_randn


smultin_MultinomialOver test:
-----------------------------------------------
   N = 30,  n = 20000000,  r =  0,   d =    8,   t = 14,
       Sparse =   TRUE

       GenerCell = smultin_GenerCellSerial
       Number of cells = d^t =      4398046511104
       Expected number per cell =  1 /  219902.33
       EColl = n^2 / (2k) =  45.47473509
       Hashing =   TRUE

       Collision test

       CollisionOver:   density = n / k =  1 /  219902.33
       Expected number of collisions = Mu =      45.47


-----------------------------------------------
Results of CollisionOver test:

POISSON approximation                 :
Expected number of collisions = N*Mu  :     1364.24
Observed number of collisions         :     1363
p-value of test                       :    0.51

-----------------------------
Total number of cells containing j balls

  j =  0                              :  131940795334483
  j =  1                              :        599997274
  j =  2                              :             1363
  j =  3                              :                0
  j =  4                              :                0
  j =  5                              :                0

-----------------------------------------------
CPU time used                    :  00:05:34.12
Member

johnmyleswhite commented Oct 19, 2012

How long does this test take? It would be nice to know the distribution of p-values. They should be uniform if the test is really being passed.

-- John


Owner

StefanKarpinski commented Oct 19, 2012

We happen to have just gotten an 80-core research machine. This would be pretty easy to test on there (not to mention super fun :-).

Owner

StefanKarpinski commented Oct 19, 2012

I was also wondering if there's some statistically sound procedure for running p-value tests in a reliable way. As humans, we can look at a series of tests and, if there's one failure every now and then, rerun it a few times; if it then passes, we know it's ok. But is there a way to do that automatically, so that we can have an automated RNG test suite that doesn't give spurious false failures?

Member

johnmyleswhite commented Oct 19, 2012

I think the test for uniformly distributed p-values is probably the best approach if we're going to use p-values. I think there are better ways, but will have to chime in later when I'm not distracted by a conference going around me.

I should note as a psychologist that we human beings are very untrustworthy in our assessment of how often a small or large p-value should be occurring.

Owner

timholy commented Oct 19, 2012

Best procedure might be to design the threshold according to how we want it to behave. If it's a test suite that will run once per day, and you're aiming for roughly one false positive per year, then I'm sure no one needs to tell you that (at least under assumptions of independence) it should be very easy to calculate what your threshold should be. Indeed, the answer is log(2)/(m*365), where m is the number of independent p-value comparisons per day. For m=1, approximately 0.0019. (The "2" comes from setting the chance of getting no false positives as 50%.)
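[ed: the same threshold as a one-liner, with m and the once-per-day schedule as the stated assumptions:]

# p-value threshold giving ~50% chance of a false-positive-free year,
# with m independent p-value comparisons per day
threshold(m) = log(2) / (m * 365)

threshold(1)    # ≈ 0.0019, as above
threshold(100)  # ≈ 1.9e-5 if a hundred tests run per day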

Owner

StefanKarpinski commented Oct 19, 2012

Our badness at judging these things is part of why having a completely automated way of deciding would be ideal. The trouble with the once-per-year approach is that it only applies to a single test – since we'll be running dozens or even hundreds of tests, we'd expect at least one of them to fail every few days, which is bound to get annoying.

Owner

timholy commented Oct 19, 2012

Plug m > 1 into the above analysis (m is the number of tests/day). Of course, if the tests aren't independent (and I presume that's likely), this will give a threshold that is too strict. But anything truly broken with the RNGs might give an off-the-charts p-value anyway.

Of course it's possible to analyze the non-independent case too, but that starts to depend on particulars, and in some cases it is hard to make progress without Monte Carlo methods. When you're testing your random number generator, it might be unwise to rely on a random number generator to do your statistics for you!

Member

andreasnoack commented Oct 20, 2012

First a disclaimer: I have no special knowledge about RNG testing. However, from what I have read, I do not think we can do better than following the advice of Pierre L'Ecuyer and using the TestU01 suite to evaluate the overall quality of Julia's random numbers. Also, I think I should have been more enthusiastic about passing BigCrush. It is really good.

To be more precise, I think we need to distinguish between tests for the quality of the RNG implementations and tests for bugs in the implementations. The former applies to @ViralBShah's modification of the ziggurat algorithm for normals. The question here is whether the new implementation introduces some hard-to-detect non-randomness. I think the best approach here is to run BigCrush a single time. If one or more tests fail, rerun those tests to find out if the failure is just chance.

When testing for bugs in the implementation, I am thinking of, e.g., the bug in the exponential randoms. Such bugs are fairly easy to detect, and the tests of @johnmyleswhite should be sufficient here. As I see it, they could be run as part of the standard testing of Julia if we just set the seed before the calculation, as sketched below. Then they will not fail randomly: either never or always, depending on the seed, so we just search for a seed that makes all tests pass. We could also run SmallCrush, but that is an out-of-Julia test.

Finally, I shall comment on the use of p-values and the TestU01 suite. The Crush batteries are tests for uniformly distributed variables, and hence we should not test whether the p-values of those tests are uniformly distributed. That would send us into infinitely meta testing of p-values, where we could again test the uniformity of the p-values of the test for uniformity of the p-values of the tests for uniformity.

Due to the extremely large number of randoms tested in BigCrush, I think we should proceed as follows. The power of the tests is one and the size is correct. Hence, at the standard test levels of BigCrush, a deficient RNG will always fail at least one test. A good RNG will occasionally fail a test, but the same test will not fail for different seeds.
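[ed: a sketch of the kind of seeded, deterministic in-Julia test described above; the tolerances are illustrative, not from the thread:]

srand(1234)                        # fix the seed so the test never fails randomly
x = randn(1_000_000)
@assert abs(mean(x)) < 0.005       # sample mean of N(0,1) draws
@assert abs(std(x) - 1.0) < 0.005  # sample standard deviation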

Member

johnmyleswhite commented Oct 20, 2012

I agree that we are not going to improve upon TestU01. That's a given in my book.

I think the issue is that we need to find ways to test all the RNGs we've created, and it's not clear that TestU01 is going to be sufficient for that broader goal.

For normals, we came up with the trick of using the inverse CDF to create uniformly distributed variables. But for multidimensional distributions, that trick is not going to work. So we need other types of tests for things like multinomials and Dirichlet variables. If TestU01 has them, we should use those. If it doesn't, we should add our own tests to do things like confirm that first and second moments are not biased from their theoretical values.

At present we're not even doing that. I think the distance between our current location and that location is a single day's work. If I can make time, I will do that work today and we can be done with it.

Basically, we should have fallback tests of moments and other hypotheses when we can't use TestU01 and use TestU01 when it's applicable.

Owner

ViralBShah commented Oct 22, 2012

I am reasonably sure that we can fix the collision test failure by generating an independent random number for the selection of the step, rather than using the last 8 bits of the generated random number.

I would hate to generate a 32-bit int and use only 8 bits though. I guess the first step would be to try and see if applying this fix indeed makes it pass BigCrush cleanly, and then try to use the 32-bit int across 4 calls to randn().

-viral


Owner

StefanKarpinski commented Oct 22, 2012

We shouldn't just be aiming to pass the test – if we know we're not generating enough entropy in theory, then that itself is a problem. I'm not sure whether that's the case, because I'm not really following what we're doing clearly enough here.

Owner

ViralBShah commented Oct 22, 2012

There are two issues here - the collision test, and enough entropy for a double precision randn().

The original randmtzig code follows the original Marsaglia and Tsang paper, which fails the collision test. Now that we have revisited randn, we have discovered this flaw, and also note that it has been documented in later papers. For example, GSL and Octave seem to use the original algorithm, and I would expect them to fail the test. MATLAB, at least going by the Cleve's Corner article, should not have this problem, but that needs to be tested. The next step is to address it, while still getting the highest possible performance, and having systematic tests that can give us the comfort that we have indeed addressed the problem.

The second issue is precision. We get 52 bits of entropy from dSFMT. We are using 1 bit for the sign. Thus, we are now generating a double precision randn() using 51 bits (assuming that we will no longer use the 8 least significant bits for the selection of the ziggurat step). How do we prove this is really an issue? If it is an issue, how do we fix it, and do we accept the resulting loss in performance? In this case, the issue is really one of having good tests (whether TestU01 or some additional tests), so that we go about making all these decisions in a systematic and incremental manner.

Member

andreasnoack commented Oct 22, 2012

Is there an issue? The present implementation is not failing BigCrush. I think the BigCrush test has changed since Doornik wrote his paper, but not to become easier to pass. I have tried to run the exact Collision test that Doornik wrote about, but Julia is not failing that either.

Owner

ViralBShah commented Oct 22, 2012

I thought there wasn't consensus in this thread about the interpretation of the p-value, if it is acceptable or not.

Member

andreasnoack commented Oct 22, 2012

Okay. I just thought that you had misread the test results.

Member

johnmyleswhite commented Oct 22, 2012

I think the p-values seem to be alright. It's just not clear to me how an algorithm we know has a published flaw can be passing the test it was described as failing. But I may have misunderstood something along the way.

Owner

StefanKarpinski commented Oct 22, 2012

Regarding the entropy issue (keeping in mind that I know very little about this matter), it seems to me like 52 bits of entropy shouldn't really be enough for a fully random Gaussian value because it must be sampled from the full range of floating-point values, albeit not uniformly. Maybe that's good enough for most purposes, but it doesn't really intuitively seem like enough entropy – it seems like you'd want all 64 bits and still do some rejection.

Member

johnmyleswhite commented Oct 22, 2012

I'm hesitant to trust intuitions about how much entropy is needed. If our Gaussian variables pass the tests in TestU01, I'd say that we're good.

In contrast to randn, I've found a bunch of little errors in the Distributions module that I'm slowly cleaning up. I should have it finished before the end of the week.

Owner

ViralBShah commented Oct 22, 2012

This is perhaps one of the reasons why we need tests for the normal distribution, over and above the tests that check for uniformity after transforming. I have also seen implementations that generate double precision randn() with a 32-bit random integer.

To sample from (-Inf,Inf), using more bits will always be better than using fewer. For N(0,1), we are in (-3,3) 99.7% of the time. It would seem that 52 bits from the U(-1,1) sampling + 8 bits for step selection should be pretty good, and even 52 bits with the 8 least significant bits reused for step selection should be ok. The issue is sampling from the tail, but in that case we are already generating a minimum of two random numbers from U(0,1), which should give adequate entropy for the tail.

Member

andreasnoack commented Oct 23, 2012

I ran the first three test cases of Table I in Doornik's paper last night. They all passed, even the one with the SHR3 RNG and the original ziggurat algorithm. I did not change anything in Doornik's code, which is available on his homepage. I wonder if there was a bug in TestU01 at that time, or if the difference could be due to a 32-bit machine. I cannot tell whether Doornik ran the crush tests on his 32-bit or 64-bit machine, but I will try to run the test in 32-bit mode myself. The paper is not published, so maybe others have had similar thoughts.

Member

andreasnoack commented Oct 23, 2012

I wrote to Richard Simard, who maintains TestU01, and got the version of the suite from Spring 2005, when Doornik did his simulations. It turned out that the included tests had changed, which is why I was unable to replicate Doornik's results. I found the exact Collision Test of Knuth that Doornik's code could not pass and was now able to replicate his result. The good news is that Julia's randn does not fail that same test, even with the "erroneous" ziggurat implementation. I have included the Collision Test in rngtest.jl. The results are shown below.

***********************************************************
Test sknuth_Collision calling smultin_Multinomial

***********************************************************
HOST = new-host-2.home, Darwin

RNOR_MWC8222


smultin_Multinomial test:
-----------------------------------------------
   N =  2,  n = 10000000,  r =  0,   d = 1073741824,   t =  1,
       Sparse =   TRUE

       GenerCell = smultin_GenerCellSerial
       Number of cells = d^t =         1073741824
       Expected number per cell =  1 /  107.37418
       EColl = n^2 / (2k) =  46566.12873
       Hashing =   TRUE

       Collision test,    Mu =      46421.899,    Sigma =      214.3

-----------------------------------------------
Test Results for Collisions

For the total number of collisions, we use
      the Poisson approximation:
Expected number of collisions = N*Mu  :    92843.80
Observed number of collisions         :   115104
p-value of test                       :   eps      *****


-----------------------------
Total number of cells containing j balls

  j =  0                              :       2127598752
  j =  1                              :         19770360
  j =  2                              :           113974
  j =  3                              :              556
  j =  4                              :                6
  j =  5                              :                0

-----------------------------------------------
CPU time used                    :  00:00:07.68

***********************************************************
Test sknuth_Collision calling smultin_Multinomial

***********************************************************
HOST = new-host-2.home, Darwin

randmtzig_randn


smultin_Multinomial test:
-----------------------------------------------
   N =  2,  n = 10000000,  r =  0,   d = 1073741824,   t =  1,
       Sparse =   TRUE

       GenerCell = smultin_GenerCellSerial
       Number of cells = d^t =         1073741824
       Expected number per cell =  1 /  107.37418
       EColl = n^2 / (2k) =  46566.12873
       Hashing =   TRUE

       Collision test,    Mu =      46421.899,    Sigma =      214.3

-----------------------------------------------
Test Results for Collisions

For the total number of collisions, we use
      the Poisson approximation:
Expected number of collisions = N*Mu  :    92843.80
Observed number of collisions         :    93118
p-value of test                       :    0.18


-----------------------------
Total number of cells containing j balls

  j =  0                              :       2127576766
  j =  1                              :         19814063
  j =  2                              :            92520
  j =  3                              :              299
  j =  4                              :                0
  j =  5                              :                0

-----------------------------------------------
CPU time used                    :  00:00:11.09
Owner

StefanKarpinski commented Oct 23, 2012

[Slow clap.] Outstanding effort and follow-through here. This is great. We need to get this all bundled up as a package people can install and run themselves.

Member

andreasnoack commented Oct 24, 2012

Is there a package up and running that I can get inspiration from?

Owner

StefanKarpinski commented Oct 24, 2012

So far there's this: https://github.com/JuliaLang/Example.jl. I got interrupted in the process of adding metadata for it here yesterday, but will work on that shortly...

Owner

StefanKarpinski commented Oct 24, 2012

In the meantime, just make a git repo with the code you run to make this work and we can morph it into the correct form easily enough later.

Member

andreasnoack commented Oct 24, 2012

Thank you. The git repo is there already: https://github.com/andreasnoackjensen/rngtest.jl

Owner

StefanKarpinski commented Oct 24, 2012

Ah, great. So it looks like this uses the C code that Julia uses directly? We probably ought to also do end-to-end testing from Julia code to make sure we're not dropping the ball somewhere along the line. That way we can also avoid needing compiled C code to test.

Owner

ViralBShah commented Oct 25, 2012

@andreasnoackjensen That is indeed fantastic follow-up!

Owner

ViralBShah commented Oct 25, 2012

At some point, I would want to have detailed documentation for each function, where all of this can be documented, with references to papers and such. That would really build confidence in quality, and make future change management easier.

Member

johnmyleswhite commented Oct 25, 2012

I can start to work up that documentation as I go about building a new version of the Distributions module.

One thing that R does very well is have an explicit academic references section in its function-level documentation. We should probably pick some standard for doing the same.

Owner

ViralBShah commented Oct 25, 2012

That would be superb. Perhaps Distributions can become the model for documentation.

Member

johnmyleswhite commented Oct 25, 2012

So where should we put the documentation? In a new folder in the main Julia repo?

Member

nolta commented Oct 25, 2012

What's wrong w/ adding it to the current documentation?

Member

johnmyleswhite commented Oct 25, 2012

Which current documentation? The manual?

I think Viral is hoping for something more structured, but perhaps I've misunderstood.

Member

nolta commented Oct 25, 2012

For now, let's just put it in doc/stdlib/distributions.rst.

Owner

ViralBShah commented Oct 25, 2012

Yes, I was thinking of something structured, but I think it could go into the current documentation, so long as we have well-defined sections for "Detailed Notes", "Stress Testing", "References", etc.

Member

johnmyleswhite commented Oct 25, 2012

Ok. I'll work on adding some documentation over the weekend using those sections.

@ViralBShah ViralBShah added the RNG label Nov 22, 2014
