RFC: Performance updates for randn() to randmtzig.c. #1372

Closed
wants to merge 5 commits

8 participants

@ViralBShah
Owner

I have modified randmtzig.c to use dSFMT's double precision random numbers. It ends up using the randomness from the 52-bit mantissa. The code has been changed to reflect this slightly lower precision.

This makes individual calls to randn() about 25% faster, and avoids the use of dsfmt_genrand_uint32, as discussed in #1348.

I request a thorough review of this code, to see if I have messed up anything. I have verified that mean() and std() are correct, but it would be great if we can come up with a few more rigorous tests in test/rng.jl.

@ViralBShah ViralBShah Performance updates for randn() to randmtzig.c.
I have modified randmtzig.c to use dSFMT's double precision random
numbers. It ends up using the randomness from the 52-bit Mantissa.
The code is changed to reflect this slightly lower precision.

This makes individual calls to randn() about 25% faster.
a9656a1
@ViralBShah
Owner
# master
julia> gc();@time for i=1:100000000; randn(); end;
elapsed time: 0.9869461059570312 seconds

# vs/randn branch
julia> gc(); @time for i=1:100000000; randn(); end;
elapsed time: 0.7849278450012207 seconds
@StefanKarpinski

This looks good from a moderately casual inspection. With this kind of code, though, it's hard to know whether anything is wrong just by reading through it, I'm afraid.

@ViralBShah
Owner

Yes, it took me quite a while to figure it out, and I have tried to keep it as close to the original as possible. It would be easier to put some tests in place that test the RNG.

@StefanKarpinski

RNG testing is super hard but we should definitely do that.

@johnmyleswhite
Collaborator

Here are my (poorly ordered) notes and code for using STS to test a Julia RNG:

https://github.com/johnmyleswhite/RNGTesting/blob/master/NOTES.md

I'm at a conference right now, so I can't help edit this code to test Viral's change, but it should be easy. You can use something like the src/generate_julia_bits.jl to create a text file with random bits. Then you can use STS as described to test those bits.

@ViralBShah
Owner

I did plot a histogram, and it does come out looking right.

@ViralBShah
Owner

@johnmyleswhite I thought the DieHard suite was for uniformly distributed random numbers. Would it work for normally distributed random numbers too?

@JeffBezanson

If we lose 1 bit why is this divided by 4? And is losing that bit really ok?

I was wondering that myself, but on the other hand it seems to me that selecting a random float between 0 and 1 only requires 52 bits.

But this algorithm steals 1 bit for the sign.

Right, that's why it's concerning. I completely naïvely feel like 53 bits shouldn't be enough entropy for random Gaussians given their greater range than random uniforms.

There must be a way to use the exponent bits, though of course by themselves they aren't distributed right.

Actually, it was originally 53 bits + 1 bit for the sign, which is why there was randi54(); hence the division by 4. I don't think losing those bits is a problem, because the algorithm generates 64 bits of randomness and discards 10 of them. In fact, implementations are often 32-bit. Also, the bits for idx end up being used twice, which isn't the case in ziggurat.jl in test/perf2. The exponent bits do not carry any randomness, given that dsfmt generates double precision numbers in (0,1) or (1,2).

@johnmyleswhite
Collaborator

@ViralBShah Sorry for the confusion. I had lost context and forgot that you needed a test suite for a Gaussian variable. I need to confirm the following claim by looking at Devroye, but I believe it should be possible to map every output from randn() through the CDF of the standard Gaussian distribution and then use the uniform-variable tests to confirm that you're producing valid output. Since every distribution is uniquely determined by its CDF, and the quantiles of any random stream should be uniformly distributed, this should work unless numerical precision issues in the calculation of the CDF artificially corrupt the results. Will get back to you.

@johnmyleswhite
Collaborator

Devroye didn't have an obvious answer, but this document confirms that the inverse CDF method is a valid
way to test an RNG for Gaussian variables: http://www.cse.cuhk.edu.hk/~phwl/mt/public/archives/papers/grng_acmcs07.pdf
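[ed: the probability integral transform being proposed here can be sketched in a few lines. This is an illustrative Python sketch, not the Julia/C code under discussion; the map applied to randn() output is the forward CDF Phi, whose output is uniform exactly when the input is standard normal.]

```python
import math
import random

def phi(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gaussians_to_uniforms(n, seed=1234):
    """If X ~ N(0,1), then phi(X) ~ U(0,1); these values (or their bits)
    are what would be fed to a suite like STS or TestU01."""
    rng = random.Random(seed)
    return [phi(rng.gauss(0.0, 1.0)) for _ in range(n)]

u = gaussians_to_uniforms(100_000)
mean_u = sum(u) / len(u)  # should be close to 0.5 for a healthy generator
```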

@pao
Collaborator

This seems reasonable, since inverting the CDF is the naive method for generating random variates for a given distribution from a uniform [0, 1) random variate. Since we trust the entropy source (dsfmt) and presumably trust our CDF implementation, this should verify that we properly used that entropy to create normally distributed variates.

@ViralBShah
Owner

From the paper:

It is shown that the Wallace method is the fastest, but can suffer from correlation problems; the Ziggurat method, the second in speed, is about 33% slower than the Wallace method but does not suffer from correlation problems. Thus, when maintaining extremely high statistical quality is the first priority, and subject to that constraint, speed is also desired, the Ziggurat method will often be the most appropriate choice. If the quality requirements are not so stringent but speed is essential then Wallace may be appropriate.

So, our choice of Ziggurat is the right choice.

@ViralBShah
Owner

It's worth noting that if I use clang with -O4 in standalone mode, it is twice as fast as with -O3. Link-time optimizations kick in at -O4, but they do not help when the code is used as a shared library loaded by Julia.

@ViralBShah
Owner

@johnmyleswhite Would it be sufficient to apply the function Phi(x) from this paper (referred from the one you posted) and then apply RNGTesting?

http://www.jstatsoft.org/v11/a05/paper

It would be great to also see what @dmbates has to say.

ViralBShah added some commits
@ViralBShah ViralBShah Performance updates for randn() to randmtzig.c.
I have modified randmtzig.c to use dSFMT's double precision random
numbers. It ends up using the randomness from the 52-bit Mantissa.
The code is changed to reflect this slightly lower precision.

This makes individual calls to randn() about 25% faster.

Use 64-bit ints for randmtzig fill functions.
Minor cleanups to randmtzig.c.

Minor changes to randmtzig.c for clarity.
acf88a9
@ViralBShah ViralBShah Merge branch 'vs/randn' of github.com:JuliaLang/julia into vs/randn 23bde72
@johnmyleswhite
Collaborator

I'd really like to have @dmbates's input on this as well, but I would say that it will be sufficient to use Phi(x) from the Marsaglia 2004 paper to map into the uniform distribution for testing. Part of me suspects that using cPhi(x) for x near 1 might be important, but I worry that inconsistencies between the two implementations would cause the RNG to fail unfairly.

Looking back at RNGTesting, I realize that it has a bug in it that leads it to essentially only test whether the median is correctly placed or not. I'll revise it now to estimate things correctly for all the bits in a random number.

@ViralBShah
Owner

@johnmyleswhite Let me know when RNGTesting is ready. We should add Phi(x) and cPhi(x) from the Marsaglia 2004 paper to RNGTesting as well.

@ViralBShah
Owner

I did do some quick checks on the output of Phi(randn(x)), and the histogram does look ok.

@johnmyleswhite
Collaborator

I'm working on it now, but it should be usable for your purposes already, since I'm only changing the code that generates random bits rather than the code that tests them (that code comes from STS directly). If you can dump the output of Phi(randn(x)) as a long bitstring file, you can test it directly using STS.

I'm working on creating that bitstring file. I realized that I did something stupid the last time I wrote this, so now I'm using your approach and only working with the 52-bit mantissa.

@johnmyleswhite
Collaborator

I think I'm confused about how to access the raw bits of a Julian Float64. If I call bits(rand()), I should have the sign bit as the first bit, the exponent as the next 11 bits and then the mantissa, right? If that were correct, then rand() may be slightly corrupt because the last few bits are not distributed 50/50 as 0's and 1's. I think that may be required in order to force the results into the interval [0, 1), but it would be good to hear from others who are more familiar with details of floating point representations. Whatever the source of this discrepancy, we can only test a subset of the bits from calls to rand().

In order to use STS, we need to provide a bitstream in which every bit is a random 0/1 variable with probability 0.5 of coming up 1. This stream can be handled as ASCII (which is what I've done so far) or as binary in which every single bit is random. Either way, we cannot naively dump floating point numbers into a file that STS will read because so many of the bits are clearly not random. I am still trying to determine which bits are plausibly random.

@andreasnoack
Collaborator

Have you considered

http://www.iro.umontreal.ca/~simardr/testu01/

It takes bits and uniformly distributed floats, and hence you only need to pass Phi(x) to the test battery. You can run the small battery on saved values, but I think the easiest way is to test randmtzig_randn() in pure C.

I am running the medium-sized test right now on the old randn and it seems to work fine. Also, it looks like Marsaglia's cPhi, which handles extreme tail outcomes, is not really needed. My code is

#include "bbattery.h"
#include "dsfmt-2.2/dSFMT.c"
#include "randmtzig.c"

double pnorm5(double, double, double, int, int); /* pnorm from R's standalone math library (Rmath) */
double randntorand(void)
{
    return pnorm5(randmtzig_randn(), 0.0, 1.0, 1, 0);
}

int main (void)
{
    dsfmt_gv_init_gen_rand(117);
    randmtzig_create_ziggurat_tables();
    unif01_Gen *gen;
    // swrite_Basic = FALSE;
    gen = unif01_CreateExternGen01 ("randmtzig_randn", randntorand);
    bbattery_Crush (gen);
    return 0;
}

and I guess that the test would fail if R's CDF were not precise enough. The medium test takes some hours to run, but I will try the new implementation of randmtzig_randn() soon and report the result as soon as it arrives.

[pao: code formatting]

@johnmyleswhite
Collaborator

I'll try that. Last time I looked into it, I couldn't get it to build on my machine.

@johnmyleswhite
Collaborator

Try this code. On my machine, the last four bits are clearly corrupt.

all_bits = zeros(Int, 250_000, 64)

for i in 1:250_000
  all_bits[i, :] = map(b -> int(b), split(bits(rand()), ""))
end

print(join(mean(all_bits, 1), "\n"))
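[ed: the layout question above is answered by the IEEE-754 format itself: a double is 1 sign bit, 11 exponent bits, then 52 mantissa bits, in that order. The low-bit bias seen for rand() in [0, 1) is expected: values below 0.5 get renormalized, which shifts zeros into the low mantissa bits. An illustrative Python sketch (not the Julia code above) for inspecting the raw bits:]

```python
import struct

def double_bits(x):
    """64-bit big-endian bit string of an IEEE-754 double:
    b[0] is the sign, b[1:12] the biased exponent, b[12:] the mantissa."""
    (u,) = struct.unpack('>Q', struct.pack('>d', x))
    return format(u, '064b')

b = double_bits(1.0)
sign, exponent, mantissa = b[0], b[1:12], b[12:]
# 1.0 has a zero sign bit, biased exponent 1023, and an all-zero mantissa.
```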
@andreasnoack
Collaborator

The old randmtzig_randn() seems to work. I'll try the new one now.

========= Summary results of Crush =========

Version: TestU01 1.2.3
Generator: randmtzig_randn
Number of statistics: 144
Total CPU time: 01:52:08.64

All tests were passed

@andreasnoack
Collaborator

The new one passed the small test. I'll continue with the medium sized.

========= Summary results of SmallCrush =========

Version: TestU01 1.2.3
Generator: randmtzig_randn
Number of statistics: 15
Total CPU time: 00:00:32.03

All tests were passed

@johnmyleswhite
Collaborator

Great. Thanks so much for doing this!

@andreasnoack
Collaborator

I'll try, but I have never written a shell script or a Makefile before, so I guess it will take some time.

@johnmyleswhite
Collaborator

Agreed. As long as we're passing the Crush tests, I suspect the "anomalies" we see are an artifact of how floats map into bit patterns.

@andreasnoack
Collaborator

Here are the files and some bullets explaining what I did.

https://github.com/andreasnoackjensen/rngtest.jl

@andreasnoack
Collaborator

More than 12 minutes faster and no tests failed.

========= Summary results of Crush =========

Version: TestU01 1.2.3
Generator: randmtzig_randn
Number of statistics: 144
Total CPU time: 01:39:39.27

All tests were passed

@ViralBShah
Owner
@ViralBShah ViralBShah closed this
@ViralBShah
Owner

Another noteworthy comment from the "Normal Random Number Generators" paper:

The Ziggurat method passes all tests in single-precision, but in double-precision, fails a single test, the Collisions test. This is because in the published version of the Ziggurat algorithm [Marsaglia and Tsang 2000] the same random value is used both to select the rectangle and to provide the sample, so there is a correlation between the low bits of the sample used to select the rectangle, and the absolute magnitude of the sample. Using an independent random number generator to select the rectangle fixes this minor problem. For example, the eight least significant bits of the floating point fraction can be replaced using bit-wise operations, requiring one extra 32-bit uniform random integer for every four generated Gaussian random samples.
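[ed: the fix quoted above, replacing the eight least significant mantissa bits with independent random bits, can be sketched as follows. This is an illustrative Python sketch, not the actual randmtzig.c change; the function name is made up for the example.]

```python
import struct

def refresh_low_mantissa_bits(x, fresh_byte):
    """Overwrite the 8 least significant mantissa bits of a double with
    independent random bits, breaking the correlation between the sample
    value and the ziggurat rectangle index selected from its low bits."""
    (u,) = struct.unpack('<Q', struct.pack('<d', x))       # reinterpret as uint64
    u = (u & ~0xFF) | (fresh_byte & 0xFF)                  # swap in the fresh byte
    (y,) = struct.unpack('<d', struct.pack('<Q', u))       # back to a double
    return y
```

One extra 32-bit uniform integer yields four such bytes, matching the paper's stated cost of one extra integer per four Gaussian samples.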

@ViralBShah
Owner

According to the Cleve's Corner article, which is now 10 years old:

http://www.mathworks.in/company/newsletters/news_notes/clevescorner/spring01_cleve.html

MATLAB's built-in function randn uses two unsigned 32-bit integers, one for the seed in the uniform generator and one as the shift register for the integer generator. The code that processes points outside the core of the ziggurat accesses these two generators independently, so the period of the overall generator is 2^64. At 10 million normal deviates per second, randn will take over 58,000 years before it repeats.
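[ed: the period figure is dropped in the quote as extracted; the article's own "58,000 years at 10 million deviates per second" pins it down as 2^64, which a one-line back-of-envelope check confirms (illustrative Python):]

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
period = 2 ** 64        # two independent 32-bit generators combined
rate = 10_000_000       # normal deviates per second, as in the article
years = period / rate / SECONDS_PER_YEAR  # roughly 58,000 years
```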

@ViralBShah
Owner

Also worthwhile to see this paper in this context - An Improved Ziggurat Method to Generate Normal Random Samples

http://www.doornik.com/research/ziggurat.pdf

@johnmyleswhite
Collaborator

Did we pass the Collision test?

@johnmyleswhite
Collaborator

Any sense how much performance we would lose from employing an additional source of bits like in Doornik's paper?

@andreasnoack
Collaborator

A follow-up on this one. BigCrush is over and it failed one test, but only slightly, so I ran that specific test twice more with new seeds without failure. Hence I think we can trust the normal random numbers in Julia as they are. Julia uses a different RNG than Doornik's, which might explain the different results.

========= Summary results of BigCrush =========

 Version:          TestU01 1.2.3
 Generator:        randmtzig_randn
 Number of statistics:  160
 Total CPU time:   18:37:07.44
 The following tests gave p-values outside [0.001, 0.9990]:
 (eps  means a value < 1.0e-300):
 (eps1 means a value < 1.0e-15):

       Test                          p-value
 ----------------------------------------------
  9  CollisionOver, t = 14           1.1e-4
 ----------------------------------------------
 All other tests were passed

***********************************************************
Test smarsa_CollisionOver calling smultin_MultinomialOver

***********************************************************
HOST = ubuntu, Linux

randmtzig_randn


smultin_MultinomialOver test:
-----------------------------------------------
   N = 30,  n = 20000000,  r =  0,   d =    8,   t = 14,
       Sparse =   TRUE

       GenerCell = smultin_GenerCellSerial
       Number of cells = d^t =      4398046511104
       Expected number per cell =  1 /  219902.33
       EColl = n^2 / (2k) =  45.47473509
       Hashing =   TRUE

       Collision test

       CollisionOver:   density = n / k =  1 /  219902.33
       Expected number of collisions = Mu =      45.47


-----------------------------------------------
Results of CollisionOver test:

POISSON approximation                 :
Expected number of collisions = N*Mu  :     1364.24
Observed number of collisions         :     1409
p-value of test                       :    0.12

-----------------------------
Total number of cells containing j balls

  j =  0                              :  131940795334529
  j =  1                              :        599997182
  j =  2                              :             1409
  j =  3                              :                0
  j =  4                              :                0
  j =  5                              :                0

-----------------------------------------------
CPU time used                    :  00:05:33.49


***********************************************************
Test smarsa_CollisionOver calling smultin_MultinomialOver

***********************************************************
HOST = ubuntu, Linux

randmtzig_randn


smultin_MultinomialOver test:
-----------------------------------------------
   N = 30,  n = 20000000,  r =  0,   d =    8,   t = 14,
       Sparse =   TRUE

       GenerCell = smultin_GenerCellSerial
       Number of cells = d^t =      4398046511104
       Expected number per cell =  1 /  219902.33
       EColl = n^2 / (2k) =  45.47473509
       Hashing =   TRUE

       Collision test

       CollisionOver:   density = n / k =  1 /  219902.33
       Expected number of collisions = Mu =      45.47


-----------------------------------------------
Results of CollisionOver test:

POISSON approximation                 :
Expected number of collisions = N*Mu  :     1364.24
Observed number of collisions         :     1363
p-value of test                       :    0.51

-----------------------------
Total number of cells containing j balls

  j =  0                              :  131940795334483
  j =  1                              :        599997274
  j =  2                              :             1363
  j =  3                              :                0
  j =  4                              :                0
  j =  5                              :                0

-----------------------------------------------
CPU time used                    :  00:05:34.12
@StefanKarpinski

We happen to have just gotten an 80-core research machine. This would be pretty easy to test on there (not to mention super fun :-).

@StefanKarpinski

I was also wondering if there's some statistically sound procedure for running p-value tests in a reliable way. As humans, we can look at a series of tests and if there's one failure every now and then, we know it's ok – we just run it again a few times and if it passes, it's ok. But is there a way to do that automatically so that we can have an automated RNG test suite that doesn't give spurious false failures?

@johnmyleswhite
Collaborator

I think the test for uniformly distributed p-values is probably the best approach if we're going to use p-values. I think there are better ways, but will have to chime in later when I'm not distracted by a conference going around me.

I should note as a psychologist that we human beings are very untrustworthy in our assessment of how often a small or large p-value should be occurring.

@StefanKarpinski

Our badness at judging these things is part of why having a completely automated way of deciding would be ideal. The trouble with the once-per-year approach is that it only applies to a single test – since we'll be running dozens or even hundreds of tests, we'd expect at least one of them to fail every few days, which is bound to get annoying.

@andreasnoack
Collaborator

First, a disclaimer: I have no special knowledge about RNG testing. However, from what I have read, I do not think we can do better than following the advice of Pierre L'Ecuyer and using the TestU01 suite to evaluate the overall quality of Julia's random numbers. Also, I think I should have been more enthusiastic about passing BigCrush. It is really good.

To be more precise, I think we need to distinguish between tests for the quality of the RNG algorithms and tests for bugs in the implementations. The former applies to @ViralBShah's modification of the ziggurat algorithm for normals. The question there is whether the new implementation introduces some hard-to-detect non-randomness. I think the best approach is to run BigCrush once; if one or more tests fail, rerun those tests to find out if the failure is just chance.

When testing for bugs in the implementation, I am thinking of, e.g., the bug in the exponential randoms. Such bugs are fairly easy to detect, and the tests of @johnmyleswhite should be sufficient here. As I see it, they could be run as part of the standard testing of Julia if we just set the seed before the calculation. Then they will not fail randomly: either never or always, depending on the seed, so we just search for a seed that makes all tests pass. We could also run SmallCrush, but that is an out-of-Julia test.

Finally, a comment on the use of p-values and the TestU01 suite. The Crush batteries are tests for uniformly distributed variables, and hence we should not test whether the p-values of those tests are themselves uniformly distributed. That would send us into an infinite regress of meta-testing, where we could again test the uniformity of the p-values of the tests for uniformity of the p-values of the tests for uniformity.

Due to the extremely large number of randoms tested in BigCrush, I think we should proceed as follows. The power of the tests is essentially one and the size is correct. Hence, at the standard test levels of BigCrush, a deficient RNG will always fail at least one test. A good RNG will occasionally fail a test, but the same test will not fail again under different seeds.

@johnmyleswhite
Collaborator

I agree that we are not going to improve upon TestU01. That's a given in my book.

I think the issue is that we need to find ways to test all the RNGs we've created, and it's not clear that TestU01 is going to be sufficient for that broader goal.

For normals, we came up with the trick of using the inverse CDF to create uniformly distributed variables. But for multidimensional distributions, that trick is not going to work. So we need other types of tests for things like multinomials and Dirichlet variables. If TestU01 has them, we should use those. If it doesn't, we should add our own tests to do things like confirm that first and second moments are not biased from their theoretical values.

At present we're not even doing that. I think the distance between our current location and that location is a single day's work. If I can make time, I will do that work today and we can be done with it.

Basically, we should have fallback tests of moments and other hypotheses when we can't use TestU01 and use TestU01 when it's applicable.
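[ed: the fallback moment tests suggested here can be sketched. This is an illustrative Python example with a hypothetical reference multinomial sampler, checking that the empirical first moments match the theoretical n * p_i; it is not the Distributions module code.]

```python
import random

def multinomial_sample(n, probs, rng):
    """Hypothetical reference sampler: n independent categorical draws
    accumulated into a count vector (O(n * k); for testing only)."""
    counts = [0] * len(probs)
    for _ in range(n):
        u = rng.random()
        acc = 0.0
        for i, p in enumerate(probs):
            acc += p
            if u < acc:
                counts[i] += 1
                break
        else:
            counts[-1] += 1  # guard against rounding in the cumulative sum
    return counts

# Compare empirical first moments against the theoretical values n * p_i.
rng = random.Random(7)
probs = [0.2, 0.3, 0.5]
n, reps = 1000, 200
totals = [0.0] * len(probs)
for _ in range(reps):
    for i, c in enumerate(multinomial_sample(n, probs, rng)):
        totals[i] += c
means = [t / reps for t in totals]  # expect roughly [200, 300, 500]
```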

@StefanKarpinski

We shouldn't just be aiming to pass the test – if we know we're not generating theoretically enough entropy, then that itself is a problem. I'm not sure if that's the case because I'm not really following what we're doing clearly enough here.

@ViralBShah
Owner

There are two issues here - the collision test, and enough entropy for a double precision randn().

The original randmtzig code follows the original Marsaglia and Tsang paper, which fails the collision test. Now that we have revisited randn, we have discovered this flaw, and we also note that it has been documented in later papers. For example, GSL and Octave seem to use the original algorithm, and I would expect them to fail the test. MATLAB, at least going by the Cleve's Corner article, should not have this problem, but that needs to be tested. The next step is to address it while still getting the highest possible performance, with systematic tests that give us confidence that we have indeed addressed the problem.

The second issue is precision. We get 52 bits of entropy from dSFMT, and we use 1 bit for the sign. Thus, we are now generating a double precision randn() using 51 bits (assuming that we will no longer use the 8 least significant bits for selecting the ziggurat step). How do we prove this is really an issue? If it is, how do we fix it, and do we accept the resulting loss in performance? In this case, the issue is really one of having good tests (whether TestU01 or something additional), so that we make all these decisions in a systematic and incremental manner.

@andreasnoack
Collaborator

Is there an issue? The present implementation does not fail BigCrush. I think the BigCrush battery has changed since Doornik wrote his paper, but not to become easier to pass. I have also tried to run the exact Collision test that Doornik wrote about, and Julia does not fail that either.

@ViralBShah
Owner

I thought there wasn't consensus in this thread about the interpretation of the p-value, if it is acceptable or not.

@andreasnoack
Collaborator

Okay. I just thought that you had misread the test results.

@johnmyleswhite
Collaborator

I think the p-values seem to be alright. It's just not clear to me how an algorithm we know has a published flaw can be passing the test it was described as failing. But I may have misunderstood something along the way.

@StefanKarpinski

Regarding the entropy issue (keeping in mind that I know very little about this matter), it seems to me like 52 bits of entropy shouldn't really be enough for a fully random Gaussian value because it must be sampled from the full range of floating-point values, albeit not uniformly. Maybe that's good enough for most purposes, but it doesn't really intuitively seem like enough entropy – it seems like you'd want all 64 bits and still do some rejection.

@johnmyleswhite
Collaborator

I'm hesitant to trust intuitions about how much entropy is needed. If our Gaussian variables pass the tests in TestU01, I'd say that we're good.

In contrast to randn, I've found a bunch of little errors in the Distributions module that I'm slowly cleaning up. I should have it finished before the end of the week.

@ViralBShah
Owner

This is perhaps one of the reasons why we need tests for the normal distribution, over and above the tests that check for uniformity after transforming. I have also seen implementations that generate a double precision randn() from a 32-bit random integer.

To sample from (-Inf, Inf), using more bits will always be better than using fewer. For N(0,1), we are in (-3,3) 99.7% of the time. It would seem that 52 bits from the U(-1,1) sampling plus 8 bits for step selection should be pretty good, and even 52 bits with the 8 least significant bits reused for step selection should be ok. The issue is sampling from the tail, but in that case we are already generating a minimum of two random numbers from U(0,1), which should give adequate entropy for the tail.
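[ed: the tail case mentioned here is the standard Marsaglia-style fallback: two fresh uniforms feed a rejection sampler for the region beyond the rightmost rectangle. An illustrative Python sketch, with r taken as the commonly cited boundary for a 128-rectangle table; this is not the randmtzig.c code.]

```python
import math
import random

def normal_tail(r, rng):
    """Sample a value beyond r from the standard normal tail:
    draw x = -ln(U1)/r and y = -ln(U2); accept r + x when 2*y > x*x."""
    while True:
        x = -math.log(rng.random()) / r
        y = -math.log(rng.random())
        if y + y > x * x:
            return r + x

R = 3.442619855899  # rightmost rectangle edge for a 128-level ziggurat table
rng = random.Random(42)
samples = [normal_tail(R, rng) for _ in range(1000)]
# Each tail sample consumes at least two fresh uniforms, as noted above.
```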

@andreasnoack
Collaborator

I ran the first three test cases of Table I in Doornik's paper last night. They all passed the test, even the one with the SHR3 RNG and the original ziggurat algorithm. I did not change anything in Doornik's code, which is available on his homepage. I wonder if there was a bug in TestU01 at that time, or if the difference could be due to a 32-bit machine. I cannot see whether Doornik ran the Crush tests on his 32- or 64-bit machine, but I will try to run the test in 32 bit myself. The paper is not published, so maybe others have had similar thoughts.

@andreasnoack
Collaborator

I wrote to Richard Simard, who maintains TestU01, and got the version of the suite from Spring 2005, when Doornik did his simulations. It turned out that the included tests had changed, which is why I was unable to replicate Doornik's results. I found the exact Collision Test of Knuth that Doornik's code could not pass and was now able to replicate his result. The good news is that Julia's randn does not fail that test, even with the "erroneous" ziggurat implementation. I have included the Collision Test in rngtest.jl. The results are shown below.

***********************************************************
Test sknuth_Collision calling smultin_Multinomial

***********************************************************
HOST = new-host-2.home, Darwin

RNOR_MWC8222


smultin_Multinomial test:
-----------------------------------------------
   N =  2,  n = 10000000,  r =  0,   d = 1073741824,   t =  1,
       Sparse =   TRUE

       GenerCell = smultin_GenerCellSerial
       Number of cells = d^t =         1073741824
       Expected number per cell =  1 /  107.37418
       EColl = n^2 / (2k) =  46566.12873
       Hashing =   TRUE

       Collision test,    Mu =      46421.899,    Sigma =      214.3

-----------------------------------------------
Test Results for Collisions

For the total number of collisions, we use
      the Poisson approximation:
Expected number of collisions = N*Mu  :    92843.80
Observed number of collisions         :   115104
p-value of test                       :   eps      *****


-----------------------------
Total number of cells containing j balls

  j =  0                              :       2127598752
  j =  1                              :         19770360
  j =  2                              :           113974
  j =  3                              :              556
  j =  4                              :                6
  j =  5                              :                0

-----------------------------------------------
CPU time used                    :  00:00:07.68

***********************************************************
Test sknuth_Collision calling smultin_Multinomial

***********************************************************
HOST = new-host-2.home, Darwin

randmtzig_randn


smultin_Multinomial test:
-----------------------------------------------
   N =  2,  n = 10000000,  r =  0,   d = 1073741824,   t =  1,
       Sparse =   TRUE

       GenerCell = smultin_GenerCellSerial
       Number of cells = d^t =         1073741824
       Expected number per cell =  1 /  107.37418
       EColl = n^2 / (2k) =  46566.12873
       Hashing =   TRUE

       Collision test,    Mu =      46421.899,    Sigma =      214.3

-----------------------------------------------
Test Results for Collisions

For the total number of collisions, we use
      the Poisson approximation:
Expected number of collisions = N*Mu  :    92843.80
Observed number of collisions         :    93118
p-value of test                       :    0.18


-----------------------------
Total number of cells containing j balls

  j =  0                              :       2127576766
  j =  1                              :         19814063
  j =  2                              :            92520
  j =  3                              :              299
  j =  4                              :                0
  j =  5                              :                0

-----------------------------------------------
CPU time used                    :  00:00:11.09
@StefanKarpinski

[Slow clap.] Outstanding effort and follow-through here. This is great. We need to get this all bundled up as a package that people can install and run themselves.

@andreasnoack
Collaborator

Is there a package up and running that I can get inspiration from?

@StefanKarpinski

So far there's this: https://github.com/JuliaLang/Example.jl. I got interrupted in the process of adding metadata for it here yesterday, but will work on that shortly...

@StefanKarpinski

In the meantime, just make a git repo with the code you run to make this work and we can morph it into the correct form easily enough later.

@andreasnoack
Collaborator

Thank you. The git repo is there already: https://github.com/andreasnoackjensen/rngtest.jl

@StefanKarpinski

Ah, great. So it looks like this uses the C code that Julia uses directly? We probably ought to also do end-to-end testing from Julia code to make sure we're not dropping the ball somewhere along the line. Also that way we can avoid having to have compiled C code to test.

@ViralBShah
Owner

@andreasnoackjensen That is indeed fantastic follow up!

@ViralBShah
Owner

At some point, I would start wanting to have detailed documentation for each function, where all this can be documented, with references to papers and such. That would really drive confidence in quality, and make future change management easier.

@johnmyleswhite
Collaborator

I can start to work up that documentation as I go about building a new version of the Distributions module.

One thing that R does very well is have an explicit academic references section in its function-level documentation. We should probably pick some standard for doing the same.

@ViralBShah
Owner

That would be superb. Perhaps Distributions can become the model for documentation.

@johnmyleswhite
Collaborator

So where should we put the documentation? In a new folder in the main Julia repo?

@nolta
Collaborator

What's wrong w/ adding it to the current documentation?

@johnmyleswhite
Collaborator

Which current documentation? The manual?

I think Viral is hoping for something more structured, but perhaps I've misunderstood.

@nolta
Collaborator

For now, let's just put it in doc/stdlib/distributions.rst.

@ViralBShah
Owner

Yes, I was thinking of something structured, but I think it could go into the current documentation, so long as we have well-defined sections for "Detailed Notes", "Stress Testing", "References", etc.

@johnmyleswhite
Collaborator

Ok. I'll work on adding some documentation over the weekend using those sections.

@MichaelHatherly MichaelHatherly referenced this pull request from a commit
@ViralBShah ViralBShah Update randmtzig.c to use dsfmt's double precision RNG.
Earlier, randmtzig used the int32 RNG from dsfmt, which is
not the preferred way to use dsfmt. 52-bits of randomness
are now used from the mantissa of the doubles generated by
dsfmt. This is two fewer bits than the original code, but
much faster.

This passes the Testu01 crush testsuite, when the output of
randn() is mapped through the inverse CDF into the uniform
distribution. The testsuite is at
https://github.com/andreasnoackjensen/rngtest.jl

Detailed discussion in #1372
Close #1348.
c6089ee
@ViralBShah ViralBShah added the RNG label
Commits on Oct 12, 2012
  1. @ViralBShah

    Performance updates for randn() to randmtzig.c.

    ViralBShah authored
    I have modified randmtzig.c to use dSFMT's double precision random
    numbers. It ends up using the randomness from the 52-bit Mantissa.
    The code is changed to reflect this slightly lower precision.
    
    This makes individual calls to randn() about 25% faster.
  2. @ViralBShah

    Use 64-bit ints for randmtzig fill functions.

    ViralBShah authored
    Minor cleanups to randmtzig.c.
Commits on Oct 14, 2012
  1. @ViralBShah
  2. @ViralBShah

    Performance updates for randn() to randmtzig.c.

    ViralBShah authored
    I have modified randmtzig.c to use dSFMT's double precision random
    numbers. It ends up using the randomness from the 52-bit Mantissa.
    The code is changed to reflect this slightly lower precision.
    
    This makes individual calls to randn() about 25% faster.
    
    Use 64-bit ints for randmtzig fill functions.
    Minor cleanups to randmtzig.c.
    
    Minor changes to randmtzig.c for clarity.
  3. @ViralBShah
Showing with 106 additions and 91 deletions.
  1. +2 −2 base/librandom.jl
  2. +104 −89 deps/random/randmtzig.c
4 base/librandom.jl
@@ -173,7 +173,7 @@ end
function randmtzig_fill_randn!(A)
ccall(dlsym(Base.librandom, :randmtzig_fill_randn),
Void,
- (Ptr{Float64}, Uint32),
+ (Ptr{Float64}, Int),
A, numel(A))
return A
end
@@ -187,7 +187,7 @@ end
function randmtzig_fill_exprnd!(A)
ccall(dlsym(Base.librandom, :randmtzig_fill_exprnd),
Void,
- (Ptr{Float64}, Uint32),
+ (Ptr{Float64}, Int),
A, numel(A))
return A
end
193 deps/random/randmtzig.c
@@ -1,4 +1,13 @@
/*
+ Parts Copyright (C) 2012 Viral B. Shah
+ All rights reserved.
+
+ Modifications made for julia to support dsfmt and only __LP64__ systems.
+ 52-bits of randomness are used from the mantissa of random double precision
+ numbers generated by dsfmt.
+ */
+
+/*
A C-program for MT19937, with initialization improved 2002/2/10.
Coded by Takuji Nishimura and Makoto Matsumoto.
This is a faster version by taking Shawn Cokus's optimization,
@@ -49,7 +58,15 @@
#include <time.h>
#include <sys/time.h>
-typedef int randmtzig_idx_type;
+#ifdef STANDALONE
+#include <stdlib.h>
+#include <string.h>
+
+#define DSFMT_DO_NOT_USE_OLD_NAMES
+#include "dsfmt-2.2/dSFMT.c"
+#endif
+
+typedef long randmtzig_idx_type;
typedef signed char randmtzig_int8_t;
typedef unsigned char randmtzig_uint8_t;
typedef short randmtzig_int16_t;
@@ -61,59 +78,33 @@ typedef unsigned long long randmtzig_uint64_t;
/* Declarations */
-void randmtzig_create_ziggurat_tables (void);
-double randmtzig_randn (void);
-void randmtzig_fill_randn (double *p, randmtzig_idx_type n);
-double randmtzig_exprnd (void);
-void randmtzig_fill_exprnd (double *p, randmtzig_idx_type n);
+extern void randmtzig_create_ziggurat_tables (void);
+extern double randmtzig_randn (void);
+extern void randmtzig_fill_randn (double *p, randmtzig_idx_type n);
+extern double randmtzig_exprnd (void);
+extern void randmtzig_fill_exprnd (double *p, randmtzig_idx_type n);
/* ===== Uniform generators ===== */
-inline static randmtzig_uint64_t randi53 (void)
+inline static randmtzig_uint64_t randi (void)
{
- const randmtzig_uint32_t lo = dsfmt_gv_genrand_uint32();
- const randmtzig_uint32_t hi = dsfmt_gv_genrand_uint32()&0x1FFFFF;
-//#ifndef __LP64__
-#if 0
- randmtzig_uint64_t u;
- randmtzig_uint32_t *p = (randmtzig_uint32_t *)&u;
- p[0] = lo;
- p[1] = hi;
- return u;
-#else
- return (((randmtzig_uint64_t)hi<<32)|lo);
-#endif
-}
-
-inline static randmtzig_uint64_t randi54 (void)
-{
- const randmtzig_uint32_t lo = dsfmt_gv_genrand_uint32();
- const randmtzig_uint32_t hi = dsfmt_gv_genrand_uint32()&0x3FFFFF;
-//#ifndef __LP64__
-#if 0
- randmtzig_uint64_t u;
- randmtzig_uint32_t *p = (randmtzig_uint32_t *)&u;
- p[0] = lo;
- p[1] = hi;
- return u;
-#else
- return (((randmtzig_uint64_t)hi<<32)|lo);
-#endif
+ double r = dsfmt_gv_genrand_close1_open2();
+ return *((uint64_t *) &r) & 0x000fffffffffffff;
}
/* generates a random number on (0,1) with 53-bit resolution */
-inline static double randu53 (void)
+inline static double randu (void)
{
return dsfmt_gv_genrand_open_open();
}
/* ===== Ziggurat normal and exponential generators ===== */
# define ZIGINT randmtzig_uint64_t
-# define EMANTISSA 9007199254740992.0 /* 53 bit mantissa */
-# define ERANDI randi53() /* 53 bits for mantissa */
+# define EMANTISSA 2251799813685248 /* 52 bit mantissa */
+# define ERANDI randi() /* 52 bits for mantissa */
# define NMANTISSA EMANTISSA
-# define NRANDI randi54() /* 53 bits for mantissa + 1 bit sign */
-# define RANDU randu53()
+# define NRANDI randi() /* 51 bits for mantissa + 1 bit sign */
+# define RANDU randu()
#define ZIGGURAT_TABLE_SIZE 256
@@ -164,9 +155,12 @@ reason is just the use of the Mersenne Twister and not the inlining,
so I'm not going to try and optimize further.
*/
+// Tables for randn
static ZIGINT ki[ZIGGURAT_TABLE_SIZE];
-static ZIGINT ke[ZIGGURAT_TABLE_SIZE];
static double wi[ZIGGURAT_TABLE_SIZE], fi[ZIGGURAT_TABLE_SIZE];
+
+// Tables for exprnd
+static ZIGINT ke[ZIGGURAT_TABLE_SIZE];
static double we[ZIGGURAT_TABLE_SIZE], fe[ZIGGURAT_TABLE_SIZE];
void randmtzig_create_ziggurat_tables (void)
@@ -249,53 +243,36 @@ double randmtzig_randn (void)
{
while (1)
{
-//#ifdef __LP64__
-#if 1
/* arbitrary mantissa (selected by NRANDI, with 1 bit for sign) */
- const randmtzig_uint64_t r = NRANDI;
- const randmtzig_int64_t rabs=r>>1;
- const int idx = (int)(rabs&0xFF);
- const double x = ( r&1 ? -rabs : rabs) * wi[idx];
-#else
- double x;
- int si,idx;
- register randmtzig_uint32_t lo, hi;
- randmtzig_int64_t rabs;
- randmtzig_uint32_t *p = (randmtzig_uint32_t *)&rabs;
- lo = dsfmt_gv_genrand_uint32();
- idx = lo&0xFF;
- hi = dsfmt_gv_genrand_uint32();
- si = hi&UMASK;
- p[0] = lo;
- p[1] = hi&0x1FFFFF;
- x = ( si ? -rabs : rabs ) * wi[idx];
-# endif
-
- if (rabs < (randmtzig_int64_t)ki[idx])
- return x; /* 99.3% of the time we return here 1st try */
- else if (idx == 0)
- {
- /* As stated in Marsaglia and Tsang
- *
- * For the normal tail, the method of Marsaglia[5] provides:
- * generate x = -ln(U_1)/r, y = -ln(U_2), until y+y > x*x,
- * then return r+x. Except that r+x is always in the positive
- * tail!!!! Any thing random might be used to determine the
- * sign, but as we already have r we might as well use it
- *
- * [PAK] but not the bottom 8 bits, since they are all 0 here!
- */
- double xx, yy;
- do
- {
- xx = - ZIGGURAT_NOR_INV_R * log (RANDU);
- yy = - log (RANDU);
+ const randmtzig_uint64_t r = NRANDI;
+ const randmtzig_int64_t rabs=r>>1;
+ const int idx = (int)(rabs&0xFF);
+ const double x = ( r&1 ? -rabs : rabs) * wi[idx];
+
+ if (rabs < (randmtzig_int64_t)ki[idx]) {
+ return x; /* 99.3% of the time we return here 1st try */
+ } else if (idx == 0) {
+ /* As stated in Marsaglia and Tsang
+ *
+ * For the normal tail, the method of Marsaglia[5] provides:
+ * generate x = -ln(U_1)/r, y = -ln(U_2), until y+y > x*x,
+ * then return r+x. Except that r+x is always in the positive
+ * tail!!!! Any thing random might be used to determine the
+ * sign, but as we already have r we might as well use it
+ *
+ * [PAK] but not the bottom 8 bits, since they are all 0 here!
+ */
+ double xx, yy;
+ do {
+ xx = - ZIGGURAT_NOR_INV_R * log (RANDU);
+ yy = - log (RANDU);
}
- while ( yy+yy <= xx*xx);
- return (rabs&0x100 ? -ZIGGURAT_NOR_R-xx : ZIGGURAT_NOR_R+xx);
+ while ( yy+yy <= xx*xx);
+ return (rabs&0x100 ? -ZIGGURAT_NOR_R-xx : ZIGGURAT_NOR_R+xx);
+ } else if ((fi[idx-1] - fi[idx]) * RANDU + fi[idx] < exp(-0.5*x*x)) {
+ return x;
}
- else if ((fi[idx-1] - fi[idx]) * RANDU + fi[idx] < exp(-0.5*x*x))
- return x;
+
}
}
@@ -325,14 +302,52 @@ double randmtzig_exprnd (void)
void randmtzig_fill_randn (double *p, randmtzig_idx_type n)
{
randmtzig_idx_type i;
- for (i = 0; i < n; i++)
- p[i] = randmtzig_randn();
+ for (i = 0; i < n; i++) p[i] = randmtzig_randn();
}
void randmtzig_fill_exprnd (double *p, randmtzig_idx_type n)
{
randmtzig_idx_type i;
- for (i = 0; i < n; i++)
- p[i] = randmtzig_exprnd();
+ for (i = 0; i < n; i++) p[i] = randmtzig_exprnd();
}
+
+#ifdef STANDALONE
+
+int main(int ac, char *av[]) {
+ if (ac == 1) {
+ printf("Usage: randmtzig <n>\n");
+ return (-1);
+ }
+
+ int n = atoi(av[1]);
+ time_t t1;
+
+ dsfmt_gv_init_gen_rand(0);
+ randmtzig_create_ziggurat_tables();
+
+ double *p; posix_memalign((void **)&p, 16, n*sizeof(double));
+ uint32_t *u; posix_memalign((void **)&u, 16, 2*n*sizeof(uint32_t));
+
+ t1 = clock();
+ dsfmt_gv_fill_array_close_open(p, n);
+ printf("Uniform fill (n): %f\n", (clock() - t1) / (double) CLOCKS_PER_SEC);
+
+ t1 = clock();
+ for (int i = 0; i < n; i++) p[i] = dsfmt_gv_genrand_close_open();
+ printf("Uniform (n): %f\n", (clock() - t1) / (double) CLOCKS_PER_SEC);
+
+ t1 = clock();
+ for (int i = 0; i < 2*n; i++) u[i] = dsfmt_gv_genrand_uint32();
+ printf("Uniform 32-bit ints (2*n): %f\n", (clock() - t1) / (double) CLOCKS_PER_SEC);
+
+ memset((void *)p, 0, n*sizeof(double));
+ t1 = clock();
+ for (int i = 0; i < n; i++) p[i] = randmtzig_randn();
+ printf("Normal (n): %f\n", (clock() - t1) / (double) CLOCKS_PER_SEC);
+ for (int i = 0; i < 10; i++) printf("%lf\n", p[i]);
+
+ return 0;
+}
+
+#endif