
Conversation

rstub
Contributor

@rstub rstub commented Apr 11, 2018

This is a partial fix for #2007. For all three backends, the following changes have been made:

  • Use the full 64 bits of the counter
  • Separate counter and key/seed
  • Check for carry when increasing counter

Note that the CUDA backend is completely untested. Both CPU and OpenCL backend now pass the test programs mentioned in #2007 as well as the random unit tests. I was not able to add tests like these as unit tests due to their excessive run-time.

Still open:

  • Philox could use 128 bits for counting
  • Philox returns 4 and Threefry returns 2 uint values per counter, but the counter is increased by the number of elements produced
  • Philox on CPU does not really count within the loop; instead, the output from the previous round is used.

rstub added 4 commits April 11, 2018 13:37
* Use the full 64 bits of the counter
* Separate counter and key/seed
* Check for carry when increasing counter (only Threefry)
* Use the full 64 bits of the counter
* Separate counter and key/seed
* Check for carry when increasing counter
* Use unused counter values for second threefry round
* Use the full 64 bits of the counter
* Separate counter and key/seed
* Check for carry when increasing counter
@umar456
Member

umar456 commented Apr 11, 2018

This looks great. Could you add the tests you mention in #2007, but disable them so they don't execute? I want to be able to enable these tests at some point for infrequent builds so we can track the quality of the RNG in future changes. I don't have a mechanism to do that just yet, but I will add one in a future version.

rstub added 2 commits April 12, 2018 13:46
This test repeatedly draws 2^20 random numbers and compares them to the
first set of numbers. It fails if an RNG has a period of n * 2^20 for
integer n <= 2^12, i.e. in particular 2^32.
This test repeatedly draws random numbers and generates a histogram from
them. It calculates the chi^2 statistic for the individual step as well
as for the accumulated random numbers. The test fails if two consecutive
statistics are too large or too small. Too large means that the random
numbers are not uniform enough. Too small means that they are "too
uniform", since some amount of random noise is expected.

Additional notes:
* One should also test randn, but that would increase the run-time even
  more.
* The failure conditions are fragile. In principle one should accept a
  larger range of chi^2 values together with more steps, possibly with
  different initial seeds. But this would increase run-time even more.
* As a consequence of the fragile conditions, false positives can occur.
  For example, MERSENNE with double on CPU failed early in some tests.
@rstub
Contributor Author

rstub commented Apr 12, 2018

I have added two (disabled) unit tests, which can be used as simple safeguards. Both fail with current ArrayFire and succeed with this PR (tested on OpenCL; I did not test CPU).

However, I think one should use more advanced tests. Unfortunately, these do not fit into the form of a unit test. I am currently using:

#include <cstdio>
#include <cstdlib>
#include <cstdint>
#include <arrayfire.h>

int main(int argc, char ** argv) {
  int backend = argc > 1 ? atoi(argv[1]) : 0;
  af::setBackend(static_cast<af::Backend>(backend));
  int device = argc > 2 ? atoi(argv[2]) : 0;
  af::setDevice(device);

  af::setSeed(0xfe47fe0cc078ec30ULL);
  int samples = 1024 * 1024;
  while (1) {
    af::array values = af::randu(samples, u32);
    uint32_t *pvalues = values.host<uint32_t>();
    fwrite(pvalues, samples * sizeof(*pvalues), 1, stdout);
    af::freeHost(pvalues); // memory from array::host() is released via af::freeHost
  }
}

in conjunction with PractRand, as described in http://www.pcg-random.org/posts/how-to-test-with-practrand.html: the output of this program is piped into PractRand. For ArrayFire 3.5.1 this fails early:

$ ./arrayfire | ./RNG_test stdin32
RNG_test using PractRand version 0.93
[...]
rng=RNG_stdin32, seed=0xa6369564
length= 16 gigabytes (2^34 bytes), time= 209 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low8/32]BCFN(2+3,13-0,T)         R=  -7.5  p =1-7.4e-4   unusual          
  ...and 171 test result(s) without anomalies

rng=RNG_stdin32, seed=0xa6369564
length= 32 gigabytes (2^35 bytes), time= 411 seconds
  Test Name                         Raw       Processed     Evaluation
  BCFN(2+0,13-0,T)                  R= +86.6  p =  7.7e-46    FAIL !!!       
  BCFN(2+1,13-0,T)                  R= +92.4  p =  6.1e-49    FAIL !!!!      
  BCFN(2+2,13-0,T)                  R= +90.5  p =  6.5e-48    FAIL !!!!      
  BCFN(2+3,13-0,T)                  R= +90.6  p =  5.6e-48    FAIL !!!!      
  BCFN(2+4,13-0,T)                  R= +92.8  p =  3.8e-49    FAIL !!!!      
[...]

With this PR I am meanwhile at 32 GB without failures. I should repeat this for Threefry ...

I have no idea how to do something like this in a unit test, but I am happy to submit this code into the repository in any form you see fit.

@rstub
Contributor Author

rstub commented Apr 12, 2018

BTW, I stopped the Philox test at 256 GB = 2^36 generated uints. While I was away from the computer the Threefry run went even further to 1 TB = 2^38 generated uints. So at least the OpenCL implementation seems to be ok.

@9prady9
Member

9prady9 commented Apr 13, 2018

@rstub @umar456 I think it would be nice to have these high-runtime tests separate from the normal rand tests, in separate source files: randu_quality.cpp & randn_quality.cpp. It would also be very straightforward to enable them on certain build jobs now or in the future.

If we add something like below to our tests CMake file, we can do this in our nightly jobs right now.

if(AF_RUN_QUALITY_TESTS)
  make_test(SRC randu_quality.cpp)
  make_test(SRC randn_quality.cpp)
endif()

AF_RUN_QUALITY_TESTS can be something like the AF_ADDITIONAL_MKL_LIBRARIES variable, but I think it would be nicer to have it as a CMake option().

@rstub
Contributor Author

rstub commented Apr 13, 2018

Currently there are only tests for randu, so it would only be one file. There would be no need to disable these tests then, right?

@umar456
Member

umar456 commented Apr 13, 2018

For now, I think they are fine the way they are. We can change it later to handle it differently. @rstub Could you add your advanced test file to the tests folder? You don't have to do anything to enable or compile it. I can handle that later.

Member

@umar456 umar456 left a comment

Tested this for CUDA on the Quadro GV100 and the following tests fail:

[  FAILED  ] RandomEngine/0.threefryRandomEngineUniformChi2, where TypeParam = float
[  FAILED  ] RandomEngine/1.threefryRandomEngineUniformChi2, where TypeParam = double

The output looks like this:

[ RUN      ] RandomEngine/0.threefryRandomEngineUniformChi2
/home/umar/devel/arrayfire/test/random.cpp:507: Failure
Expected: (total_chi2) < (upper), actual: 175.956 vs 173.875
at step: 1
/home/umar/devel/arrayfire/test/random.cpp:507: Failure
Expected: (total_chi2) < (upper), actual: 191.671 vs 173.875
at step: 2
/home/umar/devel/arrayfire/test/random.cpp:507: Failure
Expected: (total_chi2) < (upper), actual: 190.273 vs 173.875
at step: 3
...

I understand this is a partial update and you don't have to fix these errors, but I would like it if you updated the test assertions so they are more informative.

test/random.cpp Outdated
array step_hist = af::histogram(af::randu(elem, ty, r), bins, 0.0, 1.0);
T step_chi2 = chi2_statistic<T>(step_hist, expected);
bool step = step_chi2 > lower && step_chi2 < upper;
ASSERT_TRUE(step || prev_step);
Member

This would be better tested like this:

        EXPECT_GT(step_chi2, lower) << "at step: " << i;
        EXPECT_LT(step_chi2, upper) << "at step: " << i;
        bool step = step_chi2 > lower && step_chi2 < upper;

This will give you better context if a test fails. For example, on my machine the Chi2 test fails like this with this change:


/home/umar/devel/arrayfire/test/random.cpp:507: Failure
Expected: (total_chi2) < (upper), actual: 173.926 vs 173.875
at step: 1
/home/umar/devel/arrayfire/test/random.cpp:507: Failure
Expected: (total_chi2) < (upper), actual: 182.937 vs 173.875
at step: 2
/home/umar/devel/arrayfire/test/random.cpp:507: Failure
Expected: (total_chi2) < (upper), actual: 197.998 vs 173.875
at step: 3
...

Contributor Author

Replacing the assertions covering step and prev_step with assertions covering only step might produce false positives. When testing RNGs, some failures are to be expected, and the question is where to draw the line. However, the output from these assertions is indeed much nicer. I see two possibilities:

  • Only run the EXPECTs when prev_step is already false:

      if (!prev_step) {      
          EXPECT_GT(step_chi2, lower) << "at step: " << i;
          EXPECT_LT(step_chi2, upper) << "at step: " << i;
      }
    
  • Accept a larger range of chi² values and do more steps.

Any preferences? At the moment I would prefer the first option, since it does not require retesting the failure condition.

Member

Yeah, I think the first option is the way to go.

@rstub
Contributor Author

rstub commented Apr 13, 2018

Failing right at the first steps is a clear regression. I think I have found the issue and will push a fix in a few minutes.

Member

@umar456 umar456 left a comment

All tests pass on my machine with the CUDA backend. Thank you for your contribution! I am excited to have proper tests for the random number generation. I will try to enable them soon.

@umar456 umar456 merged commit b684a5b into arrayfire:master Apr 13, 2018
@mlloreda mlloreda added this to the v3.6.0 milestone May 2, 2018
syurkevi pushed a commit to syurkevi/arrayfire that referenced this pull request Jul 26, 2018
* Improve counter handling for CBRNGs (CPU backend)
* Improve counter handling for CBRNGs (OpenCL backend)
* Improve counter handling for CBRNGs (CUDA backend)
* Add (disabled) test for RNG period
* Add (disabled) test for RNG quality
* Add program to test RNGs with PractRand