Improve counter handling for CBRNGs #2122
Conversation
Improve counter handling for CBRNGs (CPU backend)
* Use the full 64 bits of the counter
* Separate counter and key/seed
* Check for carry when increasing counter (only Threefry)

Improve counter handling for CBRNGs (OpenCL backend)
* Use the full 64 bits of the counter
* Separate counter and key/seed
* Check for carry when increasing counter
* Use unused counter values for second threefry round

Improve counter handling for CBRNGs (CUDA backend)
* Use the full 64 bits of the counter
* Separate counter and key/seed
* Check for carry when increasing counter
This looks great. Could you add the tests you mention in #2007 but disable them so they don't execute? I want to be able to enable these tests at some point for infrequent builds so we can track the quality of the RNG in future changes. I don't have a mechanism to do that just yet but I will add that to future versions.
This test repeatedly draws 2^20 random numbers and compares them to the first set of numbers. This fails if a RNG has a period of n * 2^20 with integer n <= 2^12, i.e. in particular 2^32.
This test repeatedly draws random numbers and generates a histogram from them. It calculates the chi^2 statistic for the individual step as well as for the accumulated random numbers. The test fails if two consecutive statistics are too large or too small. Too large means that the random numbers are not uniform enough. Too small means that they are "too uniform", since some amount of random noise is expected.

Additional notes:
* One should also test randn, but that would increase the run-time even more.
* The failure conditions are fragile. In principle one should accept a larger range of chi^2 values together with more steps, possibly with different initial seeds. But this would increase run-time even more.
* As a consequence of the fragile conditions, false positives can occur. For example, MERSENNE with double on CPU failed early in some tests.
I have added two (disabled) unit tests, which can be used as simple safeguards. Both fail with current ArrayFire and succeed with this PR (on OpenCL; I did not test CPU). However, I think one should use more advanced tests. Unfortunately, these do not fit into the form of a unit test. I am currently using
In conjunction with PractRand as described in http://www.pcg-random.org/posts/how-to-test-with-practrand.html, i.e. the output of this program is piped into PractRand. For ArrayFire 3.5.1 this fails early:
Using this PR I am meanwhile at 32 GB without failures. I should repeat this for Threefry ... I have no idea how to do something like this in a unit test, but I am happy to submit this code into the repository in any form you see fit.
BTW, I stopped the Philox test at 256 GB = 2^38 generated uints. While I was away from the computer the Threefry run went even further to 1 TB = 2^40 generated uints. So at least the OpenCL implementation seems to be ok.
@rstub @umar456 I think it would be nice to have these high-runtime tests separate from the normal rand tests, in a separate source file. If we add something like the following to our tests CMake file, we can do this in our nightly jobs right now:

```cmake
if(AF_RUN_QUALITY_TESTS)
    make_test(SRC randu_quality.cpp)
    make_test(SRC randn_quality.cpp)
endif()
```
Currently there are only tests for
For now, I think they are fine the way they are. We can change it later to handle it differently. @rstub Could you add your advanced test file to the tests folder? You don't have to do anything to enable or compile it. I can handle that later.
Tested this for CUDA on the Quadro GV100 and the following tests fail:

```
[ FAILED ] RandomEngine/0.threefryRandomEngineUniformChi2, where TypeParam = float
[ FAILED ] RandomEngine/1.threefryRandomEngineUniformChi2, where TypeParam = double
```
The output looks like this:

```
[ RUN ] RandomEngine/0.threefryRandomEngineUniformChi2
/home/umar/devel/arrayfire/test/random.cpp:507: Failure
Expected: (total_chi2) < (upper), actual: 175.956 vs 173.875
at step: 1
/home/umar/devel/arrayfire/test/random.cpp:507: Failure
Expected: (total_chi2) < (upper), actual: 191.671 vs 173.875
at step: 2
/home/umar/devel/arrayfire/test/random.cpp:507: Failure
Expected: (total_chi2) < (upper), actual: 190.273 vs 173.875
at step: 3
...
```
I understand this is a partial update and you don't have to fix these errors, but I would like it if you updated the test assertions so they are more informative.
test/random.cpp

```cpp
array step_hist = af::histogram(af::randu(elem, ty, r), bins, 0.0, 1.0);
T step_chi2 = chi2_statistic<T>(step_hist, expected);
bool step = step_chi2 > lower && step_chi2 < upper;
ASSERT_TRUE(step || prev_step);
```
This would be better tested like this:

```cpp
EXPECT_GT(step_chi2, lower) << "at step: " << i;
EXPECT_LT(step_chi2, upper) << "at step: " << i;
bool step = step_chi2 > lower && step_chi2 < upper;
```
This will give you better context if a test fails. For example, on my machine the chi^2 test fails like this with this change:
```
/home/umar/devel/arrayfire/test/random.cpp:507: Failure
Expected: (total_chi2) < (upper), actual: 173.926 vs 173.875
at step: 1
/home/umar/devel/arrayfire/test/random.cpp:507: Failure
Expected: (total_chi2) < (upper), actual: 182.937 vs 173.875
at step: 2
/home/umar/devel/arrayfire/test/random.cpp:507: Failure
Expected: (total_chi2) < (upper), actual: 197.998 vs 173.875
at step: 3
...
```
Replacing the assertions covering `step` and `prev_step` with assertions covering only `step` might produce false positives. When testing RNGs some failures are to be expected, and it is a question of where to draw the line. However, the output from these assertions is indeed much nicer. I see two possibilities:

* Only test the `EXPECT` when `prev_step` is already `false`:

  ```cpp
  if (!prev_step) {
      EXPECT_GT(step_chi2, lower) << "at step: " << i;
      EXPECT_LT(step_chi2, upper) << "at step: " << i;
  }
  ```

* Accept a larger range of chi^2 values and do more steps.

Any preferences? At the moment I would prefer the first option, since it does not require retesting the failure condition.
Yeah, I think the first option is the way to go.
Failing right at the first rounds is a clear regression. I think I have found the issue and will push this in a few minutes.
All tests pass on my machine with the CUDA backend. Thank you for your contribution! I am excited to have proper tests for the random number generation. I will try to enable them soon.
* Improve counter handling for CBRNGs (CPU backend)
  * Use the full 64 bits of the counter
  * Separate counter and key/seed
  * Check for carry when increasing counter (only Threefry)
* Improve counter handling for CBRNGs (OpenCL backend)
  * Use the full 64 bits of the counter
  * Separate counter and key/seed
  * Check for carry when increasing counter
  * Use unused counter values for second threefry round
* Improve counter handling for CBRNGs (CUDA backend)
  * Use the full 64 bits of the counter
  * Separate counter and key/seed
  * Check for carry when increasing counter
* Add (disabled) test for RNG period

  This test repeatedly draws 2^20 random numbers and compares them to the first set of numbers. This fails if a RNG has a period of n * 2^20 with integer n <= 2^12, i.e. in particular 2^32.
* Add (disabled) test for RNG quality

  This test repeatedly draws random numbers and generates a histogram from them. It calculates the chi^2 statistic for the individual step as well as for the accumulated random numbers. The test fails if two consecutive statistics are too large or too small. Too large means that the random numbers are not uniform enough. Too small means that they are "too uniform", since some amount of random noise is expected. Additional notes:
  * One should also test randn, but that would increase the run-time even more.
  * The failure conditions are fragile. In principle one should accept a larger range of chi^2 values together with more steps, possibly with different initial seeds. But this would increase run-time even more.
  * As a consequence of the fragile conditions, false positives can occur. For example, MERSENNE with double on CPU failed early in some tests.
* Add program to test RNGs with PractRand
This is a partial fix for #2007. For all three backends, the following changes have been made:
Note that the CUDA backend is completely untested. Both the CPU and OpenCL backends now pass the test programs mentioned in #2007 as well as the `random` unit tests. I was not able to add tests like these as unit tests due to their excessive run-time.

Still open:
* `uint` values per counter, but the `counter` is increased with the number of `elements` produced