
Conversation

rstub
Contributor

@rstub rstub commented Apr 11, 2018

This is a partial fix for #2007. For all three backends, the following changes have been made:

  • Use the full 64 bits of the counter
  • Separate counter and key/seed
  • Check for carry when increasing counter

Note that the CUDA backend is completely untested. Both CPU and OpenCL backend now pass the test programs mentioned in #2007 as well as the random unit tests. I was not able to add tests like these as unit tests due to their excessive run-time.

Still open:

  • Philox could use 128 bits for counting
  • Philox returns 4 and Threefry returns 2 uint values per counter, but the counter is increased by the number of elements produced
  • Philox on CPU does not really count within the loop; instead, the output from the previous round is used.

rstub added 4 commits April 11, 2018 13:37
* Use the full 64 bits of the counter
* Separate counter and key/seed
* Check for carry when increasing counter (only Threefry)
* Use the full 64 bits of the counter
* Separate counter and key/seed
* Check for carry when increasing counter
* Use unused counter values for second threefry round
* Use the full 64 bits of the counter
* Separate counter and key/seed
* Check for carry when increasing counter
@umar456
Member

umar456 commented Apr 11, 2018

This looks great. Could you add the tests you mention in #2007, but disable them so they don't execute? I want to be able to enable these tests at some point for infrequent builds so we can track the quality of the RNG in future changes. I don't have a mechanism to do that just yet, but I will add one in a future version.

rstub added 2 commits April 12, 2018 13:46
This test repeatedly draws 2^20 random numbers and compares them to the
first set of numbers. It fails if an RNG has a period of n * 2^20 for
integer n <= 2^12, i.e. in particular 2^32.
This test repeatedly draws random numbers and generates a histogram from
them. It calculates the chi^2 statistic for the individual step as well
as for the accumulated random numbers. The test fails if two consecutive
statistics are too large or too small. Too large means that the random
numbers are not uniform enough. Too small means that they are "too
uniform", since some amount of random noise is expected.

Additional notes:
* One should also test randn, but that would increase the run-time even
  more.
* The failure conditions are fragile. In principle one should accept a
  larger range of chi^2 values together with more steps, possibly with
  different initial seeds. But this would increase run-time even more.
* As a consequence of the fragile conditions, false positives can occur.
  For example, MERSENNE with double on CPU failed early in some tests.
@rstub
Contributor Author

rstub commented Apr 12, 2018

I have added two (disabled) unit tests, which can be used as simple safeguards. Both fail with current ArrayFire and succeed with this PR (tested on OpenCL; I did not test CPU).

However, I think one should use more advanced tests. Unfortunately, these do not fit into the form of a unit test. I am currently using:

#include <cstdio>
#include <cstdlib>
#include <cstdint>
#include <arrayfire.h>

int main(int argc, char ** argv) {
  int backend = argc > 1 ? atoi(argv[1]) : 0;
  af::setBackend(static_cast<af::Backend>(backend));
  int device = argc > 2 ? atoi(argv[2]) : 0;
  af::setDevice(device);

  af::setSeed(0xfe47fe0cc078ec30ULL);
  int samples = 1024 * 1024;
  while (1) {
    af::array values = af::randu(samples, u32);
    uint32_t *pvalues = values.host<uint32_t>();
    fwrite(pvalues, samples * sizeof(*pvalues), 1, stdout);
    af::freeHost(pvalues); // memory from array::host() is released via af::freeHost
  }
}

in conjunction with PractRand, as described in http://www.pcg-random.org/posts/how-to-test-with-practrand.html: the output of this program is piped into PractRand. For ArrayFire 3.5.1 this fails early:

$ ./arrayfire | ./RNG_test stdin32
RNG_test using PractRand version 0.93
[...]
rng=RNG_stdin32, seed=0xa6369564
length= 16 gigabytes (2^34 bytes), time= 209 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low8/32]BCFN(2+3,13-0,T)         R=  -7.5  p =1-7.4e-4   unusual          
  ...and 171 test result(s) without anomalies

rng=RNG_stdin32, seed=0xa6369564
length= 32 gigabytes (2^35 bytes), time= 411 seconds
  Test Name                         Raw       Processed     Evaluation
  BCFN(2+0,13-0,T)                  R= +86.6  p =  7.7e-46    FAIL !!!       
  BCFN(2+1,13-0,T)                  R= +92.4  p =  6.1e-49    FAIL !!!!      
  BCFN(2+2,13-0,T)                  R= +90.5  p =  6.5e-48    FAIL !!!!      
  BCFN(2+3,13-0,T)                  R= +90.6  p =  5.6e-48    FAIL !!!!      
  BCFN(2+4,13-0,T)                  R= +92.8  p =  3.8e-49    FAIL !!!!      
[...]

With this PR I am meanwhile at 32 GB without failures. I should repeat this for Threefry ...

I have no idea how to do something like this in a unit test, but I am happy to submit this code into the repository in any form you see fit.

@rstub
Contributor Author

rstub commented Apr 12, 2018

BTW, I stopped the Philox test at 256 GB = 2^36 generated uints. While I was away from the computer the Threefry run went even further to 1 TB = 2^38 generated uints. So at least the OpenCL implementation seems to be ok.

@9prady9
Member

9prady9 commented Apr 13, 2018

@rstub @umar456 I think it would be nice to have these high-runtime tests separate from the normal rand tests, in separate source files: randu_quality.cpp & randn_quality.cpp. It would also be very straightforward to enable them on certain build jobs now or in the future.

If we add something like below to our tests CMake file, we can do this in our nightly jobs right now.

if(AF_RUN_QUALITY_TESTS)
  make_test(SRC randu_quality.cpp)
  make_test(SRC randn_quality.cpp)
endif()

AF_RUN_QUALITY_TESTS can be something like the AF_ADDITIONAL_MKL_LIBRARIES variable, but I think it would be nicer to have it as a CMake option().

@rstub
Contributor Author

rstub commented Apr 13, 2018

Currently there are only tests for randu, so it would only be one file. There would be no need to disable these tests then, right?

@umar456
Member

umar456 commented Apr 13, 2018

For now, I think they are fine the way they are. We can change it later to handle it differently. @rstub Could you add your advanced test file to the tests folder? You don't have to do anything to enable or compile it. I can handle that later.

Member

@umar456 umar456 left a comment

Tested this for CUDA on the Quadro GV100 and the following tests fail:

[  FAILED  ] RandomEngine/0.threefryRandomEngineUniformChi2, where TypeParam = float
[  FAILED  ] RandomEngine/1.threefryRandomEngineUniformChi2, where TypeParam = double

The output looks like this:

[ RUN      ] RandomEngine/0.threefryRandomEngineUniformChi2
/home/umar/devel/arrayfire/test/random.cpp:507: Failure
Expected: (total_chi2) < (upper), actual: 175.956 vs 173.875
at step: 1
/home/umar/devel/arrayfire/test/random.cpp:507: Failure
Expected: (total_chi2) < (upper), actual: 191.671 vs 173.875
at step: 2
/home/umar/devel/arrayfire/test/random.cpp:507: Failure
Expected: (total_chi2) < (upper), actual: 190.273 vs 173.875
at step: 3
...

I understand this is a partial update and you don't have to fix these errors, but I would like it if you updated the test assertions so they are more informative.

test/random.cpp Outdated
array step_hist = af::histogram(af::randu(elem, ty, r), bins, 0.0, 1.0);
T step_chi2 = chi2_statistic<T>(step_hist, expected);
bool step = step_chi2 > lower && step_chi2 < upper;
ASSERT_TRUE(step || prev_step);
Member

This would be better tested like this:

        EXPECT_GT(step_chi2, lower) << "at step: " << i;
        EXPECT_LT(step_chi2, upper) << "at step: " << i;
        bool step = step_chi2 > lower && step_chi2 < upper;

This will give you better context if a test fails. For example, on my machine the Chi2 test fails like this with this change:


/home/umar/devel/arrayfire/test/random.cpp:507: Failure
Expected: (total_chi2) < (upper), actual: 173.926 vs 173.875
at step: 1
/home/umar/devel/arrayfire/test/random.cpp:507: Failure
Expected: (total_chi2) < (upper), actual: 182.937 vs 173.875
at step: 2
/home/umar/devel/arrayfire/test/random.cpp:507: Failure
Expected: (total_chi2) < (upper), actual: 197.998 vs 173.875
at step: 3
...

Contributor Author

Replacing the assertions covering step and prev_step with assertions covering only step might produce false positives. When testing RNGs, some failures are to be expected, and the question is where to draw the line. However, the output from these assertions is indeed much nicer. I see two possibilities:

  • Only run the EXPECTs when prev_step is already false:

      if (!prev_step) {      
          EXPECT_GT(step_chi2, lower) << "at step: " << i;
          EXPECT_LT(step_chi2, upper) << "at step: " << i;
      }
    
  • Accept a larger range of chi² values and do more steps.

Any preferences? At the moment I would prefer the first option, since it does not require retesting the failure condition.

Member

Yeah, I think the first option is the way to go.

@rstub
Contributor Author

rstub commented Apr 13, 2018

Failing right at the first steps is a clear regression. I think I have found the issue and will push a fix in a few minutes.

Member

@umar456 umar456 left a comment

All tests pass on my machine with the CUDA backend. Thank you for your contribution! I am excited to have proper tests for the random number generation. I will try to enable them soon.

@umar456 umar456 merged commit b684a5b into arrayfire:master Apr 13, 2018
@mlloreda mlloreda added this to the v3.6.0 milestone May 2, 2018
syurkevi pushed a commit to syurkevi/arrayfire that referenced this pull request Jul 26, 2018
* Improve counter handling for CBRNGs (CPU backend)
* Improve counter handling for CBRNGs (OpenCL backend)
* Improve counter handling for CBRNGs (CUDA backend)
* Add (disabled) test for RNG period
* Add (disabled) test for RNG quality
* Add program to test RNGs with PractRand