
Change random test shuffling technique #1908

Merged

Conversation

jbytheway
Contributor

Description

Previously a random test ordering was obtained by applying std::shuffle to the tests in declaration order. This has two problems:

  • It depends on the declaration order, so the order in which the tests run is platform-specific.
  • When trying to debug accidental inter-test dependencies, it is helpful to be able to find a minimal subset of tests which exhibits the issue. However, any change to the set of tests being run will completely change the test ordering, making it difficult or impossible to reduce the set of tests being run in any reasonably efficient manner.

Therefore, change the randomization approach to resolve both these issues.

Generate a random value based on the user-provided RNG seed. Convert every test case to an integer by hashing a combination of that value with the test name. Sort the test cases by this integer.

The test names and RNG are platform-independent, so this should be consistent across platforms. Also, removing one test does not change the integer value associated with the remaining tests, so they remain in the same order.

To hash, use the FNV-1a hash, except with the basis being our randomly selected value rather than the fixed basis specified by the algorithm. We cannot use std::hash, because it is important that the result be platform-independent.
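For concreteness, here is a minimal sketch of that hashing scheme (illustrative only, not the PR's actual code; `hashTestName` is a hypothetical name):

```cpp
#include <cstdint>
#include <string>

// FNV-1a over the test name, but seeded with a value derived from the
// user-provided RNG seed instead of the algorithm's fixed offset basis
// (14695981039346656037). The result depends only on (random basis, name),
// so it is platform-independent and unaffected by which other tests exist.
std::uint64_t hashTestName(std::uint64_t randomBasis, std::string const& testName) {
    constexpr std::uint64_t fnvPrime = 1099511628211ULL; // 64-bit FNV prime
    std::uint64_t hash = randomBasis;
    for (unsigned char c : testName) {
        hash ^= c;
        hash *= fnvPrime;
    }
    return hash;
}
```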

@jbytheway jbytheway force-pushed the use_more_helpful_random_test_ordering branch from c7cd0d5 to 293ad88 Compare April 10, 2020 14:03
@jbytheway
Contributor Author

After reading some of the commit history in this area (specifically 535da5c) I realise that this is not entirely platform-independent yet, because std::uniform_int_distribution may vary. But at least the linker no longer affects it; it's a step in the right direction.

@codecov

codecov bot commented Apr 10, 2020

Codecov Report

Merging #1908 into master will increase coverage by 0.33%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1908      +/-   ##
==========================================
+ Coverage   86.49%   86.82%   +0.33%     
==========================================
  Files         131      131              
  Lines        3900     3922      +22     
==========================================
+ Hits         3373     3405      +32     
+ Misses        527      517      -10     

@jbytheway
Contributor Author

I'd be happy to add a test for this, but after a brief glance at your testing infrastructure it wasn't clear to me where a test of a command-line parameter like this belongs.

@horenmar
Member

I like the general idea, but I will have to think about the approach a bit.

I have two misgivings right now:

  • calculating two hashes on every comparison could prove slow for a codebase with a significant number of test cases
  • I am not sure how to check how much randomness there will be in the order of the tests.

The first one should be easy enough to fix -- we can compute each hash once, sort those, and then project the order back onto the test cases -- or we can benchmark, say, 100k test cases and see what it does.
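A sketch of that "compute once, then sort" approach (illustrative, not the PR's code; it reuses the hypothetical `hashTestName` from the earlier sketch and a stand-in `TestCase` type):

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Catch2's test case type.
struct TestCase { std::string name; /* ... */ };

// FNV-1a over the test name with a random basis, as in the earlier sketch.
std::uint64_t hashTestName(std::uint64_t randomBasis, std::string const& testName);

std::vector<TestCase> randomOrder(std::vector<TestCase> tests, std::uint64_t randomBasis) {
    // Decorate: compute each hash exactly once -- O(n) hash computations.
    std::vector<std::pair<std::uint64_t, TestCase>> decorated;
    decorated.reserve(tests.size());
    for (auto& tc : tests) {
        decorated.emplace_back(hashTestName(randomBasis, tc.name), std::move(tc));
    }
    // Sort by the precomputed hashes; each comparison is now a cheap integer compare.
    // (Hash collisions would still need a tiebreak; see later in the thread.)
    std::sort(decorated.begin(), decorated.end(),
              [](auto const& lhs, auto const& rhs) { return lhs.first < rhs.first; });
    // Undecorate: project the sorted order back onto the test cases.
    std::vector<TestCase> ordered;
    ordered.reserve(decorated.size());
    for (auto& d : decorated) {
        ordered.push_back(std::move(d.second));
    }
    return ordered;
}
```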

The second one is obviously a bit harder and I will have to think about it first.


As to tests, generally everything possible should be tested via CTest. I do not see how to easily test this specific feature using just the facilities provided by CTest, so it will likely require a Python script that performs the actual test, and reports the result back.

The way I see it, the test should work something like this:

  1. Invoke the SelfTest executable with --list-tests --order rand --rng-seed <magic-value>, record the order of listed tests
  2. Invoke the SelfTest executable again, but limit the listing to a subset of tests, e.g. using [approx] and [matchers] tags. Check that the listed tests are in the same relative order as in step 1.
  3. Do this once again, with just a single tag, and check once again.

The script will also need some sanity checks to avoid false negatives. The ones I can think of off the top of my head are that step 1 should return more tests than step 2, step 2 should return more tests than step 3, and every step should return at least 2 tests.
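The relative-order check in steps 2 and 3 boils down to a subsequence test. Sketched here in C++ for consistency with the other sketches in this thread, even though the actual script would likely be Python (names are illustrative):

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns true if `subset` appears in `full` in the same relative order,
// i.e. `subset` is a subsequence of `full`. Steps 2 and 3 would check this
// against the full ordering recorded in step 1.
bool sameRelativeOrder(std::vector<std::string> const& full,
                       std::vector<std::string> const& subset) {
    auto it = full.begin();
    for (auto const& name : subset) {
        it = std::find(it, full.end(), name);
        if (it == full.end()) {
            return false;
        }
        ++it;
    }
    return true;
}
```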

@horenmar horenmar added the Tweak label Apr 10, 2020
@jbytheway jbytheway force-pushed the use_more_helpful_random_test_ordering branch from 293ad88 to 2e7eb0b Compare April 11, 2020 00:58
@jbytheway
Contributor Author

jbytheway commented Apr 11, 2020

I've added a test and rewritten the sorting code to only compute O(n) hashes rather than O(n log(n)).

Regarding your other concern: I see no reason to believe there will be any less randomness than there was with std::shuffle. They're both derived from the same seed. The limited entropy in that seed will be the limiting factor (currently that's just 32 bits).

Should I move struct HashTest into an internal namespace of some kind?

@jbytheway jbytheway force-pushed the use_more_helpful_random_test_ordering branch 3 times, most recently from 08515cc to 4991e0e Compare April 11, 2020 15:05
@horenmar
Member

My misgiving with this approach is that std::shuffle guarantees that every permutation of the input is equally likely (at least given a true URBG and a properly implemented uniform distribution), while I do not know the statistical properties of randomly-seeded FNV-1a 😃

That is why I want to play with it a bit first -- generate some random seeds, check if there are suspicious results.

On a related note, to ensure stability properly, the code needs to handle hash collisions. I suggest just using test case name as the fallback comparison if hashes happen to be the same.
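That fallback amounts to a one-line tiebreak in the comparator. A sketch under the same assumptions as the earlier ones (the pair layout and names are illustrative):

```cpp
#include <cstdint>
#include <string>
#include <tuple>
#include <utility>

// (hash, test name) pairs, as in the earlier sorting sketch.
using HashedName = std::pair<std::uint64_t, std::string>;

// Order by hash first; on a hash collision, fall back to comparing the test
// names so the ordering stays total, deterministic, and platform-independent.
bool hashedNameLess(HashedName const& lhs, HashedName const& rhs) {
    return std::tie(lhs.first, lhs.second) < std::tie(rhs.first, rhs.second);
}
```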

@jbytheway jbytheway force-pushed the use_more_helpful_random_test_ordering branch from 4991e0e to 4dd47c4 Compare April 11, 2020 17:15
@jbytheway jbytheway force-pushed the use_more_helpful_random_test_ordering branch from 4dd47c4 to f5d0139 Compare April 12, 2020 01:18
@jbytheway
Contributor Author

> My misgiving with this approach is that std::shuffle guarantees that every permutation of the input is equally likely (at least given a true URBG and a properly implemented uniform distribution), while I do not know the statistical properties of randomly-seeded FNV-1a 😃
>
> That is why I want to play with it a bit first -- generate some random seeds, check if there are suspicious results.

FWIW, I have added a test which verifies that the seeds from 1 to 100 generate different orderings. Obviously that's not proof of much, but it does at least demonstrate that the distribution isn't utterly awful.

> On a related note, to ensure stability properly, the code needs to handle hash collisions. I suggest just using test case name as the fallback comparison if hashes happen to be the same.

Done.

@horenmar horenmar merged commit da9e3ee into catchorg:master Apr 14, 2020
@horenmar
Member

As I said before, thanks for this -- the idea is quite useful. I am going to do a little further refactoring later, but it is nothing that stops this from being merged.

@jbytheway jbytheway deleted the use_more_helpful_random_test_ordering branch April 14, 2020 12:43
horenmar added a commit that referenced this pull request Apr 21, 2020
--- Improvements ---
* Running tests in random order (`--order rand`) has been reworked significantly (#1908)
  * Given same seed, all platforms now produce the same order
  * Given same seed, the relative order of tests does not change if you select only a subset of them
* Vector matchers support custom allocators (#1909)
* `|` and `&` (bitwise or and bitwise and) are now supported in `CHECK` and `REQUIRE`
  * The resulting type must be convertible to `bool`

--- Fixes ---
* Fixed computation of benchmarking column widths in ConsoleReporter (#1885, #1886)
* Suppressed clang-tidy's `cppcoreguidelines-pro-type-vararg` in assertions (#1901)
  * It was a false positive triggered by the new warning support workaround
* Fixed bug in test specification parser handling of OR'd patterns using escaping (#1905)

--- Miscellaneous ---
* Worked around IBM XL's codegen bug (#1907)
  * It would emit code for _destructors_ of temporaries in an unevaluated context
* Improved detection of stdlib's support for `std::uncaught_exceptions` (#1911)
kevingranade added a commit to CleverRaven/Cataclysm-DDA that referenced this pull request Sep 11, 2020
This workaround was added by #27165
The need for it was removed by catchorg/Catch2#1908 and the import of the new code.