
Tests sorted according to random index instead of random hash #2070

Merged
merged 1 commit into from Nov 17, 2020

Conversation

@loximann (Contributor)

Description

Use a random number instead of a hash to sort the tests.

Motivation

In v2.x, test cases with names differing only in the last character are almost always run one after the other. For example, for 4 tests called "a1", "a2", "b1" and "b2", only 8 combinations out of the possible 24 were observed in 1000 runs with different seeds.
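The clustering follows from FNV-1a's structure: when two names differ only in the last character, only the final mixing step differs, so the resulting 64-bit hashes stay numerically close and sort next to each other. A minimal sketch of the hashing (standard published FNV-1a constants; not Catch2's exact code):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Plain FNV-1a over a string, using the published 64-bit offset basis and
// prime. Catch2 v2.x ordered shuffled tests by a hash along these lines.
uint64_t fnv1a(const std::string& s) {
    uint64_t hash = 0xcbf29ce484222325ull; // FNV-64 offset basis
    for (unsigned char c : s) {
        hash ^= c;
        hash *= 0x100000001b3ull; // FNV-64 prime
    }
    return hash;
}
```

For "a1" and "a2" the last XOR differs only in two low bits, so the hashes differ (mod 2^64) by at most 3 times the prime, around 2^42 — a tiny gap on a 64-bit scale, which is why the pair almost always ends up adjacent in the sorted order.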

@horenmar (Member)

Something like this was one of my big reservations about the change.

However, uniform_int_distribution is problematic, as it is not reproducible across different platforms. At first glance, I also don't see how this change keeps the random order subset-invariant (meaning that if a subset of all tests is selected, that subset keeps the same relative order given the same seed).
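For context, the portability issue is in the distribution, not the engine: std::minstd_rand's recurrence is fully specified by the standard, while std::uniform_int_distribution's mapping from engine output to the requested range is left to each standard library. A sketch of the distinction (my own illustrative example, not code from this PR):

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// minstd_rand is pinned down by the standard:
// x(n+1) = 48271 * x(n) mod (2^31 - 1),
// so its raw output is identical on every conforming implementation.
uint64_t portable_draw(std::minstd_rand& rng) {
    return static_cast<uint64_t>(rng()); // reproducible everywhere
}

// By contrast, this value may differ between libstdc++, libc++, and MSVC,
// because the distribution's algorithm is unspecified:
int unportable_draw(std::minstd_rand& rng) {
    std::uniform_int_distribution<int> dist(0, 99);
    return dist(rng);
}
```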

@codecov (bot) commented Oct 23, 2020

Codecov Report

Merging #2070 (d654b2c) into v2.x (8dd25b0) will increase coverage by 0.05%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             v2.x    #2070      +/-   ##
==========================================
+ Coverage   88.75%   88.79%   +0.05%     
==========================================
  Files         138      138              
  Lines        5651     5683      +32     
==========================================
+ Hits         5015     5046      +31     
- Misses        636      637       +1     

@loximann (Contributor, Author)

About reproducibility, I am not completely sure. std::minstd_rand is always defined with the same parameters (http://www.cplusplus.com/reference/random/minstd_rand/), so it should be reproducible. std::uniform_int_distribution is just a mapping from the output of the engine to an int, but sure, it would be safer to cast the output of the engine to a uint64_t.

In any case, you are absolutely right that subset invariance is not preserved. My bad.

So scrap that. I pushed a new proposal, which is kind of hacky, but it works while preserving the current features. It basically always appends the same string to every test case name; this improves the spread of the hashes a lot.

@horenmar (Member) commented Oct 26, 2020

The transformation within uniform_int_distribution is unspecified, so every implementation does its own thing.


e: I will also need to investigate how your original PR passed the tests, because there should be a test for subset invariance.

@loximann (Contributor, Author)

Yeah, I was going to ask about that, because I ran the tests and I thought that this would be checked.

But thanks a lot for bothering to illustrate the issue with std::uniform_int_distribution! :-D Good to know.

@loximann (Contributor, Author)

Hm, how can I get the .clang-tidy file used by Travis? It would be faster to get that sorted locally.

If the approach I am suggesting makes sense, of course.

@horenmar (Member)

Catch2 doesn't use clang-tidy. It is just a plain warning on a specific version of Clang.

@loximann (Contributor, Author)

OK, addressed, it's building now.

Would it make sense to mention in contributing.md that Clang should be used?

@horenmar (Member)

Eh, the CI also uses GCC and MSVC, all of them across a bunch of different versions.

Also, I will have to look at a better way of doing this, as the current proposal would have significant runtime overhead... I have 2 ideas right now:

  1. Actually use the generated bytes from the RNG not to seed the hash, but to append them at the end.
  2. Use a significantly shorter append string (and avoid storing it as std::string, relying on the optimizer to avoid the allocation where it can).
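Idea 1 could look roughly like the following sketch: draw a few 64-bit values from the seeded PRNG once per run and feed their bytes through FNV-1a after the test name, so a differing last character in the name keeps getting re-mixed. The function name and the choice of four words are illustrative, not the PR's actual code:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// FNV-1a over the test name followed by four PRNG-drawn 64-bit words.
// Because the words come after the name, a one-character difference in
// the name is re-mixed for 32 more bytes before the hash is final.
uint64_t hash_with_rng_suffix(const std::string& name, const uint64_t (&suffix)[4]) {
    uint64_t hash = 0xcbf29ce484222325ull; // FNV-64 offset basis
    auto mix = [&hash](unsigned char byte) {
        hash ^= byte;
        hash *= 0x100000001b3ull; // FNV-64 prime
    };
    for (unsigned char c : name) mix(c);
    for (uint64_t word : suffix)
        for (int i = 0; i < 8; ++i)
            mix(static_cast<unsigned char>(word >> (8 * i)));
    return hash;
}
```

The suffix words would be drawn once from the seeded engine, so the scheme stays deterministic for a fixed seed and subset-invariant, since each test's hash depends only on its own name and the seed.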

@jbytheway Ping.

@loximann (Contributor, Author)

Sure, I would also use a shorter string. 2^64/1099511628211 ≈ 2^24, so if this estimate is at all reasonable (which I don't know 😬), three characters should suffice. I did a quick test appending "abcd" and it seems to do the trick; the distribution of the 24 combinations looks reasonably uniform...

@jbytheway (Contributor)

Yeah, I'd expect that mixing in just a couple of additional characters ought to suffice. But since this is arbitrary extra mixing, it's probably better to use full uint64_t values rather than just chars. That should be just as fast and ought to mix better. You might even get away with adding just one xor-and-multiply step.

@loximann (Contributor, Author)

That sounds better, yeah. Any suggestions for the value of the constant?

@loximann (Contributor, Author)

I played around with two large random numbers, and the spread was not great. I get better results with 4 numbers, even if they are not very large. I'll look into this and suggest a new proposal, unless someone has a clear idea :-D

@loximann (Contributor, Author)

OK, so here is a new proposal! The original authors suggest one recipe to improve dispersion: XOR the upper half of the hash with the lower half. This of course reduces the hash to 32 bits, but for the current purpose I guess that is more than enough. It is discussed in the Wikipedia article on the algorithm:
https://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function#Non-cryptographic_hash
However, when I played with it, XORing didn't cut it. I tried multiplying instead, and that seems to work. As this is outside my area of expertise, I checked in with Fowler (the F in FNV-1a); let's see what he has to say...
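The two folding options under discussion, side by side: XOR-folding is the recipe from the FNV authors, while multiplication folding (multiplying the upper 32 bits by the lower 32 bits) is the variant that worked better here. A sketch with illustrative names:

```cpp
#include <cassert>
#include <cstdint>

// XOR-fold a 64-bit hash down to 32 bits (the FNV authors' recipe).
uint32_t xor_fold(uint64_t h) {
    return static_cast<uint32_t>(h >> 32) ^ static_cast<uint32_t>(h);
}

// Multiplication fold: product of the two 32-bit halves, truncated to
// the low 32 bits of the 64-bit result.
uint32_t mul_fold(uint64_t h) {
    return static_cast<uint32_t>((h >> 32) * (h & 0xffffffffull));
}
```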

@horenmar (Member) commented Nov 7, 2020

Okay, I think there is some perf budget that can be given to the hashing as an extra -- I am thinking 4 numbers drawn from the PRNG. Then there are 2 things to do

  1. Add that extra "suffix" to the hashing
  2. Find out why the initial PR was not failed by CI.

I am going to open a new issue for 2). @loximann Do you want to make that change, or should I do it?

@loximann (Contributor, Author) commented Nov 8, 2020

Sorry, which change do you mean? I can investigate 2, and for 1 I was thinking that, now that the request seems reasonable, I could do a more proper analysis of the different alternatives before deciding on one. How does that sound?

@horenmar (Member) commented Nov 8, 2020

Right, sorry. I meant the change in 1, and I would definitely like to see a more thorough analysis.

@loximann (Contributor, Author) commented Nov 9, 2020

Good good, I'll get on with it! I'll post my results here at some point this week.

@loximann (Contributor, Author)

Alright, so this little thing turned out to be much more fun than expected, and I ended up doing this unnecessarily deep analysis:
https://gitlab.com/loximann/testing-hashers-for-shuffling-strings

I tested a bunch of possibilities, and the one that worked best was the original FNV-1a algorithm with the seed fused in as a suffix, followed by multiplication folding (multiplying the lower 32 bits by the upper 32 bits). Speed-wise it looks OK, and I would expect it to work perfectly fine for up to ~100 000 test cases. Even beyond that, when hash collisions are sure to happen, I would wager that the impact is negligible for the needs of randomization in Catch2.
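Pieced together from the description above, the winning combination looks roughly like this. This is a sketch under my reading of the analysis, not the code that was merged; see the linked repository for the actual measurements:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// FNV-1a over the test name, then the 64-bit seed mixed in as a suffix,
// then multiplication folding down to 32 bits.
uint32_t shuffle_hash(const std::string& name, uint64_t seed) {
    uint64_t hash = 0xcbf29ce484222325ull; // FNV-64 offset basis
    auto mix = [&hash](unsigned char byte) {
        hash ^= byte;
        hash *= 0x100000001b3ull; // FNV-64 prime
    };
    for (unsigned char c : name) mix(c);
    for (int i = 0; i < 8; ++i)
        mix(static_cast<unsigned char>(seed >> (8 * i)));
    // Multiplication folding: upper half times lower half, truncated.
    return static_cast<uint32_t>((hash >> 32) * (hash & 0xffffffffull));
}
```

Sorting test cases by this key gives an order that changes with the seed, is reproducible for a fixed seed, and is subset-invariant, since each test's key depends only on its own name and the seed.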

I look forward to your comments, if there is anything else you think I should check, please let me know.

@horenmar (Member)

@loximann That looks really nice, good work. 👍

On the code:

  • You have a bit of type confusion in there: the hasher returns hash_t, which is typedefed as a 64-bit unsigned int, while the implementation transforms the value into a 32-bit unsigned int. That (rightfully) causes warnings, which is what fails the builds.

Some notes on the text:

  • The graphs are not shown inline, not sure why.
  • Catch2's microbenchmarking support would give you the error bars and other statistical measures 😉
  • I think it is interesting enough that you should link it somewhere
    • I recommend giving it a quick look under a spellcheck first
    • I would also note that std::default_random_engine is platform-dependent (if you were to run your tests on Windows, you would probably find it much slower).

@loximann (Contributor, Author)

Thank you! And thanks for the nice comments!

I'll definitely check out the microbenchmarking facilities; I had no idea they existed, and they'll come in handy. Actually, I am calculating uncertainties, but they are much smaller than the variability I observe from run to run, which is why I decided to skip them.

And I'll fix the build... I thought I had sprinkled enough static_casts around, but I guess I missed a spot.

Any recommendations other than default_random_engine? minstd_rand?

@loximann (Contributor, Author)

OK, this looks promising. What is the policy for the final merge? Should I squash the commits? Anything else?

@loximann (Contributor, Author)

Hm, I am not quite sure what broke the last build attempt. The log only shows "An error occurred while generating the build script."

@horenmar (Member)

Just random TravisCI breakage. As for the PR, the preference is for small atomic commits, which for this one I guess means just 1.

@horenmar (Member)

I also did some investigation into the (not) failing tests, and I've got nothing. If I intentionally break the ordering by doing a plain shuffle, then the test fails.

Either the passing commit was lucky and the seeds just happened to work, or there is some weirder edge case that is harder to reproduce.

@loximann (Contributor, Author)

Yay!

Could you point me to which test case is checking that? I could have a look.

@horenmar (Member)

add_test(NAME RandomTestOrdering COMMAND ${PYTHON_EXECUTABLE}
${CATCH_DIR}/projects/TestScripts/testRandomOrder.py $<TARGET_FILE:SelfTest>)

https://github.com/catchorg/Catch2/blob/v2.x/projects/TestScripts/testRandomOrder.py
