Testsuite consolidation #79
Codecov Report
| Coverage Diff | develop | #79 | +/- |
|---|---|---|---|
| Coverage | 96.46% | 96.46% | |
| Files | 32 | 32 | |
| Lines | 3109 | 3109 | |
| Hits | 2999 | 2999 | |
| Misses | 110 | 110 | |
My first (and only) finding:
From my POV, we should keep the hash tests in the individual test programs for those methods, at least as long as there are implementations for them from third parties. If we remove the individual hash tests without checking that the computed output matches what we currently believe to be the correct result, we have no direct indication of whether some (possibly needed) change or update to the (gost-)yescrypt codebase alters the computed hash. Typos may sneak in at any time.
besser82
left a comment
I'm fine with this, except for my remarks in #79 (comment).
7e2312b to 49fd0c2
Rebased on recent develop branch.
Most of the hashing methods had their own test-crypt-xxx.c program that ran a short sequence of “known answer” tests: crypt(phrase, salt) = expected for fixed values of phrase, salt, and expected. The code involved was very repetitive, taken as a whole, and many of the programs were not very thorough.

Consolidate all of these programs into a single program, test-crypt-kat.c (kat = known answer test); test all the hashing methods against the union of all the old programs’ input phrases; test all four supported crypt* APIs for each case; test that for each hash <- crypt(phrase, salt), hash == crypt(phrase, hash) as well.

The known answers are generated from a table of combinations, using a Python program that uses an independent implementation of all the hashing methods (passlib <https://passlib.readthedocs.io>, forced to use its internal pure-Python reference implementations instead of C accelerators that may have too much code in common with libxcrypt’s implementations). This program is very slow, and passlib is not part of the Python standard library, and we don’t currently depend on Python during the build at all, so it is not run during a normal build. You have to run it by hand if you change it, and check in the output (test-crypt-kat.inc).

passlib currently can’t calculate yescrypt or gost-yescrypt hashes, so we don’t have known answers to compare against for those, but we do still crank all of the passphrases through the algorithm and make sure the hash == crypt(phrase, hash) invariant holds for them.

test-crypt-gost-yescrypt.c performs some extra, GY-specific tests as well as known-answer black box tests; that part of it is preserved.

It is necessary to increase the timeout for running the test suite under valgrind on Travis, from 10 minutes to 60 minutes. This can’t be done in the documented manner because the “command” you’re supposed to use, travis_wait, is a bash function available in the parent script but not in our .travis_script.sh; it is necessary to replicate that logic in our script.
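To make the two checks from the consolidation commit concrete (known answer, then hash == crypt(phrase, hash)), here is a minimal Python sketch. It is not the code in test-crypt-kat.c or test-crypt-kat-gen.py: `toy_crypt`, `reference_hash`, `make_table`, and `run_kat` are hypothetical names, and the `$toy$` hash format is a made-up stand-in for a real hashing method so that the example runs without libcrypt or passlib.

```python
import hashlib


def toy_crypt(phrase: str, setting: str) -> str:
    """Hypothetical stand-in for crypt(): returns '$toy$<salt>$<digest>'."""
    # Only the salt field of the setting is read, so a complete hash
    # can be passed back in as the setting, as with the real crypt().
    salt = setting.split("$")[2]
    digest = hashlib.sha256((salt + phrase).encode()).hexdigest()[:16]
    return f"$toy${salt}${digest}"


def reference_hash(phrase: str, setting: str) -> str:
    """Separately written implementation, playing the role of passlib."""
    salt = setting.split("$")[2]
    h = hashlib.sha256()
    h.update(salt.encode())
    h.update(phrase.encode())
    return "$toy$" + salt + "$" + h.hexdigest()[:16]


def make_table(phrases, settings):
    """Generate (phrase, setting, expected) rows from all combinations."""
    return [(p, s, reference_hash(p, s)) for s in settings for p in phrases]


def run_kat(table, crypt=toy_crypt):
    """Return a list of failure descriptions; empty means every case passed."""
    failures = []
    for phrase, setting, expected in table:
        got = crypt(phrase, setting)
        if got != expected:
            failures.append(f"KAT mismatch for {phrase!r} with {setting!r}")
        elif crypt(phrase, got) != got:
            failures.append(f"round trip failed for {phrase!r} with {setting!r}")
    return failures


TABLE = make_table(["", "a", "correct horse"], ["$toy$abc$", "$toy$xyz$"])
print(run_kat(TABLE) or "all known-answer tests passed")
```

For methods without an independent implementation, the same loop can still run with the known-answer comparison skipped and only the round-trip check kept, which is the weaker treatment described above for (gost-)yescrypt.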
test-crypt-kat is the slowest test in the test suite and there’s no
reasonable way to reduce the amount of work it does, but we can apply
some coarse parallelism: test crypt_rn, crypt_ra, crypt_r, and crypt
in four separate threads. This also verifies that crypt_r{,a,n} don’t
use any global resources.
On my desktop computer test-crypt-kat goes from 55 to 21 seconds of
wall-clock time.
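The coarse parallelism described here can be sketched as one worker per API, each hashing the full list of cases, followed by a comparison of the four result lists. The sketch below is in Python rather than C and is only an illustration of the structure: the four entries in `APIS` are hypothetical stand-ins that all call one toy hash function, not bindings to the real crypt, crypt_r, crypt_rn, and crypt_ra.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def toy_hash(phrase: str, salt: str) -> str:
    """Hypothetical stand-in for a real hashing method."""
    return hashlib.sha256((salt + phrase).encode()).hexdigest()[:16]


# Hypothetical stand-ins for the four entry points. In libxcrypt these are
# distinct functions; here they all share one toy implementation.
APIS = {
    "crypt": toy_hash,
    "crypt_r": toy_hash,
    "crypt_rn": toy_hash,
    "crypt_ra": toy_hash,
}


def calc_hashes(api, cases):
    """Hash every (phrase, salt) case with one API, preserving order."""
    return [api(phrase, salt) for phrase, salt in cases]


def run_parallel(cases):
    """Run each API over all cases in its own thread; return name -> hashes."""
    with ThreadPoolExecutor(max_workers=len(APIS)) as pool:
        futures = {
            name: pool.submit(calc_hashes, api, cases)
            for name, api in APIS.items()
        }
        return {name: future.result() for name, future in futures.items()}


def all_agree(results):
    """True if every API produced the same list of hashes."""
    baseline = results["crypt_rn"]
    return all(hashes == baseline for hashes in results.values())


CASES = [("", "s1"), ("a", "s1"), ("correct horse", "s2")]
print(all_agree(run_parallel(CASES)))
```

This shows the shape of the test, not the speedup: pure-Python work of this size is serialized by the interpreter lock, whereas the C test program can run its four threads concurrently.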
calc_hashes_recrypt depends on the data computed by
calc_hashes_crypt_rn; folding them together removes a non-obvious
ordering constraint from main.
After computing all of the hashes as before, test-crypt-kat loops through all of them and makes sure that no two hashes for different phrases are equal, except in known, expected cases (e.g. descrypt dropping the 8th bit of each byte and truncating the input).
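A collision check of this kind can be sketched as grouping phrases by hash and flagging any group of two or more phrases that is not covered by a list of expected collisions. This is a Python illustration under assumptions, not the C code in test-crypt-kat.c: `toy_descrypt` and `find_unexpected_collisions` are hypothetical names, and `toy_descrypt` only imitates descrypt's input handling (first 8 bytes, low 7 bits of each) on top of SHA-256.

```python
import hashlib
from collections import defaultdict


def toy_descrypt(phrase: bytes) -> str:
    """Hypothetical stand-in that mimics descrypt's input handling only."""
    # Keep the first 8 bytes and drop the top bit of each one.
    key = bytes(b & 0x7F for b in phrase[:8])
    return hashlib.sha256(key).hexdigest()[:13]


def find_unexpected_collisions(hashes, expected_groups):
    """Return (hash, phrases) pairs where distinct phrases collide unexpectedly.

    hashes: dict mapping phrase -> hash.
    expected_groups: iterable of sets of phrases that are allowed to collide.
    """
    by_hash = defaultdict(set)
    for phrase, digest in hashes.items():
        by_hash[digest].add(phrase)

    allowed = [frozenset(group) for group in expected_groups]
    unexpected = []
    for digest, phrases in by_hash.items():
        if len(phrases) > 1 and not any(phrases <= group for group in allowed):
            unexpected.append((digest, sorted(phrases)))
    return unexpected


PHRASES = [b"password1", b"password2", b"caf\xe9", b"cafi", b"other"]
HASHES = {p: toy_descrypt(p) for p in PHRASES}
EXPECTED = [
    {b"password1", b"password2"},  # identical after truncation to 8 bytes
    {b"caf\xe9", b"cafi"},         # identical after dropping the 8th bit
]
print(find_unexpected_collisions(HASHES, EXPECTED))
```

Listing the allowed groups explicitly means that a new collision, for example one introduced by a truncation bug in another method, is reported instead of being silently accepted.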
49fd0c2 to bd1aac9
Added |
Thanks. I have a plan for what to do about yescrypt but it’s going to take
me a couple more days to execute.
These are calculated using ctypes to call crypt_ra in the just-built libcrypt.so. This does not test against an independent implementation, but it does test stability of hashes (the hash of a passphrase today is the same as it was some time ago), which is even more important. There is now also a Makefile target to regenerate test-crypt-kat.inc, since test-crypt-kat-gen.py now needs to be run from a build tree.
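For readers unfamiliar with the approach, calling crypt_ra through ctypes might look roughly like the sketch below. This is not the code in test-crypt-kat-gen.py. `load_crypt_ra` is a hypothetical helper; the default library lookup through `ctypes.util.find_library` is an assumption made so the example can run outside a build tree, and the `$6$saltsalt$` setting is just an example that the installed library may or may not support. The C prototype assumed is `char *crypt_ra(const char *phrase, const char *setting, void **data, int *size)`.

```python
import ctypes
import ctypes.util


def load_crypt_ra(path=None):
    """Return a Python callable wrapping crypt_ra, or None if unavailable.

    path: library to load, e.g. a just-built libcrypt.so. If omitted,
    fall back to whatever libcrypt the system can find (an assumption).
    """
    name = path or ctypes.util.find_library("crypt")
    if name is None:
        return None
    try:
        func = ctypes.CDLL(name).crypt_ra
    except (OSError, AttributeError):
        # Library could not be loaded, or it does not export crypt_ra.
        return None

    func.restype = ctypes.c_char_p
    func.argtypes = [
        ctypes.c_char_p,
        ctypes.c_char_p,
        ctypes.POINTER(ctypes.c_void_p),
        ctypes.POINTER(ctypes.c_int),
    ]

    # crypt_ra allocates and grows this scratch buffer itself; it must start
    # out as NULL with size 0. It is reused across calls and never freed
    # here, which is acceptable for a short-lived generator script.
    data = ctypes.c_void_p()
    size = ctypes.c_int(0)

    def crypt_ra(phrase: bytes, setting: bytes):
        """Return the hash as bytes, or None if the library reports failure."""
        return func(phrase, setting, ctypes.byref(data), ctypes.byref(size))

    return crypt_ra


crypt_ra = load_crypt_ra()
if crypt_ra is None:
    print("no libcrypt exporting crypt_ra was found; nothing to check")
else:
    first = crypt_ra(b"example phrase", b"$6$saltsalt$")
    # Hashing again with the hash as the setting should reproduce it.
    print(first is not None and crypt_ra(b"example phrase", first) == first)
```

Because the wrapper shares one scratch buffer between calls, it is not safe to call from several threads at once; a generator script that runs sequentially does not need more than that.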
@besser82 Before merging, I want to ask again whether you think it is worthwhile to test that all four of |
Sounds good!
I think it is a good idea to check whether all of the crypt functions are acting the same. For completeness, we should include a consistency test for |
@zackw I'll add the consistency test for |
Most of the hashing methods had their own test-crypt-xxx.c program that ran a short sequence of “known answer” tests: `crypt(phrase, salt) = expected` for fixed values of phrase, salt, and expected. The code involved was very repetitive, taken as a whole, and many of the programs were not very thorough.

This patchset consolidates all of those programs into a single one that is much more thorough. It may actually be too thorough - it takes 20 seconds to run, on my computer, and that's after I parallelized it. Under `valgrind` it takes upward of 20 minutes to run, which necessitated some hacking around Travis's default timeouts. We could speed it up by dropping the checks that `crypt`, `crypt_r`, `crypt_rn`, and `crypt_ra` all produce the same output for the same input. If we look inside the black box, we know from the structure of the code that this will always be true for valid inputs (because the other three are wrappers around `crypt_rn`). I am arguing with myself about whether it is safe to know that for testing purposes, and would appreciate a second opinion.

Please also check my logic regarding the somewhat weaker treatment of (gost-)yescrypt than the other hashes (see the individual commit comments, and the notes about yescrypt in `test-crypt-kat-gen.py`).