Testsuite consolidation #79

zackw · 2019-02-01T23:50:45Z

Most of the hashing methods had their own test-crypt-xxx.c program that ran a short sequence of “known answer” tests: crypt(phrase, salt) = expected for fixed values of phrase, salt, and expected. The code involved was very repetitive, taken as a whole, and many of the programs were not very thorough.

This patchset consolidates all of those programs into a single one that is much more thorough. It may actually be too thorough - it takes 20 seconds to run, on my computer, and that's after I parallelized it. Under valgrind it takes upward of 20 minutes to run, which necessitated some hacking around Travis's default timeouts. We could speed it up by dropping the checks that crypt, crypt_r, crypt_rn, and crypt_ra all produce the same output for the same input. If we look inside the black box, we know from the structure of the code that this will always be true for valid inputs (because the other three are wrappers around crypt_rn). I am arguing with myself about whether it is safe to know that for testing purposes, and would appreciate a second opinion.

Please also check my logic regarding the somewhat weaker treatment of (gost-)yescrypt than the other hashes (see the individual commit comments, and the notes about yescrypt in test-crypt-kat-gen.py).

codecov · 2019-02-02T01:30:31Z

Codecov Report

Merging #79 into develop will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff            @@
##           develop      #79   +/-   ##
========================================
  Coverage    96.46%   96.46%           
========================================
  Files           32       32           
  Lines         3109     3109           
========================================
  Hits          2999     2999           
  Misses         110      110

Impacted Files	Coverage Δ
crypt-md5.c	`95.23% <0%> (-1.59%)`	⬇️
crypt-sha256.c	`97.02% <0%> (-1%)`	⬇️
crypt-sha512.c	`97.34% <0%> (-0.89%)`	⬇️
alg-yescrypt-opt.c	`97.8% <0%> (+0.21%)`	⬆️
crypt-sunmd5.c	`99.07% <0%> (+0.92%)`	⬆️
crypt-nthash.c	`89.28% <0%> (+3.57%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 34f216c...77d8cf8. Read the comment docs.

besser82 · 2019-02-02T12:21:29Z

My first (and only) finding: test-crypt-kat should be added to .gitignore.

Please also check my logic regarding the somewhat weaker treatment of (gost-)yescrypt than the other hashes (see the individual commit comments, and the notes about yescrypt in test-crypt-kat-gen.py).

From my POV, we should keep the hash tests in the individual test programs for those methods, at least as long as there are implementations for them from third parties. If we remove the indivdual hash tests, without actually checking the computed output matches what we currently assume or believe to be the correct result, we do not have any direct indication whether some (possibly needed) change or update (typos may sneak in at any time) in the (gost-)yescrypt codebase changes the actual computed hash or not.

besser82

I'm fine with this, except for my remarks in #79 (comment).

besser82 · 2019-02-03T20:26:30Z

Rebased on recent develop branch.

Most of the hashing methods had their own test-crypt-xxx.c program that ran a short sequence of “known answer” tests: crypt(phrase, salt) = expected for fixed values of phrase, salt, and expected. The code involved was very repetitive, taken as a whole, and many of the programs were not very thorough. Consolidate all of these programs into a single program, test-crypt-kat.c (kat = known answer test); test all the hashing methods against the union of all the old programs’ input phrases; test all four supported crypt* APIs for each case; test that for each hash <- crypt(phrase, salt), hash == crypt(phrase, hash) as well. The known answers are generated from a table of combinations, using a Python program that uses an independent implementation of all the hashing methods (passlib <https://passlib.readthedocs.io>, forced to use its internal pure-Python reference implementations instead of C accelerators that may have too much code in common with libxcrypt’s implementations). This program is very slow, and passlib is not part of the Python standard library, and we don’t currently depend on Python during the build at all, so it is not run during a normal build. You have to run it by hand if you change it, and check in the output (test-crypt-kat.inc). passlib currently can’t calculate yescrypt or gost-yescrypt hashes, so we don’t have known answers to compare against for those, but we do still crank all of the passphrases through the algorithm and make sure the hash == crypt(phrase, hash) invariant holds for them. test-crypt-gost-yescrypt.c performs some extra, GY-specific tests as well as known-answer black box tests; that part of it is preserved. It is necessary to increase the timeout for running the test suite under valgrind on Travis, from 10 minutes to 60 minutes. This can’t be done in the documented manner because the “command” you’re supposed to use, travis_wait, is a bash function available in the parent script but not in our .travis_script.sh; it is necessary to replicate that logic in our script.

test-crypt-kat is the slowest test in the test suite and there’s no reasonable way to reduce the amount of work it does, but we can apply some coarse parallelism: test crypt_rn, crypt_ra, crypt_r, and crypt in four separate threads. This also verifies that crypt_r{,a,n} don’t use any global resources. On my desktop computer test-crypt-kat goes from 55 to 21 seconds of wall-clock time. calc_hashes_recrypt depends on the data computed by calc_hashes_crypt_rn; folding them together removes a non-obvious ordering constraint from main.

After computing all of the hashes as it used to do, it loops through all of them and makes sure that no two hashes for different phrases are equal, except in known, expected cases (e.g. descrypt dropping the 8th bit and truncating the input).

besser82 · 2019-02-03T20:57:02Z

Added test-crypt-kat to .gitignore.
Reverted the truncation for NTHASH.

zackw · 2019-02-03T20:59:14Z

Thanks. I have a plan for what to do about yescrypt but it’s going to take me a couple more days to execute.

…

These are calculated using ctypes to call crypt_ra in the just-built libcrypt.so. This does not test against an independent implementation, but it does test stability of hashes (the hash of a passphrase today is the same as it was some time ago), which is even more important. There is now also a Makefile target to regenerate test-crypt-kat.inc, since test-crypt-kat-gen.py now needs to be run from a build tree.

zackw · 2019-02-04T14:23:00Z

@besser82
OK, I'm now generating (gost-)yescrypt hashes using ctypes to call into our libcrypt.so, which will, as you suggest, verify hash stability - the behavior now is the same as the behavior as of the last time test-crypt-kat.inc was regenerated (and git will tell us if a change to the generator has unexpectedly changed the expected hashes).

Before merging, I want to ask again whether you think it is worthwhile to test that all four of crypt, crypt_r, crypt_rn, and crypt_ra produce the same output for the same input for all of the test patterns. This is the major reason test-crypt-kat is so slow and there's an argument that it's pointless since we know they all just call do_crypt.

besser82 · 2019-02-04T14:34:46Z

@besser82
OK, I'm now generating (gost-)yescrypt hashes using ctypes to call into our libcrypt.so, which will, as you suggest, verify hash stability - the behavior now is the same as the behavior as of the last time test-crypt-kat.inc was regenerated (and git will tell us if a change to the generator has unexpectedly changed the expected hashes).

Sounds good!

Before merging, I want to ask again whether you think it is worthwhile to test that all four of crypt, crypt_r, crypt_rn, and crypt_ra produce the same output for the same input for all of the test patterns. This is the major reason test-crypt-kat is so slow and there's an argument that it's pointless since we know they all just call do_crypt.

I think it is a good idea to check whether all of the crypt functions are acting the same. For completeness, we should include a consistency test for fcrypt (if built as an alias for crypt) as well as you suggested in my PR about the optional replacement stubs.

besser82 · 2019-02-04T21:29:40Z

@zackw I'll add the cosistency test for fcrypt in a new PR. Merging this on CLI for now.

zackw requested a review from besser82 February 1, 2019 23:50

besser82 approved these changes Feb 2, 2019

View reviewed changes

besser82 force-pushed the zack/testsuite-consolidation branch from 7e2312b to 49fd0c2 Compare February 3, 2019 20:26

zackw added 3 commits February 3, 2019 21:28

Add collision tests to test-crypt-kat.

bd1aac9

After computing all of the hashes as it used to do, it loops through all of them and makes sure that no two hashes for different phrases are equal, except in known, expected cases (e.g. descrypt dropping the 8th bit and truncating the input).

besser82 force-pushed the zack/testsuite-consolidation branch from 49fd0c2 to bd1aac9 Compare February 3, 2019 20:56

besser82 merged commit 77d8cf8 into develop Feb 4, 2019

besser82 deleted the zack/testsuite-consolidation branch February 4, 2019 21:30

besser82 mentioned this pull request Feb 4, 2019

Add cosistency test for fcrypt. #81

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Testsuite consolidation #79

Testsuite consolidation #79

Uh oh!

zackw commented Feb 1, 2019

Uh oh!

codecov bot commented Feb 2, 2019 •

edited

Loading

Uh oh!

besser82 commented Feb 2, 2019 •

edited

Loading

Uh oh!

besser82 left a comment

Uh oh!

besser82 commented Feb 3, 2019

Uh oh!

besser82 commented Feb 3, 2019

Uh oh!

zackw commented Feb 3, 2019 via email

Uh oh!

zackw commented Feb 4, 2019

Uh oh!

besser82 commented Feb 4, 2019

Uh oh!

besser82 commented Feb 4, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Testsuite consolidation #79

Testsuite consolidation #79

Uh oh!

Conversation

zackw commented Feb 1, 2019

Uh oh!

codecov bot commented Feb 2, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

besser82 commented Feb 2, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

besser82 left a comment

Choose a reason for hiding this comment

Uh oh!

besser82 commented Feb 3, 2019

Uh oh!

besser82 commented Feb 3, 2019

Uh oh!

zackw commented Feb 3, 2019 via email

Uh oh!

zackw commented Feb 4, 2019

Uh oh!

besser82 commented Feb 4, 2019

Uh oh!

besser82 commented Feb 4, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov bot commented Feb 2, 2019 •

edited

Loading

besser82 commented Feb 2, 2019 •

edited

Loading