Skip to content

Conversation

@zackw
Copy link
Collaborator

@zackw zackw commented Feb 1, 2019

Most of the hashing methods had their own test-crypt-xxx.c program that ran a short sequence of “known answer” tests: crypt(phrase, salt) = expected for fixed values of phrase, salt, and expected. The code involved was very repetitive, taken as a whole, and many of the programs were not very thorough.

This patchset consolidates all of those programs into a single one that is much more thorough. It may actually be too thorough - it takes 20 seconds to run, on my computer, and that's after I parallelized it. Under valgrind it takes upward of 20 minutes to run, which necessitated some hacking around Travis's default timeouts. We could speed it up by dropping the checks that crypt, crypt_r, crypt_rn, and crypt_ra all produce the same output for the same input. If we look inside the black box, we know from the structure of the code that this will always be true for valid inputs (because the other three are wrappers around crypt_rn). I am arguing with myself about whether it is safe to know that for testing purposes, and would appreciate a second opinion.

Please also check my logic regarding the somewhat weaker treatment of (gost-)yescrypt than the other hashes (see the individual commit comments, and the notes about yescrypt in test-crypt-kat-gen.py).

@zackw zackw requested a review from besser82 February 1, 2019 23:50
@codecov
Copy link

codecov bot commented Feb 2, 2019

Codecov Report

Merging #79 into develop will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop      #79   +/-   ##
========================================
  Coverage    96.46%   96.46%           
========================================
  Files           32       32           
  Lines         3109     3109           
========================================
  Hits          2999     2999           
  Misses         110      110
Impacted Files Coverage Δ
crypt-md5.c 95.23% <0%> (-1.59%) ⬇️
crypt-sha256.c 97.02% <0%> (-1%) ⬇️
crypt-sha512.c 97.34% <0%> (-0.89%) ⬇️
alg-yescrypt-opt.c 97.8% <0%> (+0.21%) ⬆️
crypt-sunmd5.c 99.07% <0%> (+0.92%) ⬆️
crypt-nthash.c 89.28% <0%> (+3.57%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 34f216c...77d8cf8. Read the comment docs.

@besser82
Copy link
Owner

besser82 commented Feb 2, 2019

My first (and only) finding: test-crypt-kat should be added to .gitignore.

Please also check my logic regarding the somewhat weaker treatment of (gost-)yescrypt than the other hashes (see the individual commit comments, and the notes about yescrypt in test-crypt-kat-gen.py).

From my POV, we should keep the hash tests in the individual test programs for those methods, at least as long as there are implementations for them from third parties. If we remove the indivdual hash tests, without actually checking the computed output matches what we currently assume or believe to be the correct result, we do not have any direct indication whether some (possibly needed) change or update (typos may sneak in at any time) in the (gost-)yescrypt codebase changes the actual computed hash or not.

Copy link
Owner

@besser82 besser82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with this, except for my remarks in #79 (comment).

@besser82 besser82 force-pushed the zack/testsuite-consolidation branch from 7e2312b to 49fd0c2 Compare February 3, 2019 20:26
@besser82
Copy link
Owner

besser82 commented Feb 3, 2019

Rebased on recent develop branch.

zackw added 3 commits February 3, 2019 21:28
Most of the hashing methods had their own test-crypt-xxx.c program
that ran a short sequence of “known answer” tests:
crypt(phrase, salt) = expected for fixed values of phrase, salt, and
expected.  The code involved was very repetitive, taken as a whole,
and many of the programs were not very thorough.

Consolidate all of these programs into a single program,
test-crypt-kat.c (kat = known answer test); test all the hashing
methods against the union of all the old programs’ input phrases;
test all four supported crypt* APIs for each case; test that for
each hash <- crypt(phrase, salt), hash == crypt(phrase, hash) as well.

The known answers are generated from a table of combinations, using
a Python program that uses an independent implementation of all the
hashing methods (passlib <https://passlib.readthedocs.io>, forced
to use its internal pure-Python reference implementations instead of
C accelerators that may have too much code in common with libxcrypt’s
implementations).  This program is very slow, and passlib is not part
of the Python standard library, and we don’t currently depend on
Python during the build at all, so it is not run during a normal
build.  You have to run it by hand if you change it, and check in the
output (test-crypt-kat.inc).

passlib currently can’t calculate yescrypt or gost-yescrypt hashes,
so we don’t have known answers to compare against for those, but we
do still crank all of the passphrases through the algorithm and make
sure the hash == crypt(phrase, hash) invariant holds for them.

test-crypt-gost-yescrypt.c performs some extra, GY-specific tests
as well as known-answer black box tests; that part of it is preserved.

It is necessary to increase the timeout for running the test suite
under valgrind on Travis, from 10 minutes to 60 minutes.  This can’t
be done in the documented manner because the “command” you’re supposed
to use, travis_wait, is a bash function available in the parent script
but not in our .travis_script.sh; it is necessary to replicate that
logic in our script.
test-crypt-kat is the slowest test in the test suite and there’s no
reasonable way to reduce the amount of work it does, but we can apply
some coarse parallelism: test crypt_rn, crypt_ra, crypt_r, and crypt
in four separate threads.  This also verifies that crypt_r{,a,n} don’t
use any global resources.

On my desktop computer test-crypt-kat goes from 55 to 21 seconds of
wall-clock time.

calc_hashes_recrypt depends on the data computed by
calc_hashes_crypt_rn; folding them together removes a non-obvious
ordering constraint from main.
After computing all of the hashes as it used to do, it loops through
all of them and makes sure that no two hashes for different phrases
are equal, except in known, expected cases (e.g. descrypt dropping the
8th bit and truncating the input).
@besser82 besser82 force-pushed the zack/testsuite-consolidation branch from 49fd0c2 to bd1aac9 Compare February 3, 2019 20:56
@besser82
Copy link
Owner

besser82 commented Feb 3, 2019

Added test-crypt-kat to .gitignore.
Reverted the truncation for NTHASH.

@zackw
Copy link
Collaborator Author

zackw commented Feb 3, 2019 via email

These are calculated using ctypes to call crypt_ra in the just-built
libcrypt.so.  This does not test against an independent
implementation, but it does test stability of hashes (the hash of a
passphrase today is the same as it was some time ago), which is even
more important.

There is now also a Makefile target to regenerate test-crypt-kat.inc,
since test-crypt-kat-gen.py now needs to be run from a build tree.
@zackw
Copy link
Collaborator Author

zackw commented Feb 4, 2019

@besser82
OK, I'm now generating (gost-)yescrypt hashes using ctypes to call into our libcrypt.so, which will, as you suggest, verify hash stability - the behavior now is the same as the behavior as of the last time test-crypt-kat.inc was regenerated (and git will tell us if a change to the generator has unexpectedly changed the expected hashes).

Before merging, I want to ask again whether you think it is worthwhile to test that all four of crypt, crypt_r, crypt_rn, and crypt_ra produce the same output for the same input for all of the test patterns. This is the major reason test-crypt-kat is so slow and there's an argument that it's pointless since we know they all just call do_crypt.

@besser82
Copy link
Owner

besser82 commented Feb 4, 2019

@besser82
OK, I'm now generating (gost-)yescrypt hashes using ctypes to call into our libcrypt.so, which will, as you suggest, verify hash stability - the behavior now is the same as the behavior as of the last time test-crypt-kat.inc was regenerated (and git will tell us if a change to the generator has unexpectedly changed the expected hashes).

Sounds good!

Before merging, I want to ask again whether you think it is worthwhile to test that all four of crypt, crypt_r, crypt_rn, and crypt_ra produce the same output for the same input for all of the test patterns. This is the major reason test-crypt-kat is so slow and there's an argument that it's pointless since we know they all just call do_crypt.

I think it is a good idea to check whether all of the crypt functions are acting the same. For completeness, we should include a consistency test for fcrypt (if built as an alias for crypt) as well as you suggested in my PR about the optional replacement stubs.

@besser82
Copy link
Owner

besser82 commented Feb 4, 2019

@zackw I'll add the cosistency test for fcrypt in a new PR. Merging this on CLI for now.

@besser82 besser82 merged commit 77d8cf8 into develop Feb 4, 2019
@besser82 besser82 deleted the zack/testsuite-consolidation branch February 4, 2019 21:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants