-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
crypt.5 .hash's reporting of effective hash size is wrong #29
Comments
Check my logic here: Traditional DES recognizes only 256 distinct input passwords. (This is captured by the "maximum password length" entry in the table.) Those passwords are translated directly to DES cipher keys. The input block is fixed. It follows that, for any one salt, traditional DES crypt has only 256 possible outputs, even though its output is 64 bits long. If there is an output collision, that implies a weakness in the DES block cipher primitive (two keys produce the same output for the same input). The practical consequence of this is that a complete rainbow table for traditional DES requires 256 × 212 entries. BSDI enlarges the salt, but uses an iterated Merkle-Dåmgard-ish construction to map unlimited-length passphrases to the 256 possible DES keys, which means that for each passphrase longer than 8 characters, there is a passphrase of no more than 8 characters that collides with it (although the short side of the collision has a good chance of being untypeable). A complete BSDI rainbow table would have 256 × 224 × 223 entries (the additional factor comes from the variable round count). bigcrypt does somewhat the opposite: the salt stays 12 bits long, but the output length is enlarged proportional to the length of the input, using a construction reminiscent of CBC mode (each block of hash provides the salt for the next eight characters of password). This means brute-force guessing is parallelizable (each block can be cracked independently) and it also means a complete rainbow table for bigcrypt needs only 256 × 212 entries even though the output could contain many more than 56 significant bits. You would simply look up each block independently. |
Gah, accidentally hit the close button while trying to edit. Now, the question in my mind is, what are we trying to document here? Terminology aside, I think it is simply that, for the DES variants and only the DES variants, the output is longer than required for the number of possible outputs, and therefore brute-force attacks are easier / rainbow tables are smaller than you might think from inspecting a leaked shadow file. With me so far? |
While "what are we trying to document here?" is an important question, whatever we document better be correct - and right now it is not, hence this issue. I disagree that "for the DES variants [...] the output is longer than required for the number of possible outputs". They would have been somewhat worse if their output size were 56 rather than 64 bits. First, given a fixed salt, a 56-to-56 hash would produce only ~63.2% (or 1-1/e) of the 2^56 different values, whereas a 56-to-64 hash would produce ~99.8% of them. Second, with salts the input space is 56+12 bits, which is larger than 64. Now, I agree the salts helping utilize more of the output space, and occasional hash value collisions, are both mostly irrelevant to many realistic password cracking attacks, such as with tables. (BTW, I'd rather not go into discussing rainbow tables here - they're complicated "compressing" data structures (where the input number of "entries" is typically orders of magnitude higher than what's recorded), and they're also probabilistic. For the purpose of this discussion and to have number of "entries" more precisely defined, I'll assume you meant simpler lookup tables.) I agree that for attacks the input key space matters more than the output hash size does. (And distribution of actual passwords over that space matters even more.) However, we should and we do have separate fields for these, and we need to put the correct information in them. We shouldn't confuse input key size or its pipe width with output hash size. That's an artificial over-simplification, and it's incorrect. |
Regarding "brute-force attacks are easier / rainbow tables are smaller than you might think from inspecting a leaked shadow file", only someone unfamiliar with password cracking would try to draw conclusions about those things from hash sizes. The only time I recall anyone seriously mention something like this was in Ulrich's |
Solar Designer points out that a phrase used to describe DES-crypt, “Hash size 64 bits (effectively 56),” is misleading. Revert to separate “hash size” and “effective key size” table entries as is used in the Openwall version of this manpage, but omit the “effective key size” table entry if it would be the same number as “hash size.” Correct the `.hash` line for yescrypt as discussed in the review of pull request #27. Also swap two of the entries in the list of attacks that salt defeats, for better prose flow; reference RFC 4648 when talking about ‘the common “base64” encoding’ that hashes don’t use; and tighten up the implementation of the `.hash` macro.
As a quick fix (since I think @besser82 may want to get 4.2.0 out the door soon), I have reintroduced the "effective key size" entry for DES only. We now have, for instance,
versus
Down the road, I have realized that my actual subconscious problem with the terms generated by the And reading your commentary has revealed to me that I don't understand the password-cracking state of the art well enough to write those definitions. I'm a systems programmer at heart, not a cryptographer. Care to have a whack at it? |
Your changes look good to me, thanks! As to defining the terms, I think most are self-explanatory whereas "Effective key size" would take too much space to define (more precisely than these words imply) whereas it's only relevant to a couple of old hashing methods, or maybe even to only one - see below. So I'd rather not do that. As an option, we can leave the only mention of "Effective key size" as:
for BSDI-style hashes. It's not necessary to include this for "Traditional DES-based", where it's implied by:
Also, we lost the "up to" along with making those changes to the macro and its uses. Perhaps we should restore that since 56 bits is only reached for passwords of length 8+. |
Tagging as both "help wanted" and "need more information". I still think we should define the terms we use, but it's still not clear to me how to describe DES, and I still don't understand most of what @solardiz was trying to communicate. I want to find a new contributor who is expert in password hashing and get them to read over the manpage and the discussion history here with fresh eyes, and I want to find a skilled sysadmin who is not expert in password hashing, get them to read the manpage and tell us what parts they understand and what terms they want clarified. |
I think my current corrections in #185 are sufficient. Sure someone (e.g., a sysadmin) could have questions about the exact meaning of "effective key size", but that is not something we can reasonably fully explain in a man page. So I suggest simply merging that PR and then closing this issue. |
As discussed in #27 (comment), the
.hash
macro has been changed somewhere between my revision in Owl (and its later life in ALT Linux?) and libxcrypt's to change the reporting for traditional DES-based hashes from:to:
(There are other differences as well, but those look OK.)
The above change is wrong because a limit on key size doesn't reduce the effective hash size as claimed.
First, a cryptographic 56-bit key to 56-bit hash function would likely produce far more collisions (over its full 56-bit key space) than a cryptographic 56-bit to 64-bit hash function is expected to. If this isn't immediately obvious, please consider extreme cases such as a 1-bit to 1-bit vs. a 1-bit to 64-bit hash function (in the former case, 2 out of 4 possible functions would produce collisions half of the time, whereas in the latter an extremely low percentage of the possible functions would produce collisions), or an 8-bit to 8-bit (expect only ~162 different outputs over the 256 input keys space) vs. an 8-bit to 64-bit hash function (expect no collisions).
Second, even with the key input limited to 56 bits, almost the full 2^64 variety of 64-bit hash outputs are likely reachable via varying key+salt inputs.
The same issue is probably also present for BSDI-style DES-based hashes, where the password input length is not limited to 8 characters, but internally the key size is effectively limited to 56 bits, creating a "narrow pipe" within the hash function (except for the salt input, which is also treated separately). This "narrow pipe" is similarly not the same as an effectively reduced hash output size.
Perhaps this change to
.hash
should be reverted or revised, and some uses of.hash
may need to be revised accordingly. I understand that the lines:as would be reported by my original
.hash
for most hash types didn't look pretty. Maybe we can have the macro omit this reporting for hashes where this is the case, only leaving the reporting for the few special-case hashes?The text was updated successfully, but these errors were encountered: