Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hashing ascii values vs bytes with SHA512 #697

Closed
4 tasks done
ozydingo opened this issue Feb 22, 2020 · 3 comments · Fixed by #698
Closed
4 tasks done

Hashing ascii values vs bytes with SHA512 #697

ozydingo opened this issue Feb 22, 2020 · 3 comments · Fixed by #698

Comments

@ozydingo
Copy link
Contributor

ozydingo commented Feb 22, 2020

ISSUES THAT DO NOT FOLLOW THIS TEMPLATE WILL BE CLOSED IMMEDIATELY.

  • This is not a usage question.
    • Our volunteers' time is limited, so please ask usage questions on
      StackOverflow.
  • This is not a security issue.
  • This bug is reproducible with a clean install of authlogic
  • I am committed to fixing this in a reasonable amount of time, and
    responding promptly to feedback.

Expected Behavior

Hashed passwords should match standard implementations of the SHA512 algorithm. If they do, it would be possible to import hashed passwords into another authentication verification system and still allow valid authentication.

Actual Behavior

The hashing algorithm used by Authlogic needs to be modified, as described below, for hashed passwords to match Google Firebase's method. Doing so results in successful imports.

The SHA512 crypto provider in Authlogic uses the following method to hash passwords:

stretches.times { digest = Digest::SHA512.hexdigest(digest) }

However, the output of Digest::SHA512.hexdigest is an ascii string representing hex byte values. Each round / stretch should, as I understand it, hash the bytes, not the string displaying those bytes as hex.

That is, "01" is hex for 1, but as a string fed directly into hexdigest it actually represent the values [48, 49] (hex [30 31]) as per the ascii encoding. The bytes represented by the hex characters can be obtained with the Array#pack method.

2.3.3 :193 > "01".bytes
 => [48, 49]
2.3.3 :194 > ["01"].pack("H*").bytes
 => [1]

Similarly, when Digest::SHA512.hexdigest returns "da..." that "da", if my understanding is correct, ought to be treated as the hex byte DA and not as the ascii string "da", which is [100, 97].

2.3.3 :199 > ["da"].pack("H*").bytes
 => [218]
2.3.3 :200 > 218.to_s(16)
 => "da"

Google uses the bytes method to hash passwords, and this disconnect is demonstrated in this issue I opened in google's firebase (node) repo. (Skip the long javascript code block and read hiranya911's second response and my comments below, demonstrating the two methods of hashing side by side.) By using the bytes method, I can produce hashed passwords that are successfully imported into Firebase.

The matching ruby method to hash and rehash the password might look more like

digest = Digest::SHA512.hexdigest([digest].pack("H*"))

This is an easy change to made to authlogic's crypto providers, but it's obviously a breaking one. One suggestion to implement this method of hashing would be to provide a new set of SHA-based crypto providers without removing the old ones, e.g. Authlogic::CryptoProviders::Sha512FromBytes. That way authlogic users could optionally configure these crypto providers, but existing users could continue to use what they do.

So here i wanted to get feedback: is this an addition that makes sense to add to Authlogic? What class, configuration, and versioning structure would make sense to roll this out? Just create the new crypto provider classes and allow them to be configured?

@jaredbeck
Copy link
Collaborator

Hi Andrew. Interesting issue! Thanks for taking this on, and for the solid write-up.

Hashed passwords should match standard implementations of the SHA512 algorithm.

I'm not familiar with the standards for key-stretching so I quickly read some docs for pgcrypto, crypt(3), and PBKDF2. I'd be interested in any other suggested reading.

In the interest of me learning the right terminology, it's my understanding that we are using a standard implementation of SHA512 (from ruby's stdlib) but it sounds like this issue is about our key stretching. Is that the right term to use?

However, the output of Digest::SHA512.hexdigest is an ascii string representing hex byte values.

Yes, the docs and my own tests confirm this.

Each round / stretch should, as I understand it, hash the bytes, not the string displaying those bytes as hex.

I expect you are right. pgcrypto's digest function, for example, returns a binary string (bytea). I expect that crypt calls digest repeatedly, not encodeing the bytea into text until the end.

One suggestion to implement this method of hashing would be to provide a new set of SHA-based crypto providers without removing the old ones, e.g. Authlogic::CryptoProviders::Sha512FromBytes.

Yes, I had the same thought. Another possible naming scheme would be eg. Authlogic::CryptoProviders::V2::SHA512. I think I prefer this because it's obvious (or at least suggests) that V2::SHA512 is preferable to SHA512. What do you think?

We should take this opportunity to correct the inconsistent capitalization; SHA is an acronym (Secure Hash Algorithm) and should be capitalized accordingly.

Transition

Users will transition seamlessly with something like:

acts_as_authentic do |c|
    c.crypto_provider = Authlogic::CryptoProviders::V2::SHA512
    c.transition_from_crypto_providers Authlogic::CryptoProviders::Sha512
  end

The term "Crypto Provider"

Incidentally, I've never seen the term "crypto provider" used anywhere else. Postgres would call our providers "Password Hashing Functions" (as compared to "General Hashing Functions"). I'd like to know what terms are standard, and move towards such terms, but that's outside the scope of this issue.

@ozydingo
Copy link
Contributor Author

Thanks for the response! I agree you're right to say the SHA512 implementation is not the problem but the method of key stretching. I'll get to working on a PR when I can sneak in the time, shouldn't be too long.

I expect you are right. pgcrypto's digest function, for example, returns a binary string (bytea). I expect that crypt calls digest repeatedly, not encodeing the bytea into text until the end.

Indeed that seems to be the most succinct method over what I wrote above; call Digest::SHA512.digest and convert to a hex string at the very end. I'll (1) do that for all the "crypto providers" that call Digest::*.hexdigest, (2) use your suggestion for capitalization, and (3) write tests for the V2 versions and transitioning between them.

There is one more missing piece to the change working with imports, though. Google uses salt-first hashing, so imports using salted passwords will still not be compatible. From what I can read neither is "correct" though I've seen more examples of salt-first then salt-last. In any case, this cannot be address with the crypto provider classes since it's determined by Authlogic::ActsAsAuthentic::Password#encrypt_arguments, and I haven't dived in deep enough to know what would need to be built to add that as a configuration that can be handled by the transitioning logic. An issue for another time, for now I'll just work on this piece by itself.

@jaredbeck
Copy link
Collaborator

Released as 6.0.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants