Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V2 SHA and MD5 crypto providers #698

Merged
merged 6 commits into from
Mar 9, 2020
Merged

Conversation

ozydingo
Copy link
Contributor

Fixes #697

Andrew Schwartz added 2 commits February 29, 2020 14:31
Copy link
Collaborator

@jaredbeck jaredbeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great start, thanks Andrew.

My only major question at this point is about naming, because we only get one chance to get it right. CryptoProviders::MD5::V2 is a possible alternative.

  • Having the version number after the name is closer to normal thought. For example, no one says "v6 rails", we say "rails v6".
  • Also, what if we need a MD5::V3 in the future, but don't need a V3 of anything else? It feels awkward to have a V3 folder with only one thing in it.

Neither of these points is very strong on their own. What do you think?

I think we should also discuss having the V2 classes inherit from V1.

class MD5
  class V2 < MD5
    def encrypt

By drastically reducing duplication, we make it much easier for someone reading the code to compare v1 and v2.

@@ -30,6 +30,15 @@ module CryptoProviders
autoload :BCrypt, "authlogic/crypto_providers/bcrypt"
autoload :SCrypt, "authlogic/crypto_providers/scrypt"

# V2 crypto providers fix their predecessors' encryption schemes by
# hashing byte strings instead of the ehx strings output by `hexdigest`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# hashing byte strings instead of the ehx strings output by `hexdigest`
# hashing byte strings instead of the hex strings output by `hexdigest`

Authlogic::CryptoProviders::V2::SHA256
]
transition_password_to(providers[0], ben)
providers.each_cons(2) do |old_provider, new_provider|
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow each_cons is perfect for this. Nice!

digest = "51563330eb60e0eeb89759b01f08e872"
Authlogic::CryptoProviders::V2::MD5.stretches = 1
assert Authlogic::CryptoProviders::V2::MD5.matches?(digest, nil, salt, password, nil)
Authlogic::CryptoProviders::V2::MD5.stretches = 10
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the stretches = 10 is restoring previous config, please move it to a teardown method, to ensure it happens.

password = "test"
salt = "7e3041ebc2fc05a40c60028e2c4901a81035d3cd"
digest = "51563330eb60e0eeb89759b01f08e872"
Authlogic::CryptoProviders::V2::MD5.stretches = 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like how this test asserts a specific digest. Should we (also?) test with more than one stretch? If we only test one stretch, we're not testing the purpose of V2, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well that's a very good point. I copied this pattern from the V1 tests, which didn't include a test for MD5, without quite enough thought.

Really I think a better thing to do is to (1) test encrypt twice with two appropriately named methods that indicate they are testing different values of stretches, both with specific digests, and (2) only test matches? once now that stretches has been covered by (1). What do you think?

Now I'm just trying to look into MiniTest doubles to see if it's worth asserting the number of calls made on the encryption methods or to just assert values of the different hashes they produce.

I can also make the corresponding changes to the V1 tests, including adding a test for MD5, in a separate commit. Would it be better to do so in a separate PR entirely?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.. to (1) test encrypt twice with .. different values of stretches, both with specific digests ..

Sounds good.

.. MiniTest doubles .. or to just assert values of the different hashes they produce(?)

I'd just assert the hashes.

I can also make the corresponding changes to the V1 tests, including adding a test for MD5, in a separate commit. Would it be better to do so in a separate PR entirely?

It's up to you. Either way is fine with me.

@jaredbeck
Copy link
Collaborator

Incidentally, as a result of this PR, I ordered a copy of "Cryptography Engineering" (Ferguson, Schneier, Kohno, 2010) and there's a nice overview of Salting and Stretching starting on p. 304. It doesn't tell us, explicitly, to hash the bytes instead of the hex-encoded string, but (to me) it is implied. (I was already convinced anyway from our research in #697) It also doesn't explicitly say whether to prepend or append the salt, probably because it doesn't matter, and also perhaps because there is no standard?

@ozydingo
Copy link
Contributor Author

ozydingo commented Mar 5, 2020

My only major question at this point is about naming, because we only get one chance to get it right. CryptoProviders::MD5::V2 is a possible alternative.

  • Having the version number after the name is closer to normal thought. For example, no one says "v6 rails", we say "rails v6".

I agree on the name being nicer. I don't like the class MD5 serving double-duty as the namespace for the V2 class though. However, see next:

  • Also, what if we need a MD5::V3 in the future, but don't need a V3 of anything else? It feels awkward to have a V3 folder with only one thing in it.

I don't actually feel awkward at all about a V3 folder with only one thing, but a very closely related thought take the argument for me. While here V2 is synonymous with "hash bytes not hex strings", that's just because it happens to be the first versioning change. There's no specific reason that ``V3::MD5is v3 for the same reason thatV3::SHA512` is v3. Or, perhaps more likely, if a `V2::BCrypt` is introduced, it's not going to be for the same reason as these `V2` classes. `V2` is not the concern that unites these subclasses, and so should not be the parent namespace.

I'll make the naming change.

@ozydingo
Copy link
Contributor Author

ozydingo commented Mar 5, 2020

I think we should also discuss having the V2 classes inherit from V1.

class MD5
  class V2 < MD5
    def encrypt

Personally I'm a pretty big stickler on "subclassing should be an is_a relationship". I have always found that violating that for benefit of reusing code has led to more confusion in the long run. Here, an MD5::V2 is not an MD5 V1. Were I to design it from scratch, MD5 might be a module included by both MD5::V1 and MD5::V2, but we obviously can't make that change. to what MD5 is.

This is expanding the scope here a bit, but what we could do is define a module for the common stretches, and matches? behavior. That could look something like

module CryptoProviders
  module SHA
    def self.included(klass)
      klass.attr_accessor :join_token
      klass.attr_writer :stretches
    end

    # The number of times to loop through the encryption.
    def stretches
      @stretches ||= 20
    end

    # Does the crypted password match the tokens? Uses the same tokens that
    # were used to encrypt.
    def matches?(crypted, *tokens)
      encrypt(*tokens) == crypted
    end
  end
end

module CryptoProviders
  class SHA256
    class << self
      include SHA

      # Turns your raw password into a Sha256 hash.
      def encrypt(*tokens)
        digest = tokens.flatten.join(join_token)
        stretches.times { digest =  Digest::SHA256.hexdigest(digest)}
        digest
      end
    end
  end
end

module CryptoProviders
  class SHA256::V2
    class << self
      include SHA

      # Turns your raw password into a Sha256 hash.
      def encrypt(*tokens)
        digest = tokens.flatten.join(join_token)
        stretches.times { digest =  Digest::SHA256.digest(digest)}
        digest.unpack("H*")[0]
      end
    end
  end
end

To be honest I'm not sure it's enough duplication to justify the additional redirection, but I don't have a super strong opinion here and am happy to take this in either direction.

@jaredbeck
Copy link
Collaborator

I think we should also discuss having the V2 classes inherit from V1.

Personally I'm a pretty big stickler on "subclassing should be an is_a relationship". ..
I'm not sure it's enough duplication to justify the additional redirection ..

That's fine, let's not worry about it for now. We can always come back and address the duplication later if we change our minds.

Andrew Schwartz added 2 commits March 7, 2020 08:49
… V2::MD5. Requires using the old-cased class names, such as Sha1::V2, instead of SHA1.
@ozydingo
Copy link
Contributor Author

ozydingo commented Mar 7, 2020

Ok, this has been updated with the discussed changes. Note:

  • I had to. ditch the capitalization of SHA because the existing classes that we are using now as namespaces were Sha. I considered simply creating new namespaces with the full caps, but then MD5 becomes the odd-man-out, and that just felt like it was introducing needless confusion.
  • I've added the autoload of the V2 classes to the V1 class definitions out of necessity. E.g. the. MD5 class has the autoload directive for MD5::V2. This works because any reference to MD5::V2 must first load the definition of MD5. I say out of necessity because as far as I can tell, autoload can only specify const name at the current namespace, and I can't get the MD5 namespace, for example, without first removing the purpose of the autoload in t he. first place. Not certain if that's really worth keeping, but I didn't want that decision to be a part of this PR.

@ozydingo
Copy link
Contributor Author

ozydingo commented Mar 7, 2020

I can also happily squash 266f9c4 into 08b04bc, but wanted to leave the history alone for the benefit of the PR.

@jaredbeck jaredbeck merged commit dc1998b into binarylogic:master Mar 9, 2020
@jaredbeck
Copy link
Collaborator

Merged, thanks Andrew. I might think some more on 1) capitalization of acronyms, and 2) reducing duplication. If I change either, I'll @ you in a PR.

Looking at the changelog, I guess our next release will be 6.0.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hashing ascii values vs bytes with SHA512
2 participants