Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HMAC-based Extract-and-Expand Key Derivation Function (HKDF) #26147

Closed
cocowalla opened this issue May 11, 2018 · 16 comments
Closed

HMAC-based Extract-and-Expand Key Derivation Function (HKDF) #26147

cocowalla opened this issue May 11, 2018 · 16 comments
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Security
Milestone

Comments

@cocowalla
Copy link

At present, System.Security.Cryptography has implementations of password-based key derivation functions (KDFs), specifically PBKDF1 and PBKDF2.

There is a need to also support non-password based KDFs that derive keys from other secrets or cryptographic key material without the intentional slowing of computation that is so desirable in password-based KDFs.

HMAC-based Extract-and-Expand Key Derivation Function (HKDF) has a simple design, and is one of the most prevalent non-password based KDFs in use today. I'm proposing adding an HKDF implementation in the System.Security.Cryptography namespace.

The RFC has full details, as does Krawczyk's original paper, but I'll supply a brief overview here. HKDF has 2 steps, the 1st of which can be omitted if you already have cryptographically strong key material:

  1. Extract: given a secret that is not uniformly random, and an optional salt, derive cryptographically strong key material - effectively 'concentrating' entropy from the input into a uniformly random key
  2. Expand: given cryptographically strong key material (such as that output from the Extract step) and an optional label/context, derive a cryptographically strong key of the desired length. By varying the label/context, unique keys are derived from the same input key material

I already have an implementation, which includes tests against the test vectors given in the RFC, and would be happy to submit a PR if you would be interested in having it.

The API looks like this:

namespace System.Security.Cryptography
{
    public static class HKDF
    {
        public static byte[] Extract(HashAlgorithmName hashAlgorithmName, byte[] ikm, byte[] salt = null) { throw null; }
        public static int Extract(HashAlgorithmName hashAlgorithmName, ReadOnlySpan<byte> ikm, ReadOnlySpan<byte> salt, Span<byte> prk) { throw null; }
        public static byte[] Expand(HashAlgorithmName hashAlgorithmName, byte[] prk, int outputLength, byte[] info = null) { throw null; }
        public static void Expand(HashAlgorithmName hashAlgorithmName, ReadOnlySpan<byte> prk, ReadOnlySpan<byte> info, Span<byte> output) { throw null; }
        public static byte[] DeriveKey(HashAlgorithmName hashAlgorithmName, byte[] ikm, int outputLength, byte[] salt = null, byte[] info = null) { throw null; }
        public static void DeriveKey(HashAlgorithmName hashAlgorithmName, ReadOnlySpan<byte> ikm, ReadOnlySpan<byte> salt, ReadOnlySpan<byte> info, Span<byte> output) { throw null; }
    }
}

It might be worth also adding a method at a higher abstraction level that performs both steps in one call:
byte[] DeriveBytes(byte[] ikm, int outputLen, byte[] salt = null, byte[] info = null);

@bartonjs, I think this is your domain - what are your thoughts on this? Interested in a PR?

@bartonjs
Copy link
Member

@cocowalla Before adding support for a new algorithm there's an internal review process we have to go through, so no PR just yet :).

As for the API, it would probably be Span-based.

public class HKDF
{
    public HKDF(HashAlgorithmName hashAlgorithm);

    // Maybe also ReadOnlySpan<byte>, ReadOnlySpan<byte> => byte[], if it seems worthwhile.
    bool TryExtract(ReadOnlySpan<byte> ikm, ReadOnlySpan<byte> salt, Span<byte> prk, out int bytesWritten);
    // Fills destination, however big it is.
    void Expand(ReadOnlySpan<byte> prk, ReadOnlySpan<byte> info, Span<byte> destination);
    // Fills destination, however big it is.
    void DeriveKey(ReadOnlySpan<byte> ikm, ReadOnlySpan<byte> salt, ReadOnlySpan<byte> info, Span<byte> destination);
}

That's assuming that it even makes sense to have instances of HKDF. Maybe it should be static methods and move the HashAlgorithmName into a parameter.

(I also got rid of salt and info having a default of null/empty, since the RFC says they are allowed to be omitted, but strongly recommends them)

@cocowalla
Copy link
Author

If you want a strict implementation of the RFC, salt and info should be optional - but I don't feel too strongly about that; I'd personally always use them both, and if someone really wants they can just supply an empty byte arrays (or Spans!).

Not sure about DeriveKey though. OK, so it's a Key Derivation Function, but it's common to derive things like initialisation vectors too. My thinking with DeriveBytes is that it's in keeping with the existing PBKDF1 and PBKDF2 implementations. I don't feel strongly about this either though.

On static vs instance: HKDF doesn't really need state, but it would be extremely rare to use a different hash algorithm for the Extract and Expand phases - in part because the input key material for Expand must be at least the same length as the hash it uses. The RFC certainly doesn't make any mention of using 2 different algorithms. I think it makes sense to have a ctor that takes a HashAlgorithmName.

I don't really understand why TryExtract instead of Extract; in what cases would it make sense to return false rather than throw? Also, that signature seems more complex than it needs to be. Personally, I'd prefer it if it just returned a byte array (or Span!) of the correct length - the way you have it, the caller needs to ensure that they pass a byte array or Span that is >= the hash output length.

I'd personally prefer that Expand took an int length as a parameter, just so the implementation stays 'closer' to the RFC.

I haven't worked with Spans yet, but I believe they are only present in .NET Core 2.1 at present?

Is it a 'policy' for new classes to only use Spans, or to have both byte array and Span-based methods?

@bartonjs
Copy link
Member

bartonjs commented May 11, 2018

If you want a strict implementation of the RFC, salt and info should be optional ... if someone really wants they can just supply an empty byte arrays (or Spans!).

Yep, the test vectors don't talk about a "missing" salt, they talk about an "empty" salt. So ReadOnlySpan.Empty.

On static vs instance: HKDF doesn't really need state ...

I guess it doesn't matter much. If the type is IDisposable then it can open the hasher in the ctor and close it in Dispose, which would be better than opening it twice.

I don't really understand why TryExtract instead of Extract; in what cases would it make sense to return false rather than throw?

When destination is too short. The main scenario is when you already have a big buffer, and you're wanting to write something down in what remains of it. It doesn't entirely make sense for HKDF, but it's the general pattern for write-to-Span (like Windows ERROR_MORE_DATA HRESULT). If HKDF is an instance class then the function it's passed to might not have any idea what hash function was chosen, and they happen to have this nice 32 byte chunk remaining... oh, no, someone picked SHA384.

I'd prefer it if it just returned a byte array (or Span!) of the correct length

Returning a Span doesn't make a lot of sense here. When a Span is a return it usually means it is a sub-range of the input data. If it was going to allocate a new array for this it would be better to be honest and return it as an array. (But then the question is: why do we need to wake the GC up to allocate an array for this 32-byte artifact when we could have just written it to a stackalloc Span to continue the rest of the computation?)

the way you have it, the caller needs to ensure that they pass a byte array or Span that is >= the hash output length

Yep, hence the Try prefix. If it makes sense to have the allocating return as well as the "write the answer here" function they could both be added. It'd be "never fails" and "causes fewer GC events".

I'd personally prefer that Expand took an int length as a parameter

You can think of Span as a triplet of { array, offset, length }. It's more powerful than that, but you can think of it that way. So int length is a parameter, it's just contained within the Span structure.

I haven't worked with Spans yet, but I believe they are only present in .NET Core 2.1 at present?

Essentially. The NuGet package is available for netfx, but the vast majority of Span-utilizing API is in netcoreapp21.

Is it a 'policy' for new classes to only use Spans, or to have both byte array and Span-based methods?

If it's input it doesn't usually make sense to have both (arrays are implicitly convertible to Span and ReadOnlySpan) -- the exception being when something "better" can be done for an array. For output it's more context-specific. "Write this to here" has the "how much was written?" out value (if it's not "fill'er up") and the "what if it's too short?" problem. Array-creating-and-returning things have the problem that they can become limiting factors for throughput, because they mean more time is spent in garbage collection. Mostly where we have two ways of doing things it's from "before Span" and "after/with Span", but they can also be thought of as "easier" and "more performant at the expense of ease".

For a "fill'er up" API, the difference is

byte[] derived = hkdf.DeriveWhatever(ikm, salt, info, 32);

and

byte[] derived = new byte[32];
hkdf.DeriveWhatever(ikm, salt, info, derived);

which doesn't look too much different. The latter allows for

byte[] derived = new byte[32];
hkdf.DeriveWhatever(ikm, salt, info1, derived.Slice(0, 16));
hkdf.DeriveWhatever(ikm, salt, info2, derived.Slice(16));
// use the key which was two 16-byte concatenations with different info values

vs

byte[] firstHalf = hkdf.DeriveWhatever(ikm, salt, info1, 16);
byte[] secondHalf = hkdf.DeriveWhatever(ikm, salt, info2, 16);
// Or however.
byte[] derived = firstHalf.Concat(secondHalf).ToArray();
// Now use the key.  firstHalf and secondHalf may have contributed to heap fragmentation

Span also lets you do some more interesting things, like avoid the GC altogether:

public void DeriveWhatever(ReadOnlySpan<byte> ikm, ReadOnlySpan<byte> salt, ReadOnlySpan<byte> info, Span<byte> destination)
{
    Span<byte> prk = stackalloc byte[512 / 8];

    if (!TryExtract(ikm, salt, prk, out int bytesWritten))
    {
        throw new CryptographicException("What hash were you even using?!");
    }

    Expand(prk.Slice(0, bytesWritten), info, destination);
    // This method, in and of itself, contributed no GC overhead.
    // Unless you use some future hash > 512-bit, then it allocated an exception.
}

@cocowalla
Copy link
Author

@bartonjs, OK, your API suggestion makes more sense in the context of Spans - thanks for taking the time to elucidate.

I work with byte arrays a lot, so I can see that Span is going to be really useful for reducing allocations.

@krwq
Copy link
Member

krwq commented Sep 16, 2019

I'm not sure if HashAlgorithmName would be a good choice for the input considering that COSE uses HKDF with PRF (see Table 12) but I guess it would be acceptable to call it a hashing function given that HKDF spec asks for a hashing function

It might also be worth taking enum HashAlgorithmType as an input instead of struct over string although namespace might be arguable.

@bartonjs
Copy link
Member

It might also be worth taking enum HashAlgorithmType as an input instead of struct over string although namespace might be arguable.

Definitely not 😄. That's a TLS-related enum. HashAlgorithmName (the strongly-typed string) is the way that we communicate cryptographic digest algorithms (and there are accelerators, so it looks like an enum).

I'm not sure if HashAlgorithmName would be a good choice for the input considering that COSE uses HKDF with PRF (see Table 12)

HKDF itself is "pick a hash, use HMAC":

HKDF-Extract(salt, IKM) => HMAC{Hash}(salt, IKM)

HKDF-Expand also uses HMAC{Hash}; but it's not a one-liner 😄.

I don't see a formal definition of "HKDF with AES-MAC"; but since that's creating HKDF with a different sort of flavor, maybe instead of constructors it needs static factories (to name why they're different).

@orip
Copy link

orip commented Oct 2, 2019

If you want a strict implementation of the RFC, salt and info should be optional ... if someone really wants they can just supply an empty byte arrays (or Spans!).

Yep, the test vectors don't talk about a "missing" salt, they talk about an "empty" salt. So ReadOnlySpan.Empty.

The RFC specifically mentions the case of a missing salt and provides test vectors for both an empty salt and a missing salt.

      salt     optional salt value (a non-secret random value);
               if not provided, it is set to a string of HashLen zeros.

...

A.6.  Test Case 6

   Test with SHA-1 and zero-length salt/info

   Hash = SHA-1
   IKM  = 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b (22 octets)
   salt = (0 octets)
   info = (0 octets)
   L    = 42

   PRK  = 0xda8c8a73c7fa77288ec6f5e7c297786aa0d32d01 (20 octets)
   OKM  = 0x0ac1af7002b3d761d1e55298da9d0506
          b9ae52057220a306e07b6b87e8df21d0
          ea00033de03984d34918 (42 octets)

A.7.  Test Case 7

   Test with SHA-1, salt not provided (defaults to HashLen zero octets),
   zero-length info

   Hash = SHA-1
   IKM  = 0x0c0c0c0c0c0c0c0c0c0c0c0c0c0c0c0c0c0c0c0c0c0c (22 octets)
   salt = not provided (defaults to HashLen zero octets)
   info = (0 octets)
   L    = 42

   PRK  = 0x2adccada18779e7c2077ad2eb19d3f3e731385dd (20 octets)
   OKM  = 0x2c91117204d745f3500d636a62f64f0a
          b3bae548aa53d423b0d1f27ebba6f5e5
          673a081d70cce7acfc48 (42 octets)

@krwq krwq self-assigned this Oct 8, 2019
@krwq
Copy link
Member

krwq commented Oct 8, 2019

I'll start looking into this now. Note I will need to do some more reading before I make any progress since I'm not too familiar with implementation details

@cocowalla
Copy link
Author

@krwq I have a non-spanified, "pure" RFC implementation I can share if you'd like - it's simple and documented referencing the RFC.

@krwq
Copy link
Member

krwq commented Oct 8, 2019

@cocowalla if you already have it feel free to go ahead and create a PR (mark it as a draft since there was no API review yet). Once I catch up with the details I'll help with the API review and possibly with finishing up if needed.

We will likely need to add spanified version though but we can start iterating on your code since it shouldn't be hard to convert it

@cocowalla
Copy link
Author

@krwq done, dotnet/corefx#41651!

@krwq
Copy link
Member

krwq commented Oct 8, 2019

Thanks! I'll take a look once I got better understanding on the area (need to finish reading theory and see how other implementation shaped their APIs and see if there are any pitfalls or anything we should watch out for when designing APIs)

@krwq
Copy link
Member

krwq commented Oct 21, 2019

I've talked with @bartonjs offline on instance vs static and we believe there is not really much value in having the instance over static methods. The top post was edited with the API proposal.

We've also discussed how the design in case we need to generate multiple keys (or use HKDF as pseudo random generator). In that case we will create API which takes similar inputs to DeriveKey but will return stream which will be producing bytes but that is out of scope here (especially we're not sure anyone actually needs it).

@orip
Copy link

orip commented Oct 22, 2019

... (or use HKDF as pseudo random generator). ...

HKDF has a maximum output, see section 2.3 in the RFC:
L length of output keying material in octets (<= 255*HashLen)
This should probably be asserted at runtime, and makes HKDF a bad candidate for a PRNG.

@krwq
Copy link
Member

krwq commented Oct 22, 2019

@orip yes, this will be asserted.

Let's worry about it when someone needs it (they can still pass large byte array right now).

Per spec there are some valid scenarios though:

HKDF is intended for use in a wide variety of KDF applications.
   These include the building of pseudorandom generators from imperfect
   sources of randomness (such as a physical random number generator
   (RNG)); the generation of pseudorandomness out of weak sources of
   randomness, such as entropy collected from system events, user's
   keystrokes, etc.

@terrajobst
Copy link
Member

terrajobst commented Nov 12, 2019

Video

  • Expand. We should align the optionality of the info parameter between the span version and the byte array version.
    • This might cause ambiguity. We decided that optionality is omitted from the span versions
    • However, we should align the order to outputLength and output. We should swap output and info in the span-based version.
namespace System.Security.Cryptography
{
    public static class HKDF
    {
        public static byte[] Extract(HashAlgorithmName hashAlgorithmName, byte[] ikm, byte[] salt = null);
        public static int Extract(HashAlgorithmName hashAlgorithmName, ReadOnlySpan<byte> ikm, ReadOnlySpan<byte> salt, Span<byte> prk);

        public static byte[] Expand(HashAlgorithmName hashAlgorithmName, byte[] prk, int outputLength, byte[] info = null);
        public static void Expand(HashAlgorithmName hashAlgorithmName, ReadOnlySpan<byte> prk, Span<byte> output, ReadOnlySpan<byte> info);

        public static byte[] DeriveKey(HashAlgorithmName hashAlgorithmName, byte[] ikm, int outputLength, byte[] salt = null, byte[] info = null);
        public static void DeriveKey(HashAlgorithmName hashAlgorithmName, ReadOnlySpan<byte> ikm, Span<byte> output, ReadOnlySpan<byte> salt, ReadOnlySpan<byte> info);
    }
}

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Security
Projects
None yet
Development

No branches or pull requests

6 participants