HMAC-based Extract-and-Expand Key Derivation Function (HKDF) #26147

cocowalla · 2018-05-11T14:45:13Z

At present, System.Security.Cryptography has implementations of password-based key derivation functions (KDFs), specifically PBKDF1 and PBKDF2.

There is a need to also support non-password based KDFs that derive keys from other secrets or cryptographic key material without the intentional slowing of computation that is so desirable in password-based KDFs.

HMAC-based Extract-and-Expand Key Derivation Function (HKDF) has a simple design, and is one of the most prevalent non-password based KDFs in use today. I'm proposing adding an HKDF implementation in the System.Security.Cryptography namespace.

The RFC has full details, as does Krawczyk's original paper, but I'll supply a brief overview here. HKDF has 2 steps, the 1st of which can be omitted if you already have cryptographically strong key material:

Extract: given a secret that is not uniformly random, and an optional salt, derive cryptographically strong key material - effectively 'concentrating' entropy from the input into a uniformly random key
Expand: given cryptographically strong key material (such as that output from the Extract step) and an optional label/context, derive a cryptographically strong key of the desired length. By varying the label/context, unique keys are derived from the same input key material

I already have an implementation, which includes tests against the test vectors given in the RFC, and would be happy to submit a PR if you would be interested in having it.

The API looks like this:

namespace System.Security.Cryptography
{
    public static class HKDF
    {
        public static byte[] Extract(HashAlgorithmName hashAlgorithmName, byte[] ikm, byte[] salt = null) { throw null; }
        public static int Extract(HashAlgorithmName hashAlgorithmName, ReadOnlySpan<byte> ikm, ReadOnlySpan<byte> salt, Span<byte> prk) { throw null; }
        public static byte[] Expand(HashAlgorithmName hashAlgorithmName, byte[] prk, int outputLength, byte[] info = null) { throw null; }
        public static void Expand(HashAlgorithmName hashAlgorithmName, ReadOnlySpan<byte> prk, ReadOnlySpan<byte> info, Span<byte> output) { throw null; }
        public static byte[] DeriveKey(HashAlgorithmName hashAlgorithmName, byte[] ikm, int outputLength, byte[] salt = null, byte[] info = null) { throw null; }
        public static void DeriveKey(HashAlgorithmName hashAlgorithmName, ReadOnlySpan<byte> ikm, ReadOnlySpan<byte> salt, ReadOnlySpan<byte> info, Span<byte> output) { throw null; }
    }
}

It might be worth also adding a method at a higher abstraction level that performs both steps in one call:
byte[] DeriveBytes(byte[] ikm, int outputLen, byte[] salt = null, byte[] info = null);

@bartonjs, I think this is your domain - what are your thoughts on this? Interested in a PR?

The text was updated successfully, but these errors were encountered:

bartonjs · 2018-05-11T15:11:45Z

@cocowalla Before adding support for a new algorithm there's an internal review process we have to go through, so no PR just yet :).

As for the API, it would probably be Span-based.

public class HKDF
{
    public HKDF(HashAlgorithmName hashAlgorithm);

    // Maybe also ReadOnlySpan<byte>, ReadOnlySpan<byte> => byte[], if it seems worthwhile.
    bool TryExtract(ReadOnlySpan<byte> ikm, ReadOnlySpan<byte> salt, Span<byte> prk, out int bytesWritten);
    // Fills destination, however big it is.
    void Expand(ReadOnlySpan<byte> prk, ReadOnlySpan<byte> info, Span<byte> destination);
    // Fills destination, however big it is.
    void DeriveKey(ReadOnlySpan<byte> ikm, ReadOnlySpan<byte> salt, ReadOnlySpan<byte> info, Span<byte> destination);
}

That's assuming that it even makes sense to have instances of HKDF. Maybe it should be static methods and move the HashAlgorithmName into a parameter.

(I also got rid of salt and info having a default of null/empty, since the RFC says they are allowed to be omitted, but strongly recommends them)

cocowalla · 2018-05-11T15:39:10Z

If you want a strict implementation of the RFC, salt and info should be optional - but I don't feel too strongly about that; I'd personally always use them both, and if someone really wants they can just supply an empty byte arrays (or Spans!).

Not sure about DeriveKey though. OK, so it's a Key Derivation Function, but it's common to derive things like initialisation vectors too. My thinking with DeriveBytes is that it's in keeping with the existing PBKDF1 and PBKDF2 implementations. I don't feel strongly about this either though.

On static vs instance: HKDF doesn't really need state, but it would be extremely rare to use a different hash algorithm for the Extract and Expand phases - in part because the input key material for Expand must be at least the same length as the hash it uses. The RFC certainly doesn't make any mention of using 2 different algorithms. I think it makes sense to have a ctor that takes a HashAlgorithmName.

I don't really understand why TryExtract instead of Extract; in what cases would it make sense to return false rather than throw? Also, that signature seems more complex than it needs to be. Personally, I'd prefer it if it just returned a byte array (or Span!) of the correct length - the way you have it, the caller needs to ensure that they pass a byte array or Span that is >= the hash output length.

I'd personally prefer that Expand took an int length as a parameter, just so the implementation stays 'closer' to the RFC.

I haven't worked with Spans yet, but I believe they are only present in .NET Core 2.1 at present?

Is it a 'policy' for new classes to only use Spans, or to have both byte array and Span-based methods?

bartonjs · 2018-05-11T16:47:03Z

If you want a strict implementation of the RFC, salt and info should be optional ... if someone really wants they can just supply an empty byte arrays (or Spans!).

Yep, the test vectors don't talk about a "missing" salt, they talk about an "empty" salt. So ReadOnlySpan.Empty.

On static vs instance: HKDF doesn't really need state ...

I guess it doesn't matter much. If the type is IDisposable then it can open the hasher in the ctor and close it in Dispose, which would be better than opening it twice.

I don't really understand why TryExtract instead of Extract; in what cases would it make sense to return false rather than throw?

When destination is too short. The main scenario is when you already have a big buffer, and you're wanting to write something down in what remains of it. It doesn't entirely make sense for HKDF, but it's the general pattern for write-to-Span (like Windows ERROR_MORE_DATA HRESULT). If HKDF is an instance class then the function it's passed to might not have any idea what hash function was chosen, and they happen to have this nice 32 byte chunk remaining... oh, no, someone picked SHA384.

I'd prefer it if it just returned a byte array (or Span!) of the correct length

Returning a Span doesn't make a lot of sense here. When a Span is a return it usually means it is a sub-range of the input data. If it was going to allocate a new array for this it would be better to be honest and return it as an array. (But then the question is: why do we need to wake the GC up to allocate an array for this 32-byte artifact when we could have just written it to a stackalloc Span to continue the rest of the computation?)

the way you have it, the caller needs to ensure that they pass a byte array or Span that is >= the hash output length

Yep, hence the Try prefix. If it makes sense to have the allocating return as well as the "write the answer here" function they could both be added. It'd be "never fails" and "causes fewer GC events".

I'd personally prefer that Expand took an int length as a parameter

You can think of Span as a triplet of { array, offset, length }. It's more powerful than that, but you can think of it that way. So int length is a parameter, it's just contained within the Span structure.

I haven't worked with Spans yet, but I believe they are only present in .NET Core 2.1 at present?

Essentially. The NuGet package is available for netfx, but the vast majority of Span-utilizing API is in netcoreapp21.

Is it a 'policy' for new classes to only use Spans, or to have both byte array and Span-based methods?

If it's input it doesn't usually make sense to have both (arrays are implicitly convertible to Span and ReadOnlySpan) -- the exception being when something "better" can be done for an array. For output it's more context-specific. "Write this to here" has the "how much was written?" out value (if it's not "fill'er up") and the "what if it's too short?" problem. Array-creating-and-returning things have the problem that they can become limiting factors for throughput, because they mean more time is spent in garbage collection. Mostly where we have two ways of doing things it's from "before Span" and "after/with Span", but they can also be thought of as "easier" and "more performant at the expense of ease".

For a "fill'er up" API, the difference is

byte[] derived = hkdf.DeriveWhatever(ikm, salt, info, 32);

and

byte[] derived = new byte[32];
hkdf.DeriveWhatever(ikm, salt, info, derived);

which doesn't look too much different. The latter allows for

byte[] derived = new byte[32];
hkdf.DeriveWhatever(ikm, salt, info1, derived.Slice(0, 16));
hkdf.DeriveWhatever(ikm, salt, info2, derived.Slice(16));
// use the key which was two 16-byte concatenations with different info values

vs

byte[] firstHalf = hkdf.DeriveWhatever(ikm, salt, info1, 16);
byte[] secondHalf = hkdf.DeriveWhatever(ikm, salt, info2, 16);
// Or however.
byte[] derived = firstHalf.Concat(secondHalf).ToArray();
// Now use the key.  firstHalf and secondHalf may have contributed to heap fragmentation

Span also lets you do some more interesting things, like avoid the GC altogether:

public void DeriveWhatever(ReadOnlySpan<byte> ikm, ReadOnlySpan<byte> salt, ReadOnlySpan<byte> info, Span<byte> destination)
{
    Span<byte> prk = stackalloc byte[512 / 8];

    if (!TryExtract(ikm, salt, prk, out int bytesWritten))
    {
        throw new CryptographicException("What hash were you even using?!");
    }

    Expand(prk.Slice(0, bytesWritten), info, destination);
    // This method, in and of itself, contributed no GC overhead.
    // Unless you use some future hash > 512-bit, then it allocated an exception.
}

cocowalla · 2018-05-12T06:47:28Z

@bartonjs, OK, your API suggestion makes more sense in the context of Spans - thanks for taking the time to elucidate.

I work with byte arrays a lot, so I can see that Span is going to be really useful for reducing allocations.

krwq · 2019-09-16T23:06:52Z

I'm not sure if HashAlgorithmName would be a good choice for the input considering that COSE uses HKDF with PRF (see Table 12) but I guess it would be acceptable to call it a hashing function given that HKDF spec asks for a hashing function

It might also be worth taking enum HashAlgorithmType as an input instead of struct over string although namespace might be arguable.

bartonjs · 2019-09-16T23:50:56Z

It might also be worth taking enum HashAlgorithmType as an input instead of struct over string although namespace might be arguable.

Definitely not 😄. That's a TLS-related enum. HashAlgorithmName (the strongly-typed string) is the way that we communicate cryptographic digest algorithms (and there are accelerators, so it looks like an enum).

I'm not sure if HashAlgorithmName would be a good choice for the input considering that COSE uses HKDF with PRF (see Table 12)

HKDF itself is "pick a hash, use HMAC":

HKDF-Extract(salt, IKM) => HMAC{Hash}(salt, IKM)

HKDF-Expand also uses HMAC{Hash}; but it's not a one-liner 😄.

I don't see a formal definition of "HKDF with AES-MAC"; but since that's creating HKDF with a different sort of flavor, maybe instead of constructors it needs static factories (to name why they're different).

orip · 2019-10-02T07:16:22Z

If you want a strict implementation of the RFC, salt and info should be optional ... if someone really wants they can just supply an empty byte arrays (or Spans!).

Yep, the test vectors don't talk about a "missing" salt, they talk about an "empty" salt. So ReadOnlySpan.Empty.

The RFC specifically mentions the case of a missing salt and provides test vectors for both an empty salt and a missing salt.

      salt     optional salt value (a non-secret random value);
               if not provided, it is set to a string of HashLen zeros.

...

A.6.  Test Case 6

   Test with SHA-1 and zero-length salt/info

   Hash = SHA-1
   IKM  = 0x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b (22 octets)
   salt = (0 octets)
   info = (0 octets)
   L    = 42

   PRK  = 0xda8c8a73c7fa77288ec6f5e7c297786aa0d32d01 (20 octets)
   OKM  = 0x0ac1af7002b3d761d1e55298da9d0506
          b9ae52057220a306e07b6b87e8df21d0
          ea00033de03984d34918 (42 octets)

A.7.  Test Case 7

   Test with SHA-1, salt not provided (defaults to HashLen zero octets),
   zero-length info

   Hash = SHA-1
   IKM  = 0x0c0c0c0c0c0c0c0c0c0c0c0c0c0c0c0c0c0c0c0c0c0c (22 octets)
   salt = not provided (defaults to HashLen zero octets)
   info = (0 octets)
   L    = 42

   PRK  = 0x2adccada18779e7c2077ad2eb19d3f3e731385dd (20 octets)
   OKM  = 0x2c91117204d745f3500d636a62f64f0a
          b3bae548aa53d423b0d1f27ebba6f5e5
          673a081d70cce7acfc48 (42 octets)

krwq · 2019-10-08T18:29:31Z

I'll start looking into this now. Note I will need to do some more reading before I make any progress since I'm not too familiar with implementation details

cocowalla · 2019-10-08T19:12:21Z

@krwq I have a non-spanified, "pure" RFC implementation I can share if you'd like - it's simple and documented referencing the RFC.

krwq · 2019-10-08T19:50:47Z

@cocowalla if you already have it feel free to go ahead and create a PR (mark it as a draft since there was no API review yet). Once I catch up with the details I'll help with the API review and possibly with finishing up if needed.

We will likely need to add spanified version though but we can start iterating on your code since it shouldn't be hard to convert it

cocowalla · 2019-10-08T20:36:33Z

@krwq done, dotnet/corefx#41651!

krwq · 2019-10-08T21:25:37Z

Thanks! I'll take a look once I got better understanding on the area (need to finish reading theory and see how other implementation shaped their APIs and see if there are any pitfalls or anything we should watch out for when designing APIs)

krwq · 2019-10-21T22:01:35Z

I've talked with @bartonjs offline on instance vs static and we believe there is not really much value in having the instance over static methods. The top post was edited with the API proposal.

We've also discussed how the design in case we need to generate multiple keys (or use HKDF as pseudo random generator). In that case we will create API which takes similar inputs to DeriveKey but will return stream which will be producing bytes but that is out of scope here (especially we're not sure anyone actually needs it).

orip · 2019-10-22T00:32:36Z

... (or use HKDF as pseudo random generator). ...

HKDF has a maximum output, see section 2.3 in the RFC:
L length of output keying material in octets (<= 255*HashLen)
This should probably be asserted at runtime, and makes HKDF a bad candidate for a PRNG.

krwq · 2019-10-22T01:19:15Z

@orip yes, this will be asserted.

Let's worry about it when someone needs it (they can still pass large byte array right now).

Per spec there are some valid scenarios though:

HKDF is intended for use in a wide variety of KDF applications.
   These include the building of pseudorandom generators from imperfect
   sources of randomness (such as a physical random number generator
   (RNG)); the generation of pseudorandomness out of weak sources of
   randomness, such as entropy collected from system events, user's
   keystrokes, etc.

terrajobst · 2019-11-12T18:29:10Z

Video

~~Expand. We should align the optionality of the info parameter between the span version and the byte array version.~~
- This might cause ambiguity. We decided that optionality is omitted from the span versions
- However, we should align the order to outputLength and output. We should swap output and info in the span-based version.

namespace System.Security.Cryptography
{
    public static class HKDF
    {
        public static byte[] Extract(HashAlgorithmName hashAlgorithmName, byte[] ikm, byte[] salt = null);
        public static int Extract(HashAlgorithmName hashAlgorithmName, ReadOnlySpan<byte> ikm, ReadOnlySpan<byte> salt, Span<byte> prk);

        public static byte[] Expand(HashAlgorithmName hashAlgorithmName, byte[] prk, int outputLength, byte[] info = null);
        public static void Expand(HashAlgorithmName hashAlgorithmName, ReadOnlySpan<byte> prk, Span<byte> output, ReadOnlySpan<byte> info);

        public static byte[] DeriveKey(HashAlgorithmName hashAlgorithmName, byte[] ikm, int outputLength, byte[] salt = null, byte[] info = null);
        public static void DeriveKey(HashAlgorithmName hashAlgorithmName, ReadOnlySpan<byte> ikm, Span<byte> output, ReadOnlySpan<byte> salt, ReadOnlySpan<byte> info);
    }
}

krwq self-assigned this Oct 8, 2019

krwq closed this as completed in dotnet/corefx#42567 Nov 14, 2019

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the 5.0 milestone Jan 31, 2020

ghost locked as resolved and limited conversation to collaborators Dec 17, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HMAC-based Extract-and-Expand Key Derivation Function (HKDF) #26147

HMAC-based Extract-and-Expand Key Derivation Function (HKDF) #26147

cocowalla commented May 11, 2018

bartonjs commented May 11, 2018

cocowalla commented May 11, 2018

bartonjs commented May 11, 2018 •

edited

Loading

cocowalla commented May 12, 2018

krwq commented Sep 16, 2019 •

edited

Loading

bartonjs commented Sep 16, 2019

orip commented Oct 2, 2019 •

edited

Loading

krwq commented Oct 8, 2019

cocowalla commented Oct 8, 2019

krwq commented Oct 8, 2019 •

edited

Loading

cocowalla commented Oct 8, 2019

krwq commented Oct 8, 2019

krwq commented Oct 21, 2019

orip commented Oct 22, 2019

krwq commented Oct 22, 2019

terrajobst commented Nov 12, 2019 •

edited

Loading

HMAC-based Extract-and-Expand Key Derivation Function (HKDF) #26147

HMAC-based Extract-and-Expand Key Derivation Function (HKDF) #26147

Comments

cocowalla commented May 11, 2018

bartonjs commented May 11, 2018

cocowalla commented May 11, 2018

bartonjs commented May 11, 2018 • edited Loading

cocowalla commented May 12, 2018

krwq commented Sep 16, 2019 • edited Loading

bartonjs commented Sep 16, 2019

orip commented Oct 2, 2019 • edited Loading

krwq commented Oct 8, 2019

cocowalla commented Oct 8, 2019

krwq commented Oct 8, 2019 • edited Loading

cocowalla commented Oct 8, 2019

krwq commented Oct 8, 2019

krwq commented Oct 21, 2019

orip commented Oct 22, 2019

krwq commented Oct 22, 2019

terrajobst commented Nov 12, 2019 • edited Loading

bartonjs commented May 11, 2018 •

edited

Loading

krwq commented Sep 16, 2019 •

edited

Loading

orip commented Oct 2, 2019 •

edited

Loading

krwq commented Oct 8, 2019 •

edited

Loading

terrajobst commented Nov 12, 2019 •

edited

Loading