-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HMAC-based Extract-and-Expand Key Derivation Function (HKDF) #26147
Comments
@cocowalla Before adding support for a new algorithm there's an internal review process we have to go through, so no PR just yet :). As for the API, it would probably be Span-based. public class HKDF
{
public HKDF(HashAlgorithmName hashAlgorithm);
// Maybe also ReadOnlySpan<byte>, ReadOnlySpan<byte> => byte[], if it seems worthwhile.
bool TryExtract(ReadOnlySpan<byte> ikm, ReadOnlySpan<byte> salt, Span<byte> prk, out int bytesWritten);
// Fills destination, however big it is.
void Expand(ReadOnlySpan<byte> prk, ReadOnlySpan<byte> info, Span<byte> destination);
// Fills destination, however big it is.
void DeriveKey(ReadOnlySpan<byte> ikm, ReadOnlySpan<byte> salt, ReadOnlySpan<byte> info, Span<byte> destination);
} That's assuming that it even makes sense to have instances of HKDF. Maybe it should be static methods and move the HashAlgorithmName into a parameter. (I also got rid of salt and info having a default of null/empty, since the RFC says they are allowed to be omitted, but strongly recommends them) |
If you want a strict implementation of the RFC, Not sure about On static vs instance: HKDF doesn't really need state, but it would be extremely rare to use a different hash algorithm for the Extract and Expand phases - in part because the input key material for Expand must be at least the same length as the hash it uses. The RFC certainly doesn't make any mention of using 2 different algorithms. I think it makes sense to have a ctor that takes a I don't really understand why I'd personally prefer that Expand took an I haven't worked with Spans yet, but I believe they are only present in .NET Core 2.1 at present? Is it a 'policy' for new classes to only use Spans, or to have both byte array and Span-based methods? |
Yep, the test vectors don't talk about a "missing" salt, they talk about an "empty" salt. So ReadOnlySpan.Empty.
I guess it doesn't matter much. If the type is IDisposable then it can open the hasher in the ctor and close it in Dispose, which would be better than opening it twice.
When
Returning a Span doesn't make a lot of sense here. When a Span is a return it usually means it is a sub-range of the input data. If it was going to allocate a new array for this it would be better to be honest and return it as an array. (But then the question is: why do we need to wake the GC up to allocate an array for this 32-byte artifact when we could have just written it to a stackalloc Span to continue the rest of the computation?)
Yep, hence the
You can think of Span as a triplet of { array, offset, length }. It's more powerful than that, but you can think of it that way. So int length is a parameter, it's just contained within the Span structure.
Essentially. The NuGet package is available for netfx, but the vast majority of Span-utilizing API is in netcoreapp21.
If it's input it doesn't usually make sense to have both (arrays are implicitly convertible to Span and ReadOnlySpan) -- the exception being when something "better" can be done for an array. For output it's more context-specific. "Write this to here" has the "how much was written?" out value (if it's not "fill'er up") and the "what if it's too short?" problem. Array-creating-and-returning things have the problem that they can become limiting factors for throughput, because they mean more time is spent in garbage collection. Mostly where we have two ways of doing things it's from "before Span" and "after/with Span", but they can also be thought of as "easier" and "more performant at the expense of ease". For a "fill'er up" API, the difference is byte[] derived = hkdf.DeriveWhatever(ikm, salt, info, 32); and byte[] derived = new byte[32];
hkdf.DeriveWhatever(ikm, salt, info, derived); which doesn't look too much different. The latter allows for byte[] derived = new byte[32];
hkdf.DeriveWhatever(ikm, salt, info1, derived.Slice(0, 16));
hkdf.DeriveWhatever(ikm, salt, info2, derived.Slice(16));
// use the key which was two 16-byte concatenations with different info values vs byte[] firstHalf = hkdf.DeriveWhatever(ikm, salt, info1, 16);
byte[] secondHalf = hkdf.DeriveWhatever(ikm, salt, info2, 16);
// Or however.
byte[] derived = firstHalf.Concat(secondHalf).ToArray();
// Now use the key. firstHalf and secondHalf may have contributed to heap fragmentation Span also lets you do some more interesting things, like avoid the GC altogether: public void DeriveWhatever(ReadOnlySpan<byte> ikm, ReadOnlySpan<byte> salt, ReadOnlySpan<byte> info, Span<byte> destination)
{
Span<byte> prk = stackalloc byte[512 / 8];
if (!TryExtract(ikm, salt, prk, out int bytesWritten))
{
throw new CryptographicException("What hash were you even using?!");
}
Expand(prk.Slice(0, bytesWritten), info, destination);
// This method, in and of itself, contributed no GC overhead.
// Unless you use some future hash > 512-bit, then it allocated an exception.
} |
@bartonjs, OK, your API suggestion makes more sense in the context of Spans - thanks for taking the time to elucidate. I work with byte arrays a lot, so I can see that Span is going to be really useful for reducing allocations. |
I'm not sure if HashAlgorithmName would be a good choice for the input considering that COSE uses HKDF with PRF (see Table 12) but I guess it would be acceptable to call it a hashing function given that HKDF spec asks for a hashing function It might also be worth taking enum HashAlgorithmType as an input instead of struct over string although namespace might be arguable. |
Definitely not 😄. That's a TLS-related enum. HashAlgorithmName (the strongly-typed string) is the way that we communicate cryptographic digest algorithms (and there are accelerators, so it looks like an enum).
HKDF itself is "pick a hash, use HMAC":
HKDF-Expand also uses HMAC{Hash}; but it's not a one-liner 😄. I don't see a formal definition of "HKDF with AES-MAC"; but since that's creating HKDF with a different sort of flavor, maybe instead of constructors it needs static factories (to name why they're different). |
The RFC specifically mentions the case of a missing salt and provides test vectors for both an empty salt and a missing salt.
|
I'll start looking into this now. Note I will need to do some more reading before I make any progress since I'm not too familiar with implementation details |
@krwq I have a non-spanified, "pure" RFC implementation I can share if you'd like - it's simple and documented referencing the RFC. |
@cocowalla if you already have it feel free to go ahead and create a PR (mark it as a draft since there was no API review yet). Once I catch up with the details I'll help with the API review and possibly with finishing up if needed. We will likely need to add spanified version though but we can start iterating on your code since it shouldn't be hard to convert it |
@krwq done, dotnet/corefx#41651! |
Thanks! I'll take a look once I got better understanding on the area (need to finish reading theory and see how other implementation shaped their APIs and see if there are any pitfalls or anything we should watch out for when designing APIs) |
I've talked with @bartonjs offline on instance vs static and we believe there is not really much value in having the instance over static methods. The top post was edited with the API proposal. We've also discussed how the design in case we need to generate multiple keys (or use HKDF as pseudo random generator). In that case we will create API which takes similar inputs to DeriveKey but will return stream which will be producing bytes but that is out of scope here (especially we're not sure anyone actually needs it). |
HKDF has a maximum output, see section 2.3 in the RFC: |
@orip yes, this will be asserted. Let's worry about it when someone needs it (they can still pass large byte array right now). Per spec there are some valid scenarios though:
|
namespace System.Security.Cryptography
{
public static class HKDF
{
public static byte[] Extract(HashAlgorithmName hashAlgorithmName, byte[] ikm, byte[] salt = null);
public static int Extract(HashAlgorithmName hashAlgorithmName, ReadOnlySpan<byte> ikm, ReadOnlySpan<byte> salt, Span<byte> prk);
public static byte[] Expand(HashAlgorithmName hashAlgorithmName, byte[] prk, int outputLength, byte[] info = null);
public static void Expand(HashAlgorithmName hashAlgorithmName, ReadOnlySpan<byte> prk, Span<byte> output, ReadOnlySpan<byte> info);
public static byte[] DeriveKey(HashAlgorithmName hashAlgorithmName, byte[] ikm, int outputLength, byte[] salt = null, byte[] info = null);
public static void DeriveKey(HashAlgorithmName hashAlgorithmName, ReadOnlySpan<byte> ikm, Span<byte> output, ReadOnlySpan<byte> salt, ReadOnlySpan<byte> info);
}
} |
At present,
System.Security.Cryptography
has implementations of password-based key derivation functions (KDFs), specifically PBKDF1 and PBKDF2.There is a need to also support non-password based KDFs that derive keys from other secrets or cryptographic key material without the intentional slowing of computation that is so desirable in password-based KDFs.
HMAC-based Extract-and-Expand Key Derivation Function (HKDF) has a simple design, and is one of the most prevalent non-password based KDFs in use today. I'm proposing adding an HKDF implementation in the
System.Security.Cryptography
namespace.The RFC has full details, as does Krawczyk's original paper, but I'll supply a brief overview here. HKDF has 2 steps, the 1st of which can be omitted if you already have cryptographically strong key material:
I already have an implementation, which includes tests against the test vectors given in the RFC, and would be happy to submit a PR if you would be interested in having it.
The API looks like this:
It might be worth also adding a method at a higher abstraction level that performs both steps in one call:
byte[] DeriveBytes(byte[] ikm, int outputLen, byte[] salt = null, byte[] info = null);
@bartonjs, I think this is your domain - what are your thoughts on this? Interested in a PR?
The text was updated successfully, but these errors were encountered: