
Protect Dictionary<,> against hash collision flood attacks #4761

Open
jskeet opened this issue Dec 9, 2015 · 21 comments
Assignees
Labels
area-System.Collections, enhancement (Product code improvement that does NOT require public API changes/additions)
Milestone

Comments

@jskeet

jskeet commented Dec 9, 2015

Java 8 introduced a new feature into HashMap whereby when a significant number of keys fall into the same hash bucket and the key type implements Comparable<T> (broadly equivalent to .NET's IComparable<T>), the bucket is converted into a tree using that ordering.
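The behaviour described above can be observed directly: a key type whose hashCode always collides but which implements Comparable still allows fast insertion, because Java 8's HashMap converts the overloaded bucket into a red-black tree. A minimal sketch (class names here are illustrative):

```java
import java.util.HashMap;

// Every BadKey collides into one bucket, but because BadKey implements
// Comparable<BadKey>, Java 8's HashMap converts that bucket into a
// red-black tree, so inserts stay O(log n) instead of O(n).
final class BadKey implements Comparable<BadKey> {
    final int id;
    BadKey(int id) { this.id = id; }
    @Override public int hashCode() { return 42; }   // deliberate total collision
    @Override public boolean equals(Object o) {
        return o instanceof BadKey && ((BadKey) o).id == id;
    }
    @Override public int compareTo(BadKey other) { return Integer.compare(id, other.id); }
}

public class TreeifyDemo {
    public static void main(String[] args) {
        HashMap<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < 50_000; i++) {
            map.put(new BadKey(i), i);               // completes quickly despite collisions
        }
        System.out.println(map.size());              // prints "50000"
    }
}
```

The same loop against a map whose keys lack Comparable degrades to a linear scan per insert, which is exactly the flooding behaviour the issue describes.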

This is particularly relevant when creating a map/dictionary with keys coming in over the network from an untrusted source. An attacker with appropriate knowledge of the hash code algorithm used can send N keys which require O(N²) time to insert, as each insertion currently requires O(N) time due to a linear search of the bucket. Even if the bucket only contains a single hash code, multiple keys can share that exact hash code - and in many cases it's trivial to construct such keys. Using the tree approach, the per-key lookup time becomes O(log N), leading to an overall dictionary construction time of O(N log N).
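To make "trivial to construct such keys" concrete: Java's Long.hashCode(v) is (int)(v ^ (v >>> 32)), essentially the same high/low folding that .NET's long.GetHashCode performs, so an attacker can cancel the two halves and produce unlimited distinct keys with an identical hash code. A sketch:

```java
import java.util.stream.LongStream;

public class CollidingKeys {
    // For any 32-bit i, the key (i << 32) | i has equal high and low halves,
    // so v ^ (v >>> 32) leaves only high bits, and the truncation to int is 0.
    static long collidingKey(long i) {
        return (i << 32) | i;
    }

    public static void main(String[] args) {
        boolean allZero = LongStream.range(0, 100_000)
                .map(CollidingKeys::collidingKey)
                .allMatch(k -> Long.hashCode(k) == 0);
        System.out.println(allZero);                 // prints "true"
    }
}
```

A hundred thousand distinct keys, every one hashing to 0 - exactly the degenerate single-bucket case described above.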

There are downsides to this approach - anything implementing the comparison in a way which is inconsistent with Equals can lead to keys not being found. Introducing this as a default implementation would therefore be a breaking change, strictly speaking (although unlikely to hit users who don't deliberately seek it out). It would be really nice to have support for this within the BCL however, either as a separate IDictionary<,> implementation, or an option when creating a Dictionary<,>.

@jskeet
Author

jskeet commented Dec 9, 2015

It's good to see that for strings, but it's entirely possible for requests to include dictionaries with other key types. The scenario I'm currently interested in is using Protocol Buffers for RPC requests, but I'm sure that's not the only plausible scenario. Imagine a Dictionary<long, string> - that would have the same problem. Now it's possible that HashHelpers could be modified to handle all the numeric types and provide randomized equality comparers for all of those... but it's not ideal for it to require BCL support like that.

(It does suggest another approach that could be used, mind you - one where I create my own randomized equality comparers for each of the key types I'm interested in.)
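That comparer-per-key-type idea can be sketched as follows - shown in Java for illustration, mixing a per-process random seed into a SplitMix64-style finalizer (all names here are hypothetical, not an existing API):

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch of a per-process randomized hash for 64-bit keys: a random seed is
// added and then diffused by a SplitMix64-style finalizer, so a collision
// pattern learned against one process doesn't transfer to another.
final class RandomizedLongHasher {
    private static final long SEED = ThreadLocalRandom.current().nextLong();

    static int hash(long v) {
        long z = v + SEED;
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return (int) (z ^ (z >>> 31));
    }
}
```

In .NET the same idea would live in an IEqualityComparer<TKey> whose GetHashCode applies the seeded mix, passed to the Dictionary<,> constructor.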

@benaadams
Member

I did implement a randomly seeded hasher, though it has a much smaller surface area (the collection is bounded); it may be helpful in the interim until you get a better response - it works on uint/byte chunks.

https://github.com/benaadams/KestrelHttpServer/blob/479a54634e7c789bc314358630c3df4c73e68667/src/Microsoft.AspNet.Server.Kestrel/Infrastructure/MemoryPoolIterator2Extensions.cs#L14-L61

Useful in that no two servers will have the same pattern and the collisions will differ between runs, so attack learnings aren't transferable.
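In the same spirit (not the linked implementation itself, just an illustration of the idea), a byte-chunk hasher with a randomized basis can look like this - here an FNV-1a-style loop whose offset basis is perturbed by a per-process random seed:

```java
import java.security.SecureRandom;

// Illustrative FNV-1a-style hash over a byte stream with a per-process
// random offset basis, so the collision pattern differs between runs.
final class SeededByteHasher {
    private static final int SEED = new SecureRandom().nextInt();

    static int hash(byte[] data) {
        int h = 0x811c9dc5 ^ SEED;                 // FNV offset basis XOR random seed
        for (byte b : data) {
            h = (h ^ (b & 0xff)) * 0x01000193;     // FNV-1a prime multiply
        }
        return h;
    }
}
```

Because the seed is folded in before any input bytes, two runs of the same process image produce unrelated collision sets, which is the property being described.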

@OtherCrashOverride

The scenario I'm currently interested in is using Protocol Buffers for RPC requests

Why not just override Object.GetHashCode to provide your own non-predictable, custom-tailored values?

Alternatively, since it has the potential to break existing code, why not make it an entirely new class that implements IDictionary<K, V>?

What is the performance impact of this change for varying set sizes and complexity, including value type vs reference type keys? Comparing O(N^2) and O(N log N) is only meaningful when the work involved per step is comparable. It's entirely possible (and does happen, as is evident in sort algorithms) that the O(N^2) approach may actually be faster.

@jskeet
Author

jskeet commented Dec 9, 2015

The keys I'm interested in are strings, the ByteString class (which is mine) and various numeric types. So I could definitely implement an IEqualityComparer for those which uses a random seed - it just feels like it would be nicer if this were taken into consideration in the dictionary itself. (In particular, I don't want to rely on the "sometimes on" prevention behaviour that already exists, which means I need to implement my own string hashing, which is unlikely to be as fast or as good as the framework code.)

In terms of the O(N^2) vs O(N log N), aside from possible locality of reference issues, I would expect this to be faster than the current implementation in situations where it triggers at all (i.e. large hash buckets). I can't immediately see any reason why it would be worse, although obviously performance can be very hard to predict. Of course, the detection of the situation has some impact as well. I wouldn't expect any boxing to occur, assuming we really detected IComparable<T> rather than just IComparable.

@OtherCrashOverride

Another issue to consider is that it would change the enumeration order for items stored in the Dictionary. Dictionary (last I checked) does not provide any guarantees about ordering; however, that does not mean changing it will not break code. This is a real-world issue that was filed against Mono's implementation of Dictionary.

The performance profile is the most important thing to establish, in my opinion. We need some real-world data to show what benefit or regression it will offer. For those not using Dictionary as an ad hoc database, it's important this does not slow them down. There is a lot of code and algorithms out there using very small Dictionaries that do not have key collision issues; they are often part of high-traffic APIs, so even a small increase in time would add up very quickly.

@ellismg
Contributor

ellismg commented Dec 14, 2015

I think this is a reasonable idea to explore. The general push-back would be around compatibility. As you can see from the existing code we only cut over to the different hashing algorithms when we have good reason to believe we are under a hash-flooding attack and not as an opportunity to improve performance. We would also need to make sure to do this only in cases we were reasonably sure that moving to IComparable<T> was safe (e.g. default equality comparer is in use or the comparer object also implements IComparable<T>).

It's a bit of a bummer that the Dictionary<TKey, TValue> implementation lives so low in the stack. We'd have to pull in some sort of balanced binary tree to do this the right way, but at the same time we're trying to move code out of mscorlib instead of adding a bunch more. It may make sense to hold off on this until we can pull all the generic collections out of mscorlib.dll and into their own assembly for .NET Core.

FWIW, I believe we have plans to expose the randomized string hashing equality comparer publicly so you can ensure you always use it when you care about these sorts of things. When we do that work, it probably also makes sense to figure out how to expose the primitives in a way that lets you hash whatever binary stream of data you have. I think doing this will be easier from an engineering point of view than adding a bunch more smarts to Dictionary<TKey, TValue>, and it would let you build a resilient IEqualityComparer<T>.

@jskeet
Author

jskeet commented Dec 14, 2015

I agree with the trickiness being on the compatibility front, unless it was opt-in via a factory method or new constructor parameter.

I believe that there are at least theoretical hash flooding attacks with any hashing implementation, but I'll be the first to admit I'm out of my depth here - and it certainly feels like a randomized hash is better than nothing for hash flood prevention. Thanks for thinking about it, and let me know anything else you'd like me to contribute to the matter - bearing in mind my position of relative ignorance on security matters.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 30, 2020
@msftgits msftgits added this to the Future milestone Jan 30, 2020
@maryamariyan maryamariyan added the untriaged label Feb 26, 2020
@eiriktsarpalis
Member

Given this thread hasn't been updated in a while, @GrabYourPitchforks could you comment on what the current state of dotnet is w.r.t. randomizing hashcode generation? My understanding is that it has been added in quite a few places.

@eiriktsarpalis eiriktsarpalis removed the untriaged label May 6, 2020
@GrabYourPitchforks
Member

On .NET Core, string.GetHashCode is always randomized, regardless of whether it's used in a dictionary.

On .NET Full Framework, string.GetHashCode is not randomized by default. However, Dictionary<string, ...> (and other similar in-box collection types) are always protected against hash collision attacks as long as the dictionary is instantiated with the null comparer or one of the in-box StringComparer implementations.

For Dictionary<TKey, TValue> where TKey is anything other than string, there are no in-box protections across either .NET Core or .NET Framework against hash collision attacks. Either TKey.GetHashCode needs to be randomized, or the entity which instantiated the dictionary instance needs to pass a randomizing comparer. As Jon said, this implies that it is not safe to instantiate a Dictionary<long, ...> and to populate it with untrusted input. We check first-party code for these patterns during security review. And if somebody discovers that any first-party Microsoft code is susceptible to this attack they can disclose it responsibly and the report should qualify for a bounty. But it's unlikely that third-party code follows the same rigorous review process.

Personally I'd like to see this added as a defense-in-depth measure. Given that there's likely considerable third-party usage of Dictionary<TKey, ...> where TKey is a non-string primitive, this would give that code a bit of extra protection.

@eiriktsarpalis
Member

eiriktsarpalis commented May 7, 2020

@GrabYourPitchforks would randomizing hashcode generation of, say, long be considered a breaking change? We already have a unary HashCode.Combine method which could perhaps be used for this purpose?

Another way of phrasing this: is there any case (other than perhaps System.Int32 itself) where deterministic hashcode generation is a desirable property?

@benaadams
Member

Another way of phrasing this, is there any case (other than perhaps System.Int32 itself) where deterministic hashcode generation is a desirable property?

The current performance of the hashcode generation for the primitive ValueTypes is a desirable property

@eiriktsarpalis
Member

The current performance of the hashcode generation for the primitive ValueTypes is a desirable property

Would combining a primitive with a precomputed random seed have an observable perf impact when calling GetHashCode()?

@benaadams
Member

benaadams commented May 7, 2020

Would combining a primitive with a precomputed random seed have an observable perf impact when calling GetHashCode()?

That's not what the unary HashCode.Combine does.

But yes, simply XORing all hashcodes with a random readonly static inside Dictionary (and other hash-based types) probably wouldn't have a huge impact, and isolating it inside these data structures rather than changing .GetHashCode for the types would likely minimise any breaks; it would also give a degree of per-process isolation for hashcode collisions (e.g. a sequence causing collisions in one instantiation of an exe wouldn't immediately be applicable to another).

@eiriktsarpalis
Member

probably wouldn't have a huge impact and isolating it inside these datastructures rather than changing .GetHashCode for the types would likely minimise any breaks

Agreed. The obvious downside is that the mitigation only works with the data structures we apply it to. So it raises the question: should we restrict this to Dictionary only, or apply it to other commonly used types as well?

I suppose another thing we have to be careful about is to apply a transformation that mitigates these types of attacks without hurting the distribution characteristics of the original hashcode.

@GrabYourPitchforks
Member

You don't need to change the hash code calculation characteristics of any of the built in types to pull this off. The change can be made entirely within dictionary and related types.

In Full Framework all dictionary-like types have this same protection built in for strings. HashSet, SortedList, etc. It's around 7 or 8 types total I believe.

@GrabYourPitchforks
Member

Also, using a random 32-bit seed and XORing it with TKey.GetHashCode isn't a viable mechanism.

Keep in mind we don't have to solve this right now. It's not slated for 5.0.

@markusschaber

You don't need to change the hash code calculation characteristics of any of the built in types to pull this off. The change can be made entirely within dictionary and related types.

Hmm... when the key type produces the same hash code from GetHashCode(), any random value the Dictionary applies after the fact will not resolve the collision.
So mixing a static nonce into the container will not protect against attacks which manage to produce the same GetHashCode() result, which should be trivial e.g. for GUIDs or longs.

A generic solution would be a separate GetHashCode(int nonce) overload, but that would essentially amount to a parallel implementation of the whole GetHashCode() ecosystem, and probably some performance loss - the GetHashCode implementations would need to be a bit more complex, as the trivial XOR-based implementations will not fix the problem.
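The objection above can be demonstrated in a couple of lines (Java shown for illustration; Long.hashCode folds the halves the same way as .NET's long.GetHashCode): once two keys already share a hash code, XORing in any container-level nonce leaves them colliding.

```java
public class XorNonceDemo {
    // A container-level nonce is applied after the key's own hash, so equal
    // hashes remain equal: h ^ nonce collides whenever h collides.
    static int seededHash(long key, int nonce) {
        return Long.hashCode(key) ^ nonce;
    }

    public static void main(String[] args) {
        long a = (7L << 32) | 7L;                  // Long.hashCode(a) == 0
        long b = (9L << 32) | 9L;                  // Long.hashCode(b) == 0
        int nonce = 0x5DEECE66;                    // arbitrary illustrative value
        System.out.println(seededHash(a, nonce) == seededHash(b, nonce)); // prints "true"
    }
}
```

A real mitigation has to mix the seed into the hash computation itself, before the input is consumed, rather than transform the finished hash code.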

@theodorzoulias
Contributor

I have a question. AFAIK Dictionary<string, TValue> is protected against HashDoS attacks. Is it also protected when the dictionary is constructed with a custom IEqualityComparer<string> implementation that delegates to EqualityComparer<string>.Default? Or does this indirection invalidate the protection?

class MyEqualityComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) => EqualityComparer<string>.Default.Equals(x, y);
    public int GetHashCode(string x) => EqualityComparer<string>.Default.GetHashCode(x);
}

Is the dictionary below protected from attacks?

Dictionary<string, int> dict = new(new MyEqualityComparer());

If not, is there any easy way to implement a custom IEqualityComparer<string> that maintains the protection?

@benaadams
Member

I have a question. AFAIK the Dictionary<string, TValue> is protected against HashDoS attacks. Is it also protected in case the dictionary is constructed with a custom IEqualityComparer<string> implementation, that delegates to the EqualityComparer<string>.Default? Or this indirection invalidates the protection?

String.GetHashCode has the protection, so you should be fine.

@GrabYourPitchforks
Member

So mixing a static nonce into the container will not protect against attacks which manage to produce the same GetHashCode() result, which should be trivial e.g. for GUIDs or longs.

Correct. That's what I meant by this comment. I don't really see us changing the implementation of any of the existing primitive GetHashCode routines since people currently depend on them to be fast and predictable.

A parallel implementation of GetHashCode is what we prototyped earlier and it does indeed work. It's also not as invasive as you might think. One needs to special-case some commonly used structs (Guid, long, DateTime, etc.); enlighten ValueTuple, Tuple, and HashCode to understand these special cases; and wire the whole thing up through Dictionary and friends. That's sufficient to address all the use cases we see in practice. There might be some niche use cases which aren't covered by this, but those should be vanishingly rare.


10 participants