Protect `Dictionary<,>` against hash collision flood attacks #4761
Comments
Wasn't this resolved in >= 4.0.30319.17929? http://www.troyhunt.com/2012/08/fixing-hash-dos-good-and-proper-and.html
It's good to see that for strings, but it's entirely possible for requests to include dictionaries with other key types. The scenario I'm currently interested in is using Protocol Buffers for RPC requests, but I'm sure that's not the only plausible scenario. Imagine a […]. (It does suggest another approach that could be used, mind you - one where I create my own randomized equality comparers for each of the key types I'm interested in.)
I did implement a randomly seeded hasher, though it had a much smaller surface area (the collection is bounded); it may be helpful in the interim until you get a better response - it works on uint/byte chunks. Useful in that no two servers will have the same pattern and the collisions will be different between runs, so attack learnings aren't transferable.
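A randomly seeded hasher along those lines could be sketched as below. This is a minimal illustration, not the implementation referenced above: the class name is hypothetical, and it uses a seeded FNV-1a-style hash over UTF-16 code units as one plausible choice.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a per-process randomly seeded string comparer.
// Each process gets a different seed, so a collision set learned against
// one server instance does not transfer to another.
sealed class RandomSeededStringComparer : IEqualityComparer<string>
{
    // One random seed per process, fixed for the comparer's lifetime.
    private static readonly uint Seed = (uint)new Random().Next();

    public bool Equals(string x, string y) => string.Equals(x, y);

    public int GetHashCode(string s)
    {
        uint h = 2166136261u ^ Seed;   // FNV offset basis, mixed with the seed
        foreach (char c in s)
        {
            h ^= c;                    // fold in each UTF-16 code unit
            h *= 16777619u;            // FNV prime
        }
        return (int)h;
    }
}
```

It would be used as `new Dictionary<string, int>(new RandomSeededStringComparer())`; equal strings always hash equal within one process, but the bucket layout differs between runs.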
Why not just override […]? Alternatively, since it has the potential to break existing code, why not make it an entirely new class that implements […]? What is the performance impact of this change for varying set sizes and complexity, including value type vs reference type keys? Comparing O(N^2) and O(N log N) is only meaningful when the work involved per step is the same. It's entirely possible (and does happen, as evident in sort algorithms) that O(N^2) may actually be faster.
The keys I'm interested in are strings; the […]. In terms of the O(N^2) vs O(N log N) comparison, aside from possible locality-of-reference issues, I would expect this to be faster than the current implementation in situations where it triggers at all (i.e. large hash buckets). I can't immediately see any reason why it would be worse, although obviously performance can be very hard to predict. Of course, the detection of the situation has some impact as well. I wouldn't expect any boxing to occur, assuming we really detected […].
Another issue to consider is that it would change the enumeration order for items stored in the Dictionary. Dictionary (last I checked) does not provide any guarantees about the order; however, that does not mean changing it will not break code. This is a real-world issue that was filed against Mono's implementation of Dictionary.

The performance profile is the most important item to establish, in my opinion. We need some real-world data to show what benefit or regression it will offer. For those not using Dictionary as an ad hoc database, it's important this does not slow them down. There is a lot of code and algorithms out there using very small Dictionaries that do not have key collision issues. They are often part of high-traffic APIs, so even a small increase in time would add up very quickly.
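The enumeration-order caveat is easy to demonstrate. `Dictionary<,>` happens to enumerate in insertion order until an entry is removed, at which point freed slots get reused - observed behavior on current runtimes, not a contract:

```csharp
using System;
using System.Collections.Generic;

class OrderDemo
{
    static void Main()
    {
        var d = new Dictionary<string, int> { ["a"] = 1, ["b"] = 2, ["c"] = 3 };
        d.Remove("b");
        d["d"] = 4; // on current runtimes, "d" tends to reuse the slot freed by "b"

        // The resulting order is an implementation detail. Code that relies
        // on a particular sequence here would break if the bucket strategy
        // changed - which is exactly the compatibility concern with
        // converting large buckets into trees.
        Console.WriteLine(string.Join(",", d.Keys));
    }
}
```

Any change to bucket handling could reshuffle this output even though no documented guarantee is violated.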
I think this is a reasonable idea to explore. The general push-back would be around compatibility. As you can see from the existing code, we only cut over to the different hashing algorithms when we have good reason to believe we are under a hash-flooding attack, and not as an opportunity to improve performance. We would also need to make sure to do this only in cases where we were reasonably sure that moving to […]. It's a bit of a bummer that the […].

FWIW, I believe that we have plans to expose the randomized string hashing equality comparer publicly so you can ensure you always use it when you care about these sorts of things. When we do that work, it probably also makes sense to figure out how to expose the primitives in a way such that you can hash whatever binary stream of data you have. I think doing this will be easier from an engineering point of view when compared with adding a bunch more smarts to […].
I agree with the trickiness being on the compatibility front, unless it was opt-in via a factory method or new constructor parameter. I believe that there are at least theoretical hash flooding attacks with any hashing implementation, but I'll be the first to admit I'm out of my depth here - and it certainly feels like a randomized hash is better than nothing for hash flood prevention. Thanks for thinking about it, and let me know anything else you'd like me to contribute to the matter - bearing in mind my position of relative ignorance on security matters. |
Given this thread hasn't been updated in a while, @GrabYourPitchforks could you comment on what the current state of dotnet is w.r.t. randomizing hashcode generation? My understanding is that it has been added in quite a few places. |
On .NET Core, […]. On .NET Full Framework, […]. For […].

Personally I'd like to see this added as a defense-in-depth measure. Given that there's likely considerable third-party usage of […].
@GrabYourPitchforks would randomizing hashcode generation of, say, `long` be considered a breaking change? We already have a unary […]. Another way of phrasing this: is there any case (other than perhaps System.Int32 itself) where deterministic hashcode generation is a desirable property?
The current performance of hashcode generation for the primitive value types is a desirable property.
Would combining a primitive with a precomputed random seed have an observable perf impact when calling GetHashCode()? |
That's not what the unary […] does. But yes, a simple […].
Agreed. The obvious downside is that the mitigation only works with the data structures we apply it to. So it begs the question, should we restrict this to Dictionary only? Or do we apply it to other commonly used types as well? I suppose another thing we have to be careful about is to apply a transformation that both mitigates these types of attacks but also doesn't impact the distribution characteristics of the original hashcode. |
You don't need to change the hash code calculation characteristics of any of the built in types to pull this off. The change can be made entirely within dictionary and related types. In Full Framework all dictionary-like types have this same protection built in for strings. HashSet, SortedList, etc. It's around 7 or 8 types total I believe. |
Also, using a random 32-bit seed and XORing it with TKey.GetHashCode isn't a viable mechanism. Keep in mind we don't have to solve this right now. It's not slated for 5.0. |
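The point about XOR seeding can be made concrete: any transformation applied *after* `TKey.GetHashCode` returns maps already-colliding keys to the same value, so the randomness has to live inside the hash computation itself. A small sketch - the `WeakKey` type below is a hypothetical stand-in whose hash is loosely modeled on `long.GetHashCode`'s fold of the upper and lower 32 bits:

```csharp
using System;

// Stand-in key type with a deliberately collision-prone hash, used only to
// show that post-hoc seed mixing cannot separate colliding keys.
struct WeakKey : IEquatable<WeakKey>
{
    public long Value;
    public bool Equals(WeakKey other) => Value == other.Value;
    // Folds upper bits into lower bits, similar in shape to long.GetHashCode:
    // any value of the form (x << 32) | x hashes to 0.
    public override int GetHashCode() => (int)(Value ^ (Value >> 32));
}

class SeedXorDemo
{
    static void Main()
    {
        var a = new WeakKey { Value = 0x0000000100000001L };
        var b = new WeakKey { Value = 0x0000000200000002L };
        uint seed = (uint)new Random().Next();

        int ha = a.GetHashCode(), hb = b.GetHashCode();
        Console.WriteLine(ha == hb);                   // True: the keys collide
        Console.WriteLine((ha ^ seed) == (hb ^ seed)); // True: XOR preserves the collision
    }
}
```

The same argument applies to any post-hash mixing function, not just XOR: equal inputs produce equal outputs, so an attacker's precomputed collision set survives the seed.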
Hmm... when the data type produces the same hash code in […]. A generic solution would be a separate overload for […].
I have a question. AFAIK the […]:

```csharp
class MyEqualityComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) => EqualityComparer<string>.Default.Equals(x, y);
    public int GetHashCode(string x) => EqualityComparer<string>.Default.GetHashCode(x);
}
```

Is the dictionary below protected from attacks?

```csharp
Dictionary<string, int> dict = new(new MyEqualityComparer());
```

If not, is there any easy way to implement a custom […]?
String.GetHashCode has the protection, so you should be fine.
Correct. That's what I meant by this comment. I don't really see us changing the implementation of any of the existing primitive […]. A parallel implementation of […].
Java 8 introduced a new feature into `HashMap` whereby when a significant number of keys fall into the same hash bucket and the key type implements `Comparable<T>` (broadly equivalent to .NET's `IComparable<T>`), the bucket is converted into a tree using that ordering.

This is particularly relevant when creating a map/dictionary with keys coming in over the network from an untrusted source. An attacker with appropriate knowledge of the hash code algorithm used can send N keys which require O(N^2) time to insert, as each insertion currently ends up requiring O(N) time due to a linear search of the bucket. Even if the bucket only contains a single hash code, in many cases multiple keys can have the exact same hash code - and in many cases it's trivial to construct such keys. Using the tree approach, the per-key lookup time becomes O(log N), leading to an overall dictionary construction time of O(N log N).

There are downsides to this approach - anything implementing the comparison in a way which is inconsistent with `Equals` can lead to keys not being found. Introducing this as a default implementation would therefore be a breaking change, strictly speaking (although unlikely to hit users who don't deliberately seek it out). It would be really nice to have support for this within the BCL however, either as a separate `IDictionary<,>` implementation, or an option when creating a `Dictionary<,>`.