
Implement adaptive hashing using specialization #5

Open
wants to merge 24 commits into base: master
Conversation

@pczarn commented Mar 22, 2016

Adaptive hashing provides fast and complexity-safe hashing for hashmaps with simple key types. The user doesn't need to change any code to get speedups from adaptive hashing, in contrast to the use of FnvHasher.
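The gist, as a minimal sketch (illustrative names; the PR's actual types differ): the default hash state starts out with a cheap mixer for simple keys and switches to keyed SipHash only when an insertion produces a suspiciously long probe chain.

```rust
// Hypothetical shape of the adaptive hash state.
enum AdaptiveState {
    // Cheap deterministic mix function, used while probe chains stay short.
    Fast,
    // Keyed SipHash, adopted once a long probe chain signals a possible attack.
    Safe { k0: u64, k1: u64 },
}
```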

```diff
@@ -54,6 +59,8 @@ use table::BucketState::{
     Full,
 };
 
+pub use adaptive_map::HashMapInterface;
```


Does this need to be public?

@pczarn (Author):

Yes, this exists temporarily. As long as specialization doesn't work for inherent impls, users of adaptive hashing must import this trait.


It seems like the inherent methods on HashMap could defer to HashMapInterface, no?
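A minimal sketch of that suggestion, assuming the PR's HashMap, AdaptiveState, and HashMapInterface (signatures are illustrative): the inherent method simply forwards to the specialized trait method, so callers would no longer need to import the trait themselves.

```rust
use std::hash::Hash;

impl<K: Hash + Eq, V> HashMap<K, V, AdaptiveState> {
    // Forward to the trait; only the crate itself needs the import.
    pub fn insert(&mut self, k: K, v: V) -> Option<V> {
        HashMapInterface::insert(self, k, v)
    }
}
```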

@pczarn (Author):

I'll try that, it's a great idea!

@pczarn commented Mar 22, 2016

Now, some benchmarks, run with black_box, from the benches directory. This benchmark shows the real differences most clearly:

Adaptive

```
find_existing     bench: 13,655 ns/iter (+/- 2,068)
find_nonexisting  bench: 15,241 ns/iter (+/- 411)
get_remove_insert bench:     68 ns/iter (+/- 8)
grow_by_insertion bench:    131 ns/iter (+/- 11)
hashmap_as_queue  bench:     60 ns/iter (+/- 8)
new_drop          bench:      3 ns/iter (+/- 0)
new_insert_drop   bench:     84 ns/iter (+/- 2)
```

SipHash-2-4 (outdated: the 2-4 variant is no longer used)

```
find_existing     bench: 26,208 ns/iter (+/- 916)
find_nonexisting  bench: 24,833 ns/iter (+/- 5,082)
get_remove_insert bench:    118 ns/iter (+/- 0)
grow_by_insertion bench:    159 ns/iter (+/- 5)
hashmap_as_queue  bench:     82 ns/iter (+/- 0)
new_drop          bench:     76 ns/iter (+/- 3)
new_insert_drop   bench:    167 ns/iter (+/- 7)
```

```diff
@@ -317,7 +323,7 @@ fn test_resize_policy() {
 /// }
 /// ```
 #[derive(Clone)]
-pub struct HashMap<K, V, S = RandomState> {
+pub struct HashMap<K, V, S = AdaptiveState> {
```


Changing this would be a breaking change at the API level, right?

@pczarn (Author):

Yes, this change can't be added to the std library. Is this library a drop-in replacement for std's HashMap?


I think the purpose of this repo is to iterate outside std, but with the ultimate goal of incorporating any worthwhile changes back into std.

@pczarn commented Mar 23, 2016

Two problems remain.

First, adaptive maps are not yet safeguarded against DoS by insertion through Entry. The first step is tracking nightly (#6).

Second, the DoS safeguard ignores the effect of fn robin_hood. The safeguard is simple; consider a situation where the user does an insertion followed by searches. The insertion of an element ensures that future searches for that element won't take more than a limited number of iterations (a threshold of 128). However, between that element's insertion and subsequent searches for it, other unrelated insertions may displace it.

It seems clear that fn robin_hood by definition can't increase the displacement of an "unfortunate" ousted element beyond the displacement of another, more "fortunate" element. Since the latter is already below the threshold of 128, it follows that the former won't be increased above 128.

This should be closely reconsidered and documented in the form of a proof.
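For reference, a minimal sketch of the Robin Hood insertion step in question (a toy table of (ideal_slot, key) pairs with power-of-two size and at least one free slot; names do not match the hashmap2 internals):

```rust
// Robin Hood insertion: on collision, evict the resident that is closer to
// its ideal slot (the "fortunate" one). The ousted element resumes probing
// from its own, smaller displacement, so the eviction alone never pushes it
// past the threshold the prober was already under.
fn insert_robin_hood(table: &mut Vec<Option<(u64, u64)>>, hash: u64, key: u64) {
    let mask = table.len() as u64 - 1; // table length is a power of two
    let mut ideal = hash & mask;       // ideal slot of the element in hand
    let mut entry = (ideal, key);
    let mut displacement = 0u64;
    loop {
        let slot = ((ideal + displacement) & mask) as usize;
        match table[slot].take() {
            None => {
                table[slot] = Some(entry);
                return;
            }
            Some((resident_ideal, resident_key)) => {
                let resident_disp =
                    (slot as u64).wrapping_sub(resident_ideal) & mask;
                if resident_disp < displacement {
                    // Robin Hood swap: the less-displaced resident is
                    // evicted and continues probing in our place.
                    table[slot] = Some(entry);
                    entry = (resident_ideal, resident_key);
                    ideal = resident_ideal;
                    displacement = resident_disp;
                } else {
                    table[slot] = Some((resident_ideal, resident_key));
                }
            }
        }
        displacement += 1;
    }
}
```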

@pczarn commented Mar 24, 2016

Updated and rebased.

/cc @contain-rs/publishers @bill-myers @pnkfelix @arthurprs @Jurily @gereeter @ticki @divyekapoor
rust-lang/rfcs#631, rust-lang/rust#11783 "Implement fast HashMap"
rust-lang/rust#29754 "Change Siphash to use one of the faster variants of the algorithm (Siphash13, Highwayhash)"
rust-lang/rust#28044 "WIP: Hash and Hasher update for faster common case hashing"
Source of the idea: rust-lang/rust#11783 (comment)

@arthurprs

The difference is brutal.

Does this apply to &str/String as well?

@pczarn commented Mar 24, 2016

> Does this apply to &str/String as well?

Not yet, because this requires explicit one-shot hashing with a hasher such as FarmHash, which probably needs an RFC.
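The hypothetical shape of such a one-shot hashing hook (purely illustrative; the eventual RFC would define the real interface): hash an entire byte slice in a single call instead of streaming it through a Hasher, which is what fast string hashers like FarmHash need to shine.

```rust
// Assumed, illustrative trait; not an existing std or hashmap2 API.
trait OneshotHasher {
    fn hash_oneshot(&self, bytes: &[u8]) -> u64;
}
```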

@ticki commented Mar 24, 2016

Wonderful, @pczarn. Great work!

@pczarn commented Jul 13, 2016

The difference is now smaller, because SipHash-1-3 is used by default.

@pczarn commented Jul 13, 2016

I'm currently writing an RFC.

@arthurprs

We should investigate using the integers as the hash in the Adaptive path.

@pczarn commented Jul 13, 2016

@arthurprs I'm sure there's no good way of using integers as the hash. The adaptive path allows us to use a non-cryptographic hash, but the hash should still be statistically good.

However, we could use hashes that are cheaply converted back to integer keys. That is, a reversible function for hashing. That would save some space at the expense of slower access to integer keys. Unfortunately, HashMap exposes iterators that must borrow keys stored somewhere inside the structure. It's a dead end.

There's more to be gained from using 32-bit hashes on 32-bit platforms. Perhaps 48-bit hashes on 64-bit platforms, too.
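For illustration, a reversible integer "hash" of the kind described above could look like this (a sketch; the multiply-xorshift permutation and its constants are my choice, not the PR's):

```rust
// An invertible permutation of u64: an odd multiplier is invertible
// modulo 2^64, and a xorshift by 32 bits is its own inverse.
fn mix(x: u64) -> u64 {
    let x = x.wrapping_mul(0x9E3779B97F4A7C15);
    x ^ (x >> 32)
}

// Recovers the original key from the stored hash, so in principle the
// table could store only the hash and reconstruct integer keys on demand.
fn unmix(h: u64) -> u64 {
    let x = h ^ (h >> 32); // undo the xorshift
    x.wrapping_mul(0xF1DE83E19937733D) // multiply by the modular inverse
}
```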

@bstrie commented Sep 14, 2016

@pczarn If the collision threshold is N, what's to stop an attacker from attempting to collide many different buckets N-1 times in order to DoS the server without triggering the collision detector?

@ticki commented Sep 14, 2016

@bstrie How different would that be from just spamming random keys to fill the hashtable up? Clearly, there is a limit to how much you can delay a particular key.

@comex commented Sep 15, 2016

@ticki Very different, as the expected number of probes per lookup for random keys is constant - something like 3, AFAIK, depending on the load factor - no matter how big the hash table gets, while the worst case before hitting the limit is 127 probes.

If the u64 hash values are different (only equal modulo the table size), that's just 127 u64 comparisons, which is a bit slower but no big deal. If they're the same, though, that could be 127 comparisons of really long strings. In theory (as is mentioned in one of the todo comments) this could be a problem even with a single chain that gets repeatedly looked up - lookups aren't much cheaper than insertions, so AFAICS there's no real need to blow up the table...

This should be solvable by specifically checking for equal hashes. Set a much lower chain length limit for the fast path, like 16 or so - which should still be very uncommon. Every insertion that exceeds that chain length triggers a separate function that scans the chain to see if many keys (say, also 16) have equal hashes. If this is true, perform the switch, and of course still do it if the chain length exceeds 128 even with unequal hashes.
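A sketch of that two-level check (constants and names are illustrative, not from any actual patch):

```rust
use std::collections::HashMap;

const FAST_CHAIN_LIMIT: usize = 16;  // triggers the equal-hash scan
const HARD_CHAIN_LIMIT: usize = 128; // always triggers the switch
const EQUAL_HASH_LIMIT: usize = 16;  // equal full hashes needed to switch early

// `chain` holds the full u64 hashes stored along the probe chain.
fn should_switch_hasher(chain: &[u64], probe_len: usize) -> bool {
    if probe_len > HARD_CHAIN_LIMIT {
        return true;
    }
    if probe_len > FAST_CHAIN_LIMIT {
        // Cold path: count hashes that are fully equal, not merely equal
        // modulo the table size.
        let mut counts: HashMap<u64, usize> = HashMap::new();
        for &h in chain {
            *counts.entry(h).or_insert(0) += 1;
        }
        return counts.values().any(|&c| c >= EQUAL_HASH_LIMIT);
    }
    false
}
```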

If the hasher truly behaves randomly (as it really should for non-attack scenarios), the chance of even one 64-bit collision should be rather low, and every additional collision required divides the probability by on the order of 2^64.

Well, technically not just attack scenarios: it's possible to end up with a non-malicious but systemic source of full collisions, such as if someone's custom Hash implementation hashes only part of the object. That usually means the input to the hasher is the same, so with enough collisions the hash table is doomed to failure no matter what hash it uses (or if anything, switching to SipHash may be beneficial). But if there are only a few objects with equal hasher input, my proposal would make a useless switch more likely. Since this would already be a serious bug and SipHash is not that slow, I don't think that's a big deal.

@ticki commented Sep 15, 2016

@comex

That's not really my point. Say you want to attack key K; then you generate a bunch of preimages of hash(K) % E. You can insert at most N of these before the hash function switches. Due to probing, doing the same to another key simultaneously would require that its entry is at minimum N entries away from any other attacked key. This means that you can attack at most E / N entries, i.e. you slow down E / N keys. You'd need N insertions for each attack, hence E insertions to complete such an attack. That's no different from just repeatedly inserting without any knowledge about the internals.

The only advantage over the uninformed attack is that you partially get to choose which keys to slow down, but even then you can only slow them down by N probes.

@pczarn

An idea: When reallocating the table, it should check the highest probe length and conditionally switch back to the fast hash function.

@ticki commented Sep 15, 2016

I'm fine with having N = 16. A random hash function in a table of 1024 entries should exhibit such behavior naturally with a probability less than 0.0001% (from the collision counting equation).

Edit: My calculation was off.

@Veedrac commented Sep 15, 2016

@comex

> Set a much lower chain length limit for the fast path, like 16 or so - which should still be very uncommon.

Sadly, strong chain-length bounds only apply to the double-probing variant of Robin Hood hashing. Chains regularly reach length ~46 in million-entry maps with the linear probing variant.

Insertions have much more worrisome behaviour, since in the worst case they can end up shifting 1k elements even with purely random keys.

@ticki commented Sep 15, 2016

@Veedrac Yeah, clustering can have a real, negative effect. One possible solution is to use quadratic + Robin Hood. That should make it much less likely to happen.

@comex commented Sep 15, 2016

@ticki Slowing down isn't all or nothing. If you insert N keys with the same hash-modulo-table-size, the first takes 0 probes, the second takes 1, ..., the Nth takes N-1; total number of probes is N(N-1)/2. That's for each chain; the current E doesn't really matter, since you can just repeat this an arbitrary number of times (as long as you know where in the table there's free space) and let the table be grown, ideally having the hash also be equal modulo the new table size - in which case each resize repeats all the work done so far, asymptotically doubling the total number of probes.

The average number of probes per insertion (which we understand as a cost to the attacker) is thus roughly (N-1)/2 without growth - for N=128, that's 63.5 - while inserting keys randomly keeps it under 3. Double both numbers to factor in growth.
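A quick check of that arithmetic (derived directly from the sums above):

```rust
// Attacker cost of filling one chain of n keys that share a bucket:
// probes total 0 + 1 + ... + (n-1) = n(n-1)/2, i.e. (n-1)/2 per insertion.
fn chain_cost(n: u64) -> (u64, f64) {
    (n * (n - 1) / 2, (n as f64 - 1.0) / 2.0)
}

fn main() {
    assert_eq!(chain_cost(128), (8128, 63.5)); // vs. ~3 probes for random keys
}
```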

Anyway, that's the worst case for the attacker, if (a) they can only perform an arbitrary number of insertions, not lookups, and (b) the inserted keys are required to be unique (e.g. the program refuses the request if an insertion finds an existing key). If they're able to either keep looking up the same key or keep re-inserting it, they can just hammer the last key in the chain, N probes for each operation. In this case the only incentive to create multiple chains is to pessimize cache behavior.

@Veedrac Hrm. If my suggestion to have an intermediate step of checking for fully equal keys is followed, having a few chains >= 16 but < 128 is not the end of the world. I guess it depends how cheap the check can be made...

But if an alternate probing scheme can avoid the code bloat of adding that logic while improving performance in general, it sounds like a good idea even at the expense of some code bloat of its own. Pure double hashing has bad cache behavior, but what about starting with linear probing and switching to double hashing after some low iteration count (like 4)? That is, the probe locations would be h1, h1+1, h1+2, h1+3, h1+h2, h1+2h2, etc. Or maybe ..., h1+3, h1+h2, h1+h2+1, h1+h2+2, h2+h2+3, h2+2h2, ...
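That hybrid schedule is easy to express (a sketch of the first suggested sequence; h2 must be odd so the stride is coprime with a power-of-two table size):

```rust
// Probe sequence: h1, h1+1, h1+2, h1+3, h1+h2, h1+2*h2, ...
// Linear at first for cache locality, double hashing afterwards to
// break up clusters.
fn probe(h1: usize, h2: usize, i: usize, mask: usize) -> usize {
    const LINEAR: usize = 4; // low iteration count before switching
    if i < LINEAR {
        (h1 + i) & mask
    } else {
        (h1 + (i - LINEAR + 1) * h2) & mask
    }
}
```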

@ticki commented Sep 15, 2016

@comex You forget that collisions alone are not enough. You have to deal with probe length, i.e. you need to keep a distance of N from other hashes, which reduces the damage.

@pczarn commented Sep 15, 2016

> Why is this only for simple key types?

The code doesn't cover strings yet. The algorithm will eventually work just fine for strings and slices.

> can you think of any way to minimize the code-complexity of this proposal?

I can't think of any such way, despite lots of consideration. This proposal is already reasonably simple. If someone can come up with a simplification, I would be impressed and grateful. I still have to write an RFC.

> This should be solvable by specifically checking for equal hashes. Set a much lower chain length limit for the fast path, like 16 or so - which should still be very uncommon. Every insertion that exceeds that chain length triggers a separate function that scans the chain to see if many keys (say, also 16) have equal hashes. If this is true, perform the switch, and of course still do it if the chain length exceeds 128 even with unequal hashes.

When a chain turns out not to have many equal hashes, you need to resume the insertion. The hard part is implementing that resumption in a way that won't harm code generation. A recursive call is not ideal for generated code, unless it gets tail call optimization.

Should I be more concerned about equal hashes? I think applications should at least ensure their inputs are not too long. Web servers certainly check the length of query keys.

Typical DDoS is devastating for many hashmaps, e.g. in Java, because every key comparison requires reading a heap-allocated object, which loads a cache line. With large strings, we're reading contiguous memory.

> An idea: When reallocating the table, it should check the highest probe length and conditionally switch back to the fast hash function.

I don't think it's necessary. Why implement switching back, when the algorithm is meant to never switch in practice?

@comex commented Sep 15, 2016

@ticki I don't know what you mean. It should be possible to fill up the hash table without gaps, up to the maximum number of entries before resize - like this, imagining N were 4:

```
slot  0 1 2 3 4 5 6 7 8 9 a b
hash  0 0 0 0 4 4 4 4 8 8 8 8 ...
```

@ticki commented Sep 15, 2016

@comex Good point.

@pczarn

> I don't think it's necessary. Why implement switching back, when the algorithm is meant to never switch in practice?

Well, let's consider a strictly hypothetical scenario:

You use the hashtable for a long-running KV store (say, a server), and some attacker generates a lot of collisions and inserts them into the hash table. Now, the hash table will move on to a secure hash function. When the attacker realizes that their approach is inefficient, they might discontinue the attack. If the hash table keeps growing, and then reallocates, nothing is lost by switching back to the old hash function.

Of course, this gives a new attack vector: do the collision attack, let the table reallocate another time, switching back to the old function, and repeat indefinitely. Although, it is worth noting that getting a table to reallocate is not exactly trivial, since it requires a high (exponentially growing) number of insertions.

Generally, I think the best way of modeling the security in this case is to compare the attack to the naïve one in which a lot of random keys are spammed. Given that reverting to an insecure hash function requires this technique, it is only marginally worse than a "never-go-back" approach.

The question is: how big is the gain, and is it worth it? Unfortunately, that cannot be measured with micro-benchmarks.

@pczarn commented Oct 30, 2016

I changed the way the safeguard works. Previously, it allowed for a huge map-creation cost, only disallowing costly lookups. Now, the safeguard is part of insertion rather than of every lookup.

The code is a bit less complex.

Two tasks:

  • Found the equation for the probability that a randomly picked bucket is in a chain (island) of length X.
  • Wrote an RFC.

@arthurprs

Very cool. It can be seen as an adaptive load factor now, which is great. Maybe it should take the resizing policy into account, though.

@funny-falcon commented Nov 28, 2016

It is quite easy to add a seed to this mix function:

https://en.wikipedia.org/wiki/Xor-encrypt-xor

> In 1991, motivated by Rivest's DESX construction, Even and Mansour proposed a much simpler scheme (the "two-key Even-Mansour scheme"), which they suggested was perhaps the simplest possible block cipher: XOR the plaintext with a prewhitening key, apply a publicly known unkeyed permutation (in practice, a pseudorandom permutation) to the result, and then XOR a postwhitening key to the permuted result to produce the final ciphertext.[3][4]

So if we have a 128-bit seed, then we can simply do:

```rust
self.hash = mix(msg_data ^ seed.k0) ^ seed.k1;
```

This construction will be strong enough to not use fallback to SipHash at all (for simple keys).

(Note: the Even-Mansour scheme relies on a "strong pseudorandom permutation". The mix function is a pseudorandom permutation of unknown "strength", but I'm pretty sure it is "strong enough" for this use case, i.e. as a hash function in a hash table.)
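Putting the pieces together, the whole construction would look something like this (a sketch; mix stands in for whichever permutation is chosen, and the names are illustrative):

```rust
// Two-key Even-Mansour around an unkeyed mixing permutation:
// prewhiten with k0, permute, postwhiten with k1.
struct Seed { k0: u64, k1: u64 }

fn mix(x: u64) -> u64 {
    // Placeholder bijective mixer (odd multiply + xorshift).
    let x = x.wrapping_mul(0x9E3779B97F4A7C15);
    x ^ (x >> 32)
}

fn seeded_hash(seed: &Seed, msg_data: u64) -> u64 {
    mix(msg_data ^ seed.k0) ^ seed.k1
}
```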

@funny-falcon

> This construction will be strong enough to not use fallback to SipHash at all (for simple keys).

Then there is no need for Adaptive Hashing for simple keys.
But Adaptive Hashing could still be useful for "more complex" keys (i.e. strings, structs, etc.), so we just need a fast hash function for them.

bors added a commit to rust-lang/rust that referenced this pull request Feb 16, 2017
Adaptive hashmap implementation

All credits to @pczarn who wrote rust-lang/rfcs#1796 and contain-rs/hashmap2#5

**Background**

The Rust std lib hashmap puts a strong emphasis on security. We made some improvements in #37470, but in some very specific cases and for non-default hashers it's still vulnerable (see #36481).

This is a simplified version of rust-lang/rfcs#1796 proposal sans switching hashers on the fly and other things that require an RFC process and further decisions. I think this part has great potential by itself.

**Proposal**
This PR adds code checking for extra-long probe and shift lengths (see code comments and rust-lang/rfcs#1796 for details); when those are encountered, the hashmap will grow (even if the capacity limit is not yet reached), _greatly_ attenuating the degenerate performance case.

We need a lower bound on the minimum occupancy that may trigger the early resize, otherwise in extreme cases it's possible to turn the CPU attack into a memory attack. The PR code puts that lower bound at half of the max occupancy (defined by ResizePolicy). This reduces the protection (it could potentially be exploited between 0-50% occupancy) but makes it completely safe.
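A minimal sketch of that check, under assumed names and constants (the actual patch wires this into the insertion path):

```rust
// Illustrative stand-in for the map's bookkeeping.
struct Table { len: usize, capacity: usize }

const DISPLACEMENT_THRESHOLD: usize = 128;

fn should_grow_early(table: &Table, probe_displacement: usize) -> bool {
    // Only resize early above half of the maximum occupancy, so the CPU
    // attack cannot be converted into a memory attack.
    let above_min_occupancy = table.len >= table.capacity / 2;
    probe_displacement > DISPLACEMENT_THRESHOLD && above_min_occupancy
}
```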

**Drawbacks**

* May interact badly with poor hashers.  Maps using those may not use the desired capacity.
* It adds 2-3 branches to the common insert path, luckily those are highly predictable and there's room to shave some in future patches.
* May complicate exposure of ResizePolicy in the future as the constants are a function of the fill factor.

**Example**

Example code that exploits the exposure of iteration order and a weak hasher:

```
const MERGE: usize = 10_000usize;
#[bench]
fn merge_dos(b: &mut Bencher) {
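    // `$hashmap` is a macro parameter: the benchmark is instantiated once
    // for the stdlib map (_91) and once for the patched one (_ad).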
    let first_map: $hashmap<usize, usize, FnvBuilder> = (0..MERGE).map(|i| (i, i)).collect();
    let second_map: $hashmap<usize, usize, FnvBuilder> = (MERGE..MERGE * 2).map(|i| (i, i)).collect();
    b.iter(|| {
        let mut merged = first_map.clone();
        for (&k, &v) in &second_map {
            merged.insert(k, v);
        }
        ::test::black_box(merged);
    });
}
```

`_91` is stdlib and `_ad` is patched (the end capacity in both cases is the same):

```
running 2 tests
test _91::merge_dos              ... bench:  47,311,843 ns/iter (+/- 2,040,302)
test _ad::merge_dos              ... bench:     599,099 ns/iter (+/- 83,270)
```