
[Runtime] Improve the metadata hashing function.

The inputs to the hash function are pointers that follow a predictable pattern. The
hashes that we were generating and using for the metadata caches were not very
good, and as a result we generated very deep search trees. A small change that
improves the utilization of the 'length' field, plus one more bit-rotate round,
improves the quality of the hash function.

I am going to attach two pictures to the GitHub commit. The first picture shows the
binary with the old hash: the tree is very deep and sparse. The second
picture shows the new hash function: the tree is very wide and uniform. I
used the benchmark 'TypeFlood' in debug mode to generate huge amounts of
metadata.
nadavrot committed Feb 10, 2016
1 parent 866b44c commit 422764545c720f696bf7061513eac30941d39cf4
Showing with 3 additions and 2 deletions.
  1. +3 −2 stdlib/public/runtime/MetadataCache.h
@@ -66,12 +66,13 @@ class EntryRef {
   }
   size_t hash() {
-    size_t H = 0x56ba80d1 ^ length;
+    size_t H = 0x56ba80d1 * length;
     for (unsigned i = 0; i < length; i++) {
       H = (H >> 10) | (H << ((sizeof(size_t) * 8) - 10));
       H ^= ((size_t)args[i]) ^ ((size_t)args[i] >> 19);
     }
-    return H * 0x27d4eb2d;
+    H *= 0x27d4eb2d;
+    return (H >> 10) | (H << ((sizeof(size_t) * 8) - 10));
   }
   const void * const *begin() const { return args; }

15 comments on commit 4227645

@nadavrot (Contributor) commented Feb 10, 2016

The two pictures below show the shape of the internal data structure used by the metadata caches.

Before:

[image: before]

After:

[image: after]

@dduan (Collaborator) commented Feb 10, 2016

Sweeeeeet

@kostiakoval (Collaborator) commented Feb 10, 2016

Amazing, it would be interesting to see the speed improvements related to this change.

@jckarter (Member) commented Feb 10, 2016

Awesome!

@jckarter (Member) commented Feb 10, 2016

ER: playgrounds should display the bucket graph for Dictionarys

@riking commented Feb 11, 2016

@nadavrot Can you thumbnail those images? My browser struggles on the image decode when loading this page...

@Cyan4973 commented Feb 12, 2016

You might also consider replacing the hash function entirely with one of the proven ones that have excellent dispersion and speed. See for example this excellent article, which does a great job of comparing several hash algorithms for their speed and dispersion.

Better dispersion == smaller tree depth.

A Primer on Repeatable Random Numbers, by Rune Skovbo Johansen

@gribozavr (Collaborator) commented Feb 12, 2016

It would be even better to not roll our own, and just use LLVM's Hashing.h or std::hash (if all of our platforms are new enough). In fact, the issue of bad balancing in the tree was caused by us writing our own hash function in the first place.

@nadavrot (Contributor) commented Feb 12, 2016

@gribozavr Using LLVM's hash is not a very good idea. We hash our metadata pointers very frequently, and LLVM's hash is way too slow. Most Swift programs have a small number of generic types at runtime, which means that we have a small number of entries in the tree, but we calculate the hash and perform the lookup very frequently. This means that a low-quality hash that runs very fast outperforms a slow hash that does a very good job. Another problem is that the number of inputs to our hash function is typically very small (1 to 3 pointers), and we need to ensure that we have as much entropy as possible in the high bits, even for these small inputs. A non-specialized hash, like LLVM's, can only do this at the cost of a high runtime (by doing lots of operations that blend the bits, such as multiplication and rotation). Another requirement is support for both 32-bit and 64-bit targets. We need the hash to be fast and effective on 32-bit platforms, and LLVM's hash is 64-bit only.

@gribozavr (Collaborator) commented Feb 12, 2016

@nadavrot OK, here's another hash with known good properties:

https://131002.net/siphash/

The performance is discussed in https://131002.net/siphash/siphash.pdf on page 13.

@Cyan4973 commented Feb 12, 2016

SipHash has very poor performance in 32-bit mode, unfortunately, which doesn't meet @nadavrot's requirement.

Most modern fast hashes are tailored for 64-bit CPUs, since they run faster in that mode. That leaves only a handful which remain fast on 32-bit targets.

This benchmark was specifically run on 32 bits.

It also seems that another requirement is to work on short keys, which isn't uncommon. That could also be tested with a benchmark. The SMHasher test set, for example, includes a benchmark module for small keys.

edit: @nadavrot: sorry, crossed posts by a few seconds...

@nadavrot (Contributor) commented Feb 12, 2016

@gribozavr SipHash is a bad candidate for two reasons. First, it is slow (optimistically 15 cycles per byte), which is an order of magnitude slower than what we use today. And second, it requires at least 16 bytes of input, which we don't always have.

@gribozavr (Collaborator) commented Feb 12, 2016

We could use different functions for 32-bit and 64-bit targets.

@Cyan4973 commented Feb 12, 2016

Sure.
In general, the gains from going from 32 to 64 bits make it worthwhile.
Two caveats, though:

  • It increases maintenance complexity (which can be mitigated by using both versions from a single library).
  • For short keys specifically, the 64-bit version may not have enough room to prove superior.
    Once again, only benchmarking will tell...
@rudkx (Member) commented Feb 13, 2016

👍 💯
