Add tsl::prime_growth_policy which keeps the size of the map/set to a prime number (#19)
Tessil committed Mar 14, 2017
1 parent 838413d commit fd765fb
Showing 3 changed files with 85 additions and 40 deletions.
12 changes: 12 additions & 0 deletions README.md
@@ -46,6 +46,18 @@ Thread-safety and exceptions guarantees are the same as `std::unordered_map/set`
- The key and the value of the map don't need a copy constructor/operator, move-only types are supported.
- It uses less memory for its speed, as it can sustain a load factor of 0.95 (the default value in the library, compared to the 0.5 of `google::dense_hash_map`) while keeping good performance.
### Growth policy
By default `tsl::hopscotch_map/set` uses `tsl::power_of_two_growth_policy` as `GrowthPolicy`. This policy keeps the size of the map to a power of two by doubling the size of the map when a rehash is required. This lets the map avoid the slow modulo operation: instead of <code>hash % 2<sup>n</sup></code>, it uses <code>hash & (2<sup>n</sup> - 1)</code>.
With a poor hash function this may cause a lot of collisions, as the mask simply discards the most significant bits of the hash.
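As a small illustration of the mask trick (a self-contained sketch, not code from the library; `bucket_mask` and `bucket_mod` are hypothetical helper names), the mask and the modulo agree for any power-of-two bucket count, and hashes that differ only in their high bits end up in the same bucket:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: bucket index for a power-of-two bucket count, using a mask.
inline std::size_t bucket_mask(std::size_t hash, std::size_t bucket_count) {
    // A power of two has exactly one bit set.
    assert(bucket_count != 0 && (bucket_count & (bucket_count - 1)) == 0);
    return hash & (bucket_count - 1);
}

// Same result computed with the slower modulo.
inline std::size_t bucket_mod(std::size_t hash, std::size_t bucket_count) {
    return hash % bucket_count;
}
```

With 16 buckets, `bucket_mask(0x05, 16)` and `bucket_mask(0xA5, 16)` both give 5, because only the four least significant bits are kept.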
If you encounter poor performance, check `overflow_size()`. If it is not 0, you may have a lot of collisions due to a common pattern in the least significant bits. Either change the hash function for something more uniform or use `tsl::prime_growth_policy`, which keeps the size of the map to a prime number.
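To make the effect of a common pattern in the low bits concrete, here is a minimal simulation (it does not use the library; `distinct_buckets` is a hypothetical helper and the hash values are assumed to be multiples of a fixed stride). It counts how many different buckets are hit for a given bucket count:

```cpp
#include <cstddef>
#include <set>

// Counts how many distinct buckets are used by the hashes 0, stride, 2*stride, ...
// when the bucket is computed as hash % bucket_count.
inline std::size_t distinct_buckets(std::size_t nb_hashes, std::size_t stride,
                                    std::size_t bucket_count) {
    std::set<std::size_t> buckets;
    for(std::size_t i = 0; i < nb_hashes; i++) {
        buckets.insert((i * stride) % bucket_count);
    }
    return buckets.size();
}
```

With 100 hashes that are all multiples of 64, a power-of-two size of 128 uses only 2 buckets, while the prime size 131 spreads them over 100 different buckets.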
You can also use `tsl::mod_growth_policy` if you want a more configurable growth rate, or you can even define your own policy (see [API](https://tessil.github.io/hopscotch-map/doc/html/classtsl_1_1hopscotch__map.html#details)).
A bad distribution may lead to a runtime complexity of O(n) for lookups. Unfortunately, it is sometimes difficult to guard against it (e.g. a DoS attack on the hash map). If needed, check `tsl::hopscotch_sc_map/set`, which offer a worst-case complexity of O(log n) on lookups; see the [details](https://github.com/Tessil/hopscotch-map#deny-of-service-dos-attack) in the example.
### Installation
To use hopscotch-map, just add the [src/](src/) directory to your include path. It's a **header-only** library.
106 changes: 68 additions & 38 deletions src/hopscotch_hash.h
@@ -26,6 +26,7 @@


#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
@@ -150,6 +151,70 @@ class mod_growth_policy {
};



namespace detail_hopscotch_hash {

static constexpr const std::array<std::size_t, 38> PRIMES = {{
17ul, 29ul, 37ul, 53ul, 67ul, 79ul, 97ul, 131ul, 193ul, 257ul, 389ul, 521ul, 769ul, 1031ul, 1543ul, 2053ul,
3079ul, 6151ul, 12289ul, 24593ul, 49157ul, 98317ul, 196613ul, 393241ul, 786433ul, 1572869ul, 3145739ul,
6291469ul, 12582917ul, 25165843ul, 50331653ul, 100663319ul, 201326611ul, 402653189ul, 805306457ul,
1610612741ul, 3221225473ul, 4294967291ul
}};

template<unsigned int IPrime>
static std::size_t mod(std::size_t hash) { return hash % PRIMES[IPrime]; }

// MOD_PRIME[iprime](hash) returns hash % PRIMES[iprime]. This table allows for a faster modulo, as the
// compiler can optimize the modulo code better when the divisor is a constant known at compile time.
static constexpr const std::array<std::size_t(*)(std::size_t), 38> MOD_PRIME = {{
&mod<0>, &mod<1>, &mod<2>, &mod<3>, &mod<4>, &mod<5>, &mod<6>, &mod<7>, &mod<8>, &mod<9>, &mod<10>,
&mod<11>, &mod<12>, &mod<13>, &mod<14>, &mod<15>, &mod<16>, &mod<17>, &mod<18>, &mod<19>, &mod<20>,
&mod<21>, &mod<22>, &mod<23>, &mod<24>, &mod<25>, &mod<26>, &mod<27>, &mod<28>, &mod<29>, &mod<30>,
&mod<31>, &mod<32>, &mod<33>, &mod<34>, &mod<35>, &mod<36>, &mod<37>
}};

}
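The dispatch-table technique used by `MOD_PRIME` can be reproduced in isolation. The following is a reduced sketch with a hypothetical four-entry table (`SMALL_PRIMES`, `mod_small_prime` and `MOD_SMALL_PRIME` are illustrative names, not part of the library): each template instantiation sees its divisor as a constant, and the runtime index only selects which instantiation to call.

```cpp
#include <array>
#include <cstddef>

// Hypothetical, shortened prime table for illustration.
static constexpr std::array<std::size_t, 4> SMALL_PRIMES = {{17, 29, 37, 53}};

// One instantiation per prime: the table index is fixed at compile time in each function.
template<std::size_t IPrime>
std::size_t mod_small_prime(std::size_t hash) {
    return hash % SMALL_PRIMES[IPrime];
}

// Table of function pointers indexed by the position of the prime.
static constexpr std::array<std::size_t(*)(std::size_t), 4> MOD_SMALL_PRIME = {{
    &mod_small_prime<0>, &mod_small_prime<1>, &mod_small_prime<2>, &mod_small_prime<3>
}};
```

The price of this design is an indirect call per lookup; whether that is cheaper than a modulo by a runtime value depends on the compiler and the target, so it is worth benchmarking.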

/**
* Grow the map by using prime numbers as size. Slower than tsl::power_of_two_growth_policy in general
* but will probably distribute the values around better in the buckets with a poor hash function.
*/
class prime_growth_policy {
public:
prime_growth_policy(std::size_t& min_bucket_count_in_out) {
auto it_prime = std::lower_bound(tsl::detail_hopscotch_hash::PRIMES.begin(),
tsl::detail_hopscotch_hash::PRIMES.end(), min_bucket_count_in_out);
if(it_prime == tsl::detail_hopscotch_hash::PRIMES.end()) {
throw std::length_error("The map exceeds its maximum size.");
}

m_iprime = std::distance(tsl::detail_hopscotch_hash::PRIMES.begin(), it_prime);
min_bucket_count_in_out = *it_prime;
}

std::size_t bucket_for_hash(std::size_t hash) const {
return bucket_for_hash_iprime(hash, m_iprime);
}

std::size_t next_bucket_count() const {
if(m_iprime + 1 >= tsl::detail_hopscotch_hash::PRIMES.size()) {
throw std::length_error("The map exceeds its maximum size.");
}

return tsl::detail_hopscotch_hash::PRIMES[m_iprime + 1];
}

private:
std::size_t bucket_for_hash_iprime(std::size_t hash, unsigned int iprime) const {
tsl_assert(iprime < tsl::detail_hopscotch_hash::MOD_PRIME.size());
return tsl::detail_hopscotch_hash::MOD_PRIME[iprime](hash);
}

private:
unsigned int m_iprime;
};
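The constructor's rounding step can be tried on its own. This is a minimal sketch under the assumption of a shortened, sorted prime table (`DEMO_PRIMES` and `round_up_to_prime` are hypothetical names); the real policy additionally stores the index of the prime and writes the rounded size back through its in/out parameter.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical, shortened prime table for illustration (must be sorted).
static constexpr std::array<std::size_t, 5> DEMO_PRIMES = {{17, 29, 37, 53, 67}};

// Returns the smallest prime in the table that is >= min_bucket_count.
inline std::size_t round_up_to_prime(std::size_t min_bucket_count) {
    const auto it_prime = std::lower_bound(DEMO_PRIMES.begin(), DEMO_PRIMES.end(),
                                           min_bucket_count);
    if(it_prime == DEMO_PRIMES.end()) {
        throw std::length_error("Requested size exceeds the largest prime in the table.");
    }
    return *it_prime;
}
```

Asking for at least 30 buckets yields 37, and a request that is already a prime of the table, such as 29, is returned unchanged.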


namespace detail_hopscotch_hash {


@@ -326,43 +391,8 @@ class hopscotch_bucket : public hopscotch_bucket_hash<StoreHash> {
m_neighborhood_infos = bucket.m_neighborhood_infos;
}

hopscotch_bucket& operator=(const hopscotch_bucket& bucket)
noexcept(std::is_nothrow_copy_constructible<value_type>::value)
{
if(this != &bucket) {
if(!is_empty()) {
destroy_value();
set_is_empty(true);
}

if(!bucket.is_empty()) {
::new (static_cast<void*>(std::addressof(m_value))) value_type(bucket.get_value());
this->copy_hash(bucket);
}

m_neighborhood_infos = bucket.m_neighborhood_infos;
}

return *this;
}

hopscotch_bucket& operator=(hopscotch_bucket&& bucket)
noexcept(std::is_nothrow_move_constructible<value_type>::value)
{
if(!is_empty()) {
destroy_value();
set_is_empty(true);
}

if(!bucket.is_empty()) {
::new (static_cast<void*>(std::addressof(m_value))) value_type(std::move(bucket.get_value()));
this->copy_hash(bucket);
}

m_neighborhood_infos = bucket.m_neighborhood_infos;

return *this;
}
hopscotch_bucket& operator=(const hopscotch_bucket& bucket) = delete;

Comment by @jcelerier (Contributor), Apr 9, 2017:

From the template backtraces I get in #22, I suppose this is where the problem comes from.

hopscotch_bucket& operator=(hopscotch_bucket&& bucket) = delete;

~hopscotch_bucket() noexcept {
if(!is_empty()) {
@@ -1625,7 +1655,7 @@ class hopscotch_hash {

private:
static const std::size_t MAX_PROBES_FOR_EMPTY_BUCKET = 10*NeighborhoodSize;
static constexpr float MIN_LOAD_FACTOR_FOR_REHASH = 0.3f;
static constexpr float MIN_LOAD_FACTOR_FOR_REHASH = has_key_compare<OverflowContainer>::value?0.3f:0.15f;

private:
buckets_container_type m_buckets;
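The definition of `has_key_compare` is not part of this diff. The sketch below assumes it is a trait that detects a nested `key_compare` type (as ordered containers such as `std::set` have), and shows how a threshold can then be selected at compile time from the overflow container type. All names ending in `_sketch`, and `rehash_threshold`, are hypothetical.

```cpp
#include <list>
#include <set>
#include <type_traits>

// C++11 stand-in for std::void_t.
template<typename... Ts>
struct make_void_sketch { using type = void; };

// Assumed shape of the trait: true when T declares a nested key_compare type.
template<typename T, typename = void>
struct has_key_compare_sketch : std::false_type {};

template<typename T>
struct has_key_compare_sketch<T, typename make_void_sketch<typename T::key_compare>::type>
    : std::true_type {};

// Threshold picked at compile time from the overflow container type.
template<typename OverflowContainer>
struct rehash_threshold {
    static constexpr float value =
        has_key_compare_sketch<OverflowContainer>::value ? 0.3f : 0.15f;
};
```

Under this assumption an ordered overflow container such as `std::set<int>` selects 0.3f, and an unordered one such as `std::list<int>` selects 0.15f.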
7 changes: 5 additions & 2 deletions tests/hopscotch_map_tests.cpp
@@ -34,11 +34,14 @@ using test_types = boost::mpl::list<
std::allocator<std::pair<self_reference_member_test, self_reference_member_test>>, 6, true>,
// hopscotch_sc_map
tsl::hopscotch_sc_map<int64_t, int64_t, mod_hash<9>>,
// with tsl::prime_growth_policy
tsl::hopscotch_map<std::string, std::string, mod_hash<9>, std::equal_to<std::string>,
std::allocator<std::pair<std::string, std::string>>, 62, false, tsl::prime_growth_policy>,
// with tsl::mod_growth_policy
tsl::hopscotch_map<std::string, std::string, mod_hash<9>, std::equal_to<std::string>,
std::allocator<std::pair<std::string, std::string>>, 6, false, tsl::mod_growth_policy<>>,
std::allocator<std::pair<std::string, std::string>>, 62, false, tsl::mod_growth_policy<>>,
tsl::hopscotch_map<std::string, std::string, mod_hash<9>, std::equal_to<std::string>,
std::allocator<std::pair<std::string, std::string>>, 6, false, tsl::mod_growth_policy<std::ratio<4, 3>>>
std::allocator<std::pair<std::string, std::string>>, 62, false, tsl::mod_growth_policy<std::ratio<4, 3>>>
>;

