Is your feature request related to a problem? Please describe.
Oftentimes when you insert a set of N keys into a map, you don't know ahead of time the number of distinct keys m. This can lead to inefficient memory usage if N >> m. For example, if you bulk insert 1M keys, then for a 50% load factor we allocate 2M slots. If there are only 2 unique values, then only 2 of 2M slots in the map will be occupied. This is a waste of memory.
It would be nice if there were an API to allow "compacting" a hash map down to a smaller number of slots.
One way we could achieve this is with a static_map::rehash function, somewhat inspired by std::unordered_map::rehash. However, unlike std::unordered_map::rehash, which takes the number of buckets as its argument, cuco::static_map::rehash would probably take the number of slots. For example:
// Creates a new hash map with the specified number of slots and rehashes the existing keys into the new slots
// Destroys the old slot storage
void rehash(std::size_t num_slots);
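To make the proposed semantics concrete, here is a minimal host-side sketch. It is not cuco code: `toy_static_map`, its linear probing, and the `empty_key` sentinel are all assumptions made up for illustration, and a real implementation would do the re-insertion on the device.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

// Assumed sentinel marking an unused slot (hypothetical, for illustration only).
constexpr std::int64_t empty_key = -1;

// Hypothetical host-side stand-in for a fixed-capacity open-addressing map.
class toy_static_map {
 public:
  using slot = std::pair<std::int64_t, std::int64_t>;  // {key, value}

  explicit toy_static_map(std::size_t num_slots)
    : slots_(num_slots, slot(empty_key, 0)) {}

  // Inserts with linear probing; returns false if the key is already present.
  bool insert(std::int64_t k, std::int64_t v) {
    if (!place(slots_, k, v)) return false;
    ++size_;
    return true;
  }

  bool contains(std::int64_t k) const {
    std::size_t const n = slots_.size();
    if (n == 0) return false;
    std::size_t i = std::hash<std::int64_t>{}(k) % n;
    for (std::size_t probes = 0; probes < n; ++probes) {
      if (slots_[i].first == k) return true;
      if (slots_[i].first == empty_key) return false;
      i = (i + 1) % n;
    }
    return false;
  }

  // Creates new slot storage with num_slots slots, rehashes the existing keys
  // into it, and destroys the old storage.
  void rehash(std::size_t num_slots) {
    if (num_slots < size_) {
      throw std::invalid_argument("num_slots is smaller than the number of keys");
    }
    std::vector<slot> fresh(num_slots, slot(empty_key, 0));
    for (auto const& s : slots_) {
      if (s.first != empty_key) place(fresh, s.first, s.second);
    }
    slots_ = std::move(fresh);  // old storage is released here
  }

  std::size_t size() const { return size_; }
  std::size_t capacity() const { return slots_.size(); }

 private:
  // Places {k, v} in the first free slot of the probe sequence.
  // Returns false if k already exists; throws if no free slot is found.
  static bool place(std::vector<slot>& slots, std::int64_t k, std::int64_t v) {
    std::size_t const n = slots.size();
    if (n == 0) throw std::length_error("map has no slots");
    std::size_t i = std::hash<std::int64_t>{}(k) % n;
    for (std::size_t probes = 0; probes < n; ++probes) {
      if (slots[i].first == k) return false;
      if (slots[i].first == empty_key) {
        slots[i] = slot(k, v);
        return true;
      }
      i = (i + 1) % n;
    }
    throw std::length_error("map is full");
  }

  std::vector<slot> slots_;
  std::size_t size_ = 0;
};
```

With this sketch, a map built with 2000 slots that ends up holding only 2 distinct keys could be shrunk with `m.rehash(4)`, while `m.rehash(1)` would be rejected because it cannot hold the existing keys.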
One tricky thing is to ensure that num_slots >= the number of keys that exist in the map. Assuming all inserts have been done through the bulk insert API, we already have this information. However, if a user does manual inserts via the mutable_device_view, then we no longer know exactly how many keys are present. In this situation, @Nicolas-Iskos had the idea of first running a kernel to count the existing keys, so that we can check that num_slots is valid.
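The counting step could look roughly like the following. This is a host-side analogue only: `slot_kv`, `count_occupied`, and `is_valid_num_slots` are hypothetical names, and the sketch assumes an empty slot is identified by a sentinel key. On the device, the same predicate would presumably be evaluated by a counting kernel or a parallel reduction.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical slot layout: a key/value pair, with a sentinel key for "empty".
struct slot_kv {
  std::int64_t key;
  std::int64_t value;
};

// Counts slots whose key differs from the empty sentinel.
std::size_t count_occupied(std::vector<slot_kv> const& slots, std::int64_t empty_key) {
  return static_cast<std::size_t>(
    std::count_if(slots.begin(), slots.end(),
                  [empty_key](slot_kv const& s) { return s.key != empty_key; }));
}

// A requested slot count is valid only if it can hold every existing key.
bool is_valid_num_slots(std::vector<slot_kv> const& slots,
                        std::int64_t empty_key,
                        std::size_t num_slots) {
  return num_slots >= count_occupied(slots, empty_key);
}
```

The count only needs to run when the map cannot trust its own size, e.g. after inserts through a device view; after bulk inserts the tracked size could be used directly.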
If we get to the point where we have many submaps, it would likely be better for future search performance to rehash them into a single submap (or fewer submaps). For example, if we have submaps of 1, 1, 2, and 4 GB, it may eventually prove better to rehash them into a single map of 8 GB.
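A rough host-side sketch of that consolidation, under assumptions: each submap is modelled as a plain vector of `{key, value}` slots with a sentinel empty key, keys are assumed distinct across submaps, and `merge_submaps` with its `load_factor` parameter is a hypothetical helper rather than an existing API.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <utility>
#include <vector>

using slot_t  = std::pair<std::int64_t, std::int64_t>;  // {key, value}
using table_t = std::vector<slot_t>;

// Assumed sentinel marking an unused slot (hypothetical).
constexpr std::int64_t empty_key = -1;

// Linear-probing insert; returns false if the key exists or no slot is free.
bool place(table_t& t, std::int64_t k, std::int64_t v) {
  std::size_t const n = t.size();
  if (n == 0) return false;
  std::size_t i = std::hash<std::int64_t>{}(k) % n;
  for (std::size_t probes = 0; probes < n; ++probes) {
    if (t[i].first == k) return false;
    if (t[i].first == empty_key) {
      t[i] = slot_t(k, v);
      return true;
    }
    i = (i + 1) % n;
  }
  return false;
}

// Number of occupied slots in one table.
std::size_t count_keys(table_t const& t) {
  std::size_t n = 0;
  for (auto const& s : t) {
    if (s.first != empty_key) ++n;
  }
  return n;
}

// Rehashes every key from all submaps into one table sized for load_factor.
table_t merge_submaps(std::vector<table_t> const& submaps, double load_factor) {
  if (!(load_factor > 0.0 && load_factor <= 1.0)) {
    throw std::invalid_argument("load_factor must be in (0, 1]");
  }
  std::size_t total = 0;
  for (auto const& sub : submaps) total += count_keys(sub);

  auto const num_slots =
    static_cast<std::size_t>(std::ceil(static_cast<double>(total) / load_factor));
  table_t merged(num_slots, slot_t(empty_key, 0));
  for (auto const& sub : submaps) {
    for (auto const& s : sub) {
      if (s.first != empty_key) place(merged, s.first, s.second);
    }
  }
  return merged;
}
```

After the merge, a lookup probes one table instead of potentially probing each submap in turn, which is where the search-performance benefit would come from.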
duplicate label removed as this issue is now a sub-task of #110 rather than a duplicate. topic: static_map label removed as this issue applies to all data structures.