Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] static_map::rehash #21

Closed
jrhemstad opened this issue Jul 8, 2020 · 2 comments · Fixed by #380
Closed

[FEA] static_map::rehash #21

jrhemstad opened this issue Jul 8, 2020 · 2 comments · Fixed by #380
Assignees
Labels
type: feature request New feature request

Comments

@jrhemstad
Copy link
Collaborator

Is your feature request related to a problem? Please describe.

Oftentimes when you insert a set of N keys into a map, you don't know ahead of time the number of distinct keys m. This can lead to inefficient memory usage if N >> m. For example, if you bulk insert 1M keys, then for a 50% load factor we allocate 2M slots. If there are only 2 unique values, then only 2 of 2M slots in the map will be occupied. This is a waste of memory.

It would be nice if there was an API to allow "compacting" hash map down to a smaller number of slots.

One way we could achieve this is with a static_map::rehash function. Somewhat inspired by std::unordered_map::rehash. However, unlike std::unordered_map::rehash which takes the number of buckets as the argument, the cuco::static_map::rehash function would probably take the number of slots as the argument. For example:

// Creates a new hash map with the specified number of slots and rehashes the existing keys into the new slots
// Destroys the old slot storage
void rehash(std::size_t num_slots)

One tricky thing is to ensure that num_slots >= the number of keys that exist in the map. Assuming all inserts have been done through the bulk insert API, we already have this information. However, if a user does manual inserts via the mutable_device_view, then we know longer know exactly how many keys are present. In this situation, @Nicolas-Iskos had the idea of first doing a kernel to count the number of existing keys to ensure num_slots is valid.

@jrhemstad
Copy link
Collaborator Author

This same idea would apply to the dynamic map.

If get to the point where we have many sub-maps, it would likely be better for future search performance to rehash the submaps into a single (or fewer) submaps, e.g., if we have submaps 1, 1, 2, 4GB. It may eventually prove better to rehash into a single map of size 8GB.

@PointKernel PointKernel added duplicate This issue or pull request already exists type: feature request New feature request topic: static_map Issue related to the static_map labels Dec 3, 2021
@sleeepyjack sleeepyjack removed duplicate This issue or pull request already exists topic: static_map Issue related to the static_map labels Aug 10, 2022
@sleeepyjack
Copy link
Collaborator

duplicate label removed as this issue is now a sub-task of #110 rather than a duplicate.
topic: static_map label removed as this issue applies to all data structures.

@sleeepyjack sleeepyjack self-assigned this Oct 5, 2023
sleeepyjack added a commit that referenced this issue Oct 11, 2023
This PR adds the ability to `rehash` aka resize or regenerate an open
addressing container.

Closes #21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: feature request New feature request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants