Is your feature request related to a problem or challenge? Please describe what you are trying to do.
A while back I implemented an optimized string dictionary builder for IOx. It relies on two main tricks for better performance:
- Use ahash instead of SipHash; this alone gives a 40% speedup.
- Use hashbrown's raw_entry_mut to avoid duplicating string values into the hashmap.
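To make the two tricks concrete, here is a minimal std-only sketch, not the IOx code. Since `ahash` and `hashbrown` are external crates, the hasher is left as a generic `S: BuildHasher` parameter (std's `RandomState` is the SipHash default; `ahash::RandomState` could be passed instead). Because `raw_entry_mut` is not available on stable std's `HashMap`, the map here is keyed by the precomputed `u64` hash and stores only dictionary keys; lookups compare against bytes already in the values buffer, so each distinct string is stored once. `DictBuilder` and its methods are hypothetical names.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash, Hasher};

/// Hypothetical dictionary builder: each distinct string is stored once in
/// `values`, delimited by `offsets`, like an Arrow string array.
pub struct DictBuilder<S: BuildHasher> {
    /// Pluggable hasher: `RandomState` is std's SipHash default; a faster
    /// hasher such as `ahash::RandomState` could be passed instead.
    hasher: S,
    /// Precomputed hash -> dictionary keys whose string has that hash.
    /// Stand-in for hashbrown's raw_entry_mut: only u32 keys live here, never
    /// string copies. (This std map re-hashes the u64 key with its own hasher,
    /// which raw_entry_mut would avoid by taking the precomputed hash.)
    dedup: HashMap<u64, Vec<u32>>,
    values: Vec<u8>,
    offsets: Vec<usize>,
    keys: Vec<u32>,
}

impl<S: BuildHasher> DictBuilder<S> {
    pub fn with_hasher(hasher: S) -> Self {
        Self { hasher, dedup: HashMap::new(), values: Vec::new(), offsets: vec![0], keys: Vec::new() }
    }

    fn hash_str(&self, s: &str) -> u64 {
        let mut h = self.hasher.build_hasher();
        s.hash(&mut h);
        h.finish()
    }

    /// Bytes of the value stored under dictionary key `k`.
    fn value(&self, k: u32) -> &[u8] {
        let k = k as usize;
        &self.values[self.offsets[k]..self.offsets[k + 1]]
    }

    /// Appends `s`, returning its dictionary key (new or existing).
    /// Assumes fewer than u32::MAX distinct values.
    pub fn append(&mut self, s: &str) -> u32 {
        let hash = self.hash_str(s);
        // Resolve candidates by comparing against the backing buffer.
        let existing = self.dedup.get(&hash).and_then(|candidates| {
            candidates.iter().copied().find(|&k| self.value(k) == s.as_bytes())
        });
        let key = match existing {
            Some(k) => k,
            None => {
                let k = (self.offsets.len() - 1) as u32;
                self.values.extend_from_slice(s.as_bytes());
                self.offsets.push(self.values.len());
                self.dedup.entry(hash).or_default().push(k);
                k
            }
        };
        self.keys.push(key);
        key
    }

    pub fn keys(&self) -> &[u32] {
        &self.keys
    }

    pub fn values(&self) -> &[u8] {
        &self.values
    }
}

/// Convenience constructor using std's default (SipHash) hasher.
pub fn sip_builder() -> DictBuilder<RandomState> {
    DictBuilder::with_hasher(RandomState::new())
}
```

Swapping the hasher only changes the `S` type parameter, e.g. `DictBuilder::with_hasher(ahash::RandomState::new())` once the `ahash` crate is added as a dependency.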
I have an implementation of this for arrow that still needs some polish but yields a 60% speedup over arrow's current implementation. Unfortunately, it depends on #1850, because it needs to read the string data from an in-progress StringBuilder.
I also have an implementation that I could contribute; it would need some changes to support generic key types, and possibly null values. It uses hashbrown directly, storing in the hashmap a (start_index, end_index) tuple that points into a backing MutableBuffer, and also tracks the offsets in a MutableBuffer. It is probably very similar to your implementation, but uses mutable buffers directly.
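A rough std-only sketch of that layout, assuming plain `Vec`s in place of arrow's `MutableBuffer` and std's `HashMap` keyed by a precomputed hash in place of using hashbrown directly. The generic key type is modelled with a `TryFrom<usize>` bound and nulls as `None` entries in the keys; `RangeDictBuilder` and its methods are hypothetical, not arrow APIs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::convert::TryFrom;
use std::hash::{Hash, Hasher};

/// Hypothetical dictionary builder generic over the key type `K`
/// (e.g. u8, i16, i32). `Vec`s stand in for arrow's `MutableBuffer`.
pub struct RangeDictBuilder<K> {
    /// Hash of a value -> (start_index, end_index, key) entries pointing
    /// into `values`; the map never owns a copy of the string bytes.
    map: HashMap<u64, Vec<(usize, usize, K)>>,
    /// Backing buffer holding each distinct value once.
    values: Vec<u8>,
    /// Arrow-style offsets into `values`, starting with a single 0.
    offsets: Vec<i32>,
    /// One entry per appended slot; `None` marks a null.
    keys: Vec<Option<K>>,
}

impl<K: Copy + TryFrom<usize>> RangeDictBuilder<K> {
    pub fn new() -> Self {
        Self { map: HashMap::new(), values: Vec::new(), offsets: vec![0], keys: Vec::new() }
    }

    /// Records a null slot without touching the dictionary values.
    pub fn append_null(&mut self) {
        self.keys.push(None);
    }

    /// Appends `s` and returns its key, or `None` if adding a new distinct
    /// value would overflow `K` (or the i32 offsets).
    pub fn append(&mut self, s: &str) -> Option<K> {
        let bytes = s.as_bytes();
        let mut hasher = DefaultHasher::new();
        bytes.hash(&mut hasher);
        let hash = hasher.finish();

        // Resolve hash collisions by comparing against the backing buffer.
        let values = &self.values;
        let found = self.map.get(&hash).and_then(|entries| {
            entries
                .iter()
                .find(|&&(start, end, _)| &values[start..end] == bytes)
                .map(|&(_, _, key)| key)
        });

        let key = match found {
            Some(key) => key,
            None => {
                // Check both limits before mutating, so a failed append
                // leaves the builder unchanged.
                let key = K::try_from(self.offsets.len() - 1).ok()?;
                let start = self.values.len();
                let end = start + bytes.len();
                let end_offset = i32::try_from(end).ok()?;
                self.values.extend_from_slice(bytes);
                self.offsets.push(end_offset);
                self.map.entry(hash).or_default().push((start, end, key));
                key
            }
        };
        self.keys.push(Some(key));
        Some(key)
    }

    pub fn keys(&self) -> &[Option<K>] {
        &self.keys
    }

    pub fn values(&self) -> &[u8] {
        &self.values
    }

    pub fn offsets(&self) -> &[i32] {
        &self.offsets
    }
}
```

Storing the full (start, end) range per entry costs two extra `usize`s compared with storing only the key and deriving the range from `offsets`, but it lets the equality check read the backing buffer without consulting the offsets.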