-
Notifications
You must be signed in to change notification settings - Fork 0
Hashmap
Folder: 07-hashmap
A hash table written from scratch as a class template. It maps keys to values in roughly constant time by hashing each key to a bucket. Collisions, where two keys land in the same bucket, are handled by separate chaining: every bucket is a linked list, and colliding keys sit in the same list. This is the strongest data structure in the repository, so the page goes deep.
The hash function is a small functor, DefaultHash<T>. It sums the bytes of the key and takes the result modulo the bucket count. The container is parameterized on the key type, the value type, an equality comparator (std::equal_to by default), and the hash functor, which mirrors how std::unordered_map is parameterized.
Inside, the table is a vector<list<Element>>, where Element is pair<const Key, Value>. The outer vector is the bucket array. Each inner list is one bucket's chain.
flowchart LR
subgraph BucketArray["elems_ : vector of buckets"]
b0["bucket 0"]
b1["bucket 1"]
b2["bucket 2"]
bn["bucket n-1"]
end
b0 --> e0["(k4, 40)"]
b1 --> e1a["(k6, 60)"] --> e1b["(k107, 12)"]
b2 --> empty["empty"]
bn --> en["(k88, 7)"]
key["key"] -->|hash mod numBuckets| BucketArray
The diagram shows a collision in bucket 1: keys k6 and k107 both hashed there, so they share a chain.
find hashes the key to a bucket, then walks that bucket's list with the helper findElement. It returns a pointer to the matching element or nullptr.
insert is a safe insert. It hashes the key, looks in the bucket, and if the key is absent it appends the pair and returns make_pair(pointer, true). If the key is already there it returns make_pair(pointer_to_existing, false), so the caller can tell an insertion from a no-op. This is the same contract std::map::insert offers.
erase removes by key and returns a pair. The pointer is the element that followed the erased one, even when that successor lives in a later bucket, which is why erase scans forward through the bucket array if the current bucket runs out. The boolean reports whether anything was removed.
operator[] reuses insert so the key is looked up only once. It inserts a default-constructed value if the key is missing, then returns a reference to the value either way.
rehash grows the table. When asked for more buckets, it builds a new bucket vector and a new hash functor sized to the new count, then re-hashes every existing element into its new bucket. If the requested size is not larger than the current size it does nothing.
classDiagram
class DefaultHash~T~ {
-size_t numBuckets_
+hash(T) size_t
+numBuckets() size_t
}
class hashmap~Key, Value, Compare, Hash~ {
-size_t size_
-Compare comp_
-Hash hash_
-vector~list~Element~~ elems_
+find(Key) Element*
+insert(Element) pair~Element*, bool~
+erase(Key) pair~Element*, bool~
+operator[](Key) Value&
+rehash(size_t) void
}
hashmap~Key, Value, Compare, Hash~ *-- DefaultHash~T~ : hashes keys with
g++ -std=c++17 07-hashmap/testHashmap.cpp -o testHashmap
./testHashmap
g++ -std=c++17 07-hashmap/testHashmapUpdated.cpp -o testHashmapUpdated
./testHashmapUpdatedtestHashmap.cpp reads a few employee numbers and names from standard input; testHashmapUpdated.cpp runs assertions with no input.
The container skeleton and the default hash came from the course, adapted from Professional C++. My work is the safe insert returning a pair, the erase that returns the next element across bucket boundaries, the single-lookup operator[], and rehash. One caveat carried over from the skeleton: DefaultHash hashes the raw bytes of the key, so it works for integer keys but not for std::string keys, which would need a string-aware hash functor.
Data structures & STL
Design patterns