# Chapter 06: Hash Tables

## Maps

**Map**: a data structure that associates identifier keys with elements by storing a collection of key-value pairs (k,v). Also known as a **dictionary**

**Multimap**: a map that allows multiple values for the same key

Maps support the following methods:

- get(k): if map M contains an item with key k, return the value of that item; otherwise return NULL
- put(k,v): insert an item with key k and value v. If there is already an item with key k, replace its value with v
- remove(k): remove from M an item with key equal to k, and return this item

### Lookup Tables

**Lookup table**: an implementation of a map whose keys are the integers 0 to N-1. Items are accessed directly by using the key as an array index, e.g. A[k]

Lookup tables start off from an empty array, with the map operations having the following effects:

- put(k,v): assign (k,v) to A[k]
- get(k): return A[k]
- remove(k): return A[k] and assign NULL to A[k]

Each operation of a lookup table runs in $O(1)$ time in the worst case. A lookup table requires $\Theta(N)$ space, since the array has one cell per possible key regardless of how many items are stored
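The three operations above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the class name `LookupTable` is hypothetical, `None` stands in for NULL, and `get` returns only the value of the stored pair, following the map definition of get(k).

```python
class LookupTable:
    """Map whose keys are the integers 0..N-1, stored directly in an array."""

    def __init__(self, capacity):
        # One cell per possible key; None marks an empty cell.
        self._cells = [None] * capacity

    def _check(self, k):
        # Assumption: keys outside 0..N-1 are rejected rather than wrapped.
        if not 0 <= k < len(self._cells):
            raise KeyError(k)

    def put(self, k, v):
        self._check(k)
        self._cells[k] = (k, v)

    def get(self, k):
        self._check(k)
        item = self._cells[k]
        return None if item is None else item[1]

    def remove(self, k):
        self._check(k)
        item = self._cells[k]
        self._cells[k] = None
        return item


table = LookupTable(8)
table.put(3, "c")
table.put(5, "e")
print(table.get(3))     # c
print(table.remove(5))  # (5, 'e')
print(table.get(5))     # None
```

Every method touches exactly one array cell, which is where the constant worst-case running time comes from.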

## Hash Functions

**Hash function**: maps each key k in a map to an integer in the range [0, N-1] where N is the capacity of the underlying array 

**Collision**: a pair of keys with the same hash value. Good hash functions minimize collisions

### Key Representations

Keys may be viewed as tuples of the form $(x_1,x_2,...,x_d)$. A simple hash function combines the components of a key with a sum or an exclusive-or:

$$h(k) = \bigoplus_{i=1}^d x_i$$

However, this is often impractical because the result does not depend on the order of the components, so keys that differ only in the order of their components always collide (e.g., "temp01" vs. "temp10"). A better alternative is to use **polynomial evaluation functions**:

$$h(k) = x_d + a(x_{d-1} + a(x_{d-2} + \cdots + a(x_3 + a(x_2 + ax_1)) \cdots))$$

where $a$ is a nonzero constant with $a \neq 1$
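The difference between the two approaches can be checked with a short Python sketch. The function names are hypothetical, the components $x_i$ are taken to be character codes, and $a = 33$ is an arbitrary illustrative choice rather than a value given in these notes.

```python
from functools import reduce
from operator import xor


def xor_hash(key):
    """Combine the character codes of key with exclusive-or (order-insensitive)."""
    return reduce(xor, (ord(c) for c in key), 0)


def polynomial_hash(key, a=33):
    """Evaluate x_d + a(x_{d-1} + ... + a(x_2 + a*x_1)) from the inside out."""
    h = 0
    for c in key:
        # After processing x_i, h holds the innermost i terms of the nesting.
        h = h * a + ord(c)
    return h


print(xor_hash("temp01") == xor_hash("temp10"))                # True
print(polynomial_hash("temp01") == polynomial_hash("temp10"))  # False
```

The result of `polynomial_hash` is unbounded here; in a hash table it would still need to be reduced to the range [0, N-1].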

Additionally, **tabulation-based hash functions** may be used when each key is a tuple of the form $k=(x_1,x_2,...,x_d)$. A table $T_i$ of random values is initialized for each component position $i=1,...,d$, and the following hash function is used:

$$h(k) = T_1[x_1]\oplus T_2[x_2]\oplus ... \oplus T_d[x_d]$$
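A minimal Python sketch of this scheme follows. It assumes each component is an integer in the range 0 to `alphabet_size - 1` and that table entries are random 32-bit values; the helper names, the bit width, and the fixed seed are illustrative assumptions.

```python
import random


def make_tables(d, alphabet_size, bits=32, seed=0):
    """Build one table of random values per component position."""
    rng = random.Random(seed)
    return [[rng.getrandbits(bits) for _ in range(alphabet_size)]
            for _ in range(d)]


def tabulation_hash(key, tables):
    """XOR together the table entries selected by each component of key."""
    if len(key) != len(tables):
        raise ValueError("key must have exactly one component per table")
    h = 0
    for table, x in zip(tables, key):
        h ^= table[x]
    return h


tables = make_tables(d=4, alphabet_size=256)
key = (10, 20, 30, 40)
h = tabulation_hash(key, tables)
print(h)
```

Because each position has its own table, swapping two components generally changes the hash value, unlike a plain exclusive-or of the components.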

**Random linear hash functions** help mitigate repeated patterns in a set of integer keys. They are of the form $h(k)=(ak+b) \bmod N$, where N is a prime number and a and b are randomly chosen integers with $0<a<N$ and $0\leq b < N$
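As a sketch, such a function can be built in Python by drawing a and b once and closing over them. The name `make_linear_hash`, the prime N = 101, and the seed are assumptions made for the example.

```python
import random


def make_linear_hash(N, seed=None):
    """Return h(k) = (a*k + b) mod N with random 0 < a < N and 0 <= b < N."""
    rng = random.Random(seed)
    a = rng.randrange(1, N)
    b = rng.randrange(0, N)

    def h(k):
        return (a * k + b) % N

    return h


h = make_linear_hash(N=101, seed=42)
keys = [100, 200, 300, 400, 500]  # keys with a regular pattern
print([h(k) for k in keys])
```

Since N is prime and a is nonzero modulo N, the keys 0 to N-1 are mapped to N distinct buckets.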

## Handling Collisions and Re-hashing

Several strategies exist for resolving collisions:

### Separate Chaining

Each bucket A[i] stores a reference to a set $S_i$, kept as a linked list, that holds all items the hash function has mapped to A[i]

Assuming the hash function distributes keys uniformly, the expected number of items that map to a bucket is given by the following equation:

$$E(x) = \frac{n}{N}$$

In the above equation, n is the number of items in the map and N is the capacity of A (its number of buckets). The ratio of n to N is known as the **load factor** of the table
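Separate chaining can be sketched in Python as below. This is an illustration under stated assumptions: Python lists stand in for the linked lists, the built-in `hash` reduced modulo the capacity stands in for the hash function, and the class name `ChainedHashMap` is hypothetical.

```python
class ChainedHashMap:
    """Hash table in which each bucket holds a chain of (key, value) pairs."""

    def __init__(self, capacity=8):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _bucket(self, k):
        return self._buckets[hash(k) % len(self._buckets)]

    def put(self, k, v):
        bucket = self._bucket(k)
        for i, (key, _) in enumerate(bucket):
            if key == k:
                bucket[i] = (k, v)  # key already present: replace its value
                return
        bucket.append((k, v))
        self._size += 1

    def get(self, k):
        for key, value in self._bucket(k):
            if key == k:
                return value
        return None

    def remove(self, k):
        bucket = self._bucket(k)
        for i, (key, value) in enumerate(bucket):
            if key == k:
                del bucket[i]
                self._size -= 1
                return (key, value)
        return None

    def load_factor(self):
        return self._size / len(self._buckets)


m = ChainedHashMap(capacity=4)
for k, v in [(1, "one"), (5, "five"), (9, "nine")]:
    m.put(k, v)  # 1, 5 and 9 all fall into bucket 1 when N = 4
print(m.get(5))         # five
print(m.load_factor())  # 0.75
```

Each operation scans a single chain, so its cost is proportional to the chain length, which is n/N in expectation under the uniformity assumption.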

### Open Addressing

Each item is stored directly in a bucket, with at most one item per bucket. There are several ways to accomplish this:

- **linear probing**: if the bucket A[i] chosen for an insertion is already occupied, then the next bucket A[$(i+1) \bmod N$] is tried. If this is occupied too, then A[$(i+2) \bmod N$] is tried, and so on. Expected running time of $O(1)$ so long as the load factor of the table is at most 1/2
- **quadratic probing**: iteratively try the buckets A[$(i+f(j)) \bmod N$] for $j=0,1,2,...$, where $f(j)=j^2$, until finding an empty bucket
- **double hashing**: choose a secondary hash function, h', and if h results in a collision then iteratively try the buckets A[$(i+f(j)) \bmod N$] for $j=1,2,3,...$, where $f(j)=j \cdot h'(k)$
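The three probe sequences can be compared with a small Python sketch. It assumes integer keys, a prime capacity N = 7, the primary hash $i = k \bmod N$ (via Python's built-in `hash`), and a hypothetical secondary hash $h'(k) = 1 + (k \bmod (N-1))$ chosen so that the step is never zero. None of these specific choices come from the notes.

```python
def linear(i, j, N, k):
    return (i + j) % N


def quadratic(i, j, N, k):
    return (i + j * j) % N


def double_hashing(i, j, N, k):
    step = 1 + (k % (N - 1))  # hypothetical h'(k), always nonzero
    return (i + j * step) % N


def insert(table, k, v, probe):
    """Store (k, v) in the first empty or matching bucket on the probe sequence."""
    N = len(table)
    i = hash(k) % N
    for j in range(N):
        slot = probe(i, j, N, k)
        if table[slot] is None or table[slot][0] == k:
            table[slot] = (k, v)
            return slot
    raise RuntimeError("no free bucket found along the probe sequence")


def slots_used(probe, keys=(10, 17, 24), N=7):
    """Insert keys into a fresh table and report the bucket each one lands in."""
    table = [None] * N
    return [insert(table, k, str(k), probe) for k in keys]


# All three keys hash to bucket 3, so the second and third must probe.
print(slots_used(linear))          # [3, 4, 5]
print(slots_used(quadratic))       # [3, 4, 0]
print(slots_used(double_hashing))  # [3, 2, 4]
```

Removal is omitted here; with open addressing it typically needs extra care (for example, marking buckets as deleted) so that later searches do not stop early at a vacated bucket.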

## Cuckoo Hashing

Two lookup tables, $T_0$ and $T_1$, are used, each of size N, where N is greater than n by at least a constant factor (e.g., $N \geq 2n$). A hash function $h_0$ is used for $T_0$ and a different one, $h_1$, for $T_1$. For any key k, there are only two possible places where an item with key k may be stored, namely $T_0[h_0(k)]$ or $T_1[h_1(k)]$. If the target bucket is already occupied on insertion, the new item takes the bucket and the evicted item is moved to its alternative location in the other table; evictions continue back and forth between the two tables until an empty bucket is found

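The expected running time of a put(k,v) operation on a cuckoo table is $O(1)$. The get(k) and remove(k) operations run in $O(1)$ time in the worst case, since each inspects at most two buckets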
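The eviction process can be sketched in Python as follows. The sketch assumes integer keys and uses two simple hypothetical hash functions, $h_0(k) = k \bmod N$ and $h_1(k) = \lfloor k/N \rfloor \bmod N$. The `max_kicks` limit is also an assumption: it stands in for the cycle detection and rehashing that a fuller implementation would need.

```python
class CuckooHashMap:
    """Two tables T_0 and T_1; each key has exactly one candidate bucket in each."""

    def __init__(self, capacity=11, max_kicks=32):
        self._N = capacity
        self._tables = [[None] * capacity, [None] * capacity]
        self._max_kicks = max_kicks

    def _h(self, t, k):
        # Hypothetical hash functions for integer keys.
        return k % self._N if t == 0 else (k // self._N) % self._N

    def get(self, k):
        for t in (0, 1):
            item = self._tables[t][self._h(t, k)]
            if item is not None and item[0] == k:
                return item[1]
        return None

    def remove(self, k):
        for t in (0, 1):
            i = self._h(t, k)
            item = self._tables[t][i]
            if item is not None and item[0] == k:
                self._tables[t][i] = None
                return item
        return None

    def put(self, k, v):
        # If the key is already stored, replace its value in place.
        for t in (0, 1):
            i = self._h(t, k)
            item = self._tables[t][i]
            if item is not None and item[0] == k:
                self._tables[t][i] = (k, v)
                return
        item, t = (k, v), 0
        for _ in range(self._max_kicks):
            i = self._h(t, item[0])
            # Place the item in hand and pick up whatever occupied the bucket.
            item, self._tables[t][i] = self._tables[t][i], item
            if item is None:
                return
            t = 1 - t  # the evicted item moves to the other table
        # Simplification: the item still in hand is dropped here; a full
        # implementation would rehash with new hash functions instead.
        raise RuntimeError("too many evictions; a rehash would be needed")


m = CuckooHashMap()
for k, v in [(5, "a"), (16, "b"), (27, "c")]:
    m.put(k, v)  # all three keys share the bucket T_0[5]
print(m.get(5), m.get(16), m.get(27))  # a b c
```

In this run, key 27 ends up in $T_0[5]$ while keys 5 and 16 have been bounced to $T_1[0]$ and $T_1[1]$, so every lookup still inspects at most two buckets.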

