In [None]:
// run this cell to prevent Jupyter from displaying the null output cell
com.twosigma.beakerx.kernel.Kernel.showNullExecutionResult = false;

<a href="notebook_id"></a>
# Hash tables

Balanced binary search trees can provide $O(\log n)$ worst-case algorithms for adding and removing elements, and for searching for a specified element. A heap can provide $O(\log n)$ adding of an element, $O(1)$ retrieval of the maximum element, and $O(\log n)$ removal of the maximum element. A hash table is a data structure that can provide $O(1)$ add, remove, and search *for the average case*.

Consider a collection that stores key-value pairs where the keys are unique. For example, a credit card company might use a collection where the keys are 16-digit credit card numbers and the values are customer accounts. An impractical collection is an array of size $10^{16}$ where the credit card number is simply an index into the array and the array element is a reference to a customer account. Because array indexing is in $O(1)$, adding an account and searching for the account linked to a credit card number are both in $O(1)$.
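
A minimal sketch of this "key is the index" idea (often called a direct-address table), assuming a much smaller key universe (keys 0 to 9999) so that the array actually fits in memory. The class name `DirectAddressTable` is hypothetical. Both `put` and `get` are a single array access; `remove` is deliberately left out because it is the subject of Exercise 1.

```java
/**
 * Hypothetical direct-address table: the key itself is the array index.
 * Uses a small key universe so the array fits in memory; 16-digit keys
 * would need 10^16 slots.
 */
public class DirectAddressTable<V> {

    private final Object[] slots;

    public DirectAddressTable(int universeSize) {
        this.slots = new Object[universeSize]; // one slot per possible key
    }

    /** Stores value under key: a single array write, so O(1). */
    public void put(int key, V value) {
        this.slots[key] = value;
    }

    /** Returns the value for key, or null: a single array read, so O(1). */
    @SuppressWarnings("unchecked")
    public V get(int key) {
        return (V) this.slots[key];
    }

    public static void main(String[] args) {
        DirectAddressTable<String> accounts = new DirectAddressTable<>(10_000);
        accounts.put(4242, "account of customer A");
        System.out.println(accounts.get(4242)); // account of customer A
        System.out.println(accounts.get(1234)); // null: almost every slot is empty
    }
}
```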

**Exercise 1** Explain why removing an account linked to a credit card number is also in $O(1)$.

The main problem with this scheme is that the vast majority of the array will contain null elements: most indexes are not valid credit card numbers, and the size of the array is more than one million times the current population of the Earth.
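
The size claim can be checked with a little arithmetic. The sketch below assumes a world population of about 8 billion and 8-byte object references; both figures are assumptions for illustration, not values from the text. Under them, the array has 1.25 million slots per person and occupies about 80,000 terabytes. It also could not be created in Java at all, because Java arrays are indexed by `int`.

```java
/**
 * Back-of-the-envelope cost of a direct-address table indexed by
 * 16-digit credit card numbers. Population and reference size are assumptions.
 */
public class DirectAddressCost {

    static final double NUM_SLOTS = 1e16;          // one slot per possible 16-digit number
    static final double WORLD_POPULATION = 8.0e9;  // assumed: roughly 8 billion people
    static final double BYTES_PER_REFERENCE = 8;   // assumed: 64-bit references

    static double slotsPerPerson() {
        return NUM_SLOTS / WORLD_POPULATION;
    }

    static double terabytes() {
        return NUM_SLOTS * BYTES_PER_REFERENCE / 1e12;
    }

    public static void main(String[] args) {
        System.out.println("slots per person: " + slotsPerPerson());
        System.out.println("array size (TB):  " + terabytes());
        // Java arrays are indexed by int, so an array has at most
        // Integer.MAX_VALUE (about 2.1 billion) elements.
        System.out.println("fits in a Java array? " + (NUM_SLOTS <= Integer.MAX_VALUE));
    }
}
```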

Using an array is an attractive idea because of the $O(1)$ random access. To handle keys that are not integers, a function called a *hash function* is used to map keys to the range $0$ to $m-1$ where $m$ is the size of the array $t$ used to store the key-value pairs. The array $t$ is called a *hash table* and each array element is called a *bucket*.
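
As an illustration of hashing a non-integer key, the sketch below maps a `String` to a bucket index in the range $0$ to $m-1$ using Java's `String.hashCode()` followed by `Math.floorMod`. This is one common approach, assumed here for illustration; it is not necessarily the hash function used later in these notes. `floorMod` is used instead of `%` because `hashCode()` can be negative.

```java
/**
 * Sketch: mapping a String key to a bucket index in 0..m-1.
 * String.hashCode() plus Math.floorMod is an illustrative choice only.
 */
public class StringBucket {

    static int bucket(String key, int m) {
        // hashCode() may be negative; floorMod (unlike %) always returns a value in 0..m-1
        return Math.floorMod(key.hashCode(), m);
    }

    public static void main(String[] args) {
        int m = 31;
        for (String key : new String[] {"alice", "bob", "carol"}) {
            System.out.println(key + " -> bucket " + bucket(key, m));
        }
    }
}
```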

For the time being, we make the simplifying assumption that the keys are non-negative integers. Then one possible hash function that maps keys to the range $0$ to $m-1$ is Java's remainder operator `%`; in other words, our hash function is simply:

```java
    /**
     * Hash function for non-negative integer keys.
     * 
     * @param k a non-negative key
     * @return an array index
     */
    private int h(int k) {
        int m = t.length;
        return k % m;
    }
```

The size of the hash table is usually much smaller than the number of possible keys. This means that the hash function may map multiple keys to the same array index. For example, if $m = 11$, then the keys $1, 12, 23, 34, \ldots$ all map to index 1 using our current hash function.
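
A quick check of this example, using a hypothetical static variant `h(k, m)` of the hash function that takes the table size as a parameter instead of reading `t.length`. Keys that differ by a multiple of $m$ leave the same remainder, so they land in the same bucket.

```java
public class CollisionDemo {

    // Same rule as h above, but with the table size m passed in explicitly (hypothetical variant).
    static int h(int k, int m) {
        return k % m;
    }

    public static void main(String[] args) {
        int m = 11;
        // Keys 1, 12, 23, 34, 45 differ by multiples of m, so each prints index 1.
        for (int k = 1; k <= 45; k += m) {
            System.out.println("key " + k + " -> index " + h(k, m));
        }
    }
}
```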

We say that a *collision* occurs when two or more keys map to the same index. Hash tables use a *collision resolution* strategy to deal with collisions when adding key-value pairs to the hash table.

**Exercise 2** Conduct some research into the birthday problem. Suppose that $t$ has size 1000 and we add $n$ key-value pairs to the hash table where the keys are drawn from a uniform random distribution over the range $0$ to $999$, inclusive. What is the approximate probability that there is at least one collision for $n = 25, 50, 100$?


For the time being, our collision resolution strategy is simply to throw an exception if adding a key-value pair to the hash table causes a collision.
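
The pseudocode in this section covers searching, getting, and removing, but not adding, so here is a minimal sketch of what `put` might look like under this policy. The `Pair` class, the 11-bucket array, and the use of `IllegalStateException` (a subclass of `RuntimeException`) are assumptions made for illustration.

```java
/**
 * Sketch of put under the "throw an exception on collision" policy.
 * Pair, the table size (11), and IllegalStateException are illustrative assumptions.
 */
public class PutSketch {

    static class Pair {
        final int key;
        String value;

        Pair(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    final Pair[] t = new Pair[11];

    int h(int k) {
        return k % t.length;
    }

    /** Adds (k, v), replaces the value if k is already present, or throws on a collision. */
    void put(int k, String v) {
        int index = h(k);
        Pair pair = t[index];
        if (pair == null) {
            t[index] = new Pair(k, v);   // empty bucket: store the new pair
        } else if (pair.key == k) {
            pair.value = v;              // same key: replace the old value
        } else {                         // a different key already occupies this bucket
            throw new IllegalStateException("collision: key " + k + " with key " + pair.key);
        }
    }

    public static void main(String[] args) {
        PutSketch s = new PutSketch();
        s.put(1, "one");
        s.put(1, "uno");                 // same key: the value is replaced
        System.out.println(s.t[1].value);
        try {
            s.put(12, "twelve");         // 12 % 11 == 1, but the key differs
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```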

To test whether the hash table contains a key $k$, we:

1. use the hash function to compute the array index for the key $k$, 
2. test if the key-value pair at the computed index is null
    1. if true, then return false
    2. if false, then test if $k$ matches the key from the key-value pair
        1. if true, then return true
        2. if false, then return false

In pseudocode, our `containsKey` algorithm is:

```
containsKey(k):
    index = h(k)
    pair = t[index]
    if pair == null
        return false
    else
        if k == pair.key
            return true
        else
            return false
```

Because the hash function and array indexing are both in $O(1)$, searching a hash table for a key is in $O(1)$ if collisions do not occur.
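
The `containsKey` pseudocode translates almost line for line into Java. In this sketch the `Pair` class, the table size of 11, and the `String` values are illustrative assumptions; the pair is placed in the array by hand so that only `containsKey` is exercised. It also shows why the key comparison is needed: key 12 hashes to the same bucket as key 1, so the bucket is non-null, yet 12 is not in the table.

```java
/**
 * Java translation of the containsKey pseudocode.
 * Pair, the 11-bucket table, and String values are illustrative assumptions.
 */
public class ContainsKeySketch {

    static class Pair {
        final int key;
        final String value;

        Pair(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    final Pair[] t = new Pair[11];

    int h(int k) {
        return k % t.length;
    }

    boolean containsKey(int k) {
        int index = h(k);
        Pair pair = t[index];
        if (pair == null) {
            return false;            // empty bucket: k is not in the table
        } else {
            return k == pair.key;    // occupied bucket: k is present only if the keys match
        }
    }

    public static void main(String[] args) {
        ContainsKeySketch s = new ContainsKeySketch();
        s.t[1] = new Pair(1, "one");             // store key 1 in bucket h(1) == 1 by hand
        System.out.println(s.containsKey(1));    // true
        System.out.println(s.containsKey(12));   // false: bucket 1 holds key 1, not 12
        System.out.println(s.containsKey(2));    // false: bucket 2 is empty
    }
}
```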

To get a value from a hash table, we:

1. use the hash function to compute the array index for the key $k$,
2. test if the key-value pair at the computed index is null
    1. if true, then return null
    2. if false, then test if $k$ matches the key from the key-value pair
        1. if true, then return the value of the key-value pair
        2. if false, then return null
        
In pseudocode, our `get` algorithm is:

```
get(k):
    index = h(k)
    pair = t[index]
    if pair == null
        return null
    else
        if k == pair.key
            return pair.value
        else
            return null
```

Because the hash function and array indexing are both in $O(1)$, getting the value mapped to a key is in $O(1)$ as long as collisions do not occur.
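
A Java sketch of `get`, under the same kind of illustrative assumptions (a hypothetical `Pair` class, 11 buckets, `String` values). Here the two `return null` branches of the pseudocode are folded into a single condition: an empty bucket and a bucket holding a different key both mean "no value for this key".

```java
/**
 * Java sketch of the get pseudocode.
 * Pair, the 11-bucket table, and String values are illustrative assumptions.
 */
public class GetSketch {

    static class Pair {
        final int key;
        final String value;

        Pair(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    final Pair[] t = new Pair[11];

    int h(int k) {
        return k % t.length;
    }

    String get(int k) {
        int index = h(k);
        Pair pair = t[index];
        if (pair == null || k != pair.key) {
            return null;         // empty bucket, or the bucket holds a different key
        }
        return pair.value;       // keys match: return the stored value
    }

    public static void main(String[] args) {
        GetSketch s = new GetSketch();
        s.t[1] = new Pair(1, "one");        // store key 1 in bucket h(1) == 1 by hand
        System.out.println(s.get(1));       // one
        System.out.println(s.get(12));      // null: bucket 1 holds key 1, not 12
        System.out.println(s.get(5));       // null: bucket 5 is empty
    }
}
```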

Removing a key-value pair from a hash table is similar to getting a value:

1. use the hash function to compute the array index for the key $k$,
2. test if the key-value pair at the computed index is null
    1. if true, then return null
    2. if false, then test if $k$ matches the key from the key-value pair
        1. if true, then set `t[index]` to null and return the removed value
        2. if false, then return null

In pseudocode, our `remove` algorithm is:

```
remove(k):
    index = h(k)
    pair = t[index]
    if pair == null
        return null
    else
        if k == pair.key
            old = pair.value
            t[index] = null
            return old
        else
            return null
```

Because the hash function and array indexing are both in $O(1)$, removing a key-value pair is in $O(1)$ as long as collisions do not occur.
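
A matching sketch of `remove`, again with a hypothetical `Pair` class and 11 buckets. It checks the case the key comparison guards against: removing an absent key whose hash collides with a stored key (12 versus 1) must leave the stored pair untouched.

```java
/**
 * Java sketch of the remove pseudocode.
 * Pair, the 11-bucket table, and String values are illustrative assumptions.
 */
public class RemoveSketch {

    static class Pair {
        final int key;
        final String value;

        Pair(int key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    final Pair[] t = new Pair[11];

    int h(int k) {
        return k % t.length;
    }

    String remove(int k) {
        int index = h(k);
        Pair pair = t[index];
        if (pair == null || k != pair.key) {
            return null;         // nothing stored under k: leave the bucket unchanged
        }
        String old = pair.value;
        t[index] = null;         // empty the bucket
        return old;
    }

    public static void main(String[] args) {
        RemoveSketch s = new RemoveSketch();
        s.t[1] = new Pair(1, "one");         // store key 1 in bucket h(1) == 1 by hand
        System.out.println(s.remove(12));    // null: key 12 is absent
        System.out.println(s.t[1] != null);  // true: key 1's pair is still there
        System.out.println(s.remove(1));     // one
        System.out.println(s.t[1] == null);  // true: bucket 1 is empty again
    }
}
```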

An implementation of a fixed-size hash table that does not handle collisions is shown below:

In [None]:
import java.util.Arrays;

/**
 * A hash table using non-negative integer keys. This implementation cannot
 * store null values.
 *
 * @param <V> the value type
 */
public class HashTable<V> {
    
    class Entry {
        int key;
        V value;
        
        Entry(int key, V value) {
            this.key = key;
            this.value = value;
        }
        
        @Override
        public String toString() {
            return "{" + key + ", " + value + "}";
        }
    }

    private Object[] t;
    private static final int DEFAULT_CAPACITY = 31; // prime number, stay tuned for details

    /**
     * Initialize this hash table as an empty hash table.
     */
    public HashTable() {
        this.t = new Object[DEFAULT_CAPACITY];
    }

    /**
     * Hash function for non-negative integer keys.
     * 
     * @param k a non-negative key
     * @return an array index
     */
    private int h(int k) {
        return k % this.t.length;
    }

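    /**
     * Throws an exception if the specified key is negative; this hash table
     * supports only non-negative integer keys.
     *
     * @param key a key
     * @throws IllegalArgumentException if key is negative
     */
    private static void testKey(int key) {
        if (key < 0) {
            throw new IllegalArgumentException("key must be non-negative: " + key);
        }
    }

    /**
     * Returns the entry stored in the bucket that the specified key hashes to,
     * or null if that bucket is empty. The entry may hold a different key, so
     * callers must still compare keys.
     *
     * @param key a non-negative key
     * @return the entry in the key's bucket, or null if the bucket is empty
     * @throws IllegalArgumentException if key is negative
     */
    private Entry getEntry(int key) {
        testKey(key);
        int index = this.h(key); // use the hash function to compute an array index
        return (Entry) this.t[index];
    }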
    
    /**
     * Returns true if this hash table has some value associated with the specified
     * key.
     * 
     * @param key a key
     * @return true if this hash table has some value associated with the specified
     *         key
     */
    public boolean containsKey(int key) {
        Entry e = this.getEntry(key);
        if (e != null && e.key == key) {    // test if key matches entry key
            return true;
        }
        return false;
    }

    /**
     * Returns the value associated with the specified key, or null if this hash
     * table contains no mapping for the key.
     * 
     * @param key a key
     * @return the value associated with the specified key, or null if this hash
     *         table contains no mapping for the key
     */
    public V get(int key) {
        Entry e = this.getEntry(key);
        if (e != null && e.key == key) {    // test if key matches entry key
            return e.value;
        }
        return null;
    }

    /**
     * Puts a key-value pair into this hash table overwriting the previously stored
     * value if there is one.
     *
     * <p>
     * An exception is thrown if a collision occurs.
     *
     * @param key   a key
     * @param value a value
     * @return the previous value mapped to the specified key, or null if it did not
     *         have one
     * @throws NullPointerException if value is null
     * @throws IllegalArgumentException if key is negative
     * @throws RuntimeException if key collides with a different key already in this hash table
     */
    public V put(int key, V value) {
        if (value == null) {
            throw new NullPointerException();
        }
        testKey(key);
        int index = this.h(key);
        Entry e = (Entry) this.t[index];
        if (e == null) {
            e = new Entry(key, value);
            this.t[index] = e;
            return null;
        }
        else if (e.key == key) {
            V old = e.value;
            e.value = value;
            return old;
        }
        else {
            throw new RuntimeException("collision of key = " + key + " with existing pair = " + e);
        }
    }

    /**
     * Removes the value mapped to the specified key returning the removed value or
     * null if there is no value mapped to the key.
     * 
     * @param key a key
     * @return the removed value, or null if there is no value mapped to the key
     */
    public V remove(int key) {
        testKey(key);
        int index = this.h(key);
        Entry e = (Entry) this.t[index];
        if (e == null || e.key != key) {
            return null;
        }
        V removed = e.value;
        this.t[index] = null;
        return removed;
    }
    
    /**
     * Returns a string representation of this hash table.
     * 
     * @return a string representation of this hash table
     */
    @Override
    public String toString() {
        return Arrays.toString(this.t);
    }
    
    public static void main(String[] args) {
        HashTable<String> ht = new HashTable<>();
        ht.put(0, "hi");       // maps to index 0
        ht.put(3, "salut");    // maps to index 3
        ht.put(32, "ciao");    // maps to index 1 (32 % 31)
        ht.put(2, "hujambo");  // maps to index 2
        ht.put(4, "ni hao");   // maps to index 4
        ht.put(31 * 100 - 1, "konnichiwa");    // maps to index 30
        System.out.println(ht);
        
        System.out.println("got: " + ht.get(32));    // should be ciao
        System.out.println("got: " + ht.get(1));     // should be null because there is no value with key 1
        
        String removed = ht.remove(0);
        System.out.println("removed: " + removed);
        System.out.println(ht);
        
        // intentionally cause a collision with key 2
        ht.put(64, "byebye");
    }
}

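Run the next cell to run the `main` method above. Because the last call to `put` intentionally causes a collision, the cell should finish by reporting a `RuntimeException`.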

In [None]:
HashTable.main(null);