# F - Hash Tables

* A **hash table** is a data structure for fast access to elements (e.g., $O(1)$ time on average)
* For a key-value (KV) map, this means potential performance gains for the following operations:
    * `contains()`
    * `erase()`
    * `insert()`
    * `operator[]()` (fetch/update)

## The Basic Idea

* Keep an array of elements (called "buckets")
    * This array is called the *table*
* define a "hash function" $h : \text{key} \to \text{index}$
    * this function has to be FAST: its running time should not grow with the size $n$ of the table (or with the number of elements stored in it)
* each bucket is a linked list containing the KV-pairs whose keys hash to that index (this is called *separate chaining*)
* the table is a resizable array
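A minimal sketch of the idea above, assuming separate chaining with `std::list` buckets and `std::hash` as the hash function. The class name `ChainedMap` is hypothetical, and this version never resizes (resizing is covered in the load factor section).

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <utility>
#include <vector>

// Hypothetical separately chained hash map: each bucket is a linked list of KV-pairs.
template <typename K, typename V>
class ChainedMap {
public:
    // capacity must be > 0, since index() divides by it.
    explicit ChainedMap(std::size_t capacity = 8) : table_(capacity) {}

    // Insert a new pair, or overwrite the value if the key is already present.
    void insert(const K& key, const V& value) {
        auto& bucket = table_[index(key)];
        for (auto& kv : bucket) {
            if (kv.first == key) { kv.second = value; return; }
        }
        bucket.emplace_back(key, value);
        ++size_;
    }

    // Only the one bucket the key hashes to is searched.
    bool contains(const K& key) const {
        const auto& bucket = table_[index(key)];
        for (const auto& kv : bucket) {
            if (kv.first == key) return true;
        }
        return false;
    }

    // Remove the key if present; returns true if something was erased.
    bool erase(const K& key) {
        auto& bucket = table_[index(key)];
        for (auto it = bucket.begin(); it != bucket.end(); ++it) {
            if (it->first == key) { bucket.erase(it); --size_; return true; }
        }
        return false;
    }

    // Fetch/update: inserts a default-constructed value if the key is missing.
    V& operator[](const K& key) {
        auto& bucket = table_[index(key)];
        for (auto& kv : bucket) {
            if (kv.first == key) return kv.second;
        }
        bucket.emplace_back(key, V{});
        ++size_;
        return bucket.back().second;
    }

    std::size_t size() const { return size_; }

private:
    // Hash the key, then reduce it into the range [0, capacity).
    std::size_t index(const K& key) const {
        return std::hash<K>{}(key) % table_.size();
    }

    std::vector<std::list<std::pair<K, V>>> table_;
    std::size_t size_ = 0;
};
```

As long as the chains stay short, each operation only scans one small bucket, which is where the average $O(1)$ comes from.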

## Examples of Hash Functions

### Selecting Digits

* select specific parts/digits of (integer) keys to use as the hash value (index)
* EX: 9-digit student IDs
    * Let `h(k)` = the 4th and 9th digit
    * `h(001364825) = 35`
    * insert the KV-pair into the hash table at index 35
* note that with string keys you can do the same thing using the ASCII codes of the characters
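A small sketch of the student-ID example, assuming the ID is stored as an unsigned integer and always treated as 9 digits wide (so leading zeros count as digits). The helper names `digit_at` and `select_digits_hash` are hypothetical.

```cpp
#include <cstdint>

// Return the d-th digit (1-based, counting from the left) of a 9-digit ID.
int digit_at(std::uint32_t id, int d) {
    // Drop the (9 - d) digits to the right of position d, then keep the last one.
    for (int i = 0; i < 9 - d; ++i) id /= 10;
    return static_cast<int>(id % 10);
}

// h(k) = the 4th digit followed by the 9th digit, so the index is in [0, 99].
int select_digits_hash(std::uint32_t id) {
    return digit_at(id, 4) * 10 + digit_at(id, 9);
}
```

Calling `select_digits_hash(1364825)` gives `35`. Write the ID without its leading zeros in C++ source: a literal such as `001364825` is parsed as octal, and this one does not even compile because it contains the digit 8.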

> Q: Is this a good hash function?
> * FAST: it's constant time both with respect to the size of the table *and* the size of the key
> * EVENLY DISTRIBUTES: it depends on the keys being used and the size of the hash table
>    * two digits only give 100 possible indices, so in a table with more than 100 buckets the extra buckets always stay empty


### Folding

* add (sum) the digits of the key
* for a 9-digit integer key with digits $i_1, \dots, i_9$: `h(k) = i_1 + i_2 + ... + i_9`
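A minimal sketch of folding, assuming the key is an unsigned integer; `fold_hash` is a hypothetical name. It simply adds up the decimal digits.

```cpp
#include <cstdint>

// Folding: h(k) = sum of the decimal digits of k.
// For a 9-digit key the result is always in [0, 81].
int fold_hash(std::uint32_t key) {
    int sum = 0;
    while (key > 0) {
        sum += static_cast<int>(key % 10); // add the last digit
        key /= 10;                         // then drop it
    }
    return sum;
}
```

For the student ID above, `fold_hash(1364825)` is `29`. Any keys that are digit permutations of each other, such as `123` and `321`, fold to the same value.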

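> Q: Is this a good hash function?
> * FAST: it's slower than the last algorithm because it has to visit every digit of the key, but it's independent of the size of the table, so it's still fast
> * EVENLY DISTRIBUTES: no. The sum of nine digits can only range from 0 to 81, and sums near the middle are much more common than sums near the ends (roughly a bell curve, since it adds up many digits), so the values cluster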

### A Combination of the Previous Two

* e.g., we could multiply the digits within each cluster of three digits and then add the results together, and so on
* a good hash function spreads the keys over many different values, so that few keys collide


### Modular Arithmetic

* used to map hash values that fall outside the index range back into $[0, n)$
* `h(k) = f(k) % n` where `n` is equal to the table size
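One C++ detail worth a sketch: if `f(k)` is a *signed* value, `%` keeps the sign of the left operand (for example `-7 % 5 == -2`), which is not a valid index. This hypothetical `to_index` helper assumes a signed `f(k)` and shifts negative remainders back into range; if the hash is already unsigned (like `std::size_t`), plain `%` is enough.

```cpp
#include <cstddef>

// Reduce a possibly negative hash value f(k) into the range [0, n).
std::size_t to_index(long long fk, std::size_t n) {
    long long m = static_cast<long long>(n);
    long long r = fk % m;   // in C++, r has the same sign as fk
    if (r < 0) r += m;      // shift negative remainders into [0, n)
    return static_cast<std::size_t>(r);
}
```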


### Weighted Sum

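* If our key is the string `"note"`, then summing the ASCII codes gives `'n' + 'o' + 't' + 'e' = 110 + 111 + 116 + 101 = 438`
* However, the key `"tone"` (or any other anagram of `"note"`) has exactly the same sum, so the two collide
* Therefore, multiply the value at each index by a weight that depends on that index (commonly a power of some constant), so that the order of the characters matters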
* Still take the modulus at the end
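A sketch comparing the plain sum with a weighted sum, assuming powers of a constant `c = 31` as the weights (the constant is an arbitrary choice here) and evaluating the polynomial with Horner's rule. Both function names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Plain sum of character codes, mod n: every anagram collides.
std::size_t sum_hash(const std::string& s, std::size_t n) {
    std::size_t total = 0;
    for (unsigned char ch : s) total += ch;
    return total % n;
}

// Weighted sum: (s_0*c^(L-1) + s_1*c^(L-2) + ... + s_(L-1)) mod n, via Horner's rule.
// Reducing mod n at every step gives the same result as reducing once at the end,
// but keeps the intermediate values small so they cannot overflow.
std::size_t weighted_hash(const std::string& s, std::size_t n, std::size_t c = 31) {
    std::size_t total = 0;
    for (unsigned char ch : s) total = (total * c + ch) % n;
    return total;
}
```

With a table of size 101, `sum_hash` sends both `"note"` and `"tone"` to index 34 (438 mod 101), while `weighted_hash` sends `"note"` to 40 and `"tone"` to 32.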


### Universal Hashing

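* Pick the hash function *at random* from a family of functions, so that the probability of a collision is small no matter which keys are used
    * So, for any two distinct keys $x \ne y$, the family satisfies $P(h(x) = h(y)) \le \frac{1}{n}$, where the probability is over the random choice of $h$
    * $h(x) = [(k \cdot x + q) \bmod p] \bmod n$, where $p$ is a prime larger than every possible key (and larger than $n$), and $k \in \{1, \dots, p-1\}$, $q \in \{0, \dots, p-1\}$ are chosen at random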
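A minimal sketch of that family, assuming integer keys in $[0, p)$ with $p = 2^{31} - 1$ (a prime) and `std::mt19937_64` as the random source. The class name `UniversalHash` is hypothetical. `k` and `q` are drawn once, when the function is created, and then stay fixed so the same key always maps to the same bucket.

```cpp
#include <cstdint>
#include <random>

// h(x) = ((k*x + q) mod p) mod n, with k and q drawn at random once.
// Assumes every key x satisfies 0 <= x < p.
class UniversalHash {
public:
    UniversalHash(std::uint64_t n, std::mt19937_64& rng)
        : n_(n),
          k_(std::uniform_int_distribution<std::uint64_t>(1, kP - 1)(rng)),
          q_(std::uniform_int_distribution<std::uint64_t>(0, kP - 1)(rng)) {}

    std::uint64_t operator()(std::uint64_t x) const {
        // k < 2^31 and x < 2^31, so k*x + q fits in 64 bits without overflow.
        return ((k_ * x + q_) % kP) % n_;
    }

private:
    static constexpr std::uint64_t kP = 2147483647ULL; // 2^31 - 1, a prime
    std::uint64_t n_;
    std::uint64_t k_;
    std::uint64_t q_;
};
```

The $\frac{1}{n}$ bound is a statement about the random draw of `k` and `q`: no fixed set of keys is bad for most of the functions in the family.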

## Hash Functions for the `HashMap`

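```cpp
#include <cstddef>
#include <functional>
#include <iostream>

int main() {
    std::hash<int> hash_fun;
    // std::hash returns a std::size_t; storing it in an int could truncate it or make it negative
    std::size_t code = hash_fun(238746); // just pressed random things on my keyboard for an example
    std::size_t index = code % 15;       // replace the 15 with whatever the capacity of the hash table is
    std::cout << "The index is: " << index << std::endl;
}
```

```
The index is: 6
```

(The exact index depends on the standard library, since the values produced by `std::hash` are implementation-specific.)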


## The Load Factor Threshold

* Since each bucket is a linked list, we could keep inserting elements forever without resizing the array, but the chains would get longer and the operations would slow down toward $O(n)$. So we need a way to decide when to increase the size of the array, and the load factor threshold tells us when.

* The *load factor* is the number of elements stored divided by the capacity (the number of buckets). The load factor threshold is the largest load factor we allow before resizing.

* When you resize (typically by doubling the capacity), you also need to rehash every element, because each element's index `h(k) % capacity` depends on the capacity.
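A sketch of the resize-and-rehash step, assuming a set of `int` keys for brevity, a starting capacity of 4, and a threshold of 0.75 (a common choice; it is, for example, the default load factor of Java's `HashMap`). `RehashingSet` is a hypothetical name.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <vector>

// Hypothetical chained hash set of ints that doubles its capacity when the
// load factor would exceed an assumed threshold of 0.75.
class RehashingSet {
public:
    void insert(int key) {
        if (contains(key)) return;
        // Resize before inserting if the new element would push us past the threshold.
        if (static_cast<double>(size_ + 1) / buckets_.size() > kMaxLoad) {
            rehash(buckets_.size() * 2);
        }
        buckets_[index(key, buckets_.size())].push_back(key);
        ++size_;
    }

    bool contains(int key) const {
        for (int k : buckets_[index(key, buckets_.size())]) {
            if (k == key) return true;
        }
        return false;
    }

    double load_factor() const {
        return static_cast<double>(size_) / buckets_.size();
    }
    std::size_t capacity() const { return buckets_.size(); }
    std::size_t size() const { return size_; }

private:
    static std::size_t index(int key, std::size_t capacity) {
        return std::hash<int>{}(key) % capacity;
    }

    // Every element must be re-inserted: its index h(k) % capacity
    // changes when the capacity changes.
    void rehash(std::size_t new_capacity) {
        std::vector<std::list<int>> bigger(new_capacity);
        for (const auto& bucket : buckets_) {
            for (int k : bucket) bigger[index(k, new_capacity)].push_back(k);
        }
        buckets_.swap(bigger);
    }

    static constexpr double kMaxLoad = 0.75;
    std::vector<std::list<int>> buckets_ = std::vector<std::list<int>>(4);
    std::size_t size_ = 0;
};
```

A single resize costs $O(n)$, but because the capacity doubles each time, resizes become rarer as the table grows, so the average cost per insertion stays constant.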