# **Hashtables**
_______

#### A dictionary in Python is a kind of hashtable data structure

A hashtable implements a mapping from keys to values

## Operations:
1. Find a key
2. Insert a key
3. Delete a key
4. Update key -> value mapping
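
In Python these four operations map directly onto the built-in `dict`. A minimal sketch (the `ages` dictionary and its contents are made up for illustration):

```python
# Hypothetical example data: names mapped to ages.
ages = {"alice": 30, "bob": 25}

# 1. Find a key
has_alice = "alice" in ages   # True
has_carol = "carol" in ages   # False

# 2. Insert a key
ages["carol"] = 41

# 3. Delete a key
del ages["bob"]

# 4. Update key -> value mapping
ages["alice"] = 31
```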

### Basic idea
- A hashtable is a table
- It has slots (rows) that hold keys and values
- Keys and values can be of any type, as long as the key can be hashed
- A hash function takes a key and computes a number: the index of the slot in the hashtable where the key-value pair belongs
- The hash function must be efficient to compute

The hash function must map every key to a slot in the table.

The main issue is that there are usually far more possible keys than slots, so the hash function can map several keys to the same slot. Such an event is called a *collision*, and the hashtable has to deal with it.
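
A minimal sketch of mapping a key to a slot, assuming a table of `m = 8` slots and a hypothetical helper `slot_index` built on Python's `hash`. In CPython small non-negative integers hash to themselves, so the collision below is reproducible:

```python
def slot_index(key, m):
    """Map a key to a slot index in the range 0..m-1."""
    return hash(key) % m

m = 8
first = slot_index(3, m)    # 3 % 8 == 3
second = slot_index(11, m)  # 11 % 8 == 3, the same slot: a collision
```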

### Insertion
- compute the hash of the key
- if the computed slot is free (no collision):
    - put the key-value pair in the slot
- if there's a collision:
    - resolve it, for example with Chaining

### Chaining
A way of dealing with collisions
- The idea is to build the hashtable so that each slot maps to a **list** (initially empty)
- A list can grow and contain multiple key-value pairs
- Each slot is thus associated with the list of all key-value pairs that hash to it

#### Pros:
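- The table never fills up: a list can always grow

#### Cons:
- Wasted space (pointers and nodes of the linked lists)
- Overhead of maintaining a dynamic list per slot
- Clustering: many keys can pile up in one long chain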

### Find
How to find a value by its key
- Hash the key -> find the slot
- Search the list in this slot to find the key

### Delete
How to delete a key
- Hash the key -> find the slot
- If the list in this slot is empty -> return None
- Otherwise -> search through the list to find the key
- If the key is there, delete its key-value pair; if not -> return None
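
The insert, find and delete steps can be sketched as one small class. This is a minimal sketch rather than a production implementation: the class name `ChainedHashTable` and its method names are made up for illustration, and Python's built-in `hash` stands in for the hash function.

```python
class ChainedHashTable:
    """Hashtable with chaining: every slot holds a list of (key, value) pairs."""

    def __init__(self, m=8):
        self.slots = [[] for _ in range(m)]

    def _chain(self, key):
        # Hash the key to pick the slot, then return that slot's list.
        return self.slots[hash(key) % len(self.slots)]

    def insert(self, key, value):
        chain = self._chain(key)
        for i, (k, _) in enumerate(chain):
            if k == key:              # key already present: update the mapping
                chain[i] = (key, value)
                return
        chain.append((key, value))    # free slot or collision: extend the list

    def find(self, key):
        for k, v in self._chain(key):
            if k == key:
                return v
        return None                   # key is not in the table

    def delete(self, key):
        chain = self._chain(key)
        for i, (k, v) in enumerate(chain):
            if k == key:
                del chain[i]
                return v
        return None                   # nothing to delete


table = ChainedHashTable(m=8)
table.insert(3, "a")
table.insert(11, "b")   # 3 and 11 share slot 3 when m = 8
```

Both keys end up in the same list, so `find` and `delete` have to walk that list after hashing.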

### Load factor
- Assume the hashtable has *m* slots
- Let's insert *n* elements into the hash table
- ***Worst case*** - all *n* elements collide in the same slot
    - poorly designed hash function?
    - adversarially chosen keys?
- ***Average case*** (assuming every key is equally likely to land in any slot):
    - $P(h(k_i)=j)=1/m$ - probability of putting key $k_i$ into slot *j*
    - Average list size = $n/m$

**Load factor** is the number of elements in a hash table divided by the number of slots, i.e. $n/m$

With chaining, the average-case complexity of the Insert, Delete and Find operations is proportional to the load factor of the hash table: $O(1 + n/m)$
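
A small sketch contrasting the two cases. It assumes integer keys and uses `key % m` as a stand-in hash function; the helper `load_factor` is made up for illustration. Both tables have the same load factor, yet the chains look very different:

```python
def load_factor(slots):
    """Number of stored elements divided by the number of slots."""
    n = sum(len(chain) for chain in slots)
    return n / len(slots)

m = 8

# Keys spread evenly: 16 consecutive integers, 2 per slot.
even = [[] for _ in range(m)]
for key in range(16):
    even[key % m].append(key)

# Worst case: every key is a multiple of m, so all of them land in slot 0.
worst = [[] for _ in range(m)]
for key in range(0, 16 * m, m):
    worst[key % m].append(key)

longest_even = max(len(chain) for chain in even)    # 2
longest_worst = max(len(chain) for chain in worst)  # 16
```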

### Rehashing
Rehashing lowers the load factor and so keeps the lists in the slots short.

It is the process of rebuilding a hash table with an increased number of slots and re-inserting every element, which has O(n+m) complexity
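
A minimal sketch of rehashing a chained table, assuming the table is a plain list of lists of `(key, value)` pairs; the function name `rehash` is made up for illustration:

```python
def rehash(slots, new_m):
    """Build a table with new_m slots and re-insert every key-value pair.

    Visiting all m old slots and moving all n pairs costs O(n + m).
    """
    new_slots = [[] for _ in range(new_m)]
    for chain in slots:
        for key, value in chain:
            new_slots[hash(key) % new_m].append((key, value))
    return new_slots

# Four integer keys crowded into m = 2 slots (load factor 2.0).
old = [[(0, "a"), (2, "c")], [(1, "b"), (3, "d")]]
new = rehash(old, 8)   # load factor drops to 4 / 8 = 0.5
```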

### Open Address Hashtables
Sometimes resolving collisions with the Chaining method can be very memory consuming. If so, we can use a technique that simplifies the hashtable design: an open address hashtable, which stores every key-value pair directly in the slots of the table.

#### The main idea:
- when our hash function returns a slot *j* which is already occupied,
- then we look at slot *j+1*
- if it is empty we store our key-value pair there, if not...
- try *j+2*, *j+3* etc. (wrapping around at the end of the table) until we are back at *j*
- if no free slot is found -> rehash the table
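
The probing steps above can be sketched as follows. This is a minimal illustration, not a complete implementation: the class name `LinearProbingTable` is made up, Python's `hash` stands in for the hash function, and deletion is left out because it needs extra care in an open address table.

```python
class LinearProbingTable:
    """Open address hashtable: one (key, value) pair per slot, no lists."""

    def __init__(self, m=8):
        self.slots = [None] * m

    def _probe(self, key):
        # Yield slot indices j, j+1, j+2, ... wrapping around until back at j.
        m = len(self.slots)
        j = hash(key) % m
        for step in range(m):
            yield (j + step) % m

    def insert(self, key, value):
        for i in self._probe(key):
            if self.slots[i] is None or self.slots[i][0] == key:
                self.slots[i] = (key, value)
                return
        self._grow()                  # no free slot found: rehash
        self.insert(key, value)

    def find(self, key):
        for i in self._probe(key):
            if self.slots[i] is None:
                return None           # hit an empty slot: key is absent
            if self.slots[i][0] == key:
                return self.slots[i][1]
        return None

    def _grow(self):
        # Rehash every pair into a table with twice as many slots.
        old = [pair for pair in self.slots if pair is not None]
        self.slots = [None] * (2 * len(self.slots))
        for key, value in old:
            self.insert(key, value)


table = LinearProbingTable(m=4)
for key in (1, 5, 9):        # all three hash to slot 1 when m = 4
    table.insert(key, str(key))
```

The three colliding keys end up in the neighboring slots 1, 2 and 3.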

#### Pros:
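- Better space utilization compared to Chaining
- Good cache locality: probing looks at neighboring slots

#### Cons:
- Needs rehashing when the table fills up
- Clustering: runs of occupied slots grow and make probing slower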

### Universal hash families

This is a family of hash functions, not just a single function.<br>
Each hash function in the family maps keys to the slots of a hash table.<br>
When we create a hashtable, we randomly choose one hash function from the family and use it for all keys.<br>
Because the choice is random, nobody can pick keys in advance that are guaranteed to collide.


*Family of hash functions* $H = \{h_1,...,h_N\}$
$$ h_i: Keys \rightarrow \{0,...,m-1\} $$

***Guarantee***: *if we randomly choose a function $h$ from this family*:
 for any keys $k_i, k_j$ where $k_i \neq k_j$
 the probability that they collide is bounded:
 $$\mathbb{P}_{h\in H}(h(k_i)=h(k_j)) \le \frac{c}{m} \sim \frac{1}{m}$$
*which means keys are spread over the slots as if at random, and the randomness comes from the randomly chosen hash function.*<br>
Although the average length of chains is small, there may be a few big outliers, so it's not a *perfect* solution
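
One well-known universal family for integer keys, not named in these notes and shown here only as an illustration, is $h_{a,b}(k) = ((ak + b) \bmod p) \bmod m$ with a prime $p$ larger than every key. The sketch below assumes `P = 101` and made-up helper names; it draws one random function and then checks the guarantee for a pair of keys by enumerating the whole family:

```python
import random

P = 101  # a prime larger than every key we intend to hash (assumption)

def make_hash(m, rng=random):
    """Draw one function h(k) = ((a*k + b) mod P) mod m from the family."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda k: ((a * k + b) % P) % m

def collision_probability(k1, k2, m):
    """Fraction of all functions in the family that send k1 and k2 to one slot."""
    collisions = sum(
        ((a * k1 + b) % P) % m == ((a * k2 + b) % P) % m
        for a in range(1, P)
        for b in range(P)
    )
    return collisions / ((P - 1) * P)

m = 10
h = make_hash(m)
prob = collision_probability(3, 7, m)   # stays at or below 1/m
```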

### Perfect hashing

It's about finding a hash function and creating a hashtable where no collisions are possible (a strong guarantee).

One of the known techniques is ***two-level hashtables***

#### Idea:

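Assume we know the number *n* of distinct keys to be inserted into a hashtable, then:
- Choose a random hash function: $h\in H$
- Create a hashtable with $Kn^2$ slots, where *K* is a parameter to be determined
- Insert each key into this *big* hashtable
- If a collision happens: *abort* and redo the procedure with a new random hash function

**Two-level hashtable** is a hash table whose slots are themselves hash tables. Applying the procedure above only to the keys that share a first-level slot makes every second-level table collision-free without spending $Kn^2$ slots on the whole key set.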
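
A minimal sketch of the retry procedure from the list above (the single big table only, not a full two-level table). It assumes integer keys below the prime `P`, takes `K = 1`, and borrows the $((ak + b) \bmod p) \bmod m$ family as the source of random hash functions; all names are made up for illustration:

```python
import random

P = 2**31 - 1  # a prime larger than every key (assumption: keys are ints below P)

def build_collision_free(keys, K=1, rng=None):
    """Retry random hash functions until n keys fit in K*n^2 slots with no collision."""
    rng = rng or random.Random(0)
    m = K * len(keys) ** 2
    while True:
        a, b = rng.randrange(1, P), rng.randrange(0, P)
        table = [None] * m
        for key in keys:
            j = ((a * key + b) % P) % m
            if table[j] is not None:   # collision: abort and redo
                break
            table[j] = key
        else:
            return (a, b), table       # every key got its own slot

keys = [12, 345, 6789, 10, 99999]
(a, b), table = build_collision_free(keys)

def lookup(key):
    """One hash computation and one comparison, no chain to search."""
    return table[((a * key + b) % P) % len(table)] == key
```

Since the table has far more slots than keys, a random function succeeds after only a few tries on average.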



### Cuckoo hashing

The key advantage is that lookup and deletion are guaranteed to take O(1) time in the worst case. Insertion takes O(1) on average.

#### Idea:
- we use two hashtables
- we use a separate random hash function for each of the two hashtables
- every key can be found either in the first hashtable or in the second
- when we insert a new key $k$, we first try to put it in the first hashtable,
- if its slot is empty -> done, else:
    - kick out the key $k_1$ that occupied the slot in the first hashtable and move it to the second hashtable
    - if its slot in the second hashtable is free -> done, else:
        - displace the key $k_2$ from the second hashtable into the first hashtable
    - repeat until a displaced key lands in a free slot (if this takes too long, rehash with new hash functions)
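
The kick-out loop can be sketched as below. This is a simplified illustration under several assumptions: keys are integers below the prime `P`, the table stores keys only (no values), a fixed limit `max_kicks` decides when to give up and rehash, and the class and method names are made up.

```python
import random

P = 2**31 - 1  # prime larger than every key (assumption: keys are ints below P)

class CuckooTable:
    """Set of integer keys stored in two tables, one candidate slot per table."""

    def __init__(self, m=8, max_kicks=16, seed=0):
        self.max_kicks = max_kicks
        self.rng = random.Random(seed)
        self._reset(m)

    def _reset(self, m):
        self.m = m
        self.tables = [[None] * m, [None] * m]
        # One random hash function (a, b) per table.
        self.params = [(self.rng.randrange(1, P), self.rng.randrange(P))
                       for _ in range(2)]

    def _slot(self, t, key):
        a, b = self.params[t]
        return ((a * key + b) % P) % self.m

    def contains(self, key):
        # At most two slots to check, so lookup is O(1) in the worst case.
        return any(self.tables[t][self._slot(t, key)] == key for t in (0, 1))

    def delete(self, key):
        for t in (0, 1):
            j = self._slot(t, key)
            if self.tables[t][j] == key:
                self.tables[t][j] = None

    def insert(self, key):
        if self.contains(key):
            return
        t = 0
        for _ in range(self.max_kicks):
            j = self._slot(t, key)
            # Place the key and pick up whatever was in the slot.
            key, self.tables[t][j] = self.tables[t][j], key
            if key is None:
                return                # the slot was free: done
            t = 1 - t                 # move the displaced key to the other table
        self._rehash(key)             # too many kicks: rebuild with new functions

    def _rehash(self, pending):
        keys = [k for table in self.tables for k in table if k is not None]
        self._reset(2 * self.m)
        for k in keys + [pending]:
            self.insert(k)


table = CuckooTable(m=4)
for k in range(20):
    table.insert(7 * k)
```

Twenty keys do not fit into two tables of four slots, so the example also exercises the rehash path.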

#### Pros:
- no chaining: there are no lists to search
- no long probe sequences as in open addressing: a lookup checks at most two slots

## Problems

Hashtables help to solve many kinds of problems

#### Leetcode 1. Two Sum
**Given an array of integers nums and an integer target, return indices of the two numbers such that they add up to target**

You may assume that each input would have exactly one solution, and you may not use the same element twice.

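```python
from typing import List


class Solution:  # LeetCode wraps solutions in a Solution class
    def twoSum(self, nums: List[int], target: int) -> List[int]:
        # Naive version, O(n^2) time: check every pair of indices.
        # n = len(nums)
        # for i in range(n):
        #     for j in range(i + 1, n):
        #         if nums[i] + nums[j] == target:
        #             return [i, j]

        # Using a hashtable (Python dictionary) we get O(n) time:
        seen = {}  # maps a value already visited to its index
        for i, num in enumerate(nums):
            complement = target - num
            if complement in seen:
                return [seen[complement], i]
            seen[num] = i
        return []  # not reached when exactly one solution exists
```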