## Very simple implementation (Simple hash table)

Hashing maps any "key" value to a slot, or index, in a table. A perfect hash function assigns a unique slot to every key. In practice, a hash function can generate the same index for more than one key. Increasing the size of the hash table reduces how often this happens.
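
As a rough illustration of the last point, the sketch below checks whether a modulus-based hash function is perfect for a given set of keys, and shows how a larger table can remove a collision. The helper `is_perfect` and the sample keys are assumptions made for this example only.

```python
# Hypothetical helper, written only for this illustration.
def is_perfect(keys, table_size):
    """Return True if key % table_size gives every key its own slot."""
    slots = [key % table_size for key in keys]
    return len(set(slots)) == len(slots)

keys = [10, 20, 25]  # sample keys, chosen only for illustration

print(is_perfect(keys, 10))  # False: 10 and 20 both map to slot 0
print(is_perfect(keys, 11))  # True: slots 10, 9 and 3 are all distinct
```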

In [1]:
# 10 size hash table
hash_table = [None] * 10
print(hash_table)

[None, None, None, None, None, None, None, None, None, None]


In [2]:
def hashing_func(key):
    return key % len(hash_table)

# Simple hash function: returns the key modulo the length of the
# table.

print(hashing_func(10))
print(hashing_func(25))

0
5


In [3]:
# Inserting data into hash table.
def insert(hash_table,key,value):
    hash_key = hashing_func(key)
    hash_table[hash_key] = value
    
def delete(hash_table,key):
    hash_key = hashing_func(key)
    hash_table[hash_key] = None
    
def search(hash_table,key):
    hash_key = hashing_func(key)
    return hash_table[hash_key]
    
insert(hash_table,10,'Nepal')
print(hash_table)

['Nepal', None, None, None, None, None, None, None, None, None]


In [4]:
insert(hash_table,25,'USA')
print(hash_table)

['Nepal', None, None, None, None, 'USA', None, None, None, None]


In [5]:
delete(hash_table,5)

print(hash_table)

print(search(hash_table,0))

['Nepal', None, None, None, None, None, None, None, None, None]
Nepal


### Collision: 

A collision occurs when two items get the same slot, i.e. the hashing function generates the same slot number for more than one key. If no collision resolution is in place, the new item replaces the item already in the slot whenever a collision occurs.

For example:


In [6]:
insert(hash_table,20,'India')
print(hash_table)

['India', None, None, None, None, None, None, None, None, None]


As you can see, 'Nepal' is replaced by 'India' as the first item of the hash table, because ```hashing_func``` returns the same result, 0, for keys 10 and 20.


### Collision Resolution:

### 1) Chaining:
This allows multiple items to exist in the same slot. We place all the elements whose keys hash to the same slot into the same linked list. Slot 'j' contains a pointer to the head of the list of all stored elements that hash to 'j'. The code below uses Python lists in place of linked lists.


In [7]:
hash_table = [[] for _ in range(10)]
print(hash_table)

[[], [], [], [], [], [], [], [], [], []]


In [8]:
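def chained_hash_insert(hash_table,key,value):
    # Append the value to the chain stored in the key's slot.
    hash_key = hashing_func(key)
    hash_table[hash_key].append(value)
    
def chained_hash_delete(hash_table,key):
    # Clears the whole chain in the key's slot, not just one item.
    hash_key = hashing_func(key)
    hash_table[hash_key] = []
    
def chained_hash_search(hash_table,key):
    # Returns the whole chain stored in the key's slot.
    hash_key = hashing_func(key)
    return hash_table[hash_key]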

In [9]:
chained_hash_insert(hash_table,10,'Nepal')
print(hash_table)
print("\r\n")

chained_hash_insert(hash_table,25,'USA')
print(hash_table)
print("\r\n")

chained_hash_insert(hash_table,20,'India')
print(hash_table)
print("\r\n")

[['Nepal'], [], [], [], [], [], [], [], [], []]


[['Nepal'], [], [], [], [], ['USA'], [], [], [], []]


[['Nepal', 'India'], [], [], [], [], ['USA'], [], [], [], []]




To search with the **chaining** method, we first calculate the hash value of the key and go to the slot at that index. If the element is at the head of the linked list stored in that slot, we return it. Otherwise, we search the linked list linearly until we find the element. This traversal of course comes at a cost, which grows with the length of the chain. When the load factor is low, it can be more efficient to use **open addressing** for collision resolution (see below).

In [10]:
chained_hash_delete(hash_table,5)
print(hash_table)

[['Nepal', 'India'], [], [], [], [], [], [], [], [], []]


In [11]:
chained_hash_search(hash_table, 20)

['Nepal', 'India']

In [12]:
chained_hash_delete(hash_table,10)
print(hash_table)

[[], [], [], [], [], [], [], [], [], []]
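
The chaining functions above store only values, and they clear or return a whole slot at a time. A minimal sketch of the key-by-key search described earlier, assuming each chain holds `(key, value)` pairs, could look like the following. The names `kv_insert`, `kv_search`, `kv_delete` and `buckets` are made up for this example, and Python lists again stand in for linked lists.

```python
# Hypothetical helpers for illustration; not part of the cells above.
def kv_insert(buckets, key, value):
    """Insert or update a (key, value) pair in the chain for key."""
    chain = buckets[key % len(buckets)]
    for i, (existing_key, _) in enumerate(chain):
        if existing_key == key:
            chain[i] = (key, value)  # key already present: overwrite its value
            return
    chain.append((key, value))

def kv_search(buckets, key):
    """Walk the chain linearly and return the value stored for key, or None."""
    for existing_key, value in buckets[key % len(buckets)]:
        if existing_key == key:
            return value
    return None

def kv_delete(buckets, key):
    """Remove only the pair whose key matches, keeping the rest of the chain."""
    index = key % len(buckets)
    buckets[index] = [(k, v) for k, v in buckets[index] if k != key]

buckets = [[] for _ in range(10)]
kv_insert(buckets, 10, 'Nepal')
kv_insert(buckets, 20, 'India')   # collides with key 10 in slot 0
print(kv_search(buckets, 20))     # India
kv_delete(buckets, 10)
print(buckets[0])                 # [(20, 'India')]
```

Storing the key next to the value is what lets search and delete tell apart items that share a slot.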


# Hash functions

### Objectives of hash functions:
- Minimize collisions
- Uniform distribution of hash values
- Easy to calculate
- Resolve any collisions

1) **The division method:**

```h(k) = k mod m```

In [13]:
def d_hashing_func(k,m):
    return (k % m)

d_hashing_func(1234,12)

10

Avoid ```m = 2 ^ p```, because then h(k) is simply the p lowest-order bits of k and the remaining bits of the key are ignored. For example:

In [14]:
a1 = 0b100110100
a2 = 0b11101110101
a3 = 0b1100000111
m = 2 ** 3

h1 = d_hashing_func(a1,m)
h2 = d_hashing_func(a2,m)
h3 = d_hashing_func(a3,m)

print(bin(h1))
print(bin(h2))
print(bin(h3))

# Only the p = 3 lowest-order bits of each key affect its hash value

0b100
0b101
0b111
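
To see why this matters, the sketch below hashes a few keys that differ only in their high-order bits. With `m = 8` they all land in the same slot, whereas a prime `m` (7 here, an arbitrary choice for the example) spreads them out. The sample keys are assumptions chosen for illustration.

```python
# Sample keys (illustrative only) that share the same 3 lowest bits, 0b101.
keys = [5, 13, 21, 29, 37]

print([k % 8 for k in keys])  # [5, 5, 5, 5, 5]: every key collides
print([k % 7 for k in keys])  # [5, 6, 0, 1, 2]: no collisions
```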


2) **The multiplication method**:

```h(k) = floor(m(kA mod 1))``` where k is the key and A is a constant with 0 < A < 1. The key k is multiplied by A, the fractional part of the product is extracted (the ```kA mod 1``` step), and the result is multiplied by m and rounded down.

The advantage here is that the value of m is not critical. We typically choose a power of 2, ```m = 2 ^ p```, because it is easier for a computer to work with.

In [15]:
import numpy as np

def m_hashing_func(k,m,A=0.618033):
    return int(np.floor(m * ((k * A) % 1)))

k = 12453
m = 12

print(m_hashing_func(k,m))

4
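
When `m = 2 ** p`, the multiplication method can be carried out with integer arithmetic alone. The sketch below shows one common way to do this, assuming a 32-bit word size `w` and approximating `A` by `s / 2 ** w`. The name `m_hashing_func_int` and the default constants are assumptions for this example, not part of the cell above.

```python
# Hypothetical integer-only variant, for illustration.
def m_hashing_func_int(k, p, w=32, s=2654435769):
    """Multiplication method for m = 2 ** p using only integers.

    s / 2 ** w plays the role of A (about 0.618034 for the default s).
    """
    r = (k * s) % (2 ** w)   # low w bits of k * s: the fractional part of k * A, scaled by 2 ** w
    return r >> (w - p)      # keep the p most significant of those w bits

print(m_hashing_func_int(123456, 14))  # 67
```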


3) **The Universal method:**

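```h(k) = ((a * k + b) mod p ) mod m``` where k is the key, p is a prime number larger than any key, and a and b are integers chosen at random with 1 <= a <= p - 1 and 0 <= b <= p - 1. The example below simply fixes a and b to two primes.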


In [16]:
def lc_hashing_func(k,m): #lc = linear combination
    p = 524287
    a = 8191
    b = 127
    return ((a * k + b) % p) % m

k = 12453
k2 = 12454
k5 = 1224
m = 2 ** 11

print(lc_hashing_func(k,m))
print(lc_hashing_func(k2,m))
print(lc_hashing_func(k5,m))

156
155
970
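
In a universal scheme, `a` and `b` are normally drawn at random when the table is created rather than hard-coded. A minimal sketch of that idea follows, reusing the prime from the cell above. The factory name `make_universal_hash` and the optional `seed` parameter are assumptions added for this example.

```python
import random

# Hypothetical factory for illustration; p defaults to the prime used above.
def make_universal_hash(m, p=524287, seed=None):
    """Return h(k) = ((a * k + b) % p) % m with randomly chosen a and b."""
    rng = random.Random(seed)
    a = rng.randint(1, p - 1)   # a must be non-zero
    b = rng.randint(0, p - 1)

    def h(k):
        return ((a * k + b) % p) % m

    return h

h = make_universal_hash(m=2 ** 11, seed=0)
print(h(12453), h(12454))  # the values depend on the random draw of a and b
```

Passing a seed only makes the draw repeatable for experiments; leaving it as `None` gives a fresh function on each call.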


# Open Addressing

In open addressing, all elements are stored in the hash table itself. That is, each table entry contains either an element of the dynamic set or NIL. When searching for an element, we systematically examine table slots until the desired element is found or it is clear that the element is not in the table. This examination of successive slots is called probing; its simplest form, **linear probing**, checks the slots one after another. The method is called **open addressing** because an element whose desired index is occupied is free to take another open slot in the table.

In [17]:
def hash_insert(hash_table,k,m):
    # Linear probing: start at the key's home slot and step forward
    # one slot at a time (wrapping around) until a free slot is found.
    home = lc_hashing_func(k,m)
    for i in range(m):
        hash_key = (home + i) % m
        if hash_table[hash_key] is None:
            hash_table[hash_key] = k
            return hash_key
    print("hash table overflow")

In [18]:
k = 12459
m = 4
hash_table = [None] * m
hash_insert(hash_table,k,m)
print(hash_table)
hash_insert(hash_table,11,m)
print(hash_table)
hash_insert(hash_table,12,m)
print(hash_table)
hash_insert(hash_table,13,m)
print(hash_table)

[None, None, 12459, None]
[11, None, 12459, None]
[11, None, 12459, 12]
[11, 13, 12459, 12]


When searching for an element, the hash function is used to find its home index. If the element is not at that index because of a collision, finding it also involves **linear probing**, i.e. a linear search that increments the index by 1 and checks again until the element is found or an empty slot is reached.

In [22]:
def hash_search(hash_table,k,m):
    # Follow the same probe sequence as hash_insert.
    home = lc_hashing_func(k,m)
    for i in range(m):
        hash_key = (home + i) % m
        if hash_table[hash_key] is None:
            break  # an empty slot ends the probe sequence
        if hash_table[hash_key] == k:
            return hash_key
    print("Element is not in the hash table")
    return None

In [20]:
hash_search(hash_table,345,m)

$$ \text{Load factor} = \frac{n}{m} $$
where $n$ = number of keys hashed and $m$ = table size.

As long as the load factor is low, the linear probing method should work reasonably well. 
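
For an open-addressing table stored as a Python list, the load factor can be computed directly from the formula. The helper below is a small sketch; the name `load_factor` and the sample table are assumptions for this example.

```python
# Hypothetical helper for illustration.
def load_factor(hash_table):
    """Fraction of slots in an open-addressing table that hold a key."""
    n = sum(1 for slot in hash_table if slot is not None)  # occupied slots
    return n / len(hash_table)

print(load_factor([12459, None, 13, None]))  # 0.5
```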

Another way to deal with collisions is **chaining**, which is explained above.

#### Linear Probing: 
We can use linear probing as explained above, but it can result in **primary clustering**. In other words, keys can bunch together in one part of the array while a large proportion of the array stays unoccupied. One variation on linear probing is the **plus 3 rehash**, in which the index is increased by 3 instead of 1.
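
The difference between a step of 1 and the plus 3 rehash can be seen by listing the slots each one visits. The sketch below assumes a table of size 10 and a home slot of 0; `linear_probe_sequence` is a helper invented for this example. A step size only reaches every slot when it shares no common factor with the table size.

```python
from math import gcd

# Hypothetical helper for illustration.
def linear_probe_sequence(home, m, step=1):
    """Slots visited by linear probing, starting at home and advancing by step."""
    return [(home + i * step) % m for i in range(m)]

print(linear_probe_sequence(0, 10))          # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(linear_probe_sequence(0, 10, step=3))  # [0, 3, 6, 9, 2, 5, 8, 1, 4, 7]
print(gcd(3, 10) == 1)                       # True: a step of 3 reaches every slot of a size-10 table
```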

#### Quadratic Probing
Quadratic probing squares the number of failed attempts to decide how far from the original index to look or insert next.
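
As a sketch of this rule, the helper below lists the slots tried on successive attempts, assuming the simplest variant in which attempt `i` looks at `(home + i * i) % m`. The function name and the sample values are assumptions for this example. Unlike linear probing, such a sequence is not guaranteed to visit every slot.

```python
# Hypothetical helper for illustration, using the plain i squared offset.
def quadratic_probe_sequence(home, m, attempts):
    """Slots tried by quadratic probing: (home + i * i) % m for i = 0, 1, 2, ..."""
    return [(home + i * i) % m for i in range(attempts)]

print(quadratic_probe_sequence(2, 10, 5))  # [2, 3, 6, 1, 8]
```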

#### Double Hashing
Double hashing applies a second hash function to the key when a collision occurs. The result of the second hash function gives the number of slots to move forward from the point of collision to look for or insert the element.
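
A minimal sketch of double hashing follows. It assumes `h1(k) = k % m` as the first hash function and `h2(k) = 1 + (k % (m - 1))` as the second, together with a prime table size; these choices and the function names are assumptions for this example, and other pairs of functions work as well.

```python
# Hypothetical helpers for illustration.
def double_hash_probe(k, i, m):
    """Slot for key k on attempt i: (h1(k) + i * h2(k)) % m."""
    h1 = k % m
    h2 = 1 + (k % (m - 1))   # never 0, so each attempt moves to a different slot
    return (h1 + i * h2) % m

def double_hash_insert(hash_table, k):
    """Insert k using double hashing; return its slot, or None if no slot is free."""
    m = len(hash_table)
    for i in range(m):
        slot = double_hash_probe(k, i, m)
        if hash_table[slot] is None:
            hash_table[slot] = k
            return slot
    return None

table = [None] * 7                    # prime size, so every step size reaches all slots
print(double_hash_insert(table, 10))  # 3
print(double_hash_insert(table, 17))  # 2: slot 3 is taken, so it moves on by h2(17) = 6
```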


# Closed addressing
This involves chaining items that have collided in a linked list or another suitable data structure. (See above for an implementation.)