## Very simple implementation (Simple hash table)

A hash function maps any key to a slot (index) in the hash table. A perfect hash function is one that assigns a unique slot to every key. In practice, a hash function can generate the same index for more than one key. Increasing the size of the hash table reduces how often this happens.
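To make the idea of a perfect hash function concrete, the sketch below checks whether a modulus-based hash gives every key in a fixed set its own slot. The helper name `is_perfect_for` and the sample keys are illustrative assumptions, not part of the notebook.

```python
def is_perfect_for(keys, table_size):
    """Return True if key % table_size gives every key its own slot."""
    slots = [key % table_size for key in keys]
    return len(set(slots)) == len(slots)

keys = [10, 25, 37]
print(is_perfect_for(keys, 10))         # True: slots 0, 5 and 7 are distinct
print(is_perfect_for(keys + [20], 10))  # False: 10 and 20 both map to slot 0
print(is_perfect_for(keys + [20], 11))  # True: slots 10, 3, 4 and 9
```

For this particular key set, moving from 10 to 11 slots removes the collision. A larger table does not guarantee that in general; it only makes collisions less likely.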

In [96]:
# 10 size hash table
hash_table = [None] * 10
print(hash_table)

[None, None, None, None, None, None, None, None, None, None]


In [97]:
def hashing_func(key):
    return key % len(hash_table)

# Simple hash function: returns the key modulo the length of the table,
# so the result is always a valid index into hash_table.

print(hashing_func(10))
print(hashing_func(25))

0
5


In [98]:
# Inserting data into hash table.
def insert(hash_table,key,value):
    hash_key = hashing_func(key)
    hash_table[hash_key] = value
    
def delete(hash_table,key):
    hash_key = hashing_func(key)
    hash_table[hash_key] = None
    
def search(hash_table,key):
    hash_key = hashing_func(key)
    return hash_table[hash_key]
    
insert(hash_table,10,'Nepal')
print(hash_table)

['Nepal', None, None, None, None, None, None, None, None, None]


In [5]:
insert(hash_table,25,'USA')
print(hash_table)

['Nepal', None, None, None, None, 'USA', None, None, None, None]


In [10]:
delete(hash_table,5)  # key 5 hashes to slot 5, so this clears the value stored under key 25

print(hash_table)

print(search(hash_table,0))

['Nepal', None, None, None, None, None, None, None, None, None]
Nepal


### Collision: 

A collision occurs when two items get the same slot, i.e. the hashing function generates the same slot number for more than one key. If no collision resolution is applied, the new item overwrites the item already stored in that slot.

For example:


In [12]:
insert(hash_table,20,'India')
print(hash_table)

['India', None, None, None, None, None, None, None, None, None]


As you can see, 'Nepal' has been replaced by 'India' in the first slot of the hash table, because ```hashing_func``` returns the same result, 0, for keys 10 and 20.
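Because the table above stores only values, `search` cannot tell which key a slot belongs to, and `insert` cannot tell that it is overwriting a different key. One possible way to make collisions visible is to store the key next to the value. The sketch below assumes a fixed table size of 10; the names `slot_for`, `insert_checked` and `search_checked` are hypothetical.

```python
TABLE_SIZE = 10  # assumed size, matching the table used above

def slot_for(key):
    return key % TABLE_SIZE

def insert_checked(table, key, value):
    """Store (key, value); raise if a different key already owns the slot."""
    slot = slot_for(key)
    entry = table[slot]
    if entry is not None and entry[0] != key:
        raise ValueError(f"collision: keys {entry[0]} and {key} share slot {slot}")
    table[slot] = (key, value)

def search_checked(table, key):
    """Return the value stored for key, or None if the key is absent."""
    entry = table[slot_for(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None

table = [None] * TABLE_SIZE
insert_checked(table, 10, 'Nepal')
print(search_checked(table, 10))  # Nepal
print(search_checked(table, 20))  # None: slot 0 is occupied, but by key 10

try:
    insert_checked(table, 20, 'India')
except ValueError as err:
    print(err)  # reports the collision instead of silently overwriting
```

This only detects the collision; it does not resolve it.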


### Collision Resolution:

### 1) Chaining:
This allows multiple items to exist in the same slot. All the elements that hash to the same slot are placed in the same linked list: slot 'j' contains a pointer to the head of the list of all stored elements that hash to 'j'. The code below uses a Python list for each slot in place of a linked list.


In [29]:
hash_table = [[] for _ in range(10)]
print(hash_table)

[[], [], [], [], [], [], [], [], [], []]


In [30]:
def chained_hash_insert(hash_table,key,value):
    hash_key = hashing_func(key)
    hash_table[hash_key].append(value)
    
def chained_hash_delete(hash_table,key):
    hash_key = hashing_func(key)
    hash_table[hash_key] = []
    
def chained_hash_search(hash_table,key):
    hash_key = hashing_func(key)
    return(hash_table[hash_key])

In [31]:
chained_hash_insert(hash_table,10,'Nepal')
print(hash_table)
print("\r\n")

chained_hash_insert(hash_table,25,'USA')
print(hash_table)
print("\r\n")

chained_hash_insert(hash_table,20,'India')
print(hash_table)
print("\r\n")

[['Nepal'], [], [], [], [], [], [], [], [], []]


[['Nepal'], [], [], [], [], ['USA'], [], [], [], []]


[['Nepal', 'India'], [], [], [], [], ['USA'], [], [], [], []]




In [33]:
chained_hash_delete(hash_table,5)
print(hash_table)

[['Nepal', 'India'], [], [], [], [], [], [], [], [], []]


In [36]:
chained_hash_search(hash_table, 20)

['Nepal', 'India']
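The chaining functions above store only values, so `chained_hash_search` returns the whole bucket and `chained_hash_delete` empties it. A sketch of one possible variant that keeps `(key, value)` pairs in each bucket, so that individual keys can be found and removed, is shown below. The function names are hypothetical and integer keys are assumed.

```python
def bucket_index(table, key):
    return key % len(table)

def chained_put(table, key, value):
    """Insert the pair, or update the value if the key is already present."""
    bucket = table[bucket_index(table, key)]
    for i, (existing_key, _) in enumerate(bucket):
        if existing_key == key:
            bucket[i] = (key, value)
            return
    bucket.append((key, value))

def chained_get(table, key):
    """Return the value stored for key, or None if it is absent."""
    for existing_key, value in table[bucket_index(table, key)]:
        if existing_key == key:
            return value
    return None

def chained_remove(table, key):
    """Remove only the pair for key; other pairs in the bucket stay."""
    index = bucket_index(table, key)
    table[index] = [(k, v) for k, v in table[index] if k != key]

table = [[] for _ in range(10)]
chained_put(table, 10, 'Nepal')
chained_put(table, 20, 'India')
chained_put(table, 25, 'USA')
print(chained_get(table, 20))  # India
chained_remove(table, 10)
print(table[0])                # [(20, 'India')]
```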

# Hash functions

1) **The division method:**

```h(k) = k mod(m)```

In [52]:
def d_hashing_func(k,m):
    return (k % m)

d_hashing_func(1234,12)

10

Avoid ```m = 2 ^ p```, because h(k) is then just the p lowest-order bits of k and ignores the rest of the key. For example:

In [62]:
a1 = 0b100110100
a2 = 0b11101110101
a3 = 0b1100000111
m = 2 ** 3

h1 = d_hashing_func(a1,m)
h2 = d_hashing_func(a2,m)
h3 = d_hashing_func(a3,m)

print(bin(h1))
print(bin(h2))
print(bin(h3))

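# Each hash value is just the 3 lowest-order bits of the key.
# m = 2 ** p - 1 can also be a poor choice: if k is a character string
# read in radix 2 ** p, permuting its characters doesn't change its hash value.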

0b100
0b101
0b111
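To illustrate why the choice of m matters, the sketch below hashes keys that all share the same three low-order bits, first with a power-of-two table size and then with a prime one. The helper `slots` and the sample keys are illustrative assumptions.

```python
def slots(keys, m):
    """Slots produced by the division method h(k) = k mod m."""
    return [k % m for k in keys]

keys = [8, 16, 24, 32, 40, 48]   # every key ends in the bits 000

print(slots(keys, 8))    # [0, 0, 0, 0, 0, 0]
print(slots(keys, 11))   # [8, 5, 2, 10, 7, 4]
```

With m = 8 every key lands in slot 0, while the prime m = 11 spreads the same keys over six different slots.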


2) **The multiplication method**:

```h(k) = floor(m(kA mod 1))``` where k is the key and A is a constant with 0 < A < 1. The key k is multiplied by A, the fractional part of the product is kept, and that fraction is multiplied by m and rounded down.

The advantage here is that the value of m is not critical. We typically choose a power of 2, ```m = 2 ^ p```, because the function can then be computed with integer multiplication and bit shifts.

In [76]:
import numpy as np

def m_hashing_func(k,m,A=0.618033):
    return np.floor(m * ((k * A) % 1))

k = 12453
m = 2 ** 11

print(m_hashing_func(k,m))

747.0
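When m is a power of 2, the multiplication method can be computed with integer operations only. The sketch below assumes a 32-bit word size and the constant `S = floor(A * 2**32)` for A = (sqrt(5) - 1) / 2; the name `m_hashing_func_int` is hypothetical. Because this A differs slightly from the truncated 0.618033 used above, the two functions will not necessarily return the same slot for the same key.

```python
W = 32            # assumed word size in bits
S = 2654435769    # floor(A * 2**W) for A = (sqrt(5) - 1) / 2

def m_hashing_func_int(k, p):
    """Multiplication-method hash into a table of size 2**p, with 1 <= p <= W."""
    # Low W bits of k * S: the fractional part of k * A, scaled by 2**W.
    low_word = (k * S) & ((1 << W) - 1)
    # The top p bits of that word are floor(2**p * fractional part).
    return low_word >> (W - p)

print(m_hashing_func_int(12453, 11))
```

Avoiding floating point also means the result is always a plain `int`, so it can be used directly as a list index.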


3) **The Universal method:**

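```h(k) = ((a * k + b) mod p ) mod m``` where k is the key, p is a prime number larger than every possible key, and a and b are integers with 1 <= a <= p - 1 and 0 <= b <= p - 1. In universal hashing a and b are chosen at random when the table is created; they do not need to be prime.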


In [93]:
def lc_hashing_func(k,m): #lc = linear combination
    p = 524287
    a = 8191
    b = 127
    return ((a * k + b) % p) % m

k = 12453
k2 = 12454
m = 2 ** 11

print(lc_hashing_func(k,m))
print(lc_hashing_func(k2,m))

156
155
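The function above fixes a and b, which gives one member of the family. A sketch of drawing a random member instead is shown below, assuming all keys are smaller than the prime p; the factory name `make_universal_hash` and the seed are illustrative.

```python
import random

def make_universal_hash(m, p=524287, rng=random):
    """Draw one function from the family h(k) = ((a*k + b) mod p) mod m."""
    a = rng.randrange(1, p)   # a is in 1 .. p-1
    b = rng.randrange(0, p)   # b is in 0 .. p-1

    def h(k):
        return ((a * k + b) % p) % m

    return h

rng = random.Random(42)       # seeded only to make the example repeatable
h = make_universal_hash(2 ** 11, rng=rng)
print(h(12453), h(12454))
```

Choosing a and b at random means no single fixed set of keys is bad for every run.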


# Open Addressing

In [101]:
def hash_insert(hash_table,k,val,m):
    # Linear probing: start at the home slot and try the following
    # slots (wrapping around) until an empty one is found.
    # np.floor returns a float, so convert before using it as an index.
    home = int(m_hashing_func(k,m))
    for i in range(m):
        hash_key = (home + i) % m
        if hash_table[hash_key] is None:
            hash_table[hash_key] = val
            return hash_key
    raise OverflowError("hash table overflow")


In [103]:
k = 12453
m = 8
hash_table = [None] * m
print(hash_table)
val = 12
hash_insert(hash_table,k,val,m)

[None, None, None, None, None, None, None, None]


2
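Open addressing keeps every item in the table itself: on a collision, the insert probes other slots until it finds a free one, and a search follows the same probe sequence. A minimal sketch using linear probing is shown below. It assumes integer keys, the division method for the home slot, `(key, value)` pairs in each slot, and no deletions; the function names are hypothetical.

```python
def probe_sequence(key, m):
    """Yield slots h(k), h(k)+1, ... wrapping around a table of size m."""
    start = key % m
    for i in range(m):
        yield (start + i) % m

def probe_insert(table, key, value):
    """Store the pair in the first free slot of the probe sequence."""
    for slot in probe_sequence(key, len(table)):
        if table[slot] is None or table[slot][0] == key:
            table[slot] = (key, value)
            return slot
    raise OverflowError("hash table is full")

def probe_search(table, key):
    """Follow the probe sequence until the key or an empty slot is found."""
    for slot in probe_sequence(key, len(table)):
        if table[slot] is None:
            return None
        if table[slot][0] == key:
            return table[slot][1]
    return None

table = [None] * 8
print(probe_insert(table, 10, 'Nepal'))  # 2
print(probe_insert(table, 18, 'India'))  # 3: slot 2 is taken, so the next slot is used
print(probe_search(table, 18))           # India
print(probe_search(table, 26))           # None
```

Stopping a search at the first empty slot is only correct because nothing is ever deleted; supporting deletion would need a marker for removed entries.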