
## Implement a HashMap

1. put: insert a key-value pair into the HashMap
2. get: retrieve the value stored for a key
3. get_bucket_index: compute the bucket index for a key from its hash code
4. get_hash_code: calculate the hash code using a hash function
5. size: return the number of entries
6. delete: remove a key-value pair

### Hash function for string

For a string, say `abcde`, a very effective hash function treats the string as a number written in a prime base `p`.
Let's elaborate on this statement.

For a number, say `578`, we can represent it in the base-10 number system as $$5 \cdot 10^2 + 7 \cdot 10^1 + 8 \cdot 10^0$$

Similarly, we can treat `abcde` in base `p` as $$a \cdot p^4 + b \cdot p^3 + c \cdot p^2 + d \cdot p^1 + e \cdot p^0$$

Here, we replace each character with its corresponding ASCII value. 

A lot of research goes into designing good hash functions, and this one is among the most popular for strings. We use a prime base because it tends to spread hash codes evenly across buckets. The most common primes used for this function are 31 and 37.
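
The base-`p` formula can be sketched in Python with Horner's rule, which evaluates the same polynomial using one multiplication per character instead of computing each power of `p` separately. The function name `polynomial_hash` is hypothetical, and this version returns the raw hash code without compressing it into a bucket index.

```python
def polynomial_hash(key: str, p: int = 31) -> int:
    """Treat `key` as a number in base p, using each character's ASCII value as a digit.

    Horner's rule: ((k0*p + k1)*p + k2)*p + ... equals k0*p^(n-1) + ... + k(n-1)*p^0.
    """
    hashcode = 0
    for char in key:
        # shift the digits seen so far one place left, then add the next digit
        hashcode = hashcode * p + ord(char)
    return hashcode


# 97*31^4 + 98*31^3 + 99*31^2 + 100*31^1 + 101*31^0
print(polynomial_hash("abcde"))  # 92599395
```

Python integers never overflow, but this raw value grows quickly with key length, so in practice it is reduced modulo the number of buckets before it is used as an index.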

### Collision Handling

1. **Separate chaining** - Separate chaining is a clever technique where we use the same bucket to store multiple objects. The bucket in this case stores a linked list of key-value pairs. Every bucket has its own separate chain of linked-list nodes.


2. **Open addressing** - In open addressing, we do the following:
   * If, after computing the bucket index, the bucket is empty, we store the object in that bucket.

   * If the bucket is not empty, we find an alternate bucket index by using another function that modifies the current hash code to produce a new one. This process of finding an alternate bucket index is called **probing**. Common probing techniques include linear probing, quadratic probing, and double hashing.
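
A minimal sketch of open addressing with **linear probing**: when a key's home bucket is taken by a different key, it tries the next index, wrapping around at the end of the table. This assumes a fixed capacity and leaves out deletion (which in open addressing needs tombstones or re-insertion) and resizing. The class name `LinearProbingMap` is hypothetical, and the home index uses the polynomial string hash described above, compressed modulo the table size.

```python
from typing import Any, List, Optional


class LinearProbingMap:
    """Open addressing with linear probing (sketch: fixed capacity, no delete, no resize)."""

    def __init__(self, capacity: int = 11, p: int = 31):
        self.capacity = capacity
        self.p = p
        self.keys: List[Optional[str]] = [None] * capacity
        self.values: List[Any] = [None] * capacity
        self.num_entries = 0

    def _home_index(self, key: str) -> int:
        # Polynomial string hash (Horner's rule), compressed modulo the table size
        h = 0
        for char in key:
            h = (h * self.p + ord(char)) % self.capacity
        return h

    def _probe(self, key: str) -> Optional[int]:
        """Slot holding `key`, else the first empty slot on its probe path, else None."""
        index = self._home_index(key)
        for _ in range(self.capacity):
            if self.keys[index] is None or self.keys[index] == key:
                return index
            index = (index + 1) % self.capacity  # linear probing: step to the next bucket
        return None  # every slot is occupied by other keys

    def put(self, key: str, value: Any) -> None:
        index = self._probe(key)
        if index is None:
            raise RuntimeError("table is full; a real implementation would resize")
        if self.keys[index] is None:
            self.num_entries += 1  # new key, not an update
        self.keys[index] = key
        self.values[index] = value

    def get(self, key: str) -> Optional[Any]:
        index = self._probe(key)
        if index is None or self.keys[index] != key:
            return None
        return self.values[index]


table = LinearProbingMap(capacity=10)
table.put("one", 1)
table.put("neo", 2)  # home bucket 2 is taken by "one", so "neo" goes to bucket 3
print(table.keys.index("one"), table.keys.index("neo"), table.get("neo"))  # 2 3 2
```

Because nothing is ever deleted, the keys along a probe path are contiguous, so `get` can stop at the first empty slot and report the key as absent.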
 

***Separate chaining in `put()`:*** In case of a collision, the `put()` function uses the same bucket to store a linked list of key-value pairs, so every bucket has its own separate chain of linked-list nodes.

If the key is already in the chain, `put()` simply updates its value. If the key is new (not found in the chain), one of two cases arises:

1. The key maps to a `bucket_index` whose bucket is still empty.
2. The key maps to a `bucket_index` that already holds other keys. This event is a **collision**: two different keys have the same `bucket_index`.

In both cases, we prepend the new (key, value) node at the beginning (head) of the chain. Remember that each bucket at position `bucket_index` is actually a chain (linked list) with one or more nodes.
<img style="float: center;" src="bucket2.png"><br>
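
A minimal standalone sketch of the `put()` and `get()` steps described above: `put()` walks the chain to update an existing key, otherwise prepends a new node at the head, and (as an assumption taken from the 0.7 load-factor guideline) rehashes into twice as many buckets once the load factor exceeds 0.7. The names `Node`, `ChainedHashMap`, `max_load`, and `_rehash` are hypothetical.

```python
from typing import Any, List, Optional


class Node:
    """One link in a bucket's chain."""

    def __init__(self, key: str, value: Any, next_node: "Optional[Node]" = None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashMap:
    """Separate chaining: each bucket holds the head of a singly linked list (sketch)."""

    def __init__(self, num_buckets: int = 10, p: int = 31, max_load: float = 0.7):
        self.bucket_array: List[Optional[Node]] = [None] * num_buckets
        self.p = p
        self.max_load = max_load  # hypothetical rehash threshold
        self.num_entries = 0

    def get_bucket_index(self, key: str) -> int:
        # Polynomial hash (Horner's rule), compressed modulo the number of buckets
        h = 0
        for char in key:
            h = (h * self.p + ord(char)) % len(self.bucket_array)
        return h

    def put(self, key: str, value: Any) -> None:
        index = self.get_bucket_index(key)
        node = self.bucket_array[index]
        while node is not None:  # existing key: update its value in place
            if node.key == key:
                node.value = value
                return
            node = node.next
        # New key (empty bucket or collision): prepend at the head of the chain
        self.bucket_array[index] = Node(key, value, self.bucket_array[index])
        self.num_entries += 1
        if self.num_entries / len(self.bucket_array) > self.max_load:
            self._rehash()

    def get(self, key: str) -> Optional[Any]:
        node = self.bucket_array[self.get_bucket_index(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return None  # key not present

    def _rehash(self) -> None:
        # Bucket indices depend on the bucket count, so every entry is re-inserted
        old_buckets = self.bucket_array
        self.bucket_array = [None] * (2 * len(old_buckets))
        self.num_entries = 0
        for head in old_buckets:
            while head is not None:
                self.put(head.key, head.value)
                head = head.next
```

With the default 10 buckets, `'one'` and `'neo'` both map to bucket 2; after `put('one', 1)` and `put('neo', 2)`, the chain at index 2 is `neo -> one`, because new keys are prepended. Prepending is O(1), whereas appending would require walking to the tail (which `put()` already does when checking for an existing key, so either choice costs the same asymptotically here).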

### Practical Considerations for the Time Complexity of `put` and `get` Operations
**Note:** In theory, `put` and `get` take time proportional to the length of the chain they search. With $n$ entries spread over $b$ buckets, the average chain length is the load factor $\dfrac{n}{b}$, so both operations cost $O(\dfrac{n}{b})$, which approaches $O(n)$ when $b \ll n$ (and is $O(n)$ in the worst case, when every key lands in the same bucket). In practice, a good hash function spreads keys evenly, and the map rehashes into more buckets whenever the load factor grows too large, so chains stay short. For most purposes, we can safely assume that `put` and `get` run in `O(1)` time.

Therefore, when you solve practice problems involving HashMaps, assume that `put` and `get` take `O(1)` time (strictly speaking, expected and amortized `O(1)`).
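
The load-factor argument can be stated more precisely. Under the standard textbook assumption of *simple uniform hashing* (each key is equally likely to hash into any of the $b$ buckets, independently of other keys), the expected chain length equals the load factor $\alpha$, so for separate chaining:

$$\alpha = \frac{n}{b}, \qquad \mathbb{E}[\text{time of a lookup}] = \Theta(1 + \alpha)$$

For example, 7 entries in 10 buckets give $\alpha = 0.7$. If the map doubles its number of buckets whenever $\alpha$ exceeds a fixed threshold such as 0.7, $\alpha$ stays bounded by a constant, so `get` runs in $O(1)$ expected time and `put` in $O(1)$ expected amortized time: a single rehash costs $O(n)$, but with doubling the total rehashing work over $n$ insertions is $O(n)$.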

```python
# Implement a HashMap with an array of buckets
class HashMap:
    def __init__(self, size=10):
        self.bucket_array = [None for _ in range(size)]
        self.p = 31  # prime base for the polynomial hash function
        self.num_entries = 0

    def size(self):
        return self.num_entries  # number of key-value pairs in the HashMap

    def get_bucket_index(self, key):
        # get_hash_code already compresses the hash code into [0, number of buckets)
        return self.get_hash_code(key)

    def get_hash_code(self, key):
        hashcode = 0
        i = len(key) - 1  # exponent of p for the first (leftmost) character
        number_bucket = len(self.bucket_array)

        for char in key:
            coef = (self.p ** i) % number_bucket  # compress each coefficient to keep numbers small
            hashcode += ord(char) * coef  # ord() returns the character's ASCII (Unicode) value
            i -= 1
        # compress the final sum into a valid bucket index
        return hashcode % number_bucket

    def put(self, key, value):
        '''
        O(n/b): n is the number of entries, b is the number of buckets in the HashMap.
        n/b is the load factor and should ideally stay below 0.7 (fewer entries than buckets).
        If the load factor exceeds 0.7, rehash: increase the number of buckets and
        recalculate the bucket index of every entry.
        '''
        pass

    def get(self, key):
        '''
        O(n/b) runtime. In practice, put and get can be close to O(1).
        '''
        pass

    def delete(self, key):
        '''
        O(n/b); the number of buckets is not resized.
        '''
        pass
```

```python
hm = HashMap()
print(hm.get_bucket_index('one'))
print(hm.get_bucket_index('two'))
print(hm.get_bucket_index('three'))
print(hm.get_bucket_index('neo'))
```

Output:

```
2
6
6
2
```
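
Notice that `'one'` and `'neo'` land in the same bucket. This is not bad luck: with 10 buckets, $31 \equiv 1 \pmod{10}$, so every coefficient `(31 ** i) % 10` equals 1 and the hash collapses to the sum of the ASCII values modulo 10, which is identical for any two anagrams. With 11 buckets, the coefficients for a three-letter key become 4, 9, and 1, so the character positions matter again. The check below repeats the `get_hash_code` arithmetic as a standalone function; the name `bucket_index` is hypothetical.

```python
def bucket_index(key: str, num_buckets: int, p: int = 31) -> int:
    """Same arithmetic as HashMap.get_hash_code: sum of ord(char) * (p**i % b), then mod b."""
    hashcode = 0
    i = len(key) - 1
    for char in key:
        hashcode += ord(char) * ((p ** i) % num_buckets)
        i -= 1
    return hashcode % num_buckets


for b in (10, 11):
    print(b, [bucket_index(word, b) for word in ("one", "neo")])
# 10 [2, 2]  -> anagrams collide because every coefficient is 1
# 11 [6, 8]  -> coefficients 4, 9, 1 separate them
```

Choosing a prime number of buckets is a common way to avoid this kind of accidental structure between the base `p` and the bucket count, though it does not eliminate collisions in general.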
