# Hashing

- A hash table is a collection of items stored in a way that makes them easy to find later
- Each position of the hash table, called a slot, can hold one item and is named by an integer index starting at 0
- eg. the slots are named 0, 1, 2, etc...
- initially all slots are empty and the hash table holds no items

# Hash functions
- a hash function maps an item to a slot
- it takes any item in the collection and returns an integer in the range of slot names, between 0 and m-1, where m is the table size

# Hash function 1: Remainder Method
- when presented with an item, the item is divided by the table size and the remainder becomes the slot number
- eg. an empty hash table is assigned a size of 11
- thus the hash function is h(item) = item%11


- Item  |  Hash Value
- 54    |  10
- 26    |  4
- 93    |  5
- 17    |  6
- 77    |  0
- 31    |  9

- ^ Once placed, these items occupy 6 of the 11 slots
- this ratio is the LOAD FACTOR (number of items / table size), here 6/11

- [0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10]
- [77| x | x | x |26 |93 |17 | x | x | 31| 54]
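
The table above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `remainder_hash` and the variable names are made up for illustration, and empty slots are shown as `None` instead of `x`.

```python
def remainder_hash(item, table_size):
    """Remainder method: the slot is the remainder of item / table_size."""
    return item % table_size

table_size = 11
items = [54, 26, 93, 17, 77, 31]

# start with an empty table, then drop each item into its slot
slots = [None] * table_size
for item in items:
    slots[remainder_hash(item, table_size)] = item

# load factor = number of items / table size
load_factor = len(items) / table_size

print(slots)        # [77, None, None, None, 26, 93, 17, None, None, 31, 54]
print(load_factor)  # 6/11, roughly 0.545
```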


# Collisions
- What if, when using the remainder method to assign slots to items, two items have the same remainder?
- eg. 44%11 = 0  and  77%11 = 0, so both items hash to slot 0. This is a collision

# Perfect hash functions
- a perfect hash function maps each item to a unique slot, so there are no collisions
- a hash function does not need to be perfect to be useful, as long as collisions are rare
- goals of a good hash function : easy to compute + minimise collisions by distributing items evenly across the slots

    
# Hash function 2: Folding method
- divide the item into equal size pieces (the last piece may be smaller)
- add the pieces together, then take the remainder by the table size to give the resulting hash value

- eg. Item: 4365554601
- becomes 43+65+55+46+01  = 210
- if hash table has 11 slots 
- 210 % 11 = 1  
- so  Item: 4365554601 is assigned to slot 1
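
The folding steps above could be sketched as follows. The function name `folding_hash` and the `piece_size` parameter are assumptions made for this example, with pieces of two digits as in the worked item.

```python
def folding_hash(item, table_size, piece_size=2):
    """Folding method: split the digits into pieces, sum them, take the remainder."""
    digits = str(item)
    # cut the digit string into chunks of piece_size (the last chunk may be shorter)
    pieces = [int(digits[i:i + piece_size])
              for i in range(0, len(digits), piece_size)]
    return sum(pieces) % table_size

print(folding_hash(4365554601, 11))  # 43+65+55+46+01 = 210, and 210 % 11 = 1
```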
    
    
# Hash function 3: Mid Square Method
- square the item, extract the middle digits of the result, then take the remainder by the table size
- eg. Item: 44
- 44 ** 2 = 1936 
- If hash table has 11 slots
- extract 93 from 1936
- 93%11 = 5
- assign item 44 to slot 5
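
A possible sketch of the mid square method, assuming we always extract the two middle digits of the square. The name `mid_square_hash` and the `num_digits` parameter are hypothetical, and other ways of choosing the "middle" are equally valid.

```python
def mid_square_hash(item, table_size, num_digits=2):
    """Mid square method: square the item, keep the middle digits, take the remainder."""
    squared = str(item ** 2)
    # index where the middle num_digits digits begin (0 if the square is too short)
    start = max(0, (len(squared) - num_digits) // 2)
    middle = int(squared[start:start + num_digits])
    return middle % table_size

print(mid_square_hash(44, 11))  # 44**2 = 1936, middle digits 93, and 93 % 11 = 5
```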

In [1]:
ord("w") # we can use this method to hash strings
# thanks to the built-in ord function,
# which returns a number for each single-character string

119
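
One simple way to hash a whole string with `ord` is to add up the ordinal values of its characters and apply the remainder method. This is only a sketch, and `string_hash` is a made-up name for the example.

```python
def string_hash(text, table_size):
    """Sum the ordinal value of every character, then apply the remainder method."""
    total = sum(ord(ch) for ch in text)
    return total % table_size

print(string_hash("cat", 11))  # 99 + 97 + 116 = 312, and 312 % 11 = 4
print(string_hash("act", 11))  # same letters, same sum, so also 4
```

Note that anagrams always collide with this approach, because the sum ignores the order of the characters.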

# Collision resolution method 1: Probing / Rehashing
- if there is a collision, resolve it by placing the colliding item in the next empty slot found in the table (linear probing)
- Quadratic probing: a probing variant where the skip grows each time, so starting from hash value h the slots tried are h+1, h+4, h+9, ...
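
To compare the two probing styles, here is a rough sketch that inserts the colliding items 44 and 55 into the 11-slot table from the remainder method example. It assumes quadratic probing means trying offsets of 1, 4, 9, ... from the home slot; all function names are invented for the illustration.

```python
def linear_step(i):
    # the i-th probe looks i slots past the home slot: h, h+1, h+2, ...
    return i

def quadratic_step(i):
    # the i-th probe looks i*i slots past the home slot: h, h+1, h+4, h+9, ...
    return i * i

def probe_insert(slots, item, step):
    """Place item in the first empty slot along its probe sequence."""
    size = len(slots)
    home = item % size
    for i in range(size):
        position = (home + step(i)) % size
        if slots[position] is None:
            slots[position] = item
            return position
    # quadratic probing can miss empty slots, so giving up is possible
    raise OverflowError("no empty slot found")

def filled_table():
    # the 11-slot table built earlier with the remainder method
    return [77, None, None, None, 26, 93, 17, None, None, 31, 54]

linear = filled_table()
linear_positions = [probe_insert(linear, item, linear_step) for item in (44, 55)]

quadratic = filled_table()
quadratic_positions = [probe_insert(quadratic, item, quadratic_step) for item in (44, 55)]

print(linear_positions)     # [1, 2]
print(quadratic_positions)  # [1, 3]
```

With linear probing the colliding items pile up next to slot 0, while the growing skip of quadratic probing spreads them further apart.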
    
# Collision resolution method 2: Chaining
- Allow each slot to hold a reference to a collection or chain of items. Chaining allows many items to exist at the same location of a hash table.
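
A small sketch of chaining, assuming each slot holds a plain Python list as its chain and the remainder method picks the slot. The name `chained_put` is hypothetical.

```python
def chained_put(table, item):
    """Append the item to the chain stored in its slot."""
    table[item % len(table)].append(item)

# every slot starts with its own empty chain
table = [[] for _ in range(11)]
for item in [54, 26, 93, 17, 77, 31, 44, 55]:
    chained_put(table, item)

print(table[0])   # [77, 44, 55] : three items share slot 0
print(table[10])  # [54]
```

Lookups then hash to the slot and search only that slot's chain, so they slow down as chains grow longer.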


# Map
- The idea of a dictionary is to use a hash table to store and retrieve values using keys
- Associating each key with a value is referred to as mapping


- del - delete a key-value pair
- len - return the number of key-value pairs stored
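
Python's built-in `dict` is itself a hash-table-based map, so the map operations can be tried out directly before building our own. The variable names below are just for illustration.

```python
m = {}               # a new, empty map
m[1] = "One"         # put: add a key-value pair
m[2] = "Two"
m[1] = "ONE"         # put on an existing key replaces the old value

missing = m.get(3)   # get: returns None when the key is absent
has_two = 2 in m     # membership test on keys

del m[2]             # del: delete a key-value pair
size = len(m)        # len: number of key-value pairs stored

print(m, missing, has_two, size)  # {1: 'ONE'} None True 1
```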


In [10]:
class HashTable(object):
    '''
    - HashTable() - creates a new, empty map
    built on two parallel lists: slots holds the keys, data holds the values
    '''

    def __init__(self, size):
        self.size = size
        self.slots = [None] * self.size # keys: one entry per table position
        self.data = [None] * self.size # values: data[i] belongs to the key in slots[i]
        
    def hashfunction(self, key, size):
        '''
        hash function: remainder method
        '''
        return key%size
    
    def rehash(self, oldhash, size):
        # linear probing: move along one slot, wrapping around at the end of the table
        return (oldhash + 1) % size
        
    def put(self, key, data):
        '''
        - put(key, val) - add a new key-value pair to the map + if key is already in the map, replace old value with new value
        '''
        # the hash value is the index of the slot where we first try to put the item
        hashvalue = self.hashfunction(key, len(self.slots))

        if self.slots[hashvalue] is None:
            # the slot is empty: store the pair here
            self.slots[hashvalue] = key
            self.data[hashvalue] = data
        elif self.slots[hashvalue] == key:
            # the key is already in this slot: replace the old value
            self.data[hashvalue] = data
        else:
            # collision case: rehash until we reach an empty slot, the key, or the start
            nextslot = self.rehash(hashvalue, len(self.slots))
            while (self.slots[nextslot] is not None
                   and self.slots[nextslot] != key
                   and nextslot != hashvalue):
                nextslot = self.rehash(nextslot, len(self.slots))

            if nextslot == hashvalue:
                # wrapped all the way around: every slot holds some other key
                raise OverflowError("hash table is full")

            # either an empty slot (new key) or the key's own slot (replace the value)
            self.slots[nextslot] = key
            self.data[nextslot] = data
                    
    def get(self, key):
        '''
        - get(key) - given a key, return the value stored in the map or None otherwise
        '''
        startslot = self.hashfunction(key, len(self.slots)) # the slot where the search starts
        data = None
        stop = False
        found = False
        position = startslot
        # probe until we hit an empty slot, find the key, or wrap around
        while self.slots[position] is not None and not found and not stop:
            if self.slots[position] == key:
                # we found the key!
                found = True
                data = self.data[position]
            else:
                # a different key lives here: move on to the next slot
                position = self.rehash(position, len(self.slots))
                if position == startslot:
                    # back where we started: the key is not in the table
                    stop = True
        return data
    
    def __getitem__(self, key):
        return self.get(key)
    
    def __setitem__(self, key, data):
        self.put(key, data)
    
            
    
    
        

In [11]:
h = HashTable(5)

In [12]:
h[1] = "One"
h[2] = "TWO"
h[3] = "THREE"

In [13]:
h[3]

'THREE'