# Why make a Python hash table?

Python's built-in dictionary (`dict`) is already implemented as a hash table internally, which gives O(1) look-ups on average. Here I am building my own to understand the data structure better.

Note: Many of these notes have been adapted from this fantastic [blog entry](https://coderbook.com/@marcus/how-to-create-a-hash-table-from-scratch-in-python/)

### The Hash Function

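The hash function is what allows lookups to be done in constant time. The key is passed through the built-in hash() function, which returns an integer. That integer is not guaranteed to be unique (two different keys can produce the same value), but a given key always produces the same integer within one run of the program. We use it to calculate the index at which the new object will be stored.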


In [1]:
hash(100)

100

In [2]:
hash('blah')

9018687441287506589
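
To make the step from hash value to table index concrete, here is a minimal sketch. The helper name `bucket_index` is made up for illustration. It uses small non-negative integers as keys because CPython hashes those to themselves, so the result is reproducible. String hashes, like the one for 'blah' above, change between interpreter runs unless `PYTHONHASHSEED` is fixed.

```python
def bucket_index(key, size):
    """Map a key to a slot in a table that has `size` slots (hypothetical helper)."""
    return hash(key) % size


# Three different keys, one shared slot: this is a collision.
for key in (3, 13, 23):
    print(key, "->", bucket_index(key, 10))
```

All three keys land in slot 3 of a 10-slot table, which is why each slot needs to be able to hold more than one key-value pair.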


# HashTable Object



In [None]:
class HashTable(object):
    
    def __init__(self, size=10):
        self.array = [None] * size # initialize a table with None / empty values
  
    def hash(self, key): # get the index in our table for a specific key
        return hash(key) % len(self.array) # reduce the built-in hash modulo the current length of the array
    
    def add(self, key, value):
        # add to the table based on the hashed index of the key
        # the slot may already hold other pairs, so we have to handle that case
        ind_to_add = self.hash(key)
        if self.array[ind_to_add] is not None:
            # something already lives at this index. There are 2 possibilities.
            # 1) the key is already stored, so we update its value
            keyvalpairs = self.array[ind_to_add]
            for keyval in keyvalpairs:
                if keyval[0] == key:
                    keyval[1] = value
                    break
            else:
                # 2) the key was not found, so append a new pair to this slot's list
                self.array[ind_to_add].append([key, value])
        else:
            # nothing at this index yet: create a new list for the slot and add the pair
            self.array[ind_to_add] = []
            self.array[ind_to_add].append([key, value])
    
    def get(self, key):
        ind_to_search = self.hash(key)
        # nothing stored at this index, so the key cannot be present
        if self.array[ind_to_search] is None:
            raise KeyError(key)
        else:
            # loop through all of the key-value pairs stored at this index

            for keyvalpair in self.array[ind_to_search]:
                if keyvalpair[0] == key:
                    return keyvalpair[1]

            # we reached the end of the list without a match, so the key is missing
            raise KeyError(key)
        
    

So here is version 1 of the data structure. It is not fully fleshed out yet, but it's a start.

* the `__init__` function creates an empty hash table, stored in the self.array variable. The default size of the array is 10, and the default value of each slot is None.

* the hash() method calculates the index at which a key-value pair is stored. It uses the built-in hash() function, which turns the key into an integer (not necessarily a unique one), and then takes that value modulo the size of the array. The result is the numerical index at which the key-value pair will be stored.

* the add() method looks at the slot at the hashed index. If that slot is still None (the initial value), nothing has been added there yet, so the method creates a list and stores the key-value pair in it. If the key already exists in that slot, the method updates its value. If the slot only holds other keys, the method appends the new pair to the end of the list.

* the get() method looks up the key at the hashed index. It loops through all the key-value pairs stored there and, if it finds the key, returns the corresponding value. If not, it raises a KeyError.
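
The behaviour described in the bullets can be exercised with a small self-contained sketch. `ChainedTable` is a hypothetical, compact restatement of the idea and not the class above: as a simplifying assumption it starts every slot as an empty list instead of None, which removes the "slot is empty" branch.

```python
class ChainedTable:
    """Hypothetical compact hash table that chains colliding pairs in lists."""

    def __init__(self, size=10):
        # Assumption: one empty list per slot instead of None.
        self.buckets = [[] for _ in range(size)]

    def add(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value  # key already stored: update in place
                return
        bucket.append([key, value])  # new key: chain it onto the slot

    def get(self, key):
        for stored_key, value in self.buckets[hash(key) % len(self.buckets)]:
            if stored_key == key:
                return value
        raise KeyError(key)


table = ChainedTable(size=10)
table.add(1, "one")
table.add(11, "eleven")  # 11 % 10 == 1, so this collides with key 1
table.add(1, "uno")      # same key again: the value is overwritten
print(table.get(1), table.get(11))
```

Keys 1 and 11 share slot 1, yet both can be retrieved, and re-adding key 1 replaces its value rather than creating a duplicate pair.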

## Improving on the HashTable



The problem with the current set-up is that there are only 10 slots, so as more items are added the lists at each index get longer and look-ups get slower. How can we dynamically increase the size of our hash table? There are two pieces to this problem. First, we need to determine when the hash table is too full. Second, we need to redistribute the existing items in a larger table according to their new hash index.
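
A short sketch of why the items have to be redistributed rather than simply copied into a longer array: the index depends on the table length, so it can change when the table grows. The sizes 10 and 20 and the integer keys are chosen only for illustration.

```python
old_size, new_size = 10, 20

# For each key, record (index in the old table, index in the doubled table).
moves = {key: (hash(key) % old_size, hash(key) % new_size) for key in (3, 13, 23)}

for key, (old_index, new_index) in moves.items():
    print(key, ":", old_index, "->", new_index)
```

Key 13 moves from slot 3 to slot 13 after doubling, while keys 3 and 23 stay in slot 3. A look-up for 13 in the bigger table would check slot 13, so leaving the pair in slot 3 would make it unreachable.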


In [1]:
class HashTable(object):
    
    def __init__(self, size=10):
        self.array = [None] * size # initialize a table with None / empty values
  
    def hash(self, key): # get the index in our table for a specific key
        return hash(key) % len(self.array) # reduce the built-in hash modulo the current length of the array
    
    def add(self, key, value):
        # add to the table based on the hashed index of the key
        # the slot may already hold other pairs, so we have to handle that case
        ind_to_add = self.hash(key)
        if self.array[ind_to_add] is not None:
            # something already lives at this index. There are 2 possibilities.
            # 1) the key is already stored, so we update its value
            keyvalpairs = self.array[ind_to_add]
            for keyval in keyvalpairs:
                if keyval[0] == key:
                    keyval[1] = value
                    break
            else:
                # 2) the key was not found, so append a new pair to this slot's list
                self.array[ind_to_add].append([key, value])
        else:
            # nothing at this index yet: create a new list for the slot and add the pair
            self.array[ind_to_add] = []
            self.array[ind_to_add].append([key, value])
        # new step: grow the table once more than half of its slots are in use
        if self.is_full():
            self.double_hash()
    
    def get(self, key):
        ind_to_search = self.hash(key)
        # nothing stored at this index, so the key cannot be present
        if self.array[ind_to_search] is None:
            raise KeyError(key)
        else:
            # loop through all of the key-value pairs stored at this index

            for keyvalpair in self.array[ind_to_search]:
                if keyvalpair[0] == key:
                    return keyvalpair[1]

            # we reached the end of the list without a match, so the key is missing
            raise KeyError(key)
            

    """
    new functions here
    """
    
    def is_full(self):
        num_items = 0
        for item in self.array:
            if item is not None:
                num_items += 1
        # this is true if the hashtable is more than 1/2 full 
        return num_items > len(self.array) / 2
    
    
    def double_hash(self):
        # allocate a new hash table with twice as many slots
        newhash = HashTable(size=(len(self.array)*2))

        # re-add every stored pair so it lands at its index in the bigger table
        for i in range(len(self.array)):
            if self.array[i] is None:
                continue # nothing to see here folks
            else:
                for keyvalpair in self.array[i]:
                    newhash.add(keyvalpair[0], keyvalpair[1])

        # once everything is re-added, take over the new table's array
        self.array = newhash.array
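
A quick way to check the resizing behaviour is to insert more keys than the initial table can comfortably hold. The sketch below is self-contained, so it restates the approach in compact form under the hypothetical names `GrowingTable` and `grow`. Like the code above, it treats the table as full once more than half of its slots are occupied. The starting size of 4 is an arbitrary choice to trigger growth quickly.

```python
class GrowingTable:
    """Hypothetical compact table that doubles when over half its slots are used."""

    def __init__(self, size=10):
        self.array = [None] * size

    def add(self, key, value):
        index = hash(key) % len(self.array)
        if self.array[index] is None:
            self.array[index] = []
        for pair in self.array[index]:
            if pair[0] == key:
                pair[1] = value  # existing key: update
                break
        else:
            self.array[index].append([key, value])  # new key: append
        if self.is_full():
            self.grow()

    def get(self, key):
        for stored_key, value in self.array[hash(key) % len(self.array)] or []:
            if stored_key == key:
                return value
        raise KeyError(key)

    def is_full(self):
        used = sum(1 for slot in self.array if slot is not None)
        return used > len(self.array) / 2

    def grow(self):
        bigger = GrowingTable(size=len(self.array) * 2)
        for slot in self.array:
            for key, value in slot or []:
                bigger.add(key, value)  # re-hash into the larger table
        self.array = bigger.array  # keep the list, not the table object


table = GrowingTable(size=4)
for n in range(6):
    table.add(n, n * n)

print(len(table.array))  # number of slots after growth
print(table.get(5))
```

Starting from 4 slots, the table doubles twice while the six keys are added, and every key is still retrievable afterwards. Note that `grow` assigns `bigger.array`, not `bigger` itself, so `self.array` stays a plain list.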
        
                