# Notes for Hash Tables
* Hash Tables are like Lists with numbered slots, or like Dictionaries;
    - Dictionaries hold key-value pairs, e.g. "nails" : 1000 as an itemname/quantity pair
    - Hash Tables add an address (index);
        - When the input key is "nails"
        - The hashing function returns an address (e.g. 2), and the key-value pair ("nails" : 1000) is stored at that address
    - Python Dictionaries are actually built on Hash Tables!
        
* Hashing functions have 2 properties:
    - **One-way only:** Hash("nails") gets 2, but there is no way to retrieve "nails" from 2
    - **Deterministic:** Every time you Hash "nails", the output will be 2, 100% of the time.
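
Both properties can be sketched with a toy hash. The function `toy_hash` below (sum of character codes, modulo 7) is an assumption made for illustration and is not the hash used later in these notes:

```python
def toy_hash(key, size=7):
    """Hypothetical toy hash: sum of character codes, modulo the table size."""
    return sum(ord(ch) for ch in key) % size

# Deterministic: the same key always produces the same address.
first = toy_hash("nails")
second = toy_hash("nails")
print(first == second)  # True

# One-way: different keys can share an address, so the address alone
# cannot tell you which key produced it.
print(toy_hash("nails") == toy_hash("snail"))  # True (same letters, same sum)
```

Because many keys map to each address, going backwards from an address to a key is not possible in general.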
    

### Hash Table by Analogy

* Consider a Hash Table like a set of drawers in a hardware store;
    - each address (0, 1, 2...) is like a separate drawer
    - hashing is deciding which drawer (address) each item or key-value pair goes into;
    - each time you need to retrieve the quantity, you simply hash the itemname to obtain the drawer/address you require
    
### Hash Table Collisions
* You can store multiple key-value pairs at the same address, like putting multiple items in the same drawer without overwriting
    - if there are overlaps or 'collisions', you simply loop through the pairs to find the right one (rummage around in the drawer)
    - This technique is called **SEPARATE CHAINING**;
        - if you have multiple key-value pairs at one address, they are stored within a list or LinkedList
        
    - an alternative option is **LINEAR PROBING**, which limits each address to one key:value pair;
    - if the hashed address is full, you simply try the next address until you find an empty one; this is an example of **OPEN ADDRESSING**
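
The rest of these notes implement separate chaining, so here is a minimal sketch of linear probing for comparison. The class name `LinearProbingTable`, its toy hash, and its method names are assumptions for illustration; deletion and resizing are left out:

```python
class LinearProbingTable:
    """Hypothetical open-addressing table; one [key, value] pair per slot."""

    def __init__(self, size=7):
        self.slots = [None] * size

    def _hash(self, key):
        # Toy hash for illustration: sum of character codes modulo table size.
        return sum(ord(ch) for ch in key) % len(self.slots)

    def set_item(self, key, value):
        start = self._hash(key)
        for step in range(len(self.slots)):
            index = (start + step) % len(self.slots)  # wrap around at the end
            slot = self.slots[index]
            if slot is None or slot[0] == key:
                self.slots[index] = [key, value]
                return index
        raise RuntimeError("table is full")

    def get_item(self, key):
        start = self._hash(key)
        for step in range(len(self.slots)):
            index = (start + step) % len(self.slots)
            slot = self.slots[index]
            if slot is None:          # empty slot: the key was never stored
                return None
            if slot[0] == key:
                return slot[1]
        return None


table = LinearProbingTable()
first = table.set_item("nails", 1000)
second = table.set_item("snail", 3)   # same letters, so same hashed address
print(second == (first + 1) % 7)      # True: the collision moved one slot along
print(table.get_item("snail"))        # 3
```

Lookups follow the same probe sequence as inserts, which is why `get_item` can stop as soon as it reaches an empty slot.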

## Hash Table Big O

Big O of operations for Hash Table


1. **HASH** an item  O(1)
   - for a key with k letters, hashing does one operation per letter, so the cost depends on the length of the key, not on the number of items n in the table;
   - therefore, with respect to n, hashing is O(1) -- constant time


2. **SET**  an Item O(1)
    - Hash the key, then append the pair to the list at that address; this has the same O as appending to a Linked List; O(1) 
    

3. **GET**  a Value by Key O(1)
    - This is an average-case figure; 
    - In the worst case scenario, a HT of n objects will have all n objects hashed to the same address. Getting the hash is O(1), but iterating through n objects takes O(n)
    - However, the assumption with a hash table is that the n objects are evenly distributed across the addresses and the address space is large enough that collisions are rare, meaning that you will not need to iterate through n objects; this approximates to O(1)
    
3b. Search for a **Value** O(n)
    - If you simply wish to look up a value with no Key, you need to search every element of the HT
    - Therefore, key lookup is O(1) (just the speed of the Hash function), but value lookup is O(n), growing with the current size of the HT
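
The difference between GET by key and search by value can be sketched by counting how many stored pairs each one inspects. All names below (`toy_hash`, `build_table`, `get_by_key`, `find_by_value`) and the sample quantities are assumptions for illustration:

```python
def toy_hash(key, size):
    """Hypothetical toy hash: sum of character codes, modulo the table size."""
    return sum(ord(ch) for ch in key) % size

def build_table(pairs, size):
    """Separate chaining: one list of (key, value) pairs per address."""
    buckets = [[] for _ in range(size)]
    for key, value in pairs:
        buckets[toy_hash(key, size)].append((key, value))
    return buckets

def get_by_key(buckets, key):
    """Return (value, pairs_checked); only the hashed bucket is scanned."""
    checked = 0
    for stored_key, value in buckets[toy_hash(key, len(buckets))]:
        checked += 1
        if stored_key == key:
            return value, checked
    return None, checked

def find_by_value(buckets, target):
    """Return (key, pairs_checked); every bucket may need to be scanned."""
    checked = 0
    for bucket in buckets:
        for key, value in bucket:
            checked += 1
            if value == target:
                return key, checked
    return None, checked

pairs = [("bolts", 1400), ("washers", 50), ("lumber", 70),
         ("nails", 1000), ("screws", 300), ("hinges", 12)]

worst = build_table(pairs, size=1)    # one address: every key collides
spread = build_table(pairs, size=7)   # seven addresses: shorter chains

print(get_by_key(worst, "hinges"))    # (12, 6): walked the whole chain
print(get_by_key(spread, "hinges")[1] < 6)  # True: only one short chain scanned
print(find_by_value(spread, 999))     # (None, 6): a miss checks every pair
```

With a single address the key lookup degrades to O(n), while a value search has to visit every pair no matter how well the keys are spread.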

### 4.0 HashTable Constructor & `hash` method
- the number of addresses is usually chosen to be a prime number, e.g. 7 
- this spreads key-value pairs more evenly through the hash table, thereby reducing the number of potential collisions
- so, the addresses are range(7), i.e. (0, 1, 2, 3, 4, 5, 6)
    - numbering starts at 0, so the 7 addresses run from 0 to 6 (7 itself is not an address)
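
One way to see why a prime size helps is to hash keys that share a common factor with a non-prime size. The sketch below uses integer keys and plain `key % size` as the hash, both assumptions chosen to keep the arithmetic visible:

```python
from collections import Counter

def bucket_counts(keys, size):
    """Count how many integer keys land at each address for key % size."""
    return Counter(key % size for key in keys)

keys = range(0, 56, 4)   # 14 keys that are all multiples of 4

print(len(bucket_counts(keys, 8)))  # 2: only addresses 0 and 4 are used
print(len(bucket_counts(keys, 7)))  # 7: every address is used
```

A prime size does not guarantee an even spread for every set of keys; it only removes clustering caused by factors shared between the keys and the table size.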

In [5]:
# Creating HashTable Class
class HashTable:                       # defining the HashTable class
    def __init__(self, size = 7):      # sets default size=7
        self.data_map = [None] * size  # creates a data_map of 7 [Nones]
        
    def hash(self, key):
        my_hash = 0
        for letter in key:         
            my_hash = (my_hash         # difficult to backcalculate 
                      + ord(letter)    # takes ASCII value of each letter
                      * 23             # 23 is prime number
                      ) % len(self.data_map)
        # here, len(self.data_map) == 7
        # so modulo 7 will always return a number from 0 to 6
        # which gives us a clever way to find a hash address
        return my_hash
    
    def print_table(self):
        for i, val in enumerate(self.data_map):
            print(i, " : ", val)

In [6]:
my_hash_table = HashTable()
my_hash_table.print_table()

0  :  None
1  :  None
2  :  None
3  :  None
4  :  None
5  :  None
6  :  None


### 4.1 HashTable `set_item` and `get_item`
- set_item takes a key-value pair and
    - hashes the key to find the address
    - stores the key-value pair at the address
- get_item takes a key and
    - hashes the key to find the address
    - if the key exists at that address, returns its value
    - else, returns None


In [24]:
# Creating HashTable set_item
class HashTable:                       # defining the HashTable class
    def __init__(self, size = 7):      # sets default size=7
        self.data_map = [None] * size  # creates a data_map of 7 [Nones]
        
    def hash(self, key):
        my_hash = 0
        for letter in key:         
            my_hash = (my_hash         # difficult to backcalculate 
                      + ord(letter)    # takes ASCII value of each letter
                      * 23             # 23 is prime number
                      ) % len(self.data_map)
        # here, len(self.data_map) == 7
        # so modulo 7 will always return a number from 0 to 6
        # which gives us a clever way to find a hash address
        return my_hash
    
    def print_table(self):
        for i, val in enumerate(self.data_map):
            print(i, " : ", val)
    
    def set_item(self, key, value):
        index = self.hash(key)       # hashing to create the address
        if self.data_map[index] is None:
            self.data_map[index] = []      # creating empty list at address
        self.data_map[index].append([key, value])

In [25]:
my_hash_table = HashTable()

my_hash_table.set_item('bolts', 1400)
my_hash_table.set_item('washers', 50)
my_hash_table.set_item('lumber', 70)


my_hash_table.print_table()

0  :  None
1  :  None
2  :  None
3  :  None
4  :  [['bolts', 1400], ['washers', 50]]
5  :  None
6  :  [['lumber', 70]]


In [34]:
# Creating HashTable get_item
class HashTable:                       # defining the HashTable class
    def __init__(self, size = 7):      # sets default size=7
        self.data_map = [None] * size  # creates a data_map of 7 [Nones]
        
    def hash(self, key):
        my_hash = 0
        for letter in key:         
            my_hash = (my_hash         # difficult to backcalculate 
                      + ord(letter)    # takes ASCII value of each letter
                      * 23             # 23 is prime number
                      ) % len(self.data_map)
        # here, len(self.data_map) == 7
        # so modulo 7 will always return a number from 0 to 6
        # which gives us a clever way to find a hash address
        return my_hash
    
    def print_table(self):
        for i, val in enumerate(self.data_map):
            print(i, " : ", val)
    
    def set_item(self, key, value):
        index = self.hash(key)       # hashing to create the address
        if self.data_map[index] is None:
            self.data_map[index] = []      # creating empty list at address
        self.data_map[index].append([key, value])
        
    def get_item(self, key):
        index = self.hash(key)       # hashing to create the address
        if self.data_map[index] is not None:
            for pair in self.data_map[index]:   # avoid shadowing built-in list
                if pair[0] == key:
                    return pair[1]   # returns the value of pair
        return None

In [36]:
my_hash_table = HashTable()

my_hash_table.set_item('bolts', 1400)
my_hash_table.set_item('washers', 50)
my_hash_table.set_item('lumber', 70)


my_hash_table.print_table()

0  :  None
1  :  None
2  :  None
3  :  None
4  :  [['bolts', 1400], ['washers', 50]]
5  :  None
6  :  [['lumber', 70]]


In [37]:
print(my_hash_table.get_item('lumber'))
print(my_hash_table.get_item('bolts'))
print(my_hash_table.get_item('volts'))

70
1400
None
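
Note that `set_item` as written appends a new pair every time, so setting the same key twice leaves two entries and `get_item` returns the older one. A possible variant that overwrites instead is sketched below; the class name `UpdatingHashTable` is hypothetical, and the hash is copied from the notes so the block runs on its own:

```python
class UpdatingHashTable:
    """Hypothetical variant of HashTable whose set_item overwrites existing keys."""

    def __init__(self, size=7):
        self.data_map = [None] * size

    def hash(self, key):
        # Same hash as in the notes: (running total + char code * 23) modulo size.
        my_hash = 0
        for letter in key:
            my_hash = (my_hash + ord(letter) * 23) % len(self.data_map)
        return my_hash

    def set_item(self, key, value):
        index = self.hash(key)
        if self.data_map[index] is None:
            self.data_map[index] = []
        for pair in self.data_map[index]:
            if pair[0] == key:       # key already stored: update in place
                pair[1] = value
                return
        self.data_map[index].append([key, value])

    def get_item(self, key):
        bucket = self.data_map[self.hash(key)]
        if bucket is not None:
            for stored_key, value in bucket:
                if stored_key == key:
                    return value
        return None


table = UpdatingHashTable()
table.set_item("bolts", 1400)
table.set_item("bolts", 900)        # second write to the same key
print(table.get_item("bolts"))      # 900
print(table.data_map[4])            # [['bolts', 900]]
```

The extra scan costs one pass over the bucket, which stays short as long as collisions are rare.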


### 4.2 HashTable `keys`
- `keys` method goes through the HT, and extracts all the keys from each index

In [29]:
# Creating HashTable keys
class HashTable:                       # defining the HashTable class
    def __init__(self, size = 7):      # sets default size=7
        self.data_map = [None] * size  # creates a data_map of 7 [Nones]
        
    def hash(self, key):
        my_hash = 0
        for letter in key:         
            my_hash = (my_hash         # difficult to backcalculate 
                      + ord(letter)    # takes ASCII value of each letter
                      * 23             # 23 is prime number
                      ) % len(self.data_map)
        # here, len(self.data_map) == 7
        # so modulo 7 will always return a number from 0 to 6
        # which gives us a clever way to find a hash address
        return my_hash
    
    def print_table(self):
        for i, val in enumerate(self.data_map):
            print(i, " : ", val)
    
    def set_item(self, key, value):
        index = self.hash(key)       # hashing to create the address
        if self.data_map[index] is None:
            self.data_map[index] = []      # creating empty list at address
        self.data_map[index].append([key, value])
        
    def get_item(self, key):
        index = self.hash(key)       # hashing to create the address
        if self.data_map[index] is not None:
            for pair in self.data_map[index]:   # avoid shadowing built-in list
                if pair[0] == key:
                    return pair[1]   # returns the value of pair
        return None
    
    def keys(self):
        all_keys = []
        for container in self.data_map:
            if container is not None:
                for key_value_pair in container:
                    all_keys.append(key_value_pair[0])
        return all_keys

In [30]:
my_hash_table = HashTable()

my_hash_table.set_item('bolts', 1400)
my_hash_table.set_item('washers', 50)
my_hash_table.set_item('lumber', 70)


my_hash_table.print_table()

0  :  None
1  :  None
2  :  None
3  :  None
4  :  [['bolts', 1400], ['washers', 50]]
5  :  None
6  :  [['lumber', 70]]


In [31]:
my_hash_table.keys()

['bolts', 'washers', 'lumber']

### 4.3 HashTable Sample Interview question

There are two lists:
    - List_A = [1, 3, 5]
    - List_B = [2, 4, 5]
    
How do we determine which items are in common?

#### Naive approach
- Nested for loops;
    - For every item in List A...
    - Compare to every item in List B...
- Completing in O(n^2)
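
For comparison with the dictionary version, the nested-loop approach could look like the sketch below. The function name `item_in_common_naive` is an assumption:

```python
def item_in_common_naive(list1, list2):
    """Compare every item of list1 with every item of list2: O(n^2)."""
    for a in list1:
        for b in list2:
            if a == b:
                return True
    return False

print(item_in_common_naive([1, 3, 5], [2, 4, 5]))  # True
print(item_in_common_naive([1, 3, 5], [2, 4, 6]))  # False
```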

#### Hash Table Approach
- Key knowledge: the same key will always result in the same hash number, so looking up a key in a dictionary is O(1)
- Steps:
    1. Put List A into a dictionary, with the value of each key set to True
    2. For each item in List B, look it up in the dictionary. If it is found, return True; if no item matches, return False
- Since this loops through each list only once, this runs in O(2n), simplifying to O(n)
- The HashTable/Dictionary approach is far more efficient than the O(n^2) nested loops for large lists

In [42]:
def item_in_common(list1, list2):
    dict_list1 = {}
    for item in list1:
        dict_list1[item] = True
    for item in list2:
        if item in dict_list1:
            return True
    return False

In [44]:
list1 = ["mary", "elizabeth", "winstead"]
list2 = ["matthew", "mccough", "mary"]
item_in_common(list1, list2)

True
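
`item_in_common` only reports whether a shared item exists. To answer "which items are in common", one possible variant uses a `set` (also hash-based) and returns the matches. The function name `items_in_common` is an assumption:

```python
def items_in_common(list1, list2):
    """Return the items of list2 that also appear in list1, in list2 order."""
    seen = set(list1)                      # O(n) to build, O(1) average lookup
    return [item for item in list2 if item in seen]

print(items_in_common([1, 3, 5], [2, 4, 5]))  # [5]
print(items_in_common(["mary", "elizabeth", "winstead"],
                      ["matthew", "mccough", "mary"]))  # ['mary']
```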

