# Hash Tables 

Hash tables are one of the most widely used data structures because, when implemented correctly, they allow lookups in O(1) time on average. Entries in a hash table consist of a key and a value. The value is what is stored in the table, and the key is a unique identifier used to find the slot in the hash table where the value goes.

In order to determine where in the hash table to put a certain value, the key is passed through a hash function, which maps the key to an index. For your hash table to be space efficient, you want the size of the hash table (m) to be close to the number of values you want to put into it (n). The ratio of the number of values stored in the hash table to the hash table size (n/m) is known as the **load factor ($\alpha$)**.
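As a rough illustration of the two ideas above, here is a minimal sketch of a hash function that maps a string key to an index, plus the load factor calculation. The names `simple_hash` and `load_factor` and the multiplier 31 are assumptions made for this example; this is not how Python's built-in `hash` works.

```python
def simple_hash(key: str, m: int) -> int:
    """Map a string key to an index in [0, m) with a polynomial rolling hash."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % m  # 31 is an arbitrary multiplier chosen for the sketch
    return h


def load_factor(n: int, m: int) -> float:
    """Return alpha = n / m for n stored values and m slots."""
    return n / m


m = 8  # table size (number of slots)
keys = ["name", "age", "eyes", "hair"]
indices = {key: simple_hash(key, m) for key in keys}
alpha = load_factor(len(keys), m)

print(indices)  # every index falls between 0 and m - 1
print(alpha)    # 0.5
```

The same key always produces the same index, which is what lets a lookup jump straight to the right slot.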

Sometimes two keys can map to the same index; this is referred to as a **collision**. There are several ways to resolve collisions:

1. **Chaining** - one of the easiest collision resolution techniques to implement, chaining involves using a linked list to chain together entries whose keys map to the same index. A hash table with chaining is essentially an array of linked lists. While easy to implement, this method is not the most space efficient, since it uses two underlying data structures and every chained entry carries the overhead of a list node. Additionally, if a chain becomes long, lookups start to behave closer to O(n) time, since searching through a linked list is O(n).

2. **Double (or triple, quadruple, etc.) Hashing** - involves applying additional hash functions to any keys that collide. The additional hash functions can all be the same or they can be different, as long as they are applied in the same order each time. In the standard form of double hashing, a second hash function of the key sets the step size between the slots that are tried. This method does not add any space complexity. However, eventually all open slots in a hash table can fill up, which requires a **rehashing**: a new hash table is created that is double the size of the old table, and all the entries in the original are rehashed into the new table. This is essentially the same method that is used for dynamic array resizing.

3. **Linear Probing** - involves going to the next available slot in the hash table when the initial slot is full. This does not add to the space complexity, but it does introduce clustering into the hash table (where values are clustered together in certain regions of the table). Similar to dealing with a long chain, when there is a lot of clustering, collisions can start to take closer to O(n) time to resolve.

4. **Quadratic Probing** - the same as linear probing, but instead of going to the immediately adjacent slots you first go $1^2$ slot away from the initial slot, then if that is full $2^2$ slots away, then $3^2$ slots away, and so on.
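The open-addressing strategies above differ only in the order in which slots are tried. The sketch below makes that concrete, assuming integer keys, a table of size 7, and `key % m` as the hash. The helper names (`linear_probe`, `quadratic_probe`, `double_hash_probe`, `insert`) are hypothetical and chosen only for this example.

```python
def linear_probe(start: int, m: int):
    """Yield start, start + 1, start + 2, ... wrapping around the table."""
    for i in range(m):
        yield (start + i) % m


def quadratic_probe(start: int, m: int):
    """Yield start, start + 1^2, start + 2^2, ... wrapping around the table."""
    for i in range(m):
        yield (start + i * i) % m


def double_hash_probe(start: int, step: int, m: int):
    """Yield start, start + step, start + 2 * step, ...; step comes from a second hash."""
    for i in range(m):
        yield (start + i * step) % m


def insert(table: list, key: int, probe) -> int:
    """Place key in the first free slot along the probe sequence and return its index."""
    for index in probe:
        if table[index] is None:
            table[index] = key
            return index
    raise RuntimeError("no free slot found; the table needs rehashing")


m = 7
table = [None] * m
# Keys 3, 10 and 17 all hash to index 3 when the hash is key % 7.
placed = [insert(table, key, linear_probe(key % m, m)) for key in (3, 10, 17)]
print(placed)  # [3, 4, 5]

print(list(quadratic_probe(3, m))[:4])       # [3, 4, 0, 5]
print(list(double_hash_probe(3, 5, m))[:3])  # [3, 1, 6]
```

With linear probing the three colliding keys land in adjacent slots, which is the clustering described above. The quadratic and double-hashing sequences spread the attempts out, though a quadratic sequence is not guaranteed to visit every slot.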

If you make a hash table too small you'll have a lot of collisions, which slows down lookups and insertions, but if you make it too big then you risk wasting a lot of memory. Generally, a load factor of around $\alpha = 0.75$ is considered a good balance between time and space complexity.
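To tie the load factor to rehashing, here is a minimal sketch of a chained hash table that doubles in size once $\alpha$ exceeds 0.75. The class name `ChainedHashTable` and its methods are assumptions for illustration, and Python lists stand in for the linked lists; this is not how Python's `dict` is implemented.

```python
class ChainedHashTable:
    """Hash table with chaining that doubles in size when the load factor exceeds 0.75."""

    MAX_LOAD = 0.75

    def __init__(self, size: int = 4):
        self.buckets = [[] for _ in range(size)]  # one chain per slot
        self.count = 0                            # number of stored entries (n)

    def _index(self, key) -> int:
        return hash(key) % len(self.buckets)

    def put(self, key, value) -> None:
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))
        self.count += 1
        if self.count / len(self.buckets) > self.MAX_LOAD:
            self._rehash()

    def get(self, key, default=None):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default

    def _rehash(self) -> None:
        """Create a table twice as large and reinsert every entry."""
        old_buckets = self.buckets
        self.buckets = [[] for _ in range(2 * len(old_buckets))]
        for bucket in old_buckets:
            for key, value in bucket:
                self.buckets[self._index(key)].append((key, value))


table = ChainedHashTable(size=4)
for n in range(4):
    table.put(n, n * n)

print(len(table.buckets))  # 8
print(table.get(3))        # 9
```

The fourth insert pushes the load factor from 0.75 to 1.0, which triggers the resize from 4 to 8 slots and brings $\alpha$ back down to 0.5.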

## Dicts : Python's built-in HashMap

In [34]:
hash_map = {
    "name" : "tucker",
    "age" : 22,
    "eyes" : "blue",
    "hair" : "blonde"
}

# Make a COPY of the dict
new_hash_map = hash_map.copy()
print(new_hash_map)

# Initialize a new dict with specified keys and a default value
# (fromkeys is a class method, so it is called on dict itself)
print(dict.fromkeys(["key1", "key2", "key3"], "default_value"))

# Two ways to get a value from the key 
print(hash_map.get("name")) 
print(hash_map["name"])
assert hash_map.get("name") == hash_map["name"]

# display a list of items; items() returns a view, so cast it to a list
print(list(hash_map.items()))

# get a list of the dict's keys
print(list(hash_map.keys()))

# remove item with certain key 
hash_map.pop("hair")
print(hash_map)

# insert a new key val pair 
hash_map["height"] = "5'10"
print(hash_map)

# remove the last inserted item 
hash_map.popitem()
print(hash_map)

# get the value for a specified key, or insert the key with a default if it doesn't exist
print(hash_map.setdefault("eyes", "brown"))
print(hash_map.setdefault("weight", "150"))
print(hash_map)

# return all values in a dict 
print(list(hash_map.values()))

# update a dict with specified key value pairs (essentially add a dict to the present dict)
hash_map.update({"address" : "123 main street", "organ_donor" : True})
print(hash_map)


{'name': 'tucker', 'age': 22, 'eyes': 'blue', 'hair': 'blonde'}
{'key1': 'default_value', 'key2': 'default_value', 'key3': 'default_value'}
tucker
tucker
[('name', 'tucker'), ('age', 22), ('eyes', 'blue'), ('hair', 'blonde')]
['name', 'age', 'eyes', 'hair']
{'name': 'tucker', 'age': 22, 'eyes': 'blue'}
{'name': 'tucker', 'age': 22, 'eyes': 'blue', 'height': "5'10"}
{'name': 'tucker', 'age': 22, 'eyes': 'blue'}
blue
150
{'name': 'tucker', 'age': 22, 'eyes': 'blue', 'weight': '150'}
['tucker', 22, 'blue', '150']
{'name': 'tucker', 'age': 22, 'eyes': 'blue', 'weight': '150', 'address': '123 main street', 'organ_donor': True}


## Sets : Python's Hash Set (only values, no keys)

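Sets are a good way to store a collection of values that you need to keep track of, since you can check if a value exists in a set in O(1) time on average.

Sets are unordered and do not allow duplicates. The elements of a set must be hashable (immutable), but the set itself can be changed by adding or removing items.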
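A short sketch of these properties, using made-up values: duplicates collapse, the set itself can be modified, and only hashable items can be stored. `frozenset` is the built-in immutable variant.

```python
grades = [90, 85, 90, 70, 85]
unique_grades = set(grades)  # duplicates collapse to a single entry each
print(len(unique_grades))    # 3

unique_grades.add(100)       # a set can be modified after creation
unique_grades.discard(70)
print(sorted(unique_grades)) # [85, 90, 100]

try:
    bad_set = {[1, 2]}       # a list is mutable, so it is not hashable
except TypeError as err:
    print("cannot store a list in a set:", err)

frozen = frozenset(unique_grades)  # immutable variant: no add() or discard()
print(90 in frozen)                # True
```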


In [60]:
hash_set = {"math", "english", "history", "french"}

# add to a set 
hash_set.add("biology")
print(hash_set)

# Check if a value is in a set (in O(1) time on average!)
if "english" in hash_set:
    print("yes, english is in the set")
else:
    print("no, english is not in the set")
    
if "physics" in hash_set:
    print("yes, physics is in the set")
else:
    print("no, physics is not in the set")

    
# COMPARING TWO SETS -------------------------------------

set1 = {1, 6, 14, 19}
set2 = {2, 6, 14, 32}
set3 = {4, 16, 90}
set4 = {1, 6}

# get UNION of sets (two ways)
print(set1.union(set2))
assert (set1 | set2) == (set1.union(set2))

# INTERSECTION of sets
print(set1.intersection(set2))
assert (set1 & set2) == set1.intersection(set2)

# DIFFERENCE between sets 
print("items in set1 but not in set 2: ", set1.difference(set2))
print("items in set2 but not in set 1: ", set2.difference(set1)) 
assert (set1 - set2) == set1.difference(set2)

# SYMMETRIC Difference
print("items in either of the sets but NOT both", set1.symmetric_difference(set2))
assert (set1 ^ set2) == set1.symmetric_difference(set2)

# Are the sets DISJOINT? (do they have nothing in common?)
print("Are set1 and set2 disjoint?", set1.isdisjoint(set2))
print("Are set1 and set3 disjoint?", set1.isdisjoint(set3))

# SUBSET?
print("is set 1 a subset of set2?", set1.issubset(set2))
print("is set4 a subset of set1?", set4.issubset(set1))
print("is set1 a superset of set4?", set1.issuperset(set4))

#OTHER FUNCTIONS 


{'english', 'math', 'history', 'biology', 'french'}
yes, english is in the set
no, physics is not in the set
{32, 1, 2, 6, 14, 19}
{6, 14}
items in set1 but not in set 2:  {1, 19}
items in set2 but not in set 1:  {32, 2}
items in either of the sets but NOT both {32, 1, 2, 19}
Are set1 and set2 disjoint? False
Are set1 and set3 disjoint? True
is set 1 a subset of set2? False
is set4 a subset of set1? True
is set1 a superset of set4? True
