# Hashmaps

A **hashmap** or **hash table** is a data structure that maps keys to values, just like a dictionary in Python. The **`lookup, insertion, and deletion`** operations of a hashmap have an average time complexity of **`O(1)`**.

Under the hood, **`hashmaps`** are built on top of **`arrays or lists`**. They use a **`hash function`** to convert a hashable key into an index in the array. The function must meet three requirements:

1. Takes a key and returns an int
2. Always returns the same int for the same key
3. Always returns a valid index in the array (not negative and less than the array size)
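
The three requirements above can be checked directly. The following is a minimal sketch, assuming a standalone function that takes the array size as a parameter (the `size` argument and the key `"apple"` are illustrative choices, not part of the class shown later):

```python
def key_to_index(key: str, size: int) -> int:
    """Toy hash: sum the code points of the key, then wrap into [0, size)."""
    total = 0
    for ch in key:
        total += ord(ch)
    return total % size


size = 8  # assumed array size, for illustration only
index = key_to_index("apple", size)

# Requirement 1: the result is an int
assert isinstance(index, int)
# Requirement 2: the same key always gives the same index
assert index == key_to_index("apple", size)
# Requirement 3: the index is valid for an array of this size
assert 0 <= index < size

print(index)  # 2, because the code points of "apple" sum to 530 and 530 % 8 == 2
```

The modulo step is what guarantees the third requirement: whatever the sum is, the remainder of dividing by `size` always falls between `0` and `size - 1`.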

### Example of a **hash function**



In [1]:
class HashMap:
    def key_to_index(self, key):
        sum = 0
        for letter in key:
            # ord() returns the Unicode code point of the character
            sum += ord(letter)
        # The sum may be greater than the hashmap size, so to get a valid index we use the
        # modulo (remainder of the division), which always falls in the range of the hashmap.
        # The same input always produces the same index.
        return sum % len(self.hashmap)

    def __init__(self, size):
        self.hashmap = [None for i in range(size)]

    def __repr__(self):
        buckets = []
        for v in self.hashmap:
            if v is not None:
                buckets.append(v)
        return str(buckets)

### **`insert()`** method

In [None]:
def insert(self, key, value):
    index = self.key_to_index(key)
    self.hashmap[index] = (key, value)

### **`get()`** method

In [None]:
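def get(self, key):
    index = self.key_to_index(key)
    entry = self.hashmap[index]
    # An empty slot, or a slot holding a different key, means the key is absent
    if entry is None or entry[0] != key:
        raise Exception("sorry, key not found")
    return entry[1]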

___
## Resizing

With the previous implementation, the hashmap will have many collisions. Its size is fixed, so as more keys are inserted, **`key_to_index()`** increasingly returns the same index for different keys.
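
A collision is easy to provoke with this hash function, because it only sums character codes and ignores their order. The sketch below assumes a standalone `key_to_index` that takes the size as a parameter and a plain list named `slots` standing in for the hashmap; both are illustrative, as are the example keys:

```python
def key_to_index(key: str, size: int) -> int:
    """Same toy hash as above: sum of code points, modulo the size."""
    return sum(ord(ch) for ch in key) % size


size = 16  # assumed size, for illustration only
slots = [None] * size

# Anagrams contain the same characters, so their sums (and indexes) are identical
a = key_to_index("listen", size)
b = key_to_index("silent", size)
assert a == b

# Without collision handling, the second insert silently replaces the first
slots[a] = ("listen", 1)
slots[b] = ("silent", 2)
print(slots[a])  # ('silent', 2)
```

Anagrams collide at any size. With a fixed size, unrelated keys also start to collide once there are more keys than slots.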

To solve this, the hashmap should increase the number of slots depending on the number of keys that are inserted. This can be accomplished by using a **`resize()`** method.

When **`resizing`**, we create a new hashmap with a larger number of slots, and then we re-insert all the **`key-value`** pairs from the old hashmap into the new one.

In [1]:
class HashMap:
    def __init__(self, size):
        self.hashmap = [None for i in range(size)]

    def insert(self, key, value):
        self.resize()
        index = self.key_to_index(key)
        self.hashmap[index] = (key, value)
        
    def key_to_index(self, key):
        sum = 0
        for letter in key:
            sum += ord(letter)
        return sum % len(self.hashmap)

    def resize(self):
        if len(self.hashmap) == 0:
            self.hashmap = [None]
            return
        current_load = self.current_load()
        #using a very low value (5%) for test
        if current_load < .05:
            return
        else:
            current_hashmap = self.hashmap
            #creates a new hashmap with the size of the previous one multiplied by 10
            self.hashmap = [None for i in range(len(current_hashmap)*10)]
            
            for key_value in current_hashmap:
                if key_value is not None:
                    index = self.key_to_index(key_value[0])
                    self.hashmap[index] = (key_value[0], key_value[1])
        
    def current_load(self):
        number_of_filled_buckets = 0
        for bucket in self.hashmap:
            if bucket is not None:
                number_of_filled_buckets += 1
        if len(self.hashmap) == 0:
            return 1
        else:
            return number_of_filled_buckets / len(self.hashmap)
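
To see what a resize does to the load and to the stored indexes, here is a small standalone sketch. It assumes free functions instead of methods; the names `load_factor` and `grow` are hypothetical, and the tenfold growth simply mirrors the code above:

```python
def key_to_index(key, size):
    """Same toy hash: sum of code points, modulo the size."""
    return sum(ord(ch) for ch in key) % size


def load_factor(slots):
    """Fraction of slots that are filled; an empty list counts as fully loaded."""
    if not slots:
        return 1.0
    filled = sum(1 for entry in slots if entry is not None)
    return filled / len(slots)


def grow(slots, factor=10):
    """Return a list `factor` times larger with every pair re-inserted."""
    new_slots = [None] * (len(slots) * factor)
    for entry in slots:
        if entry is not None:
            key, value = entry
            # The index must be recomputed because it depends on the new length
            new_slots[key_to_index(key, len(new_slots))] = (key, value)
    return new_slots


slots = [None] * 4
slots[key_to_index("ab", 4)] = ("ab", 1)
print(load_factor(slots))              # 0.25
slots = grow(slots)
print(len(slots), load_factor(slots))  # 40 0.025
```

Re-inserting is required because the index depends on the list length: `"ab"` sits at index 3 in a list of 4 slots but at index 35 in a list of 40. Two keys can still land on the same index after growing, so resizing reduces collisions without eliminating them.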

___
## Linear probing

The previous hashmap implementations do not handle collisions: when **`key_to_index()`** returns an index that is already occupied, the new **`key-value`** pair overwrites the old one.

To solve this, **`linear probing`** finds the next available slot after the collision index and places the new **`key-value`** pair there.
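
The order in which slots are examined can be sketched on its own. This is a minimal illustration, assuming hypothetical helper names (`probe_sequence`, `find_slot`) and a plain list of slots; it shows the idea rather than the exact code used in this notebook:

```python
def probe_sequence(start, size):
    """Yield each slot index exactly once, starting at `start` and wrapping around."""
    for offset in range(size):
        yield (start + offset) % size


def find_slot(slots, key, start):
    """Return the first index that is empty or already holds `key`."""
    for index in probe_sequence(start, len(slots)):
        entry = slots[index]
        if entry is None or entry[0] == key:
            return index
    # Every slot was visited and all of them hold other keys
    raise Exception("hashmap is full")


slots = [None, ("a", 1), ("b", 2), None]

print(list(probe_sequence(2, 4)))  # [2, 3, 0, 1]
print(find_slot(slots, "c", 1))    # 3 (slots 1 and 2 are taken by other keys)
print(find_slot(slots, "b", 1))    # 2 (the key already lives there)
```

Lookups must follow the same sequence as insertions: a key that was pushed forward by a collision can only be found by walking from its hashed index until the key or an empty slot appears.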

___

## **Final code**

In [2]:
class HashMap:
    def insert(self, key, value):
        index = self.key_to_index(key)
        original_index = index
        first_iteration = True

        while self.hashmap[index] is not None and self.hashmap[index][0] != key:
            # This happens when the index has covered the whole hashmap and returned to the beginning
            if first_iteration is False and index == original_index:
                raise Exception("hashmap is full")
            index += 1
            if index == len(self.hashmap):
                # The index wraps around to the beginning of the hashmap to keep looking for an available slot
                index = 0
            first_iteration = False
        self.hashmap[index] = (key, value)

    def get(self, key):
        index = self.key_to_index(key)
        original_index = index
        first_iteration = True

        while self.hashmap[index] is not None:
            if self.hashmap[index][0] == key:
                return self.hashmap[index][1]
            # A full lap around the hashmap without a match means the key is absent
            if first_iteration is False and index == original_index:
                raise Exception("sorry, key not found")
            # Move to the next slot, wrapping around at the end of the hashmap
            index += 1
            if index == len(self.hashmap):
                index = 0
            first_iteration = False
        raise Exception("sorry, key not found")

    def __init__(self, size):
        self.hashmap = [None for i in range(size)]

    def key_to_index(self, key):
        sum = 0
        for c in key:
            sum += ord(c)
        return sum % len(self.hashmap)

    def __repr__(self):
        final = ""
        for v in self.hashmap:
            if v is not None:
                final += f" - {str(v)}\n"
        return final

In [3]:
run_cases = [
    (
        2,
        [
            ("Billy Beane", "General Manager"),
            ("Peter Brand", "Assistant GM"),
        ],
        [(False, None), (False, None)],
    ),
    (
        3,
        [
            ("Art Howe", "Manager"),
            ("Ron Washington", "Coach"),
            ("David Justice", "Designated Hitter"),
        ],
        [(False, None), (False, None), (False, None)],
    ),
]

submit_cases = run_cases + [
    (
        2,
        [
            ("Paul DePodesta", "Analyst"),
            ("Ron Washington", "Coach"),
            ("Chad Bradford", "Pitcher"),
        ],
        [
            (False, None),
            (False, None),
            (True, "hashmap is full"),
        ],
    )
]


def test(size, items, errors):
    hm = HashMap(size)
    print("=====================================")
    inserted_items = {}
    for (key, val), (error_expected, expected_error_message) in zip(items, errors):
        print(f"Inserting ({key}, {val})...")
        try:
            hm.insert(key, val)
            if error_expected:
                print(
                    f"Expected error '{expected_error_message}' but insertion succeeded."
                )
                print("Fail")
                return False
            else:
                inserted_items[key] = val
        except Exception as e:
            if error_expected:
                if str(e) == expected_error_message:
                    print(f"Expected error occurred: {e}")
                else:
                    print(
                        f"Error occurred, but message '{e}' does not match expected '{expected_error_message}'."
                    )
                    print("Fail")
                    return False
            else:
                print(f"Unexpected error occurred during insertion: {e}")
                print("Fail")
                return False
    for key, expected_val in inserted_items.items():
        print(f"Getting {key}...")
        try:
            actual_val = hm.get(key)
            print(f"Expected: {expected_val}, Actual: {actual_val}")
            if actual_val != expected_val:
                print("Fail")
                return False
        except Exception as e:
            print(f"Error getting {key}: {e}")
            print("Fail")
            return False
    print("Pass")
    return True


def main():
    passed = 0
    failed = 0
    skipped = len(submit_cases) - len(test_cases)
    for test_case in test_cases:
        correct = test(*test_case)
        if correct:
            passed += 1
        else:
            failed += 1
    if failed == 0:
        print("============= PASS ==============")
    else:
        print("============= FAIL ==============")
    if skipped > 0:
        print(f"{passed} passed, {failed} failed, {skipped} skipped")
    else:
        print(f"{passed} passed, {failed} failed")


test_cases = submit_cases
if "__RUN__" in globals():
    test_cases = run_cases

main()


Inserting (Billy Beane, General Manager)...
Inserting (Peter Brand, Assistant GM)...
Getting Billy Beane...
Expected: General Manager, Actual: General Manager
Getting Peter Brand...
Expected: Assistant GM, Actual: Assistant GM
Pass
Inserting (Art Howe, Manager)...
Inserting (Ron Washington, Coach)...
Inserting (David Justice, Designated Hitter)...
Getting Art Howe...
Expected: Manager, Actual: Manager
Getting Ron Washington...
Expected: Coach, Actual: Coach
Getting David Justice...
Expected: Designated Hitter, Actual: Designated Hitter
Pass
Inserting (Paul DePodesta, Analyst)...
Inserting (Ron Washington, Coach)...
Inserting (Chad Bradford, Pitcher)...
Expected error occurred: hashmap is full
Getting Paul DePodesta...
Expected: Analyst, Actual: Analyst
Getting Ron Washington...
Expected: Coach, Actual: Coach
Pass
3 passed, 0 failed
