#### Hash Tables

A hash table is a data structure that maps keys to values for highly efficient lookup. In a good implementation, lookup takes **O(1)** time on average, but it can degrade to **O(N)** in the worst case, when many keys collide.
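
To make the gap between the two bounds concrete, the sketch below places the same keys into buckets under two hash functions and reports the longest chain a lookup might have to scan. The names `spread_hash`, `constant_hash`, and `longest_chain`, as well as the bucket count of 1024, are illustrative assumptions rather than part of any library.

```python
def spread_hash(key: str) -> int:
    """Polynomial rolling hash: an illustrative choice that spreads keys out."""
    h = 0
    for ch in key:
        h = (h * 31 + ord(ch)) % 1_000_003
    return h


def constant_hash(key: str) -> int:
    """Deliberately bad hash: every key gets the same hash code."""
    return 7


def longest_chain(keys, hash_fn, num_buckets: int) -> int:
    """Place keys into buckets and return the length of the longest chain."""
    buckets = [[] for _ in range(num_buckets)]
    for key in keys:
        buckets[hash_fn(key) % num_buckets].append(key)
    return max(len(bucket) for bucket in buckets)


keys = [f"key{i}" for i in range(1000)]
print(longest_chain(keys, spread_hash, 1024))    # a short chain
print(longest_chain(keys, constant_hash, 1024))  # 1000: all keys share one chain
```

With the constant hash, every lookup has to walk a single chain holding all N keys, which is the O(N) worst case described above.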

A simple implementation of a hash table can involve an array of linked lists and a hash code function, as follows:

1. Compute the key's hash code (usually an int or a long) using the **hash function**. Different keys can share a hash code, because there are infinitely many possible keys but only finitely many ints.
2. Map the hash code to an index in the array, for example with `hash(key) % array_length`. Different hash codes can map to the same index.
3. At this index, there is a linked list of keys and values. We use a linked list to account for collisions.
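
The three steps above can be traced by hand with a toy hash function. This is a minimal sketch: `toy_hash_code` (a sum of character codes), `bucket_index`, and the array length of 10 are assumptions chosen so that both kinds of collision are easy to see, and plain Python lists stand in for the linked lists.

```python
NUM_BUCKETS = 10  # assumed array length


def toy_hash_code(key: str) -> int:
    """Step 1: a deliberately simple hash code (sum of character codes)."""
    return sum(ord(ch) for ch in key)


def bucket_index(key: str) -> int:
    """Step 2: map the hash code to an array index."""
    return toy_hash_code(key) % NUM_BUCKETS


# Different keys, same hash code: anagrams always collide under this hash.
print(toy_hash_code("ab"), toy_hash_code("ba"))  # 195 195

# Different hash codes, same index: 195 % 10 == 205 % 10 == 5.
print(toy_hash_code("al"), bucket_index("al"))   # 205 5

# Step 3: each index holds a chain of (key, value) pairs.
buckets = [[] for _ in range(NUM_BUCKETS)]
for key, val in [("ab", 1), ("ba", 2), ("al", 3)]:
    buckets[bucket_index(key)].append((key, val))
print(buckets[5])  # [('ab', 1), ('ba', 2), ('al', 3)]
```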

<img src="assets/hash-table.png" width="400">

An implementation of a hash table is shown below.

In [77]:
class HashTable:
    def __init__(self):
        self.ARRAY_LENGTH = 10
        self.array = [None] * self.ARRAY_LENGTH

    def add(self, key: str, val: float):
        idx = self.__hash_function(key)
        if self.array[idx] is None:
            self.array[idx] = LinkedList()
        self.array[idx].insert(key, val) 
    
    def get(self, key: str):
        idx = self.__hash_function(key)
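        bucket = self.array[idx]
        # An empty bucket has no linked list yet, so the key cannot be present
        val = None if bucket is None else bucket.find(key)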
        
        if val is None:
            raise KeyError("Key was not found!")
        else:
            return val
    
    def __hash_function(self, string: str) -> int: 
        hash_val = 0
        for i, ch in enumerate(string):
            hash_val += (i + len(string)) ** ord(ch)
        # Perform modulus to stay in range of max length
        return hash_val % self.ARRAY_LENGTH
    
    def __str__(self):
        s = ""
        for i in range(len(self.array)):
            s += f"{i}: {self.array[i]}\n"
        return s

class LinkedList:
    def __init__(self):
        self.head = None
        self.curr = None
    
    def insert(self, key: str, val: float):
        if self.head is None:
            self.head = self.Node(key, val)
            self.curr = self.head
        else:
            self.curr.next = self.Node(key, val)
            self.curr = self.curr.next
    
    def find(self, key: str):
        if self.head is None:
            return self.head
        
        curr = self.head
        while True:
            if curr.key == key:
                return curr.val
            if curr.next is None:
                return curr.next
            curr = curr.next
    
    def __str__(self):
        s = ""
        if self.head is None:
            return s
        
        curr = self.head
        while True:
            s += f"({curr.key}, {curr.val})"
            
            if curr.next is None:
                return s
            
            s += ", "
            curr = curr.next
    
    class Node: 
        def __init__(self, key: str, val: float):
            self.key = key
            self.val = val
            self.next: "LinkedList.Node | None" = None

In [84]:
import requests
import random

hash_table = HashTable()

content = requests.get("https://www.mit.edu/~ecprice/wordlist.10000").content
words = content.splitlines()

for i, word in enumerate(words[:100]):
    # Convert from byte to string and store in hash table
    hash_table.add(word.decode("utf-8"), i)

print(hash_table)
print(hash_table.get("accountability"))

0: (ability, 9), (absolute, 20), (accommodate, 48), (accounts, 61), (achievement, 73), (acquisition, 85), (acrylic, 91)
1: (a, 0), (ab, 4), (abandoned, 5), (aboriginal, 11), (accent, 32), (accept, 33), (access, 39), (accessible, 42), (accessory, 45), (accidents, 47), (accommodation, 49), (accomplish, 53), (accounting, 60), (accurately, 66)
2: (aaa, 2), (able, 10), (absence, 18), (academic, 28), (acc, 31), (accepts, 38), (accessed, 40), (accompanied, 51), (accordingly, 57), (accused, 67), (ace, 69), (acm, 80), (acne, 81), (acre, 87), (act, 92), (actions, 95)
3: (aaron, 3), (about, 13), (absorption, 22), (abstracts, 24), (acceptable, 34), (accepting, 37), (accessibility, 41), (accordance, 55), (accredited, 63), (acdbentity, 68), (acids, 77), (across, 90)
4: (abc, 6), (abs, 17), (abu, 25), (academy, 30), (accepted, 36), (accomplished, 54), (account, 58), (accuracy, 64), (acer, 70), (acknowledge, 78), (acrobat, 89)
5: (aa, 1), (absolutely, 21), (achieving, 75), (acting, 93), (action, 94), 

#### ArrayList & Resizable Arrays

In some languages, when you need an array-like data structure that resizes dynamically, you use an ArrayList. An ArrayList resizes itself as needed while still providing **O(1)** access. Each resize takes O(n) time, but resizes happen so rarely that the **amortized** insertion time is still O(1).

Typically, when the array is full, the array doubles in size (although in some languages like Java the size might increase by 50% or another value).
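
The amortized claim can be checked by counting element copies. The sketch below simulates appends into an array that doubles when full; the starting capacity of 1 and the helper name `count_copies` are assumptions made for illustration.

```python
def count_copies(num_appends: int, initial_capacity: int = 1) -> int:
    """Return how many element copies doubling causes over num_appends appends."""
    capacity = initial_capacity
    size = 0
    copies = 0
    for _ in range(num_appends):
        if size == capacity:
            copies += size  # a resize copies every existing element
            capacity *= 2
        size += 1
    return copies


for n in (10, 1_000, 1_000_000):
    copies = count_copies(n)
    # The copies form the series 1 + 2 + 4 + ..., which stays below 2n.
    print(n, copies, copies / n)
```

Because the total number of copies stays below 2n, the average cost per append is bounded by a constant, even though an individual resize is O(n).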

An implementation of an ArrayList is shown below.


In [9]:
class ArrayList:
    def __init__(self):
        self.MAX_LENGTH = 10
        self.size = 0
        self._array = [None] * self.MAX_LENGTH
    
    def add(self, item):
        if self.size >= self.MAX_LENGTH:
            self.__grow()
        self._array[self.size] = item
        self.size += 1
    
    def get(self, index):
        if index >= self.size or index < 0:
            raise IndexError("List index out of range")
        # Index the backing array directly; the array property copies a slice
        return self._array[index]
    
    def delete(self, index):
        if index >= self.size or index < 0:
            raise IndexError("List index out of range")
        
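        # Shift every later item one slot left, stopping before the last item
        # so that i + 1 never runs past the end of the backing array
        for i in range(index, self.size - 1):
            self._array[i] = self._array[i+1]
        
        self.size -= 1
        self._array[self.size] = None  # clear the vacated slot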
    
    def __grow(self):
        self.MAX_LENGTH *= 2
        new_array = [None] * self.MAX_LENGTH
        for i, item in enumerate(self._array):
            new_array[i] = item
        self._array = new_array
        print(f"New array length is {len(self._array)}")
    
    @property
    def array(self):
        """Getter for array."""
        return self._array[:self.size]

In [10]:
array_list = ArrayList()

for i in range(90):
    array_list.add(i)

print(array_list.get(0), array_list.get(89))
print(array_list.get(90))

New array length is 20
New array length is 40
New array length is 80
New array length is 160
0 89


IndexError: List index out of range

In [11]:
print(array_list.size)
array_list.delete(3)
print(array_list.size)
print(array_list.array)

90
89
[0, 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89]


#### StringBuilder

Assuming all n strings have the same length x, concatenating them takes O($xn^{2}$) time using the algorithm below:

```
String joinWords(String[] words) {
    String sentence = "";
    for (String w : words) {
        sentence = sentence + w;
    }
    return sentence;
}
```

On each concatenation, a new string is created and both operands are copied into it character by character. The total number of characters copied is x + 2x + 3x + ... + nx = x · n(n + 1)/2, which is O($xn^{2}$).
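
The sum can be checked numerically. This is a rough sketch that assumes each concatenation copies every character of the resulting string; `chars_copied` is a hypothetical helper that models the cost, not a measurement of any real runtime.

```python
def chars_copied(n: int, x: int) -> int:
    """Characters copied when n strings of length x are joined one at a time."""
    total = 0
    length = 0
    for _ in range(n):
        length += x      # the new string holds everything so far plus one word
        total += length  # building it copies all of those characters
    return total


n, x = 1000, 5
print(chars_copied(n, x))    # 2502500
print(x * n * (n + 1) // 2)  # 2502500, the closed form x * n(n + 1) / 2
```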

Alternatively, a StringBuilder creates a resizable array of all the strings, and copies them back to a string **only when necessary**.

```
String joinWords(String[] words) {
    StringBuilder sentence = new StringBuilder();
    for (String w : words) {
        sentence.append(w);
    }
    return sentence.toString();
}
```
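
The same idea carries over to Python, where strings are also immutable. The class below is a minimal sketch rather than a port of Java's `StringBuilder`: the names `StringBuilder`, `append`, `to_string`, and `join_words` are assumptions, and the pieces are held in a Python list (itself a resizable array) until `str.join` builds the result in one pass.

```python
class StringBuilder:
    """Collects string pieces and joins them only when asked."""

    def __init__(self):
        self._parts = []  # list appends are amortized O(1)

    def append(self, text: str) -> "StringBuilder":
        self._parts.append(text)
        return self  # returning self allows chained appends

    def to_string(self) -> str:
        # One pass over all pieces: time proportional to the total length
        return "".join(self._parts)


def join_words(words):
    sentence = StringBuilder()
    for w in words:
        sentence.append(w)
    return sentence.to_string()


print(join_words(["hash", "table"]))  # hashtable
```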

A StringBuilder is basically an ArrayList for strings with a toString() method for concatenating the