# Hashtables

## Agenda

1. The **Map** ADT
2. Direct lookups via *Hashing*
3. Hashtables
4. `class` HashTable

## The **Map** ADT

We will focus next on the "*map*" abstract data type (aka "associative array" or "dictionary"), which is used to associate keys (which must be unique) with values. 

A map *does not* intrinsically impose any ordering on its contents --- i.e., an implementation of a map does not need to support positional access to keys, nor report a consistent view of key order.

Python's `dict` type is an implementation of the map ADT. 
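
As a quick illustration of the map operations described above, here is a minimal sketch using `dict`. The `phone_book` name and its contents are made up for the example.

```python
# A dict used as a map: each unique key is associated with one value.
phone_book = {}
phone_book["alice"] = "555-0100"   # insert a new key/value pair
phone_book["bob"] = "555-0199"
phone_book["alice"] = "555-0123"   # same key: the old value is replaced

number = phone_book["alice"]       # look up a value by its key
has_carol = "carol" in phone_book  # membership test on keys
del phone_book["bob"]              # remove a key and its value

print(number, has_carol, phone_book)
```

Note that assigning to an existing key replaces its value rather than adding a second entry, which is how key uniqueness is preserved.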

## Direct lookups via *Hashing*

Hashes (a.k.a. hash codes or hash values) are simply numerical values computed for objects.

In [None]:
hash('hello')

In [None]:
hash('batman')

In [None]:
hash('batmen') 

In [None]:
[hash(s) for s in ['different', 'objects', 'have', 'very', 'different', 'hashes']]

In [None]:
[hash(s)%100 for s in ['different', 'objects', 'have', 'very', 'different', 'hashes']]

### Random Hashing

The `hash` function in Python is *randomized* by default for strings and bytes -- i.e., each time a Python interpreter is fired up, it picks a different random "seed" that is mixed into the hashes of `str` and `bytes` objects (hashes of integers are not randomized). While hashcodes computed for a given value will be consistent within a given interpreter instance, they will not be consistent across instances! This means we shouldn't save hashcodes to disk or to a database, as the same values will almost certainly hash to different hashcodes after we restart our software!
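
If a hashcode does need to survive a restart, one possible approach is to derive it from a cryptographic digest instead of the built-in `hash`. The sketch below uses the standard `hashlib` module; the `stable_hash` helper is a hypothetical name introduced for this example, not something defined elsewhere in these notes.

```python
import hashlib

def stable_hash(text):
    """Return an integer hash of a string that is the same in every interpreter run."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as a 64-bit unsigned integer.
    return int.from_bytes(digest[:8], "big")

# Unlike hash('hello'), this value does not depend on the interpreter's hash seed.
stable_bucket = stable_hash("hello") % 100
print(stable_hash("hello"), stable_bucket)
```

This is slower than the built-in `hash`, so it is only worth using when hashcodes must be stored or shared between processes.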

## Hashtables

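A **hashtable** is an implementation of the map ADT that uses the hash of a key to compute an index into an array of *buckets*; the corresponding key/value pair is stored in the bucket at that index. Because different keys can map to the same index (a *collision*), each bucket holds a collection of entries, such as a list or a linked list.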

In [None]:
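# Create the hashtable as a list of 10 buckets (each bucket is itself a list).
HashTable = [[] for _ in range(10)]

# Hashing function: map an integer key to a bucket index.
def __hash__(key):
    return key % len(HashTable)

# Append a value to the bucket selected by the key.
# Note: only the value is stored, not the key.
def __setitem__(table, key, value):
    hash_key = __hash__(key)
    table[hash_key].append(value)

# Return the whole bucket selected by the key (all values that landed there).
def __getitem__(table, key):
    hash_key = __hash__(key)
    return table[hash_key]

# Remove a value from the bucket selected by the key.
def __delitem__(table, key, value):
    hash_key = __hash__(key)
    table[hash_key].remove(value)

# Print each bucket index followed by the values chained in that bucket.
def __display__(table):
    for i in range(len(table)):
        print(i, end=" ")
        for j in table[i]:
            print("-->", end=" ")
            print(j, end=" ")
        print()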

In [None]:
__setitem__(HashTable, 10, 'New York')
__setitem__(HashTable, 25, 'Chicago')
__setitem__(HashTable, 20, 'Boston')
__setitem__(HashTable, 9, 'Los Angeles')
__setitem__(HashTable, 21, 'Miami')
__setitem__(HashTable, 21, 'Austin')
  
__display__(HashTable)

In [None]:
__delitem__(HashTable, 10, 'New York')
__display__(HashTable)

In [None]:
__getitem__(HashTable, 1)
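
The cells above store only values, so a lookup returns everything in a bucket. A minimal sketch of one way to tell colliding keys apart is to store `(key, value)` pairs in each bucket. The names `buckets`, `bucket_for`, `put`, `get`, and `remove` are hypothetical helpers for this example only.

```python
buckets = [[] for _ in range(10)]

def bucket_for(table, key):
    """Return the bucket (a list of (key, value) pairs) that the key maps to."""
    return table[hash(key) % len(table)]

def put(table, key, value):
    bucket = bucket_for(table, key)
    for i, (k, _) in enumerate(bucket):
        if k == key:                  # key already present: replace its value
            bucket[i] = (key, value)
            return
    bucket.append((key, value))       # new key: add a pair to the bucket

def get(table, key):
    for k, v in bucket_for(table, key):
        if k == key:
            return v
    raise KeyError(key)

def remove(table, key):
    bucket = bucket_for(table, key)
    for i, (k, _) in enumerate(bucket):
        if k == key:
            del bucket[i]
            return
    raise KeyError(key)

put(buckets, 21, 'Miami')
put(buckets, 31, 'Austin')    # 21 and 31 both map to bucket 1 (a collision)
put(buckets, 21, 'Orlando')   # same key again: replaces 'Miami'
print(get(buckets, 21), get(buckets, 31))
```

Comparing keys inside the bucket is what lets two colliding keys coexist while still keeping each key unique.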

## Addendum: On *Hashability*

Remember: *a given object must always hash to the same value*. This is required so that we can always map the object to the same hash bucket.

Hashcodes for collections of objects are usually computed from the hashcodes of their contents, e.g., the hash of a tuple is a function of the hashes of the objects in said tuple:

In [None]:
hash(('two', 'strings'))

This is useful. It allows us to use a tuple, for instance, as a key for a hashtable.

However, if the collection of objects is *mutable* (i.e., we can alter its contents), this means that we can potentially change its hashcode.

If we were to use such a collection as a key in a hashtable, and alter the collection after it's been assigned to a particular bucket, this leads to a serious problem: the collection may now be in the wrong bucket (as it was assigned to a bucket based on its original hashcode)!
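
The problem can be reproduced with a small sketch. `MutableKey` is a hypothetical class written for this example: it is deliberately both mutable and hashable, with a hash that depends on its contents.

```python
class MutableKey:
    """Hypothetical key type whose hash depends on its (mutable) contents."""
    def __init__(self, items):
        self.items = list(items)

    def __hash__(self):
        return hash(tuple(self.items))

    def __eq__(self, other):
        return isinstance(other, MutableKey) and self.items == other.items

key = MutableKey([1, 2])
d = {key: "payload"}
found_before = MutableKey([1, 2]) in d           # an equal key finds the entry

key.items.append(3)                              # mutate the key while it is stored

# The entry was placed using the old hash but now compares equal only to the
# new contents, so neither an "old" nor a "new" equal key can find it.
found_by_old_contents = MutableKey([1, 2]) in d
found_by_new_contents = MutableKey([1, 2, 3]) in d

print(found_before, found_by_old_contents, found_by_new_contents, len(d))
```

The dictionary still holds one entry, but it can no longer be reached through a normal lookup by an equal key.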

For this reason, Python's built-in mutable containers are not hashable. So while we can use integers, strings, and tuples (provided their contents are hashable) as keys in dictionaries, lists (which are mutable) cannot be used. Indeed, Python marks built-in mutable types as "unhashable", e.g.,

In [None]:
hash([1, 2, 3])
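
The call above raises a `TypeError`. As a small sketch, the check can be wrapped in a helper to compare a few built-in types; `is_hashable` is a hypothetical name used only here.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

results = {
    "list": is_hashable([1, 2, 3]),
    "tuple of ints": is_hashable((1, 2, 3)),
    "tuple containing a list": is_hashable((1, [2, 3])),
    "frozenset": is_hashable(frozenset({1, 2, 3})),
}
print(results)

# A common workaround: take an immutable snapshot of a list to use as a key.
coords = [1, 2, 3]
lookup = {tuple(coords): "stored under an immutable snapshot"}
```

Note that a tuple is only hashable if everything inside it is hashable too.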

## `class` HashTable

In [None]:
class HashTable:

    class Node:
        """A singly linked list node holding one key/value pair."""
        def __init__(self, key, value):
            self.key = key
            self.value = value
            self.next = None

    def __init__(self, n_buckets):
        self.n_buckets = n_buckets
        self.count = 0                    # number of key/value pairs stored
        self.table = [None] * n_buckets   # each slot is the head of a chain, or None

    def _bucket_index(self, key):
        # Not named __hash__: that name is reserved for hashing the table itself.
        return hash(key) % self.n_buckets

    def __setitem__(self, key, value):
        index = self._bucket_index(key)
        # If the key is already in the chain, update its value in place.
        current = self.table[index]
        while current:
            if current.key == key:
                current.value = value
                return
            current = current.next
        # Otherwise, insert a new node at the head of the chain.
        new_node = HashTable.Node(key, value)
        new_node.next = self.table[index]
        self.table[index] = new_node
        self.count += 1

    def __getitem__(self, key):
        # Walk the chain and return the value stored under this exact key.
        current = self.table[self._bucket_index(key)]
        while current:
            if current.key == key:
                return current.value
            current = current.next
        raise KeyError(key)

    def __delitem__(self, key):
        index = self._bucket_index(key)
        previous = None
        current = self.table[index]
        while current:
            if current.key == key:
                if previous:
                    previous.next = current.next       # unlink from mid-chain
                else:
                    self.table[index] = current.next   # unlink the head node
                self.count -= 1
                return
            previous = current
            current = current.next
        raise KeyError(key)

    def __len__(self):
        return self.count

    def __display__(self):
        # Print each bucket index followed by the values in its chain.
        for i in range(self.n_buckets):
            print(i, end=" ")
            current = self.table[i]
            while current:
                print("-->", end=" ")
                print(current.value, end=" ")
                current = current.next
            print()

In [None]:
ht = HashTable(5)

# Populate the hash table (subscript assignment calls __setitem__)
ht[3] = "apple"
ht[2] = "banana"
ht[5] = "cherry"
ht[3] = "grape"       # key 3 already exists, so its value is replaced
ht[1] = "watermelon"
ht[6] = "peach"
ht[9] = "avocado"

# Display the hash table
ht.__display__()

In [None]:
# Look up key 1 (subscripting calls __getitem__)
print(ht[1])

# Insert a new key (4 was not present before) and read it back
ht[4] = "apple"
print(ht[4])

# Delete an entry (del calls __delitem__)
del ht[3]

# Display the hash table
ht.__display__()

In [None]:
ht = HashTable(5)

# Populate the hash table, this time with string keys
ht["apple"] = 3
ht["banana"] = 2
ht["cherry"] = 5
ht["grape"] = 6
ht["watermelon"] = 8
ht["peach"] = 1
ht["cherry"] = 9      # key "cherry" already exists, so its value is replaced

# Display the hash table
ht.__display__()

In [None]:
print(hash("apple"))
print(hash("banana"))
print(hash("cherry"))
print(hash("grape"))
print(hash("watermelon"))
print(hash("peach"))
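
To connect these hashcodes to the table displayed earlier, the sketch below computes the bucket index each key would map to in a 5-bucket table, assuming the index is `hash(key) % n_buckets` as in the class above. Because string hashing is randomized, the indices will generally change between interpreter runs.

```python
n_buckets = 5
keys = ["apple", "banana", "cherry", "grape", "watermelon", "peach"]

# Map each key to the bucket index it falls into.
bucket_of = {key: hash(key) % n_buckets for key in keys}

for key, index in bucket_of.items():
    print(f"{key!r} -> bucket {index}")
```

Python's `%` operator returns a non-negative result for a positive modulus, so even negative hashcodes produce a valid index.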