
# 09 Hashing


## Plan for the Lecture 

* Introduction to Hashing and ASCII

* Hashing algorithms

* Collisions

In [3]:
ord('A')

65

In [2]:
ord('a')

97

In [19]:
name = 'Nick'
total = 0  # avoid the name 'sum', which would shadow the built-in sum()
for char in name: 
    print(char, ":", ord(char))
    total += ord(char)
print("ASCII sum for", name, "=", total)

N : 78
i : 105
c : 99
k : 107
ASCII sum for Nick = 389


In [20]:
name = "nick"
total = 0  # avoid the name 'sum', which would shadow the built-in sum()
for char in name: 
    print(char, ":", ord(char))
    total += ord(char)
print("ASCII sum for", name, "=", total)

n : 110
i : 105
c : 99
k : 107
ASCII sum for nick = 421


In [21]:
"Nick" == "nick"

False

In [18]:
"Nick" == "Nick"

True

## Python dictionaries (`dict`) use hashing

* Dictionaries use hashing for their keys to allow efficient lookups, but the values themselves are stored as they are. 

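* Python computes the hash of each key internally. The built-in `hash()` function shows the value for a given object, but the dictionary's underlying hash table is not exposed directly. String hashes are randomised per Python process by default, so the values shown below will differ between sessions.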

In [22]:
d = {"Nick" : 56, "Sam": 67, "Lucy": 61, "Tino": 71}
d

{'Nick': 56, 'Sam': 67, 'Lucy': 61, 'Tino': 71}

In [31]:
d["Nick"]

56

In [42]:
hash("Nick")

-8040880066191665653

In [36]:
hashes = {key: hash(key) for key in d}
print(hashes)

{'Nick': -8040880066191665653, 'Sam': -7729753638879594802, 'Lucy': 6614073294121365487, 'Tino': -8816718471250138643}


In [38]:
d[8040880066191665653]

KeyError: 8040880066191665653
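
The cells above rely on a few properties that can be checked directly. The following is a minimal sketch; the dictionary `ages` and its contents are made up for illustration and are not part of the lecture data.

```python
# Illustrative dictionary; names and values are made up for this sketch.
ages = {"Nick": 56, "Sam": 67}

# Equal strings hash to the same value within a single Python session.
# (The number itself differs between sessions because string hashes
# are randomised per process.)
same_hash = hash("Nick") == hash("Nick")
print(same_hash)

# A hash value is not a key: lookups always go through the key itself.
hash_is_key = hash("Nick") in ages
print(hash_is_key)

# Mutable objects such as lists are unhashable and cannot be used as keys.
try:
    ages[["Nick", "Smith"]] = 40
    list_key_error = None
except TypeError as err:
    list_key_error = type(err).__name__
print(list_key_error)
```

This is why `d[...]` with a hash value raises `KeyError`: the dictionary compares the object passed in against its stored keys, and an integer never equals the string `"Nick"`.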

# Hash Maps

* Hash maps let users look up values by a key, such as a string, instead of by an integer index position. 

* A HashMap is similar to a dictionary: it stores key and value pairs. Building our own HashMap class lets us manage the 'hashing' function ourselves. 

* The key (a string) is converted to an integer by a process of 'hashing'. The hashing function looks up the ASCII value of each character. 

* One scheme multiplies each value by the character's position in the string before adding the results together, so that the same characters in a different order give a different total. The total can be large and is still not guaranteed to be unique. The class below uses the simpler plain sum of the byte values. 

* To avoid allocating an array with that many elements, the modulo operator (`%`) reduces the hash code to an index that fits within the array size.
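
The steps in the list above can be sketched as three small functions. The names `ascii_sum`, `weighted_hash` and `compress` are chosen for this illustration only, and the position weighting shown is one simple possibility rather than a recommended hash function.

```python
def ascii_sum(key):
    """Plain sum of character codes: the order of characters is ignored."""
    return sum(ord(char) for char in key)


def weighted_hash(key):
    """Multiply each character code by its 1-based position, then add."""
    return sum(position * ord(char)
               for position, char in enumerate(key, start=1))


def compress(hash_code, array_size):
    """Map a large hash code onto a valid index, 0 .. array_size - 1."""
    return hash_code % array_size


# 'listen' and 'silent' are anagrams: same characters, different order.
for word in ("listen", "silent"):
    code = weighted_hash(word)
    print(word, ascii_sum(word), code, compress(code, 20))
```

The plain sum is identical for the two anagrams, while the position-weighted totals differ. After compression, however, different keys can still land on the same index.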



![hash](https://d18l82el6cdm1i.cloudfront.net/uploads/34EvJ7agjl-hash_table.gif)

In [None]:
class HashMap:
  def __init__(self, array_size):
    self.array_size = array_size
    self.array = [None for item in range(array_size)]

  def hash(self, key):
    key_bytes = key.encode()
    hash_code = sum(key_bytes)
    return hash_code

  def compressor(self, hash_code):
    return hash_code % self.array_size

  def assign(self, key, value):
    array_index = self.compressor(self.hash(key))
    self.array[array_index] = value

  def retrieve(self, key):
    array_index = self.compressor(self.hash(key))
    return self.array[array_index]

In [None]:
hash_map = HashMap(20)
hash_map.assign('gneiss', 'metamorphic')
print(hash_map.retrieve('gneiss'))
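
A minimal sketch of what goes wrong with this first version when two keys share an index. It repeats the same hashing scheme (byte sum, then modulo) in a standalone function, `index_for`, which is a name invented for this example, as is the second key `'singes'` and its value.

```python
def index_for(key, array_size):
    # Same scheme as the HashMap above: sum of the encoded bytes, then modulo.
    return sum(key.encode()) % array_size


array_size = 20
array = [None] * array_size

# 'gneiss' and 'singes' are anagrams, so their byte sums are equal
# and they are sent to the same array index.
array[index_for('gneiss', array_size)] = 'metamorphic'
array[index_for('singes', array_size)] = 'not a rock'  # hypothetical entry

same_index = index_for('gneiss', array_size) == index_for('singes', array_size)
print(same_index)

# The first value has been silently overwritten.
print(array[index_for('gneiss', array_size)])
```

Because only the value is stored, the map cannot tell which key a slot belongs to. Storing the key alongside the value is the first step towards handling collisions.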

# Handling Collisions - Open Addressing

With fewer array slots than possible hash codes, the modulo operation will return the same remainder for some keys. This means that two (or more) items will be competing for the same element in the data structure. This is called a collision.

There are two main approaches to resolving collisions: 
- Open Addressing 
- Separate Chaining

The Open Addressing technique works by finding an available space elsewhere in the data structure and placing the colliding element in that space.
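
One simple way to search for an available space is to step forward one slot at a time, wrapping around at the end of the array. This strategy is commonly called linear probing. The sketch below only lists the order in which slots would be tried; `probe_sequence` is a helper name invented for this illustration.

```python
def probe_sequence(key, array_size):
    """Yield the indices that linear probing would try for `key`, in order."""
    start = sum(key.encode()) % array_size
    for step in range(array_size):
        # Adding the step before taking the modulo wraps around the array.
        yield (start + step) % array_size


print(list(probe_sequence('gneiss', 5)))
```

Each slot appears exactly once in the sequence, so a free slot is always found if one exists.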

In [None]:
class HashMap:
  def __init__(self, array_size):
    self.array_size = array_size
    self.array = [None for item in range(array_size)]

  def hash(self, key, count_collisions=0):
    key_bytes = key.encode()
    hash_code = sum(key_bytes)
    return hash_code + count_collisions

  def compressor(self, hash_code):
    return hash_code % self.array_size

  def assign(self, key, value):
    array_index = self.compressor(self.hash(key))
    current_array_value = self.array[array_index]

    if current_array_value is None:
      self.array[array_index] = [key, value]
      return

    if current_array_value[0] == key:
      self.array[array_index] = [key, value]
      return

    # Collision!

    number_collisions = 1

    # Probe at most array_size - 1 further slots so that a full table
    # cannot cause an infinite loop
    while number_collisions < self.array_size:
      new_hash_code = self.hash(key, number_collisions)
      new_array_index = self.compressor(new_hash_code)
      current_array_value = self.array[new_array_index]

      if current_array_value is None:
        self.array[new_array_index] = [key, value]
        return

      if current_array_value[0] == key:
        self.array[new_array_index] = [key, value]
        return

      number_collisions += 1

    # Every slot is occupied by a different key
    raise RuntimeError("HashMap is full")

  def retrieve(self, key):
    array_index = self.compressor(self.hash(key))
    possible_return_value = self.array[array_index]

    if possible_return_value is None:
      return None

    if possible_return_value[0] == key:
      return possible_return_value[1]

    retrieval_collisions = 1

    # Probe the following slots until the key or an empty slot is found;
    # stop after array_size - 1 probes so a full table cannot loop forever
    while retrieval_collisions < self.array_size:
      new_hash_code = self.hash(key, retrieval_collisions)
      retrieving_array_index = self.compressor(new_hash_code)
      possible_return_value = self.array[retrieving_array_index]

      if possible_return_value is None:
        return None

      if possible_return_value[0] == key:
        return possible_return_value[1]

      retrieval_collisions += 1

    return None

In [None]:
hash_map = HashMap(15)
hash_map.assign('gabbro', 'igneous')
hash_map.assign('sandstone', 'sedimentary')
hash_map.assign('gneiss', 'metamorphic')
print(hash_map.retrieve('gabbro'))
print(hash_map.retrieve('sandstone'))
print(hash_map.retrieve('gneiss'))

## Formative Exercises ##

Insert a 'code' cell below. In this do the following:

- 1 - Create a Person class that has attributes for first_name, second_name and phone_number, and appropriate methods that get, set and print values for these attributes. Now instantiate the HashMap class above and add Person objects to the HashMap.
- 2 - Extend the HashMap class to create a linked list in each element position where there is a collision. A colliding item should be added to the end of the linked list for that index position.


C++ EXERCISES to ADAPT: 

/*

* Exercise 1: Set up the hash table
* 
* In Main above, replace the comments with code to set up the hashTable. 
* Then complete the hash_ function below which uses the 'division method' 
* to return a unique index value based on the key and tablesize passed in.
* Once ready, in main, call the 'getPhoneNumber()' function, which for this example, will act as the 'key' (and value).
* Then pass this 'key' to the hash_ function (along with tableSize) to return a unique index in which to store the phone number. 
* You should then assign the phone number to the hashTable in main at the index returned by hash_.
* 
* Check this has been assigned correctly by uncommenting the call to the hash_ function within the subscript operator of the hashTable. 
*  
*/

/*
* Exercise 2: String hashing
* So now, let's choose a more meaningful key - a string which represents a name of a contact
* You should see an overloaded 'hash_()' function below which takes a string as the key, in addition to the tableSize.
* In this function, code an algorithm that will formulate a unique index position 
* based upon the ASCII values of each character in the string key.
* Return this unique index and test that you can add a phone number for a name of a person to the HashTable, as well as retrieve it.
* 
* Extension: Try adding a handful of names and phone numbers checking you can retrieve the right number for the right person. 
*/






/*
* Exercise 3: Open Addressing with probing techniques 
* To illustrate a collision, in main above, attempt to hash an identical name. This should return an identical index value.
* You should notice that a new random phone number has been assigned, overwriting the previously stored number. 
* One method to resolve collisions like this would be to use open addressing. 
* You could amend the two hash functions that you have, or you could code a strategy in a function below, which is then called in both hash_ functions
* Try linear, quadratic and prime probing strategies.
* 
* In main, test that you can retrieve the right number for the adjusted position 
* calculated by the probing strategies.
*/


/* Exercise 4: Separate chaining (closed addressing) 
* The other approach to resolving collisions could be to create a structure within the hashTable itself. 
* Now, because we've set up a array of primitive integers, it's not going to take 
* Therefore, consider how you could set up a wrapper class for your 'HashTable' which makes use of your LinkedList class
* or alternatively to a LinkedList, you could use a vector instead. 
*
* Add as many classes and/or functions as you need.  
*
* Question: what are the benefits and drawbacks of each approach? In which situations is each most suitable? 
*/



![hashseparatechaining](http://algs4.cs.princeton.edu/34hash/images/separate-chaining.png)
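
Separate chaining, as pictured above, keeps every colliding item in a small collection attached to the shared index. The sketch below is one possible Python version under simplifying assumptions: it uses a plain Python list per slot instead of a linked list, and the class name `ChainedHashMap`, the helper `_index` and the key `'singes'` are invented for this illustration.

```python
class ChainedHashMap:
    """Hash map that resolves collisions with a list (bucket) per slot."""

    def __init__(self, array_size):
        self.array_size = array_size
        # One independent empty bucket per index position.
        self.buckets = [[] for _ in range(array_size)]

    def _index(self, key):
        # Same scheme as earlier: sum of the encoded bytes, then modulo.
        return sum(key.encode()) % self.array_size

    def assign(self, key, value):
        bucket = self.buckets[self._index(key)]
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value  # key already stored: update in place
                return
        bucket.append([key, value])  # new key: add to the end of the chain

    def retrieve(self, key):
        for stored_key, stored_value in self.buckets[self._index(key)]:
            if stored_key == key:
                return stored_value
        return None  # key not present


rocks = ChainedHashMap(15)
rocks.assign('gneiss', 'metamorphic')
rocks.assign('singes', 'not a rock')  # anagram of 'gneiss', so it collides
print(rocks.retrieve('gneiss'))
print(rocks.retrieve('singes'))
```

Both values survive the collision because they sit in the same bucket under their own keys. The trade-off is that a lookup in a crowded bucket has to scan every pair in it.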
