In [1]:
!pip install jovian --upgrade --quiet

In [2]:
import jovian

## Problem Statement - Python Dictionaries and Hash Tables

In this assignment, you will recreate Python dictionaries from scratch using a data structure called a *hash table*. Dictionaries in Python are used to store key-value pairs. Keys are used to store and retrieve values. For example, here's a dictionary for storing and retrieving phone numbers using people's names.

In [3]:
phone_numbers = {
  'Aakash' : '9489484949',
  'Hemanth' : '9595949494',
  'Siddhant' : '9231325312'
}
phone_numbers

{'Aakash': '9489484949', 'Hemanth': '9595949494', 'Siddhant': '9231325312'}

You can access a person's phone number using their name as follows:

In [4]:
phone_numbers['Aakash']

'9489484949'

You can store new phone numbers, or update existing ones as follows:

In [5]:
# Add a new value
phone_numbers['Vishal'] = '8787878787'
# Update existing value
phone_numbers['Aakash'] = '7878787878'
# View the updated dictionary
phone_numbers

{'Aakash': '7878787878',
 'Hemanth': '9595949494',
 'Siddhant': '9231325312',
 'Vishal': '8787878787'}

You can also view all the names and phone numbers stored in `phone_numbers` using a `for` loop.

In [6]:
for name in phone_numbers:
    print('Name:', name, ', Phone Number:', phone_numbers[name])

Name: Aakash , Phone Number: 7878787878
Name: Hemanth , Phone Number: 9595949494
Name: Siddhant , Phone Number: 9231325312
Name: Vishal , Phone Number: 8787878787
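
As a side note, the same loop can be written with `dict.items()`, which yields each key together with its value. The sketch below uses a small stand-in dictionary (`sample_numbers`, a name introduced here only for illustration) so that it does not modify `phone_numbers`.

```python
# Stand-in data for illustration; not part of the assignment
sample_numbers = {
    'Aakash': '7878787878',
    'Hemanth': '9595949494',
}

# items() yields (key, value) tuples, so no second lookup is needed
for name, number in sample_numbers.items():
    print('Name:', name, ', Phone Number:', number)
```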


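The goal is to build a class with the following interface, which mirrors the dictionary operations shown above:

In [7]: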
class HashTable:
    def insert(self, key, value):
        """Insert a new key-value pair"""
        pass
    
    def find(self, key):
        """Find the value associated with a key"""
        pass
    
    def update(self, key, value):
        """Change the value associated with a key"""
        pass
    
    def list_all(self):
        """List all the keys"""
        pass

### Data List

We'll build the `HashTable` class step by step. The first step is to create a Python list that will hold all the key-value pairs. We'll start by creating a list of a fixed size.



In [8]:
MAX_HASH_TABLE_SIZE = 4096

**QUESTION 1: Create a Python list of size `MAX_HASH_TABLE_SIZE`, with all the values set to `None`.**

_Hint_: Use the [`*` operator](https://stackoverflow.com/questions/3459098/create-list-of-single-item-repeated-n-times).

In [9]:
# List of size MAX_HASH_TABLE_SIZE with all values None
data_list = [None] * MAX_HASH_TABLE_SIZE

If the list was created successfully, the following cells should output `True`.

In [10]:
len(data_list) == 4096

True

In [11]:
data_list[99] is None

True

### Hashing Function

A _hashing function_ is used to convert strings and other non-numeric data types into numbers, which can then be used as list indices. For instance, if a hashing function converts the string `"Aakash"` into the number `4`, then the key-value pair `('Aakash', '7878787878')` will be stored at the position `4` within the data list.

Here's a simple algorithm for hashing, which can convert strings into numeric list indices.

1. Iterate over the string, character by character.
2. Convert each character to a number using Python's built-in `ord` function.
3. Add the numbers for each character to obtain the hash for the entire string.
4. Take the remainder of the result when divided by the size of the data list.
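
As a worked example of the steps above, here is the arithmetic for the key `'Aakash'`, written out one step at a time. The names `codes`, `total`, `index` and `table_size` are introduced only for this illustration; `table_size` is assumed to equal `MAX_HASH_TABLE_SIZE` (4096).

```python
key = 'Aakash'
table_size = 4096  # assumed to match MAX_HASH_TABLE_SIZE

codes = [ord(ch) for ch in key]  # steps 1-2: one number per character
total = sum(codes)               # step 3: add the numbers together
index = total % table_size       # step 4: wrap into the valid index range

print(codes)  # [65, 97, 107, 97, 115, 104]
print(total)  # 585
print(index)  # 585
```

Because the total (585) is smaller than the table size, the remainder step leaves it unchanged here. It only matters once the sum reaches or exceeds the list length.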


**QUESTION 2: Complete the `get_index` function below which implements the hashing algorithm described above.**

In [12]:
def get_index(data_list, a_string):
    # Variable to store the result (updated after each iteration)
    result = 0
    
    for a_character in a_string:
        # Convert the character to a number (using ord)
        a_number = ord(a_character)
        # Update result by adding the number
        result += a_number
    
    # Take the remainder of the result with the size of the data list
    list_index = result % len(data_list)
    return list_index

If the `get_index` function was defined correctly, the following cells should output `True`.

In [13]:
get_index(data_list, '') == 0

True

In [14]:
get_index(data_list, 'Aakash') == 585

True

In [15]:
get_index(data_list, 'Don O Leary') == 941

True

#### Insert

To insert a key-value pair into a hash table, we can simply get the hash of the key, and store the pair at that index in the data list.

In [16]:
key, value = 'Aakash', '7878787878'

In [17]:
idx = get_index(data_list, key)
idx

585

In [18]:
data_list[idx] = (key, value)

Here's the same operation expressed in a single line of code.

In [19]:
data_list[get_index(data_list, 'Hemanth')] = ('Hemanth', '9595949494')

#### Find

To retrieve the value associated with a key, we can get the hash of the key and look up that index in the data list.

In [20]:
idx = get_index(data_list, 'Aakash')
idx

585

In [21]:
key, value = data_list[idx]
value

'7878787878'

#### List

To get the list of keys, we can use a simple [list comprehension](https://www.w3schools.com/python/python_lists_comprehension.asp).

In [22]:
keys = [kv[0] for kv in data_list if kv is not None]

In [23]:
keys

['Aakash', 'Hemanth']

Let's save our work before continuing.

### Basic Hash Table Implementation

We can now use the hashing function defined above to implement a basic hash table in Python.


**QUESTION 3: Complete the hash table implementation below by following the instructions in the comments.**

_Hint_: Insert and update can have identical implementations.


In [24]:
class BasicHashTable:
    def __init__(self, max_size=MAX_HASH_TABLE_SIZE):
        # 1. Create a list of size `max_size` with all values None
        self.data_list = [None] * max_size
     
    
    def insert(self, key, value):
        # 1. Find the index for the key using get_index
        idx = get_index(self.data_list, key)
        
        # 2. Store the key-value pair at the right index
        self.data_list[idx] = (key, value)
    
    
    def find(self, key):
        # 1. Find the index for the key using get_index
        idx = get_index(self.data_list, key)
        
        # 2. Retrieve the data stored at the index
        kv = self.data_list[idx]
        
        # 3. Return the value if found, else return None
        if kv is None:
            return None
        else:
            key, value = kv
            return value
    
    
    def update(self, key, value):
        # 1. Find the index for the key using get_index
        idx = get_index(self.data_list, key)
        
        # 2. Store the new key-value pair at the right index
        self.data_list[idx] = (key, value)

    
    def list_all(self):
        # 1. Extract the key from each key-value pair 
        return [kv[0] for kv in self.data_list if kv is not None]

If the `BasicHashTable` class was defined correctly, the following cells should output `True`.

In [25]:
basic_table = BasicHashTable(max_size=1024)
len(basic_table.data_list) == 1024

True

In [26]:
# Insert some values
basic_table.insert('Aakash', '9999999999')
basic_table.insert('Hemanth', '8888888888')

# Find a value
basic_table.find('Hemanth') == '8888888888'

True

In [27]:
# Update a value
basic_table.update('Aakash', '7777777777')

# Check the updated value
basic_table.find('Aakash') == '7777777777'

True

In [28]:
# Get the list of keys
basic_table.list_all() == ['Aakash', 'Hemanth']

True

### Handling Collisions with Linear Probing

As you might have guessed, multiple keys can have the same hash. For instance, the keys `"listen"` and `"silent"` have the same hash. This is referred to as a _collision_. Data stored against one key may overwrite the data stored against another if they have the same hash.
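
To see why these two keys collide, note that the hashing scheme only adds up character codes, so the order of the characters does not matter. The short check below illustrates this. `char_sum` is a hypothetical helper that computes the hash before the remainder step.

```python
def char_sum(text):
    """Sum of character codes, i.e. the hash before the remainder step."""
    return sum(ord(ch) for ch in text)

# Both words contain the same letters in a different order
print(sorted('listen') == sorted('silent'))    # True
print(char_sum('listen'), char_sum('silent'))  # 655 655
```

Any two anagrams collide under this scheme, which is one reason a plain character sum is a weak hashing function.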


In [29]:
basic_table.insert('listen', 99)

In [30]:
basic_table.insert('silent', 200)

In [31]:
basic_table.find('listen')

200

As you can see above, the value for the key `listen` was overwritten by the value for the key `silent`. Our hash table implementation is incomplete because it does not handle collisions correctly.

To handle collisions we'll use a technique called linear probing. Here's how it works: 

1. While inserting a new key-value pair, if the target index for a key is occupied by another key, we try the next index, followed by the next, and so on until we find the closest empty location.

2. While finding a key-value pair, we apply the same strategy, but instead of searching for an empty location, we look for a location which contains a key-value pair with the matching key.

3. While updating a key-value pair, we apply the same strategy, but instead of searching for an empty location, we look for a location which contains a key-value pair with the matching key, and update its value.
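
The probing rules above are easier to follow on a very small table. The following is a minimal sketch, not the assignment's solution: `toy_index` and `toy_insert` are hypothetical helpers, the table size of 8 is chosen only so that the wrap-around is visible, and the loop assumes the table always has at least one free slot.

```python
def toy_index(table, key):
    # Same hashing idea: sum of character codes modulo the table size
    return sum(ord(ch) for ch in key) % len(table)

def toy_insert(table, key, value):
    idx = toy_index(table, key)
    # Step past slots held by other keys, wrapping around at the end.
    # Assumes at least one slot is free, otherwise this never terminates.
    while table[idx] is not None and table[idx][0] != key:
        idx = (idx + 1) % len(table)
    table[idx] = (key, value)
    return idx

table = [None] * 8
print(toy_insert(table, 'listen', 99))   # 7 (655 % 8)
print(toy_insert(table, 'silent', 200))  # 0 (slot 7 is taken, so wrap around)
print(toy_insert(table, 'listen', 101))  # 7 (same key, so update in place)
```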


We'll define a function called `get_valid_index`, which starts searching the data list from the index determined by the hashing function `get_index` and returns the first index which is either empty or contains a key-value pair matching the given key.

**QUESTION 4: Complete the function `get_valid_index` below by following the instructions in the comments.**

In [32]:
def get_valid_index(data_list, key):
    # Start with the index returned by get_index
    idx = get_index(data_list, key)
    
    while True:
        # Get the key-value pair stored at idx
        kv = data_list[idx]
        
        # If it is None, return the index
        if kv is None:
            return idx
        
        # If the stored key matches the given key, return the index
        k, v = kv
        if k == key:
            return idx
        
        # Move to the next index
        idx += 1
        
        # Go back to the start if you have reached the end of the array
        if idx == len(data_list):
            idx = 0

If `get_valid_index` was defined correctly, the following cells should output `True`.

In [33]:
# Create an empty hash table
data_list2 = [None] * MAX_HASH_TABLE_SIZE

# New key 'listen' should return expected index
get_valid_index(data_list2, 'listen') == 655

True

In [34]:
# Insert a key-value pair for the key 'listen'
data_list2[get_index(data_list2, 'listen')] = ('listen', 99)

# Colliding key 'silent' should return next index
get_valid_index(data_list2, 'silent') == 656

True
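
One caveat: the `while True` loop in `get_valid_index` never terminates if every slot is occupied and none of them holds the given key. The sketch below shows one possible way to guard against that by probing each slot at most once. `get_valid_index_bounded` is a hypothetical name, raising `RuntimeError` is an assumption rather than part of the assignment, and `get_index` is repeated here only so the cell runs on its own.

```python
def get_index(data_list, a_string):
    # Same hashing scheme as before: sum of character codes modulo list size
    return sum(ord(ch) for ch in a_string) % len(data_list)

def get_valid_index_bounded(data_list, key):
    """Like get_valid_index, but gives up after one full pass over the list."""
    start = get_index(data_list, key)
    for offset in range(len(data_list)):
        idx = (start + offset) % len(data_list)
        kv = data_list[idx]
        # An empty slot or a slot with the matching key is a valid index
        if kv is None or kv[0] == key:
            return idx
    raise RuntimeError('hash table is full')

# A two-slot table with no free slots and no matching key
full_list = [('a', 1), ('b', 2)]
try:
    get_valid_index_bounded(full_list, 'c')
except RuntimeError as error:
    print(error)  # hash table is full
```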

### Hash Table with Linear Probing

We can now implement a hash table with linear probing.

**QUESTION 5: Complete the hash table (with linear probing) implementation below by following the instructions in the comments.**

In [35]:
class ProbingHashTable:
    def __init__(self, max_size=MAX_HASH_TABLE_SIZE):
        # 1. Create a list of size `max_size` with all values None
        self.data_list = [None] * max_size
     
    
    def insert(self, key, value):
        # 1. Find the index for the key using get_valid_index
        idx = get_valid_index(self.data_list, key)
        
        # 2. Store the key-value pair at the right index
        self.data_list[idx] = (key, value)
    
    
    def find(self, key):
        # 1. Find the index for the key using get_valid_index
        idx = get_valid_index(self.data_list, key)
        
        # 2. Retrieve the data stored at the index
        kv = self.data_list[idx]
        
        # 3. Return the value if found, else return None
        return None if kv is None else kv[1]
    
    
    def update(self, key, value):
        # 1. Find the index for the key using get_valid_index
        idx = get_valid_index(self.data_list, key)
        
        # 2. Store the new key-value pair at the right index
        self.data_list[idx] = (key, value)

    
    def list_all(self):
        # 1. Extract the key from each key-value pair 
        return [kv[0] for kv in self.data_list if kv is not None]

If the `ProbingHashTable` class was defined correctly, the following cells should output `True`.

In [36]:
# Create a new hash table
probing_table = ProbingHashTable()

# Insert a value
probing_table.insert('listen', 99)

# Check the value
probing_table.find('listen') == 99

True

In [37]:
# Insert a colliding key
probing_table.insert('silent', 200)

# Check the new and old keys
probing_table.find('listen') == 99 and probing_table.find('silent') == 200

True

In [38]:
# Update the value for an existing key
probing_table.update('listen', 101)

# Check the value
probing_table.find('listen') == 101

True

In [39]:
probing_table.list_all() == ['listen', 'silent']

True

In [None]:
jovian.commit()
