# Assignment 2 - Hash Tables in Python

_This assignment is a part of the course ["Data Structures and Algorithms in Python"](https://jovian.ai/learn/data-structures-and-algorithms-in-python)._

In this assignment, you will re-implement Python dictionaries from scratch using hash tables. As you go through this notebook, you will find a **???** in certain places. To complete this assignment, you must replace all the **???** with appropriate values, expressions or statements to ensure that the notebook runs properly end-to-end. 

**Guidelines**

1. Make sure to run all the code cells, otherwise you may get errors like `NameError` for undefined variables.
2. Do not change variable names, delete cells or disturb other existing code. It may cause problems during evaluation.
3. In some cases, you may need to add some code cells or new statements before or after the line of code containing the **???**. 
4. Since you'll be using a temporary online service for code execution, save your work by running `jovian.commit` at regular intervals.
5. Questions marked **(Optional)** will not be considered for evaluation, and can be skipped. They are for your learning.
6. If you are stuck, you can ask for help on the [community forum](https://jovian.ai/forum/c/data-structures-and-algorithms-in-python/assignment-2/88). You can get help with errors or ask for hints, but **please don't ask for OR share the full working answer code** on the forum.
7. There are some tests included with this notebook to help you test your implementation. However, after submission your code will be tested with some hidden test cases. Make sure to test your code exhaustively to cover all edge cases.


**Important Links**

* Submit your work here: https://jovian.ai/learn/data-structures-and-algorithms-in-python/assignment/assignment-2-hash-table-and-python-dictionaries
* Ask questions and get help: https://jovian.ai/forum/c/data-structures-and-algorithms-in-python/assignment-2/88
* Lesson 2 video for review: https://jovian.ai/aakashns/python-binary-search-trees
* Lesson 2 notebook for review: https://jovian.ai/aakashns/python-binary-search-trees





### How to Run the Code and Save Your Work

**Option 1: Running using free online resources (1-click, recommended)**: Click the **Run** button at the top of this page and select **Run on Binder**. You can also select "Run on Colab" or "Run on Kaggle", but you'll need to create an account on [Google Colab](https://colab.research.google.com) or [Kaggle](https://kaggle.com) to use these platforms.


**Option 2: Running on your computer locally**: To run the code on your computer locally, you'll need to set up [Python](https://www.python.org) & [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/), download the notebook and install the required libraries. Click the **Run** button at the top of this page, select the **Run Locally** option, and follow the instructions.

**Saving your work**: You can save a snapshot of the assignment to your [Jovian](https://jovian.ai) profile, so that you can access it later and continue your work. Keep saving your work by running `jovian.commit` from time to time.

In [1]:
project='python-hash-tables-assignment'

In [3]:
!pip install jovian --upgrade --quiet

In [7]:
import jovian
jovian.commit(project=project, privacy='secret', environment=None)

[jovian] Attempting to save notebook..
[jovian] Updating notebook "aakashns/python-hash-tables-assignment" on https://jovian.ai/
[jovian] Uploading notebook..
[jovian] Committed successfully! https://jovian.ai/aakashns/python-hash-tables-assignment


'https://jovian.ai/aakashns/python-hash-tables-assignment'

## Problem Statement - Hash Tables

In this assignment, you will recreate Python dictionaries from scratch using a data structure called a *hash table*. Dictionaries in Python are used to store key-value pairs. Keys are used to store and retrieve values. For example, here's a dictionary for storing and retrieving phone numbers using people's names.

In [1]:
phone_numbers = {
  'Aakash' : '9489484949',
  'Hemanth' : '9595949494',
  'Siddhant' : '9231325312'
}

In [2]:
phone_numbers

{'Aakash': '9489484949', 'Hemanth': '9595949494', 'Siddhant': '9231325312'}

You can access a person's phone number using their name as follows:

In [3]:
phone_numbers['Aakash']

'9489484949'

In [6]:
phone_numbers['aakash']

KeyError: 'aakash'

You can store new phone numbers, or update existing ones as follows:

In [9]:
phone_numbers['Vishal'] = '8787878787'

In [10]:
phone_numbers['Aakash'] = '7878787878'

In [12]:
phone_numbers

{'Aakash': '7878787878',
 'Hemanth': '9595949494',
 'Siddhant': '9231325312',
 'Vishal': '8787878787'}

You can also view all the names and phone numbers stored in `phone_numbers` using a `for` loop.

In [13]:
for name in phone_numbers:
    print('Name:', name, ', Phone Number:', phone_numbers[name])

Name: Aakash , Phone Number: 7878787878
Name: Hemanth , Phone Number: 9595949494
Name: Siddhant , Phone Number: 9231325312
Name: Vishal , Phone Number: 8787878787


Dictionaries in Python are implemented using Hash Tables. A hash table uses a list/array to store the key-value pairs, and uses a _hashing function_ to determine the index for storing or retrieving the data associated with a given key. Here's a visual representation:

<img src="https://i.imgur.com/5dPEmuY.png" width="480">

Your objective is to implement a `HashTable` class which supports the following operations:

1. **Insert**: Insert a new key-value pair
2. **Find**: Find the value associated with a key
3. **Update**: Update the value associated with a key
4. **List**: List all the keys stored in the hash table

The `HashTable` class will have the following structure (note the function signatures):

In [14]:
class HashTable:
    def insert(self, key, value):
        """Insert a new key-value pair"""
        pass
    
    def find(self, key):
        """Find the value associated with a key"""
        pass
    
    def update(self, key, value):
        """Change the value associated with a key"""
        pass
    
    def list_all(self):
        """List all the keys"""
        pass

Let's save our work before continuing.

In [None]:
jovian.commit(project=project)

### Data List

We'll build the `HashTable` class step-by-step. The first step is to create a Python list which will hold all the key-value pairs. We'll start by creating a list of a fixed size.



In [11]:
MAX_HASH_TABLE_SIZE = 4096

**QUESTION 1: Create a Python list of size `MAX_HASH_TABLE_SIZE`, with all the values set to `None`.**

_Hint_: Use the [`*` operator](https://stackoverflow.com/questions/3459098/create-list-of-single-item-repeated-n-times).

In [12]:
# List of size MAX_HASH_TABLE_SIZE with all values None
data_list = [None]*MAX_HASH_TABLE_SIZE

In [24]:
data_list

[None,
 None,
 None,
 ...
 None]

If the list was created successfully, the following cells should output `True`.

In [25]:
len(data_list) == 4096

True

In [26]:
data_list[99] == None

True

In [28]:
for i in data_list:
    assert i is None

Let's save our work before continuing.

In [None]:
jovian.commit(project=project)

### Hashing Function

A _hashing function_ is used to convert strings and other non-numeric data types into numbers, which can then be used as list indices. For instance, if a hashing function converts the string `"Aakash"` into the number `4`, then the key-value pair `'Aakash': '7878787878'` will be stored at the position `4` within the data list.

Here's a simple algorithm for hashing, which can convert strings into numeric list indices.

1. Iterate over the string, character by character
2. Convert each character to a number using the `ord` function.
3. Add the numbers for each character to obtain the hash for the entire string 
4. Take the remainder of the result with the size of the data list


**QUESTION 2: Complete the `get_index` function below which implements the hashing algorithm described above.**

In [13]:
ord('b')

98

In [14]:
hash('b')

-1921694451942413134

In [36]:
z=ord('x')+ord('y')
z

241

In [35]:
z%MAX_HASH_TABLE_SIZE

241

In [6]:
def get_index(data_list, a_string):
    # Variable to store the result (updated after each iteration)
    result = 0
    
    for a_character in a_string:
        # Convert the character to a number
        a_number = ord(a_character)
        # Update result by adding the number
        result += a_number
    
    # Take the remainder of the result with the size of the data list
    # (uses the length of the list passed in, not the global MAX_HASH_TABLE_SIZE)
    list_index = result % len(data_list)
    return list_index

If the `get_index` function was defined correctly, the following cells should output `True`.

In [40]:
get_index(data_list, '') == 0

True

In [159]:
get_index(data_list, 'Aakash') == 585

True

In [160]:
get_index(data_list, 'Don O Leary') == 941

True

In [24]:
get_index(data_list, 'Gowtham')

727

(Optional) Try out the `get_index` function using the empty cells below.

In [33]:
data_list2=[None]*100

In [34]:
get_index(data_list2,'b')

98

To insert a key-value pair into a hash table, we can simply get the hash of the key, and store the pair at that index in the data list.

In [36]:
key, value = 'Aakash', '7878787878'

In [37]:
idx = get_index(data_list, key)
idx

585

In [38]:
data_list[idx] = (key, value)

In [43]:
data_list[585]

('Aakash', '7878787878')

Here's the same operation expressed in a single line of code.

In [78]:
data_list[get_index(data_list, 'Hemanth')] = ('Hemanth', '9595949494')

To retrieve the value associated with a key, we can get the hash of the key and look up that index in the data list.

In [182]:
idx = get_index(data_list, 'Aakash')
idx

585

In [183]:
key, value = data_list[idx]
value

'7878787878'

To get the list of keys, we can use a simple [list comprehension](https://www.w3schools.com/python/python_lists_comprehension.asp).

In [112]:
kv='x','abc'
kv

('x', 'abc')

In [54]:
s='e',2
list(s)

['e', 2]

In [113]:
kv[0]

'x'

In [57]:
data_list[get_index(data_list,'Hemanth')]=('Hemanth',57654321567)

In [69]:
keys = [kv[0] for kv in data_list if kv is not None]

In [70]:
keys

['Aakash', 'Hemanth']

In [105]:
list1=[1,2,3,4,5,6]
list2=[x*x for x in list1 if x%2==0]
list2

[4, 16, 36]

Let's save our work before continuing.

In [None]:
jovian.commit(project=project)

### Basic Hash Table Implementation

We can now use the hashing function defined above to implement a hash table in Python.


**Question 4: Complete the hash table implementation below by following the instructions in the comments.**

_Hint_: Insert and update may have identical implementations.


In [3]:
class BasicHashTable:
    def __init__(self, max_size=MAX_HASH_TABLE_SIZE):
        # 1. Create a list of size `max_size` with all values None
        self.data_list = [None]*max_size
     
    
    def insert(self, key, value):
        # 1. Find the index for the key using get_index
        idx = get_index(self.data_list, key)
        
        # 2. Store the key-value pair at the right index
        self.data_list[idx] = (key,value)
    
    
    def find(self, key):
        # 1. Find the index for the key using get_index
        idx = get_index(self.data_list, key)
        
        # 2. Retrieve the data stored at the index
        kv = self.data_list[idx]
        
        # 3. Return the value if found, else return None
        if kv is None:
            return None
        else:
            key, value = kv
            return value
    
    
    def update(self, key, value):
        # 1. Find the index for the key using get_index
        idx = get_index(self.data_list, key)
        
        # 2. Store the new key-value pair at the right index
        self.data_list[idx] = (key,value)

    
    def list_all(self):
        # 1. Extract the key from each key-value pair 
        return [kv[0] for kv in self.data_list if kv is not None]

If the `BasicHashTable` class was defined correctly, the following cells should output `True`.

In [4]:
basic_table = BasicHashTable(1000)
len(basic_table.data_list) == 1000

True

In [9]:
# Insert some values
basic_table.insert('Aakash', '9999999999')
basic_table.insert('Hemanth', '8888888888')
basic_table.insert('Gowtham', 7387287278)
# Find a value
basic_table.find('Hemanth') == '8888888888'

True

In [13]:
# Update a value
basic_table.update('Aakash', '7777777777')

# Check the updated value
basic_table.find('Aakash') == '7777777777'

True

In [118]:
basic_table.find('Hemanth') == '8888888888'

True

In [10]:
basic_table.find('Gowtham')

7387287278

In [12]:
# Get the list of keys
basic_table.list_all() #== ['Aakash', 'Hemanth']

['Aakash', 'Hemanth', 'Gowtham']

(Optional) Test your implementation of `BasicHashTable` with some more examples below.

Let's save our work before continuing.

In [None]:
jovian.commit(project=project)

### Handling Collisions with Linear Probing

As you might have wondered, multiple keys can have the same hash. For instance, the keys `"listen"` and `"silent"` have the same hash. This is referred to as a _collision_. Data stored against one key may overwrite the data stored against another if they have the same hash.
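To see why these two keys collide, note that the hashing algorithm described earlier only adds up character codes, so the order of the characters plays no role. The sketch below uses a hypothetical helper, `char_code_sum`, which is not part of the assignment and exists only for illustration:

```python
def char_code_sum(text):
    """Sum of the character codes in `text` (hypothetical helper for illustration)."""
    return sum(ord(ch) for ch in text)

# Anagrams contain exactly the same characters, so their sums are identical
print(sorted('listen') == sorted('silent'))                 # True
print(char_code_sum('listen'), char_code_sum('silent'))     # 655 655

# Identical sums give identical remainders, whatever the list size is
print(char_code_sum('listen') % 4096 == char_code_sum('silent') % 4096)  # True
```

Any two keys that are anagrams of each other will collide under this scheme, which is one reason a real hash function also takes character positions into account.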


In [124]:
get_index(data_list,'listen'),get_index(data_list,'silent')

(655, 655)

In [130]:
basic_table.insert('listen', 99)

In [16]:
basic_table.insert('silent', 200)

In [136]:
basic_table.find('listen')

200

In [137]:
basic_table.list_all()

['Aakash', 'silent', 'Hemanth']

As you can see above, the value for the key `listen` was overwritten by the value for the key `silent`. Our hash table implementation is incomplete because it does not handle collisions correctly.

To handle collisions we'll use a technique called linear probing. Here's how it works: 

1. While inserting a new key-value pair, if the target index for a key is occupied by another key, we try the next index, followed by the next, and so on until we find the closest empty location.

2. While finding a key-value pair, we apply the same strategy, but instead of searching for an empty location, we look for a location which contains a key-value pair with the matching key.

3. While updating a key-value pair, we apply the same strategy, but instead of searching for an empty location, we look for a location which contains a key-value pair with the matching key, and update its value.
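The order in which linear probing visits the slots can be sketched on its own, separately from the hash table. The generator `probe_sequence` and the 8-slot list below are assumptions made for illustration; they are not part of the assignment's required code:

```python
def probe_sequence(start, size):
    """Yield every index of a list of length `size` exactly once, beginning at
    `start` and wrapping around to 0 after the last index (hypothetical helper)."""
    for offset in range(size):
        yield (start + offset) % size

# With 8 slots and a starting index of 6, the probe order wraps around the end
print(list(probe_sequence(6, 8)))  # [6, 7, 0, 1, 2, 3, 4, 5]

# Inserting: take the first slot in probe order that is still empty
slots = [None] * 8
slots[6] = ('listen', 99)          # pretend 'listen' already occupies index 6
first_free = next(i for i in probe_sequence(6, 8) if slots[i] is None)
print(first_free)                  # 7
```

Finding and updating walk the same sequence; they differ only in the condition that stops the walk (a matching key rather than an empty slot).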


We'll define a function called `get_valid_index`, which starts searching the data list from the index determined by the hashing function `get_index` and returns the first index which is either empty or contains a key-value pair matching the given key.

**Question 5: Complete the function `get_valid_index` below by following the instructions in the comments.**

In [21]:
def get_valid_index(data_list, key):
    # Start with the index returned by get_index
    idx = get_index(data_list, key)
    
    while True:
        # Get the key-value pair stored at idx
        kv = data_list[idx]
        
        # If it is None, return the index
        if kv is None:
            return idx
        
        # If the stored key matches the given key, return the index
        k, v = kv
        if k==key:
            return idx
        
        # Move to the next index
        idx += 1
        
        # Go back to the start if you have reached the end of the array
        if idx == len(data_list):
            idx = 0

If `get_valid_index` was defined correctly, the following cells should output `True`.

In [14]:
basic_table.list_all()

['Aakash', 'Hemanth', 'Gowtham']

In [26]:
# Create an empty hash table
data_list = [None] * MAX_HASH_TABLE_SIZE

# New key 'listen' should return expected index
get_valid_index(data_list, 'listen') == 655

True

In [27]:
# Insert a key-value pair for the key 'listen'
data_list[get_index(data_list, 'listen')] = ('listen', 99)

# Colliding key 'silent' should return next index
get_valid_index(data_list, 'silent') == 656

True

(Optional) Test your implementation of `get_valid_index` on some more examples using the empty cells below.

In [16]:
MAX_HASH_TABLE_SIZE=1060

In [17]:
data_list = [None] * MAX_HASH_TABLE_SIZE

In [28]:
idx=get_valid_index(data_list,'Gowtham')
data_list[idx]='Gowtham',6888

In [29]:
idx1=get_valid_index(data_list,'maGowth')
idx1

728

In [23]:
index=get_valid_index(data_list,'acb')
index

294

In [162]:
get_valid_index(data_list,'acb')==294

True

In [25]:
data_list[get_valid_index(data_list,'acb')]=('acb', 'valu1')

In [26]:
get_valid_index(data_list,'bac')==295

True

Let's save our work before continuing.

In [245]:
jovian.commit(project=project)

[jovian] Attempting to save notebook..
[jovian] Updating notebook "aakashns/python-hash-tables-assignment" on https://jovian.ai/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully! https://jovian.ai/aakashns/python-hash-tables-assignment


'https://jovian.ai/aakashns/python-hash-tables-assignment'

### Hash Table with Linear Probing

**Question 6: Complete the hash table (with linear probing) implementation below by following the instructions in the comments.**

In [45]:
class ProbingHashTable:
    def __init__(self, max_size=MAX_HASH_TABLE_SIZE):
        # 1. Create a list of size `max_size` with all values None
        self.data_list = [None]*max_size
     
    
    def insert(self, key, value):
        # 1. Find the index for the key using get_valid_index
        idx = get_valid_index(self.data_list, key)
        
        # 2. Store the key-value pair at the right index
        self.data_list[idx] = (key,value)
    
    
    def find(self, key):
        # 1. Find the index for the key using get_valid_index
        idx = get_valid_index(self.data_list,key)
        
        # 2. Retrieve the data stored at the index
        kv = self.data_list[idx]
        
        # 3. Return the value if found, else return None
        if kv is None:
            print('no value is found for given key')
            return None
        else:
            key, value = kv
            return value
    
    
    def update(self, key, value):
        # 1. Find the index for the key using get_valid_index
        idx = get_valid_index(self.data_list,key)
        
        # 2. Store the new key-value pair at the right index
        self.data_list[idx] = (key, value)

    
    def list_all(self):
        # 1. Extract the key from each key-value pair 
        return [kv[0] for kv in self.data_list if kv is not None]

If the `ProbingHashTable` class was defined correctly, the following cells should output `True`.

In [28]:
# Create a new hash table
probing_table = ProbingHashTable()

# Insert a value
probing_table.insert('listen', 99)

# Check the value
probing_table.find('listen') == 99

True

In [29]:
# Insert a colliding key
probing_table.insert('silent', 200)

# Check the new and old keys
probing_table.find('listen') == 99 and probing_table.find('silent') == 200

True

In [30]:
# Update a key
probing_table.insert('listen', 101)

# Check the value
probing_table.find('listen') == 101

True

In [31]:
probing_table.list_all() == ['listen', 'silent']

True

(Optional) Test your implementation of `ProbingHashTable` using the empty cells below.

In [48]:
pb=ProbingHashTable(100)
pb.insert('Gowtham',4567890)

In [49]:
pb.insert('Gothamw',4567890)

In [50]:
pb.update('Gowtham',456789)

In [51]:
pb.data_list

[None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 ('Gowtham', 456789),
 ('Gothamw', 4567890),
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None,
 None]

In [52]:
pb.find('dkm')

no value is found for given key


Save your work before continuing.

In [280]:
jovian.commit(project=project)

[jovian] Attempting to save notebook..
[jovian] Updating notebook "aakashns/python-hash-tables-assignment" on https://jovian.ai/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully! https://jovian.ai/aakashns/python-hash-tables-assignment


'https://jovian.ai/aakashns/python-hash-tables-assignment'

### Make a Submission

Congrats! You have now implemented hash tables from scratch. The rest of this assignment is optional. You can make a submission on this page: https://jovian.ai/learn/data-structures-and-algorithms-in-python/assignment/assignment-2-hash-table-and-python-dictionaries . Submit the link to your Jovian notebook (the output of the previous cell).


You can also make a direct submission by executing the following cell:

In [281]:
jovian.submit(assignment="pythondsa-assignment2")

[jovian] Attempting to save notebook..
[jovian] Updating notebook "aakashns/python-hash-tables-assignment" on https://jovian.ai/
[jovian] Uploading notebook..
[jovian] Capturing environment..
[jovian] Committed successfully! https://jovian.ai/aakashns/python-hash-tables-assignment
[jovian] Submitting assignment..

[jovian] Error: Jovian submit failed. (HTTP 400) Not enrolled in the course


### (Optional) Python Dictionaries using Hash Tables

We can now implement Python dictionaries using hash tables. Python also provides a built-in function called `hash`, which we can use instead of our custom hash function. It is likely to produce far fewer collisions.
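As a minimal sketch of how the built-in `hash` could be turned into a list index, consider the hypothetical helper `hash_index` below (the name and the list size are assumptions for illustration). Note that string hashes are randomized per interpreter session by default, so the index for a string key can differ from one run to the next:

```python
def hash_index(key, size):
    """Map any hashable key to a list index in range(size) (hypothetical helper)."""
    # hash() may return a negative integer; Python's % with a positive
    # divisor still yields a value between 0 and size - 1
    return hash(key) % size

size = 4096
idx = hash_index('Aakash', size)
print(0 <= idx < size)                         # True

# Unlike a sum of character codes, hash() is not limited to strings
print(hash_index(42, size))                    # 42 in CPython: small integers hash to themselves
print(0 <= hash_index(('a', 1), size) < size)  # True
```

Because the built-in function mixes character positions into the result, anagrams such as `'listen'` and `'silent'` are very unlikely to land on the same index.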

**(Optional) Question: Implement a python-friendly interface for the hash table.**

In [53]:
MAX_HASH_TABLE_SIZE = 4096

class HashTable:  # Note: superseded by the revised HashTable class further below
    def __init__(self, max_size=MAX_HASH_TABLE_SIZE):
        self.data = [None] * max_size
        self.tombstone = 0  # number of tombstone markers currently in self.data
        self.count = 0      # number of key-value pairs inserted so far
        
    def get_valid_index(self, key):
        # Linear probing, starting from the index given by get_index
        # (the built-in `hash` function could be used here instead)
        idx=get_index(self.data,key)
        while True:
            kv=self.data[idx]
            if kv is None or kv == 'tombstone':
                return idx
            k,v=kv
            if k==key:
                return idx
            idx+=1
            if idx==len(self.data):
                idx=0
        
    def __getitem__(self, key):
        # Implement the logic for "find" here
        idx=self.get_valid_index(key)
        sidx=idx
        while True:
            kv=self.data[idx]
            if kv is None:
                return None
            if kv!='tombstone':
                k,v=kv
                if key==k:
                    return v
            idx=(idx+1)%len(self.data)
            if idx==sidx:
                return None
              
    
    def __setitem__(self, key, value):
        # Implement the logic for "insert/update" here
        idx=self.get_valid_index(key)
        kv=self.data[idx]
        if kv is None:
            self.count+=1
        self.data[idx]=(key,value)
        print(self.data[idx])

        
        load_fact= self.count/len(self.data) # resize once the load factor exceeds 0.7
        if load_fact>0.7:
            self.resize_rehashing()

    def delete(self,key):
        idx=self.get_valid_index(key)
        sidx=idx
        while True:
            kv= self.data[idx]
            if kv is None:
                return
            if kv!='tombstone':
                k,v=kv
                if k==key:
                    self.data[idx]='tombstone'
                    self.tombstone+=1
                    if self.tombstone>len(self.data)//2:
                        self.resize_rehashing()
                    return
            idx=(idx+1)%len(self.data)
            if idx==sidx:
                return None   
        
        
    def resize_rehashing(self):
        old_data=self.data
        self.data=[None]*(len(old_data)*2) # resizing: double the list size
        self.count=0
        self.tombstone=0

        for kv in old_data:
            if kv is not None and kv!='tombstone':
                k,v=kv
                self[k]=v # re-insert so each key lands at its index in the new list
    
    def __iter__(self):
        # Note: slots holding the 'tombstone' marker are yielded as well
        return (x for x in self.data if x is not None)
    
    def __len__(self):
        # O(1), since the count is tracked on insert; len([x for x in self]) would be O(n)
        return self.count
    
    def __repr__(self):
        from textwrap import indent
        pairs = [indent("{} : {}".format(repr(kv[0]), repr(kv[1])), '  ') for kv in self]
        return "{\n" + "{}".format('\n'.join(pairs)) + "\n}"
    
    def __str__(self):
        return repr(self)

If the `HashTable` class was defined correctly, the following cells should output `True`.

In [55]:
# Create a hash table
table = HashTable()

In [56]:
# Insert some key-value pairs
table['a'] = 1
table['b'] = 34
table['ab']=100
table['ba']=101

('a', 1)
('b', 34)
('ab', 100)
('ba', 101)


In [58]:
table.delete('ba')

In [298]:
list(table)

[('a', 1), ('b', 34), ('ab', 100), 'tombstone']

In [299]:
# Retrieve the inserted values
table['a'] == 1 and table['b'] == 34

True

In [143]:
# Update a value
table['a'] = 99

# Check the updated value
table['a'] == 99

True

In [300]:
table['ba']

In [277]:
table['ba']=101

('ba', 101)


In [178]:
get_index([None]*4096,'ab')

195

In [180]:
data=[None]*4096

In [206]:
del list

In [289]:
table.delete('ab')

In [205]:
# Get a list of key-value pairs
list(table) 

[('a', 1), ('b', 34), ('ab', 100), ('ba', 101)]

Since we have also implemented the `__repr__` and `__str__` functions, the output of the next cell should be:

```
{
  'b' : 34
  'a' : 99
}
```

In [202]:
table

{
  'a' : 1
  'b' : 34
  'ab' : 100
  'ba' : 100
}

In [158]:
table['d']='69'

In [59]:
len(table)

4

Let's save our work before continuing.

In [49]:
import jovian

In [None]:
jovian.commit(project=project)

[jovian] Attempting to save notebook..


### (Optional) Hash Table Improvements

Here are some more improvements/changes you can make to your hash table implementation:

* **Track the size of the hash table** i.e. number of key-value pairs so that `len(table)` has complexity O(1). **Done by me**
* **Implement deletion with tombstones** as described here: https://research.cs.vt.edu/AVresearch/hashing/deletion.php
* **Implement dynamic resizing** to automatically grow/shrink the data list: https://charlesreid1.com/wiki/Hash_Maps/Dynamic_Resizing
* **Implement separate chaining**, an alternative to linear probing for collision resolution: https://www.youtube.com/watch/T9gct6Dx-jo
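Separate chaining is the one improvement in this list that the notebook does not implement. As a rough sketch only (the class name `ChainedHashTable`, the default size of 16 and the reuse of the character-sum hash are all assumptions for illustration), each slot can hold a small list of key-value pairs instead of a single pair:

```python
class ChainedHashTable:
    """Hypothetical sketch: each slot holds a list (bucket) of (key, value) pairs."""

    def __init__(self, max_size=16):
        # One independent empty list per slot
        self.buckets = [[] for _ in range(max_size)]

    def _bucket(self, key):
        # Same character-sum hashing used earlier in this notebook
        return self.buckets[sum(ord(ch) for ch in key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)   # key already present: update in place
                return
        bucket.append((key, value))        # new key: extend the chain

    def find(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def list_all(self):
        return [k for bucket in self.buckets for k, _ in bucket]


chained = ChainedHashTable()
chained.insert('listen', 99)
chained.insert('silent', 200)   # collides with 'listen' and shares its bucket
print(chained.find('listen'), chained.find('silent'))  # 99 200
```

With this layout a collision never displaces another key, and deletion needs no tombstones, at the cost of an extra list per slot.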


## Implementing the First Three Improvements Above

In [61]:
def get_valid_index(data_list, key):
    # Start with the index returned by get_index
    idx = get_index(data_list, key)
    first_tombstone = None  # first reusable slot seen while probing
    start_idx = idx

    while True:
        # Get the key-value pair stored at idx
        kv = data_list[idx]

        if kv == 'tombstone':
            # Remember the first tombstone, but keep probing for the key
            if first_tombstone is None:
                first_tombstone = idx
        elif kv is None:
            # The key is absent: reuse the first tombstone if there was one
            return first_tombstone if first_tombstone is not None else idx
        else:
            # If the stored key matches the given key, return the index
            k, v = kv
            if k == key:
                return idx

        # Move to the next index, going back to the start at the end of the list
        idx = (idx + 1) % len(data_list)

        # A full cycle found neither the key nor an empty slot
        if idx == start_idx:
            if first_tombstone is not None:
                return first_tombstone
            raise Exception("Hashtable is full")
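The reason for the `'tombstone'` marker handled in the function above can be shown with a small experiment. The helpers `char_index` and `naive_find` and the 8-slot list are assumptions made for this sketch; they mimic a lookup that stops at the first empty slot:

```python
def char_index(data, key):
    """Character-sum hash, as used earlier in this notebook (hypothetical helper)."""
    return sum(ord(ch) for ch in key) % len(data)

def naive_find(data, key):
    """Linear-probing lookup that stops at the first empty (None) slot."""
    idx = char_index(data, key)
    for _ in range(len(data)):
        kv = data[idx]
        if kv is None:
            return None
        if kv != 'tombstone' and kv[0] == key:
            return kv[1]
        idx = (idx + 1) % len(data)
    return None

data = [None] * 8
home = char_index(data, 'listen')        # 'silent' has the same home index
data[home] = ('listen', 99)
data[(home + 1) % 8] = ('silent', 200)   # placed one slot further by probing

data[home] = None                        # naive delete: empties the slot
print(naive_find(data, 'silent'))        # None, the key has become unreachable

data[home] = 'tombstone'                 # tombstone delete: slot still counts as used
print(naive_find(data, 'silent'))        # 200
```

Emptying a slot breaks the probe chain for every key stored after it, whereas a tombstone lets lookups continue past the deleted entry.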

In [177]:
class HashTable:
    def __init__(self, initial=8):
        self.data = [None] * initial
        self.count = 0  # number of key-value pairs currently stored

    def resize(self, newsize):
        old = self.data
        self.data = [None] * newsize
        self.count = 0

        # Re-insert every live pair so it lands at its index in the new list
        for kv in old:
            if kv is not None and kv != 'tombstone':
                k, v = kv
                self[k] = v

    def __setitem__(self, key, value):
        idx = get_valid_index(self.data, key)
        kv = self.data[idx]
        if kv is None or kv == 'tombstone':
            # New key: store it and grow the list once the load factor exceeds 0.7
            self.data[idx] = (key, value)
            self.count += 1
            if self.count / len(self.data) > 0.7:
                self.resize(len(self.data) * 2)
            return
        # Existing key: get_valid_index only returns an occupied slot on a match
        self.data[idx] = (key, value)

    def __getitem__(self, key):
        idx = get_valid_index(self.data, key)
        kv = self.data[idx]
        if kv is None or kv == 'tombstone':
            return None
        return kv[1]

    def delete(self, key):
        idx = get_valid_index(self.data, key)
        kv = self.data[idx]
        if kv is None or kv == 'tombstone':
            print('It is None already')
            return None
        # Leave a tombstone so that probing for other keys can continue past this slot
        self.data[idx] = 'tombstone'
        self.count -= 1
        # Shrink the list when it is at most a quarter full
        if self.count <= len(self.data) // 4 and self.count > 8:
            self.resize(len(self.data) // 2)

    def __len__(self):
        return self.count

    def __iter__(self):
        return (kv for kv in self.data if kv is not None and kv != 'tombstone')

    def __repr__(self):
        return "{ " + ", ".join(f"{k}:{v}" for k, v in self) + " }"

    def __str__(self):
        return repr(self)

In [195]:
table=HashTable()

In [211]:
table.data

[None,
 None,
 None,
 ('ab', 90),
 None,
 ('abr', 90),
 ('abc', 3),
 ('cab', 3),
 ('abe', 90),
 None,
 ('abw', 90),
 None,
 None,
 None,
 None,
 None]

In [201]:
list(table)

[('bca', 2), ('abc', 3), ('cab', 3)]

In [226]:
table['abc']=3

In [225]:
table['cab']=3

In [222]:
table['bca']=2

In [223]:
table['ab']=90

In [220]:
table['abe']=90
table['abw']=90
table['abr']=90

In [221]:
table.delete('bca')

It is None already


In [215]:
table['bca']

2

In [71]:
len(table)

3

In [219]:
table.delete('ab')

It is None already


### (Optional) Complexity Analysis

With a good hashing function and other improvements like dynamic resizing, you can achieve the following time complexities:


| Operation      | Average-case time complexity | Worst-case time complexity |
| ----------- | ----------- |---------|
| Insert/Update      | **O(1)**    | **O(n)**|
| Find   | **O(1)**    | **O(n)**|
| Delete   | **O(1)**    | **O(n)**|
| List   | **O(n)**    | **O(n)**|
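The gap between the average and worst case in the table can be made concrete by counting how many slots linear probing examines. The function `insert_counting_probes`, the 64-slot list and the choice of keys are assumptions for this sketch, and the loop presumes the list never fills up completely:

```python
from itertools import permutations

def insert_counting_probes(data, key):
    """Insert `key` with linear probing and return how many slots were examined."""
    idx = sum(ord(ch) for ch in key) % len(data)
    probes = 1
    while data[idx] is not None and data[idx][0] != key:
        idx = (idx + 1) % len(data)
        probes += 1
    data[idx] = (key, None)
    return probes

# 24 anagrams of 'abcd' share one index: each insert walks the whole cluster
colliding_keys = [''.join(p) for p in permutations('abcd')]
slots = [None] * 64
colliding = sum(insert_counting_probes(slots, k) for k in colliding_keys)

# 24 single-letter keys hash to 24 different indices: one probe each
spread_keys = [chr(ord('a') + i) for i in range(24)]
slots = [None] * 64
spread = sum(insert_counting_probes(slots, k) for k in spread_keys)

print(colliding, spread)  # 300 24
```

When every key collides, the k-th insert examines k slots, so n inserts cost 1 + 2 + ... + n probes in total; with well-spread keys each insert costs a single probe.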


Here are some questions to ponder:

- What is average-case complexity? How does it differ from worst-case complexity?
- Do you see why insert/find/update have an average-case complexity of O(1) and a worst-case complexity of O(n)?
- How is the complexity of hash tables different from that of binary search trees?
- When should you prefer a hash table over a binary search tree, or vice versa?

Discuss your answers on the forum: https://jovian.ai/forum/c/data-structures-and-algorithms-in-python/assignment-2/88