In [1]:
import time
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline  

# CMP 3002 
## Hash Tables

## Review

## Binary Search Trees (BST)

![](./binary_search_tree.drawio.png)

- Special form of a binary tree. 
- The value of each node must be greater than (or equal to) any values in the left subtree
- The value of each node must be less than (or equal to) any values in the right subtree


## Search in a BST

Binary search trees support the following operations:
- search 
- insertion 
- deletion

Following the main property of a BST, for each node we visit while searching:

- return the node if the target is equal to the value of the node
- continue searching in the left subtree if the target value is less than the value of the node
- continue searching in the right subtree if the target value is greater than the value of the node
- return `None` (not found) if the subtree we need to continue in is empty

In [4]:
class Node:
    def __init__(self, val=0, left=None, right=None):
        self.val = val      # was `x`, which is undefined
        self.left = left    # honor the constructor arguments
        self.right = right

def search_BST(root, val):
    if root is None or val == root.val:
        node = root
    elif val < root.val:
        node = search_BST(root.left, val)
    else:
        node = search_BST(root.right, val)
    return node        
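
The same search can also be written as a loop instead of a recursion. The following is a minimal sketch, assuming the `Node` class defined above (repeated here so the cell runs on its own); the name `search_iterative` is only illustrative.

```python
class Node:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def search_iterative(root, val):
    # Walk down from the root, going left or right at each node,
    # until we find the value or fall off the tree.
    node = root
    while node is not None and node.val != val:
        node = node.left if val < node.val else node.right
    return node


# Small hand-built BST:      8
#                          /   \
#                         3     10
#                        / \      \
#                       1   6      14
root = Node(8, Node(3, Node(1), Node(6)), Node(10, None, Node(14)))

print(search_iterative(root, 6).val)  # 6
print(search_iterative(root, 7))      # None
```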

## Insertion in a BST

- The goal is to minimize the number of changes.
- To do so, we find a leaf position for the target node
- Insert it as a leaf

**Note:** Insertion starts with a search


1. Search the tree until we reach an empty (external) position
2. Add the new node there, as the left or right child of the last node visited (depending on whether the new value is lower or greater than that node's value)

**Note:** This way we maintain the property of the BST




In [6]:
def insert_BST(root, val):
    if not root:
        return Node(val)
        
    if val > root.val:            
        root.right = insert_BST(root.right, val)
    else:    
        root.left = insert_BST(root.left, val)
    return root
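
As a quick check that insertion maintains the BST property, the sketch below builds a tree from a few values and reads them back with an in-order traversal, which should give the values in sorted order. The helper `inorder` is an assumption added for illustration; `Node` and `insert_BST` are repeated so the cell is self-contained.

```python
class Node:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def insert_BST(root, val):
    # Walk down to an empty position and attach the new value as a leaf.
    if not root:
        return Node(val)
    if val > root.val:
        root.right = insert_BST(root.right, val)
    else:
        root.left = insert_BST(root.left, val)
    return root


def inorder(root):
    # Left subtree, node, right subtree: sorted order for a valid BST.
    if root is None:
        return []
    return inorder(root.left) + [root.val] + inorder(root.right)


root = None
for value in [8, 3, 10, 1, 6, 14]:
    root = insert_BST(root, value)

print(inorder(root))  # [1, 3, 6, 8, 10, 14]
```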

## Deletion in a BST

- Deletion starts with a search for the target node.
- The goal is to replace the target node with a suitable descendant so the BST property still holds
- Three cases to consider:
    1. Target node has no children, so we simply remove the node
    2. Target node has 1 child, so we use the child to replace the target
    3. Target node has 2 children, so we replace its value with that of its in-order successor or predecessor and then delete that node




## Deletion in a BST - Implementation

How to get the successor?
- Go to the right once, and then to the left as far as possible

How to get the predecessor?
- Go to the left once, and then to the right as far as possible


In [7]:
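def successor(root):
    # Smallest value in the right subtree (assumes root.right exists).
    root = root.right
    while root.left:
        root = root.left
    return root.val

def predecessor(root):
    # Largest value in the left subtree (assumes root.left exists).
    root = root.left
    while root.right:
        root = root.right
    return root.val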

In [8]:
def delete_BST(root, val):
    if not root:
        return None

    # right subtree
    if val > root.val:
        root.right = delete_BST(root.right, val)
        
    # left subtree
    elif val < root.val:
        root.left = delete_BST(root.left, val)
        
    # current node
    else:
        # case 1
        if not (root.left or root.right):
            root = None
        # has a right child (case 2 or 3)
        elif root.right:
            root.val = successor(root)
            root.right = delete_BST(root.right, root.val)
        # has no right child and has a left child (case 2)  
        else:
            root.val = predecessor(root)
            root.left = delete_BST(root.left, root.val)

    return root
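
To see the three cases in action, here is a compact, self-contained sketch. It is a variant of the cell above, not the same code: a node with a single child is replaced directly by that child, and only the two-children case copies the successor's value. The names `insert`, `delete`, `min_value`, and `inorder` are illustrative.

```python
class Node:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def insert(root, val):
    if root is None:
        return Node(val)
    if val > root.val:
        root.right = insert(root.right, val)
    else:
        root.left = insert(root.left, val)
    return root


def inorder(root):
    if root is None:
        return []
    return inorder(root.left) + [root.val] + inorder(root.right)


def min_value(root):
    # Leftmost node of a non-empty subtree holds its smallest value.
    while root.left:
        root = root.left
    return root.val


def delete(root, val):
    if root is None:
        return None
    if val > root.val:
        root.right = delete(root.right, val)
    elif val < root.val:
        root.left = delete(root.left, val)
    elif root.left is None:
        return root.right  # case 1 (no child) or case 2 (only a right child)
    elif root.right is None:
        return root.left   # case 2 (only a left child)
    else:
        # case 3: copy the in-order successor, then delete it from the right subtree
        root.val = min_value(root.right)
        root.right = delete(root.right, root.val)
    return root


root = None
for value in [8, 3, 10, 1, 6, 14, 4, 7]:
    root = insert(root, value)

root = delete(root, 4)   # case 1: 4 is a leaf
print(inorder(root))     # [1, 3, 6, 7, 8, 10, 14]
root = delete(root, 10)  # case 2: 10 has one child (14)
print(inorder(root))     # [1, 3, 6, 7, 8, 14]
root = delete(root, 3)   # case 3: 3 has two children (1 and 6)
print(inorder(root))     # [1, 6, 7, 8, 14]
```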

## Height-Balanced BST

- A BST that automatically keeps its height small in the face of arbitrary item insertions and deletions
- The height of a balanced BST with $N$ nodes is always $O(\log N)$
- The heights of the two subtrees of every node never differ by more than 1
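
Why balance matters can be seen by inserting the same keys in two different orders into a plain (non-balancing) BST. This is a small sketch under the same conventions as the cells above (an empty tree has height -1); the insertion orders are chosen only for illustration.

```python
class Node:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def insert(root, val):
    if root is None:
        return Node(val)
    if val > root.val:
        root.right = insert(root.right, val)
    else:
        root.left = insert(root.left, val)
    return root


def height(root):
    # Number of edges on the longest root-to-leaf path; -1 for an empty tree.
    if root is None:
        return -1
    return max(height(root.left), height(root.right)) + 1


def build(values):
    root = None
    for v in values:
        root = insert(root, v)
    return root


# Sorted input degenerates into a chain: every node has only a right child.
print(height(build([1, 2, 3, 4, 5, 6, 7])))  # 6
# The same keys in a "middle first" order give a perfectly balanced tree.
print(height(build([4, 2, 6, 1, 3, 5, 7])))  # 2
```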

### Exercise:

- Validate that a BST is height balanced

In [2]:
def height_BST(root):    
    if not root:
        return -1
    return max(height_BST(root.left), height_BST(root.right)) + 1


def validate_BST(root):
    
    if not root:
        return True
    
    # abs() must wrap only the height difference, not the comparison
    return abs(height_BST(root.left) - height_BST(root.right)) <= 1 \
            and validate_BST(root.left) and validate_BST(root.right)
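
The check above recomputes subtree heights at every node, so some nodes are visited many times. A possible alternative is to compute the height and the balance check in a single bottom-up pass. The sketch below is one way to do this, with hypothetical names `balanced_height` and `is_balanced`; it uses `None` as a signal that an unbalanced subtree was found.

```python
class Node:
    def __init__(self, val=0, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right


def balanced_height(root):
    """Return the height of the subtree, or None if it is not height-balanced."""
    if root is None:
        return -1
    left = balanced_height(root.left)
    if left is None:
        return None
    right = balanced_height(root.right)
    if right is None:
        return None
    if abs(left - right) > 1:
        return None
    return max(left, right) + 1


def is_balanced(root):
    return balanced_height(root) is not None


balanced = Node(2, Node(1), Node(3))
chain = Node(1, None, Node(2, None, Node(3)))

print(is_balanced(balanced))  # True
print(is_balanced(chain))     # False
```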
    

## Hash Table

- Data structure that organizes data using hash functions to support fast insertion and search

- Two kinds of hash tables:
    - hash set
    - hash map

- A hash set stores values without duplicates
- A hash map stores key-value (k, v) pairs
    - Keys cannot be duplicated


### Hash tables

- Use a hash function to map keys to buckets
- When we insert a new key, the hash function decides which bucket the key should be assigned to
- When we search for a key, the hash table uses the same hash function to find the bucket

Because the hash function computes the bucket index directly from the key, we can jump straight to the location in memory that holds the value instead of scanning every entry.
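
The idea can be sketched in a few lines. Everything here is an assumption made for illustration: the toy hash `simple_hash` (sum of character codes), the bucket count of 8, and the helper names.

```python
NUM_BUCKETS = 8  # arbitrary size chosen for the example


def simple_hash(key):
    # Toy hash for strings: sum of the character codes. Not a good hash
    # function in practice, but deterministic and easy to follow.
    return sum(ord(ch) for ch in key)


def bucket_index(key):
    # The same function is used for both insertion and search.
    return simple_hash(key) % NUM_BUCKETS


buckets = [[] for _ in range(NUM_BUCKETS)]


def insert(key):
    bucket = buckets[bucket_index(key)]
    if key not in bucket:
        bucket.append(key)


def search(key):
    # Only one bucket is inspected, not the whole table.
    return key in buckets[bucket_index(key)]


for name in ["alice", "bob", "carol"]:
    insert(name)

print(search("bob"), search("dave"))  # True False
```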


### Hash functions

- A function that maps data of arbitrary size to fixed-size values
- Many inputs can map to the same output, so a hash function generally cannot be inverted; cryptographic hash functions are additionally designed to be one-way
- Used to index hash tables
- Also used in cryptographic applications



### Hash functions

- The hash function is the most important component of a hash table
- Example: $F(x) = x \bmod 5$
- We need to pick a function with a wide range to reduce collisions
- The function should assign keys to buckets in a uniform manner
- Ideally, there is a one-to-one mapping between keys and buckets
- Hash functions are usually not perfect, and there is a trade-off between the number of buckets and the capacity of a bucket
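
A short illustration of the example function, written `x % 5` in Python. The keys are arbitrary values picked to show that several of them end up in the same bucket.

```python
from collections import defaultdict


def F(x):
    return x % 5


keys = [3, 7, 12, 20, 25, 31]

# Group the keys by the bucket that F assigns them to.
buckets = defaultdict(list)
for key in keys:
    buckets[F(key)].append(key)

for index in sorted(buckets):
    print(index, buckets[index])
# 0 [20, 25]
# 1 [31]
# 2 [7, 12]
# 3 [3]
```

With only 5 possible outputs, any set of more than 5 keys must contain a collision, which is why a wider range helps.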


### Hash functions - collisions

- Collisions are inevitable
- We need an algorithm that answers the following questions:

    - how do we organize values in the same bucket?
    - what happens if a bucket has too many keys assigned?
    - how do we search for a target value in a bucket?
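
One common set of answers is separate chaining with resizing: each bucket is a list, a search scans only that list, and the table grows when the buckets get too full. The sketch below is one possible design, not the only one; the class name `ChainedTable`, the initial 4 buckets, and the load limit of 2.0 keys per bucket are assumptions, and keys are assumed to be non-negative integers.

```python
class ChainedTable:
    def __init__(self, num_buckets=4, max_load=2.0):
        self.buckets = [[] for _ in range(num_buckets)]
        self.size = 0
        self.max_load = max_load  # average keys per bucket before growing

    def _bucket(self, key):
        return self.buckets[key % len(self.buckets)]

    def add(self, key):
        bucket = self._bucket(key)
        if key in bucket:  # linear scan inside a single bucket
            return
        bucket.append(key)
        self.size += 1
        if self.size / len(self.buckets) > self.max_load:
            self._resize()

    def contains(self, key):
        return key in self._bucket(key)

    def _resize(self):
        # Double the number of buckets and redistribute every key.
        old_keys = [k for bucket in self.buckets for k in bucket]
        self.buckets = [[] for _ in range(2 * len(self.buckets))]
        for k in old_keys:
            self._bucket(k).append(k)


table = ChainedTable()
for key in range(20):
    table.add(key)

print(len(table.buckets))  # 16 (grew from 4 to 8 to 16)
print(table.contains(13), table.contains(99))  # True False
```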

## Complexity Analysis

Assuming $N$ keys in total:

- Space complexity is $O(N)$
- Search is $O(1)$ on average, depending on the design of the table. In the worst case, when many keys land in the same bucket, it can be $O(N)$
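
The gap between the average and the worst case can be made concrete by counting key comparisons during a search. This is a sketch assuming list buckets with a linear scan inside each bucket; the helper names and the deliberately bad hash `lambda k: 0` are illustrative.

```python
def build(keys, num_buckets, hash_fn):
    buckets = [[] for _ in range(num_buckets)]
    for k in keys:
        buckets[hash_fn(k) % num_buckets].append(k)
    return buckets


def comparisons(buckets, key, hash_fn):
    # Count how many stored keys are compared before the target is found.
    count = 0
    for k in buckets[hash_fn(key) % len(buckets)]:
        count += 1
        if k == key:
            break
    return count


keys = list(range(1000))

spread = build(keys, 1024, lambda k: k)   # every key gets its own bucket
clumped = build(keys, 1024, lambda k: 0)  # every key lands in bucket 0

print(comparisons(spread, 999, lambda k: k))   # 1
print(comparisons(clumped, 999, lambda k: 0))  # 1000
```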


### Exercise 1

Design a HashSet without using any built-in hash table libraries.

Implement HashSet class:

- void add(key) Inserts the value key into the HashSet.
- bool contains(key) Returns whether the value key exists in the HashSet or not.
- void remove(key) Removes the value key from the HashSet. If key does not exist in the HashSet, do nothing.
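
One possible starting point, for comparison with your own solution: a minimal sketch using separate chaining with a fixed number of buckets. It assumes integer keys; the bucket count of 769 is an arbitrary prime picked for the example, and the table never resizes.

```python
class HashSet:
    def __init__(self, num_buckets=769):
        # Each bucket is a plain list holding the keys that hash to it.
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[key % len(self.buckets)]

    def add(self, key):
        bucket = self._bucket(key)
        if key not in bucket:  # keep values unique
            bucket.append(key)

    def contains(self, key):
        return key in self._bucket(key)

    def remove(self, key):
        bucket = self._bucket(key)
        if key in bucket:  # do nothing if the key is absent
            bucket.remove(key)


s = HashSet()
s.add(1)
s.add(2)
s.add(2)                 # duplicate, ignored
print(s.contains(2))     # True
s.remove(2)
print(s.contains(2))     # False
s.remove(42)             # absent key, no error
```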
 

### Exercise 2

Design a HashMap without using any built-in hash table libraries.

Implement the HashMap class:

- HashMap() initializes the object with an empty map.
- void put(int key, int value) inserts a (key, value) pair into the HashMap. If the key already exists in the map, update the corresponding value.
- int get(int key) returns the value to which the specified key is mapped, or -1 if this map contains no mapping for the key.
- void remove(key) removes the key and its corresponding value if the map contains the mapping for the key.
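
As with the first exercise, here is a hedged sketch of one possible design, using separate chaining where each bucket holds (key, value) tuples. It assumes integer keys, and the fixed bucket count of 769 is an arbitrary choice for the example.

```python
class HashMap:
    def __init__(self, num_buckets=769):
        # Each bucket is a list of (key, value) pairs.
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        return self.buckets[key % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # key exists: update the value
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return -1  # no mapping for this key

    def remove(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return


m = HashMap()
m.put(1, 10)
m.put(2, 20)
m.put(1, 11)      # update existing key
print(m.get(1))   # 11
print(m.get(3))   # -1
m.remove(2)
print(m.get(2))   # -1
```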