### 11. Search Trees

In this notebook we will look at searching algorithms and the data structures that let us perform fast search operations. A naive search algorithm finds a value in an unsorted collection in linear time; the data structures we will look at let us perform that operation much faster.

We previously looked at heaps, which are very fast at finding the minimum (or the maximum, but not both) value in an evolving set of values. If our use case only needs one of min or max, heaps are simple yet efficient data structures. Suppose, however, that we want to support the following set of operations, *to start with*:

- *Search*: Given a key, check if the key exists in the data structure
- *Min/Max*: Find the minimum and the maximum key (both operations supported on the same data structure, unlike a heap)
- *Predecessor/Successor*: Given a key, find the largest key smaller than it (predecessor) or the smallest key larger than it (successor)
- *SortedOutput*: Return the keys in the data structure in sorted order
- *Select*: Given a number i between 1 and n (the number of keys in the data structure), return the $i^{th}$ smallest key
- *Rank*: Given a key k, return the number of keys in the data structure that are at most k

We will start with a super simple data structure, sorted arrays, to implement the operations above.

#### Sorted Arrays


In [1]:
class BinarySearchArray:
    
    def __init__(self, nums):
        self.array = sorted(nums)
        
    def search(self, n):
        
        def search_rec_(left, right):
            if left >= right:
                return None
            else:
                mid = (left + right) // 2
                if self.array[mid] == n:
                    return mid
                elif self.array[mid] > n:
                    return search_rec_(left = left, right = mid)
                else:
                    return search_rec_(left = mid + 1, right = right)    
            
        return search_rec_(left = 0, right = len(self.array))
        
        

    def min_(self):
        return self.array[0]

    def max_(self):        
        return self.array[-1]
    
    def predecessor(self, n):
        idx = self.search(n)
        if idx is not None:
            # This loop can make the running time linear (e.g. when many elements equal n).
            # The fix is to enhance search to return the leftmost or rightmost match
            # instead of an arbitrary one, by continuing the binary search until the
            # neighbour of the returned index has a value != n. That keeps the
            # O(log(n)) guarantee; here we want to keep the implementation of search simple.
            while idx >= 0 and self.array[idx] == n:
                idx -= 1

        return (None, None) if idx is None or idx < 0 else (self.array[idx], idx)
        
    def successor(self, n):
        idx = self.search(n)
        if idx is not None:
            #See comment in predecessor on the linear time worst case and how we can make it logarithmic
            while idx < len(self.array) and self.array[idx] == n:
                idx += 1
        return (None, None) if idx is None or idx == len(self.array) else (self.array[idx], idx)
        
    def sorted_output(self):
        return list(self.array)
    
    def select(self, i):
        return self.array[i - 1] if i > 0 and i <= len(self.array) else None
    
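    def rank(self, n):
        # Number of keys <= n: binary search for the first index whose value is > n
        # (the "upper bound"); that index equals the count. Unlike the successor-based
        # version, this also works when n is the maximum key or is not in the array.
        left, right = 0, len(self.array)
        while left < right:
            mid = (left + right) // 2
            if self.array[mid] <= n:
                left = mid + 1
            else:
                right = mid
        return left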
        

In [2]:
unsorted_array = [5, 3, 8, 2, 5, 1, 7]
a = BinarySearchArray(unsorted_array)

In [3]:
print('Numbers in the array are', a.array)
print('Searching numbers 1 to 10 in the array gives index', [(i, a.search(i)) for i in range(1, 11)])

Numbers in the array are [1, 2, 3, 5, 5, 7, 8]
Searching numbers 1 to 10 in the array gives index [(1, 0), (2, 1), (3, 2), (4, None), (5, 3), (6, None), (7, 5), (8, 6), (9, None), (10, None)]


Notice that the array contains two 5s, and `search` returns the index of whichever one the recursion reaches first (here index 3). There is no guarantee that search finds the first or the last occurrence of a repeated value.
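The comment in `predecessor` notes that walking over duplicates can cost linear time. Below is a minimal sketch of the logarithmic alternative, assuming a plain sorted Python list and using the standard library's `bisect` module (not used in the original notebook) to jump straight to the leftmost or rightmost match. The helper names `predecessor` and `successor` here are hypothetical standalone functions, not the class methods above.

```python
import bisect


def predecessor(array, n):
    """Largest key strictly smaller than n in a sorted list, or None.

    bisect_left returns the index of the leftmost element >= n, so the
    element just before it (if any) is the predecessor. This is a single
    binary search, so it stays O(log n) however often n is repeated.
    """
    idx = bisect.bisect_left(array, n)
    return array[idx - 1] if idx > 0 else None


def successor(array, n):
    """Smallest key strictly larger than n in a sorted list, or None.

    bisect_right returns the index just past the rightmost element <= n.
    """
    idx = bisect.bisect_right(array, n)
    return array[idx] if idx < len(array) else None


nums = [1, 2, 3, 5, 5, 7, 8]
print(predecessor(nums, 5), successor(nums, 5))  # 3 7
print(predecessor(nums, 1), successor(nums, 8))  # None None
```

A side effect of this design is that the key no longer has to be present: `predecessor(nums, 4)` returns 3, whereas the notebook's version returns `None` for missing keys.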

Let's continue testing the remaining operations.


In [4]:
print('Predecessor of 1 should be None and we get', a.predecessor(1)[0])
print('Successor of 8 should be None and we get', a.successor(8)[0])
print('Predecessor of 5 should be 3 and we get', a.predecessor(5)[0])
print('Successor of 5 should be 7 and we get', a.successor(5)[0])
print('Sorted output is ', a.sorted_output())
print('Rank of 5 is expected to be 5 (as we have 5 numbers <= 5), got', a.rank(5))
print('Rank of 7 is expected to be 6 (as we have 6 numbers <= 7), got', a.rank(7))
print('Select 10 should return None, got', a.select(10))
print('Select 6 should return 7 (note how this is the reverse of rank), got', a.select(6))
print('Select 5 should return 5 (note how this is the reverse of rank), got', a.select(5))

Predecessor of 1 should be None and we get None
Successor of 8 should be None and we get None
Predecessor of 5 should be 3 and we get 3
Successor of 5 should be 7 and we get 7
Sorted output is  [1, 2, 3, 5, 5, 7, 8]
Rank of 5 is expected to be 5 (as we have 5 numbers <= 5), got 5
Rank of 7 is expected to be 6 (as we have 6 numbers <= 7), got 6
Select 10 should return None, got None
Select 6 should return 7 (note how this is the reverse of rank), got 7
Select 5 should return 5 (note how this is the reverse of rank), got 5


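Let's summarize the running times that sorted arrays support in the following table. (Our `predecessor` and `successor` only meet the logarithmic bound when keys are distinct; with repeated keys they need the leftmost/rightmost search described in the code comment.)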

|Operation|Running Time|
|--|--|
|Search|$\Theta(\log n)$|
|Min|$\Theta(1)$|
|Max|$\Theta(1)$|
|Predecessor|$\Theta(\log n)$|
|Successor|$\Theta(\log n)$|
|SortedOutput|$\Theta(n)$|
|Select|$\Theta(1)$|
|Rank|$\Theta(\log n)$|


Now that we have a working implementation of these eight operations on sorted arrays with reasonably fast running times, we can see their main weakness. Many real-world applications are dynamic, so supporting insertion and deletion of elements is crucial. Inserting into or deleting from a sorted array takes linear time in the worst case, because every element after the affected position has to be shifted, so we need a different data structure to handle dynamic data.
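To make the shifting cost concrete, here is a minimal sketch assuming a plain Python list kept in sorted order. Finding the position is a cheap binary search, but `list.insert` and `del` must move every later element. The helpers `sorted_insert` and `sorted_delete` are hypothetical names for this illustration; the standard library's `bisect.insort` performs the same search-then-insert in one call.

```python
import bisect


def sorted_insert(array, value):
    """Insert value into the sorted list, keeping it sorted; returns the index used."""
    idx = bisect.bisect_right(array, value)  # O(log n): find where value belongs
    array.insert(idx, value)                 # O(n): shifts array[idx:] one slot right
    return idx


def sorted_delete(array, value):
    """Remove one occurrence of value; returns False if it is absent."""
    idx = bisect.bisect_left(array, value)   # O(log n): leftmost candidate
    if idx < len(array) and array[idx] == value:
        del array[idx]                       # O(n): shifts array[idx + 1:] one slot left
        return True
    return False


nums = [1, 2, 3, 5, 5, 7, 8]
sorted_insert(nums, 0)   # worst case: all 7 existing elements move
sorted_delete(nums, 5)
print(nums)  # [0, 1, 2, 3, 5, 7, 8]
```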

We will now look at Search trees.

#### Search Trees

Balanced search trees achieve all of these operations in $\Theta(\log n)$ time, except sorted output, which runs in $\Theta(n)$ time.

The following table compares the running times:

|Operation|Sorted Array|*Balanced* Search Tree|
|--|--|--|
|Search|$\Theta(\log n)$|$\Theta(\log n)$|
|Min|$\Theta(1)$|$\Theta(\log n)$|
|Max|$\Theta(1)$|$\Theta(\log n)$|
|Predecessor|$\Theta(\log n)$|$\Theta(\log n)$|
|Successor|$\Theta(\log n)$|$\Theta(\log n)$|
|SortedOutput|$\Theta(n)$|$\Theta(n)$|
|Select|$\Theta(1)$|$\Theta(\log n)$|
|Rank|$\Theta(\log n)$|$\Theta(\log n)$|
|**Insert**|$\Theta(n)$|$\Theta(\log n)$|
|**Delete**|$\Theta(n)$|$\Theta(\log n)$|


The search tree running-time guarantees hold only for *balanced* binary search trees.
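The table's logarithmic bounds for Select and Rank also assume each node records how many keys its subtree holds; the notebook's `Node` only has `count` (the copies of one key). Below is a minimal sketch of that augmentation, assuming a hypothetical `size` field and an unbalanced recursive insert written only for this illustration. Both queries walk a single root-to-leaf path, so they take time proportional to the tree's height, which is $\Theta(\log n)$ only when the tree is balanced. A `delete` would likewise have to decrement `size` along its path.

```python
class Node:
    """BST node augmented with the (hypothetical) size of its subtree."""
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None
        self.count = 1   # copies of this key, as in the notebook's Node
        self.size = 1    # total keys, duplicates included, in this subtree


def size(node):
    return 0 if node is None else node.size


def insert(node, value):
    """Unbalanced insert that keeps `size` up to date; returns the subtree root."""
    if node is None:
        return Node(value)
    if value == node.value:
        node.count += 1
    elif value < node.value:
        node.left = insert(node.left, value)
    else:
        node.right = insert(node.right, value)
    node.size += 1  # one more key lives somewhere under this node
    return node


def select(node, i):
    """Return the i-th smallest key (1-indexed), or None if i is out of range."""
    while node is not None:
        left = size(node.left)
        if i <= left:
            node = node.left                 # answer is in the left subtree
        elif i <= left + node.count:
            return node.value                # answer is this node's key
        else:
            i -= left + node.count           # skip left subtree and this node
            node = node.right
    return None


def rank(node, k):
    """Return the number of keys <= k."""
    total = 0
    while node is not None:
        if k < node.value:
            node = node.left
        else:
            # this node and its whole left subtree are <= k
            total += size(node.left) + node.count
            if k == node.value:
                break
            node = node.right
    return total


root = None
for v in [5, 3, 8, 2, 5, 1, 7]:
    root = insert(root, v)
print(rank(root, 5), rank(root, 7))      # 5 6
print(select(root, 5), select(root, 6))  # 5 7
```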

Implementing self-balancing trees is not trivial, so our initial implementation does not rebalance itself; instead it relies on the keys being inserted in random, not sorted, order. We will see how inserting sorted keys into such a simple tree degrades its performance.
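A quick experiment illustrates why insertion order matters for an unbalanced tree. This is a sketch assuming a stripped-down BST (the `Node`, `insert`, `height` and `build` helpers are hypothetical and iterative, to avoid Python's recursion limit on deep trees), not the notebook's class. Sorted keys make every new node the right child of the previous one, so the tree becomes a linked list whose height equals n; shuffled keys typically give a height that is only a small multiple of $\log_2 n$.

```python
import random


class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def insert(root, value):
    """Plain unbalanced BST insert, done iteratively; returns the root."""
    node = Node(value)
    if root is None:
        return node
    current = root
    while True:
        if value < current.value:
            if current.left is None:
                current.left = node
                return root
            current = current.left
        else:
            if current.right is None:
                current.right = node
                return root
            current = current.right


def height(root):
    """Number of levels in the tree, counted level by level (no recursion)."""
    depth = 0
    level = [root] if root is not None else []
    while level:
        depth += 1
        level = [child for node in level
                 for child in (node.left, node.right) if child is not None]
    return depth


def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root


n = 1000
sorted_keys = list(range(n))
shuffled_keys = sorted_keys[:]
random.shuffle(shuffled_keys)

print(height(build(sorted_keys)))    # 1000: every node has only a right child
print(height(build(shuffled_keys)))  # varies from run to run, but far below 1000
```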


In [92]:
class BinarySearchTree:
    
    class Node:
        
        def __init__(self, v):
            self.parent = None
            self.left = None
            self.right = None
            self.count = 1
            self.value = v
            
    
    def __init__(self):
        self.root = None
        
    def insert(self, v):
        def insert_(current, parent, isLeft):
            if current is None:
                n = self.Node(v)
                n.parent = parent
                if isLeft:
                    parent.left = n
                else:
                    parent.right = n
            else:
                if v == current.value:
                    current.count += 1
                elif v < current.value:
                    insert_(current.left, current, True)
                else:
                    insert_(current.right, current, False)
    
        if self.root is None:
            self.root = self.Node(v)
        else:
            insert_(self.root, None, None)
            
        
    def delete(self, v):
        pass
        
    def search(self, n):        
        def search_(current):
            if current is None:
                return None
            elif current.value == n:
                return current
            elif current.value > n:
                return search_(current.left)
            else:
                return search_(current.right)
            
        return search_(self.root)

    def __min__(self, node):
        if node is None:
            return None
        if node.left is None:
            return node
        else:
            return self.__min__(node.left)
        
    def min_(self):
        return None if self.root is None else self.__min__(self.root).value

    def __max__(self, node):
        if node is None:
            return None
        if node.right is None:
            return node
        else:
            return self.__max__(node.right)
        
    def max_(self):
        return None if self.root is None else self.__max__(self.root).value
    
    def predecessor(self, n):
        node = self.search(n)
        if node is not None:
            if node.left is None:
                while node.parent is not None and node == node.parent.left:
                    node = node.parent
                
                if node.parent is None:
                    #Case when we are trying to find the predecessor of minimum
                    return None
                else:
                    return node.parent.value
            else:
                return self.__max__(node.left).value
        else:
            return None
            
        
    def successor(self, n):
        pass

    def sorted_output(self):
        res = []
        def inorder_(node):
            if node is not None:
                inorder_(node.left)
                res.extend([node.value] * node.count)
                inorder_(node.right)
        inorder_(self.root)
        return res
    
    def select(self, i):
        pass
    
    def rank(self, n):
        pass

In [93]:
bst = BinarySearchTree()
bsa = BinarySearchArray(unsorted_array)
for i in unsorted_array:
    bst.insert(i)
print('Searching numbers 1 to 10 in the Tree gives', [(i, bst.search(i) is not None) for i in range(1, 11)])
print('\nSearching numbers 1 to 10 in the Array gives', [(i, bsa.search(i) is not None) for i in range(1, 11)])

Searching numbers 1 to 10 in the Tree gives [(1, True), (2, True), (3, True), (4, False), (5, True), (6, False), (7, True), (8, True), (9, False), (10, False)]

Searching numbers 1 to 10 in the Array gives [(1, True), (2, True), (3, True), (4, False), (5, True), (6, False), (7, True), (8, True), (9, False), (10, False)]



As we see above, the binary search tree and the sorted array give identical search results.

In [102]:
print('Minimum in BinarySearchArray is', bsa.min_(), ', minimum in BinarySearchTree is', bst.min_())
print('Maximum in BinarySearchArray is', bsa.max_(), ', maximum in BinarySearchTree is', bst.max_())
print("=" * 100)
print('Predecessors using BinarySearchArray for all elements give', 
      [(i, bsa.predecessor(i)[0]) for i in unsorted_array])
print('Predecessors using BinarySearchTree for all elements give', 
      [(i, bst.predecessor(i)) for i in unsorted_array])
print("=" * 100)
print('Successor using BinarySearchArray for all elements give', 
      [(i, bsa.successor(i)[0]) for i in unsorted_array])
# TODO: BinarySearchTree.successor is still a stub (it returns None)
print('Successor using BinarySearchTree for all elements give', 
      [(i, bst.successor(i)) for i in unsorted_array])

print("=" * 100)
print('Sorted output using BinarySearchArray for all elements give', bsa.sorted_output())
print('Sorted output using BinarySearchTree for all elements give', bst.sorted_output())


Minimum in BinarySearchArray is 1 , minimum in BinarySearchTree is 1
Maximum in BinarySearchArray is 8 , maximum in BinarySearchTree is 8
Predecessors using BinarySearchArray for all elements give [(5, 3), (3, 2), (8, 7), (2, 1), (5, 3), (1, None), (7, 5)]
Predecessors using BinarySearchTree for all elements give [(5, 3), (3, 2), (8, 7), (2, 1), (5, 3), (1, None), (7, 5)]
Successor using BinarySearchArray for all elements give [(5, 7), (3, 5), (8, None), (2, 3), (5, 7), (1, 2), (7, 8)]
Successor using BinarySearchTree for all elements give [(5, None), (3, None), (8, None), (2, None), (5, None), (1, None), (7, None)]
Sorted output using BinarySearchArray for all elements give [1, 2, 3, 5, 5, 7, 8]
Sorted output using BinarySearchTree for all elements give [1, 2, 3, 5, 5, 7, 8]
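The tree's `successor` is still a stub, which is why every tree successor printed above is `None`. Below is a sketch of the missing logic: it mirrors `predecessor` with left and right swapped and max replaced by min. It assumes a stripped-down node class with parent pointers; the helper names are hypothetical, and repeated keys are simply skipped rather than counted, which does not change the successor.

```python
class Node:
    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent
        self.left = None
        self.right = None


def insert(root, value):
    """Unbalanced insert that sets parent pointers; a repeated key is skipped."""
    if root is None:
        return Node(value)
    current = root
    while current.value != value:  # stops at the existing or newly created node
        if value < current.value:
            if current.left is None:
                current.left = Node(value, parent=current)
            current = current.left
        else:
            if current.right is None:
                current.right = Node(value, parent=current)
            current = current.right
    return root


def search(root, value):
    node = root
    while node is not None and node.value != value:
        node = node.left if value < node.value else node.right
    return node


def successor(root, value):
    """Smallest key larger than value, or None (also None if value is absent)."""
    node = search(root, value)
    if node is None:
        return None
    if node.right is not None:
        # Case 1: the successor is the minimum of the right subtree.
        node = node.right
        while node.left is not None:
            node = node.left
        return node.value
    # Case 2: climb while we are a right child; the first ancestor reached
    # from its left side is the successor (None if we climb past the root).
    while node.parent is not None and node is node.parent.right:
        node = node.parent
    return None if node.parent is None else node.parent.value


keys = [5, 3, 8, 2, 5, 1, 7]
root = None
for k in keys:
    root = insert(root, k)
print([(k, successor(root, k)) for k in keys])
# [(5, 7), (3, 5), (8, None), (2, 3), (5, 7), (1, 2), (7, 8)]
```

This matches the successors produced by `BinarySearchArray` above.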
