# COMP0005 - GROUP COURSEWORK
# Experimental Evaluation of Search Data Structures and Algorithms

The cell below defines **AbstractSearchInterface**, an interface to support basic insert/search operations; you will need to implement it three times to realise your three search data structures of choice among: (1) *2-3 Tree*; (2) *AVL Tree*; (3) *LLRB BST*; (4) *B-Tree*; and (5) *Scapegoat Tree*. <br><br>**Do NOT modify the next cell** - use the dedicated cells further below for your implementation instead. <br>

In [25]:
# DO NOT MODIFY THIS CELL

from abc import ABC, abstractmethod  

class AbstractSearchInterface(ABC):
    '''
    Abstract class to support search/insert operations (plus underlying data structure)
    
    '''
        
    @abstractmethod
    def insertElement(self, element):     
        '''
        Insert an element in a search tree
            Parameters:
                    element: string to be inserted in the search tree (string)

            Returns:
                    "True" after successful insertion, "False" if element is already present (bool)
        '''
        
        pass 
    

    @abstractmethod
    def searchElement(self, element):
        '''
        Search for an element in a search tree
            Parameters:
                    element: string to be searched in the search tree (string)

            Returns:
                    "True" if element is found, "False" otherwise (bool)
        '''

        pass
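To make the interface contract concrete, the sketch below defines a reference store backed by Python's built-in `set`. It also includes a hypothetical `check_against_oracle` helper that replays the same operations on a tree and on that reference and fails on the first disagreement, including the `False` expected from a duplicate insert. Both names are illustrative assumptions, not part of the coursework API. The class duck-types `insertElement`/`searchElement` rather than subclassing `AbstractSearchInterface`, so the block runs on its own. It is a testing aid, not one of the three required structures.

```python
class SetOracle:
    """Reference insert/search store backed by a built-in set (testing sketch).
    Mirrors the AbstractSearchInterface contract: insertElement returns False
    for a duplicate, searchElement returns membership."""

    def __init__(self):
        self._items = set()

    def insertElement(self, element):
        if element in self._items:
            return False
        self._items.add(element)
        return True

    def searchElement(self, element):
        return element in self._items


def check_against_oracle(structure, data, probes):
    """Insert every item of `data` into both `structure` and a fresh SetOracle,
    then search for every item of `data` and `probes` (keys expected to be
    absent). Raises AssertionError on the first mismatching return value."""
    oracle = SetOracle()
    for item in data:
        got, expected = structure.insertElement(item), oracle.insertElement(item)
        if got != expected:
            raise AssertionError(f"insertElement({item!r}): got {got}, expected {expected}")
    for item in list(data) + list(probes):
        got, expected = structure.searchElement(item), oracle.searchElement(item)
        if got != expected:
            raise AssertionError(f"searchElement({item!r}): got {got}, expected {expected}")
    return True
```

For example, `check_against_oracle(LLRBBST(), data, ["not-present"])` exercises both the success and failure paths. Including repeated strings in `data` also checks duplicate handling.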

Use the cell below to define any auxiliary data structures and Python functions you may need. Leave the implementation of the main API to the next code cells instead.

In [26]:
# ADD AUXILIARY DATA STRUCTURE DEFINITIONS AND HELPER CODE HERE

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

class LLRBNode(Node):
    def __init__(self, key, is_red=True):
        super().__init__(key)
        self.is_red = is_red
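The sketch below shows two auxiliary functions that could live in this cell, on the assumption that the evaluation will compare tree shapes as well as running times. `tree_height` counts the nodes on the longest root-to-leaf path, so an empty tree has height 0 and a single node has height 1. `is_bst` checks the ordering invariant that the binary trees here (LLRB BST, Scapegoat Tree) rely on. Both are hypothetical helpers and read only `key`, `left` and `right`, so they work on `Node`, `LLRBNode` or any similar node type.

```python
def tree_height(node):
    """Number of nodes on the longest root-to-leaf path (0 for an empty tree).
    The deepest node therefore sits at depth tree_height(root) - 1, with the
    root at depth 0."""
    if node is None:
        return 0
    return 1 + max(tree_height(node.left), tree_height(node.right))


def is_bst(node, low=None, high=None):
    """True if every key in the subtree lies strictly between low and high
    (None means unbounded) and both subtrees satisfy the same property.
    Strict bounds are used because insertElement rejects duplicates."""
    if node is None:
        return True
    if (low is not None and node.key <= low) or (high is not None and node.key >= high):
        return False
    return is_bst(node.left, low, node.key) and is_bst(node.right, node.key, high)
```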

Use the cell below to implement the requested API by means of **2-3 Tree** (if among your chosen data structures).

In [27]:
class TwoThreeTree(AbstractSearchInterface):
        
    def insertElement(self, element):
        inserted = False
        # ADD YOUR CODE HERE
      
        
        return inserted
    
    

    def searchElement(self, element):     
        found = False
        # ADD YOUR CODE HERE

        
        return found    

Use the cell below to implement the requested API by means of **AVL Tree** (if among your chosen data structures).

In [28]:
class AVLTree(AbstractSearchInterface):
        
    def insertElement(self, element):
        inserted = False
        # ADD YOUR CODE HERE
      
        
        return inserted
    
    

    def searchElement(self, element):     
        found = False
        # ADD YOUR CODE HERE

        
        return found  

Use the cell below to implement the requested API by means of **LLRB BST** (if among your chosen data structures).

In [50]:


class LLRBBST(AbstractSearchInterface):
        
    def __init__(self):
        self.root = None

    def isRed(self, node):
        if node is None:
            return False
        return node.is_red

    def rotateLeft(self, h):
        x = h.right
        h.right = x.left
        x.left = h
        # Inherit h's color; h becomes red
        x.is_red = h.is_red
        h.is_red = True
        return x

    def rotateRight(self, h):
        x = h.left
        h.left = x.right
        x.right = h
        # Inherit h's color; h becomes red
        x.is_red = h.is_red
        h.is_red = True
        return x

    def flipColors(self, h):
        h.is_red = not h.is_red
        if h.left:
            h.left.is_red = not h.left.is_red
        if h.right:
            h.right.is_red = not h.right.is_red

    def _insert(self, h, key):
        if h is None:
            return LLRBNode(key, True), True

        inserted = False 
        if key < h.key:
            h.left, inserted = self._insert(h.left, key)
        elif key > h.key:
            h.right, inserted = self._insert(h.right, key)
        else:
            return h, False

        if self.isRed(h.right) and not self.isRed(h.left):
            h = self.rotateLeft(h)
        if self.isRed(h.left) and self.isRed(h.left.left):
            h = self.rotateRight(h)
        if self.isRed(h.left) and self.isRed(h.right):
            self.flipColors(h)

        return h, inserted

    def insertElement(self, key):
        self.root, inserted = self._insert(self.root, key)
        if inserted:
            # Root must always be black after insertion
            self.root.is_red = False
        return inserted

    def searchElement(self, key):
        current = self.root
        while current is not None:
            if key < current.key:
                current = current.left
            elif key > current.key:
                current = current.right
            else:
                return True
        return False

    def inorderTraversal(self):
        """
        Return a list of all keys in ascending order.
        """
        result = []
        self._inorderTraversalHelper(self.root, result)
        return result

    def _inorderTraversalHelper(self, node, result):
        """
        Private helper for the in-order traversal.
        """
        if node is None:
            return
        self._inorderTraversalHelper(node.left, result)
        result.append(node.key)
        self._inorderTraversalHelper(node.right, result)
    
    
if __name__ == "__main__":
    tree = LLRBBST()
    tree.insertElement("hello")
    tree.insertElement("hell")
    tree.insertElement("hel")
    tree.insertElement("he")
    # Jupyter does not echo a bare expression inside an if-block, so print it
    print(tree.inorderTraversal())
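The LLRB code above maintains these invariants:

* the root is black;
* no red link leans right;
* no node has two red links in a row;
* every path from the root to a null link crosses the same number of black nodes.

The hypothetical checker `check_llrb` is a sketch that verifies these invariants and returns the black height. It assumes nodes expose `key`, `left`, `right` and `is_red`, as `LLRBNode` does. It raises `ValueError` rather than using `assert`, so the checks still run under `python -O`.

```python
def check_llrb(root):
    """Return the black height of the left-leaning red-black tree rooted at
    `root`, raising ValueError if an LLRB invariant is violated."""

    def is_red(node):
        return node is not None and node.is_red

    def black_height(node):
        if node is None:
            return 0
        if is_red(node.right):
            raise ValueError(f"red right link below {node.key!r}")
        if is_red(node) and is_red(node.left):
            raise ValueError(f"two red links in a row at {node.key!r}")
        left_bh = black_height(node.left)
        if left_bh != black_height(node.right):
            raise ValueError(f"unequal black height below {node.key!r}")
        # Red nodes are glued to their parent (3-node), so only black ones count
        return left_bh + (0 if node.is_red else 1)

    if is_red(root):
        raise ValueError("root must be black")
    return black_height(root)
```

Calling `check_llrb(tree.root)` after every `insertElement` in a randomised test is a cheap way to catch rotation or colour-flip mistakes early.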

Use the cell below to implement the requested API by means of **B-Tree** (if among your chosen data structures).

In [30]:
class BTree(AbstractSearchInterface):
        
    def insertElement(self, element):
        inserted = False
        # ADD YOUR CODE HERE
      
        
        return inserted
    
    

    def searchElement(self, element):     
        found = False
        # ADD YOUR CODE HERE

        
        return found

Use the cell below to implement the requested API by means of **Scapegoat Tree** (if among your chosen data structures).

In [31]:
import math

class ScapegoatTree(AbstractSearchInterface):

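    def __init__(self, alpha):
        if not 0.5 < alpha < 1:
            raise ValueError("alpha must lie in the open interval (0.5, 1)")
        self.root = None
        self.noOfNodes = 0
        self.alpha = alpha      # balance constant in the range (0.5, 1)
        self.maxDepth = 1


    def size(self, node: Node) -> int:
        # Number of nodes in the subtree rooted at node (linear in subtree size)
        if node is None:
            return 0
        return 1 + self.size(node.left) + self.size(node.right)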


    def updateMaxDepth(self):
        #This is only called when a node is inserted
        self.maxDepth = math.floor(math.log(self.noOfNodes, 1/self.alpha))

    def insertHelper(self, insertNode: Node, treeNode: Node, nodePath: list[Node]):
        nodePath.append(treeNode)
        if insertNode.key < treeNode.key:
            if treeNode.left is None:
                return nodePath, "l"
            return self.insertHelper(insertNode, treeNode.left, nodePath)
        elif insertNode.key > treeNode.key:
            if treeNode.right is None:
                return nodePath, "r"
            return self.insertHelper(insertNode, treeNode.right, nodePath)
        return None             #If None is returned then insertNode.key == treeNode.key (insertNode key already exists in tree)
    

    def connectInsertionNode(self, insertedPathAndDir, insertedElement):
        if insertedPathAndDir[1] == "l":
            insertedPathAndDir[0][-1].left = insertedElement
        else:
            insertedPathAndDir[0][-1].right = insertedElement


    def insertElement(self, element):
        inserted = False

        if self.root is None:           #inserting initial item
            self.root = Node(element)
            self.noOfNodes += 1
            self.updateMaxDepth()
            return True
        
        insertedElement = Node(element)
        insertedPathAndDir = self.insertHelper(insertedElement, self.root, [])
        if insertedPathAndDir:
            self.connectInsertionNode(insertedPathAndDir, insertedElement)
            inserted = True
            self.noOfNodes += 1
            self.updateMaxDepth()
            if len(insertedPathAndDir[0]) > self.maxDepth:
                self.rebuildSubtree(self.scapegoat(insertedPathAndDir))
        return inserted
    

    def scapegoat(self, insertedPathAndDir):
        # Walk back up the insertion path until we reach an ancestor whose child
        # on the path holds more than alpha of its weight: that ancestor is the scapegoat
        path, direction = insertedPathAndDir
        currentNode = path[-1].left if direction == "l" else path[-1].right
        currentSize = 1                 # the newly inserted node is a leaf

        while path:
            parentNode = path.pop()
            siblingNode = parentNode.right if currentNode is parentNode.left else parentNode.left
            parentSize = currentSize + self.size(siblingNode) + 1

            if currentSize / parentSize > self.alpha:
                # Return the scapegoat and its parent (None if the scapegoat is the root)
                # so the rebuilt subtree can be reattached
                return parentNode, (path[-1] if path else None)
            currentNode, currentSize = parentNode, parentSize

        # No alpha-weight-unbalanced ancestor was found: rebuild from the root
        return self.root, None


    def inOrderTraversal(self, node: Node):
        if node is None:
            return []
        return self.inOrderTraversal(node.left) + [node] + self.inOrderTraversal(node.right)


    def rebuildSubtree(self, scapegoatPair):
        scapegoatNode, parentNode = scapegoatPair
        # Flatten the subtree into sorted order, then rebuild it perfectly balanced
        orderedNodeList = self.inOrderTraversal(scapegoatNode)
        subRoot = self.rebuildHelper(orderedNodeList, 0, len(orderedNodeList) - 1)
        if parentNode is None:          # the scapegoat was the root
            self.root = subRoot
        elif parentNode.left is scapegoatNode:
            parentNode.left = subRoot
        else:
            parentNode.right = subRoot


    def rebuildHelper(self, nodeList: list[Node], start: int, end: int):
        # Make the middle node of nodeList[start..end] the root and recurse on each half
        if start > end:
            return None
        rootIndex = (start + end) // 2
        root = nodeList[rootIndex]
        root.left = self.rebuildHelper(nodeList, start, rootIndex - 1)
        root.right = self.rebuildHelper(nodeList, rootIndex + 1, end)
        return root
            

    def searchElement(self, element):
        # Iterative BST search; unlike reusing insertHelper, this is safe on an
        # empty tree (root is None) and builds no path list
        current = self.root
        while current is not None:
            if element < current.key:
                current = current.left
            elif element > current.key:
                current = current.right
            else:
                return True
        return False
    


# Test data adapted from GeeksforGeeks; inspecting the final tree in the debugger shows the same shape as theirs, so we assume this works for now
tree = ScapegoatTree(0.67)
tree.insertElement("7") 
tree.insertElement("6") 
tree.insertElement("8") 
tree.insertElement("5") 
tree.insertElement("9") 
tree.insertElement("2") 
tree.insertElement("1") 
tree.insertElement("4") 
tree.insertElement("0") 
tree.insertElement("3") 
tree.insertElement("3.5")
print(tree.searchElement("4"))
print(tree.searchElement("11"))
print([x.key for x in tree.inOrderTraversal(tree.root)])
    

True
False
['0', '1', '2', '3', '3.5', '4', '5', '6', '7', '8', '9']
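`updateMaxDepth` sets the rebuild threshold to $h_\alpha(n) = \lfloor \log_{1/\alpha} n \rfloor$. A newly inserted node whose depth exceeds this value (with the root at depth 0) triggers a rebuild, so `alpha` directly controls how strict the tree is. The hypothetical helper `depth_threshold` below evaluates the same formula. It writes the logarithm as `math.log(n) / math.log(1 / alpha)`, which is equivalent to the two-argument `math.log` used above. It could be used to compare candidate values of `alpha` before running experiments. A smaller `alpha` gives a lower threshold, which typically means a shallower tree at the cost of more frequent rebuilds.

```python
import math


def depth_threshold(n, alpha):
    """h_alpha(n) = floor(log_{1/alpha}(n)): the depth (root = 0) that a newly
    inserted node must exceed before a scapegoat rebuild is triggered."""
    if not 0.5 < alpha < 1:
        raise ValueError("alpha must lie in the open interval (0.5, 1)")
    if n < 1:
        raise ValueError("n must be at least 1")
    return math.floor(math.log(n) / math.log(1 / alpha))


# Compare how strict different alpha values are for a tree of 1000 keys
for alpha in (0.55, 0.67, 0.8):
    print(alpha, depth_threshold(1000, alpha))
```

For 1000 keys this prints thresholds of 11, 17 and 30 respectively. For the 11-key test tree above with `alpha = 0.67`, the threshold is 5.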


Use the cell below to implement the **synthetic data generator** needed by your experimental framework (be mindful of code readability and reusability).

In [51]:
import string
import random

class TestDataGenerator():
    
    def __init__(self, num_samples=1000):
        self.num_samples = num_samples
        self.letters = string.ascii_letters
        self.lowercase = string.ascii_lowercase

    #random strings of fixed length
    def gen_rand(self, strlen):
        result = []
        for _ in range(self.num_samples):
            result.append(''.join(random.choices(self.letters, k=strlen)))
        return result
    
    # of the form: [aaa, aab, aac, ..., aaz, aba, abb, ...] (or the reverse)
    def gen_sorted(self, strlen, reverse=False):
        characters = self.lowercase[::-1] if reverse else self.lowercase
        # Only len(characters) ** strlen distinct strings of this length exist
        if len(characters) ** strlen < self.num_samples:
            raise ValueError("Not enough combinations. Use a larger string length")

        result = []

        def generate_combinations(prefix, length):
            # Depth-first enumeration in the order given by `characters`
            if length == 0:
                result.append(prefix)
                return
            for char in characters:
                if len(result) >= self.num_samples:
                    return
                generate_combinations(prefix + char, length - 1)

        generate_combinations('', strlen)
        return result
            
    
    def gen_high_dup(self):
        pass
    
    def gen_rand_len(self):
        pass
    
    def gen_increasing_len(self):
        pass
    
    def gen_huge_len(self):
        pass
    
test_code = TestDataGenerator(10)
llrbtree = LLRBBST()
data = test_code.gen_sorted(5, True)
print(data)
for s in data:          # `s` rather than `str`, to avoid shadowing the built-in
    llrbtree.insertElement(s)
    
for s in data:
    print(llrbtree.searchElement(s))

llrbtree.inorderTraversal()

['zzzzz', 'zzzzy', 'zzzzx', 'zzzzw', 'zzzzv', 'zzzzu', 'zzzzt', 'zzzzs', 'zzzzr', 'zzzzq']
True
True
True
True
True
True
True
True
True
True


['zzzzq',
 'zzzzr',
 'zzzzs',
 'zzzzt',
 'zzzzu',
 'zzzzv',
 'zzzzw',
 'zzzzx',
 'zzzzy',
 'zzzzz']
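The generator stubs above (`gen_high_dup`, `gen_rand_len`, `gen_increasing_len`) are not specified in the notebook. The functions below are therefore only one possible interpretation of their names, written as module-level functions with an explicit `num_samples` and an optional `rng` so that the block runs on its own. Moving them into `TestDataGenerator` would mean replacing `num_samples` with `self.num_samples`. `gen_sorted_product` is a non-recursive alternative to `gen_sorted` built on `itertools.product`, which yields tuples in lexicographic order of its input.

```python
import itertools
import random
import string


def gen_high_dup(num_samples, pool_size, strlen, rng=random):
    """Assumed meaning: draw num_samples strings from a pool of at most
    pool_size distinct random strings, so most insertions hit an existing key."""
    pool = ["".join(rng.choices(string.ascii_letters, k=strlen)) for _ in range(pool_size)]
    return [rng.choice(pool) for _ in range(num_samples)]


def gen_rand_len(num_samples, min_len, max_len, rng=random):
    """Assumed meaning: random strings with lengths uniform on [min_len, max_len]."""
    return ["".join(rng.choices(string.ascii_letters, k=rng.randint(min_len, max_len)))
            for _ in range(num_samples)]


def gen_increasing_len(num_samples, start_len=1, step=1, rng=random):
    """Assumed meaning: the i-th string has length start_len + i * step."""
    return ["".join(rng.choices(string.ascii_letters, k=start_len + i * step))
            for i in range(num_samples)]


def gen_sorted_product(num_samples, strlen, reverse=False):
    """First num_samples lowercase strings of length strlen in lexicographic
    order (reverse order if reverse=True), without recursion."""
    chars = string.ascii_lowercase[::-1] if reverse else string.ascii_lowercase
    if len(chars) ** strlen < num_samples:
        raise ValueError("Not enough combinations. Use a larger string length")
    combos = itertools.product(chars, repeat=strlen)
    return ["".join(c) for c in itertools.islice(combos, num_samples)]
```

Passing a seeded `random.Random(seed)` as `rng` makes every data set reproducible across runs. Reproducibility matters when timings of different structures are compared on "the same" input.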

Use the cell below to implement the requested **experimental framework** (be mindful of code readability and reusability).

In [None]:
import timeit, random, string
import matplotlib.pyplot as plt

class ExperimentalFramework():

    def __init__(self):
        pass  # placeholder: a method body needs at least one statement

    def run_experiment(self):
        pass

    def plot_results(self):
        pass
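The sketch below shows one possible shape for the measurement behind `run_experiment`. The hypothetical helpers `time_operations` and `run_sizes` take a zero-argument factory (e.g. `LLRBBST` or `lambda: ScapegoatTree(0.67)`), so that every repeat starts from an empty structure. They report the best of several repeats, which filters out some interference from other processes. Timing uses `timeit.default_timer`, and inputs come from a seeded `random.Random` for reproducibility. The data generation is inlined only to keep the block self-contained, and searches are measured for present keys only.

```python
import random
import string
import timeit


def time_operations(structure_factory, data, repeats=3):
    """Best-of-`repeats` times (in seconds) to insert every item of `data` into
    a fresh structure, then to search for every item again."""
    best_insert = best_search = float("inf")
    for _ in range(repeats):
        structure = structure_factory()      # start each repeat from an empty structure
        start = timeit.default_timer()
        for item in data:
            structure.insertElement(item)
        best_insert = min(best_insert, timeit.default_timer() - start)

        start = timeit.default_timer()
        for item in data:                    # successful searches only
            structure.searchElement(item)
        best_search = min(best_search, timeit.default_timer() - start)
    return best_insert, best_search


def run_sizes(structure_factory, sizes, strlen=8, seed=0):
    """Map each input size n to (insert_seconds, search_seconds) measured on
    n random fixed-length strings; the seed makes the inputs reproducible."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        data = ["".join(rng.choices(string.ascii_letters, k=strlen)) for _ in range(n)]
        results[n] = time_operations(structure_factory, data)
    return results
```

For example, `results = run_sizes(LLRBBST, [1000, 2000, 4000, 8000])` returns a dict of `(insert_seconds, search_seconds)` pairs keyed by size. `plot_results` could then pass these to `plt.plot(list(results), [t[0] for t in results.values()])`.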

Use the cell below to illustrate the Python code you used to **fully evaluate** your three chosen search data structures and algorithms. The code below should illustrate, for example, how you made use of the **TestDataGenerator** class to generate test data of various sizes and properties, and how you instantiated the **ExperimentalFramework** class to evaluate each data structure using such data, collect information about their execution time, plot results, etc. Any results you illustrate in the companion PDF report should have been generated using the code below.
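One common way to turn raw timings into a statement about growth rate is a doubling-ratio analysis. If $T(n) \approx a \cdot n^b$, then $T(2n)/T(n) \approx 2^b$, so $\log_2$ of each ratio estimates the exponent $b$. The hypothetical helper `doubling_exponents` below is a sketch of that calculation for input sizes that double at every step. For $n$ insertions into a balanced search tree (total cost $\Theta(n \log n)$) the estimates should sit slightly above 1. Values near 2 would point to quadratic behaviour, for example a degenerate, unbalanced tree.

```python
import math


def doubling_exponents(sizes, times):
    """For sizes that double at each step and the matching running times,
    return log2(T(2n) / T(n)) for every consecutive pair of measurements."""
    if len(sizes) != len(times):
        raise ValueError("sizes and times must have the same length")
    exponents = []
    for i in range(1, len(sizes)):
        if sizes[i] != 2 * sizes[i - 1]:
            raise ValueError("each size must be double the previous one")
        exponents.append(math.log2(times[i] / times[i - 1]))
    return exponents
```

Because individual timings are noisy, the estimates are more meaningful for larger sizes and when each time is already a best-of-several measurement.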

In [None]:
# ADD YOUR TEST CODE HERE 



