# COMP0005 - GROUP COURSEWORK
# Experimental Evaluation of Search Data Structures and Algorithms

The cell below defines **AbstractSearchInterface**, an interface that supports basic insert/search operations. You will need to implement it three times, once for each of your three chosen search data structures from: (1) *2-3 Tree*, (2) *AVL Tree*, (3) *LLRB BST*, (4) *B-Tree*, and (5) *Scapegoat Tree*. <br><br>**Do NOT modify the next cell** - use the dedicated cells further below for your implementation instead. <br>

In [11]:
# DO NOT MODIFY THIS CELL

from abc import ABC, abstractmethod  

class AbstractSearchInterface(ABC):
    '''
    Abstract class to support search/insert operations (plus underlying data structure)
    
    '''
        
    @abstractmethod
    def insertElement(self, element):     
        '''
        Insert an element in a search tree
            Parameters:
                    element: string to be inserted in the search tree (string)

            Returns:
                    "True" after successful insertion, "False" if element is already present (bool)
        '''
        
        pass 
    

    @abstractmethod
    def searchElement(self, element):
        '''
        Search for an element in a search tree
            Parameters:
                    element: string to be searched in the search tree (string)

            Returns:
                    "True" if element is found, "False" otherwise (bool)
        '''

        pass
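
To make the return-value contract concrete, here is a minimal sketch of a class that satisfies the interface. `SetBackedSearch` is a hypothetical stand-in backed by a Python `set`, not one of the five tree structures, and the interface is restated in shortened form only so that the snippet runs on its own.

```python
from abc import ABC, abstractmethod


class AbstractSearchInterface(ABC):
    # Shortened restatement of the interface above, so the sketch is self-contained.
    @abstractmethod
    def insertElement(self, element):
        pass

    @abstractmethod
    def searchElement(self, element):
        pass


class SetBackedSearch(AbstractSearchInterface):
    # Hypothetical stand-in: a built-in set, not a search tree.
    def __init__(self):
        self._items = set()

    def insertElement(self, element):
        # False signals a duplicate; True signals a successful insertion.
        if element in self._items:
            return False
        self._items.add(element)
        return True

    def searchElement(self, element):
        return element in self._items


demo = SetBackedSearch()
first_insert = demo.insertElement("apple")    # new element
second_insert = demo.insertElement("apple")   # duplicate
found = demo.searchElement("apple")
missing = demo.searchElement("pear")
```

Here `first_insert` is `True`, `second_insert` is `False` because the element is already present, and the two searches return `True` and `False` respectively.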

Use the cell below to define any auxiliary data structures and Python functions you may need. Leave the implementation of the main API to the subsequent code cells.

In [12]:
# ADD AUXILIARY DATA STRUCTURE DEFINITIONS AND HELPER CODE HERE



Use the cell below to implement the requested API by means of a **2-3 Tree** (if among your chosen data structures).

In [13]:
class TwoThreeTree(AbstractSearchInterface):
        
    def insertElement(self, element):
        inserted = False
        # ADD YOUR CODE HERE
      
        
        return inserted
    
    

    def searchElement(self, element):     
        found = False
        # ADD YOUR CODE HERE

        
        return found    

Use the cell below to implement the requested API by means of an **AVL Tree** (if among your chosen data structures).

In [14]:
class AVLTree(AbstractSearchInterface):
        
    def insertElement(self, element):
        inserted = False
        # ADD YOUR CODE HERE
      
        
        return inserted
    
    

    def searchElement(self, element):     
        found = False
        # ADD YOUR CODE HERE

        
        return found  

Use the cell below to implement the requested API by means of an **LLRB BST** (if among your chosen data structures).

In [15]:
class LLRBBST(AbstractSearchInterface):
        
    def insertElement(self, element):
        inserted = False
        # ADD YOUR CODE HERE
      
        
        return inserted
    
    

    def searchElement(self, element):     
        found = False
        # ADD YOUR CODE HERE

        
        return found  

Use the cell below to implement the requested API by means of a **B-Tree** (if among your chosen data structures).

In [16]:
class BTree(AbstractSearchInterface):
        
    def insertElement(self, element):
        inserted = False
        # ADD YOUR CODE HERE
      
        
        return inserted
    
    

    def searchElement(self, element):     
        found = False
        # ADD YOUR CODE HERE

        
        return found

Use the cell below to implement the requested API by means of a **Scapegoat Tree** (if among your chosen data structures).

In [17]:
class ScapegoatTree(AbstractSearchInterface):
        
    def insertElement(self, element):
        inserted = False
        # ADD YOUR CODE HERE
      
        
        return inserted
    
    

    def searchElement(self, element):     
        found = False
        # ADD YOUR CODE HERE

        
        return found 

Use the cell below to implement the **synthetic data generator** needed by your experimental framework (be mindful of code readability and reusability).

In [18]:
import string
import random

class TestDataGenerator():
    # Generate one random alphanumeric string of the given length.
    @staticmethod
    def generateString(length):
        return ''.join(random.choices(string.ascii_letters + string.digits, k=length))
    
    # Generate a list of random strings with lengths in [minLength, maxLength].
    @staticmethod
    def generateData(numOfElements, minLength = 10, maxLength = 20):
        testData = []
        for i in range(numOfElements):
            length = random.randint(minLength, maxLength)
            testData.append(TestDataGenerator.generateString(length))
        return testData
    
    # Generate a list of sorted random strings.
    def generate_sorted_data(self, num_of_elements, min_length=10, max_length=20):
        test_data = self.generateData(num_of_elements, min_length, max_length)
        return sorted(test_data)
    
    # Generate a list of reverse sorted random strings.
    def generate_reverse_sorted_data(self, num_of_elements, min_length=10, max_length=20):
        test_data = self.generateData(num_of_elements, min_length, max_length)
        return sorted(test_data, reverse=True)
    
    # Generate a list of random strings with a specified ratio of duplicates.
    def generate_duplicates_data(self, num_of_elements, min_length=10, max_length=20, duplicate_ratio=0.1):
        # Derive both counts from one rounding step so they always sum to num_of_elements.
        num_duplicates = int(num_of_elements * duplicate_ratio)
        unique_data = self.generateData(num_of_elements - num_duplicates, min_length, max_length)
        duplicates = random.choices(unique_data, k=num_duplicates)
        return unique_data + duplicates
    
    # Generate a list of random strings with a fixed length.
    def generate_fixed_length_data(self, num_of_elements, fixed_length=15):
        return [self.generateString(fixed_length) for _ in range(num_of_elements)]
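
Timings are easier to compare when the same inputs can be regenerated on demand. One option, sketched here as a standalone helper, is to draw from a seeded private `random.Random` instance. The function `generate_seeded_strings` and its `seed` parameter are hypothetical and not part of the class above.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits


def generate_seeded_strings(num_of_elements, min_length=10, max_length=20, seed=0):
    # A private Random instance leaves the global random state untouched.
    rng = random.Random(seed)
    data = []
    for _ in range(num_of_elements):
        length = rng.randint(min_length, max_length)
        data.append(''.join(rng.choices(ALPHABET, k=length)))
    return data


run_a = generate_seeded_strings(5, seed=42)
run_b = generate_seeded_strings(5, seed=42)
same = run_a == run_b  # identical seed, identical data
```

Recording the seed alongside each result would let a reported measurement be reproduced with exactly the same strings.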
    
    

Use the cell below to implement the requested **experimental framework** (be mindful of code readability and reusability).

In [19]:
import timeit
import matplotlib.pyplot as plt

class ExperimentalFramework():
    def runExperiment(self, dataStructure, testData, operation):
        if operation == "insert":
            def experiment():
                for data in testData:
                    dataStructure.insertElement(data)
        elif operation == "search":
            def experiment():
                for data in testData:
                    dataStructure.searchElement(data)
        else:
            raise ValueError("Invalid operation. Use 'insert' or 'search'.")
        
        time = timeit.timeit(experiment, number=1)
        return time

    def evaluateExperiment(self, dataStructures, dataSetSizes, operation):
        # Map each data structure's class name to its timings, one per dataset size.
        results = {ds.__class__.__name__: [] for ds in dataStructures}
        for size in dataSetSizes:
            testData = TestDataGenerator.generateData(size)
            for ds in dataStructures:
                # The same instances are reused for every size, so elements
                # inserted during earlier runs are still present in later ones.
                time = self.runExperiment(ds, testData, operation)
                results[ds.__class__.__name__].append(time)
                print(f"{ds.__class__.__name__} - {operation} - Dataset size {size}: {time:.6f} seconds")
        return results
   
    def plotGraph(self, results, dataSetSizes, operation):
        plt.figure(figsize=(10, 6))
        for ds_name, times in results.items():
            plt.plot(dataSetSizes, times, label=ds_name, marker='o')

        plt.title(f"Performance Comparison: {operation.capitalize()} Operation")
        plt.xlabel("Dataset Size")
        plt.ylabel("Execution Time (seconds)")
        plt.legend()
        plt.grid(True)
        plt.show()
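
A single timed run can be noisy. One common mitigation, sketched below, is to repeat the measurement on a freshly built workload and keep the minimum. The names `time_best_of` and `make_workload` are hypothetical and not part of the framework above, and a plain `set` stands in for a search structure.

```python
import timeit


def time_best_of(make_workload, repeats=5):
    # make_workload() must return a fresh zero-argument callable each time,
    # so that no repetition sees state left behind by a previous one.
    timings = []
    for _ in range(repeats):
        workload = make_workload()
        timings.append(timeit.timeit(workload, number=1))
    return min(timings)


def make_insert_workload():
    target = set()  # stand-in for a freshly constructed search structure
    items = [str(i) for i in range(1000)]

    def workload():
        for item in items:
            target.add(item)

    return workload


best = time_best_of(make_insert_workload, repeats=3)
```

Taking the minimum rather than the mean reduces the influence of interference from other processes, at the cost of hiding run-to-run variability.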

Use the cell below to show the Python code you used to **fully evaluate** your three chosen search data structures and algorithms. The code should illustrate, for example, how you used the **TestDataGenerator** class to generate test data of various sizes and properties, and how you instantiated the **ExperimentalFramework** class to evaluate each data structure with that data, collect execution times, plot results, etc. Any results you present in the companion PDF report should have been generated using the code below.
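
As a rough illustration of the kind of evaluation loop described above, the following self-contained sketch times insertion and search for random and sorted inputs at several sizes. Everything in it is an assumption made for illustration: `SortedListSearch` is a hypothetical stand-in (a sorted list with `bisect`), and `random_strings` and `evaluate` are hypothetical helpers rather than the classes defined earlier.

```python
import bisect
import random
import string
import timeit


class SortedListSearch:
    # Hypothetical stand-in structure: a sorted Python list searched with bisect.
    def __init__(self):
        self._items = []

    def insertElement(self, element):
        pos = bisect.bisect_left(self._items, element)
        if pos < len(self._items) and self._items[pos] == element:
            return False
        self._items.insert(pos, element)
        return True

    def searchElement(self, element):
        pos = bisect.bisect_left(self._items, element)
        return pos < len(self._items) and self._items[pos] == element


def random_strings(count, length=12):
    alphabet = string.ascii_letters + string.digits
    return [''.join(random.choices(alphabet, k=length)) for _ in range(count)]


def evaluate(sizes):
    # Maps (input kind, size) to (insert seconds, search seconds).
    results = {}
    for size in sizes:
        base = random_strings(size)
        for kind, data in (("random", base), ("sorted", sorted(base))):
            structure = SortedListSearch()  # fresh instance per measurement

            def insert_all():
                for item in data:
                    structure.insertElement(item)

            def search_all():
                for item in data:
                    structure.searchElement(item)

            # Search is timed after insertion, so the structure is populated.
            insert_time = timeit.timeit(insert_all, number=1)
            search_time = timeit.timeit(search_all, number=1)
            results[(kind, size)] = (insert_time, search_time)
    return results


results = evaluate([100, 200])
```

Creating a fresh structure for each measurement keeps the sizes independent, and timing search only after insertion ensures the searched elements are actually present.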

In [20]:
# ADD YOUR TEST CODE HERE 



