# Q4 String Register (25 marks)

We want to design a data structure which we will call a *register* and which has the following properties:
* It stores strings of characters belonging to an alphabet $A$. We denote by $c$ the total number of characters in the alphabet $A$. $A$ can be set to be any set of *comparable* characters, e.g. $\{a, b, c\}$, or $\{0, \dots,9\}$, or $\{a, \dots, z, A, \dots, Z\}$.
* The operation to determine whether a string of size $k$ belongs to the register takes $O(k \log c)$ runtime in the worst case.
* Adding a string to the register takes $O(k \times c)$ runtime in the worst case.
* Removing a string from the register takes $O(k \log c)$ runtime in the worst case.

Note that the runtime complexities are independent of the number of elements stored in the register.
Finally, note that a string and (some of) its substrings can all belong to the register.

## Q4.1 (22 marks)
Without using other data structures than Python lists, describe a data structure which meets the requirements described above. We recommend using a tree.

If you cannot find a way to meet the runtime requirements, provide an algorithm and implementation anyway. You may get up to half of the marks for doing so.

The register is implemented with AVL trees.

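An AVL tree with $n$ keys guarantees $O(\log n)$ insertion, deletion and search. Every tree in the register is keyed by the alphabet plus `'\0'`, so it holds at most $c+1$ keys and each lookup costs $O(\log c)$, regardless of how many strings are stored.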
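The $O(\log n)$ bound comes from the height of an AVL tree. As a small sketch (the function names are illustrative, not part of the register), the script below computes the fewest keys an AVL tree of height $h$ can hold, using the recurrence $N(h) = N(h-1) + N(h-2) + 1$ with $N(0)=0$ and $N(1)=1$, and from it the tallest AVL tree that $n$ keys can form. For the 62 letters and digits used in the tests below plus `'\0'` ($n = 63$), a lookup in one level visits at most 8 nodes.

```python
def min_avl_nodes(h):
    """Fewest keys an AVL tree of height h can hold (a single node has height 1)."""
    if h <= 0:
        return 0
    prev, curr = 0, 1                       # N(0), N(1)
    for _ in range(h - 1):
        prev, curr = curr, curr + prev + 1  # N(h) = N(h-1) + N(h-2) + 1
    return curr

def max_avl_height(n):
    """Tallest AVL tree that n keys can form, i.e. the longest lookup path."""
    h = 0
    while min_avl_nodes(h + 1) <= n:
        h += 1
    return h

print([min_avl_nodes(h) for h in range(1, 10)])  # [1, 2, 4, 7, 12, 20, 33, 54, 88]
print(max_avl_height(63))                          # 8
```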

### Structure
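The register consists of two AVL trees: the character set, called `alphabet`, and the `entry` tree, which is the door to the stored strings. Every node is keyed by a character and carries a payload `[count, child]`: `count` is the number of stored strings that have this character at this position, and `child` is the tree for the next character (or `None`). Whenever a node needs a child, the register copies `alphabet` and attaches the copy to that node, so every level of the structure is a full copy of the alphabet.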

![WX20191019-225330.png](https://i.loli.net/2019/10/19/Vbxl1krAcp9ehBY.png)
![WX20191019-225605.png](https://i.loli.net/2019/10/19/cEmDiUz6HSYsGjA.png)

For instance, after storing "FIT5211", the structure looks as follows.
![WX20191019-225735.png](https://i.loli.net/2019/10/19/RdAuq5OjTeUcGBg.png)
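Because every level is created from the same `alphabet`, the copy has to be deep. A minimal sketch of why, assuming a hypothetical miniature level stored as nested Python lists (`template` stands in for `alphabet`): a shallow copy shares the `[count, child]` payload lists with the template, so updating a count through it would also change every level created from that template afterwards.

```python
from copy import copy, deepcopy

# Hypothetical miniature level: sorted keys plus one [count, child] payload per key,
# mirroring the payload stored in each node of the alphabet tree.
template = [["\0", "a", "b"], [[0, None], [0, None], [0, None]]]

shallow = copy(template)    # new outer list, but the inner lists are shared
deep = deepcopy(template)   # every nested list is duplicated

deep[1][1][0] += 1          # one string passes through 'a' in the deep copy
print(template[1][1][0])    # 0: the template is unaffected

shallow[1][1][0] += 1       # the same update through the shallow copy
print(template[1][1][0])    # 1: the shared payload inside the template changed
```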

### Preprocessing
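Insert, search and delete all start with a preprocessing step that appends the special symbol `'\0'` to the end of the string. This symbol marks the end of a string, so that a stored string can be told apart from its prefixes: after storing "FIT5211" alone, the path F-I-T exists, but the `'\0'` node below T still has count 0, so "FIT" is not reported as stored. As a result, the processed string has length $k+1$.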
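A two-line illustration of the prefix problem, using plain string prefixes as a stand-in for paths in the register (the variable names are illustrative):

```python
END = "\0"  # the register's terminator

stored = "FIT5211"
query = "FIT"

# Without a terminator the query is a prefix of the stored path, so a walk
# that only checks non-zero counts along the path would accept it.
print(stored.startswith(query))                # True

# With the terminator the paths diverge at position 3 ('5' vs '\0').
print((stored + END).startswith(query + END))  # False
```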

### Insert
To store a string, the register first checks that all of its characters belong to the alphabet. If so, it walks down the levels one character at a time: at each character's node it increments the occurrence count by 1 and, if the level for the next character does not exist yet, creates it by copying the alphabet. The walk terminates at the last character, `'\0'`.

### Search
To query a string, the register starts from the outermost entry and checks whether the occurrence count of the current character is 0. If it is non-zero, the register moves into that character's level and repeats the check for the next character. If every character, including the final `'\0'`, has a non-zero count, the register returns `True`; otherwise it returns `False`.

### Delete
Deletion is the inverse of insertion. Before deleting, the register checks that the string is stored. If it is, the register walks down the same path and decrements the occurrence count of each character by 1.
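The three operations above can be sketched with nothing but Python lists. This is a swapped-in variant, not the AVL implementation below: it assumes each level is a sorted list of the alphabet's characters (plus `'\0'`) with a parallel list of `[count, child]` payloads, searched with the standard-library `bisect` module. The class name `ListRegister` and its helpers are hypothetical.

```python
from bisect import bisect_left
from copy import deepcopy

END = "\0"  # terminator appended to every string

class ListRegister:
    """Sketch: a level is [keys, payloads], keys sorted, payloads[i] = [count, child]."""

    def __init__(self, alphabet):
        keys = sorted(set(alphabet) | {END})
        self._template = [keys, [[0, None] for _ in keys]]
        self._entry = deepcopy(self._template)

    @staticmethod
    def _lookup(level, ch):
        # Binary search over the sorted keys: O(log c). None if ch is unknown.
        keys, payloads = level
        i = bisect_left(keys, ch)
        if i < len(keys) and keys[i] == ch:
            return payloads[i]
        return None

    def insert(self, s):
        # Reject empty strings and characters outside the alphabet: O(k log c).
        if not s or any(self._lookup(self._template, ch) is None for ch in s):
            return False
        s += END
        level = self._entry
        for i, ch in enumerate(s):
            node = self._lookup(level, ch)
            node[0] += 1                      # one more string passes through here
            if i + 1 < len(s):
                if node[1] is None:
                    node[1] = deepcopy(self._template)  # O(c) copy per new level
                level = node[1]
        return True

    def contain(self, s):
        if not s:
            return False
        level = self._entry
        for ch in s + END:
            node = None if level is None else self._lookup(level, ch)
            if node is None or node[0] == 0:
                return False
            level = node[1]
        return True

    def remove(self, s):
        if not self.contain(s):
            return False
        level = self._entry
        for ch in s + END:
            node = self._lookup(level, ch)
            node[0] -= 1                      # contain() guaranteed count > 0
            level = node[1]
        return True

# Usage on the string from the figure above
reg = ListRegister("FIT0123456789")
reg.insert("FIT5211")
print(reg.contain("FIT5211"))  # True
print(reg.contain("FIT"))      # False: only a prefix of a stored string
```

Because the alphabet is fixed, a level never needs a key inserted after it has been copied, so a sorted list gives the same $O(\log c)$ lookup as an AVL tree, and creating a level is still an $O(c)$ copy.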

Implement this algorithm:

from copy import deepcopy

class Register:
    def __init__(self, charSet):
        # Initialize alphabet
        self.alphabet = BinarySearchTree()
        # Add end symbol to the set without mutating the caller's argument.
        # Check the parameter type
        if isinstance(charSet, list):   # If it is a list
            charSet = charSet + ['\0']
        elif isinstance(charSet, str):  # If it is a string
            charSet = charSet + '\0'
        else:   # Otherwise, reject it
            raise TypeError("charSet must be list or string")
        # Build alphabet tree.
        for c in charSet:
            self.alphabet.put(c, [0, None])
        # Make a copy of alphabet as the entry of string tree
        self.entry = deepcopy(self.alphabet)
    
    def insert(self, s):
        # If given string is empty
        if len(s) == 0:
            return False
        # Pre-process of given string
        s = s + '\0'
        # Check whether the string contains invalid characters.
        if not self._alphabetContainHelper(s):
            return False
        # Use the starting character as the entry of this string.
        curr = self.entry[s[0]] # O(log c)
        # Walk down one level per remaining character.
        for i in range(1, len(s)):
            # Add the counter by one: one more stored string has this character at this position.
            curr[0] += 1
            # If this node has no level for the next character yet, clone the alphabet.
            if curr[1] is None:
                curr[1] = deepcopy(self.alphabet)   # O(c)
            # Go to the next character's node in that level.
            curr = curr[1][s[i]]    # O(log c)
        # curr is now the '\0' node: count one more string ending exactly here.
        curr[0] += 1
        # Finish insertion
        return True

    # Time complexity: O(k*log c)
    def _alphabetContainHelper(self, s):
        # Iteratively check the characters
        for c in s: # k times.
            # If current character c is not in the alphabet, return False.
            if c not in self.alphabet:  # O(log c)
                return False
        # Successfully passed the check.
        return True

    def contain(self, s):
        # If given string is empty
        if len(s) == 0:
            return False
        # Pre-process
        s = s + '\0'
        # Find the entry; None if s[0] is not in the alphabet.
        curr = self.entry[s[0]]
        # Iteratively read and check one by one.
        for i in range(1, len(s)):
            # Unknown character: the string cannot belong to the register.
            if curr is None:
                return False
            # The counter is zero: no stored string has this letter at this position.
            if curr[0] == 0:
                return False
            # No level below this node, but characters remain: the result is False.
            if curr[1] is None:
                return False
            # Go to next letter's node.
            curr = curr[1][s[i]]
        # curr is the '\0' node: a non-zero counter means a string ends exactly here.
        return curr is not None and curr[0] != 0

    def remove(self, s):
        # If given string is empty
        if len(s) == 0:
            return False
        # If the given string belongs to the register.
        if self.contain(s):
            # Pre-process
            s = s + '\0'
            # Find the entry.
            curr = self.entry[s[0]]
             # If given string is not empty, iteratively read and delete one by one.
            for i in range(1, len(s)):
                # Reduce the counter by 1.
                curr[0] -= 1
                curr = curr[1][s[i]]
            # Finished character deletion to k-1. Now do the same thing for the last one.
            curr[0] -= 1
            # Finished remove.
            return True
        # If not, deletion failed.
        return False

'''
    AVL Tree Implementation
'''
class BinarySearchTree:
    # Constructor
    def __init__(self):
        self.root = None
        self.size = 0

    # Get the number of items.
    def length(self):
        return self.size

    # overload for len()
    def __len__(self):
        return self.size

    # Inorder traversal
    def inorderRead(self, node):
        # Initialize a list to keep the node.
        l = []
        # If exists left child, go to left child and read.
        if node.hasLeftChild():
            l.extend(self.inorderRead(node.leftChild))
        # Read the current node.
        l.append(node.key)
        # If exists right child, go to right child and read.
        if node.hasRightChild():
            l.extend(self.inorderRead(node.rightChild))
        # Return result.
        return l

    # BFS traversal: print every node level by level (debugging aid).
    def bfs_read(self):
        if not self.root:
            return
        # Initialize the queue
        queue = [self.root]
        while len(queue) != 0:
            nextQueue = []
            for node in queue:
                print(node)
                if node.hasLeftChild():
                    nextQueue.append(node.leftChild)
                if node.hasRightChild():
                    nextQueue.append(node.rightChild)
            queue = nextQueue

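    # Insert new node, or overwrite the payload if the key already exists.
    def put(self,key,val):
        # If the root is not None
        if self.root:
            # Only a new key increases the size; an existing key keeps it unchanged.
            if not self._get(key,self.root):
                self.size = self.size + 1
            self._put(key,val,self.root)
        # Else, initialize the root
        else:
            self.root = TreeNode(key,val)
            self.size = self.size + 1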

    # Insert helper
    def _put(self,key,val,currentNode):
        # If current node's key is greater than the inserted one.
        if key < currentNode.key:
            # Find an appropriate position in the left subtree, if there is a left child.
            if currentNode.hasLeftChild():
                self._put(key,val,currentNode.leftChild)
            # If not, insert the new node as left child.
            else:
                currentNode.leftChild = TreeNode(key,val,parent=currentNode)
                # Update balance of left subtree
                self.updateBalance(currentNode.leftChild)
        # If current node's key is smaller than the inserted one.
        elif key > currentNode.key:
            # Find an appropriate position in the right subtree, if there is a right child.
            if currentNode.hasRightChild():
                self._put(key,val,currentNode.rightChild)
            # If not, insert the new node as right child.
            else:
                currentNode.rightChild = TreeNode(key,val,parent=currentNode)
                # Update balance of right subtree
                self.updateBalance(currentNode.rightChild)
        # If current node's key is equal to the inserted one, overwrite its payload.
        else:
            currentNode.payload = val

    # Overload of [] operator for insertion
    def __setitem__(self,k,v):
       self.put(k,v)

    # Get the payload by key
    def get(self,key):
       if self.root:
           res = self._get(key,self.root)
           if res:
                return res.payload
           else:
                return None
       else:
           return None

    # Get helper
    def _get(self,key,currentNode):
        # If current node is None, return None.
        if not currentNode:
            return None
        # If the key of current node matches the given one, return the node itself.
        elif currentNode.key == key:
            return currentNode
        # If the key of current node is greater than the given one, go to the left subtree.
        elif key < currentNode.key:
            return self._get(key,currentNode.leftChild)
        # If the key of current node is smaller than the given one, go to the right subtree.
        else:
            return self._get(key,currentNode.rightChild)

    # Overload of [] operator for get.
    def __getitem__(self,key):
       return self.get(key)

    # Overload of 'in' operator.
    def __contains__(self,key):
       if self._get(key,self.root):
           return True
       else:
           return False

    # Delete the node by key.
    def delete(self,key):
      # If the size is greater than one, it means the tree is not a single node.
      if self.size > 1:
         # Get the node to be deleted by key.
         nodeToRemove = self._get(key,self.root)
         # If the retrieved node is not None, remove it.
         if nodeToRemove:
             self.remove(nodeToRemove)
             # Reduce the size by 1.
             self.size = self.size-1
         # Otherwise, report error.
         else:
             raise KeyError('Error, key not in tree')
      # If the size is 1, it means the tree is a single node, which is root itself.
      # Delete when root's key equal to the given one.
      elif self.size == 1 and self.root.key == key:
         self.root = None
         self.size = self.size - 1
      # Otherwise, report error.
      else:
         raise KeyError('Error, key not in tree')

    # Overload 'del' operator.
    def __delitem__(self,key):
       self.delete(key)

    # Delete helper: unlink currentNode, then retrace upwards to restore the AVL balance.
    def remove(self,currentNode):
        if currentNode.hasBothChildren(): #interior
            succ = currentNode.findSuccessor()  # Minimum node of the right subtree.
            parentNode = succ.parent    # Get the parent node of successor.
            fromLeft = bool(succ.isLeftChild()) # Side of parentNode that will shrink.
            succ.spliceOut()    # Disconnect the successor (it has no left child).
            currentNode.key = succ.key  # Move the key to current node
            currentNode.payload = succ.payload  # Move the payload to current node.
        elif currentNode.isRoot():
            # Root with at most one child; in an AVL tree that child is a leaf.
            child = currentNode.leftChild or currentNode.rightChild
            if child is None:
                self.root = None
            else:
                # Replace root node with its only child.
                currentNode.replaceNodeData(child.key, child.payload, \
                                            child.leftChild, child.rightChild)
                currentNode.balanceFactor = 0
            return
        else: # leaf, or node with one child
            parentNode = currentNode.parent # Get its parent node.
            fromLeft = bool(currentNode.isLeftChild())
            child = currentNode.leftChild or currentNode.rightChild # None for a leaf.
            # Redirect the child's parent to its grandparent.
            if child is not None:
                child.parent = parentNode
            # Redirect the parent's link to the child (or None).
            if fromLeft:
                parentNode.leftChild = child
            else:
                parentNode.rightChild = child
        self._retraceAfterDelete(parentNode, fromLeft)

    # After a deletion, the subtree on side `fromLeft` of node is one level shorter.
    # Unlike insertion, retracing stops as soon as a balance factor becomes +-1.
    def _retraceAfterDelete(self, node, fromLeft):
        while node is not None:
            # balanceFactor = height(left) - height(right)
            node.balanceFactor += -1 if fromLeft else 1
            if node.balanceFactor > 1 or node.balanceFactor < -1:
                self.rebalance(node)
                node = node.parent  # After the rotation(s), node's new parent is the subtree root.
                if node.balanceFactor != 0:
                    return  # The rotated subtree kept its height.
            elif node.balanceFactor != 0:
                return  # Went from 0 to +-1: the height of this subtree is unchanged.
            # This subtree became one level shorter: continue with its parent.
            if node.parent is None:
                return
            fromLeft = bool(node.isLeftChild())
            node = node.parent

    def updateBalance(self, node):
        # If balance factor is either smaller than -1 or greater than 1. Rebalance.
        if node.balanceFactor > 1 or node.balanceFactor < -1:
            self.rebalance(node)
            return
        # If not root node.
        if node.parent != None:
            # If node is left child, increase its parent's balance factor by 1.
            if node.isLeftChild():
                node.parent.balanceFactor += 1
            # If node is right child, decrease its parent's balance factor by 1.
            elif node.isRightChild():
                node.parent.balanceFactor -= 1
            # If parent's node factor is not zero, go up and update the parent's one.
            if node.parent.balanceFactor != 0:
                self.updateBalance(node.parent)

    def rebalance(self, node):
        # left rotate
        # Balance factor smaller than 0, right heavy.
        if node.balanceFactor < 0:
            # Check whether its right subtree is left heavy or not.
            if node.rightChild.balanceFactor > 0:
                # If yes, right rotate.
                self.rightRotate(node.rightChild)
                # Then, left rotate the tree
                self.leftRotate(node)
            else:
                # If not, left rotate directly.
                self.leftRotate(node)
        # right rotate
        # Balance factor greater than 0, left heavy.
        elif node.balanceFactor > 0:
            # Check whether its left subtree is right heavy or not.
            if node.leftChild.balanceFactor < 0:
                # If yes, left rotate.
                self.leftRotate(node.leftChild)
                # Then, right rotate the tree
                self.rightRotate(node)
            else:
                # If not, right rotate directly.
                self.rightRotate(node)

    def leftRotate(self, rootNode):
        # Swap the right child and root.
        newRoot = rootNode.rightChild   # make a new variable to refer to the right child.
        rootNode.rightChild = newRoot.leftChild # Move the left child of new root to the old root's right child.
        if newRoot.hasLeftChild():  # The previous step could not guarantee there is a left child in new root.
            newRoot.leftChild.parent = rootNode # If had one, adjust its parent to the old root.
        newRoot.parent = rootNode.parent    # adjust the reference of parent to the new root.
        if rootNode.isRoot():   # If we are adjusting the root node of the entire tree, re-assign the tree reference
            self.root = newRoot
        else:
            # Adjust the parent reference
            if rootNode.isLeftChild():  # If the subtree attaches to the parent's left
                rootNode.parent.leftChild = newRoot
            else:   # Otherwise
                rootNode.parent.rightChild = newRoot
        newRoot.leftChild = rootNode    # Make the old root to be the left of new root
        rootNode.parent = newRoot   # Reflect this change to the old root's parent
        rootNode.balanceFactor = rootNode.balanceFactor + 1 - min(newRoot.balanceFactor, 0)
        newRoot.balanceFactor = newRoot.balanceFactor + 1 + max(rootNode.balanceFactor, 0)

    # Right rotate includes symmetric operation to left rotate.
    def rightRotate(self, rootNode):
        # Swap the left child and root.
        newRoot = rootNode.leftChild
        rootNode.leftChild = newRoot.rightChild
        if newRoot.hasRightChild():
            newRoot.rightChild.parent = rootNode
        newRoot.parent = rootNode.parent
        if rootNode.isRoot():
            self.root = newRoot
        else:
            # Adjust the parent reference
            if rootNode.isLeftChild():
                rootNode.parent.leftChild = newRoot
            else:
                rootNode.parent.rightChild = newRoot
        newRoot.rightChild = rootNode
        rootNode.parent = newRoot
        rootNode.balanceFactor = rootNode.balanceFactor - 1 - max(newRoot.balanceFactor, 0)
        newRoot.balanceFactor = newRoot.balanceFactor - 1 + min(rootNode.balanceFactor, 0)

class TreeNode:
    # Constructor for TreeNode
    def __init__(self,key,val,left=None,right=None,parent=None):
        self.key = key
        self.payload = val
        self.leftChild = left
        self.rightChild = right
        self.parent = parent
        self.balanceFactor = 0

    def hasLeftChild(self):
        return self.leftChild

    def hasRightChild(self):
        return self.rightChild

    def isLeftChild(self):
        return self.parent and self.parent.leftChild == self

    def isRightChild(self):
        return self.parent and self.parent.rightChild == self

    def isRoot(self):
        return not self.parent

    def isLeaf(self):
        return not (self.rightChild or self.leftChild)

    def hasAnyChildren(self):
        return self.rightChild or self.leftChild

    def hasBothChildren(self):
        return self.rightChild and self.leftChild

    def spliceOut(self):
        if self.isLeaf():
            if self.isLeftChild():
                self.parent.leftChild = None
            else:
                self.parent.rightChild = None
        elif self.hasAnyChildren():
            if self.hasLeftChild():
                if self.isLeftChild():
                    self.parent.leftChild = self.leftChild
                else:
                    self.parent.rightChild = self.leftChild
                self.leftChild.parent = self.parent
            else:
                if self.isLeftChild():
                    self.parent.leftChild = self.rightChild
                else:
                    self.parent.rightChild = self.rightChild
                self.rightChild.parent = self.parent

    # Find successor to current node.
    # The successor of a node is always the minimum node in its right subtree.
    def findSuccessor(self):
        succ = None # Initialize variable as None
        # If has right child
        if self.hasRightChild():
            # Its successor is the minimum node in right subtree.
            succ = self.rightChild.findMin()
        # Otherwise, find one in upper level.
        else:
            # Parent node is not none.
            if self.parent:
                   # itself is the left child of parent's, using parent node as successor
                   if self.isLeftChild():
                       succ = self.parent
                    # Otherwise
                   else:
                       # Break down the connection between itself and parent
                       self.parent.rightChild = None
                       # Find the successor from parent
                       succ = self.parent.findSuccessor()
                       # Reconnect
                       self.parent.rightChild = self
        return succ

    # Find the minimum node to the current.
    def findMin(self):
        current = self
        while current.hasLeftChild():
            current = current.leftChild
        return current

    # Replace current node data.
    def replaceNodeData(self,key,value,lc,rc):
        self.key = key
        self.payload = value
        self.leftChild = lc
        self.rightChild = rc
        if self.hasLeftChild():
            self.leftChild.parent = self
        if self.hasRightChild():
            self.rightChild.parent = self

    def __str__(self):
        return "Key: %s, "%(self.key) + "parent, left, right: %s, %s, %s"%(self.parent.key if self.parent is not None else -1, \
                                                                        self.leftChild.key if self.leftChild is not None else -1, \
                                                                            self.rightChild.key if self.rightChild is not None else -1) + \
                                                                            ", balance factor: %d"%self.balanceFactor

Using the module *unittest*, write unit tests for your class. To obtain full marks, you need to write unit tests which extensively cover all cases. We recommend using the module *random*. Note that this question will only be marked if you provide __both__ a functional program and unit tests. You will only receive marks for features which are implemented __and__ tested convincingly.

import unittest
import random
import string
random.seed(a=114.514)

class RegisterTestCase(unittest.TestCase):
    def setUp(self):
        self.register = Register(string.ascii_letters + string.digits)
        strList = []
        for i in range(100):
            # Random strings of length 1 to 32 over letters and digits.
            strList.append(''.join(random.choice(string.ascii_letters + string.digits) for _ in range(1, int(random.random()*32)+2)))
        # Remove duplicates while keeping the seeded generation order (set order is not reproducible).
        strList = list(dict.fromkeys(strList))
        self.insertList = strList[:len(strList)//2]
        self.notInsertList = strList[len(strList)//2:]

    def test_non_empty(self):
        print("test non-empty string")
        # Test insert
        print("> Test insert\n")
        for s in self.insertList:
            self.assertTrue(self.register.insert(s), "insert {} failed.".format(s))
        # Test contain
        print("> Test contain\n")
        for s in self.insertList:
            self.assertTrue(self.register.contain(s), "{} should belong to the register.".format(s))
        for s in self.notInsertList:
            self.assertFalse(self.register.contain(s), "{} should not belong to the register.".format(s))
        # Test remove
        print("> Test remove\n")
        for s in self.insertList:
            self.assertTrue(self.register.remove(s), "remove {} failed.".format(s))
            self.assertFalse(self.register.contain(s), "{} should have been removed.".format(s))
        for s in self.notInsertList:
            self.assertFalse(self.register.remove(s), "{} was never inserted, so it must not be removed.".format(s))

    def test_empty(self):
        print("test empty string")
        self.assertFalse(self.register.insert(""))
        self.assertFalse(self.register.contain(""))
        self.assertFalse(self.register.remove(""))
        print()

suite = unittest.TestLoader().loadTestsFromTestCase(RegisterTestCase)
unittest.TextTestRunner().run(suite)

.
test empty string

test non-empty string
> Test insert

.
> Test contain

> Test remove

----------------------------------------------------------------------
Ran 2 tests in 0.947s

OK

<unittest.runner.TextTestResult run=2 errors=0 failures=0>

## Q4.2 (3 marks)
Determine and prove the worst-case runtime complexity of each of the *core* operations of the register. 

### Insertion
Worst case: none of the nodes on the string's path has a child level yet (for example, the first string inserted into an empty register), so a new level has to be created at every step.
1. Confirm that every character of the string belongs to the alphabet. Since the alphabet is an AVL tree with $c+1$ keys, each membership query costs $O(\log c)$ in the worst case. In total, this step costs $O(k\log c)$.
2. Walk through the entire string, one character per level:
    1. The register finds the node of the current character in the current level. This costs $O(\log c)$.
    2. The register increases the count of this character by 1. This costs $O(1)$.
    3. If no level exists below this node, a new alphabet tree is created with `deepcopy` and attached to it. Copying visits each of the $c+1$ nodes once, so this costs $O(c)$.
3. Return the insertion result. This costs $O(1)$.

Adding these up, the insertion costs $O(k\log c) + O((k+1)(c+\log c + 1)) + O(1)$. **Therefore, the worst-case time complexity is $O(k\times c)$.**
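A worked version of this sum, counting all $k+1$ characters (including `'\0'`) and using $k+1 \le 2k$ for $k \ge 1$ and $\log c + 1 + c = O(c)$:

$$
\begin{aligned}
T_{\text{insert}}(k) &= \underbrace{(k+1)\,O(\log c)}_{\text{alphabet check}} + \underbrace{(k+1)\,\big(O(\log c) + O(1) + O(c)\big)}_{\text{walk down the levels}} + O(1) \\
&\le 2k\,O(\log c) + 2k\,O(c) + O(1) \\
&= O(k \times c).
\end{aligned}
$$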

### Searching
The register keeps track of the occurrence count of each character at each position and of the link between adjacent characters.
The search performs the following steps iteratively and terminates when all characters have been queried:
1. The register finds the node of the current character in the current level. This costs $O(\log c)$.
2. The register checks whether the occurrence count of this character at this node is zero. This costs $O(1)$.
3. If it is non-zero, the register moves to the next level and repeats from step 1 with the next character; otherwise it returns `False`. This costs $O(1)$.

The worst case is when the string belongs to the register, because all $k+1$ characters are visited. This costs $O((k+1)(\log c + 1 + 1))$. **Therefore, the worst-case time complexity is $O(k \log c)$.**

### Deleting
Before deleting the string, the register confirms that it contains this string, which costs $O(k\log c)$. If it does, the register performs the following steps for each character:
1. The register finds the node of the current character in the current level. This costs $O(\log c)$.
2. The register decreases the occurrence count of this character at this node by 1. No zero check is needed, because the preceding existence check has already confirmed that every count on the path is non-zero. This costs $O(1)$.
3. If there is a next character, the register moves to the next level and goes back to step 1; otherwise it returns the deletion result. This costs $O(1)$.

**The worst case for deletion is when the string belongs to the register: the existence check and the walk each cost about $(k+1)\log c$, about $2(k+1)\log c$ in total, which is still $O(k \log c)$.**