# Serialization / Deserialization of Binary Search Tree

- Serialization is the process of converting a data structure or object into a sequence of bits so that it can be stored in a file or memory buffer, or transmitted across a network connection link to be reconstructed later in the same or another computer environment.

- Design an algorithm to serialize and deserialize a binary search tree. There is no restriction on how your serialization/deserialization algorithm should work. You just need to ensure that a binary search tree can be serialized to a string and this string can be deserialized to the original tree structure.

- The encoded string should be **as compact as possible**.

- Note: Do not use class member/global/static variables to store states. Your serialize and deserialize algorithms should be stateless.

- Note: Although the original LeetCode problem does not state it explicitly, all node values in the BST are assumed to be integers.

## Serialization Strategy

- The basic approach is to traverse the BST in "pre-order" fashion: root, left, right.

- To accomplish this traversal, the following 4 steps are repeated until the whole tree has been visited:

  1. Each visited node is pushed onto a stack (so we can backtrack and go down its right path later)
  
  2. The value stored in the node is appended to a "results" list
  
  3. Keep going down the left branch (collecting values along the way) until you run out of nodes
  
  4. Then pop the most recently pushed node off the stack and go one step down its right path
  
- Once the tree traversal is completed, the values are all converted into strings for serialization

- Note: to make the serialized string "as compact as possible", the integers are written in a higher base than decimal when they are encoded into strings, so each value needs fewer characters

  - Instead of just storing the numbers in decimal format [0-9], a custom alphabet is used for encoding. 
  
  - The default alphabet includes the numerical digits, along with all letters [A-Z] and [a-z], giving a base 62 encoding
    
  - Negative numbers are handled with a leading '-' sign
  
  - Modular division is used to convert the integer into the encoding string, with the remainder of each modular division step translated into one of the characters in codec.ALPHABET
  
  - The string order is reversed so that it could be "readable" from left to right (as most numbers are). This step could be skipped for a bit more optimization if desired...
  
  - NOTE: A custom alphabet may be defined as an optional argument when constructing the serialization codec

- After the list of integers is converted (mapped) into a list of encoded strings, these are joined together with a delimiter (default is " ", but this may also be customized when constructing the codec)
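
The encoding steps above can be sketched as a standalone function. This is only an illustration under stated assumptions, not the codec's own method: the names `BASE62` and `to_base62` are hypothetical, and the alphabet is assumed to be the default one described above. The last lines compare the length of the joined string against plain decimal for a few sample values.

```python
# Hypothetical standalone sketch of the base-62 encoding described above.
# BASE62 and to_base62 are illustrative names, not part of the Codec class.
BASE62 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def to_base62(n, alphabet=BASE62):
    """Encode an integer as a string over `alphabet`, with '-' for negatives."""
    if n == 0:
        return alphabet[0]
    sign = "-" if n < 0 else ""
    n = abs(n)
    digits = []
    while n > 0:
        # each division step peels off the least significant digit
        n, remainder = divmod(n, len(alphabet))
        digits.append(alphabet[remainder])
    # digits were collected least significant first, so reverse them
    return sign + "".join(reversed(digits))


values = [20, 110, 30, -238327, 1000000]
decimal = " ".join(str(v) for v in values)
compact = " ".join(to_base62(v) for v in values)

print(compact)                      # K 1m U -zzz 4C92
print(len(decimal), len(compact))   # 25 16
```

Small values gain little or nothing (20 and 30 shrink by one character each), so the saving depends on how large the stored integers are.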

## Deserialization Strategy

- First, the string is split into a list of strings using the delimiter (default " ")

- Each string is decoded back into an integer using "codec.my_decoder"

  - Basically, you loop through the string in reverse order
  
  - Convert each character back to a number using a dictionary called "codec.REVERSE_ALPHABET"
  
  - Take the BASE (length of the ALPHABET) to the power of the place number, and multiply it by the character number
  
  - Add these all up to get the final resulting number
  
- The root of the tree is created using the first number in the list

- A stack is created for backtracking once we have reached the limit down the left branch of the tree

- Then the rest of the tree is created by looping through the remaining numbers, using the following steps:

  1. If the current number is less than the value stored in the current node (starts at the root):
  
    - Add the current node to the stack (so we can come back here later)
   
    - Make a new node to the left, holding the current value
   
    - Move to that node
  
  2. If the current number is greater than or equal to the value stored in the current node:
  
    - Pop nodes off the stack while the node on top of the stack has a value less than the current number; the last node popped (or the current node, if nothing was popped) becomes the parent
  
    - Make a new node to the right of that last node
    
    - Move to that node

- Once the whole tree has been created (we run out of numbers in our list), then return a reference to the root node
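
To make the two cases above concrete, here is a small sketch that records where each value gets attached instead of building nodes. It is an illustration only: `trace_rebuild` is a hypothetical helper, it tracks plain values rather than tree nodes, and it assumes all values are distinct so that a parent can be identified by its value.

```python
def trace_rebuild(preorder):
    """Return (value, side, parent_value) for every value after the root.

    Hypothetical helper: mirrors the left/right rules described above,
    but works on plain integers and assumes they are all distinct.
    """
    steps = []
    current = preorder[0]  # value of the node we are currently at
    stack = []             # ancestors whose left subtree we are inside
    for val in preorder[1:]:
        if val < current:
            # case 1: go left, remembering the node we came from
            stack.append(current)
            steps.append((val, "left", current))
        else:
            # case 2: pop while the top of the stack is smaller than val;
            # the last value popped (or `current`) is the parent
            parent = current
            while stack and stack[-1] < val:
                parent = stack.pop()
            steps.append((val, "right", parent))
        current = val
    return steps


for step in trace_rebuild([8, 3, 1, 6, 10]):
    print(step)
# (3, 'left', 8)
# (1, 'left', 3)
# (6, 'right', 3)
# (10, 'right', 8)
```

For the value 6, the stack holds 8 and 3; only 3 is popped (3 < 6 but 8 > 6), so 6 becomes the right child of 3.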

In [1]:
# Definition for a binary tree node.
class TreeNode(object):
    def __init__(self, x):
        self.val = x
        self.left = None
        self.right = None


class Codec:
    
    def __init__(
            self, 
            alphabet = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
            delimiter = " ",
            ):
        
        self.ALPHABET = alphabet
        self.REVERSE_ALPHABET = {char:index for index,char in enumerate(self.ALPHABET)}
        self.BASE = len(self.ALPHABET)
        self.DELIMITER = delimiter
        

    def my_encoder(self,n):
        if n == 0:
            # return a str (not bytes) so the result can be joined in serialize
            return self.ALPHABET[0]
        stack = []
        negative = ''
        if n < 0:
            n = -n
            negative = '-'

        while n > 0:
            quotient, remainder = divmod(n, self.BASE)
            stack.append(self.ALPHABET[remainder])
            n = quotient

        # digits were collected least significant first, so reverse them
        return negative + ''.join(stack)[::-1]

    def my_decoder(self,s):
        if s is None or s == '':
            return 0

        negative = False

        if '-' == s[0]:
            negative = True
            s = s[1:]

        result = 0

        for place, char in enumerate(s[::-1]):
            #print('place: '+str(place)+', value: '+str(char))
            result += self.BASE**place * self.REVERSE_ALPHABET[char]

        return -result if negative else result

    def serialize(self, root):
        """Encodes a tree to a single string.
           The tree will be traversed in "pre-order" fashion:
           (meaning root->left->right at each stage)
           Each value is encoded with my_encoder and the results
           are joined with self.DELIMITER (default " ")
        
        :type root: TreeNode
        :rtype: str
        """
        # keep a stack of nodes we have traversed past
        # so that we can backtrack when we reach the end of a branch
        stack = []
        # keep track of the values of all of the nodes we have traversed
        result = []
        # keep track of the current node in the tree (starting at the root)
        node = root
        
        
        # as long as the current node is not null
        # or there are still nodes left on the stack
        while node or stack:
            # if the current node is not null
            if node:
                # add the current node's value
                result.append(node.val)
                stack.append(node)
                node = node.left
            else:
                node = stack.pop().right
        
        return self.DELIMITER.join(map(self.my_encoder, result))

    def deserialize(self, data):
        """Decodes your encoded data to tree.
        
        :type data: str
        :rtype: TreeNode
        """
        
        # an empty string represents an empty tree
        if not data:
            return None
        
        nums = list( map( self.my_decoder, data.split(self.DELIMITER) ) )
        
        return self.make_tree(nums)
            
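    def make_tree(self, numbers):
        """Rebuilds a BST from a non-empty list of values in pre-order."""
        # make the root
        root = node = TreeNode(numbers[0])
        # ancestors whose left subtree we are currently inside
        stack = []

        for val in numbers[1:]:
            if val < node.val:
                # smaller value: attach to the left and remember the parent
                node.left = TreeNode(val)
                stack.append(node)
                node = node.left
            else:
                # larger (or equal) value: pop ancestors that are smaller
                # than val; the last one popped (if any) becomes the parent
                while stack and stack[-1].val < val:
                    node = stack.pop()
                node.right = TreeNode(val)
                node = node.right

        return root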

In [3]:
# Your Codec object will be instantiated and called as such:
codec = Codec()
print(codec.serialize(codec.make_tree([20, 110, 30])))

K 1m U


In [4]:
x = -62*62*62+1
x

-238327

In [5]:

a = codec.my_encoder(x)
a

'-zzz'

In [6]:
b = codec.my_decoder(a)
b

-238327
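
As a sanity check on the cells above, the positional sum described in the deserialization strategy can be written out by hand for the digits of `'-zzz'`. The names below (`ALPHABET`, `BASE`, `index`) are local to this sketch; they are assumed to mirror the codec's default alphabet and its reverse lookup, not taken from the codec object itself.

```python
# Local stand-ins for the codec's default alphabet and reverse lookup
# (assumed values; this sketch does not use the Codec object).
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)
index = {char: i for i, char in enumerate(ALPHABET)}

# 'z' is the last character of the alphabet, so each digit is worth 61
digits = [index[char] for char in "zzz"]

# the rightmost digit has place 0, the next has place 1, and so on
total = sum(d * BASE**place for place, d in enumerate(reversed(digits)))

print(digits)   # [61, 61, 61]
print(total)    # 238327, i.e. 61*62**2 + 61*62 + 61
print(-total)   # -238327, after restoring the leading '-' sign
```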