# Persistent Data Structures

#### Goncalo Pinto - fc58178


<div align="center">
<a href="#introduction"><kbd> <br>Introduction<br> </kbd></a>&ensp;&ensp;
<a href="#motivation"><kbd> <br>Motivation & historical background<br> </kbd></a>&ensp;&ensp;
<a href="#design"><kbd> <br>Design of the algorithms<br> </kbd></a>&ensp;&ensp;
<a href="#reference"><kbd> <br>References<br> </kbd></a>&ensp;&ensp;
</div>


---

### `setup`

In [160]:
# Install the pyrsistent library
!pip install pyrsistent

from pyrsistent import pvector, pmap



---
<a id="introduction"></a>
<img src="https://readme-typing-svg.herokuapp.com?font=Lexend+Giga&size=25&pause=1000&color=CCA9DD&vCenter=true&width=350&height=25&lines=Introduction" width="600"/>

In this chapter, we analyze the concepts that define a persistent data structure and present the algorithmic techniques used to implement one.

## Persistent Data Structure

# Problem

A persistent data structure, as opposed to an **ephemeral** one, is a data structure that preserves its previous versions when it is modified, allowing access to any historical version. In other words, once a change is made to the structure, both the original and the modified versions remain accessible. This is particularly useful in scenarios where you need to keep track of the history of updates or backtrack to previous states of the data structure.

Persistent data structures are usually associated with functional and logic programming because they avoid **mutable data**. From the point of view of the user they are **immutable**: once a version is written it never changes, so a reader has the guarantee that nothing can alter the state it is looking at. Any update results in the creation of a new version, so older states of the data stay intact.

There are multiple types of persistent data structures:

- A data structure is `partially persistent` if all versions can be accessed but only the newest version can be modified.
- It is `fully persistent` if every version can be both accessed and modified.
- It is `confluently persistent` if two or more versions can be merged to produce a new version.

### Example of Persistent binary search tree

The code was taken from the article on Persistent Data Structures available at [GeeksforGeeks](https://www.geeksforgeeks.org/persistent-data-structures/).

Imagine a binary tree structure where instead of modifying the fields of a node whenever we want to update it, we create a new version of the same tree, keeping the older ones in memory. Due to that decision, we still have access to the previous versions of the tree, being able to revert to any state.

<center><img src='https://raw.githubusercontent.com/GoncaloP0710/Desenho-Anlise-Algoritmos/master/imgs/persistent-Tree.png' width=500px></center>

In [161]:
# Node class represents a node in the binary search tree
class Node:
    def __init__(self, key):
        self.key = key  # key value of the node
        self.left = None  # pointer to the left child
        self.right = None  # pointer to the right child
 
# BST class implements a binary search tree
class BST:
    def __init__(self):
        self.root = None  # root node of the tree
 
    # Function to create a new node with given key
    def create_node(self, key):
        return Node(key)
 
    # Function to insert a new key into the tree
    def insert(self, root, key):
        if root is None:
            return self.create_node(key)
        if key < root.key:
            root.left = self.insert(root.left, key)
        elif key > root.key:
            root.right = self.insert(root.right, key)
        return root
 
    # Function to create a copy of the tree
    def copy_tree(self, root):
        if root is None:
            return None
        new_node = self.create_node(root.key)
        new_node.left = self.copy_tree(root.left)
        new_node.right = self.copy_tree(root.right)
        return new_node
 
    # Function to create a persistent copy of the tree and insert a new key
    def make_persistent(self, key):
        new_bst = BST()
        new_bst.root = self.copy_tree(self.root)
        new_bst.root = new_bst.insert(new_bst.root, key)
        return new_bst
 
    # Function to print the keys of the nodes in ascending order
    def print_in_order(self, root):
        if root is not None:
            self.print_in_order(root.left)
            print(root.key, end=" ")
            self.print_in_order(root.right)


In [162]:
# Driver Code
if __name__ == "__main__":
    bst1 = BST()
    bst1.root = bst1.insert(bst1.root, 50)
    bst1.root = bst1.insert(bst1.root, 30)
    bst1.root = bst1.insert(bst1.root, 20)
    bst1.root = bst1.insert(bst1.root, 40)
    bst1.root = bst1.insert(bst1.root, 70)
    bst1.root = bst1.insert(bst1.root, 60)
    bst1.root = bst1.insert(bst1.root, 80)
 
    # Create a persistent copy of bst1 and insert a new key
    bst2 = bst1.make_persistent(55)
 
    # Print the keys of the nodes in bst1
    print("BST1: ", end="")
    bst1.print_in_order(bst1.root)
    print()
 
    # Print the keys of the nodes in bst2
    print("BST2: ", end="")
    bst2.print_in_order(bst2.root)
    print()

BST1: 20 30 40 50 60 70 80 
BST2: 20 30 40 50 55 60 70 80 


#### Persistent binary search trees: BST1 & BST2

BST1:
- This is the original binary search tree.
- The keys in the tree are [20, 30, 40, 50, 60, 70, 80].

BST2:
- This is a persistent copy of BST1 with an additional key 55 inserted.
- The keys in the tree are [20, 30, 40, 50, 55, 60, 70, 80].
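- The new key 55 is inserted in the correct position to maintain the binary search tree property.

> This demonstrates how each version of the tree stays available after an update: inserting 55 produces BST2 and leaves BST1 untouched. Note that `make_persistent` copies the entire tree, so every new version costs O(n) time and memory.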

### Example of Persistent List & Dictionary

In [163]:
# Immutable list
original_list = pvector([1, 2, 3])
new_list = original_list.append(4)

print("Original list:", original_list)
print("New list:     ", new_list)

# Immutable dictionary
original_dict = pmap({'a': 1, 'b': 2, 'c': 3})
new_dict = original_dict.set('d', 4)

print("Original dictionary:", original_dict)
print("New dictionary:     ", new_dict)

Original list: pvector([1, 2, 3])
New list:      pvector([1, 2, 3, 4])
Original dictionary: pmap({'a': 1, 'c': 3, 'b': 2})
New dictionary:      pmap({'d': 4, 'a': 1, 'c': 3, 'b': 2})


The code above shows a persistent list and dictionary. As you can see, each update creates a new version of the data structure. 

## Implementing Persistent Data Structures

To implement a persistent data structure efficiently, we need techniques that manage all of the existing versions without excessive memory overhead. The main ones are:

- Structural Sharing
- Copy-on-Write
- Fat Nodes
- Path Copying

### Structural Sharing

In functional programming, it's possible to save a large amount of memory by employing structural sharing.

Structural sharing is a technique for building new versions of a data structure without copying the parts that did not change. It is especially common in persistent (immutable) data structures. The basic idea is to reuse existing data instead of copying it.

> " It’s kind of similar to the way Git manages multiple versions of your source code: Git doesn’t copy all the files on each commit. Instead, the files that are not changed by a commit are shared with previous commits." (Yehonathan Sharvit, Developer. Author. Speaker.)

Path copying is usually efficient in terms of memory and computation because of the way the copies are done. When copying a node, only its reference is in fact being copied and not the entire object. This is called a shallow copy.

Path copying works fine with deeply nested data where at each nesting level we don’t have too many elements. When we have many elements at some level, shallow copying might be an issue. Suppose we have a million users in our system—copying a million references each time we update the password of a user won’t scale.
- The same issue occurs with Git if you have a folder with too many files.
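As a minimal sketch of structural sharing, assume plain nested Python dictionaries stand in for the application data. The helper `set_in` and the sample data are hypothetical (they are not part of any library): the function shallow-copies only the dictionaries along the updated path and reuses every other branch by reference.

```python
def set_in(data: dict, path: list, value: object) -> dict:
    """Return a new nested dict with `value` stored at `path`.

    Only the dicts along `path` are shallow-copied; every other branch
    is shared with the original.
    """
    key, rest = path[0], path[1:]
    new_data = dict(data)  # shallow copy: copies references only
    if rest:
        new_data[key] = set_in(data.get(key, {}), rest, value)
    else:
        new_data[key] = value
    return new_data


# Hypothetical sample data
v1 = {
    "users": {"ana": {"password": "a1"}, "rui": {"password": "r1"}},
    "settings": {"theme": "dark"},
}
v2 = set_in(v1, ["users", "ana", "password"], "a2")

print(v1["users"]["ana"]["password"])            # a1 (old version untouched)
print(v2["users"]["ana"]["password"])            # a2
print(v2["settings"] is v1["settings"])          # True (branch is shared)
print(v2["users"]["rui"] is v1["users"]["rui"])  # True (sibling is shared)
```

Nothing in this sketch stops a caller from mutating a shared branch in place; libraries such as pyrsistent enforce immutability for you.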

<center><img src='https://raw.githubusercontent.com/GoncaloP0710/Desenho-Anlise-Algoritmos/master/imgs/structural_sharing.png' width=500px></center>

### Copy-on-Write

> Copy-on-Write is at its most useful when the data is mutable.

Copy-on-write, also called implicit sharing or shadowing, is a resource-management technique used to manage shared data efficiently. When several processes work with the same data, they initially share a single copy in memory instead of each receiving its own, which saves resources. A private copy is created only when one of them needs to write to the data.

This approach is commonly used in process forking and virtual memory management.
- > Note: virtual memory abstracts physical RAM, giving each process its own virtual address space.

Copy-on-write can be implemented efficiently using the page table, which maps virtual pages to physical frames. When a process is forked using the copy-on-write approach, its memory is not copied; instead, the shared pages are marked read-only. When a write takes place, the attempt to write to a read-only page triggers a page fault, and the OS checks whether the page is flagged as copy-on-write. Only then is a new physical copy of that page created and updated.


#### Pros
- Reduces unnecessary copying (memory wasted)
- Improves performance (less copying)
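The same idea can be sketched at the level of a single Python object. This is an illustration under stated assumptions, not how an operating system implements it: the class `CowList` and its `fork` method are hypothetical names, and a simple boolean flag stands in for the reference counting a real implementation would use.

```python
class CowList:
    """A list-like wrapper that shares its buffer until the first write."""

    def __init__(self, items=()):
        self._data = list(items)
        self._shared = False  # True while another CowList may use the same buffer

    def fork(self) -> "CowList":
        """Return a logical copy without copying the underlying buffer."""
        clone = CowList.__new__(CowList)
        clone._data = self._data  # share the buffer
        clone._shared = True
        self._shared = True
        return clone

    def __getitem__(self, index):
        return self._data[index]

    def __setitem__(self, index, value):
        if self._shared:
            # First write after a fork: take a private copy, then modify it.
            self._data = list(self._data)
            self._shared = False
        self._data[index] = value


parent = CowList([1, 2, 3])
child = parent.fork()
shared_before = child._data is parent._data  # True: nothing copied yet

child[0] = 99                                # triggers the private copy
shared_after = child._data is parent._data   # False

print(shared_before, shared_after)  # True False
print(parent[0], child[0])          # 1 99
```

Because the flag is conservative, the parent also copies on its next write even if it is by then the only user of the buffer; that costs one extra copy but never breaks isolation.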


<center><img src='https://raw.githubusercontent.com/GoncaloP0710/Desenho-Anlise-Algoritmos/master/imgs/Copy-on-write-example.png' width=500px></center>

### The Fat Node Method

General theoretical schemes are known (e.g. the fat node method) for making any data structure partially persistent. The fat node method, as proposed by Driscoll et al. [4], transforms an ephemeral structure into a partially persistent one by saving every version of a field in the node itself, without erasing old field values. Although the fat node method was originally described only for data structures in the pointer model of computation, it can be generalized, as shown in the paper "Implementing Partial Persistence in Object-Oriented Languages" [6].

> This means each value stored in a field of a node must carry its corresponding version stamp.

#### Pros
- Enables easy access to past versions of the data.
- No need to copy nodes.


<font size="+4" color="blue;green"><b>?</b></font> Won't the node become an overly complex structure as more updates are made? And won't that affect structural sharing?

Yes. If nothing is done, the node will eventually carry too many versions, and structural sharing will stop working well.

In that case, we must add another strategy: the `Node-Copying Method`.
- Each node has a limit on the number of fields it can hold.
- When the limit is reached, the node must be copied (split into 2 different nodes).
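To make the fat node idea concrete, here is a minimal sketch of a single version-stamped field, assuming integer version numbers that only grow (partial persistence). The class `FatField` and its methods are hypothetical names for illustration; the sketch has no size limit, so it shows only the fat node part and not node copying.

```python
import bisect


class FatField:
    """One field of a fat node: keeps every value ever written, stamped with its version."""

    def __init__(self):
        self._versions = []  # version stamps, in increasing order
        self._values = []    # _values[i] was written at _versions[i]

    def write(self, version: int, value: object) -> None:
        # Partial persistence: writes may only target the newest version.
        if self._versions and version < self._versions[-1]:
            raise ValueError("only the newest version can be modified")
        if self._versions and version == self._versions[-1]:
            self._values[-1] = value
        else:
            self._versions.append(version)
            self._values.append(value)

    def read(self, version: int) -> object:
        # Find the last write whose stamp is <= the requested version.
        i = bisect.bisect_right(self._versions, version) - 1
        if i < 0:
            raise KeyError(f"field did not exist at version {version}")
        return self._values[i]


key = FatField()
key.write(1, "a")
key.write(4, "b")
print(key.read(1), key.read(3), key.read(4), key.read(10))  # a a b b
```

Reading costs a binary search over the stamps stored in the field, which is exactly the overhead that grows as the node gets "fatter".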


### Path copying

Path copying is a technique used to make data structures partially or fully persistent. The core idea is to copy the path that leads from the root to the node being updated, and to apply the change to the copy. The copied root becomes the root of the new version, while the old root still gives access to the previous version. This approach preserves all previous versions of the data structure.

To keep the new version consistent, every ancestor of the modified node is copied as well, so that each copy points to the new copy of its child. The old nodes are never changed, so the previous versions remain valid.

Path copying increases memory usage, since every update creates a new copy of the path from the root to the modified node. Because all nodes outside that path are shared with the previous version (structural sharing), the overhead per update is proportional to the length of the path rather than to the size of the whole structure.
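A minimal sketch of path copying on an unbalanced binary search tree follows. The names `PNode`, `insert` and `in_order` are hypothetical and chosen for this illustration; the keys are the ones used in the binary search tree example earlier in this chapter.

```python
class PNode:
    """Immutable-by-convention BST node."""
    __slots__ = ("key", "left", "right")

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def insert(root, key):
    """Return the root of a new version; only nodes on the search path are copied."""
    if root is None:
        return PNode(key)
    if key < root.key:
        return PNode(root.key, insert(root.left, key), root.right)
    if key > root.key:
        return PNode(root.key, root.left, insert(root.right, key))
    return root  # key already present: the old version is reused as is


def in_order(root):
    """Return the keys of the tree in ascending order."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)


v1 = None
for k in [50, 30, 70, 20, 40, 60, 80]:
    v1 = insert(v1, k)
v2 = insert(v1, 55)

print(in_order(v1))        # [20, 30, 40, 50, 60, 70, 80]
print(in_order(v2))        # [20, 30, 40, 50, 55, 60, 70, 80]
print(v2.left is v1.left)  # True: the whole left subtree is shared
```

Inserting 55 copies only the nodes 50, 70 and 60; the subtrees rooted at 30 and 80 are shared between both versions.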

<center><img src='https://raw.githubusercontent.com/GoncaloP0710/Desenho-Anlise-Algoritmos/master/imgs/path-copying.png' width=500px></center>

## Partially persistent data structure

A partially persistent data structure is a data structure where all of its versions can be observed and accessed, but only the latest version can be modified. This makes it particularly useful in scenarios where tracking changes over time is important.

#### Pros

- You have access to any previous version of the data.
- Easier to implement and more efficient than fully or confluently persistent structures.
- Extra memory needed is often limited.
- Offers worst-case guarantees: the complexity of updates and queries is predictable.


## Fully persistent data structure

A fully persistent structure allows every existing version to be both observed and modified. Each update applied to a version creates a new version that branches off from it, much like branches in Git.

In a fully persistent structure, if update operation i applies to version j < i, the result of the update is version i; version j is not changed by the update. We denote by n the number of nodes in the current version.
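The branching behaviour can be sketched with a stack whose cells are immutable tuples. This is an illustration under simple assumptions: the class `VersionedStack` is a hypothetical name, versions are numbered in creation order, and each new version records the version it was derived from.

```python
class VersionedStack:
    """Fully persistent stack: any version can be read and extended."""

    def __init__(self):
        self._heads = [None]    # _heads[v] is the top cell of version v (version 0 is empty)
        self._parent = [None]   # _parent[v] is the version that v was derived from

    def push(self, version: int, value: object) -> int:
        """Apply a push to `version` and return the number of the new version."""
        cell = (value, self._heads[version])  # new cell pointing to the old top
        self._heads.append(cell)
        self._parent.append(version)
        return len(self._heads) - 1

    def parent(self, version: int):
        return self._parent[version]

    def to_list(self, version: int) -> list:
        """Return the contents of `version`, top of the stack first."""
        out, cell = [], self._heads[version]
        while cell is not None:
            out.append(cell[0])
            cell = cell[1]
        return out


s = VersionedStack()
v1 = s.push(0, "a")
v2 = s.push(v1, "b")
v3 = s.push(v1, "c")  # update an old version: a second branch from v1

print(s.to_list(v1))  # ['a']
print(s.to_list(v2))  # ['b', 'a']
print(s.to_list(v3))  # ['c', 'a']
print(s.parent(v3))   # 1
```

Both `v2` and `v3` share the cell holding "a", and the parent links form the version tree.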

#### Pros

- Any version can be modified, not just the latest — each update creates a new branch in the version tree.
- Ideal for situations where different branches of computation must operate independently from the same data snapshot.
- Old versions remain unchanged, ensuring a safe environment in which to modify other versions.



## Confluently persistent data structure

A data structure is called confluently persistent if there is a meld operation that creates a new version from two previous versions so that branches in a version tree are joined and a version DAG (Directed Acyclic Graph) is formed.
- It’s acyclic because versions never point backward — no version can be its own ancestor.


Confluently persistent sets and maps are functional binary search trees that support efficient set operations both when the structures are disjoint and when they overlap. They can be implemented using AVL trees, red-black trees, or treaps.
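The version DAG can be illustrated with a deliberately simple sketch. It assumes Python `frozenset` values as the stored states, so a union copies elements instead of sharing tree nodes as the balanced-tree implementations mentioned above would; the class `VersionDAG` and its methods are hypothetical names.

```python
class VersionDAG:
    """Versions of a set, where a version may have one or two parents."""

    def __init__(self):
        self.states = [frozenset()]  # states[v] is the content of version v
        self.parents = [()]          # parents[v] is a tuple of parent versions

    def add(self, version: int, item: object) -> int:
        """Ordinary update: the new version has a single parent."""
        self.states.append(self.states[version] | {item})
        self.parents.append((version,))
        return len(self.states) - 1

    def meld(self, a: int, b: int) -> int:
        """Merge two versions: the new version has two parents."""
        self.states.append(self.states[a] | self.states[b])
        self.parents.append((a, b))
        return len(self.states) - 1


dag = VersionDAG()
left = dag.add(0, "x")         # branch 1
right = dag.add(0, "y")        # branch 2, also derived from version 0
merged = dag.meld(left, right)

print(sorted(dag.states[merged]))  # ['x', 'y']
print(dag.parents[merged])         # (1, 2)
print(sorted(dag.states[left]))    # ['x']
```

Since a version only ever refers to versions created before it, the parent links cannot form a cycle.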

#### Pros

- Supports a meld operation: two or more past versions can be merged to create a new version
- Unlike full persistence (which forms a tree), confluent persistence allows merging branches into a directed acyclic graph (DAG) of versions.
- Ideal for use cases involving merging divergent states (distributed systems, version control...)


<a id="motivation"></a>

---

<a id="motivation&historicalbackground"></a>
<img src="https://readme-typing-svg.herokuapp.com?font=Lexend+Giga&size=25&pause=1000&color=CCA9DD&vCenter=true&width=350&height=25&lines=Motivation &" width="600"/>



The motivation for using and continuously improving persistent data structures lies in their ability to maintain previous states and support various operations on those states.

Some practical examples where these structures are a natural fit are:
- Version control (e.g. Git)
- Undo/redo functionality (e.g. a text editor)


<img src="https://readme-typing-svg.herokuapp.com?font=Lexend+Giga&size=25&pause=1000&color=CCA9DD&vCenter=true&width=350&height=25&lines=Background" width="600"/>

The concept of persistent data structures was introduced in Driscoll, Sarnak, Sleator, and Tarjan's 1986 article "Making Data Structures Persistent".

The paper explores the advantages of a data structure that can store its previous states and compares it to an ephemeral one. It defines partial and full persistence; confluent persistence, in which versions can also be merged, was developed in later work.

The study introduced several techniques, such as the Fat Node Method, the Node-Copying Method, and the Node-Splitting Method.

Over the years, there has been significant progress in this area, leading to more efficient implementations of persistent data structures.

The paper titled "Partially Persistent Data Structures of Bounded Degree with Constant Update Time" by Gerth Stølting Brodal (1996) presents a method for making data structures partially persistent while guaranteeing constant worst-case time for updates and access operations — a significant improvement over earlier techniques that only offered amortized efficiency.



---

<a id="design"></a>
<img src="https://readme-typing-svg.herokuapp.com?font=Lexend+Giga&size=25&pause=1000&color=CCA9DD&vCenter=true&width=350&height=25&lines=Algorithms Design" width="600"/>

## Persistent Linked List

The persistent linked list contains Nodes that have pointers to the next Node and also may contain a version tracker. When we want to modify the list, instead of changing the existing nodes, we simply create new ones. When a new Node is created, it becomes the head of the list and points to the old head.

### Structure

The persistent linked list is based on the structure of a normal linked list. Every Node on the list has a pointer to the next Node. The list object contains a pointer to its head (the first Node on the list).
- PersistentLinkedList
    - head (First Node of the List)
    - version (Version of the List)
- Node
    - value (Object the Node carries)
    - version (Version of the List in which the Node was created)
    - next (Pointer to the next Node on the List)

### Operations

- Add: Insert a new Node on the List. 
- Remove: Remove a Node from the List.

### Limitations
- Consumes more memory than traditional linked lists due to multiple versions.
- More difficult to implement than traditional linked lists.
- Due to the need to create new Nodes, it may be slower than traditional linked lists.


<center><img src='https://raw.githubusercontent.com/GoncaloP0710/Desenho-Anlise-Algoritmos/master/imgs/Purely_functional_list_after.png' width=500px></center>

## Partially persistent data structure - Balanced search tree[2], [4]

This model can be implemented using a balanced binary search tree (BST) with a couple of changes. A binary tree is balanced if its height is O(log n), where n is the number of nodes.

First, we need to implement the 'Fat Node' method, which states that all changes to a node must be recorded and the old values must not be lost. That means every field of a node must have a version stamp associated with it.

Secondly, to counter the unbounded growth of a fat node as changes accumulate, the 'Node-Copying Method' was introduced. It limits each node to a fixed number of field versions; once there is no more space, a new copy of the node is made holding only the newest field values. All predecessors must store a pointer to the new node, and if they don't have space for the new pointer, they must be copied as well.

Finally, path copying must be applied. Every node on the path from the root to the modified node is copied, and each node that pointed to an old node on that path must point to its new copy instead.

### Structure

The persistent balanced search tree is based on the structure of a normal binary search tree. Every Node on the tree has a list of pointers and a list of fields.

- Field
    - name (Alias of the object)
    - value (Object the field carries)
    - version (Version of the field)
- FatNode
    - max_fields (Number of fields a Node can have at max)
    - fields (List of fields the Node has)
    - right (List of pointers to the next Node on the right)
    - left (List of pointers to the next Node on the left)
    - order (Index of the Node)
- PartiallyPersistentBST
    - max_fields (Number of fields a Node can have at max)
    - counter (Number of the current version of the tree)
    - nodes (Dictionary of the Nodes in the tree)
    - root (First node of the tree)

### Operations

- Insert (Insert a new FatNode into the tree)
- Find (Find the value of a field on the tree)
- Delete (Delete the latest version of the given field)
- IsBalanced (Check if tree is balanced)
- Rebalance (Rebalance the tree)
- Height (Calculate the height of the tree)

### Limitations

- You can’t modify old versions — only the latest one can be updated.
- New version created on every update, which can increase memory usage over time.

## Fully persistent data structure - Balanced search tree[4]

One of the problems of full persistence, compared to partial persistence where the versions have a natural linear order, is that the versions are only partially ordered.

This problem can be addressed with a version list, a linear ordering of the version tree. When a new version is created, it is inserted in the list immediately after its parent.
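A minimal sketch of such a version list, assuming versions are identified by integers assigned in creation order. The class `VersionList` is a hypothetical name, and a plain Python list is used for clarity: its `index` and `insert` calls take linear time, so a practical implementation would need a faster order-maintenance structure.

```python
class VersionList:
    """Linear order over the versions of a fully persistent structure."""

    def __init__(self):
        self.order = [0]   # version 0 is the initial version
        self._next_id = 1

    def new_version(self, parent: int) -> int:
        """Create a version derived from `parent`, placed right after it in the list."""
        version = self._next_id
        self._next_id += 1
        self.order.insert(self.order.index(parent) + 1, version)
        return version

    def precedes(self, a: int, b: int) -> bool:
        """Return True if version `a` comes before version `b` in the list."""
        return self.order.index(a) < self.order.index(b)


vl = VersionList()
v1 = vl.new_version(0)   # order: [0, 1]
v2 = vl.new_version(v1)  # order: [0, 1, 2]
v3 = vl.new_version(v1)  # order: [0, 1, 3, 2]
v4 = vl.new_version(0)   # order: [0, 4, 1, 3, 2]

print(vl.order)             # [0, 4, 1, 3, 2]
print(vl.precedes(v3, v2))  # True
```

Every version appears in the list before all of its descendants, which is what lets a fat node compare two version stamps even though the versions form a tree.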

The differences in the Fat Node method are the following ones:

- Versions of the node are now related to the version tree and no longer to a numeric value.

- The update now takes one more action. The new fat node is created after adding a new node to the version list.

Like the Fat Node method, the Node-Copying Method also changes: it is replaced by a variant named the Node-Splitting Method. It differs from the previous one because only half of the node's attributes are moved to the new node, which leaves free space in both the old node and the new one.

### Limitations

- More difficult to implement than partial persistence
- Branching versions can lead to faster memory growth, especially if changes are frequent or deep.
- Multiple active branches may make the logic and debugging of versioned behavior more complex.
- Some operations on older versions may become slower (e.g., in ropes or trees) due to versioning overhead.

---

<a id="implementation"></a>
<img src="https://readme-typing-svg.herokuapp.com?font=Lexend+Giga&size=25&pause=1000&color=CCA9DD&vCenter=true&width=350&height=25&lines=Implementation" width="600"/>


# Persistent Linked List


In [164]:
# ==========================================================
# ----------------------- Node Class -----------------------
# ==========================================================

class Node:
    def __init__(self, value: object, version: int, next: 'Node' = None) -> None:
        """
        Initialize a Node with a value and a reference to the next node.
        :param value: The value stored in the node.
        :param version: The version of the node.
        :param next: A reference to the next node in the list.
        :return: A new Node object.
        """
        self.value = value
        self.version = version
        self.next = next

    def __repr__(self) -> str:
        """
        Return a string representation of the Node.
        :return: A string representation of the Node.
        """
        return f"Node({self.value}, {self.version}, {self.next})"


In [None]:
# ===========================================================
# ---------------- Persistent Linked List Class ------------
# ===========================================================

class PersistentLinkedList:
    def __init__(self, head: Node = None, version: int = 0) -> None:
        """
        Initialize a Persistent Linked List with a head node.
        :param head: The head node of the list.
        :param version: The version of the list.
        :return: A new PersistentLinkedList object.
        """
        self.head = head
        self.version = version

    def __repr__(self) -> str:
        """
        Return a string representation of the Persistent Linked List.
        :return: A string representation of the Persistent Linked List.
        """
        return self.head.__repr__() if self.head else "Empty List"
    
    def add(self, value: object) -> 'PersistentLinkedList':
        """
        Add a new value to the list and return a new version of the list.
        :param value: The value to be added.
        :return: A new PersistentLinkedList with the added value.
        """
        new_node = Node(value, self.version, self.head)
        return PersistentLinkedList(new_node, self.version + 1)

In [166]:
# =========================================================
# --------------------- Example Usage ---------------------
# =========================================================

pll = PersistentLinkedList()
pllV1 = pll.add(1)
pllV2 = pllV1.add(2)
pllV3 = pllV2.add(3)

print("pllV1:", pllV1)
print("pllV2:", pllV2)
print("pllV3:", pllV3)

pllV1: Node(1, 0, None)
pllV2: Node(2, 1, Node(1, 0, None))
pllV3: Node(3, 2, Node(2, 1, Node(1, 0, None)))
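The design section lists a Remove operation that the class above does not implement. One possible approach is sketched below, written as a standalone function so that it runs on its own: the `Cell` type and the `remove` function are hypothetical names, and `Cell` omits the version field for brevity. The cells in front of the removed element are copied, and the tail after it is shared with the old version.

```python
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    """Immutable list cell (hypothetical stand-in for the Node class)."""
    value: object
    next: Optional["Cell"] = None


def remove(head: Optional[Cell], value: object) -> Optional[Cell]:
    """Return the head of a new list without the first occurrence of `value`.

    The original list is never modified. If `value` is absent, the
    original head is returned unchanged.
    """
    if head is None:
        return None
    if head.value == value:
        return head.next             # drop this cell, share everything after it
    new_next = remove(head.next, value)
    if new_next is head.next:
        return head                  # nothing was removed further down
    return Cell(head.value, new_next)  # copy the cell in front of the removal


v1 = Cell(3, Cell(2, Cell(1)))
v2 = remove(v1, 2)

print(v1)                          # the old version still contains 2
print(v2)                          # the new version does not
print(v2.next is v1.next.next)     # True: the tail after the removed cell is shared
```

The recursion depth grows with the position of the removed element, so a long list would call for an iterative variant.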


# Partially persistent data structure
- Fat-Node Method
- Node-Copying Method
- Balanced Binary Search Tree

This implementation is based on the paper: "Making Data Structures Persistent" [4]

In [None]:
from typing import Optional, List, Dict, Tuple
import hashlib

# =========================================================
# ---------------------- Field Class ----------------------
# =========================================================

class Field:
    def __init__(self, name: str, value: object, version: int):
        """
        Initialize a Field with a name, value, and version.
        :param name: The name of the field.
        :param value: The value of the field.
        :param version: The version of the field.
        :return: A new Field object.
        """
        self.name = name
        self.value = value
        self.version = version # satisfy the Fat-Node Method

    def __repr__(self):
        """
        Return a string representation of the Field.
        :return: A string representation of the Field.
        """
        return(
            f"Field(\n"
            f"  name={self.name},\n"
            f"  value={self.value},\n"
            f"  version={self.version},\n"
            f")"
        )
    

In [None]:
# =========================================================
# --------------------- FatNode Class ---------------------
# =========================================================

class FatNode:
    def __init__(self, max_fields: int, counter: int):
        """
        Initialize a FatNode with a maximum number of fields and a counter.
        :param max_fields: The maximum number of fields in the node.
        :param counter: The counter used to generate a unique identifier for the node.
        :return: A new FatNode object.
        """
        self.max_fields = max_fields # satisfy the Node-Copying Method
        self.fields = []
        self.right = []
        self.left = []
        self.order = custom_hash(counter) # used to decide where to insert in the tree

    def __repr__(self):
        """
        Return a string representation of the FatNode.
        :return: A string representation of the FatNode.
        """
        def format_children(children):
            return [{"name": field.name, "version": field.version} for child in children for field in child.fields]

        return (
            f"FatNode(\n"
            f"  max_fields={self.max_fields},\n"
            f"  fields={self.fields},\n"
            f"  right={format_children(self.right)},\n"
            f"  left={format_children(self.left)},\n"
            f")"
        )
    
    def _add_field(self, other):
        """
        Add a field to the FatNode. If a field with the same name and version already exists, it will be updated.
        :param other: The field to be added.
        :return: The updated FatNode.
        :raises OverflowError: If the node already holds max_fields fields.
        :raises TypeError: If other is not a Field.
        """
        if not isinstance(other, Field):
            raise TypeError("Expected a Field, got '{}'".format(type(other).__name__))
        if self.fields is None:
            self.fields = []
        # Overwrite the value if a field with the same name and version already exists
        for field in self.fields:
            if field.name == other.name and field.version == other.version:
                field.value = other.value
                return self
        if len(self.fields) >= self.max_fields:
            raise OverflowError("No space left for new field in fat node.")
        self.fields.append(other)
        return self
        
    def add_fields(self, fields: List[Tuple[str, object]], counter: int):
        """
        Add multiple fields to the FatNode. If a field with the same name and version already exists, it will be updated.
        :param fields: A list of tuples containing the name and value of the fields to be added.
        :param counter: The counter used to generate a unique identifier for the node.
        :return: The updated FatNode.
        """
        for name, value in fields:
            field = Field(name, value, counter)
            self._add_field(field)
        return self
    
    def get_latest_child(self, right: bool) -> Optional['FatNode']: # used to traverse the tree
        """
        Get the latest child node based on the order attribute. If right is True, it will return the right child, otherwise the left child.
        :param right: A boolean indicating whether to return the right child or the left child.
        :return: The latest version of the child node.
        """
        children = self.right if right else self.left
        return max(children, key=lambda child: child.order, default=None)
    
def custom_hash(value: int) -> str:
    """
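    Generate a SHA-256 hash for the given integer. The hex digest is used as the node's unique identifier and ordering key;
    because the hashes of consecutive counters are scattered, nodes are not inserted into the tree in sorted order.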
    :param value: The integer to be hashed.
    :return: The SHA-256 hash of the integer as a hexadecimal string.
    """
    value_bytes = str(value).encode('utf-8')
    hash_object = hashlib.sha256(value_bytes)
    return hash_object.hexdigest()
    

In [None]:
# =========================================================
# ---------------- PartiallyPersistentBST Class -----------
# =========================================================

class PartiallyPersistentBST:

    def __init__(self, max_fields: int):
        """
        Initialize a Partially Persistent Binary Search Tree with a maximum number of fields.
        :param max_fields: The maximum number of fields in each node.
        :return: A new PartiallyPersistentBST object.
        """
        self.max_fields = max_fields
        self.counter = 0
        self.nodes: Dict[int, FatNode] = {}
        self.root: Optional[FatNode] = None

    def __repr__(self):
        """
        Return a string representation of the Partially Persistent Binary Search Tree.
        :return: A string representation of the Partially Persistent Binary Search Tree.
        """
        return (
            f"PartiallyPersistentBST(\n"
            f"  max_fields={self.max_fields},\n"
            f"  index={self.counter},\n"
            f"  nodes={self.nodes},\n" 
            f")"
        )
        
    def insert(self, fields: List[Tuple[str, object]]):
        """
        Insert a new FatNode into the tree.
        """
        if self.counter == 0:
            try:
                new_node = FatNode(self.max_fields, self.counter)
                new_node.add_fields(fields, self.counter)
                self.nodes[self.counter] = new_node
                self.root = new_node
                self.counter += 1
            except OverflowError:
                self._copy_node(fields)
        else:
            self._insert(self.root, fields, self.counter)
    
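    def _insert(self, node: FatNode, fields: List[Tuple[str, object]], order: int):
        # Hash the version counter to get the new node's ordering key and
        # compare it with this node's key to choose the subtree to descend into.
        direction = 'left' if custom_hash(order) < node.order else 'right'
        is_right = direction == 'right'
        child_list = getattr(node, direction)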

        if not child_list:
            try:
                new_node = FatNode(self.max_fields, order)
                new_node.add_fields(fields, order)
                child_list.append(new_node)
                self.nodes[self.counter] = new_node
                self.counter += 1
                self._rebalance()
                return
            except OverflowError:
                self._copy_node(fields)
        else:
            return self._insert(node.get_latest_child(is_right), fields, order)

    def find(self, name: str, version: int) -> Optional[object]:
        """
        Find the value of a field with the given name and version.
        :param name: The name of the field to be found.
        :param version: The version of the field to be found.
        :return: The value of the field if found, otherwise None.
        """
        if self.nodes:
            node = self._find(self.root, version)
            if node:
                for field in node.fields:
                    if field.name == name and field.version == version:
                        return field.value
        return None
    
    def _find(self, node: FatNode, version: int) -> Optional[FatNode]:
        if node:
            if node.order == custom_hash(version):
                return node
            elif node.order > custom_hash(version):
                return self._find(node.get_latest_child(False), version)
            else:
                return self._find(node.get_latest_child(True), version)
        return None
    
    def delete(self, name: str):
        """
        Delete a field with the given name. Only the latest version is affected.
        :param name: The name of the field to be deleted.
        :return: None
        """
        if self.nodes:
            node = self._find(self.root, self.counter-1)
            if node:
                for field in node.fields:
                    if field.name == name:
                        node.fields.remove(field)
                        return
                    
    def _is_balanced(self) -> bool:
        """
        Check if the tree is balanced at the root.
        The tree counts as balanced if the heights of the root's left and right subtrees differ by at most 1.
        """
        if not self.root: # Tree is empty
            return True
        left_height = self._height(self.root.get_latest_child(False))
        right_height = self._height(self.root.get_latest_child(True))
        return abs(left_height - right_height) <= 1
    
    def _rebalance(self):
        """
        Rebalance the tree if it is unbalanced.
        """
        if not self._is_balanced():
            flat_nodes = self._flatten_tree(self.root)
            balanced_root = self._build_balanced_tree(flat_nodes)
            self.root = balanced_root

            new_node_map = {}
            self._rebuild_node_map(balanced_root, new_node_map)
            self.nodes = new_node_map


    def _build_balanced_tree(self, nodes: List[FatNode]) -> Optional[FatNode]:
        """
        Recursively build a balanced BST from sorted FatNodes.
        Got help from Co-Pilot to implement this method.
        """
        if not nodes:
            return None

        mid = len(nodes) // 2
        root = nodes[mid]

        # Clear child lists before rebuilding to avoid linking old state
        root.left = []
        root.right = []

        left_child = self._build_balanced_tree(nodes[:mid])
        right_child = self._build_balanced_tree(nodes[mid+1:])

        if left_child:
            root.left.append(left_child)
        if right_child:
            root.right.append(right_child)

        return root
    
    def _rebuild_node_map(self, node: Optional[FatNode], nodes_map: Dict[int, FatNode]):
        """
        Rebuild the node map so that each field version points to its FatNode (the left and right children may have changed).
        """
        if node is None:
            return
        for field in node.fields:
            nodes_map[field.version] = node
        self._rebuild_node_map(node.get_latest_child(False), nodes_map)
        self._rebuild_node_map(node.get_latest_child(True), nodes_map)


    def _flatten_tree(self, node: Optional[FatNode]) -> List[FatNode]:
        """
        In-order traversal to flatten the tree into a list.
        Got help from Co-Pilot to implement this method.
        """
        if node is None:
            return []
        
        left = self._flatten_tree(node.get_latest_child(False))
        right = self._flatten_tree(node.get_latest_child(True))
        
        return left + [node] + right

    def _height(self, node: Optional[FatNode]) -> int:
        """
        Calculate the height of the subtree rooted at the given node.
        The height is counted in nodes on the longest path from this node down to a leaf, so an empty subtree has height 0.
        """
        if node is None:
            return 0
        left_height = self._height(node.get_latest_child(False))
        right_height = self._height(node.get_latest_child(True))
        return max(left_height, right_height) + 1

    def _copy_node(self, fields: List[Tuple[str, object]]):
        """
        Node-Copying Method -> Used when a node is full of fields. Create more nodes to store the fields.
        """
        fields_divided = self._divide_fields(fields)
        if len(fields_divided[0]) > self.max_fields or len(fields_divided[1]) > self.max_fields:
            self._copy_node(fields_divided[0])
            self._copy_node(fields_divided[1])    
            return
        if len(fields_divided[0]) > 0:
            self.insert(fields_divided[0])
        if len(fields_divided[1]) > 0:
            self.insert(fields_divided[1])

    def _divide_fields(self, fields: List[Tuple[str, object]]) -> Tuple[List[Tuple[str, object]], List[Tuple[str, object]]]:
        """
        Divide the fields into two halves (the second half gets the extra element when the count is odd).
        """
        mid = len(fields) // 2
        return fields[:mid], fields[mid:]

In [170]:
# =========================================================
# --------------------- Example Usage ---------------------
# =========================================================

bst = PartiallyPersistentBST(max_fields=2)
bst.insert([("a", 10), ("b", 20), ("c", 30)])
bst.insert([("g", 80), ("h", 90), ("i", 100)])
bst.insert([("j", 110)])
bst.insert([("p", 170)])
bst.insert([("s", 200)])

print(bst)
print(bst.find("a", 0))
print(bst.find("p", 5))

print(f"Is the tree balanced? {bst._is_balanced()}")

PartiallyPersistentBST(
  max_fields=2,
  index=7,
  nodes={0: FatNode(
  max_fields=2,
  fields=[Field(
  name=a,
  value=10,
  version=0,
)],
  right=[{'name': 'g', 'version': 2}],
  left=[{'name': 'h', 'version': 3}, {'name': 'i', 'version': 3}],
), 3: FatNode(
  max_fields=2,
  fields=[Field(
  name=h,
  value=90,
  version=3,
), Field(
  name=i,
  value=100,
  version=3,
)],
  right=[],
  left=[{'name': 'j', 'version': 4}],
), 4: FatNode(
  max_fields=2,
  fields=[Field(
  name=j,
  value=110,
  version=4,
)],
  right=[],
  left=[],
), 2: FatNode(
  max_fields=2,
  fields=[Field(
  name=g,
  value=80,
  version=2,
)],
  right=[{'name': 'p', 'version': 5}],
  left=[{'name': 'b', 'version': 1}, {'name': 'c', 'version': 1}],
), 1: FatNode(
  max_fields=2,
  fields=[Field(
  name=b,
  value=20,
  version=1,
), Field(
  name=c,
  value=30,
  version=1,
)],
  right=[],
  left=[],
), 5: FatNode(
  max_fields=2,
  fields=[Field(
  name=p,
  value=170,
  version=5,
)],
  right=[],
  left=[

<a id="analysis"></a>
<img src="https://readme-typing-svg.herokuapp.com?font=Lexend+Giga&size=25&pause=1000&color=CCA9DD&vCenter=true&width=350&height=25&lines=Complexity analysis" width="600"/>



# Persistent Linked List

The `add` method of the data structure runs in O(1) time because it does not traverse or modify the existing list; it simply creates a new version.

The only method with a different complexity is `__repr__`, which is O(n) because it traverses the entire list.
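
As a minimal sketch of why `add` is O(1), assume a singly linked list in which each version is just a reference to a head node. The names `Node` and `PersistentList` are hypothetical and need not match the class defined earlier in this chapter.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class Node:
    value: object
    next: Optional["Node"] = None


class PersistentList:
    """Hypothetical immutable singly linked list; versions share their tails."""

    def __init__(self, head: Optional[Node] = None):
        self._head = head

    def add(self, value: object) -> "PersistentList":
        # O(1): allocate one node that points at the existing head.
        # Nothing in the old version is traversed or modified.
        return PersistentList(Node(value, self._head))

    def __iter__(self) -> Iterator[object]:
        # O(n): walking the list visits every node, as __repr__ must.
        node = self._head
        while node is not None:
            yield node.value
            node = node.next

    def __repr__(self) -> str:
        return "PersistentList(" + ", ".join(repr(v) for v in self) + ")"


v0 = PersistentList()
v1 = v0.add(1)
v2 = v1.add(2)
print(v1)  # PersistentList(1)
print(v2)  # PersistentList(2, 1)
```

Here `v2` reuses the node that `v1` points to, so both versions stay readable without any copying.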

# Partially persistent data structure - Balanced Binary Search Tree

Based on the paper that inspired the structure, the fat node method has, in the worst case, an O(log n) time overhead and an O(1) space cost per update. The node-copying method is reported to achieve O(1) amortized time and O(1) amortized space per update.
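
The logarithmic factor of the fat node method comes from searching the versions stored in a field. The following is a minimal sketch of that idea, assuming a field that keeps every value ever written, tagged with its version. The class `FatField` and its methods are hypothetical and simpler than the `FatNode` used in this chapter, which instead limits the number of fields per node.

```python
import bisect
from typing import List, Optional


class FatField:
    """Hypothetical fat-node field: keeps every (version, value) pair ever written."""

    def __init__(self) -> None:
        self._versions: List[int] = []  # strictly increasing version stamps
        self._values: List[object] = []

    def write(self, version: int, value: object) -> None:
        # Partial persistence: only the newest version may be modified,
        # so stamps arrive in increasing order and each write adds O(1) space.
        if self._versions and version <= self._versions[-1]:
            raise ValueError("versions must be strictly increasing")
        self._versions.append(version)
        self._values.append(value)

    def read(self, version: int) -> Optional[object]:
        # Binary search for the latest write at or before `version`: O(log m)
        # for m stored versions.
        i = bisect.bisect_right(self._versions, version)
        return self._values[i - 1] if i else None


field = FatField()
field.write(1, "a")
field.write(4, "b")
print(field.read(0))  # None
print(field.read(3))  # a
print(field.read(4))  # b
```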

The complexity of `insert` depends on the case. In the best case, the tree is empty and no node-copying is needed, so the insertion takes O(1) time. The worst case involves both node-copying and rebalancing the tree.

Checking whether the tree is balanced takes O(n) time, or O(1) in the best case of an empty tree. This is because `_height` visits every node of the subtree it is given, no matter what.
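
Note that `_is_balanced` compares only the two subtrees of the root. The sketch below contrasts that root-only check with a single-pass check of every node; both are O(n). It assumes a plain BST node with single `left` and `right` references instead of the `FatNode` child lists, and all names in it are hypothetical.

```python
from typing import Optional


class Node:
    """Hypothetical plain BST node, used only for this illustration."""

    def __init__(self, key: int, left: Optional["Node"] = None,
                 right: Optional["Node"] = None):
        self.key = key
        self.left = left
        self.right = right


def height(node: Optional[Node]) -> int:
    # Height counted in nodes; an empty subtree has height 0.
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def balanced_at_root(root: Optional[Node]) -> bool:
    # Compares only the two subtrees of the root.
    if root is None:
        return True
    return abs(height(root.left) - height(root.right)) <= 1


def checked_height(node: Optional[Node]) -> int:
    # Returns the height, or -1 as soon as any subtree is out of balance.
    if node is None:
        return 0
    left = checked_height(node.left)
    if left < 0:
        return -1
    right = checked_height(node.right)
    if right < 0 or abs(left - right) > 1:
        return -1
    return 1 + max(left, right)


def balanced_everywhere(root: Optional[Node]) -> bool:
    return checked_height(root) >= 0


# Two chains of equal length hang off the root: the heights match at the root,
# but each chain is itself a degenerate subtree.
root = Node(4,
            left=Node(3, left=Node(2, left=Node(1))),
            right=Node(5, right=Node(6, right=Node(7))))
print(balanced_at_root(root))     # True
print(balanced_everywhere(root))  # False
```

A tree that passes only the root-level check can still contain long chains, which matters for any O(log n) search bound.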

If the tree needs to be rebalanced, every node is visited, so the operation takes at least O(n) time. First, `_flatten_tree` performs an in-order traversal of the tree. Second, `_build_balanced_tree` builds a balanced tree from the sorted list. Finally, `_rebuild_node_map` traverses the whole tree again to update the node map. Each step touches every node once, which gives O(n); as written, however, the list concatenation in `_flatten_tree` and the slicing in `_build_balanced_tree` copy sublists at every level of recursion, which adds overhead beyond O(n) (O(n log n) for the rebuild step).
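
The flatten-and-rebuild steps can be sketched without copying sublists, which keeps both at O(n). This is a sketch assuming a plain BST node with single `left` and `right` references rather than the `FatNode` child lists; the names `Node`, `flatten`, and `build_balanced` are hypothetical.

```python
from typing import List, Optional


class Node:
    """Hypothetical plain BST node, used only to illustrate the rebuild."""

    def __init__(self, key: int):
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def flatten(node: Optional[Node], out: List[Node]) -> None:
    # In-order traversal that appends to one shared list: O(n) in total.
    if node is None:
        return
    flatten(node.left, out)
    out.append(node)
    flatten(node.right, out)


def build_balanced(nodes: List[Node], lo: int, hi: int) -> Optional[Node]:
    # Build from the half-open range nodes[lo:hi] using indices, not slices.
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    root = nodes[mid]
    root.left = build_balanced(nodes, lo, mid)
    root.right = build_balanced(nodes, mid + 1, hi)
    return root


def height(node: Optional[Node]) -> int:
    # Counted in nodes, so an empty subtree has height 0.
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


# A degenerate right-leaning chain 0 -> 1 -> ... -> 6.
chain = Node(0)
tail = chain
for key in range(1, 7):
    tail.right = Node(key)
    tail = tail.right

ordered: List[Node] = []
flatten(chain, ordered)
root = build_balanced(ordered, 0, len(ordered))
print(root.key)      # 3
print(height(root))  # 3
```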

The best case for the `find` and `delete` operations is when the target node is the root, in which case the complexity is O(1). Otherwise the search descends one level per step, which gives O(log n) as long as the tree stays balanced.

Finally, the `_copy_node` operation must divide the fields and then insert the resulting nodes into the tree. Dividing the fields slices the list, which is O(k) for k fields and effectively constant because k is small. For every copy of the node, the tree has to be traversed (O(log n)) and, in the worst case, rebalanced (O(n)).

<a id="exercise"></a>
<img src="https://readme-typing-svg.herokuapp.com?font=Lexend+Giga&size=25&pause=1000&color=CCA9DD&vCenter=true&width=350&height=25&lines=Exercise" width="600"/>


# Debugging

Because a partially persistent data structure allows us to access previous states, we can, for example, store different versions of the same variable and implement a rudimentary debugging tool.

In [171]:
import random

def test_function(counter, max_calls=100):
    """
    A recursive function that randomly decrements the counter and stops after `max_calls` calls (100 by default).
    """
    if max_calls == 0:
        return 0
    else:
        decrement = random.randint(0, 20)
        return test_function(counter - decrement, max_calls - 1)


In [172]:
import sys

bst = PartiallyPersistentBST(max_fields=2)

def trace(frame, event, arg):
    if event == "line" and frame.f_lineno == 10:
        bst.insert([(str(frame.f_lineno), frame.f_locals)])
    return trace

def run_with_trace(func):
    sys.settrace(trace)
    try:
        result = func()
    finally:
        sys.settrace(None)
    return result

run_with_trace(lambda: test_function(10))

# "10" is the field name (the traced line number) and 60 is the version (the counter value at insertion)
print(bst.find("10", 60))

{'counter': -574, 'max_calls': 40, 'decrement': 2}
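
The same idea can be sketched without the tree, using a plain list whose index plays the role of the version. This is only an illustration under that assumption: the function `countdown` and the list `snapshots` are hypothetical. Each snapshot is a copy of the frame's locals taken when the function is entered, so later changes to the frame cannot alter a stored version.

```python
import sys
from typing import Dict, List

# The index in this list plays the role of the version number.
snapshots: List[Dict[str, object]] = []


def countdown(counter: int, calls: int) -> int:
    if calls == 0:
        return counter
    return countdown(counter - 1, calls - 1)


def trace(frame, event, arg):
    # Record a copy of the locals each time `countdown` is entered.
    if event == "call" and frame.f_code is countdown.__code__:
        snapshots.append(dict(frame.f_locals))
    return None  # no per-line tracing is needed for this sketch


previous = sys.gettrace()
sys.settrace(trace)
try:
    result = countdown(10, 3)
finally:
    sys.settrace(previous)  # restore whatever tracer was active before

print(result)        # 7
print(snapshots[2])  # {'counter': 8, 'calls': 1}
```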



---

<a id="references"></a>
<img src="https://readme-typing-svg.herokuapp.com?font=Lexend+Giga&size=25&pause=1000&color=CCA9DD&vCenter=true&width=350&height=25&lines=References" width="600"/>


### Articles and Tutorials
1. [Introduction to Persistent Data Structures](https://arpitbhayani.me/blogs/persistent-data-structures-introduction/) - A beginner-friendly overview of persistent data structures.
2. [Partial Persistence](https://sungsoo.github.io/2014/01/18/partial-persistence.html) - A concise explanation of partial persistence.
3. [Introduction to Balanced Binary Tree](https://www.geeksforgeeks.org/balanced-binary-tree/) - A beginner-friendly overview of balanced binary trees.

### Research Papers
4. [Making Data Structures Persistent](https://www.cs.cmu.edu/~sleator/papers/making-data-structures-persistent.pdf) - The foundational paper by Driscoll, Sarnak, Sleator, and Tarjan (1986) introducing persistent data structures.
5. [Partially Persistent Data Structures of Bounded Degree with Constant Update Time](https://www.cs.au.dk/~gerth/papers/njc96.pdf) - Gerth Stølting Brodal's seminal paper on partially persistent data structures.
6. [Partial Persistence in Object-Oriented Languages](https://fpluquet.be/Publications_&_Talks_files/Implementing%20Partial%20Persistence%20in%20Object-Oriented%20Languages.pdf) - A paper presenting a way to make data structures partially persistent in object-oriented programming languages, specifically Java.
7. [Persistence in Data Structures](https://arxiv.org/pdf/1301.3388) - A modern exploration of persistence in data structures.
8. [MIT Advanced Algorithms Lecture Notes](https://ocw.mit.edu/courses/6-854j-advanced-algorithms-fall-2005/2165d83010dc7633bce397ea75f889f9_lec05_1999.pdf) - Lecture notes discussing persistence in data structures.


### Online Guides
9. [USACO Guide: Persistent Data Structures](https://usaco.guide/adv/persistent?lang=cpp) - A practical guide for competitive programming.

### Encyclopedic Resources
10. [Wikipedia: Persistent Data Structure](https://en.wikipedia.org/wiki/Persistent_data_structure) - A general overview of persistent data structures.

### Additional Resources
11. [Lirias Repository](https://lirias.kuleuven.be/retrieve/19369) - A collection of academic resources on persistence.
