# Persistent Data Structures

#### Goncalo Pinto - fc58178


<div align="center">
<a href="#introduction"><kbd> <br>Introduction<br> </kbd></a>&ensp;&ensp;
<a href="#motivation"><kbd> <br>Motivation & historical background<br> </kbd></a>&ensp;&ensp;
<a href="#design"><kbd> <br>Design of the algorithms<br> </kbd></a>&ensp;&ensp;
<a href="#references"><kbd> <br>References<br> </kbd></a>&ensp;&ensp;
</div>


---
<a id="introduction"></a>
<img src="https://readme-typing-svg.herokuapp.com?font=Lexend+Giga&size=25&pause=1000&color=CCA9DD&vCenter=true&width=350&height=25&lines=Introduction" width="600"/>

A persistent data structure is a data structure that preserves its previous versions when it is modified, allowing access to any historical version. In other words, once a change is made to the structure, both the original and the modified versions remain accessible. This is particularly useful when you need to keep track of the history of updates or to backtrack to previous states of the data structure.

- A data structure is `partially persistent` if all versions can be accessed but only the newest version can be modified.

- It is `fully persistent` if every version can be both accessed and modified.

- It is `confluently persistent` if, in addition, two or more versions can be merged to produce a new version.

These types of data structures are particularly common in logical and functional programming, as languages in those paradigms discourage (or fully forbid) the use of mutable data. 
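To make the definition concrete, here is a minimal Python sketch (not taken from the cited sources; all variable names are illustrative) that contrasts an ephemeral update with the most naive form of persistence: copying the whole structure on every update.

```python
from typing import List, Tuple

# Ephemeral structure: an in-place update destroys the previous state.
ephemeral = [1, 2, 3]
snapshot = ephemeral        # a second reference to the same object, not a saved version
ephemeral.append(4)         # snapshot now also reads [1, 2, 3, 4]

# Naive persistence: every update builds a new immutable tuple,
# so every earlier version stays readable.
versions: List[Tuple[int, ...]] = [(1, 2, 3)]   # version 0
versions.append(versions[-1] + (4,))            # version 1, derived from version 0
versions.append(versions[0] + (5,))             # version 2, also derived from version 0

for number, version in enumerate(versions):
    print(number, version)
```

Full copying costs O(n) time and space per update. The techniques described in the following sections avoid copying the entire structure.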


## Partially persistent data structure

A partially persistent data structure is a data structure in which old versions are remembered and can always be inspected. However, only the latest version of the data structure can be modified.

General theoretical schemes, such as the fat node method, are known for making any linked (pointer-based) data structure partially persistent [4].

#### Pros

- You can query any previous version of the data structure
- Simpler to implement and more efficient than fully or confluently persistent structures.
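- The extra memory per update is small: with the node-copying method it is O(1) amortized per update step for structures of bounded in-degree [4].
- Predictable costs: Brodal's refinement of node copying guarantees constant worst-case time for updates and accesses [5].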


## Fully persistent data structure

A fully persistent structure allows both queries and updates on any of its versions. Each update applied to a version creates a new version that branches off from it in the version tree.

In a fully persistent structure, if update operation i applies to version j < i, the result of the update is version i; version j is not changed by the update. We denote by n the number of nodes in the current version.
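The numbering above can be illustrated with a small fully persistent stack. This is a sketch under simple assumptions (the class `FullyPersistentStack` and its methods are hypothetical names): each version is an immutable chain of `(value, rest)` cells, update `i` applied to version `j` records `j` as the parent of `i`, and version `j` itself is never touched.

```python
from typing import List, Optional


class FullyPersistentStack:
    """Fully persistent stack: any version can be read or updated."""

    def __init__(self) -> None:
        # versions[i] is the top cell of version i; a cell is a (value, rest) pair.
        self.versions: List[Optional[tuple]] = [None]   # version 0: empty stack
        # parent[i] is the version j that update i was applied to.
        self.parent: List[Optional[int]] = [None]

    def _record(self, top: Optional[tuple], j: int) -> int:
        self.versions.append(top)
        self.parent.append(j)
        return len(self.versions) - 1   # the new version number i

    def push(self, j: int, value: object) -> int:
        # The new cell shares version j's cells, so version j is not changed.
        return self._record((value, self.versions[j]), j)

    def pop(self, j: int) -> int:
        top = self.versions[j]
        if top is None:
            raise IndexError("pop from an empty version")
        return self._record(top[1], j)

    def to_list(self, i: int) -> list:
        out, cell = [], self.versions[i]
        while cell is not None:
            value, cell = cell
            out.append(value)
        return out


s = FullyPersistentStack()
v1 = s.push(0, "a")
v2 = s.push(v1, "b")
v3 = s.push(v1, "c")   # a second branch from version 1
v4 = s.pop(v2)
print(s.to_list(v2), s.to_list(v3), s.to_list(v4))
```

Version 1 ends up with two children (versions 2 and 3), so the versions form a tree; the `parent` list records that tree.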

#### Pros

- You can update any version, not just the latest — each update creates a new branch in the version tree.
- Ideal for situations where different branches of computation must operate independently from the same data snapshot.
- Old versions remain unchanged, ensuring reproducibility and safety.



## Confluently persistent data structure

A data structure is called confluently persistent if there is a meld operation that creates a new version from two previous versions so that branches in a version tree are joined and a version DAG is formed.

One example is confluently persistent sets and maps: functional binary search trees that support efficient set operations both when the operands are disjoint and when they overlap.
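A deliberately naive sketch of the meld idea follows; it is not the efficient tree-based structure mentioned above. Each version is a full immutable `frozenset`, set union plays the role of meld, and each version stores the tuple of versions it came from. The class and method names are hypothetical.

```python
from typing import FrozenSet, List, Tuple


class ConfluentSetVersions:
    """Naive confluently persistent set: every version is an immutable frozenset."""

    def __init__(self) -> None:
        self.versions: List[FrozenSet[object]] = [frozenset()]  # version 0: empty set
        self.parents: List[Tuple[int, ...]] = [()]              # edges of the version DAG

    def _record(self, value: FrozenSet[object], parents: Tuple[int, ...]) -> int:
        self.versions.append(value)
        self.parents.append(parents)
        return len(self.versions) - 1

    def add(self, j: int, item: object) -> int:
        # Ordinary update: one parent, as in full persistence.
        return self._record(self.versions[j] | {item}, (j,))

    def meld(self, a: int, b: int) -> int:
        # Meld: the new version has two parents, so branches join into a DAG.
        return self._record(self.versions[a] | self.versions[b], (a, b))


sets = ConfluentSetVersions()
left = sets.add(0, "x")
right = sets.add(0, "y")
merged = sets.meld(left, right)
print(sorted(sets.versions[merged]), sets.parents[merged])
```

Because every version is a full copy, each operation costs O(n); efficient confluent structures share nodes between versions instead. For sets, union is a natural meld; other structures need an explicit rule for conflicting values.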

#### Pros

- Supports a meld operation: you can combine two or more past versions to create a new version
- Unlike full persistence, whose versions form a tree, confluent persistence allows branches to be merged, so the versions form a directed acyclic graph (DAG).
- Ideal for use cases involving merging divergent states (e.g., distributed systems, version control).
- Excellent for multi-user environments or simulations that need to branch and later reconcile state.



<a id="motivation"></a>

---

<a id="motivation&historicalbackground"></a>
<img src="https://readme-typing-svg.herokuapp.com?font=Lexend+Giga&size=25&pause=1000&color=CCA9DD&vCenter=true&width=350&height=25&lines=Motivation" width="600"/>



The motivation for using, and continuing to research, persistent data structures lies in their ability to maintain previous states and to support operations on those states.

Some practical examples where these structures are a natural fit:
- Version control (e.g., Git)
- Undo/redo functionality (e.g., text editors)


<img src="https://readme-typing-svg.herokuapp.com?font=Lexend+Giga&size=25&pause=1000&color=CCA9DD&vCenter=true&width=350&height=25&lines=Background" width="600"/>

The concept of persistent data structure was introduced in Driscoll, Sarnak, Sleator, and Tarjan's 1986 article "Making Data Structures Persistent". 

The paper discusses the advantages of data structures that can store their own previous states and contrasts them with ordinary ephemeral structures, in which an update destroys the previous version. It distinguishes partial persistence from full persistence; operations that combine versions (confluent persistence) were treated in later work, such as [6].

The paper introduced several techniques, such as the fat node method, the node-copying method, and the node-splitting method.

Over the years, there has been significant progress in this area, leading to more efficient implementations of persistent data structures.

The paper titled "Partially Persistent Data Structures of Bounded Degree with Constant Update Time" by Gerth Stølting Brodal (1996) presents a method for making data structures partially persistent while guaranteeing constant worst-case time for updates and access operations — a significant improvement over earlier techniques that only offered amortized efficiency.



---

<a id="design"></a>
<img src="https://readme-typing-svg.herokuapp.com?font=Lexend+Giga&size=25&pause=1000&color=CCA9DD&vCenter=true&width=350&height=25&lines=Algorithms Design" width="600"/>

## Partially persistent data structure

Here, a partially persistent structure is implemented on top of a balanced binary search tree (BST) with a few changes. A binary tree is balanced if its height is O(log n), where n is the number of nodes. [2]

First, we apply the 'Fat Node' method: every change to a node is recorded in the node itself, and old values are never overwritten. This means that every field value of a node carries the version stamp of the update that wrote it. [2] [4]

Second, fat nodes can grow without bound, which makes finding the right value inside a node slow. The 'Node-Copying Method' addresses this: each node holds a fixed number of fields, and once there is no more space, a new copy of the node is made that holds the latest value of every field. Every predecessor must then point to the new copy; if a predecessor has no space for the new pointer, it must be copied as well, so copying can cascade. [4]
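The sketch below applies node copying to a singly linked list rather than a BST, because each list node has exactly one predecessor, which keeps the cascade easy to follow. The names `Node`, `NodeCopyingList`, `assign` and the constant `MAX_MODS` (one spare modification slot per node) are assumptions made for illustration. Back pointers are kept only for the newest version, since only that version can be modified.

```python
from typing import List, Optional, Tuple

MAX_MODS = 1  # assumption: one spare modification slot per node


class Node:
    def __init__(self, value: object, nxt: Optional["Node"]) -> None:
        self.base = {"value": value, "next": nxt}       # field values when created
        self.mods: List[Tuple[int, str, object]] = []   # (version, field, new value)
        self.prev: Optional["Node"] = None              # back pointer, newest version only

    def get(self, field: str, version: int) -> object:
        result = self.base[field]
        for stamp, name, new_value in self.mods:        # mods are kept in version order
            if name == field and stamp <= version:
                result = new_value
        return result


class NodeCopyingList:
    """Partially persistent singly linked list using node copying."""

    def __init__(self) -> None:
        self.roots: List[Optional[Node]] = [None]   # roots[v]: head node of version v

    def _write(self, node: Node, field: str, value: object, version: int) -> None:
        if len(node.mods) < MAX_MODS:
            node.mods.append((version, field, value))
            return
        # No free slot: copy the node with the newest value of every field.
        copy = Node(node.get("value", version), node.get("next", version))
        copy.base[field] = value
        copy.prev = node.prev
        successor = copy.base["next"]
        if successor is not None:
            successor.prev = copy
        if node.prev is None:
            self.roots[version] = copy      # the copied node was the head
        else:
            # The predecessor must point at the copy; this may cascade.
            self._write(node.prev, "next", copy, version)

    def push_front(self, value: object) -> int:
        head = self.roots[-1]
        new_head = Node(value, head)
        if head is not None:
            head.prev = new_head
        self.roots.append(new_head)
        return len(self.roots) - 1

    def assign(self, index: int, value: object) -> int:
        version = len(self.roots)
        self.roots.append(self.roots[-1])   # the new version starts from the newest one
        node = self.roots[version]
        for _ in range(index):
            node = node.get("next", version)
        self._write(node, "value", value, version)
        return version

    def to_list(self, version: int) -> list:
        out, node = [], self.roots[version]
        while node is not None:
            out.append(node.get("value", version))
            node = node.get("next", version)
        return out


lst = NodeCopyingList()
for x in (3, 2, 1):
    lst.push_front(x)            # versions 1..3: [3], [2, 3], [1, 2, 3]
v4 = lst.assign(2, 30)           # fits in the last node's spare slot
v5 = lst.assign(2, 300)          # slot full: node copied, predecessor updated
print([lst.to_list(v) for v in range(len(lst.roots))])
```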

Finally, path copying: when a node changes, every node on the path from the root to that node is copied, and each copied node points to the copies below it instead of the originals. The new root then identifies the new version, while all other subtrees are shared with the previous version. [2]
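Path copying is easiest to see on an immutable BST. In this sketch (the `Node`, `insert` and `keys` names are hypothetical, and the tree is left unbalanced for brevity) an insertion rebuilds only the nodes on the search path and returns a new root, so each root in `versions` is one version of the tree.

```python
from typing import List, NamedTuple, Optional


class Node(NamedTuple):
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Optional[Node]:
    """Return the root of a new version; `root` itself is never modified."""
    if root is None:
        return Node(key)
    if key < root.key:
        return root._replace(left=insert(root.left, key))    # copy this node on the path
    if key > root.key:
        return root._replace(right=insert(root.right, key))
    return root                                              # key present: share everything


def keys(root: Optional[Node]) -> List[int]:
    if root is None:
        return []
    return keys(root.left) + [root.key] + keys(root.right)


versions: List[Optional[Node]] = [None]
for k in (5, 3, 8, 4):
    versions.append(insert(versions[-1], k))

print(keys(versions[2]), keys(versions[4]))
print(versions[4].right is versions[3].right)   # the untouched subtree is shared
```

Each insertion copies O(depth) nodes, which is O(log n) when the tree is kept balanced.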

#### Limitations

- You can’t modify old versions — only the latest one can be updated.
- A new version is created on every update, which can increase memory usage over time.
- Requires extra logic for version tracking. Compared to regular (non-persistent) data structures, persistence adds code and logic complexity.

## Fully persistent data structure

One difficulty of full persistence, compared with partial persistence, is that the versions no longer have a natural linear order: they form a tree and are therefore only partially ordered.

This problem can be addressed with a version list, a linear ordering of the version tree: when a new version is created, it is inserted into the list immediately after its parent.
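A small sketch of that rule, assuming a plain Python list as the version list (the `VersionList` class is hypothetical; its O(n) insertions and lookups would be replaced by a faster order-maintenance structure in practice):

```python
from typing import Dict, List, Optional


class VersionList:
    """Linear order of a version tree: each new version goes right after its parent."""

    def __init__(self) -> None:
        self.order: List[int] = [0]                       # version 0 is the tree's root
        self.parent: Dict[int, Optional[int]] = {0: None}

    def new_version(self, parent: int) -> int:
        version = len(self.parent)
        self.parent[version] = parent
        # O(n) list insertion; an order-maintenance structure would make this faster.
        self.order.insert(self.order.index(parent) + 1, version)
        return version

    def precedes(self, a: int, b: int) -> bool:
        return self.order.index(a) < self.order.index(b)


vl = VersionList()
v1 = vl.new_version(0)
v2 = vl.new_version(v1)
v3 = vl.new_version(0)    # a second branch from version 0
v4 = vl.new_version(v1)
print(vl.order)
```

The list ends up as a preorder traversal of the version tree in which newer children come first, so the descendants of each version form a contiguous run right after it.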

The fat node method changes as follows:
- Version stamps in a node now refer to versions in the version tree (through their position in the version list) rather than to plain numeric values.
- An update takes one extra step: the new version is first inserted into the version list, and only then is the new value recorded in the fat node.
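A sketch of one fully persistent fat-node field, under simplifying assumptions: the `VersionTree` and `FullyPersistentFatField` names are hypothetical, and instead of searching the version list, the read walks up the version tree to the nearest ancestor that wrote the field. This is simpler, but its cost grows with the depth of the version tree.

```python
from typing import Dict, Optional


class VersionTree:
    def __init__(self) -> None:
        self.parent: Dict[int, Optional[int]] = {0: None}   # version 0 is the root

    def branch(self, source: int) -> int:
        version = len(self.parent)
        self.parent[version] = source
        return version


class FullyPersistentFatField:
    """One field of a fat node; stamps are nodes of the version tree."""

    def __init__(self, tree: VersionTree, initial: object) -> None:
        self.tree = tree
        self.values: Dict[int, object] = {0: initial}

    def update(self, version: int, value: object) -> int:
        # First create the new version in the tree, then record the value under it.
        new_version = self.tree.branch(version)
        self.values[new_version] = value
        return new_version

    def read(self, version: int) -> object:
        # The visible value was written by the version itself or its nearest ancestor.
        current: Optional[int] = version
        while current not in self.values:
            current = self.tree.parent[current]
        return self.values[current]


tree = VersionTree()
field = FullyPersistentFatField(tree, "a")
v1 = field.update(0, "b")
v2 = field.update(0, "c")     # branches from version 0, not from version 1
v3 = field.update(v1, "d")
v4 = tree.branch(v2)          # a version that does not touch this field
print([field.read(v) for v in (0, v1, v2, v3, v4)])
```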

Like the fat node method, node copying is adapted to full persistence, as a variant named the 'Node-Splitting Method'. The difference is that when a node runs out of space, only about half of its fields are moved to the new node, leaving free space in both the old and the new node.

#### Limitations

- More difficult to implement than partial persistence
- Branching versions can lead to faster memory growth, especially if changes are frequent or deep.
- Multiple active branches may make the logic and debugging of versioned behavior more complex.
- Some operations on older versions may become slower (e.g., in ropes or trees) due to versioning overhead.

## Confluently persistent data structure



#### Limitations

- Requires careful tracking of merged nodes and resolution of conflicts — significant algorithmic overhead.
- If two merged versions changed the same data, you need a way to resolve which value wins.
- Tracking and combining overlapping histories can result in significant duplication.
- Ensuring structural invariants hold after merges adds implementation complexity.

---

<a id="implementation"></a>
<img src="https://readme-typing-svg.herokuapp.com?font=Lexend+Giga&size=25&pause=1000&color=CCA9DD&vCenter=true&width=350&height=25&lines=Implementation" width="600"/>


# Partially persistent data structure
- Fat-Node Method
- Node-Copying Method

```python
from typing import Optional, List, Dict, Tuple
import hashlib

# =========================================================
# ---------------------- Field Class ----------------------
# =========================================================

class Field:
    def __init__(self, name: str, value: object, version: int):
        self.name = name
        self.value = value
        self.version = version # satisfy the Fat-Node Method

    def __repr__(self):
        return(
            f"Field(\n"
            f"  name={self.name},\n"
            f"  value={self.value},\n"
            f"  version={self.version},\n"
            f")"
        )

    def __eq__(self, other):
        return isinstance(other, Field) and self.name == other.name and self.version == other.version and self.value == other.value

    def __hash__(self):
        return hash((self.name, self.version, self.value))
```

```python
# =========================================================
# --------------------- FatNode Class ---------------------
# =========================================================

class FatNode:
    def __init__(self, max_fields: int, counter: int):
        self.max_fields = max_fields # satisfy the Node-Copying Method
        self.fields = []
        self.right = []
        self.left = []
        self.order = counter # used to decide where to insert in the tree
        self.copy_pointer = []

    def __repr__(self):
        def format_children(children):
            return [
                {"name": field.name, "version": field.version}
                for child in children
                for field in child.fields
            ]

        return (
            f"FatNode(\n"
            f"  max_fields={self.max_fields},\n"
            f"  fields={self.fields},\n"
            f"  right={format_children(self.right)},\n"
            f"  left={format_children(self.left)},\n"
            f")"
        )

    def __eq__(self, other):
        return (
            isinstance(other, FatNode)
            and self.max_fields == other.max_fields
            and self.fields == other.fields
            and self.right == other.right
            and self.left == other.left
        )

    def __hash__(self):
        # Lists are unhashable, so the field and child lists are converted to tuples
        return hash((self.max_fields, tuple(self.fields), tuple(self.right), tuple(self.left)))

    def add_field(self, other):
        if not isinstance(other, Field):
            raise TypeError("add_field expects a Field, got '{}'".format(type(other).__name__))
        # If a field with the same name and version already exists, overwrite its value
        for field in self.fields:
            if field.name == other.name and field.version == other.version:
                field.value = other.value
                return self
        # Node-Copying Method: a full node cannot take more fields
        if len(self.fields) >= self.max_fields:
            raise OverflowError("No space left for new field in fat node.")
        self.fields.append(other)
        return self

    def add_fields(self, fields: List[Tuple[str, object]], counter: int):
        for name, value in fields:
            field = Field(name, value, counter)
            self.add_field(field)
        return self

    def get_latest_child(self, right: bool) -> Optional['FatNode']: # used to traverse the tree
        children = self.right if right else self.left
        return max(children, key=lambda child: child.order, default=None)

def custom_hash(value: int) -> str:
    """
    Generate a SHA-256 hash for the given integer.
    """
    value_bytes = str(value).encode('utf-8')
    hash_object = hashlib.sha256(value_bytes)
    return hash_object.hexdigest()
```

```python
# =========================================================
# ---------------- PartiallyPersistentBST Class -----------
# =========================================================

class PartiallyPersistentBST:

    def __init__(self, max_fields: int):
        self.max_fields = max_fields
        self.counter = 0  # next version number, also used as the new node's order
        self.nodes: Dict[int, FatNode] = {}
        self.root: Optional[FatNode] = None

    def __repr__(self):
        return (
            f"PartiallyPersistentBST(\n"
            f"  max_fields={self.max_fields},\n"
            f"  index={self.counter},\n"
            f"  nodes={self.nodes},\n"
            f")"
        )

    def insert(self, fields: List[Tuple[str, object]]):
        """
        Insert a new FatNode into the tree.
        """
        if self.counter == 0:
            try:
                new_node = FatNode(self.max_fields, self.counter)
                new_node.add_fields(fields, self.counter)
                self.nodes[self.counter] = new_node
                self.root = new_node
                self.counter += 1
                print(f"Node {new_node.order} added to the root, with fields {new_node.fields}")
            except OverflowError:
                # Too many fields for one node: spread them over several nodes
                self._copy_node(fields, self.counter)
        else:
            self._insert(self.root, fields, self.counter)

    def _insert(self, node: FatNode, fields: List[Tuple[str, object]], order: int):
        # Nodes are ordered by their integer version number
        direction = 'left' if order < node.order else 'right'
        is_right = direction == 'right'
        child_list = getattr(node, direction)

        if not child_list:
            try:
                new_node = FatNode(self.max_fields, order)
                new_node.add_fields(fields, order)
                child_list.append(new_node)
                self.nodes[self.counter] = new_node
                self.counter += 1
                print(f"Node {new_node.order} added to the {direction} of {node.order}, with fields {new_node.fields}")
                self._rebalance()
                return
            except OverflowError:
                # Too many fields for one node: spread them over several nodes
                self._copy_node(fields, order)
        else:
            return self._insert(node.get_latest_child(is_right), fields, order)

    def find(self, name: str, version: int) -> Optional[object]:
        """
        Find the value of a field with the given name and version.
        """
        if self.nodes:
            node = self._find(self.root, version)
            if node:
                for field in node.fields:
                    if field.name == name and field.version == version:
                        return field.value
        return None

    def _find(self, node: Optional[FatNode], version: int) -> Optional[FatNode]:
        if node:
            if node.order == version:
                return node
            elif node.order > version:
                return self._find(node.get_latest_child(False), version)
            else:
                return self._find(node.get_latest_child(True), version)
        return None

    def delete(self, name: str):
        """
        Delete the field with the given name from the node of the latest version.
        """
        if self.nodes:
            node = self._find(self.root, self.counter - 1)
            if node:
                for field in node.fields:
                    if field.name == name:
                        node.fields.remove(field)
                        print(f"Field {field.name} deleted from node {node.order}")
                        return

    def _is_balanced(self) -> bool:
        if not self.root:
            return True  # An empty tree is balanced
        # Only the heights of the root's two subtrees are compared
        print(f"Checking balance for node {self.root.order}...")
        left_height = self._height(self.root.get_latest_child(False))
        right_height = self._height(self.root.get_latest_child(True))
        print(f"Left height: {left_height}, Right height: {right_height}")
        return abs(left_height - right_height) <= 1

    def _rebalance(self):
        """
        Rebalance the tree if it is unbalanced.
        """
        if not self._is_balanced():
            print("Tree is unbalanced. Rebalancing now...")
            flat_nodes = self._flatten_tree(self.root)
            balanced_root = self._build_balanced_tree(flat_nodes)
            self.root = balanced_root

            new_node_map = {}
            self._rebuild_node_map(balanced_root, new_node_map)
            self.nodes = new_node_map

            print("Tree rebalanced.")
        else:
            print("Tree is balanced.")

    def _build_balanced_tree(self, nodes: List[FatNode]) -> Optional[FatNode]:
        """
        Recursively build a balanced BST from sorted FatNodes.
        """
        if not nodes:
            return None

        mid = len(nodes) // 2
        root = nodes[mid]

        # Clear child lists before rebuilding to avoid linking old state.
        # This discards older child pointers, so earlier tree shapes are not preserved.
        root.left = []
        root.right = []

        left_child = self._build_balanced_tree(nodes[:mid])
        right_child = self._build_balanced_tree(nodes[mid+1:])

        if left_child:
            root.left.append(left_child)
        if right_child:
            root.right.append(right_child)

        return root

    def _rebuild_node_map(self, node: Optional[FatNode], nodes_map: Dict[int, FatNode]):
        if node is None:
            return
        for field in node.fields:
            nodes_map[field.version] = node
        self._rebuild_node_map(node.get_latest_child(False), nodes_map)
        self._rebuild_node_map(node.get_latest_child(True), nodes_map)

    def _flatten_tree(self, node: Optional[FatNode]) -> List[FatNode]:
        """
        In-order traversal to flatten the tree into a list.
        """
        if node is None:
            return []

        left = self._flatten_tree(node.get_latest_child(False))
        right = self._flatten_tree(node.get_latest_child(True))

        return left + [node] + right

    def _height(self, node: Optional[FatNode]) -> int:
        if node is None:
            return 0
        left_height = self._height(node.get_latest_child(False))
        right_height = self._height(node.get_latest_child(True))
        return max(left_height, right_height) + 1

    def _copy_node(self, fields: List[Tuple[str, object]], version: int):
        """
        Node-Copying Method -> Used when a node is full of fields. Create more nodes to store the fields.
        """
        fields_divided = self._divide_fields(fields)
        if len(fields_divided[0]) > self.max_fields or len(fields_divided[1]) > self.max_fields:
            self._copy_node(fields_divided[0], self.counter)
            self._copy_node(fields_divided[1], self.counter)
            return
        if len(fields_divided[0]) > 0:
            self.insert(fields_divided[0])
        if len(fields_divided[1]) > 0:
            self.insert(fields_divided[1])

    def _update_pointers(self, starting_version: int):
        ...  # placeholder: not implemented yet

    def _divide_fields(self, fields: List[Tuple[str, object]]) -> Tuple[List[Tuple[str, object]], List[Tuple[str, object]]]:
        """
        Divide the fields into two lists.
        """
        mid = len(fields) // 2
        return fields[:mid], fields[mid:]
```

```python
# =========================================================
# --------------------- Example Usage ---------------------
# =========================================================

bst = PartiallyPersistentBST(max_fields=1)
bst.insert([("a", 10), ("b", 20), ("c", 30)])
bst.insert([("g", 80), ("h", 90), ("i", 100)])
bst.insert([("j", 110)])
bst.insert([("p", 170)])
bst.insert([("s", 200)])

print(bst)
print(bst.find("a", 0))

print(f"Is the tree balanced? {bst._is_balanced()}")
```

```
Node 0 added to the root, with fields [Field(
  name=a,
  value=10,
  version=0,
)]
Node 1 added to the right of 0, with fields [Field(
  name=b,
  value=20,
  version=1,
)]
Checking balance for node 0...
Left height: 0, Right height: 1
Tree is balanced.
Node 2 added to the right of 1, with fields [Field(
  name=c,
  value=30,
  version=2,
)]
Checking balance for node 0...
Left height: 0, Right height: 2
Tree is unbalanced. Rebalancing now...
Tree rebalanced.
Node 3 added to the right of 2, with fields [Field(
  name=g,
  value=80,
  version=3,
)]
Checking balance for node 1...
Left height: 1, Right height: 2
Tree is balanced.
Node 4 added to the right of 3, with fields [Field(
  name=h,
  value=90,
  version=4,
)]
Checking balance for node 1...
Left height: 1, Right height: 3
Tree is unbalanced. Rebalancing now...
Tree rebalanced.
Node 5 added to the right of 4, with fields [Field(
  name=i,
  value=100,
  version=5,
)]
Checking balance for node 2...
Left height: 2, Right height: 2
Tr
```

# Persistent Rope

<a id="analysis"></a>
<img src="https://readme-typing-svg.herokuapp.com?font=Lexend+Giga&size=25&pause=1000&color=CCA9DD&vCenter=true&width=350&height=25&lines=Complexity analysis" width="600"/>



# PartiallyPersistentBST Class

<a id="exercise"></a>
<img src="https://readme-typing-svg.herokuapp.com?font=Lexend+Giga&size=25&pause=1000&color=CCA9DD&vCenter=true&width=350&height=25&lines=Exercise" width="600"/>



---

<a id="references"></a>
<img src="https://readme-typing-svg.herokuapp.com?font=Lexend+Giga&size=25&pause=1000&color=CCA9DD&vCenter=true&width=350&height=25&lines=References" width="600"/>


### Articles and Tutorials
1. [Introduction to Persistent Data Structures](https://arpitbhayani.me/blogs/persistent-data-structures-introduction/) - A beginner-friendly overview of persistent data structures.
2. [Partial Persistence](https://sungsoo.github.io/2014/01/18/partial-persistence.html) - A concise explanation of partial persistence.
3. [Balanced Binary Tree](https://www.geeksforgeeks.org/balanced-binary-tree/) - A GeeksforGeeks article on balanced binary trees.

### Research Papers
4. [Making Data Structures Persistent](https://www.cs.cmu.edu/~sleator/papers/making-data-structures-persistent.pdf) - The foundational paper by Driscoll, Sarnak, Sleator, and Tarjan (1986) introducing persistent data structures.
5. [Partially Persistent Data Structures of Bounded Degree with Constant Update Time](https://www.cs.au.dk/~gerth/papers/njc96.pdf) - Gerth Stølting Brodal's seminal paper on partially persistent data structures.
6. [Fully Persistent Lists with Catenation](https://www.cs.cmu.edu/~sleator/papers/another-persistence.pdf) - A paper discussing fully persistent lists and their applications.
7. [Persistence in Data Structures](https://arxiv.org/pdf/1301.3388) - A modern exploration of persistence in data structures.
8. [MIT Advanced Algorithms Lecture Notes](https://ocw.mit.edu/courses/6-854j-advanced-algorithms-fall-2005/2165d83010dc7633bce397ea75f889f9_lec05_1999.pdf) - Lecture notes discussing persistence in data structures.


### Online Guides
9. [USACO Guide: Persistent Data Structures](https://usaco.guide/adv/persistent?lang=cpp) - A practical guide for competitive programming.

### Encyclopedic Resources
10. [Wikipedia: Persistent Data Structure](https://en.wikipedia.org/wiki/Persistent_data_structure) - A general overview of persistent data structures.

### Additional Resources
11. [Lirias Repository](https://lirias.kuleuven.be/retrieve/19369) - A collection of academic resources on persistence.
