# Disjoint Set Union (DSU) / Union-Find

### Learning Objective
By the end of this notebook, you should be able to:
1.  Implement **Disjoint Set Union** with **Path Compression** and **Union by Rank/Size**.
2.  Solve connectivity problems efficiently (near-constant amortized time per operation).
3.  Apply DSU to complex merging problems like **Accounts Merge**.

---

### Conceptual Notes

**1. What is DSU?**
A data structure that tracks a collection of elements partitioned into disjoint (non-overlapping) sets. With the optimizations below, it supports two operations in near-constant amortized time:
*   **Find:** Determine which set an element belongs to.
*   **Union:** Join two sets together.

**2. Optimizations (Crucial!)**
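*   **Path Compression:** While finding the root of node `X`, re-point `X` and every node visited along the way directly at the root. Later lookups on those nodes then reach the root in a single step.
*   **Union by Rank/Size:** When merging two sets, attach the root of the *smaller* tree under the root of the *larger* one. This keeps tree height at most `log2(n)`, even without path compression.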
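
To see path compression at work, here is a minimal sketch that is separate from the `DisjointSet` class built in Core Task 1. It assumes a plain list named `parent` and a standalone, iterative `find` helper (both are illustrative choices, not the notebook's API). The iterative form should behave like the recursive version while avoiding deep recursion on long chains.

```python
def find(parent, x):
    """Return the root of x and re-point every node on the path at that root."""
    root = x
    while parent[root] != root:      # first pass: walk up to the root
        root = parent[root]
    while parent[x] != root:         # second pass: compress the path
        parent[x], x = root, parent[x]
    return root


# A deliberately bad tree: the chain 4 -> 3 -> 2 -> 1 -> 0, with 0 as the root.
parent = [0, 0, 1, 2, 3]
root = find(parent, 4)
print(root)     # 0
print(parent)   # [0, 0, 0, 0, 0]
```

After a single `find(parent, 4)`, every node on the chain is one hop from the root, which is why repeated lookups become cheap.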

**3. Complexity**
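With both optimizations, each operation runs in amortized `O(alpha(n))` time, where `alpha` is the inverse Ackermann function. This is effectively **O(1)**: `alpha(n)` is at most 4 for any `n` that could occur in practice.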
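
The `alpha(n)` bound is hard to observe directly, but the effect of union by size on its own is easy to measure. The sketch below is an illustrative experiment, not a proof: the helper names `build_chain_unions` and `max_depth` are made up for this example, and path compression is deliberately left out so that tree heights stay visible.

```python
import math


def build_chain_unions(n, by_size):
    """Run union(0, i) for i = 1..n-1 and return the resulting parent list."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # No path compression here, so tree heights remain observable.
        while parent[x] != x:
            x = parent[x]
        return x

    for i in range(1, n):
        root_u, root_v = find(0), find(i)
        if by_size:
            if size[root_u] < size[root_v]:
                root_u, root_v = root_v, root_u
            parent[root_v] = root_u      # smaller tree goes under the larger
            size[root_u] += size[root_v]
        else:
            parent[root_u] = root_v      # naive: always hang u's root under v's
    return parent


def max_depth(parent):
    """Return the largest number of hops from any node to its root."""
    def depth(x):
        hops = 0
        while parent[x] != x:
            x = parent[x]
            hops += 1
        return hops
    return max(depth(x) for x in range(len(parent)))


n = 1024
naive_height = max_depth(build_chain_unions(n, by_size=False))
sized_height = max_depth(build_chain_unions(n, by_size=True))
print(naive_height)                  # 1023
print(sized_height)                  # 1
print(sized_height <= math.log2(n))  # True
```

On this particular union order the naive strategy degenerates into a chain of length `n - 1`, while union by size produces a flat tree.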

---

### Core Task 1: DSU Implementation
Build a robust `DisjointSet` class.

In [None]:
class DisjointSet:
    def __init__(self, n):
        # parent[i] = i: every element starts as the root of its own set.
        # Sized n + 1 so both 0-based and 1-based labels (0..n) are valid.
        self.parent = list(range(n + 1))
        # size[r] is the number of elements in the tree rooted at r.
        self.size = [1] * (n + 1)

    def find(self, node):
        # Base case: a root is its own parent.
        # Otherwise recurse to the root and re-point this node at it
        # (path compression), so the whole path ends up one hop from the root.
        if node == self.parent[node]:
            return node
        self.parent[node] = self.find(self.parent[node])
        return self.parent[node]

    def union(self, u, v):
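        # Find both roots; if they match, u and v are already connected.
        root_u = self.find(u)
        root_v = self.find(v)
        if root_u == root_v:
            return False

        # Union by size: make root_u the root of the larger tree,
        # then attach the smaller tree (root_v) beneath it.
        if self.size[root_u] < self.size[root_v]:
            root_u, root_v = root_v, root_u
        self.parent[root_v] = root_u
        self.size[root_u] += self.size[root_v]
        return True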

### Core Task 2: Number of Operations to Make Network Connected (LeetCode 1319)
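You have `n` computers (numbered `0` to `n-1`) and a list `connections` of cables `[a, b]`. In one operation you may unplug a cable and re-plug it between any two computers. Return the minimum number of operations needed to connect all computers, or `-1` if it is impossible.
*   **Insight:** Connecting `n` nodes takes at least `n-1` cables, so if `len(connections) < n-1` the task is impossible (return `-1`).
*   **Logic:** Count the connected components `k`. Joining them takes `k-1` cables, each of which must be a *redundant* cable moved from inside a component.
*   **Simplification:** If there are at least `n-1` cables, enough redundant ones always exist: a spanning forest of `k` components uses only `n-k` cables, leaving at least `(n-1) - (n-k) = k-1` spare. The answer is therefore `Number of Components - 1`.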
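
As an independent cross-check on the `components - 1` rule, the sketch below counts components with an iterative depth-first search instead of DSU. The names `count_components` and `min_operations` are hypothetical helpers for this example, and nodes are assumed to be numbered `0` to `n-1`.

```python
def count_components(n, connections):
    """Count connected components with an iterative DFS (no DSU involved)."""
    adjacency = [[] for _ in range(n)]
    for u, v in connections:
        adjacency[u].append(v)
        adjacency[v].append(u)

    seen = [False] * n
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        components += 1              # found a node in a new component
        seen[start] = True
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if not seen[neighbor]:
                    seen[neighbor] = True
                    stack.append(neighbor)
    return components


def min_operations(n, connections):
    """Return components - 1, or -1 when there are fewer than n - 1 cables."""
    if len(connections) < n - 1:
        return -1
    return count_components(n, connections) - 1


print(min_operations(4, [[0, 1], [0, 2], [1, 2]]))                   # 1
print(min_operations(6, [[0, 1], [0, 2], [0, 3], [1, 2]]))           # -1
print(min_operations(6, [[0, 1], [0, 2], [0, 3], [1, 2], [1, 3]]))   # 2
```

Both approaches should agree on every input; DSU is usually preferred when connections arrive one at a time, because it never needs the full adjacency list.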

In [None]:
def makeConnected(n, connections):
    """
    Return min operations or -1.
    """
    if len(connections) < n - 1:
        return -1
    
    dsu = DisjointSet(n)
    components = n
    
    # Each successful union merges two components into one.
    for u, v in connections:
        if dsu.union(u, v):
            components -= 1

    # k components need k - 1 relocated cables to bridge them.
    return components - 1

### Core Task 3: Accounts Merge (LeetCode 721)
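Given a list of accounts `[Name, Email1, Email2...]`, merge the accounts that share at least one email.
*   **Challenge:** Emails are strings, not integers, so they cannot index the DSU arrays directly. Either map each unique email to an integer ID, or run the DSU over account indices and map emails to accounts.
*   **Logic:**
    1.  Map every email to the **index** of the first account that contains it (or to a group ID).
    2.  If an email appears in both Account A and Account B, call `union(A, B)`.
    3.  Group emails by the root parent of their account.
    4.  Format the output as `[Name, sorted emails...]`.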
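
The challenge above notes that emails are strings rather than integers. One alternative to indexing by account, sketched here under the assumption that uniting the emails themselves is acceptable, is a dictionary-backed union-find. `StringDSU` is a hypothetical class written for illustration and is not part of the notebook's `DisjointSet`.

```python
class StringDSU:
    """Hypothetical dict-backed union-find keyed by any hashable item."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, item):
        if item not in self.parent:          # lazily register unseen items
            self.parent[item] = item
            self.size[item] = 1
        root = item
        while self.parent[root] != root:     # walk up to the root
            root = self.parent[root]
        while self.parent[item] != root:     # compress the path
            self.parent[item], item = root, self.parent[item]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return False
        if self.size[root_a] < self.size[root_b]:
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a         # smaller tree under larger
        self.size[root_a] += self.size[root_b]
        return True


emails = StringDSU()
emails.union("a@mail.com", "b@mail.com")   # listed together in one account
emails.union("a@mail.com", "c@mail.com")   # a second account shares a@mail.com
print(emails.find("b@mail.com") == emails.find("c@mail.com"))  # True
print(emails.find("d@mail.com") == emails.find("a@mail.com"))  # False
```

With this variant, each account would union its first email with each of its other emails, and the final grouping would be by email root rather than by account root.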

In [None]:
from collections import defaultdict


def accountsMerge(accounts):
    """
    accounts: List[List[str]], each entry is [name, email1, email2, ...].
    Returns the merged accounts as [name, sorted emails...].
    """
    # 1. Build a DSU over the account indices 0..n-1.
    n = len(accounts)
    dsu = DisjointSet(n)

    # 2. Map each email to the first account index that contains it;
    #    a repeated email links the current account to that earlier one.
    email_map = {}
    for i, account in enumerate(accounts):
        for email in account[1:]:
            if email in email_map:
                dsu.union(i, email_map[email])
            else:
                email_map[email] = i

    # 3. Group emails by the root of their owning account.
    merged_emails = defaultdict(list)
    for email, owner_idx in email_map.items():
        merged_emails[dsu.find(owner_idx)].append(email)

    # 4. Format the output as [name, sorted emails...].
    return [[accounts[root][0]] + sorted(emails)
            for root, emails in merged_emails.items()]

In [None]:
# --- TEST CELL ---
print("Testing DSU Logic...")
ds = DisjointSet(5)
ds.union(1, 2)
ds.union(2, 3)
assert ds.find(1) == ds.find(3), "Transitive property failed"
assert ds.find(1) != ds.find(4), "Disjoint property failed"

print("Testing Make Connected...")
# 4 computers. 0-1, 0-2, 1-2. (Cycle 0-1-2). 3 is isolated.
# Cables: 3. Needed: 3. Components: 2 ({0,1,2}, {3}).
# Expected ops: 1 (Move 1-2 to 1-3).
ops = makeConnected(4, [[0,1],[0,2],[1,2]])
assert ops == 1, f"Expected 1 op, got {ops}"

print("Testing Accounts Merge...")
accs = [
    ["John", "johnsmith@mail.com", "john00@mail.com"],
    ["John", "johnnybravo@mail.com"],
    ["John", "johnsmith@mail.com", "john_newyork@mail.com"],
    ["Mary", "mary@mail.com"]
]
# Expected: John's 1st and 3rd merge. John 2nd distinct. Mary distinct.
res_accs = accountsMerge(accs)
if res_accs: 
    assert len(res_accs) == 3, f"Expected 3 merged accounts, got {len(res_accs)}"

print("✅ Tests Ready")