From f911a0d7d34d0561a52a49134686d0b32fc32a1e Mon Sep 17 00:00:00 2001
From: SaketaChalamchala <saketa.chalamchala@gmail.com>
Date: Fri, 22 May 2026 13:30:11 -0700
Subject: [PATCH 1/3] HDDS-9154. Design doc for snapshot diff optimization.

---
 .../docs/content/design/efficient-snapdiff.md | 234 ++++++++++++++++++
 1 file changed, 234 insertions(+)
 create mode 100644 hadoop-hdds/docs/content/design/efficient-snapdiff.md

diff --git a/hadoop-hdds/docs/content/design/efficient-snapdiff.md b/hadoop-hdds/docs/content/design/efficient-snapdiff.md
new file mode 100644
index 00000000000..722ede189b7
--- /dev/null
+++ b/hadoop-hdds/docs/content/design/efficient-snapdiff.md
@@ -0,0 +1,234 @@
+# Snapshot Diff Improvement POC - Technical Design Document
+
+## 1. Introduction
+This document outlines the technical design, architectural choices, and algorithmic improvements to optimize Ozone's Snapshot Diff feature. The design addresses performance bottlenecks in both the **Full Diff** and **DAG-based Diff** paths. The primary goals are to reduce random I/O, minimize CPU overhead from deserialization, and streamline the classification of differences.
+
+ ## Goals
+ - Reduce random I/O.
+ - Minimize CPU cost of deserializing KeyInfo and DirectoryInfo for comparisons.
+ - Keep baseline diff semantics for CREATE/DELETE/RENAME/MODIFY where possible.
+
+---
+
+## 2. Core Design Choices & Optimizations
+
+### 2.1. Sequential Reads & Table Iterators
+**Baseline Issue:** Baseline full diff enumerates keys via SST readers (plus per-key `db.get` lookups), and the DAG-based diff relies heavily on random point lookups (`db.get()`) against the snapshot RocksDB instances to fetch the old and new states of keys identified in the delta SST files. For buckets with millions of keys, this random I/O degrades performance and thrashes the OS page cache.
+**Optimized Design:** The optimization shifts mostly to sequential reads. For the Full Diff path, it uses native RocksDB **Table Iterators** to scan the entire `directoryTable` and `fileTable` sequentially. For the DAG-based path, it uses a **K-way Merge Iterator** over the delta SST files to sequentially extract the latest visible versions without needing to query the main snapshot DBs. This sequential I/O pattern maximizes disk throughput and cache efficiency.
+
+### 2.2. Lightweight Parsing
+**Baseline Issue:** The baseline implementation fully deserializes `OmKeyInfo` and `OmDirectoryInfo` protobuf messages to compare objects, which is extremely CPU and memory intensive when scanning millions of keys.
+**Optimized Design:** Introduces a lightweight `SnapshotDiffValueParser` that reads the raw protobuf byte stream directly. It extracts only the required fields (like `updateID`, `parentID`, `name` and compare signature fields) without instantiating full Java objects. It dynamically builds a compare signature by hashing only meaningful fields (content-change: latest block layout, size, `fileChecksum` and metadata-change: ACLs, metadata, tags), skipping volatile fields like `modificationTime` or `creationTime` to identify modified entries.
+
+#### Pseudo-code: Selective Parsing and Signature
+```java
+ParsedObjectInfo parseRequiredKeyInfo(byte[] raw, boolean meaningfulOnly) {
+  ParsedObjectInfo parsed = new ParsedObjectInfo();
+  CodedInputStream input = CodedInputStream.newInstance(raw);
+  while (!input.isAtEnd()) {
+    int tag = input.readTag();
+    switch (WireFormat.getTagFieldNumber(tag)) {
+    case KEYINFO_OBJECT_ID_FIELD:
+      parsed.setObjectId(input.readUInt64());
+      break;
+    case KEYINFO_PARENT_ID_FIELD:
+      parsed.setParentId(input.readUInt64());
+      break;
+    case KEYINFO_KEY_NAME_FIELD:
+      parsed.setName(input.readString());
+      break;
+    case KEYINFO_UPDATE_ID_FIELD:
+      parsed.setUpdateId(input.readUInt64());
+      break;
+    default:
+      input.skipField(tag);
+      break;
+    }
+  }
+  return parsed;
+}
+
+ParsedObjectInfo parseSignatureKeyInfo(byte[] raw, boolean meaningfulOnly) {
+  ParsedObjectInfo parsed = new ParsedObjectInfo();
+  CodedInputStream input = CodedInputStream.newInstance(raw);
+  while (!input.isAtEnd()) {
+    int tag = input.readTag();
+    switch (WireFormat.getTagFieldNumber(tag)) {
+    case KEYINFO_METADATA_FIELD:
+    case KEYINFO_ACLS_FIELD:
+    case KEYINFO_TAGS_FIELD:
+    case KEYINFO_FILE_CHECKSUM_FIELD:
+        updateSignature(tag, input, parsed);
+      break;
+    case KEYINFO_BLOCK_LOCATIONS_FIELD:
+        updateSignature(extractLatestBlockInfo(tag, input), parsed);
+    default:
+      input.skipField(tag);
+      break;
+    }
+  }
+  return parsed;
+}
+```
+
+### 2.3. UpdateID Gating
+**Baseline Issue:** The baseline performs full object comparisons including timestamps to detect modifications, which is susceptible to clock skew and is computationally expensive.
+**Optimized Design:** Uses the `dbTxSequenceNumber` of the `fromSnapshot` as a strict gate. During the `toSnapshot` scan, entries are only considered candidates for diff if their `updateID > fromSnapshotDbTxSequenceNumber`.
+
+### 2.4. Deferred Classification & Path Resolution
+**Baseline Issue:** Baseline builds the diff key set first and then classifies entries during `generateDiffReport`, which requires resolving paths for all candidates. This causes unnecessary path lookups for entries that might ultimately be ignored.
+**Optimized Design:** Diff classification is strictly deferred to the final **Merge Join** stage. Path resolution is also deferred until an entry is definitively classified as a diff. This prevents wasting I/O and CPU on resolving paths for entries that might ultimately be ignored or unchanged.
+
+### 2.5. Batch Puts to Snapshot Diff DB
+**Baseline Issue:** Writing intermediate lists and final diff reports often relies on individual RocksDB `put` operations, incurring high JNI overhead.
+**Optimized Design:** The design advocates for using RocksDB `WriteBatch` operations. By batching writes to the `snap-diff-report-table` and intermediate `PersistentList`/`PersistentMap` structures, we significantly improve write throughput and reduce disk sync overhead.
+
+### 2.6. Delete Report Consistency
+**Baseline Issue:** With baseline full diff, deleting a directory emits `DELETE` entries for the directory but reports sub-directories and sub-files inconsistently depending on how far deep cleaning of the `toSnapshot` progressed. In DAG-based diff, only the deleted directory and any sub-directory/sub-file that was explicitly deleted before the top-level directory are reported. For the same snapshots, diff output can vary based on timing (before vs after deep cleaning) or mode (full diff vs DAG-based diff).
+**Optimized Design:** Only top level deleted directories are reported. This keeps diff results stable regardless of snapshot deep cleaning and which diff path was used.
+
+### 2.7. Dependency Ordered Reporting
+**Baseline Issue:** With baselines, diff report entries are ordered by diff type, `DELETES` are reported first followed by `RENAMES, CREATES, MODIFIES` in order. When the report is replayed this order does not safely cover all scenarios.
+
+For example, 
+* Snapshot 1 has file `A/B` and directory `C`. 
+* Snapshot 2 renames `A/B` to `C/B` and deletes directory `A`. 
+* The diff entries are `RENAME A/B -> C/B` and `DELETE A`. 
+If deletes are replayed first, `A/B` is removed before the rename and the rename fails. The correct replay order is `RENAME A/B -> C/B` followed by `DELETE A`.
+
+**Optimized Design:** Ensure the report can be replayed safely by ordering entries based on their dependencies rather than their diff type.
+
+**Dependency Rules:**
+1. Parents must appear before children for `CREATE/RENAME/MODIFY`.
+2. Children must appear before parents for `DELETE`.
+3. If a rename or create targets a path that is being deleted, the delete must come first.
+4. If a rename frees a source path that is re-created in the same diff, the rename must come first.
+
+**Building the dependency graph:**
+- Each diff entry becomes a node in a directed graph.
+- Add edges using the rules above:
+  - For hierarchy ordering, add edges from parent to child for `CREATE/RENAME/MODIFY`.
+  - For deletes, use the same parent-child edges but emit them in reverse order later.
+  - For path conflicts, add edges from the delete node to the rename/create node that reuses the deleted path, and from rename to create if the rename frees a path that is re-created.
+
+**Emitting entries using the graph:**
+- Run Kahn's algorithm on the graph to produce a topological order for `CREATE/RENAME/MODIFY`.
+- Emit all `CREATE/RENAME/MODIFY` entries in that order (parents before children, and conflict edges respected).
+- Emit `DELETE` entries in reverse topological order (children before parents) so deletes do not remove parents before their children.
+
+**Note on OBS Buckets:** Since OBS buckets lack a directory hierarchy, dependency ordering simplifies to path-conflict rules (Rules 3 and 4), ensuring renames and deletes occur in the correct sequence to avoid collisions or missing sources.
+
+---
+
+## 3. Data Structures and Algorithms
+ 
+- **oldList/newList maps**: `PersistentMap<byte[], byte[]>` keyed by `objectId`, storing `EntryValue` (`parentId`, `name`, `isDir`, `signature`).
+- **Directory path lookup**:
+  - **Persisted BFS**: RocksDB CFs for edges storing `(parentID, objectID) -> name` and resolved paths `objectID -> fullPath`, with an LRU cache for hot path lookups.
+ - **DiffCandidateSet**: `Set<objectIds>/Set<keys>` whose `updateID` exceeds `fromSnapshot.dbTxSequenceNumber`.
+ - **SHA-256 Hashing:** Used to generate compact, fixed-size compare signatures for object metadata.
+- **Delete retention sets (full diff only):** `deletedDirSet` and `deletedRootSet` to suppress redundant deletes.
+- **Dependency ordering graph:** adjacency list of `objectId -> children`, in-degree map, and a queue of zero in-degree nodes for Kahn's algorithm.
+- **Raw SST iterators**: `ManagedRawSSTFileIterator` yielding `(userKey, sequence, type, value)` tuples including tombstones used during DAG based diff delta SST scan.
+- **K-way merge heap**: Min-heap ordered by `(userKey ASC, sequence DESC)` to dedupe to the latest visible version per userKey. It guarantees $O(N \log K)$ time complexity for $N$ keys across $K$ SST files, ensuring sequential disk I/O.
+
+---
+
+## 4. DAG-Based Diff POC Implementation Stages
+
+The DAG-based diff optimizes the process by only looking at SST files that changed between snapshots. It identifies the set of SST files that differ between `fromSnapshot` and `toSnapshot` using the `RocksDBCheckpointDiffer` (compaction DAG).
+
+### Stage 1: Sequential Read Flow + Batched Point Lookups + Directory Scans
+**Baseline Issue:** Baseline reads these delta files and then performs random reads against the snapshot DBs to find the old/new state of the keys, causing severe I/O bottlenecks.
+**Optimized Design (in order):**
+1. **Sequential scan of `toSnapshot` diff SSTs:** Use native iterators (`ManagedRawSSTFileIterator`) and a K-way merge to scan the delta SSTs **only in `toSnapshot`**. This yields the latest visible versions for changed keys and populates `newList` (for non-tombstones) plus the `DiffCandidateSet` (all tombstones + all keys with `updateID > fromSnapshotDbTxSequenceNumber`).
+2. **Full table scan of `toSnapshot.directoryTable` (FSO only):** Use `tableIterator` to scan all directory entries sequentially and populate `jobId-to-edges`.
+3. **Full table scan of `fromSnapshot.directoryTable` (FSO only):** Use `tableIterator` to scan all directory entries sequentially.
+   * Populate `jobId-from-edges` with `(parentId, objectId) -> name`.
+   * For directory objectIds that are in the `DiffCandidateSet`, populate `oldList` (build signatures using the value read from the table).
+4. **Batch point lookups of `fromSnapshot.file/keyTable`:** Use `multiGet` for keys in the `DiffCandidateSet` that correspond to files and populate `oldList`.
+
+### Stage 2: Merge Join & Classification
+A synchronized sequential iteration (merge join) is performed over the `oldList` and `newList` based on `objectID`. Since `oldList` and `newList` are backed by RocksDB the iteration is ordered by the key `objectID`.
+*   **Only in `newList`** → `CREATE`
+*   **Only in `oldList`** → `DELETE`
+*   **In both lists**:
+    *   If `parentId` or `name` differs → `RENAME`
+    *   If signatures differ → `MODIFY`
+    *   Else → ignore
+
+### Stage 3: Deferred BFS with Early Stop + Dependency ordering + Final Write
+1. **Run persisted BFS (FSO only)** to resolve paths only for diff entries:
+   * Resolve CREATE + RENAME paths from `jobId-to-edges`.
+   * Resolve RENAME + MODIFY + DELETE paths from `jobId-from-edges`.
+   * Stop once all diff entries are resolved or the entire directory tree is traversed.
+   * Remove entries with unresolvable paths from diff lists
+3. **Write dependency ordered report to table**
+   * Build a dependency graph described in Section 2.7 using `parentId` for resolved entries.
+   * Write the topologically sorted report to reportTable.
+
+---
+
+## 5. Full Diff POC Implementation Stages
+
+The Full Diff path is used when compaction DAGs are unavailable or a full recalculation is forced.
+
+### Stage 1: Sequential Table Scanning & Filtering
+Instead of random lookups, the optimization uses native RocksDB **Table Iterators** to sequentially scan the `directoryTable` and `fileTable` of both snapshots while deferring path resolution until after classification.
+
+**1. `toSnapshot` Directory Scan (FSO only):**
+*   Iterates sequentially through the `toSnapshot`'s `directoryTable`.
+*   Extracts `updateID` using the lightweight parser. If `updateID <= fromSnapshot.dbTxSequenceNumber`, the entry is unchanged (not created/renamed/modified) and is skipped. Otherwise, its compare signature is built and it is added to the `newList` and recorded in the `DiffCandidateSet`.
+*   **Graph Construction:** Regardless of whether the entry is a candidate, the `parentID` and `name` are extracted to build the foundational edges of the `toSnapshot` directory structure graph. This is done by writing `(parentID, objectID) -> name` entries into a temporary RocksDB Column Family (`jobId-to-edges`).
+
+**2. `fromSnapshot` Directory Scan (FSO only):**
+*   Iterates sequentially through the `fromSnapshot`'s `directoryTable`.
+*   Only processes entries whose `objectID` is in `DiffCandidateSet` during the `toSnapshot` scan. Adds these to the `oldList`.
+*   **Graph Construction:** Extracts `parentID` and `name` for all entries to build the `fromSnapshot` directory structure graph by writing to another temporary Column Family (`jobId-from-edges`).
+
+**3. `toSnapshot` Key Scan:**
+*   Iterates sequentially through the `toSnapshot`'s `key/fileTable`.
+*   Applies the same `updateID` gating logic: skips if `updateID <= fromSnapshot.dbTxSequenceNumber`.
+*   Builds the compare signature and adds to `newList`, recording these entries in the `DiffCandidateSet`. No parentID/path checks are performed at this stage.
+
+**4. `fromSnapshot` Key Scan:**
+*   Iterates sequentially through the `fromSnapshot`'s `key/fileTable`.
+*   Only builds compare signature for entries whose `objectID` was marked in `DiffCandidateSet` during the `toSnapshot` file scan. Adds these to the `oldList`.
+
+
+### Stage 2: Merge Join & Classification
+Same as Stage 2 of DAG based diff implementation.
+ 
+
+### Stage 3: Top level delete retention (FSO only)
+After merge join, 
+* Build `deletedDirSet` for deleted directories.
+* Compute `deletedRootSet` by removing any directory whose parent is also deleted
+* Only report delete entries for the directories in `deletedRootSet`
+
+
+### Stage 4: Deferred BFS with Early Stop + Dependency ordering + Final Write
+Same as Stage 3 of DAG based diff implementation.
+
+---
+
+## 6. Comparison with Baseline & Trade-offs
+
+| Feature | Baseline Implementation | POC Implementation |
+| :--- | :--- | :--- |
+| **Object Parsing** | Full Protobuf Deserialization (Heavy CPU/GC). | `SnapshotDiffValueParser` (Lightweight byte-stream parsing). |
+| **Modification Detection** | Full object equality. | Strict `updateID` gating & selective field hashing. |
+| **DAG Diff I/O Pattern** | Random point lookups (`db.get()`) for delta keys. | Sequential reads with K-way merge of SST files. |
+| **Classification Timing** | During report generation. | Deferred until merge join. |
+| **Path Resolution** | During report generation for all candidates. | Deferred to diff entries only. |
+| **Delete Handling** | Emits deletes of descendants inconsistently. | Retains only top level directory deletes, dependency ordered. |
+| **Report Ordering** | Naive ordering based on Diff Type. | Dependency ordered with Kahn's algorithm. |
+
+### Trade-offs
+1.  **Reliance on `updateID`:** The POC's speed in Full Diff relies heavily on `updateID`. If Ozone has bugs where `updateID` is not bumped during a meaningful metadata change (e.g., parent directory `modificationTime` updates during a child rename), the POC will miss the modification. Baseline catches this via full comparison, albeit much slower.
+2.  **K-way Merge Memory Overhead:** While the DAG POC eliminates random I/O, maintaining a Priority Queue for K-way merging requires slightly more active memory and CPU comparison logic than simple iteration, though this is vastly outweighed by the I/O savings.
+3.  **Signature Collisions:** Hash-based comparison assumes no SHA-256 collisions. While statistically negligible, baseline's exact object equality has zero collision risk.
+4.  **Dependency Ordering Overhead:** Building and topologically sorting the dependency graph adds some CPU and memory overhead, especially for large delete sets.
+
+## 7. Conclusion
+The POC implementation represents a shift from a compute-and-I/O-heavy approach to a streamlined, sequential, and deferred-evaluation model. By utilizing `SnapshotDiffValueParser` and `updateID` gating, CPU cycles and Garbage Collection pauses are drastically reduced. By replacing random reads in the DAG diff with a sequential K-way merge, disk I/O bottlenecks are eliminated. Deferred path resolution, batch RocksDB puts, and dependency ordered output ensure that resources are only spent on actual differences and replay remains consistent. Despite trade-offs around `updateID` reliance and graph ordering overhead, the POC provides a scalable and accurate snapshot diff engine suitable for massive buckets.
\ No newline at end of file

From 2f5b30b09c2f85982a1670338a8a2cc4165eec0c Mon Sep 17 00:00:00 2001
From: SaketaChalamchala <saketa.chalamchala@gmail.com>
Date: Wed, 27 May 2026 10:49:15 -0700
Subject: [PATCH 2/3] HDDS-9154. Added license

---
 .../docs/content/design/efficient-snapdiff.md | 22 +++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/hadoop-hdds/docs/content/design/efficient-snapdiff.md b/hadoop-hdds/docs/content/design/efficient-snapdiff.md
index 722ede189b7..03d62fb4e6b 100644
--- a/hadoop-hdds/docs/content/design/efficient-snapdiff.md
+++ b/hadoop-hdds/docs/content/design/efficient-snapdiff.md
@@ -1,4 +1,22 @@
-# Snapshot Diff Improvement POC - Technical Design Document
+---
+title: Snapshot Diff Optimization
+summary: Describe proposal for an optimized snapshot diff that uses mostly sequential reads and batch puts
+date: 2025-05-22
+jira: HDDS-9154
+status: draft
+author: Saketa Chalamchala
+---
+<!--
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+   http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
 
 ## 1. Introduction
 This document outlines the technical design, architectural choices, and algorithmic improvements to optimize Ozone's Snapshot Diff feature. The design addresses performance bottlenecks in both the **Full Diff** and **DAG-based Diff** paths. The primary goals are to reduce random I/O, minimize CPU overhead from deserialization, and streamline the classification of differences.
@@ -231,4 +249,4 @@ Same as Stage 3 of DAG based diff implementation.
 4.  **Dependency Ordering Overhead:** Building and topologically sorting the dependency graph adds some CPU and memory overhead, especially for large delete sets.
 
 ## 7. Conclusion
-The POC implementation represents a shift from a compute-and-I/O-heavy approach to a streamlined, sequential, and deferred-evaluation model. By utilizing `SnapshotDiffValueParser` and `updateID` gating, CPU cycles and Garbage Collection pauses are drastically reduced. By replacing random reads in the DAG diff with a sequential K-way merge, disk I/O bottlenecks are eliminated. Deferred path resolution, batch RocksDB puts, and dependency ordered output ensure that resources are only spent on actual differences and replay remains consistent. Despite trade-offs around `updateID` reliance and graph ordering overhead, the POC provides a scalable and accurate snapshot diff engine suitable for massive buckets.
\ No newline at end of file
+The POC implementation represents a shift from a compute-and-I/O-heavy approach to a streamlined, sequential, and deferred-evaluation model. By utilizing `SnapshotDiffValueParser` and `updateID` gating, CPU cycles and Garbage Collection pauses are drastically reduced. By replacing random reads in the DAG diff with a sequential K-way merge, disk I/O bottlenecks are eliminated. Deferred path resolution, batch RocksDB puts, and dependency ordered output ensure that resources are only spent on actual differences and replay remains consistent. Despite trade-offs around `updateID` reliance and graph ordering overhead, the POC provides a scalable and accurate snapshot diff engine suitable for massive buckets.

From 11b8b86da512db904c94893a81443390868d0ad5 Mon Sep 17 00:00:00 2001
From: SaketaChalamchala <saketa.chalamchala@gmail.com>
Date: Wed, 27 May 2026 16:29:09 -0700
Subject: [PATCH 3/3] HDDS-9154. Updated gating rules.

---
 .../docs/content/design/efficient-snapdiff.md | 42 ++++++++++---------
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/hadoop-hdds/docs/content/design/efficient-snapdiff.md b/hadoop-hdds/docs/content/design/efficient-snapdiff.md
index 03d62fb4e6b..6f646c12474 100644
--- a/hadoop-hdds/docs/content/design/efficient-snapdiff.md
+++ b/hadoop-hdds/docs/content/design/efficient-snapdiff.md
@@ -89,9 +89,11 @@ ParsedObjectInfo parseSignatureKeyInfo(byte[] raw, boolean meaningfulOnly) {
 }
 ```
 
-### 2.3. UpdateID Gating
+### 2.3. Sequence/UpdateID Gating
 **Baseline Issue:** The baseline performs full object comparisons including timestamps to detect modifications, which is susceptible to clock skew and is computationally expensive.
-**Optimized Design:** Uses the `dbTxSequenceNumber` of the `fromSnapshot` as a strict gate. During the `toSnapshot` scan, entries are only considered candidates for diff if their `updateID > fromSnapshotDbTxSequenceNumber`.
+**Optimized Design:** Use snapshot-specific gates that align with the transactional guarantees of the deployment mode.
+- **Full diff (w/ OM HA only):** `updateID > fromSnapshot.lastTransactionInfo.txIndex`. This compares two OM/Ratis log indices.
+- **DAG diff:** Extend raw SST iterators to expose internal sequence numbers, gate with `entry.sequence > fromSnapshot.dbTxSequenceNumber`.
 
 ### 2.4. Deferred Classification & Path Resolution
 **Baseline Issue:** Baseline builds the diff key set first and then classifies entries during `generateDiffReport`, which requires resolving paths for all candidates. This causes unnecessary path lookups for entries that might ultimately be ignored.
@@ -143,8 +145,8 @@ If deletes are replayed first, `A/B` is removed before the rename and the rename
 - **oldList/newList maps**: `PersistentMap<byte[], byte[]>` keyed by `objectId`, storing `EntryValue` (`parentId`, `name`, `isDir`, `signature`).
 - **Directory path lookup**:
   - **Persisted BFS**: RocksDB CFs for edges storing `(parentID, objectID) -> name` and resolved paths `objectID -> fullPath`, with an LRU cache for hot path lookups.
- - **DiffCandidateSet**: `Set<objectIds>/Set<keys>` whose `updateID` exceeds `fromSnapshot.dbTxSequenceNumber`.
- - **SHA-256 Hashing:** Used to generate compact, fixed-size compare signatures for object metadata.
+- **DiffCandidateSet**: `Set<objectIds>/Set<keys>` captured by snapshot-specific gating rules mentioned in Section 2.3
+- **SHA-256 Hashing:** Used to generate compact, fixed-size compare signatures for object metadata.
 - **Delete retention sets (full diff only):** `deletedDirSet` and `deletedRootSet` to suppress redundant deletes.
 - **Dependency ordering graph:** adjacency list of `objectId -> children`, in-degree map, and a queue of zero in-degree nodes for Kahn's algorithm.
 - **Raw SST iterators**: `ManagedRawSSTFileIterator` yielding `(userKey, sequence, type, value)` tuples including tombstones used during DAG based diff delta SST scan.
@@ -152,14 +154,14 @@ If deletes are replayed first, `A/B` is removed before the rename and the rename
 
 ---
 
-## 4. DAG-Based Diff POC Implementation Stages
+## 4. Optimized DAG-Based Diff Implementation Stages
 
 The DAG-based diff optimizes the process by only looking at SST files that changed between snapshots. It identifies the set of SST files that differ between `fromSnapshot` and `toSnapshot` using the `RocksDBCheckpointDiffer` (compaction DAG).
 
 ### Stage 1: Sequential Read Flow + Batched Point Lookups + Directory Scans
 **Baseline Issue:** Baseline reads these delta files and then performs random reads against the snapshot DBs to find the old/new state of the keys, causing severe I/O bottlenecks.
 **Optimized Design (in order):**
-1. **Sequential scan of `toSnapshot` diff SSTs:** Use native iterators (`ManagedRawSSTFileIterator`) and a K-way merge to scan the delta SSTs **only in `toSnapshot`**. This yields the latest visible versions for changed keys and populates `newList` (for non-tombstones) plus the `DiffCandidateSet` (all tombstones + all keys with `updateID > fromSnapshotDbTxSequenceNumber`).
+1. **Sequential scan of `toSnapshot` diff SSTs:** Use native iterators (`ManagedRawSSTFileIterator`) and a K-way merge to scan the delta SSTs **only in `toSnapshot`**. This yields the latest visible versions for changed keys and populates `newList` (for non-tombstones) plus the `DiffCandidateSet` (all tombstones + all keys with `entry.sequence > fromSnapshot.dbTxSequenceNumber`).
 2. **Full table scan of `toSnapshot.directoryTable` (FSO only):** Use `tableIterator` to scan all directory entries sequentially and populate `jobId-to-edges`.
 3. **Full table scan of `fromSnapshot.directoryTable` (FSO only):** Use `tableIterator` to scan all directory entries sequentially.
    * Populate `jobId-from-edges` with `(parentId, objectId) -> name`.
@@ -187,7 +189,7 @@ A synchronized sequential iteration (merge join) is performed over the `oldList`
 
 ---
 
-## 5. Full Diff POC Implementation Stages
+## 5. Optimized Full Diff Implementation Stages
 
 The Full Diff path is used when compaction DAGs are unavailable or a full recalculation is forced.
 
@@ -196,7 +198,7 @@ Instead of random lookups, the optimization uses native RocksDB **Table Iterator
 
 **1. `toSnapshot` Directory Scan (FSO only):**
 *   Iterates sequentially through the `toSnapshot`'s `directoryTable`.
-*   Extracts `updateID` using the lightweight parser. If `updateID <= fromSnapshot.dbTxSequenceNumber`, the entry is unchanged (not created/renamed/modified) and is skipped. Otherwise, its compare signature is built and it is added to the `newList` and recorded in the `DiffCandidateSet`.
+*   Extracts `updateID` using the lightweight parser. If `updateID <= fromSnapshot.lastTransactionInfo.txIndex`, the entry is unchanged (not created/renamed/modified) and is skipped. Otherwise, its compare signature is built and it is added to the `newList` and recorded in the `DiffCandidateSet`.
 *   **Graph Construction:** Regardless of whether the entry is a candidate, the `parentID` and `name` are extracted to build the foundational edges of the `toSnapshot` directory structure graph. This is done by writing `(parentID, objectID) -> name` entries into a temporary RocksDB Column Family (`jobId-to-edges`).
 
 **2. `fromSnapshot` Directory Scan (FSO only):**
@@ -206,7 +208,7 @@ Instead of random lookups, the optimization uses native RocksDB **Table Iterator
 
 **3. `toSnapshot` Key Scan:**
 *   Iterates sequentially through the `toSnapshot`'s `key/fileTable`.
-*   Applies the same `updateID` gating logic: skips if `updateID <= fromSnapshot.dbTxSequenceNumber`.
+*   Applies the same `updateID` gating logic: skips if `updateID <= fromSnapshot.lastTransactionInfo.txIndex`.
 *   Builds the compare signature and adds to `newList`, recording these entries in the `DiffCandidateSet`. No parentID/path checks are performed at this stage.
 
 **4. `fromSnapshot` Key Scan:**
@@ -232,21 +234,21 @@ Same as Stage 3 of DAG based diff implementation.
 
 ## 6. Comparison with Baseline & Trade-offs
 
-| Feature | Baseline Implementation | POC Implementation |
-| :--- | :--- | :--- |
-| **Object Parsing** | Full Protobuf Deserialization (Heavy CPU/GC). | `SnapshotDiffValueParser` (Lightweight byte-stream parsing). |
-| **Modification Detection** | Full object equality. | Strict `updateID` gating & selective field hashing. |
-| **DAG Diff I/O Pattern** | Random point lookups (`db.get()`) for delta keys. | Sequential reads with K-way merge of SST files. |
-| **Classification Timing** | During report generation. | Deferred until merge join. |
-| **Path Resolution** | During report generation for all candidates. | Deferred to diff entries only. |
+| Feature | Baseline Implementation | Optimized Implementation                                      |
+| :--- | :--- |:--------------------------------------------------------------|
+| **Object Parsing** | Full Protobuf Deserialization (Heavy CPU/GC). | `SnapshotDiffValueParser` (Lightweight byte-stream parsing).  |
+| **Modification Detection** | Full object equality. | Key `sequence`/`updateID` gating + selective field hashing.   |
+| **DAG Diff I/O Pattern** | Random point lookups (`db.get()`) for delta keys. | Sequential reads with K-way merge of SST files.               |
+| **Classification Timing** | During report generation. | Deferred until merge join.                                    |
+| **Path Resolution** | During report generation for all candidates. | Deferred to diff entries only.                                |
 | **Delete Handling** | Emits deletes of descendants inconsistently. | Retains only top level directory deletes, dependency ordered. |
-| **Report Ordering** | Naive ordering based on Diff Type. | Dependency ordered with Kahn's algorithm. |
+| **Report Ordering** | Naive ordering based on Diff Type. | Dependency ordered with Kahn's algorithm.                     |
 
 ### Trade-offs
-1.  **Reliance on `updateID`:** The POC's speed in Full Diff relies heavily on `updateID`. If Ozone has bugs where `updateID` is not bumped during a meaningful metadata change (e.g., parent directory `modificationTime` updates during a child rename), the POC will miss the modification. Baseline catches this via full comparison, albeit much slower.
-2.  **K-way Merge Memory Overhead:** While the DAG POC eliminates random I/O, maintaining a Priority Queue for K-way merging requires slightly more active memory and CPU comparison logic than simple iteration, though this is vastly outweighed by the I/O savings.
+1.  **Reliance on `updateID` in full diff:** The optimized snapdiff's speed in Full Diff relies heavily on `updateID`. If Ozone has bugs where `updateID` is not bumped during a meaningful metadata change (e.g., parent directory `modificationTime` updates during a child rename), the optimization will miss the modification. Baseline catches this via full comparison, albeit much slower.
+2.  **K-way Merge Memory Overhead:** While the DAG optimization drastically reduces random I/O, maintaining a Priority Queue for K-way merging requires slightly more active memory and CPU comparison logic than simple iteration, though this is vastly outweighed by the I/O savings.
 3.  **Signature Collisions:** Hash-based comparison assumes no SHA-256 collisions. While statistically negligible, baseline's exact object equality has zero collision risk.
 4.  **Dependency Ordering Overhead:** Building and topologically sorting the dependency graph adds some CPU and memory overhead, especially for large delete sets.
 
 ## 7. Conclusion
-The POC implementation represents a shift from a compute-and-I/O-heavy approach to a streamlined, sequential, and deferred-evaluation model. By utilizing `SnapshotDiffValueParser` and `updateID` gating, CPU cycles and Garbage Collection pauses are drastically reduced. By replacing random reads in the DAG diff with a sequential K-way merge, disk I/O bottlenecks are eliminated. Deferred path resolution, batch RocksDB puts, and dependency ordered output ensure that resources are only spent on actual differences and replay remains consistent. Despite trade-offs around `updateID` reliance and graph ordering overhead, the POC provides a scalable and accurate snapshot diff engine suitable for massive buckets.
+The optimized implementation represents a shift from a compute-and-I/O-heavy approach to a streamlined, sequential, and deferred-evaluation model. By utilizing `SnapshotDiffValueParser` and entry `sequence`/`updateID` gating, CPU cycles and Garbage Collection pauses are drastically reduced. By replacing random reads in the DAG diff with a sequential K-way merge, disk I/O bottlenecks are eliminated. Deferred path resolution, batch RocksDB puts, and dependency ordered output ensure that resources are only spent on actual differences and replay remains consistent. Despite trade-offs around `updateID` reliance and graph ordering overhead, the optimization provides a scalable and accurate snapshot diff engine suitable for massive buckets.