Skip to content

feat: checkpoints#399

Merged
QuantumExplorer merged 3 commits intodevelopfrom
feat/checkpoints
Dec 19, 2025
Merged

feat: checkpoints#399
QuantumExplorer merged 3 commits intodevelopfrom
feat/checkpoints

Conversation

@QuantumExplorer
Copy link
Member

@QuantumExplorer QuantumExplorer commented Dec 19, 2025

Issue being fixed or feature implemented

Improved checkpoint functionality for GroveDB with proper read-only checkpoint opening, safe deletion, and configurable WAL flushing behavior.

What was done?

Checkpoint API Improvements

  • New open_checkpoint() method: Opens checkpoints in read-only mode with optimized RocksDB settings (disabled compactions, mmap reads enabled, reduced parallelism)
  • New delete_checkpoint() method: Safely deletes checkpoint directories with two safety guards:
    • Refuses to delete paths with fewer than 2 components (prevents accidental root deletion)
    • Validates the path is a valid GroveDB checkpoint by attempting to open it first
  • Extracted checkpoint logic into dedicated grovedb/src/checkpoints.rs module

Storage Layer Changes

  • Added checkpoint_rocksdb_with_path() in RocksDbStorage for read-only checkpoint opening
  • Added READ_ONLY_CHECKPOINTS_OPTS lazy static with appropriate read-only RocksDB options
  • Changed create_checkpoint() to use create_checkpoint_with_log_size() with u64::MAX threshold (never forces WAL flush)
  • Updated rocksdb dependency to use QuantumExplorer's fork (rev 52772eea7bcd214d1d07d80aa538b1d24e5015b7) for create_checkpoint_with_log_size API

Tests

  • Added comprehensive checkpoint tests in grovedb/src/tests/checkpoint_tests.rs:
    • Basic checkpoint create/open/independence test
    • test_delete_checkpoint - verifies safe deletion
    • test_checkpoint_wal_under_threshold_is_preserved - verifies WAL behavior under 50MB
    • test_checkpoint_wal_over_threshold_is_flushed (ignored - blocked by upstream RocksDB bug)
  • Added grovedb/src/tests/test_compaction_sizes.rs for WAL/SST storage overhead analysis

Note on Ignored Test

test_checkpoint_wal_over_threshold_is_flushed is ignored due to a bug in RocksDB where log_size_for_flush parameter is non-functional. Re-enable when facebook/rocksdb#14193 is merged.

How Has This Been Tested?

  • Unit tests for checkpoint creation, opening, and deletion
  • Tests verify checkpoint isolation (changes in checkpoint don't affect original DB and vice versa)
  • Tests verify WAL preservation behavior for checkpoints under threshold
  • Storage overhead tests measure WAL (6x) and SST (1x) ratios

Breaking Changes

None. This adds new public API methods without modifying existing behavior.

Summary by CodeRabbit

  • New Features

    • Added checkpoint management: create, open (including read-only checkpoint access), and delete database checkpoints with safety checks and improved defaults.
  • Tests

    • Added comprehensive checkpoint and compaction tests covering data integrity, WAL handling, flush behavior, and storage overhead measurements.
  • Chores

    • Switched RocksDB dependency to a custom fork to support enhanced checkpoint behavior.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 19, 2025

Walkthrough

Adds checkpoint management (create, open, delete) to GroveDb with RocksDB checkpoint-specific initialization and options, renames session snapshot to checkpointed_grovedb, widens a Cow return type with a lifetime, introduces checkpoint/compaction tests, and switches the rocksdb dependency to a git fork.

Changes

Cohort / File(s) Summary
GroveDb checkpoint API
grovedb/src/checkpoints.rs, grovedb/src/lib.rs
Adds create_checkpoint, open_checkpoint, and delete_checkpoint methods on GroveDb; moves create logic into a private checkpoints module under feature gating and removes a previously public create_checkpoint from grovedb/src/lib.rs.
RocksDB storage checkpoint support
storage/src/rocksdb_storage/storage.rs, storage/Cargo.toml
Adds checkpoint_rocksdb_with_path (read-only checkpoint open), default_rocksdb_with_path, DEFAULT_LOG_SIZE_FOR_CHECKPOINT_FLUSH, and READ_ONLY_CHECKPOINTS_OPTS; updates create_checkpoint to use log-size variant; switches rocksdb dependency to a git repo at a fixed revision.
Session / debugger refactor
grovedb/src/debugger.rs
Renames Session::snapshotcheckpointed_grovedb, updates AppState::get_snapshotget_checkpointed_grovedb, and adjusts all construction/usage sites.
Tests — checkpoints & compaction
grovedb/src/tests/checkpoint_tests.rs, grovedb/src/tests/test_compaction_sizes.rs, grovedb/src/tests/mod.rs
Adds extensive checkpoint tests (create/open/delete, WAL/SST checks) and a compaction/storage-overhead test; reorganizes test modules.
Type signature lifetime change
grovedb-epoch-based-storage-flags/src/lib.rs
map_cow_some_element_flags_ref return widened to Option<Cow<'_, Self>> (adds lifetime on Cow).
Tutorial cleanup
tutorials/src/bin/replication.rs
Renames MAIN_ΚΕΥMAIN_KEY and MAIN_ΚΕΥ_EMPTYMAIN_KEY_EMPTY, removes pub(crate) SubtreePrefix, and adjusts various call sites and argument passing for grove_version.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor User
    participant GroveDb
    participant RocksDbStorage
    participant FS as Filesystem

    User->>GroveDb: create_checkpoint(path)
    GroveDb->>RocksDbStorage: create_checkpoint_with_log_size(path, log_size)
    RocksDbStorage->>FS: write checkpoint files (SSTs, WAL snapshot)
    RocksDbStorage-->>GroveDb: Result (ok / error)
    GroveDb-->>User: Result

    User->>GroveDb: open_checkpoint(path)
    GroveDb->>RocksDbStorage: checkpoint_rocksdb_with_path(path) [read-only opts]
    RocksDbStorage->>FS: open checkpoint directory (read-only)
    RocksDbStorage-->>GroveDb: RocksDbStorage instance
    GroveDb-->>User: GroveDb wrapping checkpoint storage

    User->>GroveDb: delete_checkpoint(path)
    GroveDb->>FS: validate path depth & attempt open (safety)
    FS->>GroveDb: remove_dir_all(path)
    GroveDb-->>User: Result
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

  • Pay special attention to:
    • grovedb/src/checkpoints.rs — path validation, error mapping, and safe deletion.
    • storage/src/rocksdb_storage/storage.rs — read-only options, default option choices, and interactions with the git-backed rocksdb.
    • New tests (checkpoint_tests.rs, test_compaction_sizes.rs) — flakiness, timing, and environment assumptions.
    • Lifetime widening to Cow<'_, Self> in grovedb-epoch-based-storage-flags/src/lib.rs — potential borrow/lifetime impacts.

Possibly related PRs

Suggested reviewers

  • fominok

Poem

🐇 I nibble logs and copy trees,
Checkpoints tucked beneath the leas,
RocksDB hums in careful rows,
Sessions renamed where memory grows,
I hop away — the data flows. 🌿

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: checkpoints' directly and clearly summarizes the main change: addition of checkpoint functionality with new APIs (open_checkpoint, delete_checkpoint) and related infrastructure.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/checkpoints

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1bb08d1 and e9e7366.

📒 Files selected for processing (1)
  • grovedb/src/debugger.rs (7 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.rs: Format Rust code with cargo fmt
Run cargo clippy with -D warnings and fix all lints

Files:

  • grovedb/src/debugger.rs
{grovedb,merk,storage,costs}/src/**/*.rs

📄 CodeRabbit inference engine (CLAUDE.md)

{grovedb,merk,storage,costs}/src/**/*.rs: Use cost_return_on_error! for early returns while accumulating cost
Wrap errors with context (e.g., .map_err(|e| Error::CorruptedData(format!("context: {}", e))))

Files:

  • grovedb/src/debugger.rs
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: dashpay/grovedb PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-06T12:46:28.692Z
Learning: Applies to storage/src/rocksdb_storage/**/*.rs : Use OptimisticTransactionDB for optimistic transactions
📚 Learning: 2025-02-18T08:17:30.690Z
Learnt from: fominok
Repo: dashpay/grovedb PR: 345
File: grovedb/src/operations/delete/v1.rs:19-35
Timestamp: 2025-02-18T08:17:30.690Z
Learning: Refactoring suggestions for legacy code should be avoided as they are considered out of scope for feature PRs. This includes the function `delete_internal_on_transaction` in `grovedb/src/operations/delete/v1.rs` which is legacy code.

Applied to files:

  • grovedb/src/debugger.rs
📚 Learning: 2025-09-06T12:46:28.692Z
Learnt from: CR
Repo: dashpay/grovedb PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-06T12:46:28.692Z
Learning: Applies to {grovedb,merk,storage,costs}/src/**/*.rs : Wrap errors with context (e.g., .map_err(|e| Error::CorruptedData(format!("context: {}", e))))

Applied to files:

  • grovedb/src/debugger.rs
🧬 Code graph analysis (1)
grovedb/src/debugger.rs (1)
grovedb/src/lib.rs (2)
  • open (282-285)
  • path (1043-1045)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Tests
  • GitHub Check: Code Coverage
  • GitHub Check: Linting
🔇 Additional comments (2)
grovedb/src/debugger.rs (2)

148-166: LGTM! Method rename and implementation are correct.

The rename from get_snapshot to get_checkpointed_grovedb aligns with the PR's checkpoint terminology. The implementation correctly updates last_access and returns a read guard to the checkpointed GroveDB instance.


168-172: LGTM! Session struct field rename is consistent.

The field rename from snapshot to checkpointed_grovedb is consistent with the overall refactoring and correctly reflects that sessions now work with checkpointed GroveDB instances.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (3)
grovedb/src/debugger.rs (1)

175-186: Debugger sessions should likely use open_checkpoint instead of plain open

Session::new currently does:

  • grovedb.create_checkpoint(&path)?;
  • let checkpointed_grovedb = GroveDb::open(path)?;

Given this PR adds GroveDb::open_checkpoint backed by RocksDB’s checkpoint‑specific, read‑only options (no compactions, mmap reads, reduced parallelism), the debugger’s per‑session DB seems like a prime consumer of that API.

Consider replacing the open call with the new helper, e.g.:

Proposed refactor using `open_checkpoint`
-        grovedb
-            .create_checkpoint(&path)
-            .map_err(|e| AppError::Any(e.to_string()))?;
-        let checkpointed_grovedb = GroveDb::open(path).map_err(|e| AppError::Any(e.to_string()))?;
+        grovedb
+            .create_checkpoint(&path)
+            .map_err(|e| AppError::Any(e.to_string()))?;
+        let checkpointed_grovedb =
+            GroveDb::open_checkpoint(path).map_err(|e| AppError::Any(e.to_string()))?;

This keeps debugger sessions strictly read‑only and aligned with the checkpoint configuration used elsewhere in the crate.

grovedb/src/tests/test_compaction_sizes.rs (1)

19-26: Add explicit flush before final SST measurement or mark test #[ignore]

The logic of get_files_size and the WAL/SST ratio checks is correct, but this test lacks explicit synchronization before measuring final SST size:

  • No db.flush() call before the final assertion, so compaction may still be in progress
  • Asserts sst_ratio < 1.5, which assumes settled state; on slower CI this could intermittently fail
  • Test itself is heavy (50 MB random writes + RocksDB background operations)

Consider either:

  • Adding db.flush().expect("flush failed") before line 113, or
  • Marking with #[ignore] and running selectively

This preserves the diagnostic value while reducing flakiness risk.

grovedb/src/checkpoints.rs (1)

28-48: Consider additional safety checks for checkpoint deletion.

The current safety checks are good:

  1. Path depth validation prevents root deletion
  2. Checkpoint validation ensures it's a valid GroveDB

However, consider these additional safeguards:

  1. Path canonicalization: Resolve symbolic links to prevent deleting unexpected locations
  2. TOCTOU race condition: There's a small window between validation (opening) and deletion where the checkpoint could be replaced
  3. Relative path handling: Ensure relative paths don't escape expected boundaries
Enhanced safety checks
 pub fn delete_checkpoint<P: AsRef<Path>>(path: P) -> Result<(), Error> {
     let path = path.as_ref();
+    
+    // Canonicalize path to resolve symbolic links and relative paths
+    let canonical_path = path.canonicalize()
+        .map_err(|e| Error::CorruptedData(format!("cannot canonicalize checkpoint path: {}", e)))?;

     // Safety: prevent deletion of root or near-root paths
-    let component_count = path.components().count();
+    let component_count = canonical_path.components().count();
     if component_count < 2 {
         return Err(Error::CorruptedData(
             "refusing to delete checkpoint: path too short (safety check)".to_string(),
         ));
     }

     // Verify this is a valid checkpoint by attempting to open it
     // This ensures we only delete actual GroveDB checkpoints
     {
-        let _checkpoint_db = Self::open_checkpoint(path)?;
+        let _checkpoint_db = Self::open_checkpoint(&canonical_path)?;
         // checkpoint_db is dropped here, closing the database
     }

-    std::fs::remove_dir_all(path)
+    std::fs::remove_dir_all(&canonical_path)
         .map_err(|e| Error::CorruptedData(format!("failed to delete checkpoint: {}", e)))
 }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between be1c499 and 5938191.

📒 Files selected for processing (10)
  • grovedb-epoch-based-storage-flags/src/lib.rs (1 hunks)
  • grovedb/src/checkpoints.rs (1 hunks)
  • grovedb/src/debugger.rs (3 hunks)
  • grovedb/src/lib.rs (1 hunks)
  • grovedb/src/tests/checkpoint_tests.rs (1 hunks)
  • grovedb/src/tests/mod.rs (3 hunks)
  • grovedb/src/tests/test_compaction_sizes.rs (1 hunks)
  • storage/Cargo.toml (1 hunks)
  • storage/src/rocksdb_storage/storage.rs (5 hunks)
  • tutorials/src/bin/replication.rs (8 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
**/*.rs

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.rs: Format Rust code with cargo fmt
Run cargo clippy with -D warnings and fix all lints

Files:

  • grovedb/src/checkpoints.rs
  • grovedb-epoch-based-storage-flags/src/lib.rs
  • grovedb/src/tests/checkpoint_tests.rs
  • storage/src/rocksdb_storage/storage.rs
  • grovedb/src/debugger.rs
  • grovedb/src/tests/mod.rs
  • grovedb/src/tests/test_compaction_sizes.rs
  • tutorials/src/bin/replication.rs
  • grovedb/src/lib.rs
{grovedb,merk,storage,costs}/src/**/*.rs

📄 CodeRabbit inference engine (CLAUDE.md)

{grovedb,merk,storage,costs}/src/**/*.rs: Use cost_return_on_error! for early returns while accumulating cost
Wrap errors with context (e.g., .map_err(|e| Error::CorruptedData(format!("context: {}", e))))

Files:

  • grovedb/src/checkpoints.rs
  • grovedb/src/tests/checkpoint_tests.rs
  • storage/src/rocksdb_storage/storage.rs
  • grovedb/src/debugger.rs
  • grovedb/src/tests/mod.rs
  • grovedb/src/tests/test_compaction_sizes.rs
  • grovedb/src/lib.rs
**/tests/**/*.rs

📄 CodeRabbit inference engine (CLAUDE.md)

**/tests/**/*.rs: Every state-modifying operation must be testable with proofs
Verify that cost calculations in tests match actual operations
Ensure reference integrity in tests (no cycles)

Files:

  • grovedb/src/tests/checkpoint_tests.rs
  • grovedb/src/tests/mod.rs
  • grovedb/src/tests/test_compaction_sizes.rs
{storage/src/**/*.rs,grovedb/src/batch/**/*.rs}

📄 CodeRabbit inference engine (CLAUDE.md)

Use transactions for multi-step operations to ensure safety

Files:

  • storage/src/rocksdb_storage/storage.rs
storage/src/rocksdb_storage/**/*.rs

📄 CodeRabbit inference engine (CLAUDE.md)

storage/src/rocksdb_storage/**/*.rs: Use Blake3 to derive 32-byte prefixes for path-based subtree isolation
Use OptimisticTransactionDB for optimistic transactions

Files:

  • storage/src/rocksdb_storage/storage.rs
🧠 Learnings (12)
📓 Common learnings
Learnt from: CR
Repo: dashpay/grovedb PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-06T12:46:28.692Z
Learning: Applies to {storage/src/**/*.rs,grovedb/src/batch/**/*.rs} : Use transactions for multi-step operations to ensure safety
Learnt from: CR
Repo: dashpay/grovedb PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-06T12:46:28.692Z
Learning: Applies to storage/src/rocksdb_storage/**/*.rs : Use OptimisticTransactionDB for optimistic transactions
📚 Learning: 2025-09-06T12:46:28.692Z
Learnt from: CR
Repo: dashpay/grovedb PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-06T12:46:28.692Z
Learning: Applies to {grovedb,merk,storage,costs}/src/**/*.rs : Wrap errors with context (e.g., .map_err(|e| Error::CorruptedData(format!("context: {}", e))))

Applied to files:

  • grovedb/src/checkpoints.rs
  • grovedb-epoch-based-storage-flags/src/lib.rs
  • grovedb/src/tests/checkpoint_tests.rs
  • storage/Cargo.toml
  • grovedb/src/tests/mod.rs
  • grovedb/src/tests/test_compaction_sizes.rs
  • tutorials/src/bin/replication.rs
  • grovedb/src/lib.rs
📚 Learning: 2025-09-06T12:46:28.692Z
Learnt from: CR
Repo: dashpay/grovedb PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-06T12:46:28.692Z
Learning: Applies to storage/src/rocksdb_storage/**/*.rs : Use Blake3 to derive 32-byte prefixes for path-based subtree isolation

Applied to files:

  • grovedb/src/checkpoints.rs
  • storage/Cargo.toml
  • storage/src/rocksdb_storage/storage.rs
  • grovedb/src/tests/test_compaction_sizes.rs
  • tutorials/src/bin/replication.rs
📚 Learning: 2025-09-06T12:46:28.692Z
Learnt from: CR
Repo: dashpay/grovedb PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-06T12:46:28.692Z
Learning: Applies to **/tests/**/*.rs : Ensure reference integrity in tests (no cycles)

Applied to files:

  • grovedb/src/tests/checkpoint_tests.rs
  • grovedb/src/tests/mod.rs
  • grovedb/src/tests/test_compaction_sizes.rs
📚 Learning: 2025-09-06T12:46:28.692Z
Learnt from: CR
Repo: dashpay/grovedb PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-06T12:46:28.692Z
Learning: Applies to **/tests/**/*.rs : Every state-modifying operation must be testable with proofs

Applied to files:

  • grovedb/src/tests/checkpoint_tests.rs
  • grovedb/src/tests/mod.rs
  • grovedb/src/tests/test_compaction_sizes.rs
📚 Learning: 2025-09-06T12:46:28.692Z
Learnt from: CR
Repo: dashpay/grovedb PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-06T12:46:28.692Z
Learning: Applies to **/tests/**/*.rs : Verify that cost calculations in tests match actual operations

Applied to files:

  • grovedb/src/tests/checkpoint_tests.rs
  • grovedb/src/tests/mod.rs
  • grovedb/src/tests/test_compaction_sizes.rs
📚 Learning: 2025-12-17T18:39:16.131Z
Learnt from: QuantumExplorer
Repo: dashpay/grovedb PR: 397
File: grovedb/src/tests/provable_count_sum_tree_tests.rs:82-90
Timestamp: 2025-12-17T18:39:16.131Z
Learning: In Rust test code for grovedb, when a function returns a cost-wrapped result (CostResult), using .unwrap().map_err(|e| e) is acceptable to discard cost tracking by unwrapping the inner Result and converting the error type for test helpers. This pattern should be applied across test files that interact with cost-wrapped results, not just a single test, to ensure consistency in how test helpers handle costs.

Applied to files:

  • grovedb/src/tests/checkpoint_tests.rs
  • grovedb/src/tests/mod.rs
  • grovedb/src/tests/test_compaction_sizes.rs
📚 Learning: 2025-09-06T12:46:28.692Z
Learnt from: CR
Repo: dashpay/grovedb PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-06T12:46:28.692Z
Learning: Applies to storage/src/rocksdb_storage/**/*.rs : Use OptimisticTransactionDB for optimistic transactions

Applied to files:

  • storage/Cargo.toml
  • storage/src/rocksdb_storage/storage.rs
📚 Learning: 2025-09-06T12:46:28.692Z
Learnt from: CR
Repo: dashpay/grovedb PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-06T12:46:28.692Z
Learning: Applies to {storage/src/**/*.rs,grovedb/src/batch/**/*.rs} : Use transactions for multi-step operations to ensure safety

Applied to files:

  • storage/src/rocksdb_storage/storage.rs
  • grovedb/src/tests/mod.rs
  • grovedb/src/tests/test_compaction_sizes.rs
  • tutorials/src/bin/replication.rs
📚 Learning: 2025-09-06T12:46:28.692Z
Learnt from: CR
Repo: dashpay/grovedb PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-06T12:46:28.692Z
Learning: Applies to {grovedb/src/operations/proof,merk/src/proofs}/**/*.rs : Never trust unverified proofs; always verify before use

Applied to files:

  • grovedb/src/tests/mod.rs
📚 Learning: 2025-09-06T12:46:28.692Z
Learnt from: CR
Repo: dashpay/grovedb PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-06T12:46:28.692Z
Learning: Applies to grovedb/src/batch/**/*.rs : Prefer batch operations for multiple related changes

Applied to files:

  • grovedb/src/tests/mod.rs
  • grovedb/src/tests/test_compaction_sizes.rs
  • tutorials/src/bin/replication.rs
📚 Learning: 2025-02-18T08:17:30.690Z
Learnt from: fominok
Repo: dashpay/grovedb PR: 345
File: grovedb/src/operations/delete/v1.rs:19-35
Timestamp: 2025-02-18T08:17:30.690Z
Learning: Refactoring suggestions for legacy code should be avoided as they are considered out of scope for feature PRs. This includes the function `delete_internal_on_transaction` in `grovedb/src/operations/delete/v1.rs` which is legacy code.

Applied to files:

  • tutorials/src/bin/replication.rs
  • grovedb/src/lib.rs
🧬 Code graph analysis (3)
grovedb/src/checkpoints.rs (1)
storage/src/rocksdb_storage/storage.rs (3)
  • path (190-190)
  • create_checkpoint (567-573)
  • checkpoint_rocksdb_with_path (139-151)
grovedb/src/debugger.rs (1)
grovedb/src/lib.rs (2)
  • open (281-284)
  • path (1042-1044)
grovedb/src/tests/test_compaction_sizes.rs (3)
grovedb/src/lib.rs (6)
  • path (1042-1044)
  • open (281-284)
  • None (439-439)
  • None (1111-1111)
  • None (1158-1158)
  • None (1187-1187)
grovedb-version/src/version/mod.rs (1)
  • latest (26-30)
grovedb-element/src/element/constructor.rs (2)
  • empty_tree (11-13)
  • new_item (61-63)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Linting
  • GitHub Check: Code Coverage
  • GitHub Check: Compilation errors
  • GitHub Check: Tests
🔇 Additional comments (16)
grovedb/src/tests/mod.rs (1)

9-9: Test module wiring and refactor into general_tests look consistent

checkpoint_tests and test_compaction_sizes are correctly added to the test suite, and wrapping the existing tests into mod general_tests keeps visibility and imports intact via super::*. I don’t see functional or structural issues here.

Also applies to: 43-43, 1035-1035

storage/Cargo.toml (1)

21-22: Git-pinned RocksDB fork is fine short term; ensure tracking and reversion plan

Using the QuantumExplorer RocksDB fork with a fixed rev is reasonable to unblock create_checkpoint_with_log_size, but it does shift risk to this repo:

  • You’re now tied to that specific commit’s API/behavior and build flags.
  • Future upstream releases (once PR 1055 is merged) should replace this; the TODO captures that but it’s easy to forget.

I suggest:

  • Adding a brief note in your release/upgrade docs or an issue referencing this TODO so it’s not lost.
  • Verifying that this rev is built with the same features you relied on from crates.io (compression, jemalloc, etc., if any).
grovedb-epoch-based-storage-flags/src/lib.rs (1)

677-683: map_cow_some_element_flags_ref lifetime widening is sound and non-breaking

Changing the return type to Result<Option<Cow<'_, Self>>, StorageFlagsError> is type-correct and keeps current behavior (only Cow::Owned is constructed). This makes the API future-proof if you later want to return borrowed flags without forcing allocations.

tutorials/src/bin/replication.rs (4)

11-12: LGTM! Improved constant naming.

The renaming from Greek characters (ΚΕΥ) to ASCII (KEY) improves readability and portability across different environments and editors.


83-85: LGTM! Simplified return.

Direct return is more concise and idiomatic.


114-114: LGTM! Idiomatic emptiness check.

Using is_empty() is the preferred Rust pattern for checking collection emptiness.


265-265: LGTM! Improved error handling.

Using value? for error propagation is safer and more idiomatic than unwrap().

grovedb/src/lib.rs (1)

130-130: LGTM! Good module organization.

Extracting checkpoint functionality to a dedicated module improves code organization and maintainability.

storage/src/rocksdb_storage/storage.rs (2)

87-107: LGTM! Appropriate read-only checkpoint configuration.

The options correctly configure RocksDB for read-only checkpoint access:

  • Prevents any writes (no creation, no mmap writes, no auto-compactions)
  • Enables mmap reads for performance
  • Reduces parallelism to minimize resource usage

119-119: LGTM! Clear WAL flush behavior.

Setting DEFAULT_LOG_SIZE_FOR_CHECKPOINT_FLUSH to u64::MAX with the comment "Never flush" clearly indicates the checkpoint strategy: preserve the WAL without forcing a flush, keeping checkpoints lightweight at the cost of WAL replay on open.

Also applies to: 567-573

grovedb/src/checkpoints.rs (2)

8-11: LGTM! Clean delegation.

Simple delegation to the storage layer with appropriate error conversion.


13-17: LGTM! Proper read-only checkpoint opening.

Correctly uses the dedicated checkpoint initialization path with read-only configuration.

grovedb/src/tests/checkpoint_tests.rs (4)

11-168: LGTM! Comprehensive checkpoint functionality test.

The test thoroughly verifies:

  • Checkpoint creation and opening
  • Data retrieval from checkpoints
  • Isolation between original DB and checkpoint (bidirectional independence)

Clear assertions and good test structure.


170-225: LGTM! Thorough checkpoint deletion test.

The test verifies all key aspects:

  • Checkpoint creation and accessibility
  • Successful deletion
  • No impact on original database

Good test coverage.


230-309: LGTM! WAL preservation test is well-designed.

The test correctly verifies that when data size is under the 50MB threshold, the WAL is preserved in the checkpoint rather than being flushed to SST files.

Good use of filesystem checks to verify WAL presence and appropriate data size calculations (490 items × 100KB ≈ 49MB).


327-503: Well-documented ignored test with appropriate safeguards.

The test is properly marked #[ignore] with clear re-enable criteria (pending upstream RocksDB bug fix). The test logic is sound—it incrementally writes data until WAL exceeds a 50 MiB threshold, then verifies checkpoint WAL is empty (indicating flush was triggered). The implementation includes good safeguards: a 62 MiB limit prevents memtable auto-flush from invalidating test assumptions, and comprehensive debugging output aids troubleshooting.

Note: The specific GitHub PR reference (#14193) could not be independently verified in publicly available RocksDB repositories, though the upstream checkpoint/WAL flush behavior described aligns with RocksDB's documented log_size_for_flush parameter behavior and historical fixes like PR #9208.

Comment on lines +139 to +151
pub fn checkpoint_rocksdb_with_path<P: AsRef<Path>>(path: P) -> Result<Self, Error> {
let db = Db::open_cf_descriptors(
&READ_ONLY_CHECKPOINTS_OPTS,
&path,
[
ColumnFamilyDescriptor::new(AUX_CF_NAME, DEFAULT_OPTS.clone()),
ColumnFamilyDescriptor::new(ROOTS_CF_NAME, DEFAULT_OPTS.clone()),
ColumnFamilyDescriptor::new(META_CF_NAME, DEFAULT_OPTS.clone()),
],
)
.map_err(RocksDBError)?;
Ok(RocksDbStorage { db })
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# First, let's view the full context of checkpoint_rocksdb_with_path and surrounding code
cat -n storage/src/rocksdb_storage/storage.rs | head -160 | tail -40

Repository: dashpay/grovedb

Length of output: 1886


🏁 Script executed:

#!/bin/bash
# Find where READ_ONLY_CHECKPOINTS_OPTS and DEFAULT_OPTS are defined
rg -n "READ_ONLY_CHECKPOINTS_OPTS|DEFAULT_OPTS" storage/src/rocksdb_storage/ -B 2 -A 2

Repository: dashpay/grovedb

Length of output: 2621


🏁 Script executed:

#!/bin/bash
# Look for other functions that open RocksDB to see if they follow the same pattern
rg -n "open_cf_descriptors" storage/src/rocksdb_storage/ -B 5 -A 10

Repository: dashpay/grovedb

Length of output: 2772


🏁 Script executed:

#!/bin/bash
# Check if there's a read-only version of DEFAULT_OPTS or similar
rg -n "ColumnFamilyDescriptor|READ_ONLY" storage/src/rocksdb_storage/storage.rs -B 2 -A 2

Repository: dashpay/grovedb

Length of output: 1342


🏁 Script executed:

#!/bin/bash
# View the complete definitions of both options
sed -n '75,120p' storage/src/rocksdb_storage/storage.rs

Repository: dashpay/grovedb

Length of output: 1530


🏁 Script executed:

#!/bin/bash
# Check if there are any other read-only patterns or if CF options are ever customized elsewhere
rg -n "read_only|ReadOnly" storage/src/rocksdb_storage/ -i

Repository: dashpay/grovedb

Length of output: 248


Use read-only options for column families in checkpoint mode.

The DB is opened with READ_ONLY_CHECKPOINTS_OPTS which explicitly disables write-related settings (create_if_missing, create_missing_column_families, allow_mmap_writes), but column families are created with DEFAULT_OPTS which enables these write-focused settings. This creates an inconsistency where the database-level read-only contract is violated at the column family level.

Column families should use a read-only configuration (or a minimal set of options without write-enabling features) to maintain consistency with the DB-level read-only constraints.

🤖 Prompt for AI Agents
In storage/src/rocksdb_storage/storage.rs around lines 139 to 151, the DB is
opened with READ_ONLY_CHECKPOINTS_OPTS but the ColumnFamilyDescriptor instances
use DEFAULT_OPTS which enable write-related settings; replace those DEFAULT_OPTS
with a read-only/minimal option set (e.g. clone of READ_ONLY_CHECKPOINTS_OPTS or
a dedicated READ_ONLY_CF_OPTS) that does not set create_if_missing,
create_missing_column_families, allow_mmap_writes, etc., so the column-family
options match the database-level read-only constraints before calling
Db::open_cf_descriptors.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
grovedb/src/debugger.rs (1)

174-187: Use open_checkpoint() instead of open() for read-only checkpoint access.

Line 181 uses GroveDb::open(path) to open the checkpoint, but this PR introduces open_checkpoint() specifically for read-only checkpoint access with optimized RocksDB options (disabled compactions, mmap reads, reduced parallelism). Since the debugger only reads from checkpoints and never writes to them, you should use the optimized checkpoint-specific API.

🔎 Proposed fix
-        let checkpointed_grovedb = GroveDb::open(path).map_err(|e| AppError::Any(e.to_string()))?;
+        let checkpointed_grovedb = GroveDb::open_checkpoint(path).map_err(|e| AppError::Any(e.to_string()))?;
♻️ Duplicate comments (1)
grovedb/src/debugger.rs (1)

236-236: LGTM! All call sites correctly updated.

All four call sites (lines 236, 264, 293, 310) have been updated to use get_checkpointed_grovedb, resolving the compilation failure flagged in the previous review.

Also applies to: 264-264, 293-293, 310-310

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1bb08d1 and e9e7366.

📒 Files selected for processing (1)
  • grovedb/src/debugger.rs (7 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.rs: Format Rust code with cargo fmt
Run cargo clippy with -D warnings and fix all lints

Files:

  • grovedb/src/debugger.rs
{grovedb,merk,storage,costs}/src/**/*.rs

📄 CodeRabbit inference engine (CLAUDE.md)

{grovedb,merk,storage,costs}/src/**/*.rs: Use cost_return_on_error! for early returns while accumulating cost
Wrap errors with context (e.g., .map_err(|e| Error::CorruptedData(format!("context: {}", e))))

Files:

  • grovedb/src/debugger.rs
🧠 Learnings (3)
📓 Common learnings
Learnt from: CR
Repo: dashpay/grovedb PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-06T12:46:28.692Z
Learning: Applies to storage/src/rocksdb_storage/**/*.rs : Use OptimisticTransactionDB for optimistic transactions
📚 Learning: 2025-02-18T08:17:30.690Z
Learnt from: fominok
Repo: dashpay/grovedb PR: 345
File: grovedb/src/operations/delete/v1.rs:19-35
Timestamp: 2025-02-18T08:17:30.690Z
Learning: Refactoring suggestions for legacy code should be avoided as they are considered out of scope for feature PRs. This includes the function `delete_internal_on_transaction` in `grovedb/src/operations/delete/v1.rs` which is legacy code.

Applied to files:

  • grovedb/src/debugger.rs
📚 Learning: 2025-09-06T12:46:28.692Z
Learnt from: CR
Repo: dashpay/grovedb PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-09-06T12:46:28.692Z
Learning: Applies to {grovedb,merk,storage,costs}/src/**/*.rs : Wrap errors with context (e.g., .map_err(|e| Error::CorruptedData(format!("context: {}", e))))

Applied to files:

  • grovedb/src/debugger.rs
🧬 Code graph analysis (1)
grovedb/src/debugger.rs (1)
grovedb/src/lib.rs (2)
  • open (282-285)
  • path (1043-1045)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Tests
  • GitHub Check: Code Coverage
  • GitHub Check: Linting
🔇 Additional comments (2)
grovedb/src/debugger.rs (2)

148-166: LGTM! Method rename and implementation are correct.

The rename from get_snapshot to get_checkpointed_grovedb aligns with the PR's checkpoint terminology. The implementation correctly updates last_access and returns a read guard to the checkpointed GroveDB instance.


168-172: LGTM! Session struct field rename is consistent.

The field rename from snapshot to checkpointed_grovedb is consistent with the overall refactoring and correctly reflects that sessions now work with checkpointed GroveDB instances.

@QuantumExplorer QuantumExplorer merged commit 4b4a832 into develop Dec 19, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant