fix(rocksdb): flush batch data into storage to make sure stage is completed #340
Merged
nekomoto911 added a commit to Galxe/gravity-sdk that referenced this pull request on May 25, 2026:

…730)

## What

Bump the `gravity-reth` git dependency:

- `41c0b7092ad578abcfb59b3aeb0ce9ec43f5fcf7` → `acc458846c2f1fc684fa4344cf02ae9488efd252`

Updates `bin/gravity_node/Cargo.toml` and the regenerated `Cargo.lock`. No SDK source changes.

## Why

Pulls in **gravity-reth#340**, a RocksDB durability fix. The rev range is exactly one commit ahead (`ahead 1, behind 0`); this bump introduces #340 and nothing else.

`provider_rw.commit()` called `rocksdb::DB::write()` without `fdatasync()`. With `wal_bytes_per_sync=4MB`, the low-traffic `account_db`/`storage_db` could go 2000+ blocks without syncing, risking large trie data loss on power failure and a state-root mismatch panic during consensus re-execution after a crash.

The fix:

1. All `write()` calls in `commit_view()` use `write_opt(sync=true)`: every `commit()` is fsync-durable before returning.
2. Disable pipelined write and drop `wal_bytes_per_sync` (redundant with per-commit sync).
3. Recovery always re-checks every stage checkpoint (no early return), repairing only stages that are behind.

Ref: Galxe/gravity-reth#340
Problem
`provider_rw.commit()` calls `rocksdb::DB::write()`, which appends to the WAL but does not call `fdatasync()`. Data sits in the OS page cache after `write()` returns. Background sync fires only once per 4MB of WAL written (`wal_bytes_per_sync=4MB`). Three separate DB instances (`state_db`, `account_db`, `storage_db`) have independent WAL counters, so the low-traffic `account_db` and `storage_db` can go 2000+ blocks without syncing.

Result: after a crash, `state_db` checkpoints report consistency but the trie DBs have lost committed data. Consensus re-execution reads stale trie nodes → state root mismatch → panic.
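The effect of independent WAL counters can be illustrated with a small std-only model. This is a sketch under stated assumptions, not RocksDB's actual sync logic: the `WalModel` type, the `SyncPolicy` enum, and the per-block write sizes are all hypothetical and chosen only to show how a low-traffic database can stay below a 4MB threshold for thousands of commits, while a sync-on-every-commit policy leaves nothing unsynced.

```rust
/// When the toy WAL is synced to disk. Both variants are illustrative only.
#[derive(Clone, Copy, Debug)]
enum SyncPolicy {
    /// Sync once the unsynced byte count reaches this threshold.
    EveryBytes(u64),
    /// Sync before every commit returns.
    EveryCommit,
}

/// Toy model of one database's WAL (not RocksDB's real implementation).
#[derive(Debug)]
struct WalModel {
    name: &'static str,
    policy: SyncPolicy,
    unsynced_bytes: u64,   // bytes that exist only in the page cache
    unsynced_commits: u64, // commits that would be lost on power failure
}

impl WalModel {
    fn new(name: &'static str, policy: SyncPolicy) -> Self {
        WalModel { name, policy, unsynced_bytes: 0, unsynced_commits: 0 }
    }

    /// Append one commit of `bytes` bytes, then sync if the policy says so.
    fn commit(&mut self, bytes: u64) {
        self.unsynced_bytes += bytes;
        self.unsynced_commits += 1;
        let due = match self.policy {
            SyncPolicy::EveryBytes(limit) => self.unsynced_bytes >= limit,
            SyncPolicy::EveryCommit => true,
        };
        if due {
            self.unsynced_bytes = 0;
            self.unsynced_commits = 0;
        }
    }
}

/// Run `blocks` commits against three databases with independent counters.
fn simulate(policy: SyncPolicy, blocks: u64) -> Vec<WalModel> {
    // Hypothetical bytes written per block; real sizes are not given in the PR.
    let mut dbs = vec![
        (WalModel::new("state_db", policy), 512 * 1024u64),
        (WalModel::new("account_db", policy), 1024u64),
        (WalModel::new("storage_db", policy), 2 * 1024u64),
    ];
    for _ in 0..blocks {
        for (db, size) in dbs.iter_mut() {
            db.commit(*size);
        }
    }
    dbs.into_iter().map(|(db, _)| db).collect()
}

fn main() {
    let four_mb = 4 * 1024 * 1024;
    for policy in [SyncPolicy::EveryBytes(four_mb), SyncPolicy::EveryCommit] {
        println!("{:?}", policy);
        for db in simulate(policy, 2000) {
            println!(
                "  {}: {} unsynced commits, {} unsynced bytes",
                db.name, db.unsynced_commits, db.unsynced_bytes
            );
        }
    }
}
```

With these assumed sizes, the busy database syncs every few blocks under the byte threshold, while the two low-traffic ones never reach it within 2000 blocks.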
Fix
1. `tx.rs`: every `write()` → `write_opt(sync=true)`
All 7 `db.write()` calls in `commit_view()` now use `WriteOptions::set_sync(true)`. Every `provider_rw.commit()` is fsync-durable before returning.

2. `mod.rs`: disable pipelined write, remove `wal_bytes_per_sync`
With per-commit sync, background WAL sync and pipelining are redundant.

3. `recovery.rs`: always check all stage checkpoints
Removed the early return when `recover_block == best_block`. Each `recover_*` function has an internal `if ck < target` guard → no-op when consistent, repair when not.

Fault tolerance after fix
Each commit is atomic on disk. A crash at any point → recovery scans all stages → rebuilds only what is behind. No blind spots.
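The "check every stage, repair only what is behind" behaviour can be sketched as follows. This is a minimal illustration, not the code in `recovery.rs`: the `Checkpoints` map, the stage names, and the `recover_stage`/`recover_all` helpers are hypothetical, and the actual rebuild work is reduced to advancing the checkpoint. Treating a missing checkpoint as block 0 is also an assumption of this sketch.

```rust
use std::collections::BTreeMap;

/// Hypothetical checkpoint table: stage name -> last block the stage completed.
type Checkpoints = BTreeMap<&'static str, u64>;

/// Repair one stage if it is behind `target`. Returns true if a repair ran.
/// Mirrors the `if ck < target` guard described above: a consistent stage is a no-op.
fn recover_stage(checkpoints: &mut Checkpoints, stage: &'static str, target: u64) -> bool {
    // Assumption: a stage with no checkpoint is treated as being at block 0.
    let ck = checkpoints.get(stage).copied().unwrap_or(0);
    if ck < target {
        // A real implementation would rebuild the stage's data for blocks ck+1..=target here.
        checkpoints.insert(stage, target);
        true
    } else {
        false
    }
}

/// Visit every stage unconditionally. There is deliberately no early return
/// based on a single "is everything consistent" check.
fn recover_all(
    checkpoints: &mut Checkpoints,
    stages: &[&'static str],
    target: u64,
) -> Vec<&'static str> {
    let mut repaired = Vec::new();
    for &stage in stages {
        if recover_stage(checkpoints, stage, target) {
            repaired.push(stage);
        }
    }
    repaired
}

fn main() {
    // Hypothetical stage names, for illustration only.
    let stages = ["execution", "hashing", "merkle"];
    let mut checkpoints: Checkpoints = BTreeMap::new();
    checkpoints.insert("execution", 100);
    checkpoints.insert("hashing", 100);
    checkpoints.insert("merkle", 97); // this stage lost data in the crash

    let repaired = recover_all(&mut checkpoints, &stages, 100);
    println!("repaired: {:?}", repaired);

    // A second pass finds every stage consistent and repairs nothing.
    let repaired_again = recover_all(&mut checkpoints, &stages, 100);
    println!("repaired on second pass: {:?}", repaired_again);
}
```

Because each guard is cheap when the stage is already at the target, running all of them on every startup costs little and avoids depending on one checkpoint to vouch for the others.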