fix(rocksdb): flush batch data into storage to make sure stage is completed by AshinGau · Pull Request #340 · Galxe/gravity-reth

AshinGau · 2026-05-22T07:07:33Z

Problem

provider_rw.commit() calls rocksdb::DB::write(), which appends to the WAL but does not call fdatasync(), so the data still sits in the OS page cache after write() returns. Background sync fires only once per 4MB of WAL written (wal_bytes_per_sync=4MB). The three separate DB instances (state_db, account_db, storage_db) each keep an independent WAL byte counter:

state_db gets ~50KB writes per block → syncs every ~80 blocks → data largely survives
account_db / storage_db get ~0-2KB per empty block → may go 2000+ blocks without sync → massive data loss on power failure
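
The block counts above follow from dividing the sync threshold by the per-block WAL volume. A minimal sketch of that arithmetic, assuming a fixed write size per block (a simplification) and using the hypothetical helper name `blocks_between_syncs`:

```rust
/// Number of whole blocks whose WAL writes fit into one background-sync interval.
/// Assumes every block appends the same number of bytes to the WAL.
fn blocks_between_syncs(wal_bytes_per_sync: u64, wal_bytes_per_block: u64) -> Option<u64> {
    if wal_bytes_per_block == 0 {
        // No WAL growth at all: the byte threshold is never reached.
        return None;
    }
    Some(wal_bytes_per_sync / wal_bytes_per_block)
}
```

With a 4MB threshold, `blocks_between_syncs(4 * 1024 * 1024, 50 * 1024)` returns `Some(81)` for the state_db case, and `blocks_between_syncs(4 * 1024 * 1024, 2 * 1024)` returns `Some(2048)` for a DB that receives 2KB per block. A DB that receives 0 bytes per block never reaches the threshold.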

Result: after a crash, the state_db checkpoints report a consistent state, but the trie DBs have lost committed data. Consensus re-execution then reads stale trie nodes → state root mismatch → panic.

Fix

1. tx.rs — every write() → write_opt(sync=true)

All 7 db.write() calls in commit_view() now go through write_opt() with WriteOptions::set_sync(true), so every provider_rw.commit() is fsync-durable before it returns.
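
The difference between a buffered write and a synced write can be illustrated with the standard library alone. This is an analogy for what a synced WAL write buys, not the code in tx.rs; `append_record` and its parameters are hypothetical names:

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Appends `record` to a log file. When `sync` is true, the call blocks until
/// the file data has been handed to the storage device.
fn append_record(path: &Path, record: &[u8], sync: bool) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    // After write_all returns, the bytes may still live only in the OS page cache.
    file.write_all(record)?;
    if sync {
        // sync_data() flushes file contents to disk (fdatasync on Linux).
        file.sync_data()?;
    }
    Ok(())
}
```

Calling it with `sync = false` corresponds to the old behaviour, where a power failure can drop data that the caller already considers committed. With `sync = true`, each call pays the cost of a device flush in exchange for durability.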

2. mod.rs — disable pipelined write, remove wal_bytes_per_sync

With per-commit sync, background WAL sync and pipelining are redundant.

3. recovery.rs — always check all stage checkpoints

Removed the early return when recover_block == best_block. Each recover_* function has an internal if ck < target guard → no-op when the checkpoint is consistent, repair when it is not.
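
A minimal sketch of the guard pattern, assuming checkpoints are plain block numbers. The `Checkpoints` struct, its field names, and both functions are illustrative and not taken from recovery.rs:

```rust
/// Hypothetical per-stage checkpoints, stored as block numbers.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Checkpoints {
    hashing: u64,
    merkle: u64,
    history: u64,
}

/// Advances one checkpoint to `target` if it is behind.
/// Returns true when a repair ran, false when the stage was already consistent.
fn recover_stage(ck: &mut u64, target: u64) -> bool {
    if *ck < target {
        // A real implementation would rebuild the stage's data here.
        *ck = target;
        true
    } else {
        false
    }
}

/// Checks every stage unconditionally and returns the names of repaired stages.
fn recover_all(cks: &mut Checkpoints, target: u64) -> Vec<&'static str> {
    let mut repaired = Vec::new();
    if recover_stage(&mut cks.hashing, target) {
        repaired.push("hashing");
    }
    if recover_stage(&mut cks.merkle, target) {
        repaired.push("merkle");
    }
    if recover_stage(&mut cks.history, target) {
        repaired.push("history");
    }
    repaired
}
```

Because each guard is a cheap comparison, running all of them on every startup costs almost nothing when the checkpoints already match, which is why the early return can be dropped safely.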

Fault tolerance after fix

persistence.rs          recovery.rs (always runs)
  [A] Execution (sync)    → recover_hashing  (if ck < target)
  [B] AccountHashing(sync)→ recover_merkle   (if ck < target)
  [C] HistoryIndex (sync) → recover_history  (if ck < target)
  [D] MerkleExecute(sync)

Each commit is atomic on disk. A crash at any point → recovery scans all stages → rebuilds only what is behind. No blind spots.


Richard1048576

LGTM

…730)

## What

Bump the `gravity-reth` git dependency:

- `41c0b7092ad578abcfb59b3aeb0ce9ec43f5fcf7` → `acc458846c2f1fc684fa4344cf02ae9488efd252`

Updates `bin/gravity_node/Cargo.toml` and the regenerated `Cargo.lock`. No SDK source changes.

## Why

Pulls in **gravity-reth#340**, a RocksDB durability fix. The rev range is exactly one commit ahead (`ahead 1, behind 0`); this bump introduces #340 and nothing else.

`provider_rw.commit()` called `rocksdb::DB::write()` without `fdatasync()`. With `wal_bytes_per_sync=4MB`, the low-traffic `account_db`/`storage_db` could go 2000+ blocks without syncing, risking large trie data loss on power failure and a state-root mismatch panic during consensus re-execution after a crash.

The fix:

1. All `write()` calls in `commit_view()` use `write_opt(sync=true)`, so every `commit()` is fsync-durable before returning.
2. Disable pipelined write and drop `wal_bytes_per_sync` (redundant with per-commit sync).
3. Recovery always re-checks every stage checkpoint (no early return), repairing only stages that are behind.

Ref: Galxe/gravity-reth#340

AshinGau changed the title ~~fix(rocksdb): flush batch data into storage to make sure stage is com…~~ fix(rocksdb): flush batch data into storage to make sure stage is completed May 22, 2026

fix(rocksdb): flush batch data into storage to make sure stage is completed (0805e4f)

AshinGau force-pushed the main branch from 940114f to 0805e4f Compare May 22, 2026 07:21

Richard1048576 approved these changes May 22, 2026


AshinGau merged commit acc4588 into Galxe:main May 23, 2026
31 checks passed

nekomoto911 mentioned this pull request May 25, 2026

chore(deps): bump gravity-reth to acc45884 (rocksdb durability fix) Galxe/gravity-sdk#730

Merged

AshinGau merged 1 commit into Galxe:main from AshinGau:main
