
fix(rocksdb): flush batch data into storage to make sure stage is completed #340

Merged: AshinGau merged 1 commit into Galxe:main from AshinGau:main on May 23, 2026

@AshinGau (Collaborator) commented May 22, 2026

Problem

provider_rw.commit() calls rocksdb::DB::write(), which appends to the WAL but does not call fdatasync(), so the data sits in the OS page cache after write() returns. A background sync fires only once every 4MB of WAL (wal_bytes_per_sync=4MB), and the three separate DB instances (state_db, account_db, storage_db) each keep an independent WAL counter:

  • state_db gets ~50KB writes per block → syncs every ~80 blocks → data largely survives
  • account_db / storage_db get ~0-2KB per empty block → may go 2000+ blocks without sync → massive data loss on power failure

Result: after a crash, the state_db checkpoints report a consistent state, but the trie DBs have lost committed data. Consensus re-execution then reads stale trie nodes → state root mismatch → panic.
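The sync intervals in the bullets above follow from simple division. This is a back-of-the-envelope sketch, assuming (as the description does) that a DB's WAL is synced only when it has accumulated wal_bytes_per_sync bytes, and reading "4MB" as 4 MiB and "50KB"/"2KB" as KiB. The helper name `blocks_between_syncs` is hypothetical.

```rust
/// Hypothetical helper: how many blocks a single DB can commit before its own
/// WAL counter crosses `wal_bytes_per_sync` and a background sync fires.
/// Returns `None` when the DB receives no WAL bytes per block, i.e. the
/// threshold is never reached and this mechanism never syncs it.
pub fn blocks_between_syncs(wal_bytes_per_sync: u64, wal_bytes_per_block: u64) -> Option<u64> {
    if wal_bytes_per_block == 0 {
        return None;
    }
    // Ceiling division: the sync fires on the block that crosses the threshold.
    Some((wal_bytes_per_sync + wal_bytes_per_block - 1) / wal_bytes_per_block)
}
```

With 4 MiB and 50 KiB per block this gives 82 blocks (the "~80" above); with 2 KiB per block it gives 2048, and with 0 bytes per block it never fires. Because each DB keeps its own counter, the quiet account_db and storage_db, not the busy state_db, set the worst-case loss window.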

Fix

1. tx.rs: every write() → write_opt(sync=true)

All 7 db.write() calls in commit_view() now use WriteOptions::set_sync(true). Every provider_rw.commit() is fsync-durable before returning.
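The difference between a plain write and a synced write can be illustrated with the standard library alone. This sketch is an analogy, not RocksDB code: `append_record` is a hypothetical helper that appends to a log file, and `File::sync_data` (fdatasync on Linux) stands in for the WAL sync that `WriteOptions::set_sync(true)` requests before `write_opt` returns.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper (not RocksDB): appends `record` to a log file.
/// `sync = false` mirrors the old behavior, `sync = true` mirrors the fix.
pub fn append_record(path: &Path, record: &[u8], sync: bool) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    // write_all only hands the bytes to the OS; they may still be in the page cache.
    file.write_all(record)?;
    if sync {
        // sync_data flushes the file contents to the device before returning,
        // so a power failure after this call cannot lose `record`.
        file.sync_data()?;
    }
    Ok(())
}
```

Both calls return the same data to a reader on a running system; the difference only shows up after a power failure. The trade-off is one device flush per commit in exchange for every commit being durable when it returns.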

2. mod.rs: disable pipelined write, remove wal_bytes_per_sync

With per-commit sync, background WAL sync and pipelining are redundant.
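A toy model shows why a byte-count background sync has nothing left to do once every commit syncs. `WalModel` is hypothetical and does not reflect RocksDB internals; it tracks unsynced bytes for a single DB and does not model pipelined writes. A threshold of 0 means the background sync is disabled, which matches RocksDB's default for wal_bytes_per_sync.

```rust
/// Toy model of one DB's WAL (hypothetical; not RocksDB internals).
pub struct WalModel {
    /// Bytes handed to the OS but not yet synced to the device.
    pub unsynced: u64,
    /// Background-sync threshold in bytes; 0 disables it.
    pub bytes_per_sync: u64,
    /// How many times the threshold-based background sync fired.
    pub background_syncs: u32,
}

impl WalModel {
    pub fn new(bytes_per_sync: u64) -> Self {
        WalModel { unsynced: 0, bytes_per_sync, background_syncs: 0 }
    }

    /// Records one commit of `bytes` WAL bytes.
    pub fn commit(&mut self, bytes: u64, sync: bool) {
        self.unsynced += bytes;
        if sync {
            // Per-commit sync: nothing is left unsynced when commit returns.
            self.unsynced = 0;
        } else if self.bytes_per_sync > 0 && self.unsynced >= self.bytes_per_sync {
            // Background sync only catches up after the threshold is crossed.
            self.background_syncs += 1;
            self.unsynced = 0;
        }
    }
}
```

Running 1000 commits of 2 KiB against `WalModel::new(4 * 1024 * 1024)` with `sync = false` leaves 2,048,000 bytes unsynced and no background sync fired; the same commits with `sync = true` leave nothing unsynced, so a threshold would never be reached.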

3. recovery.rs: always check all stage checkpoints

Removed the early return when recover_block == best_block. Each recover_* function has an internal if ck < target guard, so it is a no-op when the stage is consistent and repairs it when it is not.
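The recovery pattern can be sketched as a loop that visits every stage unconditionally and lets each stage's own guard decide whether to act. `Stage` and `recover_all` are hypothetical; a real recover_* function would rebuild the missing blocks rather than only advance the checkpoint.

```rust
/// Hypothetical per-stage checkpoint; the names follow the PR, the type is illustrative.
pub struct Stage {
    pub name: &'static str,
    pub checkpoint: u64,
}

/// Visits every stage with no early return and applies the `ck < target` guard.
/// Returns the names of the stages that were behind and got repaired.
pub fn recover_all(stages: &mut [Stage], target: u64) -> Vec<&'static str> {
    let mut repaired = Vec::new();
    for stage in stages.iter_mut() {
        // No-op when the stage is already consistent with the target.
        if stage.checkpoint < target {
            // Placeholder for rebuilding blocks checkpoint+1..=target.
            stage.checkpoint = target;
            repaired.push(stage.name);
        }
    }
    repaired
}
```

Calling `recover_all` a second time with the same target returns an empty list, which is what makes it safe to run on every startup.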

Fault tolerance after fix

```
persistence.rs              recovery.rs (always runs)
  [A] Execution      (sync) → recover_hashing  (if ck < target)
  [B] AccountHashing (sync) → recover_merkle   (if ck < target)
  [C] HistoryIndex   (sync) → recover_history  (if ck < target)
  [D] MerkleExecute  (sync)
```

Each commit is atomic and durable on disk. Whatever the crash point, recovery scans all stages and rebuilds only what is behind, so there are no blind spots.
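The "any crash point" claim can be checked exhaustively for one block in a small model. The sketch assumes each stage commit is atomic (a checkpoint is either the previous block or the new one), that stages commit in the [A] to [D] order shown, and that the recovery target is the Execution checkpoint; `crash_after` and `recover` are hypothetical.

```rust
/// Number of stages in the commit order from the diagram:
/// Execution, AccountHashing, HistoryIndex, MerkleExecute.
pub const STAGE_COUNT: usize = 4;

/// Checkpoints after persisting `block` and crashing once `committed`
/// of the atomic stage commits have reached disk.
pub fn crash_after(committed: usize, block: u64) -> [u64; STAGE_COUNT] {
    let mut checkpoints = [block - 1; STAGE_COUNT];
    for ck in checkpoints.iter_mut().take(committed) {
        *ck = block;
    }
    checkpoints
}

/// Always-run recovery: every stage behind the Execution checkpoint (index 0)
/// is brought up to it. Returns how many stages were repaired.
pub fn recover(checkpoints: &mut [u64; STAGE_COUNT]) -> usize {
    let target = checkpoints[0];
    let mut repaired = 0;
    for ck in checkpoints.iter_mut().skip(1) {
        if *ck < target {
            *ck = target;
            repaired += 1;
        }
    }
    repaired
}
```

In this model, crashing before the Execution commit leaves every checkpoint at the previous block and nothing is repaired; every later crash point leaves a suffix of stages behind, and recovery brings exactly those forward.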

@AshinGau changed the title from "fix(rocksdb): flush batch data into storage to make sure stage is com…" to "fix(rocksdb): flush batch data into storage to make sure stage is completed" on May 22, 2026
@Richard1048576 (Collaborator) left a comment


LGTM

@AshinGau merged commit acc4588 into Galxe:main on May 23, 2026
31 checks passed
nekomoto911 added a commit to Galxe/gravity-sdk that referenced this pull request on May 25, 2026
…730)

## What

Bump the `gravity-reth` git dependency:

- `41c0b7092ad578abcfb59b3aeb0ce9ec43f5fcf7` →
`acc458846c2f1fc684fa4344cf02ae9488efd252`

Updates `bin/gravity_node/Cargo.toml` and the regenerated `Cargo.lock`.
No SDK source changes.

## Why

Pulls in **gravity-reth#340** — a RocksDB durability fix. The rev range
is exactly one commit ahead (`ahead 1, behind 0`); this bump introduces
#340 and nothing else.

`provider_rw.commit()` called `rocksdb::DB::write()` without
`fdatasync()`. With `wal_bytes_per_sync=4MB`, the low-traffic
`account_db`/`storage_db` could go 2000+ blocks without syncing, risking
large trie data loss on power failure and a state-root mismatch panic
during consensus re-execution after a crash.

The fix:
1. All `write()` calls in `commit_view()` use `write_opt(sync=true)` —
every `commit()` is fsync-durable before returning.
2. Disable pipelined write and drop `wal_bytes_per_sync` (redundant with
per-commit sync).
3. Recovery always re-checks every stage checkpoint (no early return),
repairing only stages that are behind.

Ref: Galxe/gravity-reth#340