fix(store): scope FSM_SYNC_MODE=nosync to raft-apply callers only #600

Merged
bootjp merged 2 commits into main from perf/fsm-apply-group-commit on Apr 23, 2026
@bootjp bootjp commented Apr 23, 2026

Summary

Follow-up to #592 addressing the codex P2 correctness finding: ELASTICKV_FSM_SYNC_MODE=nosync previously affected direct (non-raft) callers of store.ApplyMutations / store.DeletePrefixAt, which have no raft-log replay as a durability backstop. Splits the API so the knob only relaxes fsync on the raft-apply path.

  • ApplyMutations / DeletePrefixAt now always commit with pebble.Sync, so they are safe for any direct caller (catalog bootstrap, admin snapshots, migrations, tests).
  • ApplyMutationsRaft / DeletePrefixAtRaft are the new raft-apply entry points; they observe s.fsmApplyWriteOpts and are called only from kv/fsm.go.
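
The split described above can be sketched as follows. This is a minimal illustration, not the code from this PR: writeOpts stands in for *pebble.WriteOptions so the example builds without the Pebble dependency, and lsmStore, commit, lastCommit, and the []string argument are hypothetical names chosen for the sketch. Only ApplyMutations, ApplyMutationsRaft, and fsmApplyWriteOpts are taken from the description.

```go
package main

import "fmt"

// writeOpts is a stand-in for *pebble.WriteOptions (assumption: only the
// Sync flag matters for this illustration).
type writeOpts struct{ Sync bool }

var (
	syncOpts   = &writeOpts{Sync: true}  // plays the role of pebble.Sync
	noSyncOpts = &writeOpts{Sync: false} // plays the role of pebble.NoSync
)

// lsmStore is a hypothetical store. fsmApplyWriteOpts is the field named in
// the PR; lastCommit exists only so the sketch can show which options were used.
type lsmStore struct {
	fsmApplyWriteOpts *writeOpts
	lastCommit        *writeOpts
}

// commit is a hypothetical shared body used by both entry points.
func (s *lsmStore) commit(keys []string, opts *writeOpts) error {
	s.lastCommit = opts // a real store would commit a batch with opts here
	return nil
}

// ApplyMutations is the direct path: it ignores the sync-mode knob and
// always commits with the sync options.
func (s *lsmStore) ApplyMutations(keys []string) error {
	return s.commit(keys, syncOpts)
}

// ApplyMutationsRaft is the raft-apply path: it honors fsmApplyWriteOpts,
// because the raft log can replay the entry after a crash.
func (s *lsmStore) ApplyMutationsRaft(keys []string) error {
	return s.commit(keys, s.fsmApplyWriteOpts)
}

func main() {
	s := &lsmStore{fsmApplyWriteOpts: noSyncOpts}

	_ = s.ApplyMutations([]string{"catalog/1"})
	fmt.Println("direct path sync:", s.lastCommit.Sync)

	_ = s.ApplyMutationsRaft([]string{"user/1"})
	fmt.Println("raft path sync:", s.lastCommit.Sync)
}
```

With the store configured for nosync, the direct path still reports sync = true while the raft path reports sync = false, which is the behavior the two bullets describe.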

Code evidence for the bug

distribution/catalog.go:630 calls s.store.ApplyMutations directly from CatalogStore.applySaveMutations. main.go:749 wires distribution.NewCatalogStore around the raw raftGroupRuntime.store (a store.MVCCStore), so EnsureCatalogSnapshot -> CatalogStore.Save -> store.ApplyMutations is a production path that never goes through the raft apply loop. Under nosync, a crash before Pebble flush would lose acknowledged catalog writes with no raft entry to replay — exactly the failure mode codex P2 described.
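
The failure mode can be illustrated with a deliberately simplified model. Everything in this sketch is hypothetical (simStore, logEntry, recoverAfterCrash, the key names): unsynced writes sit in a volatile map that a crash discards, and only entries present in the raft log are re-applied afterwards. It does not model Pebble's actual WAL behavior.

```go
package main

import "fmt"

// simStore models a store whose unsynced writes are lost on crash.
type simStore struct {
	durable  map[string]string // survives a crash
	volatile map[string]string // dropped by crash()
}

func newSimStore() *simStore {
	return &simStore{durable: map[string]string{}, volatile: map[string]string{}}
}

// write stores a key either durably (sync) or in the volatile buffer (nosync).
func (s *simStore) write(k, v string, sync bool) {
	if sync {
		delete(s.volatile, k)
		s.durable[k] = v
		return
	}
	s.volatile[k] = v
}

// crash discards everything that was never synced.
func (s *simStore) crash() { s.volatile = map[string]string{} }

func (s *simStore) has(k string) bool {
	if _, ok := s.volatile[k]; ok {
		return true
	}
	_, ok := s.durable[k]
	return ok
}

// logEntry is a raft log entry that is assumed to be fsynced by the raft WAL.
type logEntry struct{ key, value string }

// recoverAfterCrash applies one raft entry with nosync and one direct catalog
// write (synced or not), crashes, replays the raft log, and reports which
// keys are still present.
func recoverAfterCrash(directSync bool) (raftKeyOK, catalogKeyOK bool) {
	raftLog := []logEntry{{"user/1", "alice"}}
	s := newSimStore()
	for _, e := range raftLog {
		s.write(e.key, e.value, false) // raft apply under nosync
	}
	s.write("catalog/1", "route-a", directSync) // direct write, no log entry
	s.crash()
	for _, e := range raftLog {
		s.write(e.key, e.value, false) // replay restores raft-applied data
	}
	return s.has("user/1"), s.has("catalog/1")
}

func main() {
	r, c := recoverAfterCrash(false)
	fmt.Println("direct write nosync:", r, c)
	r, c = recoverAfterCrash(true)
	fmt.Println("direct write sync:", r, c)
}
```

In this model the raft-applied key survives in both runs because replay restores it, while the catalog key survives only when the direct write is synced.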

Before / after per call site

| Call site | Before | After |
| --- | --- | --- |
| kv/fsm.go (raft apply: raw / 1PC / prepare / commit / abort / DEL_PREFIX) | s.fsmApplyWriteOpts | s.fsmApplyWriteOpts (via *Raft); unchanged |
| distribution/catalog.go CatalogStore.applySaveMutations | s.fsmApplyWriteOpts (UNSAFE) | pebble.Sync (always) |
| adapter/*_test.go direct ApplyMutations / DeletePrefixAt calls | s.fsmApplyWriteOpts | pebble.Sync (always) |
| kv/shard_store.go, kv/leader_routed_store.go wrappers | forwarded once | forwarded twice (both variants) |

Test plan

  • TestDirectApplyWriteOpts_AlwaysSync asserts the direct path resolves to pebble.Sync even when the FSM-apply mode is nosync.
  • TestDirectApplyMutations_NoSyncConfigured_StillWritesDurably exercises the public ApplyMutations / DeletePrefixAt entry points under a NoSync-configured store and confirms data is visible after a clean reopen.
  • Existing TestApplyMutations_NoSync* and BenchmarkApplyMutations_SyncMode renamed to *Raft to reflect that the knob now only governs the raft-apply path.
  • go test -race -count=1 ./... — all packages pass.
  • golangci-lint run ./... — 0 issues.
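
The invariant that the first test asserts can be sketched as a table-driven check. This is not the test from the PR: fsmApplyOpts, directApplyOpts, and checkInvariants are hypothetical helpers, writeOpts stands in for *pebble.WriteOptions, and treating every mode other than "nosync" as sync is an assumption (the description only names the nosync value).

```go
package main

import "fmt"

// writeOpts is a stand-in for *pebble.WriteOptions.
type writeOpts struct{ Sync bool }

var (
	syncOpts   = &writeOpts{Sync: true}
	noSyncOpts = &writeOpts{Sync: false}
)

// fsmApplyOpts resolves the raft-apply write options from a mode string.
// Assumption: anything other than "nosync" falls back to sync.
func fsmApplyOpts(mode string) *writeOpts {
	if mode == "nosync" {
		return noSyncOpts
	}
	return syncOpts
}

// directApplyOpts resolves the options for direct callers. It ignores the
// mode on purpose: the direct path must never be relaxed.
func directApplyOpts(mode string) *writeOpts { return syncOpts }

// checkInvariants walks the mode table and returns the first violation.
func checkInvariants() error {
	cases := []struct {
		mode         string
		wantRaftSync bool
	}{
		{"sync", true},
		{"nosync", false},
		{"", true},
	}
	for _, c := range cases {
		if !directApplyOpts(c.mode).Sync {
			return fmt.Errorf("mode %q: direct path must always sync", c.mode)
		}
		if got := fsmApplyOpts(c.mode).Sync; got != c.wantRaftSync {
			return fmt.Errorf("mode %q: raft path sync = %v, want %v", c.mode, got, c.wantRaftSync)
		}
	}
	return nil
}

func main() {
	if err := checkInvariants(); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("ok")
}
```

In a real test file the same table would live in a func TestXxx(t *testing.T) and report failures with t.Errorf instead of returning an error.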

/gemini review
@codex review

Previously ApplyMutations/DeletePrefixAt used s.fsmApplyWriteOpts
unconditionally, which made ELASTICKV_FSM_SYNC_MODE=nosync affect direct
(non-raft) callers that do NOT have raft-log replay as a durability
backstop. Concrete production path:
distribution.EnsureCatalogSnapshot -> CatalogStore.Save ->
store.ApplyMutations. If the process crashed before Pebble flushed,
those acknowledged writes could be lost with no raft entry to re-apply.

This change splits the API:

- ApplyMutations / DeletePrefixAt: always pebble.Sync. Safe for any
  caller, including those that bypass raft (catalog bootstrap, admin
  snapshots, migrations, tests).
- ApplyMutationsRaft / DeletePrefixAtRaft: governed by
  s.fsmApplyWriteOpts (ELASTICKV_FSM_SYNC_MODE). Intended solely for the
  FSM apply loop; only the raft WAL fsync makes pebble.NoSync safe.

kv/fsm.go is the only caller of the new *Raft methods. CatalogStore and
every adapter test retain the always-sync ApplyMutations, so the
nosync opt-in can no longer silently weaken durability outside the
raft-apply path. MVCCStore / ShardStore / LeaderRoutedStore all
implement both variants; the in-memory mvccStore has no WAL so both
delegate to the same body.
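
A minimal sketch of the two shapes described in this paragraph, under stated assumptions: mutationStore is a hypothetical, trimmed-down interface (the real MVCCStore has more methods and different signatures), memStore mimics an in-memory store where both variants share one body, and routedStore mimics a wrapper that forwards each variant to the matching method of the store it wraps.

```go
package main

import "fmt"

// mutationStore is a hypothetical two-method slice of the store interface.
type mutationStore interface {
	ApplyMutations(keys []string) error
	ApplyMutationsRaft(keys []string) error
}

// memStore has no WAL, so there is nothing to sync and both variants
// delegate to the same body.
type memStore struct{ data map[string]bool }

func (m *memStore) apply(keys []string) error {
	for _, k := range keys {
		m.data[k] = true
	}
	return nil
}

func (m *memStore) ApplyMutations(keys []string) error     { return m.apply(keys) }
func (m *memStore) ApplyMutationsRaft(keys []string) error { return m.apply(keys) }

// routedStore is a wrapper. It must forward each variant to the matching
// method; forwarding both to ApplyMutationsRaft would reintroduce the bug.
type routedStore struct{ inner mutationStore }

func (r *routedStore) ApplyMutations(keys []string) error {
	return r.inner.ApplyMutations(keys)
}

func (r *routedStore) ApplyMutationsRaft(keys []string) error {
	return r.inner.ApplyMutationsRaft(keys)
}

// Compile-time checks that both types implement both variants.
var (
	_ mutationStore = (*memStore)(nil)
	_ mutationStore = (*routedStore)(nil)
)

func main() {
	m := &memStore{data: map[string]bool{}}
	var s mutationStore = &routedStore{inner: m}

	_ = s.ApplyMutations([]string{"catalog/1"})
	_ = s.ApplyMutationsRaft([]string{"user/1"})
	fmt.Println("keys stored:", len(m.data))
}
```

The compile-time assertions are the useful part of the design: adding a method to the interface makes every wrapper that has not implemented it fail to build, so a variant cannot be dropped silently.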

Tests:
- Added TestDirectApplyWriteOpts_AlwaysSync asserting the direct path
  always resolves to pebble.Sync even when the FSM-apply mode is
  nosync.
- Added TestDirectApplyMutations_NoSyncConfigured_StillWritesDurably
  for functional coverage of the public entry points under a
  NoSync-configured store.
- Existing sync-mode tests + benchmark renamed to *Raft to reflect
  that the knob now only governs the raft-apply path.

Addresses codex P2 on #592.
@coderabbitai
coderabbitai Bot commented Apr 23, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 34 minutes and 39 seconds before requesting another review.

📥 Commits

Reviewing files that changed from the base of the PR and between 0fb7db7 and 343e870.

📒 Files selected for processing (8)
  • kv/fsm.go
  • kv/leader_routed_store.go
  • kv/shard_store.go
  • store/lsm_store.go
  • store/lsm_store_sync_mode_benchmark_test.go
  • store/lsm_store_sync_mode_test.go
  • store/mvcc_store.go
  • store/store.go
@chatgpt-codex-connector
Codex Review: Didn't find any major issues. Can't wait for the next one!

@bootjp bootjp enabled auto-merge April 23, 2026 17:37
@bootjp bootjp merged commit d9afba2 into main Apr 23, 2026
8 checks passed
@bootjp bootjp deleted the perf/fsm-apply-group-commit branch April 23, 2026 17:43