feat(cluster): batched Raft commands for compaction manifests#403

Merged
xe-nvdk merged 1 commit into main from feat/batch-raft-file-ops
Apr 14, 2026

@xe-nvdk xe-nvdk commented Apr 14, 2026

Summary

  • Replaces the sequential RegisterFile/DeleteFile Raft applies per compaction manifest (40 for a typical manifest with 20 outputs and 20 deleted sources) with a single CommandBatchFileOps log entry, cutting apply latency from ~200ms to ~5ms for such manifests
  • New CommandBatchFileOps (type 10) FSM handler dispatches to existing idempotent register/delete helpers — all files in a batch share the same LSN (the batch log index), which is semantically correct since a compaction job's outputs are causally simultaneous
  • ManifestBridge interface extended with BatchFileOps; CompletionWatcher.applyOne rewritten to issue one batch call while preserving the output_written → sources_deleted two-phase ordering invariant
  • CommandBatchFileOps added to the forward-apply security allowlist with the same HMAC + role authorization as existing file-manifest commands
  • 13 new tests across FSM, watcher, and bridge layers; all pre-existing tests updated and passing

Closes #399

Test plan

  • go test ./internal/cluster/raft/... -run TestFSMBatch — 4 FSM tests (happy path, registers-only, unknown op type, empty batch)
  • go test ./internal/compaction/... -run TestWatcher — all watcher tests including 4 new batch-specific ones
  • go test ./internal/cluster/... -run TestCompactionBridge_Batch — 4 bridge tests
  • go build ./... — clean build

Replace the O(N) sequential RegisterFile/DeleteFile applies in the
CompletionWatcher with a single CommandBatchFileOps log entry per
manifest. For a typical manifest with 20 outputs + 20 deleted sources
this drops Raft round-trips from 40 to 1, cutting apply latency from
~200ms to ~5ms.

- Add CommandBatchFileOps (type 10) to the FSM with BatchFileOp /
  BatchFileOpsPayload types and an applyBatchFileOps handler that
  dispatches to the existing idempotent register/delete helpers.
- Add Node.BatchFileOps and Coordinator.BatchFileOpsInManifest with the
  same leader-or-forward semantics as the single-command path.
- Extend ManifestBridge with BatchFileOps; rewrite
  CompletionWatcher.applyOne to build register/delete slices and issue
  one batch call, preserving the output_written → sources_deleted
  two-phase ordering invariant.
- Add CommandBatchFileOps to the forward-apply security allowlist.
- 13 new tests across FSM, watcher, and bridge layers.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request optimizes compaction manifest processing by batching RegisterFile and DeleteFile operations into a single Raft log entry (CommandBatchFileOps). This change reduces Raft round-trips from O(N) to 1 per manifest, significantly cutting apply latency while maintaining idempotency and the two-phase write-then-delete invariant. The implementation includes updates to the CompactionBridge, Coordinator, Raft FSM, and CompletionWatcher, along with comprehensive test coverage. I have no feedback to provide.

@xe-nvdk xe-nvdk merged commit a92cb09 into main Apr 14, 2026
6 checks passed

Development

Successfully merging this pull request may close these issues.

Batched Raft commands for compaction manifests
