feat(split-brain): Introduce lease mechanism for Kvrocks master #3397

Open: redwood9 wants to merge 5 commits into apache:unstable from redwood9:unstable

Based on the ideas I proposed in issue #3380, this is the first step of the solution: introducing a lease mechanism for the Kvrocks master node.
This mechanism ensures that the maximum split-brain window is bounded by the lease TTL. It supports three modes:

  1. Disabled (`disabled`, or no config): behaves exactly like the current system. With the lease feature off, existing behavior is unaffected.
  2. Reject writes (`block-write`): when the lease expires, the node rejects all write operations to prevent data divergence.
  3. Log only (`log-only`): when the lease expires, the node logs an error but still accepts writes, so monitoring and alerting systems can catch the condition without disrupting traffic.
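
The three modes reduce to a small decision table. The following is a minimal sketch of that table, not the code in this PR: `LeaseMode`, `WriteDecision`, `ParseLeaseMode`, and `DecideWrite` are hypothetical names, and only the three mode strings are taken from the description.

```cpp
#include <optional>
#include <string>

// Hypothetical types: the PR description only names the three mode strings.
enum class LeaseMode { kDisabled, kLogOnly, kBlockWrite };
enum class WriteDecision { kAllow, kAllowAndLog, kReject };

// Map a config string to a mode; unknown strings yield std::nullopt.
inline std::optional<LeaseMode> ParseLeaseMode(const std::string &value) {
  if (value == "disabled") return LeaseMode::kDisabled;
  if (value == "log-only") return LeaseMode::kLogOnly;
  if (value == "block-write") return LeaseMode::kBlockWrite;
  return std::nullopt;
}

// Decide what a write should do, given the mode and whether the lease expired.
// A disabled mode or a still-valid lease always allows the write.
inline WriteDecision DecideWrite(LeaseMode mode, bool lease_expired) {
  if (mode == LeaseMode::kDisabled || !lease_expired) return WriteDecision::kAllow;
  return mode == LeaseMode::kLogOnly ? WriteDecision::kAllowAndLog
                                     : WriteDecision::kReject;
}
```

Keeping the decision a pure function of `(mode, lease_expired)` makes the "zero impact when disabled" property easy to test in isolation.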

To keep the change minimally invasive, the implementation follows two strategies:

  1. Controller probing: add a new HEARTBEAT command that does not conflict with any standard Redis command. The response format remains unchanged to keep controller integration simple and backward-compatible.
  2. Unified lease check: add the lease validation logic inside Storage::writeToDB(), so all write paths are guarded in one central place.
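
The idea behind the unified check can be illustrated with a toy storage class in which every write path funnels through one private function. This is a simplified sketch, not the Kvrocks `Storage` class: `ToyStorage`, `Stage()`, and the `write_allowed` predicate are made up for illustration, and only the method names `Write`, `CommitTxn`, and `writeToDB` echo the PR.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a storage layer (hypothetical, for illustration only).
class ToyStorage {
 public:
  // The predicate reports whether writes are currently permitted,
  // for example "the lease is still valid or the mode does not block".
  explicit ToyStorage(std::function<bool()> write_allowed)
      : write_allowed_(std::move(write_allowed)) {}

  // Direct write path.
  bool Write(const std::string &record) { return writeToDB({record}); }

  // Transactional path: staged records are flushed through the same choke point.
  void Stage(const std::string &record) { pending_.push_back(record); }
  bool CommitTxn() {
    std::vector<std::string> batch;
    batch.swap(pending_);  // the staged batch is dropped if the guard rejects it
    return writeToDB(batch);
  }

  std::size_t Size() const { return db_.size(); }

 private:
  // Single choke point: the guard lives here, so no public path can bypass it.
  bool writeToDB(const std::vector<std::string> &batch) {
    if (!write_allowed_()) return false;
    db_.insert(db_.end(), batch.begin(), batch.end());
    return true;
  }

  std::function<bool()> write_allowed_;
  std::vector<std::string> pending_;
  std::vector<std::string> db_;
};
```

With the guard in one place, adding a new write path later does not require remembering to add a lease check to it.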

I will also submit the corresponding HEARTBEAT implementation for the controller in parallel.

  Introduce a master lease mechanism that allows a master node to detect
  when it may have lost cluster ownership and optionally block writes to
  prevent split-brain data corruption.

  - Add `master_lease_mode` config option: disabled / log-only / block-write
  - Add lease atomics (`lease_deadline_`, `lease_owner_`) to `Storage`
  - Add `UpdateLease()` and `ResetLease()` methods on `Storage`
  - Check lease expiry in `Storage::Write()` and `writeToDB()` to cover
    both direct writes and `CommitTxn()` paths
  - Add `CLUSTERX HEARTBEAT <election_version> <ttl_ms>` command for
    controller-driven lease renewal
  - Reset master lease automatically on role transition to slave
  - Add C++ unit tests (`tests/cppunit/lease_test.cc`)
  - Add Go integration tests (`tests/gocase/integration/cluster/lease_test.go`)
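
A rough sketch of how heartbeat-driven renewal could bound the split-brain window is shown below. It is written under several assumptions that the description does not confirm: `ToyLease` is a hypothetical class, time is passed in explicitly as milliseconds for testability, a heartbeat with an older election version is ignored, a lease that was never granted counts as expired, and a single thread performs renewals.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical lease holder; semantics are assumptions, not the PR's code.
class ToyLease {
 public:
  // Renew the lease so it stays valid until now_ms + ttl_ms.
  // Assumed policy: heartbeats from an older election version are ignored.
  // Assumes a single renewing thread (the check-then-store is not atomic).
  bool UpdateLease(uint64_t election_version, uint64_t ttl_ms, uint64_t now_ms) {
    if (election_version < version_.load(std::memory_order_acquire)) return false;
    version_.store(election_version, std::memory_order_release);
    deadline_ms_.store(now_ms + ttl_ms, std::memory_order_release);
    return true;
  }

  // Clear the lease, for example when the node transitions to a replica role.
  void ResetLease() {
    deadline_ms_.store(0, std::memory_order_release);
    version_.store(0, std::memory_order_release);
  }

  // Assumed policy: a deadline of 0 (never granted or reset) counts as expired.
  bool IsExpired(uint64_t now_ms) const {
    return now_ms >= deadline_ms_.load(std::memory_order_acquire);
  }

 private:
  std::atomic<uint64_t> deadline_ms_{0};
  std::atomic<uint64_t> version_{0};
};
```

For example, after `UpdateLease(5, 3000, 1000)` the lease is valid up to, but not including, time 4000. A master that stops receiving heartbeats therefore keeps accepting writes for at most one TTL, which is the bound on the split-brain window described above.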