feat #236: Implement ReadIndex batching mechanism to improve LinearizableRead throughput#239
feat #236: Implement ReadIndex batching mechanism to improve LinearizableRead throughput#239
Conversation
…y log entry. feat #66: Refactor the Errors to distinguish between protocol logic errors and system-level errors. * feat #43: When a Raft leader is elected, it should first send an empty log entry. 1. implement verify_leadership_in_new_term 2. refactor verify_leadership_in_new_term and enforce_quorum_consensus * feat #66: Refactor the Errors to distinguish between protocol logic errors and system-level errors.
…r-friendly StorageEngine and StateMachine * feat #59: Refactor state_machine_commit_listener into a separate function to improve visibility across all newly spawned threads. * feat #79: Add snapshot feature * fix #90: Refactor: Decouple Client Command Protocol from Raft Internal Log Payload Type * feat #89: Add auto-discovery support for new learner nodes * feat #101: Revised learner join process with promotion/fail semantics * feat #45: Implemented first/last index for term to resolve replication conflicts * fix #106: Retry leadership noop confirmation until timeout * feat #102: Automatic node removal * feat #109: Enable RPC connection cache * fix #107: 1. Add back rpc connection params 2. Change parking lot sync lock to Tokio async lock in RaftMembership 3. optimize MembershipGuard for performance. Remove read locker. * feat #110: 1. add new events SnapshotCreated and LogPurgeCompleted 2. decouple Leader snapshot create to several events 3. refactor raft log index generation from pre_allocate_raft_logs_next_index to pre_allocate_id_range 4. optimize LocalLogBatch structure - based on performance analysis * feat #119: Make the StorageEngine trait more developer-friendly * feat #120: Make the StateMachine trait more developer-friendly * fix #121: Election: higher term vote request should be updated immediately and reset voted_for * feat #122: Refactor compress and decompress snapshot logic from StateMachine to StateMachineHandler * fix #123: Replace global tracing log with trace_test crate * feat #125: Make rocksdb adaptor as feature. Move sled implementation as example
…tart when path is already locked by another node
* refs #138: optimize to use long lived peer tasks for append entries request only * refs #138: try to fix race issue - Multiple threads can pass the initial check and all create new appenders * feat #138: 1. introduce crossbeam 2. optimize buffered_raft_log by creating a flush worker pool with configurable number of workers to handle persistence operations * feat #138: optimize flush_workers as configure * feat #138: optimize channel_capacity into configure * feat #139: work on a reliable monitoring metric for resource utilization * feat #140: Optimize proto bytes fields to use bytes::Bytes * feat #140: switch Into<Bytes> to AsRef<[u8]> * fix #141: Optimize RocksDB write path and Raft log loop for lower latency * feat 142: Add Read Consistency Policy Support and Lease-Based Read Optimization * feat #142: update v0.1.4 bench report * feat #142: add client doc guide * feat #143: Refactor gRPC compression configuration for Raft transport (performance optimization) - step1: disable server side client rpc compress response * feat #143: Refactor gRPC compression configuration for Raft transport (performance optimization) * feat #143: add performance diagram * feat #138: ready to publish v0.1.4 * feat #138: add ut * fix #146: buffered_raft_log.rs Code Reviews * fix #146: update bench report * fix #145: Bug: Undefined Behavior from Incorrect Lifetime Conversion in Mmap Zero-Copy Path * fix #148: fix rocksdb storage engine flush bug * fix #147: fix file state machine bug
…ion for v0.2.0 ## Overview Prepare d-engine v0.2.0 for multi-crate publishing with comprehensive documentation, improved developer experience, and production-ready examples. ## Key Changes ### 📚 Documentation Restructuring (#225) **Adopted OpenRaft documentation pattern:** - Moved docs from standalone crate to `d-engine/src/docs/` (preserves git history) - Unified documentation structure with role-based guides - All crates now have clear positioning and usage guidelines **Multi-crate README strategy:** - Added READMEs for all workspace crates with consistent structure - Clear guidance on when to use each crate vs. main `d-engine` package - Architecture diagrams showing crate relationships - Verified all crates ready for publishing (`cargo publish --dry-run`) **Link fixes after refactoring:** - Updated 14 files with cross-crate doc links to docs.rs format - Fixed intra-doc links in `d-engine/src/docs/mod.rs` - Added tested Go/Python code generation examples in `d-engine-proto` ### 🔧 Developer Experience Improvements (#226) **Reduced startup noise:** - Eliminated misleading connection errors during cluster startup - Downgraded expected connection failures from `error!` to `debug!` level - Added startup banner with config path and node prefixes `[Node1]` `[Node2]` `[Node3]` - Improved log readability with emoji indicators **Examples cleanup:** - Removed duplicate `rocksdb-cluster` example (superseded by `three-nodes-cluster`) - Fixed `quick-start-embedded` release mode config error (added `CONFIG_PATH`) - Updated all examples to match current API (`EmbeddedEngine`, `StandaloneServer`) - Clarified service-discovery examples focus on **Watch API** demonstration ### 🐛 Bug Fixes **Compilation fixes:** - Fixed `Result` type conflicts in `quick-start-embedded` (use `std::result::Result`) - Updated API imports in `sled-cluster` after module reorganization - Fixed clippy `uninlined_format_args` warning in `three-nodes-cluster` **Updated references:** - All `rocksdb-cluster` references point to `three-nodes-cluster` - Updated `include_str!` paths in `d-engine-core` after docs relocation ## Testing - ✅ All examples compile and pass clippy checks - ✅ All crates ready for publishing (dry-run verified) - ✅ Proto generation commands tested (Go/Python/Java) - ✅ No performance regressions (±5% variance within normal range) ## Files Changed - **8** crate lib.rs/README.md files - **6** example README files - **3** example source files (compilation fixes) - **5** network/logging files (startup experience) - **1** docs reorganization (mod.rs/overview.md) --- **Ready for review and merge to `develop`** 🚀
- Fix protoc working directory in d-engine-proto README (should be in d-engine-proto, not proto/ subdirectory) - Update Rust version requirement from 1.70+ to 1.85+ to match edition 2024 - Fix inconsistent imports in sled-cluster test (use d_engine_proto::client consistently across codebase)
## Problem **Critical bug**: `put()` → `get_linearizable()` returned `None` immediately after write - **Root cause**: Leader commits log but doesn't wait for state machine to apply - **Impact**: Violates linearizability guarantee - reads don't reflect committed writes - **Reproducibility**: Only on first startup (no existing data) due to timing race ## Solution Following industry best practices (inspired by OpenRaft), implemented **wait-for-apply mechanism**: 1. **Watch channel for apply notifications** - `tokio::sync::watch` notifies when `last_applied` advances - `ensure_state_machine_upto_commit_index()` waits until target index applied 2. **Configurable timeout** - `state_machine_sync_timeout_ms` (default: 10ms for local SSD) - Tests use 100ms for CI environment reliability 3. **Enhanced error diagnostics** - Timeout errors now include `current_applied` index - Helps diagnose: stuck state machine, slow apply, or no progress ## Test Coverage - **Integration tests**: Multi-node linearizable read consistency (Test 5b) - **No-sleep tests**: `test_put_get_no_sleep()` validates immediate reads - **Concurrent test fixes**: Isolated mock addresses, tempfile for test data - **Examples updated**: Use `get_linearizable()` in quick-start/service-discovery ## Performance Impact | Read Type | Latency Change | Notes | |------------------|----------------|--------------------------------| | Linearizable | +0.8% | ✅ No regression | | LeaseRead | +9.5% | ✅ Unexpected improvement | | Eventual | -6.1% |⚠️ Will optimize in v0.3.0 | | Write | -3.0% | ✅ Acceptable for correctness | **Trade-off**: -3~6% performance for **critical data consistency fix**. Linearizable read (primary use case) unaffected. ## Documentation - README example simplified (30→13 lines) with links to detailed guides - Added references to `examples/quick-start-embedded`, `service-discovery-embedded`, `quick-start-standalone` ## Breaking Changes None - API unchanged, behavior now matches documented guarantee.
## Performance Optimization - Add fast path check in wait_applied to avoid timeout future overhead - Immediately return if target_index already applied (0-cost early exit) - Expected to reduce linearizable read latency by 20-30% for common case ## Code Quality - Add debug_assert in apply_chunk for channel send failure detection - Add test_simulate_apply() helper method for unit tests ## Test Coverage (P0 + P1 + P2) Unit tests (6 new): - Fast path: already applied scenario - Slow path: wait for notification - Timeout: state machine stuck with diagnostics - Concurrent waiters: multiple readers waiting - Sequential waits: incremental apply progress - Exact match: target equals last_applied Integration test (1 new): - Concurrent write+read: 100 parallel operations validate no stale reads ## Verification - All 6 unit tests pass - All 7 linearizable integration tests pass - Clippy clean
- Fix test-utils feature propagation to d-engine-core - Replace all-features with explicit feature list in metadata.docs.rs - Add unified release tag configuration for workspace - Improve Makefile docs-check-all to simulate docs.rs build
- Add "Built with a simple vision" statement emphasizing cheap to run and simple to use - Add "Built on a core philosophy" highlighting deliberate choice of simple architectures - Explain single-threaded event loop as intentional design for strong consistency - Update both README.md and docs/overview.md for consistency - Remove generic "resource efficiency" marketing language
- Add benches/embedded-bench/ for measuring <0.1ms latency claims - Remove test-only restriction from node_id() methods (needed for benchmarks) - Clarify quick-start docs: "in-process" vs "local-first" terminology - Add LocalKvClient API documentation links for discoverability This benchmark validates the performance characteristics documented in integration-modes.md and helps distinguish embedded vs standalone metrics.
… vs embedded metrics
…mark Changes: - Fix test_handle_client_read_lease_read_valid_lease: remove incorrect replication handler expectation (valid lease doesn't trigger quorum check) - Add comprehensive embedded mode benchmark suite with detailed v0.2.0 report - Clarify standalone vs embedded mode performance metrics in README - Fix clippy warnings in embedded-bench (too_many_arguments, format strings) Benchmark Results (v0.2.0): - Write: 203K ops/sec (3.2x faster than standalone) - LeaseRead: 496K ops/sec (6.0x faster than standalone) - Eventual: 511K ops/sec (4.3x faster than standalone) Test Fix: - LeaseRead with valid lease serves reads locally without replication - Only expired lease triggers verify_leadership_limited_retry() - Test expectation was incorrect since v0.2.0 introduction
…state machine apply Changes: - Skip commit_index update wait when current commit >= readIndex - Add tracking for last_state_machine_applied_index - Improve ReadIndex state management in leader - Add comprehensive test coverage for optimization scenarios This resolves the redundant wait issue where readers would unnecessarily block on state machine apply even when the commit_index already satisfies the read requirement.
Changes: - Remove unstable latency ratio assertion in integration test - Add unit tests for slow state machine apply scenario - Fix CI disk space cleanup to preserve build toolchain - Remove unused tracing imports The integration test now focuses on functional correctness rather than performance timing, which was unreliable in CI environments. Unit tests provide precise coverage of the optimization logic using mocks. CI fix: Preserve GCC/Clang toolchain by disabling large-packages cleanup, resolving "stdbool.h not found" error during RocksDB compilation.
…tion - Add read_buffer to LeaderState for batching LinearizableRead requests - Implement dual-threshold flush mechanism (size: 50 requests, time: 10ms) - Remove redundant needs_verification check in flush_read_buffer - Update lease timestamp after successful verification for LeaseRead optimization - Add drain_read_buffer error handling during role transitions - Add comprehensive unit tests for batching behavior and edge cases This change reduces leadership verification overhead by batching multiple LinearizableRead requests and performing single quorum check per batch, significantly improving read throughput under high concurrency.
…functions - Remove verify_leadership_for_read() and verify_leadership_limited_retry() - Introduce verify_leadership_and_refresh_lease() for unified Read path - Update verify_leadership_persistent() to auto-refresh lease after Config changes - Align with Raft best practices: fast-fail for reads, persistent retry for Config Impact: Fixes linearizable read batching bugs (case6_2, case6_3) Architecture: 4-layer → 3-layer call chain, net -145 lines
…contribution guidelines Changes: - Test Migration: Move 9 LinearizableRead unit tests from server to core package * Created leader_state_linearizable_read_test.rs (430 lines) * Tests for calculate_read_index, wait_until_applied, on_noop_committed * All 566 core tests passing - Contribution Guidelines: Clarify project philosophy to protect focus * CONTRIBUTING.md: Add 20/80 rule, Issue-first workflow, out-of-scope boundaries * README.md: Add Maintainer Philosophy and Contribution Guide sections * .github: Update PR/Issue templates to enforce quality standards * Removed blank issue template to require structured reports Rationale: - Core unit tests belong in core package (single responsibility) - Clear guidelines prevent feature bloat and maintain project direction
Fixed test_batch_promotion_failure hang (28.3s) by syncing node_config timeout in TestFixture. Added LeaderState benchmarks + optimized 11 existing benchmarks by moving StateMachine/Engine creation outside iteration loops. Benchmark suite now runs in <2min (was 15min+) with <10% measurement overhead (was 90%+).
|
Caution Review failedThe pull request is closed. 📝 WalkthroughWalkthroughThis PR introduces read index batching optimization for linearizable reads in the Raft consensus layer, adds new cluster state query APIs ( Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Leader
participant StateMachine
participant ReplicationGroup
Client->>Leader: ClientReadRequest (1st)
activate Leader
Leader->>Leader: Initialize read_buffer<br/>Start timeout task
Client->>Leader: ClientReadRequest (2nd–Nth)<br/>(within threshold)
Leader->>Leader: Append to read_buffer
alt Size or Time Threshold Reached
Leader->>ReplicationGroup: verify_leadership_and_refresh_lease()
ReplicationGroup-->>Leader: Lease verified
else Timeout Triggered
Leader->>ReplicationGroup: verify_leadership_and_refresh_lease()
ReplicationGroup-->>Leader: Lease verified
end
Leader->>StateMachine: wait_until_applied(max(commit_index, noop_log_id))
activate StateMachine
StateMachine-->>Leader: Ready (entries applied)
deactivate StateMachine
Leader->>Leader: Drain read_buffer<br/>Create responses for all N requests
deactivate Leader
Leader-->>Client: ClientResponse (1st–Nth)
This diagram illustrates the read batching flow: multiple concurrent linearizable read requests are buffered, and a single quorum verification is performed for the entire batch before the state machine is polled, reducing RPC overhead. Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related issues
Possibly related PRs
Poem
✨ Finishing touches
📜 Recent review detailsConfiguration used: defaults Review profile: CHILL Plan: Pro ⛔ Files ignored due to path filters (9)
📒 Files selected for processing (130)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This pull request implements ReadIndex batching mechanism to improve LinearizableRead throughput (#236). The PR includes:
- New integration tests for linearizable read optimization and batching
- Addition of
examples/three-nodes-standalone/directory (appears to be a rename/reorganization fromthree-nodes-cluster) - Enhanced EmbeddedEngine API with cluster state methods (
is_leader(),leader_info()) - Benchmark improvements and new leader_state benchmarks
- Documentation updates reflecting the reorganization
Reviewed changes
Copilot reviewed 90 out of 139 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
d-engine-server/tests/linearizable_read/ |
New integration tests for read batching and optimization |
d-engine-server/src/api/embedded.rs |
Added is_leader() and leader_info() API methods with documentation |
d-engine-server/tests/embedded/cluster_state_test.rs |
New integration tests for cluster state APIs |
d-engine-server/tests/components/raft_role/leader_state_test.rs |
Test updates for batching configuration |
d-engine-server/benches/ |
Benchmark improvements to avoid repeated setup |
examples/three-nodes-standalone/ |
New example directory (reorganization) |
| Various doc files | Path updates from three-nodes-cluster to three-nodes-standalone |
| Cargo.toml files | Metadata updates for package release configuration |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Type
Related Issues
Checklist
Summary by CodeRabbit
Release Notes
New Features
is_leader(),leader_info()) to embedded enginePerformance
Documentation
Configuration
✏️ Tip: You can customize this high-level summary in your review settings.