feat #236: Implement ReadIndex batching mechanism to improve LinearizableRead throughput#240
Conversation
…state machine apply

Changes:
- Skip commit_index update wait when current commit >= readIndex
- Add tracking for last_state_machine_applied_index
- Improve ReadIndex state management in leader
- Add comprehensive test coverage for optimization scenarios

This resolves the redundant wait issue where readers would unnecessarily block on state machine apply even when the commit_index already satisfies the read requirement.
Changes:
- Remove unstable latency ratio assertion in integration test
- Add unit tests for slow state machine apply scenario
- Fix CI disk space cleanup to preserve build toolchain
- Remove unused tracing imports

The integration test now focuses on functional correctness rather than performance timing, which was unreliable in CI environments. Unit tests provide precise coverage of the optimization logic using mocks.

CI fix: Preserve GCC/Clang toolchain by disabling large-packages cleanup, resolving "stdbool.h not found" error during RocksDB compilation.
…tion

- Add read_buffer to LeaderState for batching LinearizableRead requests
- Implement dual-threshold flush mechanism (size: 50 requests, time: 10ms)
- Remove redundant needs_verification check in flush_read_buffer
- Update lease timestamp after successful verification for LeaseRead optimization
- Add drain_read_buffer error handling during role transitions
- Add comprehensive unit tests for batching behavior and edge cases

This change reduces leadership verification overhead by batching multiple LinearizableRead requests and performing a single quorum check per batch, significantly improving read throughput under high concurrency.
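The dual-threshold flush described in this commit can be sketched as follows. This is a minimal stand-in, not d-engine's actual `LeaderState` code: the struct, field names, and `u64` request ids are illustrative assumptions.

```rust
use std::time::{Duration, Instant};

/// Illustrative sketch of a dual-threshold read buffer: flush when either
/// `size_threshold` requests are buffered, or the oldest buffered request
/// is older than `time_threshold` (the PR uses 50 requests / 10ms).
struct ReadBatchBuffer {
    pending: Vec<u64>, // request ids standing in for read requests
    first_enqueued_at: Option<Instant>,
    size_threshold: usize,
    time_threshold: Duration,
}

impl ReadBatchBuffer {
    fn new(size_threshold: usize, time_threshold: Duration) -> Self {
        Self { pending: Vec::new(), first_enqueued_at: None, size_threshold, time_threshold }
    }

    fn push(&mut self, request_id: u64) {
        if self.pending.is_empty() {
            // The time threshold is measured from the first buffered request.
            self.first_enqueued_at = Some(Instant::now());
        }
        self.pending.push(request_id);
    }

    /// Either threshold triggers a flush; the whole batch then shares a
    /// single leadership/quorum verification.
    fn should_flush(&self) -> bool {
        self.pending.len() >= self.size_threshold
            || self
                .first_enqueued_at
                .map(|t| t.elapsed() >= self.time_threshold)
                .unwrap_or(false)
    }

    fn drain(&mut self) -> Vec<u64> {
        self.first_enqueued_at = None;
        std::mem::take(&mut self.pending)
    }
}

fn main() {
    let mut buf = ReadBatchBuffer::new(3, Duration::from_millis(10));
    buf.push(1);
    assert!(!buf.should_flush()); // below both thresholds
    buf.push(2);
    buf.push(3);
    assert!(buf.should_flush()); // size threshold reached
    let batch = buf.drain();
    assert_eq!(batch.len(), 3);
    assert!(!buf.should_flush()); // empty buffer never flushes
    println!("size-threshold flush ok");
}
```

In the real implementation the time trigger arrives as a `FlushReadBuffer` event rather than being polled, but the decision logic is the same shape.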
…functions

- Remove verify_leadership_for_read() and verify_leadership_limited_retry()
- Introduce verify_leadership_and_refresh_lease() for unified Read path
- Update verify_leadership_persistent() to auto-refresh lease after Config changes
- Align with Raft best practices: fast-fail for reads, persistent retry for Config

Impact: Fixes linearizable read batching bugs (case6_2, case6_3)
Architecture: 4-layer → 3-layer call chain, net -145 lines
…contribution guidelines

Changes:
- Test Migration: Move 9 LinearizableRead unit tests from server to core package
  * Created leader_state_linearizable_read_test.rs (430 lines)
  * Tests for calculate_read_index, wait_until_applied, on_noop_committed
  * All 566 core tests passing
- Contribution Guidelines: Clarify project philosophy to protect focus
  * CONTRIBUTING.md: Add 20/80 rule, Issue-first workflow, out-of-scope boundaries
  * README.md: Add Maintainer Philosophy and Contribution Guide sections
  * .github: Update PR/Issue templates to enforce quality standards
  * Removed blank issue template to require structured reports

Rationale:
- Core unit tests belong in core package (single responsibility)
- Clear guidelines prevent feature bloat and maintain project direction
Fixed test_batch_promotion_failure hang (28.3s) by syncing node_config timeout in TestFixture. Added LeaderState benchmarks + optimized 11 existing benchmarks by moving StateMachine/Engine creation outside iteration loops. Benchmark suite now runs in <2min (was 15min+) with <10% measurement overhead (was 90%+).
…d_test.rs

## Summary
Migrated 29 client read tests from multiple files into a unified test suite, renamed unclear test names, enhanced documentation, and added 3 P0 missing tests.

## Changes

### 1. File Reorganization
- Renamed: leader_state_linearizable_read_test.rs → leader_state_client_read_test.rs
- Migrated 17 tests from server layer (d-engine-server/leader_state_test.rs)
- Migrated 12 batching tests from core layer (d-engine-core/leader_state_test.rs)
- Deleted duplicate tests from original locations

### 2. Test Renaming (Clarity Improvement)
- case6_1 → test_linearizable_read_quorum_failure
- case6_2 → test_linearizable_read_quorum_success
- case6_3 → test_linearizable_read_encounters_higher_term

### 3. Documentation Enhancement
All tests now include:
- Test Scenario description
- Given/When/Then structure
- Raft Protocol Context (where applicable)
- Clear English comments following project standards

### 4. New P0 Tests Added
1. test_mixed_read_policies_in_batch
   - Verifies batch processing of multiple LinearizableRead requests
   - Validates batch optimization (single quorum check)
2. test_lease_reuse_after_linearizable_read_refresh
   - Validates lease timestamp refresh after LinearizableRead
   - Verifies LeaseRead can reuse refreshed lease (no redundant quorum)
3. test_eventual_consistency_ignores_stale_lease
   - Confirms EventualConsistency skips lease validation
   - Documents trade-off: performance vs linearizability guarantee

## Test Coverage
- Total tests: 32 (29 migrated + 3 new)
- All tests passing: ✅
- Test categories:
  * LinearizableRead core functions (9 tests)
  * Read policy handling (8 tests)
  * Read batching (15 tests)

## Rationale
Consolidating client read tests improves:
- Maintainability (single source of truth)
- Discoverability (unified test location)
- Documentation (consistent comment style)
- Coverage (filled critical gaps)

Follows d-engine principle: Simple, clean, maintainable code with narrow impact scope.
📝 Walkthrough

This PR implements ReadIndex batching for linearizable reads to improve throughput, reorganizes the benchmark suite (renaming d-engine-bench to standalone-bench and examples/three-nodes-cluster to three-nodes-standalone), updates Raft configuration with new snapshot, membership, watch, and RPC compression settings, refactors leadership verification from limited-retry to persistent checks, and adds comprehensive integration tests for linearizable read optimization and batching semantics.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Client
    participant Leader
    participant ReadBuffer as Read Buffer
    participant QuorumVerify as Quorum Verification
    participant StateMachine as State Machine
    Client->>Leader: LinearizableRead Request (arrives at t=0)
    activate Leader
    Leader->>ReadBuffer: Add request to buffer
    ReadBuffer->>ReadBuffer: Start timeout (time_threshold_ms)
    alt Buffer reaches size_threshold OR timeout fires
        ReadBuffer->>Leader: Flush signal (size/time trigger)
    else
        Client->>Leader: More LinearizableRead Requests
        Leader->>ReadBuffer: Add to same batch
    end
    Leader->>QuorumVerify: Single verify_leadership call for entire batch
    activate QuorumVerify
    QuorumVerify-->>Leader: Verified at commit_index C
    deactivate QuorumVerify
    Leader->>StateMachine: wait_until_applied(C)
    activate StateMachine
    StateMachine-->>Leader: Applied up to index C
    deactivate StateMachine
    Leader->>ReadBuffer: Respond all buffered requests with read_index=C
    ReadBuffer-->>Client: Linearizable responses (all from same snapshot)
    deactivate Leader
```

```mermaid
sequenceDiagram
    participant Node as Node<br/>(Any Role)
    participant RaftRole as RaftRole<br/>State Machine
    participant RoleState as Current Role<br/>(Candidate/Follower/Learner)
    Node->>RaftRole: Transition triggered (e.g., BecomeFollower)
    activate RaftRole
    RaftRole->>RoleState: drain_read_buffer()
    activate RoleState
    RoleState-->>RaftRole: Returns error (NotLeader)
    deactivate RoleState
    alt Role is Leader
        RaftRole->>RoleState: drain_read_buffer() succeeds
        RoleState->>RoleState: Respond all pending reads with error
    else Role is Non-Leader
        RaftRole->>RaftRole: Log drain failure warning
    end
    RaftRole->>RaftRole: Complete role transition
    deactivate RaftRole
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

The PR introduces significant architectural changes to the read path (ReadIndex batching with buffering logic, timeout-driven flushes, state-machine synchronization), spans multiple interconnected components (core config, errors, events, leader/non-leader role states), adds 2,377+ lines of dense test logic covering edge cases and concurrency scenarios, reorganizes the entire benchmark infrastructure with renamed directories/binaries, and updates raft configuration with 133 new/modified lines covering snapshot, membership, watch, and RPC compression tuning. While many files involve documentation or mechanical renaming (lower effort), the core read-batching implementation and extensive test coverage require careful logic review to ensure correctness under concurrent load, role transitions, and timeout conditions.
🚥 Pre-merge checks: ✅ 3 passed checks (3 passed)
Codecov Report: ❌ Patch coverage is …
@coderabbitai review
✅ Actions performed: Review triggered.
Pull request overview
This PR implements ReadIndex batching to significantly improve LinearizableRead throughput by sharing a single leadership verification across multiple concurrent read requests, addressing issue #236.
Changes:
- Implemented ReadIndex batching mechanism for linearizable reads with configurable size (50) and time (10ms) thresholds
- Added comprehensive integration tests validating batching behavior and linearizability guarantees
- Renamed `three-nodes-cluster` to `three-nodes-standalone` across examples and documentation for clarity
Reviewed changes
Copilot reviewed 69 out of 118 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| d-engine-core/src/raft_role/leader_state.rs | Core batching logic: read buffer, flush triggers, batch processing with shared quorum verification |
| d-engine-server/tests/linearizable_read/read_batching_integration.rs | Integration tests for batching (linearizability, timeout, size threshold scenarios) |
| d-engine-core/src/config/raft.rs | Added ReadBatchingConfig with size_threshold and time_threshold_ms |
| config/base/raft.toml | Configuration documentation for read batching parameters |
| d-engine-core/src/event.rs | Added FlushReadBuffer event for timeout-triggered batch processing |
| benches/standalone-bench/reports/v0.2.2/report_v0.2.2.md | Performance report showing 428% linearizable read improvement |
| examples/* | Renamed three-nodes-cluster to three-nodes-standalone across all examples |
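Based on the `read_consistency.read_batching` keys and defaults cited in the review comments below (50 requests / 10ms), a user override in raft.toml might look like this. The exact section path is taken from the review; treat the fragment as illustrative:

```toml
# Hypothetical raft.toml fragment; key names per the review comments
[read_consistency.read_batching]
size_threshold = 50      # flush when 50 reads are buffered
time_threshold_ms = 10   # or when the oldest buffered read is 10ms old
```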
Comments suppressed due to low confidence (3)
d-engine-core/src/raft_role/learner_state.rs:1
- Error message incorrectly states 'attempted to create snapshot' when the actual operation is FlushReadBuffer (read buffer flushing), not snapshot creation.
d-engine-core/src/raft_role/follower_state.rs:1
- Error message incorrectly states 'attempted to create snapshot' when the actual operation is FlushReadBuffer (read buffer flushing), not snapshot creation.
d-engine-core/src/raft_role/candidate_state.rs:1
- Error message incorrectly states 'attempted to create snapshot' when the actual operation is FlushReadBuffer (read buffer flushing), not snapshot creation.
Actionable comments posted: 17
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
d-engine-server/benches/state_machine.rs (1)
166-177: State machine reuse causes measurement inconsistency across benchmark iterations.The state machine is created once before benchmarking and reused across all iterations. Each benchmark iteration applies entries with fixed indices (always starting at index 1), so the same keys are overwritten repeatedly:
- First iteration: inserts fresh keys
- Subsequent iterations: overwrites existing keys
Since insert and update paths may have different performance characteristics (e.g., hash table lookups, state machine traversal), the measurements conflate insert and update overhead. The benchmark should either:
- Use unique keys per iteration to measure consistent insert performance, or
- Document the intended behavior as measuring update/overwrite performance
This pattern affects:
- `bench_apply_without_ttl` and `bench_apply_with_ttl` (lines 166–195)
- `bench_batch_apply` and `bench_batch_apply_with_ttl` (lines 273–305) with fixed `start_index=1`
- `bench_apply_without_watch`, `bench_apply_with_1_watcher`, and `bench_apply_with_100_watchers` (lines 315+)

The comment at line 316 acknowledges the design but doesn't address the measurement validity concern.
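The "unique keys per iteration" fix suggested above can be sketched like this. The helper name and key format are illustrative, not the benchmark's actual code:

```rust
/// Give each benchmark iteration a disjoint key range so every apply is a
/// fresh insert, never an overwrite of a key from a previous iteration.
fn unique_keys(iteration: u64, batch_size: u64) -> Vec<String> {
    // Ranges [iteration*batch_size + 1, (iteration+1)*batch_size] never overlap.
    let start = iteration * batch_size + 1;
    (start..start + batch_size).map(|i| format!("key_{i}")).collect()
}

fn main() {
    let first = unique_keys(0, 100);
    let second = unique_keys(1, 100);
    assert_eq!(first.first().unwrap(), "key_1");
    assert_eq!(second.first().unwrap(), "key_101");
    // No overlap between iterations → no insert-vs-update ambiguity.
    assert!(first.iter().all(|k| !second.contains(k)));
    println!("disjoint key ranges ok");
}
```

The same idea applies to log indices: deriving the start index from the iteration counter keeps applied indices monotonic across iterations.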
d-engine-server/benches/ttl.rs (1)
172-186: State machine reuse causes key overwrites and measurements of update operations instead of inserts.
This benchmark creates the state machine once and reuses it across iterations. Each iteration calls `create_entries_with_ttl(size, 1, 3600)` with the same `start_index=1`, generating identical keys (`key_ttl_1`, `key_ttl_2`, etc.) across all three benchmark sizes. This means:
- Iteration 1 (size=10): Inserts keys 1–10
- Iteration 2 (size=100): Overwrites keys 1–10, inserts keys 11–100
- Iteration 3 (size=1000): Overwrites keys 1–100, inserts keys 101–1000
The benchmark measures update/overwrite performance, not fresh insert performance. Use unique indices per iteration (e.g., `create_entries_with_ttl(size, iteration * size, 3600)`) or recreate the state machine for each iteration to measure consistent insert behavior. The same issue applies to `bench_varying_ttl_durations` (line 278).
benches/standalone-bench/reports/v0.2.0/report_v0.2.0_final.md (1)
90-134: Repro commands likely use the wrong relative path for the bench binary.
You `cd examples/three-nodes-standalone` (Line 91) but then execute `./target/release/standalone-bench` (e.g., Line 99), which likely won't exist from that directory.
Proposed fix (pick one approach)
### Start Cluster

```bash
cd examples/three-nodes-standalone
make start-cluster
```

### Run Benchmarks

```diff
 # Single client write (10K requests)
-./target/release/standalone-bench \
+../../target/release/standalone-bench \
   --endpoints http://127.0.0.1:9081 --endpoints http://127.0.0.1:9082 --endpoints http://127.0.0.1:9083 \
   --conns 1 --clients 1 --sequential-keys --total 10000 \
   --key-size 8 --value-size 256 put
```

(Apply the same `../../target/release/standalone-bench` change to the other benchmark blocks.)

d-engine-core/src/config/raft.rs (2)
960-995: Config deserialization bug risk: partial `read_batching` overrides likely fail without per-field serde defaults.
Because `ReadBatchingConfig` fields lack `#[serde(default)]` (or `default = ...`), specifying only one of the fields in TOML can make deserialization error out.
Proposed fix
```diff
 /// Read batching configuration
 #[derive(Clone, Debug, Serialize, Deserialize)]
 pub struct ReadBatchingConfig {
     /// Flush when buffer reaches this size
+    #[serde(default = "default_read_batching_size_threshold")]
     pub size_threshold: usize,
     /// Flush when first request ages beyond this (milliseconds)
+    #[serde(default = "default_read_batching_time_threshold_ms")]
     pub time_threshold_ms: u64,
 }

+const fn default_read_batching_size_threshold() -> usize {
+    50
+}
+
+const fn default_read_batching_time_threshold_ms() -> u64 {
+    10
+}
+
 impl Default for ReadBatchingConfig {
     fn default() -> Self {
         Self {
-            size_threshold: 50,
-            time_threshold_ms: 10,
+            size_threshold: default_read_batching_size_threshold(),
+            time_threshold_ms: default_read_batching_time_threshold_ms(),
         }
     }
 }
```

Also applies to: 966-983
1011-1020: Add validation for `read_consistency.read_batching` thresholds.
Guard against `0` (and potentially very large values) to avoid pathological behavior at runtime.
Proposed fix
```diff
 impl ReadConsistencyConfig {
     fn validate(&self) -> Result<()> {
         // Validate read consistency configuration
         if self.lease_duration_ms == 0 {
             return Err(Error::Config(ConfigError::Message(
                 "read_consistency.lease_duration_ms must be greater than 0".into(),
             )));
         }
+        if self.read_batching.size_threshold == 0 {
+            return Err(Error::Config(ConfigError::Message(
+                "read_consistency.read_batching.size_threshold must be greater than 0".into(),
+            )));
+        }
+        if self.read_batching.time_threshold_ms == 0 {
+            return Err(Error::Config(ConfigError::Message(
+                "read_consistency.read_batching.time_threshold_ms must be greater than 0".into(),
+            )));
+        }
         Ok(())
     }
 }
```
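A stdlib-only sketch of that validation logic, with simplified stand-in types (the real code returns d-engine's `ConfigError`; here a plain `String` error stands in):

```rust
/// Simplified stand-in for d-engine's ReadBatchingConfig.
#[derive(Debug)]
struct ReadBatchingConfig {
    size_threshold: usize,
    time_threshold_ms: u64,
}

/// Reject zero thresholds before the node starts, as the review proposes.
fn validate(cfg: &ReadBatchingConfig) -> Result<(), String> {
    if cfg.size_threshold == 0 {
        return Err("read_consistency.read_batching.size_threshold must be greater than 0".into());
    }
    if cfg.time_threshold_ms == 0 {
        return Err("read_consistency.read_batching.time_threshold_ms must be greater than 0".into());
    }
    Ok(())
}

fn main() {
    // PR defaults pass validation.
    assert!(validate(&ReadBatchingConfig { size_threshold: 50, time_threshold_ms: 10 }).is_ok());
    // A zero size threshold would never flush by size; reject it.
    assert!(validate(&ReadBatchingConfig { size_threshold: 0, time_threshold_ms: 10 }).is_err());
    println!("threshold validation ok");
}
```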
🤖 Fix all issues with AI agents
In @.github/workflows/ci.yml:
- Around line 54-63: The workflow step using the reusable action currently
references jlumbroso/free-disk-space@main which is unstable; change the uses
reference to a specific release tag (e.g., jlumbroso/free-disk-space@vX.Y.Z) to
pin the dependency and ensure reproducible CI; locate the step with name "Free
Disk Space (Ubuntu)" or the uses string "jlumbroso/free-disk-space@main", query
the action repo releases to find the latest stable tag, and replace "@main" with
that tag in the uses field.
In @benches/standalone-bench/reports/v0.2.2/report_v0.2.2.md:
- Around line 7-10: The TL;DR and conclusion contain conflicting performance
claims; reconcile them to exactly match the numerical results shown in the
tables referenced (update the Write percent speedup, Read speedup ratios, and
Unique/LeaseRead multipliers so they are numerically identical to the table
values), remove or correct any blanket statements like “all metrics improved” if
any table shows regressions, and ensure the same corrected wording appears in
the TL;DR and conclusion sections (also apply these fixes to the same phrasing
in the other occurrences noted around lines 37-43 and 270-276).
- Around line 23-24: Choose a single canonical date and make it consistent
across the report: update the "Test Date" header (currently "Test Date: January
9, 2026"), the footer "Report Date: December 13, 2025", and any inline
references such as "Core Optimizations (January 9, 2026)" so they all use the
same chosen date; specifically replace the inconsistent strings at the "Test
Date" line, the footer "Report Date" line, and the occurrences around lines
289-292 to the selected date.
In @d-engine-server/benches/lease_performance.rs:
- Around line 176-177: The benchmark bench_batch_ttl_operations reuses the same
StateMachine (created by create_sm_background and assigned to sm_bg) across
different batch sizes so apply_chunk reuses overlapping log indices and keys;
fix by instantiating a fresh StateMachine for each batch size (call
create_sm_background/construct a new StateMachine per batch size before
preparing/applying that size's entries) or by resetting the StateMachine state
(clearing last_applied_index and stored entries) between sizes and between
b.iter() iterations so apply_chunk sees monotonic indices and each benchmark run
starts from a clean state.
In @d-engine-server/benches/ttl.rs:
- Around line 248-257: The benchmark currently seeds 100 expired entries once
(outside the per-iteration closure) then runs ten noop cleanup triggers per
iteration so the first iteration removes most expired entries and subsequent
iterations measure the noop path; fix by repopulating or resetting state inside
each iteration: move the loop that creates the 100 expired entries (the code
that calls create_noop_entry and sm.apply_chunk) into the async closure passed
to group.bench_function (or alternatively recreate/reset the StateMachine
instance `sm` at the start of each iteration) so that each iteration starts with
the same 100 expired entries before running the ten cleanup noops.
- Around line 214-221: The benchmark currently creates the 50 expired and 50
active entries once and then measures cleanup by repeatedly applying a noop (in
the group.bench_function closure using create_noop_entry and sm.apply_chunk),
which means only the first iteration measures the intended "cleanup 50 expired
entries" case; subsequent iterations see no expired entries. Fix by repopulating
the expired entries inside each measured iteration (move the setup that creates
the 50 expired/50 active entries into the async closure executed per iteration)
or switch to iter_custom and manually start/stop timing so you can recreate
expired entries between timing intervals; ensure the code that calls
create_noop_entry and sm.apply_chunk remains the measured operation while setup
is redone per iteration.
In @d-engine-server/tests/linearizable_read/linearizable_read_optimization.rs:
- Around line 391-419: Replace the fixed paths
node1_config_path/node2_config_path/node3_config_path with TempDir-scoped paths:
create a tempfile::TempDir at the start of the test, join filenames like
"linear_read_n1.toml" etc. onto that tempdir.path() and use those paths in the
tokio::fs::write calls for node1_config, node2_config and node3_config; ensure
the TempDir is held for the test lifetime so files are cleaned up automatically
to avoid cross-test collisions.
- Around line 43-63: ClusterConfig uses a randomized listen_address but
initial_cluster[0].address is hardcoded to "127.0.0.1:9081", causing mismatch;
update the test to compute the bind address once and reuse it for both
ClusterConfig.listen_address and NodeMeta.address (i.e., generate the port/addr
into a local variable and assign that same value to listen_address and
initial_cluster[0].address) so the node advertises the actual bound address for
membership and RPCs.
In @d-engine-server/tests/linearizable_read/read_batching_integration.rs:
- Around line 390-462: The assertion in test_batching_throughput_improvement is
flaky because it requires throughput2 >= throughput1; relax the check to allow a
small regression budget (e.g., require throughput2 >= throughput1 * 0.9) or
alternatively run multiple iterations of the benchmark (using NUM_REQUESTS
loops) and compare medians of throughput1/throughput2 before asserting; update
the regression check near throughput1/throughput2 and reference
test_batching_throughput_improvement, create_single_node_with_batching, and the
throughput1/throughput2 variables when making the change.
- Around line 48-68: ClusterConfig currently sets listen_address to a generated
ephemeral port but hardcodes initial_cluster[0].address to "127.0.0.1:9081",
causing mismatched advertised membership; change the code so the same address
string is used for both listen_address and NodeMeta.address (e.g., compute let
addr = format!("127.0.0.1:{}", port) / parse it once and assign addr.clone() to
initial_cluster[0].address) so ClusterConfig.listen_address and
initial_cluster[].address match; update references to ClusterConfig,
listen_address, initial_cluster, and NodeMeta accordingly.
In @d-engine/src/docs/examples/three-nodes-standalone.md:
- Around line 5-6: Update the broken relative links that point to the benches
directory: replace occurrences of "../../benches/..." with
"../../../../benches/..." for the two hrefs that currently use
"../../benches/..." and also update the link text that currently displays
"../../benches/..." to "../../../../benches/..." so the displayed text matches
the correct href (adjust the three instances of the "../../benches/..." string
in the markdown).
🟡 Minor comments (13)
d-engine-server/benches/watch_overhead.rs-203-206 (1)
203-206: Engine cleanup is skipped if benchmark panics before completion.
The cleanup blocks execute after all benchmark iterations, but if the benchmark panics (e.g., from an `.expect()` call), the `engine.stop()` code is never reached. While `EmbeddedEngine`'s Drop implementation warns of improper shutdown, it does not automatically stop the engine, potentially leaving resources (ports, threads) unreleased.
For benchmarks this may be acceptable since resources are cleaned up on process exit, but consider whether Criterion provides panic-safe teardown hooks or a Drop guard pattern to improve CI/local development hygiene.
Also applies to: 253-256, 305-308
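The Drop-guard pattern suggested above can be sketched like this; `EngineGuard` and the atomic flag are illustrative stand-ins, not the benchmark's actual `EmbeddedEngine`:

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Guard whose Drop runs even during panic unwinding, so teardown
/// (engine.stop() in the real benchmark) happens on the failure path too.
struct EngineGuard {
    stopped: Arc<AtomicBool>, // stands in for the engine being stopped
}

impl Drop for EngineGuard {
    fn drop(&mut self) {
        self.stopped.store(true, Ordering::SeqCst);
    }
}

fn main() {
    panic::set_hook(Box::new(|_| {})); // keep panic output quiet for the demo

    let stopped = Arc::new(AtomicBool::new(false));
    let flag = stopped.clone();

    let result = panic::catch_unwind(move || {
        let _guard = EngineGuard { stopped: flag };
        panic!("benchmark iteration failed"); // simulated .expect() failure
    });

    assert!(result.is_err()); // the panic happened...
    assert!(stopped.load(Ordering::SeqCst)); // ...but Drop still ran teardown
    println!("drop guard ran on panic");
}
```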
benches/embedded-bench/Makefile-2-2 (1)
2-2: Fix stale report references (v0.2.0 → v0.2.2) to avoid confusion.
Line 2 still points at the old report naming/version while the rest of the Makefile points to v0.2.2.
Also applies to: 211-211
benches/embedded-bench/README.md-27-29 (1)
27-29: Simplify the relative path reference to standalone-bench reports.
The path `../../benches/standalone-bench/reports/v0.2.2/report_v0.2.2.md` is unnecessarily convoluted; from `benches/embedded-bench/`, use `../standalone-bench/reports/v0.2.2/report_v0.2.2.md` instead. The filename `report_v0.2.2.md` is correct and confirmed to exist.
benches/embedded-bench/Makefile-78-78 (1)
78-78: Update "Comparable to" comments to use current embedded-bench CLI syntax.
The Makefile comments reference non-existent CLI flags (`--conns`) and subcommands (`range`) that embedded-bench does not support. Update lines 78, 97, 120, 138, 156, and 174 to reflect the actual current CLI: `embedded-bench --mode local --total <N> --clients <N> <put|get> [--consistency <policy>]` instead of the deprecated `--conns` and `range` references.
d-engine-core/src/raft_role/learner_state.rs-302-309 (1)
302-309: Fix misleading RoleViolation context for `FlushReadBuffer`.
This arm handles `RaftEvent::FlushReadBuffer` but the message says "attempted to create snapshot." Suggest aligning the message with the actual operation.
Proposed fix
```diff
 RaftEvent::FlushReadBuffer => {
     return Err(ConsensusError::RoleViolation {
         current_role: "Learner",
         required_role: "Leader",
-        context: format!("Learner node {} attempted to create snapshot.", ctx.node_id),
+        context: format!("Learner node {} attempted to flush read buffer.", ctx.node_id),
     }
     .into());
 }
```
d-engine-core/src/raft_role/follower_state.rs-357-367 (1)
357-367: Fix misleading RoleViolation context for `FlushReadBuffer`.
The error says "attempted to create snapshot", but this arm is for `RaftEvent::FlushReadBuffer` (ReadIndex batching). Also consider using `self.node_id()` for consistency.
Proposed fix
```diff
 RaftEvent::FlushReadBuffer => {
     return Err(ConsensusError::RoleViolation {
         current_role: "Follower",
         required_role: "Leader",
         context: format!(
-            "Follower node {} attempted to create snapshot.",
-            ctx.node_id
+            "Follower node {} attempted to flush read buffer.",
+            self.node_id()
         ),
     }
     .into());
 }
```
d-engine-core/src/raft_role/candidate_state.rs-400-410 (1)
400-410: FlushReadBuffer RoleViolation message is incorrect/misleading (mentions snapshot).
This looks copy/pasted from snapshot handling; for `RaftEvent::FlushReadBuffer` the context should mention flushing/draining the read buffer (or the batching flush), not snapshot creation. Also consider using `self.node_id()` for consistency with the rest of the role-state code.
Proposed fix
```diff
 RaftEvent::FlushReadBuffer => {
     return Err(ConsensusError::RoleViolation {
         current_role: "Candidate",
         required_role: "Leader",
         context: format!(
-            "Candidate node {} attempted to create snapshot.",
-            ctx.node_id
+            "Candidate node {} attempted to flush read buffer.",
+            self.node_id()
         ),
     }
     .into());
 }
```
d-engine-core/src/raft_role/follower_state_test.rs-15-49 (1)
15-49: Replace hard-coded `/tmp` paths with per-test temporary directories and use structured error matching.
- Hard-coded `/tmp` paths can collide under parallel test runs and fail on non-Unix systems. Use `tempfile::tempdir()` or `tempfile::TempDir` to create isolated, platform-safe temporary directories per test.
- String-based error matching (`error_str.contains("...")`) is brittle and breaks when debug formatting changes. Use structured matching via `matches!` on error enum variants (e.g., `matches!(e, MembershipError::NotLeader)`) instead.

This pattern is already established in the codebase (e.g.,
d-engine-core/src/raft_role/mod.rs-10-19 (1)
10-19: Update `drain_read_buffer` documentation to accurately reflect behavior.
The doc comment "other roles are no-op" is misleading. The default implementation in `role_state.rs` returns `Err(MembershipError::NotLeader)`, not a no-op. Update the comment to reflect this: "Only Leader drains the buffer; other roles return NotLeader error."
Good news: Call sites in `raft.rs` (lines 309, 330, 399) are already correctly placed in the `handle_role_event()` role transition handlers and properly handle the error with `if let Err(e) = ...` logging.
The `on_noop_committed` method is documented correctly and behaves as documented (true no-op for non-leaders).
benches/embedded-bench/reports/v0.2.2/embedded_bench_report_v0.2.2.md-72-80 (1)
72-80: "Performance Ladder" ordering is misleading when the multipliers are equal/near-equal.
If Eventual and LeaseRead are both ~1.8x vs Linearizable, prefer a tie notation and/or more precise ratios derived from the table values.
benches/standalone-bench/reports/v0.2.2/report_v0.2.2.md-168-170 (1)
168-170: Update the Go version in the environment note to Go 1.16 or newer.
The benchmark report lists Go 1.8.3 for etcd 3.5.0, but etcd 3.5.0 requires Go 1.16+ (per release notes and build documentation). Correct this at lines 168-170.
d-engine-server/tests/linearizable_read/read_batching_integration.rs-242-303 (1)
242-303: Timing-based assertions: widen bounds or use mocked time to avoid flaky failures.
E.g., "10–20ms" expectations can fail under noisy CI scheduling even if functionality is correct.
Also applies to: 305-371
d-engine-core/src/raft_role/role_state.rs-471-474 (1)
471-474: Fix copy/paste log message in `drain_read_buffer` default impl.
Currently logs `"update_match_index NotLeader error"`, which is misleading and makes operational debugging harder.
Proposed fix
```diff
 fn drain_read_buffer(&mut self) -> Result<()> {
-    warn!("update_match_index NotLeader error");
+    warn!("drain_read_buffer NotLeader error");
     Err(MembershipError::NotLeader.into())
 }
```
🧹 Nitpick comments (20)
d-engine-server/tests/components/raft_role/leader_state_test.rs (1)
1831-1857: Appreciate the memory footprint documentation approach.
The soft guideline with historical tracking is a good practice for awareness without creating brittle tests. The cache-line calculation provides useful context for performance-conscious development.
Minor observation: The comment on line 1855 appears incomplete ("LeaderState spans {} cache lines" is missing the actual value reference), but this is benign since `div_ceil(64)` is printed on line 1856.
Optional: Complete the comment for consistency
```diff
-    // For reference: Modern CPU cache line is 64 bytes
-    // LeaderState spans {} cache lines
+    // For reference: Modern CPU cache line is 64 bytes
     println!("Cache lines occupied: {}", size.div_ceil(64));
```
d-engine-server/benches/state_machine.rs (2)
315-327: Clarify whether the benchmark measures insert or update performance.
The comment acknowledges that state machine initialization occurs outside the loop, but doesn't address that the same 100 keys (starting at index 1) are applied in every iteration. This means after the first iteration, all subsequent iterations measure update performance rather than insert performance.
If measuring insert performance is intended, use unique indices per iteration or recreate the state machine.
334-384: State machine reuse impacts watch benchmark measurements.
While the watch system is recreated per iteration (lines 338, 395, 452), the state machine is reused across iterations. Each iteration applies 100 entries with indices starting at 1, accumulating state:
- Iteration 1: 100 fresh inserts
- Iteration 2: 100 updates to existing keys
- Iteration N: 100 updates plus accumulated overhead
This accumulation could affect the measured overhead of watch notifications. Consider using unique indices per iteration to ensure consistent measurement conditions.
Also applies to: 391-440, 448-497
d-engine-server/benches/lease_performance.rs (1)
145-148: Verify async iteration methodology change is intentional.
The benchmarks have been updated from synchronous iteration to `b.to_async(&rt).iter()` for async iteration. This changes what's being measured, as async iteration includes async runtime overhead in the benchmark timings.
- Is this methodology change intentional?
- Should historical benchmark results be considered incomparable with new results?
- Does this align with how the code is used in production?
Also applies to: 158-161, 184-197
examples/quick-start-embedded/Cargo.toml (1)
7-8: Path dependency is fine for in-repo examples, but consider a more “portable” pattern.

If users copy `examples/quick-start-embedded` out of the repo, `path = "../../d-engine"` will break. If portability matters, consider a workspace dependency (or document that this example is intended to be built from the repo root workspace).

d-engine-core/src/raft.rs (1)
308-312: Good safety improvement: drain read buffer on step-down; consider deduping the transition boilerplate.
The same “drain + warn” block appears in multiple transitions; a small helper (best-effort) would reduce drift risk.

Also applies to: 329-333, 398-402
README.md (1)
175-175: Fix redundant phrasing.

LanguageTool flagged line 175 as containing redundant phrasing. Review and simplify if needed.
✏️ Suggested edit
Current line 175:

```diff
- **Breaking changes are OK pre-1.0** - we prioritize getting it right over backward compatibility
```

Consider rewording for clarity (if redundancy exists in the actual whitespace/formatting):

```diff
+ **Breaking changes are OK pre-1.0** - we prioritize correctness over backward compatibility
```

benches/standalone-bench/docker/Dockerfile (1)
24-24: Rename looks consistent; consider `--locked` for reproducible image builds.

Optional tweak:

```diff
-RUN cargo build --release --bin standalone-bench
+RUN cargo build --release --locked --bin standalone-bench
```

Also applies to: 42-42
d-engine-core/src/raft_role/candidate_state_test.rs (1)
15-49: Replace hard-coded `/tmp/` paths with `tempfile::tempdir()` and match error variants directly instead of debug-format strings.

Both tests use hard-coded `/tmp/test_candidate_*` paths that collide when tests run in parallel, and rely on brittle `format!("{e:?}").contains(...)` assertions that break if error debug formatting changes. The `tempfile` crate is already used elsewhere in the codebase (config_test.rs, state_machine_handler_test.rs); use it here. Replace debug-string matching with pattern matching on the concrete `Error::Consensus(ConsensusError::RoleViolation { ... })` and `ReplicationError::NotLeader { ... }` variants.

Also applies to: 55-78
d-engine-core/src/raft_role/learner_state_test.rs (3)
17-49: Avoid brittle string-matching for error assertions (prefer type/variant matching).
`format!("{e:?}")` + `contains(...)` will break on error display/debug changes; prefer `matches!` against `crate::Error`/`crate::ConsensusError` variants if available.

Proposed refactor (illustrative):

```diff
+use crate::{ConsensusError, Error};
 ...
-    if let Err(e) = result {
-        let error_str = format!("{e:?}");
-        assert!(
-            error_str.contains("RoleViolation"),
-            "Error should be RoleViolation, got: {error_str}"
-        );
-        assert!(
-            error_str.contains("Learner"),
-            "Error should mention Learner role"
-        );
-        assert!(
-            error_str.contains("Leader"),
-            "Error should mention Leader as required role"
-        );
-    }
+    assert!(
+        matches!(
+            result,
+            Err(Error::Consensus(ConsensusError::RoleViolation {
+                current_role: "Learner",
+                required_role: "Leader",
+                ..
+            }))
+        ),
+        "Learner should reject FlushReadBuffer with RoleViolation(Learner -> Leader), got: {result:?}"
+    );
```
18-19: Hard-coded `/tmp/...` paths can make tests flaky under parallelism/leftovers.

Consider using a per-test temp dir (or at least unique paths) to avoid collisions and make cleanup deterministic.
Also applies to: 58-59
48-49: Redundant `drop(shutdown_tx)`.

Letting it fall out of scope is sufficient; keeping it is harmless but noise.
benches/standalone-bench/reports/v0.2.2/report_v0.2.2.md (1)
67-76: Table labeling is confusing: “Before (v0.2.2)” vs “After (optimized)” within a v0.2.2 report.Consider renaming to “Baseline” vs “Optimized” and, ideally, include the commit SHAs/config diffs used for each run.
d-engine-server/benches/leader_state_bench.rs (2)
134-146: Benchmark likely measures `Arc::clone`/allocation noise more than `LeaderState::new`.

`TempDir` + config setup outside the iter is good, but consider also making the `LeaderState::new` bench avoid any hidden filesystem/config overhead inside `new` (if any).
156-216: Move per-iteration fixture setup outside the measured region using `iter_batched` on `AsyncBencher`.

The current benchmarks time TempDir creation, mock setup, and config cloning alongside `process_pending_promotions`. Use `b.to_async(&rt).iter_batched(setup, routine)` with `BatchSize::SmallInput` to separate fixture creation (setup) from the function being measured (routine).

This applies to lines 225–242 and 251–278.
d-engine-core/src/raft_role/leader_state_client_read_test.rs (1)
1237-1526: Avoid reloading config in tests when you can assert against `state.node_config`.

E.g., `test_read_buffer_size_trigger`/`test_read_buffer_time_trigger` call `node_config(...)` again, which can drift from the config actually used by the `LeaderState`.

config/base/raft.toml (2)
43-69: Config structure looks aligned with batching feature; ensure deserialization + defaults are covered.
Given new nested tables (`[raft.read_consistency.read_batching]`, `[raft.membership]`), I’d strongly prefer a config-load unit test that parses this base file (or an override) and asserts the fields are populated.

Also applies to: 188-196
37-39: Consider fixing the config key typo (if it’s not locked for backward-compat).
`rpc_peer_connectinon_monitor_interval_in_sec` looks misspelled; if you can’t rename due to compatibility, consider supporting both keys with clear precedence.

d-engine-core/src/raft_role/leader_state.rs (2)
157-162: Avoid widening API surface just for tests (noop_log_idaspub).
Since you already addedtest_get_noop_log_id, prefer keeping the field non-public (orpub(super)) to reduce accidental coupling.Also applies to: 251-259
2666-2732: Refactor: reuse `wait_until_applied()` inside `process_linearizable_read_batch()`.
Right now the “wait for apply” logic is duplicated; calling the helper would keep semantics aligned and reduce future drift.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (7)
- `benches/d-engine-bench/Cargo.lock` is excluded by `!**/*.lock`
- `benches/embedded-bench/reports/v0.2.2/dengine_comparison_v0.2.2.png` is excluded by `!**/*.png`
- `benches/standalone-bench/reports/v0.1.3/dengine_comparison_v0.1.3.png` is excluded by `!**/*.png`
- `benches/standalone-bench/reports/v0.1.4/dengine_comparison_v0.1.4.png` is excluded by `!**/*.png`
- `benches/standalone-bench/reports/v0.2.0/dengine_comparison_v0.2.0.png` is excluded by `!**/*.png`
- `benches/standalone-bench/reports/v0.2.2/dengine_comparison_v0.2.2.png` is excluded by `!**/*.png`
- `examples/three-nodes-cluster/docker/jepsen/sshkeys/id_rsa.pub` is excluded by `!**/*.pub`
📒 Files selected for processing (111)
💤 Files with no reviewable changes (5)
- examples/three-nodes-cluster/docker/jepsen/sshkeys/known_hosts
- examples/three-nodes-cluster/docker/jepsen/sshkeys/authorized_keys
- examples/three-nodes-cluster/docker/jepsen/sshkeys/id_rsa
- benches/embedded-bench/reports/v0.2.0/embedded_bench_report_v0.2.0.md
- .github/ISSUE_TEMPLATE/blank.md
🧰 Additional context used
🪛 LanguageTool
README.md
[style] ~175-~175: This phrase is redundant. Consider using “over”.
Context: ...OK pre-1.0** - we prioritize getting it right over backward compatibility This approach...
(RIGHT_OVER)
🪛 markdownlint-cli2 (0.18.1)
examples/service-discovery-standalone/README.md
33-33: Hard tabs
Column: 6
(MD010, no-hard-tabs)
33-33: Code block style
Expected: fenced; Actual: indented
(MD046, code-block-style)
34-34: Hard tabs
Column: 6
(MD010, no-hard-tabs)
35-35: Hard tabs
Column: 6
(MD010, no-hard-tabs)
36-36: Hard tabs
Column: 6
(MD010, no-hard-tabs)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Lint and Format Check
- GitHub Check: Lint and Format Check
Major changes:
- Makefile: Reduce command count from ~45 to ~33 (delete 12 obsolete targets)
- Rename test-all → test with integrated examples compilation check
- Add bench-save and automatic regression detection to bench target
- Add version consistency check and docs.rs simulation to pre-release
- Replace manual feature enumeration with --all-features

Documentation improvements:
- Add three-nodes-embedded example README with HA deployment guide
- Add ha-deployment-load-balancing.md for production HA patterns
- Add graceful shutdown handling to three-nodes-embedded example
- Fix #241: Data persistence bug (missing engine.stop() call)
- Add "Recommended Entry Point" section to overview.md

Breaking changes:
- Removed commands: test-all, test-verbose, test-doc, test-examples, docs-all, docs-private, docs-crate, doc, doc-check, bench-compile, bench-gate, build-examples
- Updated all references: test-all → test in README, CONTRIBUTING, PULL_REQUEST_TEMPLATE
… fix data loss bugs

Add production-ready storage layer with full crash recovery test coverage and fix critical durability violations in both RocksDB and File storage engines.

Test Infrastructure (+590 lines):
- Add StateMachineTestSuite with crash recovery tests
  * test_drop_persists_last_applied
  * test_ungraceful_shutdown_recovery
  * test_data_survives_reopen
- Add StorageEngineTestSuite with durability tests
  * test_persist_entries_durability
  * test_hard_state_crash_recovery
  * test_truncate_durability
  * test_purge_durability
- Add test_purge_empty_range edge case coverage

Critical Fixes:
- RocksDB: Fix iterator bug causing data loss on restart
  * Iterator.last() with IteratorMode::End returned smallest key instead of largest
  * Caused last_index=1 when should be last_index=N
  * Fix: Use iterator.next() to get first (largest) key from End
- RocksDB: Add Drop implementations to flush on drop
- File: Implement physical deletion in purge() for durability
  * Previously only deleted from memory, entries reappeared after restart
  * Now rewrites file to physically remove purged entries
- File: Add Drop implementation for FileLogStore

Module Structure:
- Expose storage module as public API (d-engine-core/src/lib.rs)
- Rename rocksdb_engine_test.rs -> rocksdb_storage_engine_test.rs
- Add rocksdb_state_machine_test.rs
- Add integration test structure
## Problem

State machine read operations were not protected during snapshot restoration, potentially causing race conditions. During PHASE 2.5 (temporary DB swap to release lock), `get()` could access the temporary empty database, returning incorrect empty results instead of proper errors.
Add integration test validating Learner can safely receive AppendEntries during snapshot application, specifically testing PHASE 2.5 temporary DB window safety.

Test scenario:
- Start 3-node cluster, write 150 entries (triggers snapshot at 100)
- Add readonly Learner (Node 4) which begins receiving snapshot
- Concurrently write 50 more entries while Learner applies snapshot
- Verify Learner catches up to all 200 entries with correct values

Key validations:
- Learner successfully joins cluster (role=3 Learner, status=1 readonly)
- Snapshot generation at entry 100 completes successfully
- Concurrent writes during snapshot application succeed
- Learner receives and applies snapshot without data loss
- PHASE 2.5 temporary DB swap does not cause corruption

Technical notes:
- Single Learner requires readonly mode to avoid being kicked out
- Uses get_eventual() for Learner reads (not get_linearizable())
- All nodes' initial_cluster includes Learner configuration upfront

Location: tests/snapshot/concurrent_replication_embedded.rs
Critical fixes:
- Fix partial leader init when on_noop_committed fails
- Fix address mismatch in test cluster configs (3 files)
- Fix benchmark state pollution (TTL + lease benchmarks)

Documentation:
- Add performance-overview.md as top-level entry point
- Refactor standalone/embedded reports with mode context
- Fix date inconsistency (Dec 13 -> Jan 9, 2026)
- Fix relative paths to benches directory
- Fix data contradictions in conclusion section
- Recreate engine per iteration in watch_overhead benchmark to avoid state pollution
- Add bincode 1.3.3 to advisory ignore list (team considers it complete)

Fixes benchmark timeout caused by engine state accumulation across iterations. Addresses cargo-deny advisory warning for unmaintained but stable dependency.
Version Updates:
- Bump workspace version from 0.2.1 to 0.2.2
- Update CHANGELOG with v0.2.2 highlights