Conversation
…y log entry. feat #66: Refactor the Errors to distinguish between protocol logic errors and system-level errors. * feat #43: When a Raft leader is elected, it should first send an empty log entry. 1. implement verify_leadership_in_new_term 2. refactor verify_leadership_in_new_term and enforce_quorum_consensus * feat #66: Refactor the Errors to distinguish between protocol logic errors and system-level errors.
…r-friendly StorageEngine and StateMachine * feat #59: Refactor state_machine_commit_listener into a separate function to improve visibility across all newly spawned threads. * feat #79: Add snapshot feature * fix #90: Refactor: Decouple Client Command Protocol from Raft Internal Log Payload Type * feat #89: Add auto-discovery support for new learner nodes * feat #101: Revised learner join process with promotion/fail semantics * feat #45: Implemented first/last index for term to resolve replication conflicts * fix #106: Retry leadership noop confirmation until timeout * feat #102: Automatic node removal * feat #109: Enable RPC connection cache * fix #107: 1. Add back rpc connection params 2. Change parking lot sync lock to Tokio async lock in RaftMembership 3. optimize MembershipGuard for performance. Remove read locker. * feat #110: 1. add new events SnapshotCreated and LogPurgeCompleted 2. decouple Leader snapshot create to several events 3. refactor raft log index generation from pre_allocate_raft_logs_next_index to pre_allocate_id_range 4. optimize LocalLogBatch structure - based on performance analysis * feat #119: Make the StorageEngine trait more developer-friendly * feat #120: Make the StateMachine trait more developer-friendly * fix #121: Election: higher term vote request should be updated immediately and reset voted_for * feat #122: Refactor compress and decompress snapshot logic from StateMachine to StateMachineHandler * fix #123: Replace global tracing log with trace_test crate * feat #125: Make rocksdb adaptor as feature. Move sled implementation as example
…tart when path is already locked by another node
* refs #138: optimize to use long lived peer tasks for append entries request only * refs #138: try to fix race issue - Multiple threads can pass the initial check and all create new appenders * feat #138: 1. introduce crossbeam 2. optimize buffered_raft_log by creating a flush worker pool with configurable number of workers to handle persistence operations * feat #138: optimize flush_workers as configure * feat #138: optimize channel_capacity into configure * feat #139: work on a reliable monitoring metric for resource utilization * feat #140: Optimize proto bytes fields to use bytes::Bytes * feat #140: switch Into<Bytes> to AsRef<[u8]> * fix #141: Optimize RocksDB write path and Raft log loop for lower latency * feat 142: Add Read Consistency Policy Support and Lease-Based Read Optimization * feat #142: update v0.1.4 bench report * feat #142: add client doc guide * feat #143: Refactor gRPC compression configuration for Raft transport (performance optimization) - step1: disable server side client rpc compress response * feat #143: Refactor gRPC compression configuration for Raft transport (performance optimization) * feat #143: add performance diagram * feat #138: ready to publish v0.1.4 * feat #138: add ut * fix #146: buffered_raft_log.rs Code Reviews * fix #146: update bench report * fix #145: Bug: Undefined Behavior from Incorrect Lifetime Conversion in Mmap Zero-Copy Path * fix #148: fix rocksdb storage engine flush bug * fix #147: fix file state machine bug
|
Important Review skippedToo many files! 48 files out of 198 files are above the max files limit of 150. You can disable this status message by setting the 📝 WalkthroughWalkthroughWorkspace-wide v0.2.0 migration: adds workspace Cargo layout and new crates, introduces a unified client crate and KvClient trait with GrpcKvClient (put_with_ttl, watch), moves protobuf imports to d_engine_proto, adds lease/watch/TTL plumbing and Raft/replication/raft-log/lease traits, broadens many visibilities, adds tests, benchmarks, CI/workflow and tooling updates, and adds .claudeignore. No breaking runtime protocol changes beyond documented API/signature adjustments. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant App as Application
participant Client as d-engine Client
participant Pool as ConnectionPool
participant CM as ClusterManagementService (gRPC)
participant Leader as Leader Node (gRPC)
App->>Client: Client::builder(endpoints)
Client->>Pool: build_connections()
Pool->>CM: get_cluster_metadata()
CM-->>Pool: ClusterMembership{nodes, current_leader_id}
Pool->>Pool: parse_cluster_metadata() -> leader addr
Pool-->>Client: client ready (kv + cluster clients)
App->>Client: kv().put(key, val) / put_with_ttl(...)
Client->>Leader: ClientWriteRequest (gRPC)
Leader-->>Client: ClientResponse (success/error)
Client->>App: return result (mapped to KvClientError/KvResult)
App->>Client: kv().watch(key)
Client->>Leader: WatchRequest (stream)
Leader-->>Client: streaming WatchResponse events
Client->>App: stream events to caller
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Areas to focus during review:
Possibly related PRs
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Pull request overview
This release (v0.2.0) represents a major refactoring of the d-engine-core codebase, focusing on improved modularity, visibility control, and integration with the extracted d-engine-proto package. The changes establish clearer boundaries between internal implementation and public API while maintaining backward compatibility for test infrastructure.
Key changes:
- Extracted protocol buffer definitions to separate
d-engine-protopackage - Enhanced visibility controls with
pub/pub(crate)annotations for better API boundaries - Improved test infrastructure with
test-utilsfeature flag - Added watch functionality for client notifications
- Refactored storage layer with explicit safety contracts
Reviewed changes
Copilot reviewed 106 out of 373 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| d-engine-core/src/test_utils/mock/* | Updated mock utilities to use extracted proto definitions and improved visibility |
| d-engine-core/src/storage/* | Added comprehensive documentation and safety contracts for storage traits |
| d-engine-core/src/state_machine_handler/* | Refactored to use d-engine-proto and added watch notification support |
| d-engine-core/src/raft_role/* | Enhanced leader change notifications and atomic leader tracking |
| d-engine-core/src/replication/* | Added single-node cluster optimization and improved error handling |
| d-engine-core/src/network/* | Refactored transport layer with extracted proto definitions |
| d-engine-core/src/config/* | Updated configuration to use d-engine-proto types |
| d-engine-core/src/lib.rs | Restructured exports for clearer public API surface |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 18
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (13)
d-engine-core/src/storage/storage_test.rs (1)
39-60: Update constant references to use imported NodeRole variants.The code references
FOLLOWERandLEARNERconstants that are no longer imported. Based on the imports at lines 6-7, these should be updated to use theNodeRoleenum variants.Apply this diff to fix the role assignments:
NodeMeta { id: 2, address: "127.0.0.1:10000".to_string(), - role: FOLLOWER, + role: Follower as i32, status: NodeStatus::Active.into(), }, NodeMeta { id: 3, address: "127.0.0.1:10000".to_string(), - role: FOLLOWER, + role: Follower as i32, status: NodeStatus::Active.into(), }, NodeMeta { id: 4, address: "127.0.0.1:10000".to_string(), - role: LEARNER, + role: Learner as i32, status: NodeStatus::Active.into(), }, NodeMeta { id: 5, address: "127.0.0.1:10000".to_string(), - role: LEARNER, + role: Learner as i32, status: NodeStatus::Active.into(), },README.md (3)
57-66: Fix markdown: stray Rustuse ...line is outside the fenced code block.
use d-engine::{RaftCore, MemoryStorage, Config};appears outside a ```rust fence, which will render incorrectly (and the crate path also looks suspicious with the hyphen in code).
208-211: Typos/links: “d-eninge” and Apache URL.
Fix the project name typo and consider usinghttpsfor the Apache license link.
91-100: Consider standardizing import paths in README examples for consistency.The RocksDB example (line 96) uses
use d_engine::{...}while earlier examples useuse d_engine_server::{...}. Both import paths work correctly sinced_enginere-exports all types fromd_engine_serverunder theserverandrocksdbfeatures. However, standardizing to one approach throughout the documentation would improve clarity and reduce potential confusion for users.d-engine-core/src/test_utils/mock/mock_rpc.rs (1)
141-151: Fix copy/paste error message indiscover_leader(misleading debugging).
TheNonecase returns"No mock get_cluster_metadata response set"insidediscover_leader; this should referencediscover_leader.- None => Err(tonic::Status::unknown( - "No mock get_cluster_metadata response set", - )), + None => Err(tonic::Status::unknown( + "No mock discover_leader response set", + )),d-engine-core/src/state_machine_handler/default_state_machine_handler.rs (1)
1197-1209: Hardcoded offset in parse_snapshot_dirname doesn't match dynamic prefix length.Line 1209 uses a hardcoded offset of 9 (
"snapshot-".len() = 9), but the function now accepts a dynamicsnapshot_dir_prefixparameter. If the prefix differs from"snapshot-", the parsing will be incorrect.- // Remove fixed parts - let core = &name[9..name.len()]; // "snapshot-".len() = 9, + // Remove the prefix (e.g., "snapshot-") + let core = &name[snapshot_dir_prefix.len()..];d-engine-core/src/maybe_clone_oneshot.rs (2)
151-161: Potential panic in Clone implementation when test_inner is None.Line 158 uses
.unwrap()onself.test_inner.as_ref(), which will panic iftest_innerisNone. Although this code is gated bytest-utils, the panic could occur if someone constructs aMaybeCloneOneshotReceiverwithtest_inner: Noneand then tries to clone it.#[cfg(any(test, feature = "test-utils"))] impl<T: Send + Clone> Clone for MaybeCloneOneshotReceiver<T> { fn clone(&self) -> Self { let (_, receiver) = oneshot::channel(); Self { inner: receiver, - test_inner: Some(self.test_inner.as_ref().unwrap().resubscribe()), + test_inner: self.test_inner.as_ref().map(|rx| rx.resubscribe()), } } }
119-138: Busy-wait polling pattern causes excessive CPU usage.The
Future::pollimplementation at line 124 callscx.waker().wake_by_ref()when the channel is empty, creating a busy-wait loop that will continuously poll without yielding to the executor properly. This can cause 100% CPU usage.The proper pattern would be to use
tokio::pin!with the asyncrecv()method, or implement a proper waker registration. For a quick fix in test-only code:Err(broadcast::error::TryRecvError::Empty) => { - // Register a Waker to wake up the task when data arrives - cx.waker().wake_by_ref(); - Poll::Pending + // Broadcast channels don't support proper waker registration via try_recv. + // This is a limitation in test-only code; consider using recv().await pattern + // in async contexts instead. + cx.waker().wake_by_ref(); + Poll::Pending }Consider adding a doc comment warning about this behavior, or restructuring to avoid
Futureimpl entirely in test mode.d-engine-core/src/network/backgroup_snapshot_transfer_test.rs (1)
185-189: Comment is inconsistent with code.The comment states "2 chunks * 5KB = 10KB" but line 156 creates 3 chunks:
create_snapshot_stream(3, 5 * 1024). Update the comment to reflect the actual test data.- // Calculate expected minimum time: - // Total data = 2 chunks * 5KB = 10KB = 80,000 bits - // Bandwidth = 1 Mbps = 1,000,000 bps - // Minimum time = 80,000 / 1,000,000 = 0.08 seconds + // Calculate expected minimum time: + // Total data = 3 chunks * 5KB = 15KB = 120,000 bits + // Bandwidth = 1 Mbps = 1,000,000 bps + // Minimum time = 120,000 / 1,000,000 = 0.12 secondsd-engine-core/src/raft_role/candidate_state.rs (1)
483-493: Incorrect role in error message: "Learner" should be "Candidate".This appears to be a copy-paste error. The
current_roleand context message incorrectly identify the node as a "Learner" when it's actually a "Candidate".RaftEvent::LogPurgeCompleted(_purged_id) => { return Err(ConsensusError::RoleViolation { - current_role: "Learner", + current_role: "Candidate", required_role: "Leader", context: format!( - "Learner node {} should not receive LogPurgeCompleted event.", + "Candidate node {} should not receive LogPurgeCompleted event.", ctx.node_id ), } .into()); }d-engine-core/src/raft_role/mod.rs (1)
133-136: Minor: fix log typo (“wtih”).- debug!( - "New Shared State wtih, hard_state_from_db:{:?}, last_applied_index_option:{:?} ", + debug!( + "New Shared State with, hard_state_from_db:{:?}, last_applied_index_option:{:?} ", &hard_state_from_db, &last_applied_index_option );d-engine-client/src/pool.rs (1)
153-187: Normalize bootstrap endpoint addresses before dialing (avoid scheme-related hard failures).
You already normalizeNodeMeta.addressviaaddress_str, butendpointsgo straight intoEndpoint::try_from. If configs commonly use127.0.0.1:50051, pool creation will fail immediately.pub(super) async fn load_cluster_metadata( endpoints: &[String], config: &ClientConfig, ) -> std::result::Result<ClusterMembership, ClientApiError> { for addr in endpoints { - match Self::create_channel(addr.clone(), config).await { + let addr = address_str(addr); + match Self::create_channel(addr.clone(), config).await { Ok(channel) => {d-engine-client/src/error.rs (1)
150-204: Preserve server error messages in the defaultStatusmapping.
Right now the_arm dropsstatus.message()and replaces it with “Unhandled status code…”, which makes debugging harder.- _ => Self::Business { - code: ErrorCode::Uncategorized, - message: format!("Unhandled status code: {code:?}"), - required_action: None, - }, + _ => Self::Business { + code: ErrorCode::Uncategorized, + message: format!("Unhandled status code: {code:?}: {message}"), + required_action: None, + }, }
🧹 Nitpick comments (46)
d-engine-core/src/maybe_clone_oneshot_test.rs (2)
1-227: Consider adding error case tests for completeness.While the happy path coverage is excellent, consider adding tests for error/failure scenarios to ensure robust behavior:
- Sending after all receivers are dropped
- Receiving after sender is dropped
- Double-send on the oneshot channel (if applicable)
- Timeout scenarios or cancellation
These tests would validate that
MaybeCloneOneshothandles edge cases and failures gracefully.
49-50: Optional: Simplify variable handling.The intermediate
rx1_mutandrx2_mutvariables are unnecessary. You could declarerx1andrx2as mutable directly or use them inline:- let mut rx1_mut = rx1; - let mut rx2_mut = rx2; - let received1 = rx1_mut.recv().await.expect("Failed to receive rx1"); - let received2 = rx2_mut.recv().await.expect("Failed to receive rx2"); + let received1 = rx1.recv().await.expect("Failed to receive rx1"); + let received2 = rx2.recv().await.expect("Failed to receive rx2");d-engine-core/src/config/config_test.rs (1)
9-12: LGTM! Unsafe block correctly guards environment variable removal.The
unsafeblock is properly added as required by Rust's safety model, and the safety justification is reasonable for test code where the#[serial]attribute ensures sequential execution.Optionally, consider refining the safety comment to be more precise:
- // SAFETY: Test-only cleanup in single-threaded test context + // SAFETY: Test-only cleanup; all callers are marked #[serial] to prevent + // concurrent execution, and no other threads access env vars during tests unsafe { std::env::remove_var(&key); }MIGRATION_GUIDE.md (1)
20-22: Add language specifier to fenced code block.The code fence is missing a language identifier, which improves syntax highlighting and follows Markdown best practices.
Apply this diff:
-```` +```text Entry fields: ..., ttl_secs: u32 (4 bytes, relative TTL)</blockquote></details> <details> <summary>d-engine-client/src/utils.rs (1)</summary><blockquote> `28-40`: **Optional: Simplify by reducing duplication.** The function is correct and clear, but you could reduce duplication by combining the scheme-preservation cases. Apply this diff to consolidate the logic: ```diff pub(crate) fn address_str(addr: &str) -> String { - // Detect if the address already has a scheme - if addr.starts_with("https://") { - // Preserve HTTPS scheme - addr.to_string() - } else if addr.starts_with("http://") { - // Preserve HTTP scheme + if addr.starts_with("https://") || addr.starts_with("http://") { + // Preserve existing scheme addr.to_string() } else { // No scheme present, default to HTTP - format!("http://{addr}",) + format!("http://{addr}") } }README.md (1)
200-202: Docs claim is very strong—ensure it’s accurate or soften it.
“1000+ integration tests” and “Jepsen validation” are credibility-critical statements; if they’re not continuously true for v0.2.0, consider qualifying (e.g., “has CI integration tests and Jepsen coverage”) or linking to evidence.d-engine-core/src/test_utils/mock/mock_rpc.rs (1)
155-194:watch()returns an immediately-closed stream—verify that’s what tests expect.
Because_txis dropped right away, the client will observe EOF immediately. If you need a “quiet but open” stream for watch tests, store the sender onMockRpcService(or return a stream from a test-controlled receiver).d-engine-core/src/state_machine_handler/default_state_machine_handler.rs (2)
717-737: Silent failures in notify_watchers may hide decoding issues.When
WriteCommand::decodefails at line 721, the error is silently ignored. While this is acceptable for non-command payloads, decoding failures for valid command payloads could indicate data corruption or protocol mismatches that should be logged.Consider adding debug-level logging for decode failures:
// Decode the WriteCommand - if let Ok(write_cmd) = WriteCommand::decode(bytes.as_ref()) { + match WriteCommand::decode(bytes.as_ref()) { + Ok(write_cmd) => { match write_cmd.operation { Some(Operation::Insert(insert)) => { watch_mgr.notify_put(insert.key, insert.value); } Some(Operation::Delete(delete)) => { watch_mgr.notify_delete(delete.key); } None => { // No operation, skip } } + } + Err(e) => { + tracing::debug!("Failed to decode WriteCommand: {}", e); + } }
1185-1194: Function namezero_copy_bytes_from_mmapis misleading since it copies data.The function uses
Bytes::copy_from_slice(slice)which performs a full memory copy, not zero-copy. The documentation and naming suggest zero-copy behavior that isn't actually implemented.Either rename the function to reflect actual behavior, or implement true zero-copy using
Bytes::from_owner:fn zero_copy_bytes_from_mmap( mmap_arc: Arc<Mmap>, start: usize, end: usize, ) -> Bytes { - // Get a slice of the memory map - let slice = &mmap_arc[start..end]; - Bytes::copy_from_slice(slice) + // True zero-copy: use Bytes::from_owner to share the Arc<Mmap> + let len = end - start; + Bytes::from_owner(mmap_arc).slice(start..end) }Alternatively, rename to
bytes_from_mmapto avoid the misleading "zero_copy" prefix.d-engine-core/src/storage/raft_log.rs (3)
253-260: Defaultwait_durableimplementation may mask bugs in MemFirst implementations.The default no-op implementation is documented for DiskFirst strategy, but implementers using MemFirst might accidentally rely on the default and violate durability guarantees. Consider making this method required (no default) to force explicit implementation.
If keeping the default, consider adding a compile-time or runtime check:
async fn wait_durable( &self, index: u64, ) -> Result<()> { - // Default implementation for DiskFirst: no-op (already durable) + // Default implementation for DiskFirst: no-op (already durable) + // WARNING: MemFirst implementations MUST override this method + #[cfg(debug_assertions)] + tracing::trace!(index, "wait_durable: using default no-op (DiskFirst assumed)"); let _ = index; Ok(()) }
46-55: Inconsistency betweenfirst_entry_idreturn value andis_emptycheck.The doc states
first_entry_id()returns0if log is empty, butis_empty()at line 91 states it's equivalent to(first_entry_id() == 0). However, after a purge operation,first_entry_id()could be non-zero while the log is technically not empty. The documentation should clarify the distinction between "empty log" and "log with entries purged".Consider clarifying the documentation:
/// Returns the smallest log index (inclusive). /// /// # Returns - /// - `0` if log is empty + /// - `0` if log is empty (never written to) /// - First index otherwise (typically 1 after initialization) + /// - After purge: first non-purged index (NOT 0)
374-377:purge_logs_up_totakesLogIdbut only needs index.The method accepts a full
LogIdbut the documentation at line 357 says "LogId of last entry to purge" while the actual need is just the index. Consider whether term validation is needed here or if justu64would suffice.If term validation isn't required:
- async fn purge_logs_up_to( - &self, - cutoff_index: LogId, - ) -> Result<()>; + async fn purge_logs_up_to( + &self, + cutoff_index: u64, + ) -> Result<()>;Or add documentation explaining why
LogIdis needed.d-engine-core/src/membership.rs (1)
17-32:ConnectionType::all()allocates a new Vec on each call.For a frequently-called method returning static data, consider using a const array or lazy_static.
impl ConnectionType { - pub fn all() -> Vec<ConnectionType> { - vec![ - ConnectionType::Control, - ConnectionType::Data, - ConnectionType::Bulk, - ] - } + pub const ALL: &'static [ConnectionType] = &[ + ConnectionType::Control, + ConnectionType::Data, + ConnectionType::Bulk, + ]; + + pub fn all() -> &'static [ConnectionType] { + Self::ALL + } }d-engine-core/src/raft_test.rs (1)
7-62: Channel tests verify basic mpsc behavior rather than Raft logic.These tests (
test_leader_change_listener_registration,test_multiple_listeners,test_no_leader_notification,test_channel_closed) primarily validatetokio::sync::mpscchannel semantics rather than Raft-specific code. While they serve as documentation of expected behavior, consider whether they add sufficient value given they test standard library functionality.d-engine-client/Cargo.toml (1)
29-32: Consider migrating remaining dependencies to workspace.The dependencies
arc-swap,tokio-stream, andasync-traituse explicit versions while others useworkspace = true. For consistency and centralized version management, consider adding these to the workspace dependencies.d-engine-core/src/state_machine_handler/default_state_machine_handler_test.rs (2)
312-380: Significant commented-out code block.This ~70-line commented-out test (
test_install_snapshot_case1) should either be removed if no longer needed, or restored and fixed if still valuable. Leaving large commented code blocks can cause confusion about test coverage.Consider either:
- Removing this commented block if the test is obsolete
- Restoring and updating the test if it provides valuable coverage
- Converting to a tracking issue if the test needs significant rework
1527-1611: Additional large commented-out code block.Another ~85-line commented-out helper function (
mock_node_with_rpc_service) that appears to be unused. This should be cleaned up for the same reasons as the previous comment.d-engine-core/src/storage/state_machine_test.rs (1)
346-374: Consider resetting state machine between measurements for cleaner benchmarks.The scalability test runs the large batch (1000 entries) on a state machine that already contains 100 entries from the small batch. While this still detects O(N²) complexity, the measurements may be influenced by the existing state. For a purer comparison, consider using a fresh state machine instance for each measurement.
That said, the current approach tests realistic incremental workloads, which is also valuable.
d-engine-core/src/storage/lease.rs (2)
31-31: Consider standardizing to English in documentation.The doc comment contains Chinese characters "(租约)". For consistency with the rest of the codebase documentation, consider using English only or adding an English translation.
114-129:get_expired_keyshas non-obvious mutation semantics.The method name uses
get_prefix but performs mutation ("Removes returned keys from internal lease indexes"). While documented, this could be surprising. Consider renaming todrain_expired_keysortake_expired_keysto make the side effect clearer in the method name.d-engine-client/src/kv_client.rs (1)
137-172: Consider documenting theget_multisignature difference.The
get_multimethod takes&[Bytes]while other methods useimpl AsRef<[u8]> + Send. This is intentional (to avoid allocation issues with iterators), but a brief doc comment explaining whyBytesis required here could help implementors understand the design choice.d-engine-client/src/pool_test.rs (1)
103-121: Test name doesn't match behavior.The test is named
test_create_channel_successbut it actually tests the failure case with an invalid address. Consider renaming totest_create_channel_invalid_address_failsor similar for clarity.#[tokio::test] #[traced_test] -async fn test_create_channel_success() { +async fn test_create_channel_invalid_address_fails() { let config = ClientConfig { id: get_now_as_u32(), connect_timeout: Duration::from_millis(1000),d-engine-core/src/storage/mod.rs (1)
12-17: Consider gating test modules behind a feature flag.The test modules (
state_machine_test,storage_engine_test) arepub modwith#[doc(hidden)]. If these are only needed for testing, consider using#[cfg(any(test, feature = "test-utils"))]to avoid including test code in production builds.#[doc(hidden)] pub use raft_log::*; -#[doc(hidden)] -pub mod state_machine_test; -#[doc(hidden)] -pub mod storage_engine_test; +#[cfg(any(test, feature = "test-utils"))] +#[doc(hidden)] +pub mod state_machine_test; +#[cfg(any(test, feature = "test-utils"))] +#[doc(hidden)] +pub mod storage_engine_test;Cargo.toml (1)
50-125: Remove commented-out code.Large blocks of commented-out configuration (old package settings, dependencies, dev-dependencies) clutter the manifest and should be removed. If historical reference is needed, git history preserves it.
d-engine-client/src/kv_error.rs (1)
56-63: Consider adding a dedicatedNotLeadervariant for better client handling.Currently,
ErrorCode::NotLeadermaps toKvClientError::ServerError(message), which loses the semantic distinction. Clients often need to handle "not leader" specially (e.g., for retry with redirection). A dedicated variant would enable:KvClientError::NotLeader { leader_id: Option<String>, leader_address: Option<String> }This would allow clients to automatically redirect to the leader without parsing error messages.
d-engine-client/src/proto/client_ext.rs (1)
49-62: Consider avoiding unnecessary clone ininto_read_results.The current implementation clones the results vector via
.clone().into_iter(). Sinceselfis consumed (moved), you can useinto_iter()directly on the owned data.fn into_read_results(self) -> std::result::Result<Vec<Option<ClientResult>>, ClientApiError> { self.validate_error()?; - match &self.success_result { + match self.success_result { Some(SuccessResult::ReadData(data)) => data .results - .clone() .into_iter() .map(|item| { Ok(Some(ClientResult { key: item.key, value: item.value, })) }) .collect(), - _ => { - let found = match &self.success_result { + other => { + let found = match &other { Some(SuccessResult::WriteAck(_)) => "WriteAck", None => "None", _ => "Unknown", };benches/d-engine-bench/src/main.rs (1)
120-140: Consider documenting the key generation behavior whenkey_spaceis 0.The current implementation will panic on division by zero if
key_spaceisSome(0). Consider adding validation or documenting this edge case.fn generate_prefixed_key( sequential: bool, key_size: usize, index: u64, key_space: Option<u64>, ) -> String { // Apply key space limit if specified - let effective_index = key_space.map_or(index, |space| index % space); + let effective_index = key_space.map_or(index, |space| { + debug_assert!(space > 0, "key_space must be > 0"); + if space == 0 { index } else { index % space } + }); if sequential {d-engine-core/src/event.rs (1)
191-195: Placeholder mapping forStepDownSelfRemovedshould use a dedicated variant.Mapping
StepDownSelfRemovedtoTestEvent::CreateSnapshotEventas a placeholder is confusing and could cause issues in test assertions. Consider adding a dedicatedTestEvent::StepDownSelfRemovedvariant.pub enum TestEvent { // ... existing variants ... PromoteReadyLearners, + StepDownSelfRemoved, } // In raft_event_to_test_event: RaftEvent::StepDownSelfRemoved => { - // StepDownSelfRemoved is handled at Raft level, not converted to TestEvent - // This is a control flow event, not a user-facing event - TestEvent::CreateSnapshotEvent // Placeholder - this event won't be emitted to tests + TestEvent::StepDownSelfRemoved }d-engine-client/src/lib.rs (2)
57-65: Consider explicit re-exports instead of wildcard exports.Using
pub use module::*for multiple modules can lead to unintended public API exposure and potential naming conflicts. Consider explicitly re-exporting only the intended public types.For example:
pub use builder::ClientBuilder; pub use cluster::ClusterClient; pub use config::ClientConfig; pub use error::ClientApiError; // ... etc
171-191: Therefreshmethod safely updates connection state.The implementation correctly:
- Loads current state
- Creates a new pool with updated endpoints
- Atomically swaps the inner state
However, the
&mut selfrequirement may be overly restrictive sinceArcSwap::storeonly needs&self.- pub async fn refresh( - &mut self, + pub async fn refresh( + &self, new_endpoints: Option<Vec<String>>, ) -> std::result::Result<(), ClientApiError> {This would allow refreshing connections without requiring mutable access to the
Client, which is beneficial for shared usage patterns.d-engine-core/src/replication/replication_handler.rs (1)
96-106: Partition logic runs unnecessarily for single-node clusters.When
is_single_nodeis true,replication_targetsis empty, making the partition and learner trace redundant. Consider moving this block inside the multi-node path.+ // Multi-node cluster: perform replication to peers + if is_single_node { + debug!( + "Single-node cluster (leader={}): logs persisted, quorum automatically achieved", + self.my_id + ); + return Ok(AppendResults { + commit_quorum_achieved: true, + peer_updates: HashMap::new(), + learner_progress: HashMap::new(), + }); + } + // Separate Voters and Learners let (voters, learners): (Vec<_>, Vec<_>) = replication_targets .iter() .partition(|node| node.status == NodeStatus::Active as i32);This avoids the empty partition operation for single-node clusters.
d-engine-core/src/raft_role/candidate_state.rs (1)
571-574: Consider defensive error handling instead ofunreachable!().While the comment indicates this should never be reached, using
unreachable!()will cause a panic if the event is ever received due to a bug in the event routing logic. Consider returning an error instead for more graceful handling.RaftEvent::StepDownSelfRemoved => { - // Unreachable: handled at Raft level before reaching RoleState - unreachable!("StepDownSelfRemoved should be handled in Raft::run()"); + // Should be handled at Raft level before reaching RoleState + return Err(ConsensusError::RoleViolation { + current_role: "Candidate", + required_role: "Raft", + context: format!( + "StepDownSelfRemoved should be handled in Raft::run(), not in CandidateState for node {}", + ctx.node_id + ), + } + .into()); }d-engine-client/src/grpc_kv_client.rs (1)
18-19: Consider removing#[allow(unused_imports)]if imports are used.
CoreKvClientandKvResultare used in the trait implementation. IfKvClientErroris intentionally imported for re-export or documentation purposes, consider using a more targeted allow attribute or removing the unused import.d-engine-core/src/raft_role/mod.rs (1)
7-9: Makeraft_role_testcfg consistent withtest-utils-gated helpers.
Right nowRaftRole::{voted_for,commit_index,match_index,next_index,follower_role_i32}can be built withfeature="test-utils", butmod raft_role_test;is#[cfg(test)]only—so any feature-gated internal tests/helpers won’t compile/run under that feature setup.-#[cfg(test)] +#[cfg(any(test, feature = "test-utils"))] mod raft_role_test;Also applies to: 306-316, 322-328, 363-366
d-engine-client/src/mock_rpc_service.rs (1)
84-90: ReturnResult<Channel, _>frommock_channel_with_portinstead ofexpect.
This is a test utility, but it’s used as infrastructure—better to bubble failures up than abort the whole test process.- pub(crate) async fn mock_channel_with_port(port: u16) -> Channel { - Channel::from_shared(format!("http://127.0.0.1:{port}")) - .expect("valid address") - .connect() - .await - .expect("connection failed") + pub(crate) async fn mock_channel_with_port( + port: u16, + ) -> std::result::Result<Channel, tonic::transport::Error> { + Channel::from_shared(format!("http://127.0.0.1:{port}")) + .expect("valid address") + .connect() + .await }d-engine-core/src/config/raft.rs (2)
538-620: Prefer an enum forLeaseConfig.cleanup_strategy(avoid stringly-typed config).
Validation is good, but this still makes downstream logic more error-prone and less discoverable (docs/IDE).
1384-1416: Gate watch-size warnings behindenabled(avoid noisy configs when feature is off).
Right now huge values warn even when watch is disabled.- if self.event_queue_size > 100_000 { + if self.enabled && self.event_queue_size > 100_000 { warn!( ... ); } @@ - if self.watcher_buffer_size > 1000 { + if self.enabled && self.watcher_buffer_size > 1000 { warn!( ... ); }d-engine-client/src/error.rs (1)
55-98: Remove large blocks of commented-out code (old error enums / categories).
This adds noise and makes the new API harder to read.Also applies to: 207-238
d-engine-client/src/mock_rpc.rs (2)
71-76: Redundant clone in error mapping.The
map_err(|e| e.clone())is redundant sinceeis already moved into the closure andtonic::StatusimplementsClone. You can simplify this.- (Some(f), Some(port)) => f(port).map(tonic::Response::new).map_err(|e| e.clone()), + (Some(f), Some(port)) => f(port).map(tonic::Response::new),
134-142: Inconsistent watch() implementation across mock files.This implementation returns
Err(tonic::Status::unimplemented(...)), while the similar mock ind-engine-core/src/test_utils/mock/mock_rpc.rs(lines 185-193) returns an empty stream viaOk(tonic::Response::new(...)). Consider aligning these implementations for consistent test behavior.d-engine-core/src/raft.rs (2)
163-171: Silent error handling when notifying leader change listeners.The
let _ = tx.send(...)silently ignores failures when the channel is closed. Consider logging a warning or debug message when a listener's channel is unavailable, similar to hownotify_new_commit(line 420-421) logs errors.fn notify_leader_change( &self, leader_id: Option<u32>, term: u64, ) { for tx in &self.leader_change_listener { - let _ = tx.send((leader_id, term)); + if let Err(e) = tx.send((leader_id, term)) { + tracing::debug!("Leader change listener channel closed: {:?}", e); + } } }
434-440: Inconsistent error handling in test notification methods.
notify_role_transitionusesexpect("should succeed")which will panic if the channel is closed, whilenotify_leader_changesilently ignores errors. Consider using consistent error handling patterns across notification methods.pub fn notify_role_transition(&self) { let new_role_i32 = self.role.as_i32(); for tx in &self.test_role_transition_listener { - tx.send(new_role_i32).expect("should succeed"); + if let Err(e) = tx.send(new_role_i32) { + tracing::warn!("Role transition listener send failed: {:?}", e); + } } }d-engine-core/src/raft_role/leader_state.rs (3)
1538-1548: Use tracing macros instead of println! in library code.Direct
println!statements in library code bypass structured logging and can interfere with applications that control stdout. Consider usingtracing::info!consistently.- for learner_id in &ready_learners { - let match_index = learner_progress.get(learner_id).and_then(|mi| *mi).unwrap_or(0); - crate::utils::cluster_printer::print_leader_promoting_learner( - self.node_id(), - *learner_id, - match_index, - leader_commit_index, - ); - } + for learner_id in &ready_learners { + let match_index = learner_progress.get(learner_id).and_then(|mi| *mi).unwrap_or(0); + info!( + leader_id = self.node_id(), + learner_id = *learner_id, + match_index, + leader_commit_index, + "Promoting learner to voter" + ); + }
2092-2093: Replace println! with structured logging.Similar to the previous comment, this
println!should use tracing macros for consistent logging behavior.- println!("============== Promotion successful ================"); - println!("Now cluster members: {:?}", membership.voters().await); + info!( + members = ?membership.voters().await, + "Promotion successful" + );
2305-2311: Replace println! with tracing for stale learner notification.This
println!should use structured logging instead.- println!( - " - ===================== - Learner {node_id} is stalled - ===================== - ", - ); info!("Learner {} is stalled", node_id);d-engine-core/src/lib.rs (1)
21-36: Extensive wildcard re-exports expand public API surface.Multiple
pub use module::*statements significantly expand the crate's public API. This can lead to:
- Namespace pollution with internal types unintentionally exposed
- Potential naming conflicts with downstream crates
- Difficulty tracking API changes
Consider using explicit re-exports for the intended public API surface, or document which types from each module are part of the stable public API.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (11)
Cargo.lockis excluded by!**/*.lockbenches/d-engine-bench/Cargo.lockis excluded by!**/*.lockd-engine-client/Cargo.lockis excluded by!**/*.lockd-engine-proto/src/generated/d_engine.client.rsis excluded by!**/generated/**d-engine-proto/src/generated/d_engine.common.error.rsis excluded by!**/generated/**d-engine-proto/src/generated/d_engine.common.rsis excluded by!**/generated/**d-engine-proto/src/generated/d_engine.error.rsis excluded by!**/generated/**d-engine-proto/src/generated/d_engine.server.cluster.rsis excluded by!**/generated/**d-engine-proto/src/generated/d_engine.server.election.rsis excluded by!**/generated/**d-engine-proto/src/generated/d_engine.server.replication.rsis excluded by!**/generated/**d-engine-proto/src/generated/d_engine.server.storage.rsis excluded by!**/generated/**
📒 Files selected for processing (103)
.claudeignore(1 hunks).github/workflows/benchmark.yml(1 hunks).github/workflows/ci.yml(1 hunks).github/workflows/commit-message-check.yml(1 hunks).github/workflows/dependency-audit.yml(2 hunks).gitignore(1 hunks)CHANGELOG.md(1 hunks)Cargo.toml(1 hunks)MIGRATION_GUIDE.md(1 hunks)Makefile(1 hunks)README.md(5 hunks)benches/d-engine-bench/Cargo.toml(1 hunks)benches/d-engine-bench/reports/v0.1.4/report_20251205_remeasured.md(1 hunks)benches/d-engine-bench/reports/v0.2.0/report_20251205.md(1 hunks)benches/d-engine-bench/reports/v0.2.0/report_20251209.md(1 hunks)benches/d-engine-bench/reports/v0.2.0/report_20251210.md(1 hunks)benches/d-engine-bench/src/main.rs(9 hunks)clippy.toml(1 hunks)config/base/raft.toml(2 hunks)d-engine-client/Cargo.toml(1 hunks)d-engine-client/src/builder.rs(3 hunks)d-engine-client/src/cluster.rs(2 hunks)d-engine-client/src/cluster_test.rs(5 hunks)d-engine-client/src/config.rs(1 hunks)d-engine-client/src/error.rs(8 hunks)d-engine-client/src/grpc_kv_client.rs(6 hunks)d-engine-client/src/kv_client.rs(1 hunks)d-engine-client/src/kv_error.rs(1 hunks)d-engine-client/src/kv_test.rs(29 hunks)d-engine-client/src/lib.rs(1 hunks)d-engine-client/src/mock_rpc.rs(1 hunks)d-engine-client/src/mock_rpc_service.rs(1 hunks)d-engine-client/src/pool.rs(10 hunks)d-engine-client/src/pool_test.rs(1 hunks)d-engine-client/src/proto/client_ext.rs(1 hunks)d-engine-client/src/proto/client_ext_test.rs(1 hunks)d-engine-client/src/proto/mod.rs(1 hunks)d-engine-client/src/scoped_timer.rs(2 hunks)d-engine-client/src/utils.rs(1 hunks)d-engine-client/src/utils_test.rs(1 hunks)d-engine-core/Cargo.toml(1 hunks)d-engine-core/src/commit_handler/default_commit_handler.rs(6 hunks)d-engine-core/src/commit_handler/default_commit_handler_test.rs(31 hunks)d-engine-core/src/commit_handler/mod.rs(1 hunks)d-engine-core/src/config/cluster.rs(2 hunks)d-engine-core/src/config/config_test.rs(2 hunks)d-engine-core/src/config/mod.rs(1 hunks)d-engine-core/src/config/network_test.rs(1 hunks)d-engine-core/src/config/raft.rs(13 hunks)d-engine-core/src/config/tls_test.rs(2 hunks)d-engine-core/src/election/election_handler.rs(4 hunks)d-engine-core/src/election/election_handler_test.rs(1 hunks)d-engine-core/src/election/mod.rs(2 hunks)d-engine-core/src/errors.rs(1 hunks)d-engine-core/src/errors_test.rs(1 hunks)d-engine-core/src/event.rs(4 hunks)d-engine-core/src/lib.rs(2 hunks)d-engine-core/src/maybe_clone_oneshot.rs(15 hunks)d-engine-core/src/maybe_clone_oneshot_test.rs(1 hunks)d-engine-core/src/membership.rs(1 hunks)d-engine-core/src/network/backgroup_snapshot_transfer.rs(1 hunks)d-engine-core/src/network/backgroup_snapshot_transfer_test.rs(2 hunks)d-engine-core/src/network/mod.rs(7 hunks)d-engine-core/src/purge/default_executor.rs(2 hunks)d-engine-core/src/purge/default_executor_test.rs(1 hunks)d-engine-core/src/purge/mod.rs(2 hunks)d-engine-core/src/raft.rs(10 hunks)d-engine-core/src/raft_context.rs(3 hunks)d-engine-core/src/raft_role/candidate_state.rs(16 hunks)d-engine-core/src/raft_role/follower_state.rs(15 hunks)d-engine-core/src/raft_role/leader_state.rs(34 hunks)d-engine-core/src/raft_role/learner_state.rs(16 hunks)d-engine-core/src/raft_role/mod.rs(9 hunks)d-engine-core/src/raft_role/raft_role_test.rs(1 hunks)d-engine-core/src/raft_role/role_state.rs(6 hunks)d-engine-core/src/raft_test.rs(1 hunks)d-engine-core/src/replication/mod.rs(2 hunks)d-engine-core/src/replication/replication_handler.rs(9 hunks)d-engine-core/src/state_machine_handler/default_state_machine_handler.rs(12 hunks)d-engine-core/src/state_machine_handler/default_state_machine_handler_test.rs(46 hunks)d-engine-core/src/state_machine_handler/mod.rs(4 hunks)d-engine-core/src/state_machine_handler/snapshot_assembler.rs(2 hunks)d-engine-core/src/state_machine_handler/snapshot_assembler_test.rs(1 hunks)d-engine-core/src/state_machine_handler/snapshot_policy/composite_test.rs(2 hunks)d-engine-core/src/state_machine_handler/snapshot_policy/log_size.rs(1 hunks)d-engine-core/src/state_machine_handler/snapshot_policy/log_size_test.rs(7 hunks)d-engine-core/src/state_machine_handler/snapshot_policy/mod.rs(2 hunks)d-engine-core/src/state_machine_handler/snapshot_policy/time_based_test.rs(4 hunks)d-engine-core/src/storage/lease.rs(1 hunks)d-engine-core/src/storage/mod.rs(1 hunks)d-engine-core/src/storage/raft_log.rs(1 hunks)d-engine-core/src/storage/snapshot_path_manager.rs(1 hunks)d-engine-core/src/storage/state_machine.rs(2 hunks)d-engine-core/src/storage/state_machine_test.rs(4 hunks)d-engine-core/src/storage/storage_engine.rs(3 hunks)d-engine-core/src/storage/storage_engine_test.rs(3 hunks)d-engine-core/src/storage/storage_test.rs(1 hunks)d-engine-core/src/test_utils/common.rs(4 hunks)d-engine-core/src/test_utils/common_test.rs(1 hunks)d-engine-core/src/test_utils/entry_builder.rs(1 hunks)d-engine-core/src/test_utils/entry_builder_test.rs(1 hunks)d-engine-core/src/test_utils/mock/mock_raft_builder.rs(6 hunks)d-engine-core/src/test_utils/mock/mock_rpc.rs(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (22)
d-engine-client/src/proto/client_ext_test.rs (1)
d-engine-client/src/error.rs (2)
code(401-409)message(412-420)
d-engine-client/src/builder.rs (1)
d-engine-client/src/grpc_kv_client.rs (1)
new(40-42)
d-engine-client/src/pool_test.rs (1)
d-engine-client/src/pool.rs (5)
parse_cluster_metadata(190-212)load_cluster_metadata(154-187)create_channel(116-130)create(39-54)get_all_channels(138-142)
d-engine-client/src/cluster.rs (1)
d-engine-client/src/lib.rs (1)
cluster(151-153)
d-engine-core/src/raft_test.rs (1)
d-engine-core/src/raft.rs (2)
new(79-93)new(101-143)
d-engine-core/src/storage/state_machine_test.rs (2)
d-engine-server/src/storage/lease_integration_test.rs (2)
create_insert_entry(129-153)create_insert_entry(700-724)d-engine-proto/src/exts/client_ext.rs (1)
insert(53-65)
d-engine-core/src/storage/mod.rs (1)
d-engine-server/src/node/builder.rs (2)
state_machine(197-203)storage_engine(188-194)
d-engine-client/src/kv_test.rs (4)
d-engine-client/src/lib.rs (1)
cluster(151-153)d-engine-client/src/grpc_kv_client.rs (1)
new(40-42)d-engine-client/src/mock_rpc_service.rs (1)
simulate_client_write_mock_server(154-183)d-engine-proto/src/exts/client_ext.rs (2)
write_success(105-111)client_error(139-145)
d-engine-core/src/storage/storage_engine_test.rs (3)
d-engine-server/src/storage/adaptors/rocksdb/rocksdb_engine_test.rs (2)
create_test_command_payload(100-120)new(26-33)examples/sled-cluster/src/sled_engine_test.rs (2)
create_test_command_payload(97-118)new(23-26)d-engine-proto/src/exts/client_ext.rs (1)
insert(53-65)
d-engine-core/src/raft_role/raft_role_test.rs (1)
d-engine-core/src/raft_role/mod.rs (1)
current_leader(147-152)
d-engine-core/src/raft_role/follower_state.rs (2)
d-engine-core/src/utils/cluster.rs (1)
error(20-25)d-engine-core/src/raft_role/leader_state.rs (1)
can_purge_logs(1798-1813)
d-engine-core/src/commit_handler/default_commit_handler_test.rs (1)
d-engine-core/src/commit_handler/default_commit_handler.rs (1)
is_self_removal_config(231-240)
d-engine-client/src/lib.rs (3)
d-engine-client/src/grpc_kv_client.rs (1)
new(40-42)d-engine-client/src/builder.rs (1)
new(35-40)d-engine-client/src/pool.rs (2)
refresh(63-80)create(39-54)
d-engine-core/src/election/election_handler.rs (2)
d-engine-core/src/lib.rs (2)
if_higher_term_found(74-86)is_target_log_more_recent(95-103)d-engine-core/src/membership.rs (1)
members(42-42)
d-engine-core/src/raft_role/candidate_state.rs (2)
d-engine-core/src/raft_role/mod.rs (1)
can_serve_read_locally(429-466)d-engine-core/src/raft_role/leader_state.rs (2)
send_become_follower_event(1749-1765)send_replay_raft_event(2441-2450)
d-engine-core/src/replication/replication_handler.rs (2)
d-engine-core/src/utils/cluster.rs (1)
is_majority(8-13)d-engine-server/src/membership/raft_membership.rs (1)
voters(106-119)
d-engine-core/src/storage/lease.rs (1)
d-engine-core/src/storage/state_machine.rs (2)
len(62-62)is_empty(66-68)
d-engine-client/src/kv_client.rs (2)
d-engine-client/src/grpc_kv_client.rs (10)
put(50-85)put(404-410)put_with_ttl(100-137)put_with_ttl(412-419)get(217-222)get(421-430)get_multi(249-254)get_multi(432-440)delete(151-180)delete(442-447)d-engine-server/src/node/client/local_kv.rs (7)
put(153-189)put(451-457)put_with_ttl(459-499)get(501-506)get_multi(508-513)delete(385-417)delete(515-520)
d-engine-client/src/grpc_kv_client.rs (3)
d-engine-server/src/node/client/local_kv.rs (7)
put_with_ttl(459-499)put(153-189)put(451-457)get(501-506)get_multi(508-513)delete(385-417)delete(515-520)d-engine-client/src/kv_client.rs (5)
put_with_ttl(101-106)put(75-79)get(137-140)get_multi(169-172)delete(194-197)d-engine-server/src/storage/adaptors/file/file_state_machine.rs (2)
new(193-211)get(987-1006)
d-engine-core/src/state_machine_handler/default_state_machine_handler.rs (2)
d-engine-proto/src/exts/client_ext.rs (2)
insert(53-65)delete(92-97)d-engine-core/src/watch/manager.rs (1)
new(276-297)
d-engine-core/src/test_utils/mock/mock_raft_builder.rs (1)
d-engine-server/src/test_utils/mock/mock_node_builder.rs (2)
build_context(164-201)new(138-159)
d-engine-core/src/state_machine_handler/default_state_machine_handler_test.rs (1)
d-engine-core/src/state_machine_handler/default_state_machine_handler.rs (2)
new(744-770)new_without_watch(777-794)
🪛 checkmake (0.2.2)
Makefile
[warning] 54-54: Target body for "help" exceeds allowed length of 5 (35).
(maxbodylength)
[warning] 403-403: Target body for "audit" exceeds allowed length of 5 (6).
(maxbodylength)
[warning] 412-412: Target body for "deny" exceeds allowed length of 5 (6).
(maxbodylength)
[warning] 444-444: Target body for "troubleshoot" exceeds allowed length of 5 (34).
(maxbodylength)
[warning] 18-18: Missing required phony target "clean"
(minphony)
[warning] 115-115: Target "install" should be declared PHONY.
(phonydeclared)
[warning] 393-393: Target "doc" should be declared PHONY.
(phonydeclared)
🪛 GitHub Check: CodeQL
.github/workflows/benchmark.yml
[warning] 27-89: Workflow does not contain permissions
Actions job or workflow does not limit the permissions of the GITHUB_TOKEN. Consider setting an explicit permissions block, using the following as a minimal starting point: {{contents: read}}
🪛 LanguageTool
benches/d-engine-bench/reports/v0.1.4/report_20251205_remeasured.md
[style] ~3-~3: Some style guides suggest that commas should set off the year in a month-day-year date.
Context: ...Benchmark Report Date: December 5, 2025 Hardware: Apple Mac mini M2 (8-co...
(MISSING_COMMA_AFTER_YEAR)
benches/d-engine-bench/reports/v0.2.0/report_20251210.md
[style] ~3-~3: Some style guides suggest that commas should set off the year in a month-day-year date.
Context: ...enchmark Report Date: December 10, 2025 Hardware: Apple Mac mini M2 (8-co...
(MISSING_COMMA_AFTER_YEAR)
benches/d-engine-bench/reports/v0.2.0/report_20251209.md
[style] ~3-~3: Some style guides suggest that commas should set off the year in a month-day-year date.
Context: ...Benchmark Report Date: December 9, 2025 Hardware: Apple Mac mini M2 (8-co...
(MISSING_COMMA_AFTER_YEAR)
benches/d-engine-bench/reports/v0.2.0/report_20251205.md
[style] ~3-~3: Some style guides suggest that commas should set off the year in a month-day-year date.
Context: ...Benchmark Report Date: December 5, 2025 Hardware: Apple Mac mini M2 (8-co...
(MISSING_COMMA_AFTER_YEAR)
🪛 markdownlint-cli2 (0.18.1)
benches/d-engine-bench/reports/v0.2.0/report_20251210.md
24-24: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
33-33: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
benches/d-engine-bench/reports/v0.2.0/report_20251209.md
121-121: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
139-139: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
155-155: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
171-171: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
187-187: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
203-203: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
MIGRATION_GUIDE.md
20-20: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Lint and Format Check
- GitHub Check: audit
- GitHub Check: Analyze (rust)
| #[test] | ||
| fn test_shared_state_concurrent_updates() { | ||
| use std::sync::Arc; | ||
| use std::thread; | ||
|
|
||
| let shared = Arc::new(SharedState::new(1, None, None)); | ||
|
|
||
| // Simulate concurrent leader updates | ||
| let handles: Vec<_> = (0..10) | ||
| .map(|i| { | ||
| let shared = Arc::clone(&shared); | ||
| thread::spawn(move || { | ||
| shared.set_current_leader(i as u32); | ||
| }) | ||
| }) | ||
| .collect(); | ||
|
|
||
| for h in handles { | ||
| h.join().unwrap(); | ||
| } | ||
|
|
||
| // Final value is one of the concurrent updates | ||
| let final_leader = shared.current_leader(); | ||
| assert!(final_leader.is_some()); | ||
| assert!(final_leader.unwrap() < 10); | ||
| } |
There was a problem hiding this comment.
Potential test flakiness: thread setting leader to 0 can cause assertion failure.
The concurrent test spawns threads setting leader IDs from 0 to 9. If thread 0 wins the race, set_current_leader(0) is called, which per the current_leader() implementation returns None for value 0. The assertion assert!(final_leader.is_some()) would then fail.
Consider starting the range from 1:
let handles: Vec<_> = (0..10)
.map(|i| {
let shared = Arc::clone(&shared);
thread::spawn(move || {
- shared.set_current_leader(i as u32);
+ shared.set_current_leader((i + 1) as u32);
})
})
.collect();Or adjust the assertion:
- assert!(final_leader.is_some());
- assert!(final_leader.unwrap() < 10);
+ // final_leader could be None if thread 0 won (set_current_leader(0) clears leader)
+ match final_leader {
+ Some(id) => assert!(id < 10),
+ None => {} // thread 0 won the race
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #[test] | |
| fn test_shared_state_concurrent_updates() { | |
| use std::sync::Arc; | |
| use std::thread; | |
| let shared = Arc::new(SharedState::new(1, None, None)); | |
| // Simulate concurrent leader updates | |
| let handles: Vec<_> = (0..10) | |
| .map(|i| { | |
| let shared = Arc::clone(&shared); | |
| thread::spawn(move || { | |
| shared.set_current_leader(i as u32); | |
| }) | |
| }) | |
| .collect(); | |
| for h in handles { | |
| h.join().unwrap(); | |
| } | |
| // Final value is one of the concurrent updates | |
| let final_leader = shared.current_leader(); | |
| assert!(final_leader.is_some()); | |
| assert!(final_leader.unwrap() < 10); | |
| } | |
| #[test] | |
| fn test_shared_state_concurrent_updates() { | |
| use std::sync::Arc; | |
| use std::thread; | |
| let shared = Arc::new(SharedState::new(1, None, None)); | |
| // Simulate concurrent leader updates | |
| let handles: Vec<_> = (0..10) | |
| .map(|i| { | |
| let shared = Arc::clone(&shared); | |
| thread::spawn(move || { | |
| shared.set_current_leader((i + 1) as u32); | |
| }) | |
| }) | |
| .collect(); | |
| for h in handles { | |
| h.join().unwrap(); | |
| } | |
| // Final value is one of the concurrent updates | |
| let final_leader = shared.current_leader(); | |
| assert!(final_leader.is_some()); | |
| assert!(final_leader.unwrap() < 10); | |
| } |
🤖 Prompt for AI Agents
d-engine-core/src/raft_role/raft_role_test.rs around lines 284 to 309: the
concurrent test spawns threads that set leader IDs 0..9 but
SharedState::current_leader() treats 0 as None, so if thread 0 wins the race the
test flakes; fix by making threads set non-zero IDs (e.g., iterate 1..=10 or
1..11) or alternatively relax the assertions to accept None/0 (for example
assert that final_leader.unwrap_or(0) < 10) — pick one approach and update the
test so it never relies on 0 being treated as Some.
d-engine-core/src/state_machine_handler/default_state_machine_handler.rs
Show resolved
Hide resolved
| ## clippy Run Clippy linter (fail on warnings with -D flag) | ||
| ## clippy Run Clippy linter on all crates | ||
| clippy: check-workspace clippy-excluded |
There was a problem hiding this comment.
Duplicate comment for clippy target.
Lines 154-155 have two ## comments for the same target, which may cause issues with the help parser.
-## clippy Run Clippy linter (fail on warnings with -D flag)
-## clippy Run Clippy linter on all crates
+## clippy Run Clippy linter on all crates (fail on warnings)
clippy: check-workspace clippy-excluded📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ## clippy Run Clippy linter (fail on warnings with -D flag) | |
| ## clippy Run Clippy linter on all crates | |
| clippy: check-workspace clippy-excluded | |
| ## clippy Run Clippy linter on all crates (fail on warnings) | |
| clippy: check-workspace clippy-excluded |
🤖 Prompt for AI Agents
In Makefile around lines 154 to 156, there are duplicate comment lines
describing the clippy target which can confuse the help parser; remove the
redundant comment so only one descriptive comment remains (e.g., keep a single
"## clippy Run Clippy linter on all crates" line) ensuring the comments are
unique and accurately describe the target.
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
Changes: - Remove commented legacy package config from workspace Cargo.toml - Update README benchmark reference to v0.2.0 (dengine_comparison_v0.2.0.png) - Upgrade rustfmt edition from 2021 to 2024 - Clean up outdated v0.2.0 benchmark reports (report_20251205/09/10.md) - Add final v0.2.0 benchmark report and comparison chart - Sync version references across all example Cargo.toml files
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
Makefile (1)
154-155: Fix duplicate comment for clippy target.Lines 154-155 contain duplicate comment lines describing the
clippytarget, which may confuse the help parser. Keep a single, descriptive comment.Apply this diff:
-## clippy Run Clippy linter (fail on warnings with -D flag) -## clippy Run Clippy linter on all crates +## clippy Run Clippy linter on all crates (fail on warnings)
🧹 Nitpick comments (3)
benches/d-engine-bench/reports/v0.2.0/report_v0.2.0_final.md (1)
174-174: Minor grammar refinement: "a small number of keys" → "10 keys".At line 174, the phrase "a small number of keys (10)" can be tightened to simply "10 keys" or "few keys" for clearer, more direct prose.
README.md (1)
80-80: Clarify benchmark methodology and source attribution.The footnote "*Benchmarks show 51% higher write throughput vs etcd..." references specific numbers but doesn't clearly link to the detailed benchmark report. Consider explicitly directing readers to
benches/d-engine-bench/reports/v0.2.0/for the comprehensive methodology and environment details.Consider updating to: "Benchmarks show 51% higher write throughput vs etcd in test environments (see
benches/d-engine-bench/reports/v0.2.0/for methodology and environment details). Production results vary by hardware."Makefile (1)
1-576: Approve comprehensive Makefile structure.The Makefile provides excellent developer experience with:
- Clear organization across logical sections (quality, build, test, docs, security, cleanup, troubleshooting)
- Detailed help output with color-coded messages and workflow examples
- Environment validation and workspace structure checks
- Comprehensive test targets (unit, integration, doc, examples, detailed output, verbose, all)
- Documentation generation with browser automation
- Security audit targets
- Composite workflows for common tasks (pre-release, check-all-projects)
- Color-coded output for improved readability
The overall structure and functionality are well-designed and should significantly improve the development experience.
Consider addressing the static analysis warnings for consistency: while body length warnings are acceptable and don't affect functionality, ensuring all targets are declared
.PHONYis a Makefile best practice that improves portability and clarity.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
benches/d-engine-bench/Cargo.lockis excluded by!**/*.lockbenches/d-engine-bench/reports/v0.2.0/dengine_comparison_v0.2.0.pngis excluded by!**/*.png
📒 Files selected for processing (5)
Cargo.toml(2 hunks)Makefile(1 hunks)README.md(4 hunks)benches/d-engine-bench/reports/v0.2.0/report_v0.2.0_final.md(1 hunks)benches/d-engine-bench/rustfmt.toml(1 hunks)
🧰 Additional context used
🪛 checkmake (0.2.2)
Makefile
[warning] 54-54: Target body for "help" exceeds allowed length of 5 (35).
(maxbodylength)
[warning] 407-407: Target body for "audit" exceeds allowed length of 5 (6).
(maxbodylength)
[warning] 416-416: Target body for "deny" exceeds allowed length of 5 (6).
(maxbodylength)
[warning] 448-448: Target body for "troubleshoot" exceeds allowed length of 5 (34).
(maxbodylength)
[warning] 18-18: Missing required phony target "clean"
(minphony)
[warning] 115-115: Target "install" should be declared PHONY.
(phonydeclared)
[warning] 397-397: Target "doc" should be declared PHONY.
(phonydeclared)
🪛 LanguageTool
benches/d-engine-bench/reports/v0.2.0/report_v0.2.0_final.md
[style] ~174-~174: Specify a number, remove phrase, use “a few”, or use “some”
Context: ...st simulates contention scenarios where a small number of keys (10) receive high request volume. ...
(SMALL_NUMBER_OF)
🪛 markdownlint-cli2 (0.18.1)
benches/d-engine-bench/reports/v0.2.0/report_v0.2.0_final.md
178-178: Multiple headings with the same content
(MD024, no-duplicate-heading)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: audit
- GitHub Check: Analyze (rust)
- GitHub Check: audit
🔇 Additional comments (4)
Cargo.toml (2)
50-76: Approve profile configuration tuning.The profile configurations are well-tuned for different build scenarios:
- dev: Low optimization (opt-level=1) for fast iteration, incremental compilation enabled
- release: Aggressive optimization (opt-level='z' for size, LTO, single codegen unit) for production
- test: Balanced approach with incremental compilation and multi-core codegen
This is a good setup for a Rust workspace.
2-10: All workspace members are correctly declared and paths are valid.Verification confirms all 6 members exist with proper Cargo.toml files: d-engine, d-engine-proto, d-engine-core, d-engine-client, d-engine-docs, and d-engine-server. Each crate name matches its path reference.
README.md (1)
10-11: Verify new public APIs and features are properly exported.The README introduces several new public APIs and features that must exist and be properly exported:
- EmbeddedEngine (lines 22, 107, 116): Public struct in
d-engine-serveratsrc/embedded/mod.rs, re-exported viad-engine::prelude- LocalKvClient (line 23): Public struct in
d-engine-server, re-exported viad-engine::prelude- prelude module (line 101): Public module at
d-engine/src/lib.rs:203that exports EmbeddedEngine, LocalKvClient, Node, NodeBuilder, and related types (conditional on features)- Feature flags (lines 48, 51):
clientandserverfeatures properly defined ind-engine/Cargo.tomlas optional dependencies- TTL/Lease (lines 20-21): Lease trait re-exported from
d-engine-core, withDefaultLeaseimplementation ind-engine-server/src/storage/lease.rs- Watch API (lines 20-21): Implemented via
watch()method inEmbeddedEngineatsrc/embedded/mod.rs:322andleader_notifier()for leader change notificationsAll mentioned example directories exist:
examples/quick-start/,examples/single-node-expansion/,examples/sled-cluster/,examples/three-nodes-cluster/.Makefile (1)
18-22: Add missing.PHONYdeclarations.Several targets are missing from the
.PHONYdeclaration at lines 18-22, includinginstall(line 115) anddoc/doc-check(lines 397-400). Update the.PHONYlist to include all non-file targets.Apply this diff to add missing phony targets:
.PHONY: help all check fmt fmt-check clippy clippy-fix test test-unit \ test-integration test-doc bench clean doc build build-release \ pre-release install check-env audit deny troubleshoot \ test-crate test-examples test-detailed \ - docs docs-all docs-check docs-check-all + docs docs-all docs-check docs-check-all \ + install-tools doc doc-check check-workspace check-examples \ + check-benches check-all-projects build-examples fmt-fix \ + clippy-excluded test-verbose fix clean-depsLikely an incorrect or invalid review comment.
- Fix typo: rename backgroup → background in snapshot transfer - Remove obsolete storage_test.rs - Update client error handling and KV operations - Fix network module exports - Update candidate state and state machine handler - Fix single-node cluster examples documentation - Update benchmark workflow configuration
Cache cluster metadata in LeaderState to eliminate 3 async calls per write.
Performance: +2-3% throughput, -13-21% p99 latency vs v0.2.0
Changes:
- Add ClusterMetadata { is_single_node, total_voters } cache
- Initialize on leader election, update on membership changes
- Pass to replication handler to avoid repeated queries
- Fix 6 test failures due to missing metadata initialization
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (7)
d-engine-client/src/grpc_kv_client.rs (1)
376-395: Potential panic if channel pool is empty.
rng.gen_range(0..channels.len())will panic with an empty range ifget_all_channels()returns an empty vector (e.g., during initialization or when all nodes are disconnected).Add a guard for empty channels:
pub(super) async fn make_client( &self ) -> std::result::Result<RaftClientServiceClient<Channel>, ClientApiError> { let client_inner = self.client_inner.load(); // Balance from read clients let mut rng = StdRng::from_entropy(); let channels = client_inner.pool.get_all_channels(); + if channels.is_empty() { + return Err(ClientApiError::Network("No available channels".into())); + } let i = rng.gen_range(0..channels.len()); let mut client = RaftClientServiceClient::new(channels[i].clone());d-engine-core/src/replication/mod.rs (1)
189-220: Docs referenceAppendEntriesResponse::{success,higher_term,conflict}—likely stale after helper removal.
check_append_entries_request_is_legal’s docs still point at constructor-style APIs that (per this PR) were removed fromAppendEntriesResponse. Update doc return bullets to match the new construction path (e.g., “returns anAppendEntriesResponsewithsuccess_result/conflict_result/term semantics…”), so downstream implementors don’t chase non-existent helpers.d-engine-core/src/state_machine_handler/default_state_machine_handler.rs (1)
1199-1224: Critical: Hardcoded offset breaks configurable prefix.Line 1211 uses a hardcoded offset of 9 bytes, assuming the prefix is always "snapshot-". However, the function now accepts a configurable
snapshot_dir_prefixparameter. If a different prefix is used (e.g., "snap-", "backup-", or any non-9-character prefix), the parsing will:
- Slice at the wrong position, producing incorrect index/term values
- Potentially panic if the string is shorter than 9 characters
- Cause snapshot cleanup and management to fail
Apply this diff to use the dynamic prefix length:
- // Remove fixed parts - let core = &name[9..name.len()]; // "snapshot-".len() = 9, + // Remove prefix (now configurable) + let core = &name[snapshot_dir_prefix.len()..];d-engine-client/src/error.rs (1)
150-205: Preserve server message in fallback case for better diagnostics.The fallback handler (
_ => Self::Business) dropsstatus.message()when mapping unhandled status codes, replacing it only with a code debug representation. This makes diagnosing server-side failures harder. Include the original message:_ => Self::Business { code: ErrorCode::Uncategorized, - message: format!("Unhandled status code: {code:?}"), + message: format!("Unhandled status code: {code:?}: {message}"), required_action: None, },d-engine-core/src/raft_role/candidate_state.rs (1)
243-284: Avoidself.voted_for().unwrap()in vote-request handling (panic risk).
Ifvoted_for()can ever error (storage, state init), Candidate will crash instead of rejecting / stepping down cleanly.- if ctx.election_handler().check_vote_request_is_legal( + if ctx.election_handler().check_vote_request_is_legal( &vote_request, my_term, last_log_index, last_log_term, - self.voted_for().unwrap(), + self.voted_for()?, ) {d-engine-core/src/replication/replication_handler.rs (2)
564-614: New public API: avoidassert!panics ingenerate_new_entries.
assert!(!id_range.is_empty())(Line 579) will abort the process if allocation ever fails. For apub async fnreturningResult, prefer non-panicking handling (at minimumdebug_assert!+ a structured error path).
686-706: New public helper must not panic: remove.unwrap()on protobuf encoding.
cmd.encode(&mut buf).unwrap()(Line 699) can panic; for a public API, use the infallibleencode_to_vec()method instead.pub fn client_command_to_entry_payloads(commands: Vec<WriteCommand>) -> Vec<EntryPayload> { commands .into_iter() .map(|cmd| { - let mut buf = BytesMut::with_capacity(cmd.encoded_len()); - cmd.encode(&mut buf).unwrap(); - EntryPayload { - payload: Some(Payload::Command(buf.freeze())), + payload: Some(Payload::Command(cmd.encode_to_vec().into())), } }) .collect() }
♻️ Duplicate comments (2)
d-engine-core/src/network/mod.rs (1)
9-13: Typo fix from past review has been addressed.The module has been correctly renamed from
backgroup_snapshot_transfertobackground_snapshot_transfer.d-engine-client/src/error.rs (1)
104-147: Default transport failure: make invalid-URI detection robust + align retry hint.
err.to_string().contains("invalid uri")is case-sensitive; tonic errors often say “invalid URI”, so this can silently misclassify.- The fallback now correctly uses
Uncategorized(good), butretry_after_ms: Some(5000)implies retryability despite being “unclassified”; past guidance suggestedNone.@@ - // Check for invalid address errors - if err.to_string().contains("invalid uri") { + // Check for invalid address errors (be case-insensitive) + let err_s = err.to_string(); + if err_s.to_ascii_lowercase().contains("invalid uri") { return Self::Network { code: ErrorCode::InvalidAddress, - message: format!("Invalid address: {err}"), + message: format!("Invalid address: {err}"), retry_after_ms: None, // Not retryable - needs address correction leader_hint: None, }; } @@ Self::Network { code: ErrorCode::Uncategorized, message: format!("Transport error: {err}"), - retry_after_ms: Some(5000), + retry_after_ms: None, leader_hint: None, }
🧹 Nitpick comments (14)
d-engine-core/src/storage/lease.rs (3)
114-141: Clarify caller responsibilities for expired keys.Both
get_expired_keysandon_applyreturnVec<Bytes>of expired keys, but the documentation doesn't explicitly state what the caller should do with them. While it's implied that these keys should be deleted from the main storage, making this contract explicit would improve API clarity.Consider adding to the documentation:
/// # Returns -/// List of expired keys (keys are removed from lease tracking) +/// List of expired keys (keys are removed from lease tracking). +/// The caller is responsible for deleting these keys from the main storage.Similar clarification for
on_applywould be helpful.
182-188: Consider returning Result fromto_snapshotfor error transparency.The method returns
Vec<u8>rather thanResult<Vec<u8>>, preventing serialization errors from being propagated. The implementation in d-engine-server uses.unwrap_or_default(), which silently returns an empty snapshot on failure. This could mask snapshot creation issues.While
reloadproperly returnsResult<()>for deserialization errors, the asymmetry means snapshot creation failures go unnoticed until restoration fails.Consider changing the signature to:
-fn to_snapshot(&self) -> Vec<u8>; +fn to_snapshot(&self) -> Result<Vec<u8>>;This allows callers to handle serialization failures explicitly.
156-169: Documentation is slightly imprecise about performance.The documentation states "Should be O(1) or O(log N) at most" (line 165), but the implementation in d-engine-server samples the first 10 entries, which is O(1). The "or O(log N)" part might be unnecessary given the heuristic nature of this check.
Consider tightening the performance documentation:
/// # Performance -/// Should be O(1) or O(log N) at most. +/// Should be O(1) - typically a fast heuristic check (e.g., sampling a few entries).d-engine-client/src/grpc_kv_client.rs (1)
99-136: Consider extracting shared write logic to reduce duplication.The
put_with_ttlimplementation mirrorsputexcept for theWriteCommandconstruction. The request building, leader client acquisition, and response handling are duplicated.Consider extracting a helper method:
+ async fn execute_write(&self, commands: Vec<WriteCommand>, op_name: &str) -> std::result::Result<(), ClientApiError> { + let client_inner = self.client_inner.load(); + let request = ClientWriteRequest { + client_id: client_inner.client_id, + commands, + }; + let mut client = self.make_leader_client().await?; + match client.handle_client_write(request).await { + Ok(response) => { + debug!("[:KvClient:{}] response: {:?}", op_name, response); + response.get_ref().validate_error() + } + Err(status) => { + error!("[:KvClient:{}] status: {:?}", op_name, status); + Err(status.into()) + } + } + }This would simplify
put,put_with_ttl, anddeleteto single command construction + helper call.d-engine-client/src/error.rs (3)
55-98: Delete commented-out public enums instead of keeping them in-tree.
Large commented blocks (NetworkErrorType/ProtocolErrorType/etc.) add noise and tend to rot; rely on git history or reintroduce with a proper deprecation cycle if needed.
239-275: Avoid manual “JSON-like” parsing forx-raft-leadermetadata.
The current split-by-,/ split-by-:parser will break if values contain commas/colons, or if encoding changes slightly. If the header is JSON, parse it as JSON; if it’s not JSON, standardize it (e.g., base64 JSON) or use a structured delimiter format.@@ fn parse_leader_from_metadata(status: &Status) -> Option<LeaderInfo> { status .metadata() .get("x-raft-leader") .and_then(|v| v.to_str().ok()) - .and_then(|s| { - // Manually parse JSON-like string - let mut id = None; - let mut address = None; - let mut last_contact = None; - - // Remove whitespace and outer braces - let s = s.trim().trim_start_matches('{').trim_end_matches('}'); - - // Split into key-value pairs - for pair in s.split(',') { - let pair = pair.trim(); - if let Some((key, value)) = pair.split_once(':') { - let key = key.trim().trim_matches('"'); - let value = value.trim().trim_matches('"'); - - match key { - "id" => id = Some(value.to_string()), - "address" => address = Some(value.to_string()), - "last_contact" => last_contact = value.parse().ok(), - _ => continue, - } - } - } - - Some(LeaderInfo { - id: id?, - address: address?, - last_contact: last_contact?, - }) - }) + .and_then(|s| serde_json::from_str::<LeaderInfo>(s).ok()) }
277-397: Consider mappingErrorCode::UncategorizedtoGeneral(notBusiness).
Right nowErrorCode::UncategorizedbecomesClientApiError::Business, which can drive callers toward “client must change behavior” semantics for unknown failures.Generalseems more appropriate unless you have a specific reason.d-engine-client/src/pool.rs (1)
153-187: Improve final failure signal fromload_cluster_metadataby including last error.
Currently you log each failure but return a genericClusterUnavailable. Consider capturing the last error string and embedding it inClientApiError(or attach as context) to help operators.d-engine-core/src/raft_role/role_state.rs (1)
322-353:create_not_leader_response()likely does an avoidable O(n) scan; prefer a direct leader lookup if Membership supports it.
Right now it awaitsmembers()theniter().find(...). If there’s aretrieve_node_meta(id)/map-like accessor, it’ll be cheaper and clearer.d-engine-core/src/raft_role/candidate_state.rs (1)
571-575:StepDownSelfRemovedasunreachable!()is OK, but consider a defensive fallback in release builds.
If routing ever regresses and this becomes reachable, this will crash the node. Awarn!+BecomeFollower(None)fallback can make the system more resilient.d-engine-core/src/raft_role/leader_state.rs (2)
1235-1253: Step-down on self-removal is good; consider also clearingcurrent_leader_idto avoid stale redirection.
After stepping down, clients hitting this node could still be redirected to the old leader id until the next heartbeat updates state elsewhere.RaftEvent::StepDownSelfRemoved => { + self.shared_state().clear_current_leader(); warn!( "[Leader-{}] Removed from cluster membership, stepping down to Follower", self.node_id() );
1008-1032:snapshot_in_progressuses mixed orderings (Acquire/Release then SeqCst); consider standardizing.
Not wrong, but it’s harder to reason about than sticking to Acquire loads + Release stores (or SeqCst consistently).d-engine-core/src/replication/replication_handler.rs (2)
543-549: New public struct: add#[non_exhaustive](or keep fields private) to reduce semver breakage.
MakingReplicationDataand all fieldspublocks you into this shape;#[non_exhaustive]is a low-cost guardrail.#[derive(Debug)] +#[non_exhaustive] pub struct ReplicationData { pub leader_last_index_before: u64, pub current_term: u64, pub commit_index: u64, pub peer_next_indices: HashMap<u32, u64>, }
617-631: Publicprepare_peer_entriesreturnsDashMap+ clones input—document ownership/perf expectations.
Now that this ispub, callers may assume it’s cheap; but it clonesnew_entriesand later clones per peer. A short doc note (or a future API taking shared buffers) will prevent misuse.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
.github/workflows/benchmark.yml(1 hunks)d-engine-client/src/error.rs(8 hunks)d-engine-client/src/grpc_kv_client.rs(6 hunks)d-engine-client/src/kv_client.rs(1 hunks)d-engine-client/src/pool.rs(10 hunks)d-engine-client/src/pool_test.rs(1 hunks)d-engine-core/src/network/background_snapshot_transfer.rs(1 hunks)d-engine-core/src/network/background_snapshot_transfer_test.rs(2 hunks)d-engine-core/src/network/mod.rs(7 hunks)d-engine-core/src/raft.rs(10 hunks)d-engine-core/src/raft_role/candidate_state.rs(16 hunks)d-engine-core/src/raft_role/leader_state.rs(40 hunks)d-engine-core/src/raft_role/mod.rs(9 hunks)d-engine-core/src/raft_role/role_state.rs(7 hunks)d-engine-core/src/replication/mod.rs(3 hunks)d-engine-core/src/replication/replication_handler.rs(10 hunks)d-engine-core/src/state_machine_handler/default_state_machine_handler.rs(12 hunks)d-engine-core/src/storage/lease.rs(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- d-engine-client/src/pool_test.rs
- d-engine-core/src/raft.rs
🧰 Additional context used
🧬 Code graph analysis (4)
d-engine-core/src/network/mod.rs (1)
d-engine-core/src/network/background_snapshot_transfer.rs (1)
mpsc(83-83)
d-engine-core/src/replication/replication_handler.rs (1)
d-engine-core/src/utils/cluster.rs (1)
is_majority(8-13)
d-engine-core/src/storage/lease.rs (2)
d-engine-server/src/storage/lease.rs (10)
register(200-214)unregister(221-227)is_expired(234-243)get_expired_keys(252-269)on_apply(280-293)has_lease_keys(300-302)may_have_expired_keys(312-329)len(336-338)to_snapshot(345-354)reload(363-395)d-engine-core/src/storage/state_machine.rs (2)
len(62-62)is_empty(66-68)
d-engine-core/src/raft_role/mod.rs (2)
d-engine-core/src/raft_role/leader_state.rs (6)
new(105-113)new(2008-2049)update_voted_for(290-295)voted_for(284-286)match_index(327-332)next_index(297-306)d-engine-core/src/raft_role/role_state.rs (7)
update_voted_for(295-300)current_term(230-232)voted_for(289-291)commit_index(242-244)match_index(65-71)node_id(42-44)next_index(49-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: audit
- GitHub Check: Lint and Format Check
- GitHub Check: Analyze (rust)
- GitHub Check: Lint and Format Check
- GitHub Check: audit
🔇 Additional comments (36)
.github/workflows/benchmark.yml (2)
3-4: Permissions block properly implemented, addressing previous CodeQL feedback.The explicit
permissions: contents: readblock correctly restricts the GITHUB_TOKEN to read-only access, which aligns with the principle of least privilege. This addresses the earlier security concern flagged by CodeQL.
1-83: Well-structured benchmark workflow with sound caching and artifact strategies.The workflow is well-designed for performance testing:
- Scheduling & triggers (lines 7–13): Weekly runs with manual override provide flexibility for on-demand benchmarking.
- Rust setup (lines 30–36): Stable toolchain + protobuf-compiler installation is correct for the workspace.
- Caching (lines 38–42): Custom cache key
benchmark-*appropriately isolates benchmark cache from test runs.- Full history checkout (line 28):
fetch-depth: 0enables meaningful performance comparisons across commits.- Output & artifacts (lines 62–83): Criterion results + log tee + 30-day retention and clear summary messaging support post-hoc analysis and iteration.
d-engine-core/src/network/background_snapshot_transfer.rs (1)
30-33: LGTM! Import migration to external proto crate is consistent.The migration from
crate::proto::server::storage::*tod_engine_proto::server::storage::*aligns with the PR's architectural goal of using the external proto crate.d-engine-core/src/network/background_snapshot_transfer_test.rs (2)
24-46: LGTM! Well-structured test helper for creating snapshot streams.The helper function properly:
- Creates test chunks with metadata on seq 0
- Maps tonic Status errors to NetworkError appropriately
- Returns a properly boxed stream
19-22: Import migration to d_engine_proto is consistent with main module.The test file properly imports
SnapshotChunkandSnapshotResponsefromd_engine_proto::server::storage, matching the production code's import paths.d-engine-core/src/network/mod.rs (4)
36-44: Verify public API exposure is intentional forAppendResults.The visibility has been widened from
pub(crate)topub. Ensure this struct is properly documented and re-exported from the crate's public API surface if intended for external use.
93-98: LGTM! Extended automock availability for integration testing.The
#[cfg_attr(any(test, feature = "test-utils"), automock)]pattern correctly enablesMockTransportboth for internal tests and for downstream crates using thetest-utilsfeature.
24-32: Import migration to d_engine_proto is consistent and complete.All protobuf types used in the Transport trait have been properly migrated from internal paths to
d_engine_proto::server::*.
305-309: Confirmgrpc_task_with_timeout_and_exponential_backoffshould be public API.This utility function is now
pubinstead ofpub(crate). While useful for consumers implementing custom gRPC operations, verify this is intentionally part of the v0.2.0 public API and document its usage accordingly.d-engine-core/src/storage/lease.rs (2)
1-24: LGTM! Excellent module documentation.The module documentation provides clear design philosophy, architecture overview, and concurrency guidelines. This sets appropriate expectations for trait implementers.
73-87: Parameter type choice is appropriate.The
registermethod takes ownership ofkey: Byteswhile other methods (likeunregister,is_expired) use&[u8]. This asymmetry is intentional and efficient:registerneeds to store the key, while lookups only need a reference.d-engine-client/src/kv_client.rs (2)
1-51: Well-documented trait interface with clear consistency guarantees.The module documentation clearly explains the design principles, implementations, and performance characteristics. The trait's thread safety requirements (
Send + Sync) and consistency guarantees are well-specified.
74-197: LGTM!Method signatures are consistent with idiomatic Rust async trait patterns. Using
impl AsRef<[u8]> + Sendprovides flexibility for callers while maintaining type safety. The idempotent delete semantics are correctly documented.d-engine-client/src/grpc_kv_client.rs (3)
14-36: LGTM!Imports are well-organized with clear aliasing (
KvClient as CoreKvClient). The struct definition withArc<ArcSwap<ClientInner>>enables efficient concurrent access with runtime-updatable configuration.
310-358: LGTM!The watch implementation correctly uses load-balanced client selection since followers can serve watch streams. Documentation clearly notes the v1 limitation (no prefix/range watch support).
398-447: LGTM!The
CoreKvClienttrait implementation correctly delegates to the inherent methods and properly mapsClientResulttoBytes(extracting the.valuefield). Error propagation viaInto::intois clean and type-safe.d-engine-core/src/replication/mod.rs (2)
105-113: No action required; parameter change is complete and properly integrated.All implementors and call sites have been updated correctly. The single
ReplicationCoreimplementation inreplication_handler.rsincludes the newcluster_metadataparameter (line 65) and actively uses it on line 76 (let is_single_node = cluster_metadata.is_single_node;). The production call site inleader_state.rs(lines 1383-1389) properly passes&self.cluster_metadata, and all 30+ test cases pass the parameter correctly.
21-26: No issues found. Thetest-utilsfeature properly gatesmockallas an optional dependency inCargo.toml, and all#[cfg(any(test, feature = "test-utils"))]guards correctly restrict mockall imports to test/test-utils builds only. Theasync-traitcrate is a non-optional dependency (needed for all async trait definitions), sotonic::async_traitis unconditionally available—this is correct. Tonic v0.12.3 is declared with proper features. Test-only dependencies do not leak into production.d-engine-core/src/state_machine_handler/default_state_machine_handler.rs (5)
111-113: Good design for optional feature integration.The use of
Option<Arc<WatchManager>>provides zero-overhead when watch functionality is disabled, with clear documentation explaining the behavior.
186-198: LGTM: Watcher notification correctly guarded by success check.The previous concern about notifying watchers even on error has been properly addressed. The notification now only occurs when
apply_result.is_ok()is true, ensuring watchers are not notified about failed operations.
709-739: LGTM: Clean watcher notification implementation.The method correctly:
- Gracefully handles decode failures with
if let Ok- Uses pattern matching for operation variants
- Maintains performance with inline attribute and early returns
The silent skipping of decode errors is appropriate here, as not all entries may contain decodable WriteCommands.
746-796: LGTM: Constructor properly updated with test helper.The changes correctly:
- Add the
watch_managerparameter to the main constructor- Initialize all fields including the new watch_manager
- Provide a test-only
new_without_watchhelper for backward compatibility in testsThe
#[cfg(test)]attribute appropriately limits the helper to test builds.
400-476: LGTM: Snapshot cleanup properly parameterized.The cleanup method now correctly accepts a configurable
snapshot_dir_prefixparameter and threads it through toparse_snapshot_dirnamecalls, both for direct filenames and.tar.gzsuffixed variants.d-engine-client/src/pool.rs (2)
189-214: Good fix: “no leader” now maps to retryableClusterUnavailable.
This matches typical “election in progress / temporarily unavailable” semantics and should improve client retry behavior.
39-54: Returning/storingcurrent_leader_idlooks correct; ensure callers treat it as advisory.
current_leader_idis snapshotted at pool build/refresh time and used only during initialization to map node ID to leader address. It's exposed viaget_leader_id()for metadata queries, not for per-request routing. Staleness is handled naturally by subsequent RPC redirects when the leader changes.d-engine-core/src/raft_role/role_state.rs (3)
394-424: Leader discovery notification gating looks good; double-check correctness of “leader committed” semantics.
You now (1) atomically store leader_id, (2) persist/recordVotedFor{committed:true}, and (3) emitLeaderDiscoveredonly whenupdate_voted_for()reports a new committed transition. That’s a solid de-dupe strategy—just ensureSharedState::update_voted_for()is the single source of truth for “committed leader” and is persisted where required.
35-37: No action required — all callers ofupdate_voted_foralready properly handle theResult<bool>return type. The API migration is complete with no compilation breaks or missing updates in implementations, tests, or call sites.
89-96: This concern is already properly addressed in the code.
init_cluster_metadata()is correctly called in theRoleEvent::BecomeLeaderhandler (raft.rs:343) immediately after the Leader role is created and before any hot paths execute. The call sequence ensures cluster topology is cached beforeprocess_batchand other leader operations accesscluster_metadatafor quorum calculations. Error handling with the?operator propagates any failures, and the Leader implementation correctly populatesis_single_nodeandtotal_votersbased on the membership configuration.d-engine-core/src/raft_role/candidate_state.rs (1)
390-398: Returning NOT_LEADER with leader metadata on Candidate proposals is the right UX.
This aligns Candidate behavior with redirection and avoids clients waiting on election completion.d-engine-core/src/raft_role/mod.rs (3)
79-88: Atomic leader-id ordering upgraded to Acquire/Release: good.
This matches the intended “publish leader info / observe leader info” pattern for cross-thread visibility.Also applies to: 146-167
190-214:SharedState::update_voted_for()semantics look consistent with event-driven leader discovery.
The “only true on committed transition/leader-term change” contract is clear and matches how roles consume it.
48-61:HardState: Copyis valid;VotedForimplementsCopyand serde traits.
The proto-generatedVotedForstruct explicitly derivesClone, Copy, PartialEq, ::prost::Message, serde::Serialize, serde::Deserialize. All its fields (u32, u64, bool) are primitives, makingCopysafe.Option<VotedFor>is therefore alsoCopy. The current derives onHardStateare correct and the code compiles without issues.d-engine-core/src/raft_role/leader_state.rs (1)
500-520: Verify comment: ensure voters() excludes self and remove the off-by-one concern.The
voters.len() + 1calculation is correct—membership.voters()explicitly excludes the leader node (filtered bynode.id != self.node_id), so adding 1 accounts for self. This is not an off-by-one error.However, a maintainability concern remains:
update_cluster_metadata()only updatestotal_votersand does not recomputeis_single_node, whileinit_cluster_metadata()computes both. Sinceis_single_node_cluster()derives from the immutableinitial_cluster_size, desynchronization is impossible at runtime. Consider adding a comment inupdate_cluster_metadata()explaining thatis_single_nodeis immutable configuration and does not need recomputation, to prevent future confusion or refactoring mistakes.d-engine-core/src/replication/replication_handler.rs (3)
634-671:build_append_requestlogic looks consistent with follower “virtual log” handling.
Defaulting to(prev_log_index, prev_log_term) = (0, 0)aligns withcheck_append_entries_request_is_legal’s special-case accept path.
60-67: Verifyis_single_nodeconsistency with actual cluster membership (lines 74-96).
cluster_metadata.is_single_nodeis used to determine quorum behavior; if this field diverges from the authoritative membership state, replication can be incorrectly skipped. Ensure either: (1)is_single_nodeis derived fromMembershipat call time, or (2) there is an assertion verifying consistency withMembershipwhencluster_metadatais instantiated.
149-164: No action needed. Single-node commit index advancement is correctly implemented with a dedicated code path that does not depend on peer match indices.
| #[cfg_attr(any(test, feature = "test-utils"), automock)] | ||
| #[async_trait] |
There was a problem hiding this comment.
test-utils automock is enabled, but tests modules are still #[cfg(test)] only.
If the goal is to compile/use mocks under --features test-utils outside cfg(test) (e.g., integration tests in other crates), consider also widening local test module gating (e.g., batch_buffer_test) or documenting that test-utils only affects mock generation, not unit tests in this crate.
Also applies to: 13-15
🤖 Prompt for AI Agents
In d-engine-core/src/replication/mod.rs around lines 59-60 (and also apply same
change for lines 13-15), the automock attribute is enabled for the "test-utils"
feature but the test modules themselves remain gated with #[cfg(test)],
preventing mocks from being available when building with --features test-utils
outside of unit-test compilation; either change the test module gating to
#[cfg(any(test, feature = "test-utils"))] (or equivalent per-module cfg) so
mocks are compiled when the feature is enabled, or update crate documentation to
state that "test-utils" only controls mock generation and does not enable the
internal test modules — implement the cfg change for relevant test modules if
you want mocks usable by integration tests in other crates.
…ale cache Bug: update_cluster_metadata() only updated total_voters, missing is_single_node. Caused test_embedded_node_rejoin to hang intermittently. Fix: - Update BOTH is_single_node and total_voters in update_cluster_metadata() - Call update_cluster_metadata() after learner promotion - Add unit test to catch this bug
Race condition: gRPC servers start async, election RPCs timeout before ready. Solution: Increase test election timeout 100ms→2000ms, retries 3→5. Only affects tests/common/mod.rs test configuration.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
d-engine-core/src/raft_role/leader_state.rs (1)
1599-1608: Consider replacingprintln!with structured logging.While the "Plan B" print statements serve immediate visibility needs, consider using the existing tracing infrastructure (info!/warn!) for consistency and production deployability.
Also applies to: 1985-1992
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
d-engine-core/src/raft_role/leader_state.rs(40 hunks)d-engine-core/src/raft_role/leader_state_test.rs(1 hunks)d-engine-core/src/raft_role/mod.rs(9 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
d-engine-core/src/raft_role/mod.rs (2)
d-engine-core/src/raft_role/leader_state.rs (6)
new(105-113)new(2014-2055)update_voted_for(290-295)voted_for(284-286)match_index(327-332)next_index(297-306)d-engine-core/src/raft_role/role_state.rs (5)
update_voted_for(295-300)voted_for(289-291)match_index(65-71)node_id(42-44)next_index(49-55)
🪛 GitHub Actions: CI
d-engine-core/src/raft_role/leader_state_test.rs
[error] 11-11: Rustfmt would modify this file: formatting changes detected (diffs shown at lines 11, 18, 27, 34, 41). Run 'cargo fmt' to apply formatting.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: audit
- GitHub Check: Analyze (rust)
- GitHub Check: audit
🔇 Additional comments (9)
d-engine-core/src/raft_role/mod.rs (4)
77-170: LGTM: Hot-path leadership tracking with correct memory ordering.The atomic
current_leader_idfield with Acquire/Release semantics properly synchronizes with persistent state updates, addressing the previous review concern about memory ordering.
193-217: LGTM: Leader commitment detection logic is sound.The boolean return value correctly identifies transitions to a committed leader vote, enabling event-driven leader discovery without hot-path overhead.
310-368: LGTM: Test-only accessor methods properly gated.The
cfg(any(test, feature = "test-utils"))guards correctly restrict these internal accessors to test contexts.
15-15: LGTM: Public API expansion aligns with v0.2.0 objectives.The newly public exports (
ClusterMetadata,as_i32,current_term,QuorumVerificationResult) support the broader API surface documented in the release.Also applies to: 267-267, 305-305, 421-421
d-engine-core/src/raft_role/leader_state.rs (5)
116-127: LGTM: ClusterMetadata struct design is sound.The cached topology metadata approach avoids repeated async membership queries in hot paths, with clear documentation of immutability constraints.
1416-1427: LGTM: Single-node commit optimization is correct.Single-node clusters can commit immediately (quorum of 1), while multi-node clusters use the existing majority-based calculation. The cached metadata avoids async calls in this hot path.
1930-1933: LGTM: Cluster metadata updated after successful join.Updating the cached metadata after membership changes ensures the hot-path commit logic uses current topology information.
2440-2442: LGTM: Leader ID set immediately in role transition.Setting
current_leader_idatomically during theCandidateState→LeaderStateconversion ensures the hot-pathcurrent_leader()reads are correct from the moment the new leader is created.
500-520: > Likely an incorrect or invalid review comment.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
cliff.toml (1)
66-67: Consider removing the redundant commit_parser rule.Line 67 with the
docs?regex pattern matches both "doc" and "docs" followed by an issue number, making line 66's^doc #\\d+:rule redundant. Both rules assign to the same "Documentation" group.If line 66 is intended only as a comment or historical reference, consider removing it to adhere to the DRY principle and simplify the configuration:
{ message = "^feat #\\d+:", group = "Features" }, { message = "^fix #\\d+:", group = "Bug Fixes" }, { message = "^refs #\\d+:", group = "Work in progress" }, - { message = "^doc #\\d+:", group = "Documentation" }, { message = "^docs? #\\d+:", group = "Documentation" }, { message = "^perf #\\d+:", group = "Performance" },
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
.github/workflows/commit-message-check.yml(1 hunks)cliff.toml(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- .github/workflows/commit-message-check.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: audit
- GitHub Check: Analyze (rust)
- GitHub Check: audit
…tations - Add Project Status section to README (pre-1.0 + compatibility promise) - Simplify MIGRATION_GUIDE (remove NodeBuilder API, keep WAL format only) - Streamline CHANGELOG v0.2.0 highlights (code examples + concise features) - Update benchmark reports with actual test data (Dec 13, 2025) - Restructure performance report (TL;DR first, detailed results collapsed) Focus: Lower cognitive load for new users while maintaining transparency
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
README.md (1)
259-259: Fix typo in license section.Line 259 contains "d-eninge" which should be "d-engine".
Apply this diff:
-d-eninge is licensed under the terms of the [MIT License](https://en.wikipedia.org/wiki/MIT_License#License_terms) +d-engine is licensed under the terms of the [MIT License](https://en.wikipedia.org/wiki/MIT_License#License_terms)
🧹 Nitpick comments (5)
benches/d-engine-bench/reports/v0.1.4/report_20251205_remeasured.md (1)
3-3: Optional: Apply consistent date formatting per style guide.LanguageTool suggests that some style guides recommend commas to set off the year in month-day-year date format. If your project follows a formal style guide, consider updating both date mentions.
Apply this diff to add commas after the year:
-**Date:** December 13, 2025 +**Date:** December 13, 2025,-**Test Date:** December 13, 2025 +**Test Date:** December 13, 2025,Also applies to: 185-185
d-engine-client/src/kv_client.rs (3)
20-25: Example return type inconsistent with actual API.The example uses
Result<()>but trait methods returnKvResult<()>. Consider updating for consistency:-//! async fn store_config<C: KvClient>(client: &C) -> Result<()> { +//! async fn store_config<C: KvClient>(client: &C) -> KvResult<()> {
80-105: Consider documenting TTL edge case behavior.The behavior of
ttl_secs = 0is not specified. Does it mean "no expiration" or "immediate expiration"? Clarifying this edge case in the documentation would help implementers maintain consistent semantics.
168-171: API signature inconsistency with other methods.
get_multirequires&[Bytes]while other methods acceptimpl AsRef<[u8]> + Send. This forces callers to pre-convert keys toBytesfor batch operations, whereas single-key operations accept byte slices directly.If this is intentional for performance (avoiding per-key allocations), consider documenting the rationale. Otherwise, a more flexible signature could improve ergonomics:
async fn get_multi<K: AsRef<[u8]> + Send>( &self, keys: &[K], ) -> KvResult<Vec<Option<Bytes>>>;benches/d-engine-bench/reports/v0.2.0/report_v0.2.0_final.md (1)
260-260: Minor style suggestion: Add comma after year in date.Some style guides suggest commas after the year in month-day-year format. Currently "December 13, 2025" could be "December 13, 2025," (comma before or after depending on guide).
This is purely stylistic and can be deferred.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
benches/d-engine-bench/reports/v0.2.0/dengine_comparison_v0.2.0.pngis excluded by!**/*.png
📒 Files selected for processing (6)
CHANGELOG.md(1 hunks)MIGRATION_GUIDE.md(1 hunks)README.md(4 hunks)benches/d-engine-bench/reports/v0.1.4/report_20251205_remeasured.md(1 hunks)benches/d-engine-bench/reports/v0.2.0/report_v0.2.0_final.md(1 hunks)d-engine-client/src/kv_client.rs(1 hunks)
🧰 Additional context used
🪛 LanguageTool
benches/d-engine-bench/reports/v0.2.0/report_v0.2.0_final.md
[style] ~260-~260: Some style guides suggest that commas should set off the year in a month-day-year date.
Context: ... v0.2.0 Report Date: December 13, 2025 Test Environment: Apple M2 Mac mi...
(MISSING_COMMA_AFTER_YEAR)
benches/d-engine-bench/reports/v0.1.4/report_20251205_remeasured.md
[style] ~3-~3: Some style guides suggest that commas should set off the year in a month-day-year date.
Context: ...enchmark Report Date: December 13, 2025 Hardware: Apple Mac mini M2 (8-co...
(MISSING_COMMA_AFTER_YEAR)
[style] ~185-~185: Some style guides suggest that commas should set off the year in a month-day-year date.
Context: ...ne v0.1.4 Test Date: December 13, 2025 Benchmark Runs: Single run per co...
(MISSING_COMMA_AFTER_YEAR)
🪛 markdownlint-cli2 (0.18.1)
MIGRATION_GUIDE.md
71-71: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: audit
- GitHub Check: Analyze (rust)
- GitHub Check: audit
🔇 Additional comments (12)
benches/d-engine-bench/reports/v0.1.4/report_20251205_remeasured.md (1)
1-187: Excellent benchmark documentation structure.This remeasured baseline report is well-organized and provides clear, reproducible performance data for v0.1.4. The sections are logically arranged, test commands are fully specified, and the comparison with the original report includes reasonable context for differences. This provides a solid foundation for comparing v0.2.0 performance.
d-engine-client/src/kv_client.rs (4)
32-52: Well-designed trait definition.Good use of
async_traitwith explicitSend + Syncbounds. The documentation clearly explains consistency guarantees and performance characteristics for each implementation.
74-78: LGTM!The generic bounds
impl AsRef<[u8]> + Sendprovide good flexibility for callers while maintaining type safety.
136-139: LGTM!Consistent API design with clear semantics for missing/expired keys returning
None.
173-197: LGTM!Idempotent delete semantics are appropriate for a distributed KV store, avoiding race conditions and simplifying retry logic.
MIGRATION_GUIDE.md (1)
42-90: Migration strategies are clear and well-documented.The three migration options (Clean Start, Rolling Upgrade, Snapshot-based) are practical, actionable, and appropriately scoped for different deployment scenarios. The verification section and timeline are helpful for operators. No concerns.
CHANGELOG.md (1)
7-86: v0.2.0 release entry is comprehensive and well-structured.The new release notes clearly articulate feature value, provide actionable examples, explicitly call out breaking changes with migration guidance, and back performance claims with benchmarks. Structure and tone are professional. No concerns.
benches/d-engine-bench/reports/v0.2.0/report_v0.2.0_final.md (2)
3-20: TL;DR is clear and provides honest hardware context.The executive summary effectively communicates v0.2.0 advantages while explicitly noting hardware differences between test setups. This transparency is important for readers evaluating the results. Good practice.
86-134: Reproduction steps are detailed and executable.The benchmark commands include all necessary flags (endpoints, connections, clients, key/value sizes, consistency levels) for reproducibility. Commands span single-client, high-concurrency, and hot-key scenarios. Clear structure. No concerns.
README.md (3)
37-53: v0.2.0 features and core capabilities are well-articulated.The distinction between new features (Modular Workspace, TTL/Lease, Watch API, etc.) and core capabilities (Single-Node Start, Strong Consistency, Pluggable Storage) is clear and helpful. Feature descriptions are concise with concrete examples (e.g., "<0.1ms latency" for LocalKvClient).
15-31: Project Status section sets proper expectations for pre-1.0 users.Clearly states that breaking changes may occur between minor versions, explicitly references MIGRATION_GUIDE.md for compatibility promise, and provides timeline context (v1.0 expected Q2 2026). This is valuable transparency for production users. Approach is professional and encourages adoption while managing expectations.
57-116: Quick Start reorganization is significantly improved.Splitting installation guidance into three distinct modes (Embedded/Standalone/Custom Storage) with use cases, TOML examples, and "when to use" guidance resolves the ambiguity that likely existed in v0.1.x. Each option includes:
- Clear TOML syntax for each mode
- Use case examples (e.g., "Go/Python/Java apps for Standalone")
- Performance note for context
- Link to relevant documentation
This structure is intuitive for both new users and experienced operators choosing deployment architecture.
…ments This PR consolidates architecture review improvements and critical bug fixes for v0.2.0 release. ## 🎯 Critical Fixes - **#212**: Fix learner promotion stuck (voter count + role transition bugs) - **#218**: Fix leader next_index initialization for new learners - **#222**: Return NOT_LEADER with leader metadata for client redirection - **#209**: Fix node restart wait_ready() timeout (leader notification race) ## ✨ Feature Additions - **#213**: Implement READ_ONLY Learner nodes for permanent analytics - **#219**: Pre-generate Go protobuf code for zero-config experience ## 🔧 Refactoring & Performance - **#223**: Optimize check_learner_progress() lock contention - Extract 5 helper methods (SRP), add 11 unit tests - **#211**: Remove Arc::get_mut anti-pattern from lease injection - **#210**: Simplify watch architecture (tokio::broadcast, 90% code sharing) - **#217**: Refactor Node::run() with strategy pattern - **#209**: Consolidate Raft unit tests (migrate 27 tests from server to core) ## 📚 Documentation - Restructure quick-start docs (embedded + standalone examples) - Add integration-modes.md and use-cases.md - Delete 911 lines of internal architecture docs (20/80 principle) - Fix Go client example with pre-generated protobufs ## 🧪 Testing & Validation - 14+ new unit tests across components - Fix flaky tests and timing issues (#204, #209) - Optimize test suite with nextest - All 430 core + 305 server + 292 integration tests passing ## 📊 Performance - Benchmarks stable (±3% variance) - CI optimized: removed benchmark compile check (saves 2-3 min/run) --- **Closes:** #209, #210, #211, #212, #213, #217, #218, #219, #222, #223 **Migration Notes:** - Config: `raft.watch.enabled` removed (breaking) - API: `StateMachine::start()` changed to async (breaking) - NodeStatus enum refactored: PROMOTABLE/READ_ONLY/ACTIVE (non-breaking) **Files Changed:** 100+ files, ~5000 insertions, ~1500 deletions
Type
Description
Release v0.2.0
Related Issues
This release includes work from the following issues:
rolefield from cluster config - use voters/learners lists only #202, Fix ClusterMembership API: Separate static role from dynamic leader info #201make test-all#197, feat(watch): Implement client Watch gRPC streaming service #196, docs(bench): Update v0.2.0 benchmark report with accurate v0.1.4 baseline #195Checklist
Summary by CodeRabbit
New Features
Documentation
Chores
Tests
✏️ Tip: You can customize this high-level summary in your review settings.