Skip to content

Conversation

@PeaBrane
Copy link
Contributor

@PeaBrane PeaBrane commented Oct 27, 2025

Overview:

as titled

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Enhanced error messages for network communication failures, including granular details for reader/writer task failures and message decoding errors.
  • New Features

    • Added RunningMean utility module for statistical calculations and sliding-window averaging operations.

Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
Signed-off-by: PeaBrane <yanrpei@gmail.com>
@PeaBrane PeaBrane requested a review from a team as a code owner October 27, 2025 18:29
@github-actions github-actions bot added the chore label Oct 27, 2025
@PeaBrane PeaBrane closed this Oct 27, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Oct 27, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

The PR removes the worker_id parameter from KV publisher initialization across C and Python bindings, deriving it from component's primary lease instead. Mocker's KV event handling shifts from move-block response channels to publisher-based approach. A new RunningMean utility supports hit-rate tracking. MoveBlock enum variants are extended with additional hash fields for block tracking. Error handling in TCP client is refined.

Changes

Cohort / File(s) Summary
Worker ID parameter removal
lib/bindings/c/src/lib.rs, lib/bindings/python/rust/llm/kv.rs, lib/llm/src/kv_router/publisher.rs
worker_id parameter removed from dynamo_llm_init, KvEventPublisher::new, and related KV publisher functions; derived from component's primary lease instead.
Mocker KV event publisher refactoring
lib/llm/src/mocker/kv_manager.rs, lib/llm/src/mocker/engine.rs
KV event publishing path introduced, replacing move-block response channel with optional Arc-wrapped KvEventPublisher. KvManager::new delegates to new_with_publisher accepting optional Component and dp_rank. Engine removes direct KV event publishing, adds conditional component propagation based on worker type.
MoveBlock enum and protocol updates
lib/llm/src/mocker/protocols.rs, lib/llm/src/mocker/sequence.rs
MoveBlock::Use extended to carry Vec<BlockHash>; MoveBlock::Promote extended with BlockHash parameter. Removed block_response_to_kv_event helper. Updated PrefillCost::predict_prefill_compute formula constants. JSON builder extended for worker type determination.
Mocker scheduler refactoring
lib/llm/src/mocker/scheduler.rs
Scheduler constructor updated to accept optional Component; removed kv_events_tx channel. Replaced VecDeque with RunningMean for hit-rate tracking. Refactored scheduling loop with dedicated try_schedule function. Updated timing models and removed try_prefill return of block hashes.
New running mean utility
lib/llm/src/mocker/running_mean.rs
New generic sliding-window running mean calculator with new, push, mean, len, is_empty, and clear methods.
Mocker module exports
lib/llm/src/mocker.rs
Added pub mod running_mean export.
KV router logging refinement
lib/llm/src/kv_router/indexer.rs
Minor logging adjustments: dp_rank logged as plain value; additional num_blocks field in parent block lookup failure; block_hash and worker_id references in remove operation failure path.
TCP client error handling
lib/runtime/src/pipeline/network/tcp/client.rs
Granular join-task error handling with separate cases for reader error, writer error, and both failing. Improved decoding error message context.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • High heterogeneity: Interconnected changes across KV publisher initialization, mocker refactoring, protocol updates, and scheduler logic requiring coordination.
  • Dense logic: Substantial refactoring in scheduler.rs (timing models, scheduling loop), kv_manager.rs (publisher-based event handling), and protocols.rs (enum signature changes).
  • Interdependent changes: MoveBlock variant changes cascade across multiple files; scheduler and kv_manager changes depend on protocol updates.
  • Wide file spread: 12 affected files with varying levels of change complexity.

Areas requiring extra attention:

  • lib/llm/src/mocker/scheduler.rs — Complex refactoring of scheduling loop, timing calculations, and signal handling; validate try_schedule logic and RunningMean integration.
  • lib/llm/src/mocker/kv_manager.rs — Publisher-based event generation logic; ensure publish_kv_event correctly converts block data and handles all code paths.
  • lib/llm/src/mocker/protocols.rs and sequence.rs — Validate MoveBlock enum changes are correctly propagated and that all callsites match new signatures.
  • lib/llm/src/kv_router/publisher.rs — Verify worker_id derivation from component's primary lease is correct and handles edge cases.

🐰 Worker IDs drift away, blocks hash in the light,
RunningMean smooths the jitter through the night,
Mocker dances free with publish'd events so bright,
Component-led routing sets the scheduler right!


📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5953568 and b743823.

📒 Files selected for processing (12)
  • lib/bindings/c/src/lib.rs (2 hunks)
  • lib/bindings/python/rust/llm/kv.rs (1 hunks)
  • lib/llm/src/kv_router/indexer.rs (2 hunks)
  • lib/llm/src/kv_router/publisher.rs (1 hunks)
  • lib/llm/src/mocker.rs (1 hunks)
  • lib/llm/src/mocker/engine.rs (7 hunks)
  • lib/llm/src/mocker/kv_manager.rs (7 hunks)
  • lib/llm/src/mocker/protocols.rs (2 hunks)
  • lib/llm/src/mocker/running_mean.rs (1 hunks)
  • lib/llm/src/mocker/scheduler.rs (8 hunks)
  • lib/llm/src/mocker/sequence.rs (9 hunks)
  • lib/runtime/src/pipeline/network/tcp/client.rs (2 hunks)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants