
CacheRoute-v0.1.8



zhy1658858023 released this on 08 May at 04:00 (commit ccd566d)

CacheRoute v0.1.8

Overview

CacheRoute v0.1.8 focuses on making scheduling-time prediction and runtime timing observability consistent across both KVCache injection and text injection workflows.

This release improves:

  • prepare-stage time modeling,
  • KDN→Instance KV transfer serialization behavior,
  • high-RPS prediction stability,
  • and per-task timing transparency for controlled experiments.

Highlights

1) KVCache injection timing model upgrade

  • Added a more complete decomposition of KVCache prepare time.
  • Introduced KDN-link-aware prediction of KV transfer wait and service times.
  • Improved alignment between predicted and observed timings under contention.
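The notes do not describe the link model itself. As a minimal sketch, assuming each KDN→Instance link serves KV transfers one at a time in FIFO order at a fixed bandwidth, link-aware prediction splits each transfer into a wait (time until the link frees up) and a service time (KV size divided by bandwidth). `KdnLink`, `reserve`, and the constant-bandwidth assumption are hypothetical, not CacheRoute's actual API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class KdnLink:
    """Hypothetical KDN->Instance link that serializes KV transfers (FIFO, fixed bandwidth)."""

    bandwidth_bytes_per_s: float
    busy_until: float = 0.0  # predicted time at which the last reserved transfer finishes

    def reserve(self, now: float, kv_bytes: int) -> Tuple[float, float]:
        """Reserve the link for one transfer; return (predicted_wait_s, predicted_service_s)."""
        wait_s = max(0.0, self.busy_until - now)           # queueing behind earlier transfers
        service_s = kv_bytes / self.bandwidth_bytes_per_s  # time on the link once started
        self.busy_until = now + wait_s + service_s
        return wait_s, service_s


link = KdnLink(bandwidth_bytes_per_s=1e9)  # assumed 1 GB/s link
print(link.reserve(now=0.0, kv_bytes=500_000_000))  # first transfer starts immediately
print(link.reserve(now=0.1, kv_bytes=500_000_000))  # second waits for the first to finish
```

In this toy model, ignoring serialization would predict the second transfer finishing at 0.6 s instead of about 1.0 s, which is the kind of underestimation under contention that link-aware prediction is meant to avoid.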

2) Prepare-stage prediction is now more practical for experiments

  • Prediction fields now better represent the full prepare lifecycle for KVCache tasks.
  • Timing breakdown is easier to interpret for replay/debug/reporting workflows.
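One way to read the "full prepare lifecycle" is as a sum of the stages named in the trace fields of item 4 (prepare prefix, KV link wait, KV transfer, KV ack). The sketch below assumes these stages run back-to-back and are additive, which the notes do not state; `PrepareBreakdown` and its field names are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass
class PrepareBreakdown:
    """Hypothetical prepare-time decomposition for one KVCache task, in seconds."""

    prefix_s: float     # prepare prefix: time before the KV transfer is issued
    link_wait_s: float  # waiting for the KDN link to become free
    transfer_s: float   # estimated KV transfer time on the link
    ack_s: float        # from transfer end until the instance acknowledges the KV

    @property
    def total_s(self) -> float:
        # Assumes the stages are sequential, so full prepare time is their sum.
        return self.prefix_s + self.link_wait_s + self.transfer_s + self.ack_s

    def report(self) -> dict:
        """Flat dict for replay/debug/reporting, including the derived total."""
        return {**asdict(self), "total_s": self.total_s}


b = PrepareBreakdown(prefix_s=0.010, link_wait_s=0.200, transfer_s=0.500, ack_s=0.005)
print(b.report())
```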

3) Better pending-task prediction behavior under concurrency

  • Added recomputation of pending-task predictions to reduce drift when many KVCache tasks queue on the same KDN link.
  • This helps reduce prediction underestimation under sustained load.
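A minimal sketch of pending-task recomputation, assuming a FIFO link with fixed bandwidth: whenever the link's actual free time becomes known (for example, when a KV ack arrives later than predicted), the predicted start and finish of every transfer still queued behind it are re-derived rather than left stale. `PendingTransfer` and `recompute_pending` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingTransfer:
    task_id: str
    kv_bytes: int
    predicted_start: float = 0.0
    predicted_finish: float = 0.0


def recompute_pending(queue: List[PendingTransfer], link_free_at: float,
                      bandwidth_bytes_per_s: float) -> float:
    """Re-derive FIFO start/finish predictions for all queued transfers on one link.

    Returns the predicted time at which the link drains.
    """
    t = link_free_at
    for task in queue:  # queue order is assumed to be service order on the link
        task.predicted_start = t
        t += task.kv_bytes / bandwidth_bytes_per_s
        task.predicted_finish = t
    return t


queue = [PendingTransfer("a", 1_000_000_000), PendingTransfer("b", 500_000_000)]
# Suppose the transfer ahead of this queue was acked at t=2.0 s instead of the predicted 1.5 s.
print(recompute_pending(queue, link_free_at=2.0, bandwidth_bytes_per_s=1e9))
```

Because each estimate depends on everything ahead of it on the link, one late ack shifts the whole queue; updating all pending tasks together keeps that error from accumulating.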

4) Stronger observability and timing diagnostics

  • Added or expanded task-level trace fields and timing logs for:
    • prepare prefix,
    • KV link wait,
    • KV transfer estimate,
    • KV ack timestamps,
    • full prepare correction points.
  • This improves reproducibility and supports strategy research.
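The notes list the traced quantities but not their format. As an illustration only, a per-task record could be emitted as one JSON line, with a correction term comparing observed and predicted prepare time so experiment runs can be replayed and compared. The function name and every field name here are hypothetical.

```python
import json


def prepare_trace(task_id: str, start_ts: float, kv_ack_ts: float, predicted_prepare_s: float,
                  prefix_s: float, link_wait_s: float, transfer_est_s: float) -> str:
    """Build one JSON log line for a KVCache task's prepare stage (hypothetical schema)."""
    observed_prepare_s = kv_ack_ts - start_ts  # prepare is treated as ending at the KV ack
    record = {
        "task_id": task_id,
        "prepare_prefix_s": prefix_s,
        "kv_link_wait_s": link_wait_s,
        "kv_transfer_est_s": transfer_est_s,
        "kv_ack_ts": kv_ack_ts,
        "predicted_prepare_s": predicted_prepare_s,
        "observed_prepare_s": observed_prepare_s,
        # Positive correction means the prediction was too low (an underestimate).
        "prepare_correction_s": observed_prepare_s - predicted_prepare_s,
    }
    return json.dumps(record, sort_keys=True)


print(prepare_trace("req-1", start_ts=100.0, kv_ack_ts=100.8, predicted_prepare_s=0.7,
                    prefix_s=0.01, link_wait_s=0.2, transfer_est_s=0.49))
```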

What’s unchanged in v0.1.8

  • Text-mode serving behavior remains compatible.
  • Ready-queue core flow and decode-path logic are preserved.
  • Existing Redis load predictor input convention remains unchanged.
  • Existing KV size modeling convention remains unchanged.

Why this matters

v0.1.8 is a foundation release for strategy research:

  • it improves timing signal quality,
  • keeps behavior incremental and observable,
  • and prepares the scheduler/proxy stack for policy-level routing decisions.

Next Stage (Planned)

A) Injection-intent decision strategy

Build upper-layer policy logic to dynamically choose among:

  • kvcache
  • text
  • hybrid

based on queue state, transfer cost, compute cost, and resource availability.
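This strategy is still planned and its inputs are not specified. As one possible shape, assuming each option's cost can be expressed as an estimated time in seconds, the policy could pick the cheapest feasible intent. `IntentCosts`, `choose_intent`, and the cost terms are hypothetical; treating hybrid as "transfer part of the KV, recompute the rest" is one interpretation, not the project's definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntentCosts:
    """Hypothetical per-request estimates, in seconds, for each injection intent."""

    kv_link_wait_s: float             # queue state on the KDN link
    kv_transfer_s: float              # transfer cost for the KV
    text_prefill_s: float             # compute cost of prefilling from text
    hybrid_s: Optional[float] = None  # assumed hybrid cost; None if hybrid is not applicable
    kdn_available: bool = True        # resource availability


def choose_intent(c: IntentCosts) -> str:
    """Return "kvcache", "text", or "hybrid", whichever has the lowest estimated cost."""
    candidates = {"text": c.text_prefill_s}  # text injection assumed always available
    if c.kdn_available:
        candidates["kvcache"] = c.kv_link_wait_s + c.kv_transfer_s
        if c.hybrid_s is not None:
            candidates["hybrid"] = c.hybrid_s
    return min(candidates, key=candidates.get)


print(choose_intent(IntentCosts(kv_link_wait_s=0.1, kv_transfer_s=0.2, text_prefill_s=0.5)))
```

A long link queue (large `kv_link_wait_s`) or an unavailable KDN pushes the choice back to text, which is how queue state and resource availability enter the decision in this sketch.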

B) Prefix-cache-aware routing

Develop routing policies that consider:

  • prefix reuse potential,
  • cache affinity,
  • and routing cost/benefit tradeoffs across instances.

Goal: improve TTFT and cost efficiency while maintaining throughput and experiment stability.
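For the planned prefix-cache-aware routing, a minimal sketch, assuming each instance reports its queue delay and the token prefixes it has cached, and that prefill time grows linearly with the number of uncached tokens. It routes to the instance with the lowest estimated TTFT, trading cache affinity against load. All names and the linear cost model are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Instance:
    name: str
    queue_delay_s: float
    cached_prefixes: List[Tuple[int, ...]] = field(default_factory=list)


def cached_prefix_len(prompt: Sequence[int], prefixes: List[Tuple[int, ...]]) -> int:
    """Length of the longest cached prefix that matches the start of the prompt."""
    best = 0
    for prefix in prefixes:
        n = 0
        for a, b in zip(prompt, prefix):
            if a != b:
                break
            n += 1
        best = max(best, n)
    return best


def route(prompt: Sequence[int], instances: List[Instance], prefill_s_per_token: float) -> str:
    """Pick the instance with the lowest estimated TTFT (queue delay + uncached prefill)."""

    def est_ttft(inst: Instance) -> float:
        hit = cached_prefix_len(prompt, inst.cached_prefixes)
        return inst.queue_delay_s + (len(prompt) - hit) * prefill_s_per_token

    return min(instances, key=est_ttft).name


prompt = (1, 2, 3, 4, 5, 6, 7, 8)  # token IDs
instances = [Instance("a", queue_delay_s=0.0),
             Instance("b", queue_delay_s=0.03, cached_prefixes=[(1, 2, 3, 4, 5, 6)])]
print(route(prompt, instances, prefill_s_per_token=0.01))
```

Here instance b wins despite its longer queue because it already holds six of the eight prompt tokens; if its queue delay rises beyond 0.06 s, the estimate favors a instead.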


Version

Release: v0.1.8