CacheRoute v0.1.8
Overview
CacheRoute v0.1.8 focuses on making scheduling-time prediction and runtime timing observability consistent across both KVCache injection and text injection workflows.
This release improves:
- prepare-stage time modeling,
- KDN→Instance KV transfer serialization behavior,
- high-RPS prediction stability,
- and per-task timing transparency for controlled experiments.
Highlights
1) KVCache injection timing model upgrade
- Added more complete KVCache prepare-time decomposition.
- Introduced KDN-link-aware prediction behavior for KV transfer waiting/servicing.
- Improved alignment between predicted and observed timings under contention.
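One way to read the decomposition above is as a sum of three terms: a prepare prefix, a wait for the KDN link, and the transfer itself. The sketch below is illustrative only and is not taken from the CacheRoute codebase. The names `KVPreparePrediction` and `predict_kv_prepare`, the parameters, and the assumption that a KDN link serves one transfer at a time at a fixed bandwidth are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class KVPreparePrediction:
    """Hypothetical breakdown of predicted prepare time for one KVCache task."""
    prefix_s: float      # work done before the KV transfer can start
    link_wait_s: float   # time spent waiting for the KDN link to become free
    transfer_s: float    # time to move the KV bytes over the link

    @property
    def total_s(self) -> float:
        return self.prefix_s + self.link_wait_s + self.transfer_s


def predict_kv_prepare(now_s: float,
                       prefix_s: float,
                       kv_bytes: int,
                       link_bytes_per_s: float,
                       link_free_at_s: float) -> KVPreparePrediction:
    """Predict prepare time, assuming a serialized link with constant bandwidth."""
    # The task can only ask for the link once its prepare prefix is done.
    ready_at_s = now_s + prefix_s
    # If the link is still busy at that point, the task waits for it.
    link_wait_s = max(0.0, link_free_at_s - ready_at_s)
    transfer_s = kv_bytes / link_bytes_per_s
    return KVPreparePrediction(prefix_s, link_wait_s, transfer_s)
```

Under these assumptions, a task with a 0.5 s prefix and a 1 MB KV payload on a 2 MB/s link that stays busy until t = 2.0 s would be predicted to wait 1.5 s and transfer for 0.5 s.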
2) Prepare-stage prediction is now more practical for experiments
- Prediction fields now better represent the full prepare lifecycle for KVCache tasks.
- Timing breakdown is easier to interpret for replay/debug/reporting workflows.
3) Better pending-task prediction behavior under concurrency
- Added pending-task recompute behavior to reduce drift when many KVCache tasks queue on the same KDN link.
- Helps reduce underestimation during sustained load.
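The recompute idea can be sketched as a walk over the tasks queued on one link, refreshing each prediction from the link's current state instead of keeping the estimate made at scheduling time. This is a minimal sketch under stated assumptions, not the release's actual logic: `PendingKVTask`, `recompute_pending`, and the FIFO-by-ready-time service order are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PendingKVTask:
    """Hypothetical record for a KVCache task queued on a KDN link."""
    task_id: str
    ready_at_s: float          # when the task is ready to start its transfer
    transfer_s: float          # estimated transfer duration
    predicted_done_s: float = 0.0


def recompute_pending(tasks: List[PendingKVTask], link_free_at_s: float) -> float:
    """Refresh predicted completion times for all tasks on one serialized link.

    Assumes the link serves tasks one at a time in order of readiness.
    Returns the time at which the link is predicted to become free again.
    """
    cursor = link_free_at_s
    for task in sorted(tasks, key=lambda t: t.ready_at_s):
        # A transfer starts when both the link and the task are ready.
        start_s = max(cursor, task.ready_at_s)
        task.predicted_done_s = start_s + task.transfer_s
        cursor = task.predicted_done_s
    return cursor
```

Running this whenever the link state changes (for example, when a transfer finishes later than expected) keeps queued predictions from drifting toward underestimates, because every later task inherits the updated start time.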
4) Stronger observability and timing diagnostics
- Added/expanded task-level trace fields and timing logs for:
  - prepare prefix,
  - KV link wait,
  - KV transfer estimate,
  - KV ack timestamps,
  - full prepare correction points.
- This improves reproducibility and supports strategy research.
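For replay and reporting, a per-task trace of this kind could be emitted as one structured log line. The record below is a hypothetical shape, not CacheRoute's actual schema: the class name `TaskTimingTrace`, every field name, and the helper `to_log_line` are assumptions chosen to mirror the items listed above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TaskTimingTrace:
    """Hypothetical per-task timing trace; field names are illustrative."""
    task_id: str
    prepare_prefix_s: float
    kv_link_wait_s: float
    kv_transfer_est_s: float
    kv_ack_ts: Optional[float] = None           # set when the KV ack arrives
    observed_prepare_s: Optional[float] = None  # set once prepare has finished

    def predicted_prepare_s(self) -> float:
        return self.prepare_prefix_s + self.kv_link_wait_s + self.kv_transfer_est_s

    def prediction_error_s(self) -> Optional[float]:
        """Observed minus predicted; positive means the prediction was too low."""
        if self.observed_prepare_s is None:
            return None
        return self.observed_prepare_s - self.predicted_prepare_s()


def to_log_line(trace: TaskTimingTrace) -> str:
    """Serialize the trace plus derived values as a single JSON log line."""
    record = asdict(trace)
    record["predicted_prepare_s"] = trace.predicted_prepare_s()
    record["prediction_error_s"] = trace.prediction_error_s()
    return json.dumps(record, sort_keys=True)
```

Keeping the predicted components and the observed value in the same record makes it possible to compute underestimation per task offline, without re-running the experiment.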
What’s unchanged in v0.1.8
- Text-mode serving behavior remains compatible.
- Ready-queue core flow and decode-path logic are preserved.
- Existing Redis load predictor input convention remains unchanged.
- Existing KV size modeling convention remains unchanged.
Why this matters
v0.1.8 is a foundation release for strategy research:
- it improves timing signal quality,
- keeps behavior incremental and observable,
- and prepares the scheduler/proxy stack for policy-level routing decisions.
Next Stage (Planned)
A) Injection-intent decision strategy
Build upper-layer policy logic to dynamically choose among:
kvcache, text, or hybrid,
based on queue state, transfer cost, compute cost, and resource availability.
B) Prefix-cache-aware routing
Develop routing policies that consider:
- prefix reuse potential,
- cache affinity,
- and routing cost/benefit tradeoffs across instances.
Goal: improve TTFT/cost efficiency while preserving throughput and experiment stability.
Version
Release: v0.1.8