Releases: BJTU-ANT/CacheRoute
CacheRoute-v0.1.8
Overview
CacheRoute v0.1.8 focuses on making scheduling-time prediction and runtime timing observability consistent across both KVCache injection and text injection workflows.
This release improves:
- prepare-stage time modeling,
- KDN→Instance KV transfer serialization behavior,
- high-RPS prediction stability,
- and per-task timing transparency for controlled experiments.
Highlights
1) KVCache injection timing model upgrade
- Added more complete KVCache prepare-time decomposition.
- Introduced KDN-link-aware prediction behavior for KV transfer waiting/servicing.
- Improved alignment between predicted and observed timings under contention.
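The sketch below shows one way a KDN-link-aware prepare prediction can be split into the parts named above (prefix, link wait, transfer). It assumes that each KDN→Instance link serves one KV transfer at a time in FIFO order, and that transfer time is roughly KV bytes divided by link bandwidth. `KdnLink`, `PrepareEstimate`, `predict_prepare` and `bandwidth_bytes_per_s` are hypothetical names, not CacheRoute's API.

```python
from dataclasses import dataclass


@dataclass
class KdnLink:
    """Hypothetical model of one KDN->Instance link that serializes KV transfers."""
    bandwidth_bytes_per_s: float
    busy_until: float = 0.0  # predicted time at which the link becomes free


@dataclass
class PrepareEstimate:
    prefix_s: float     # work done before the KV transfer can start
    link_wait_s: float  # time spent queued behind earlier transfers on the same link
    transfer_s: float   # predicted service time of this transfer
    total_s: float      # prefix + wait + transfer, measured from `now`


def predict_prepare(link: KdnLink, now: float, prefix_s: float, kv_bytes: int) -> PrepareEstimate:
    """Predict prepare time for one task and reserve the link for its transfer."""
    ready_at = now + prefix_s
    # The transfer starts only once the prefix work is done AND the link is free.
    start = max(ready_at, link.busy_until)
    transfer_s = kv_bytes / link.bandwidth_bytes_per_s
    link.busy_until = start + transfer_s  # reservation: later tasks queue behind this one
    return PrepareEstimate(
        prefix_s=prefix_s,
        link_wait_s=start - ready_at,
        transfer_s=transfer_s,
        total_s=link.busy_until - now,
    )
```

When two tasks are submitted at the same moment on the same link, the second prediction absorbs the first task's transfer as link wait. This is the contention effect a link-unaware estimate would miss.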
2) Prepare-stage prediction is now more practical for experiments
- Prediction fields now better represent the full prepare lifecycle for KVCache tasks.
- Timing breakdown is easier to interpret for replay/debug/reporting workflows.
3) Better pending-task prediction behavior under concurrency
- Added pending-task recompute behavior to reduce drift when many KVCache tasks queue on the same KDN link.
- Helps reduce underestimation during sustained load.
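To illustrate how pending-task recompute can limit drift, the sketch below rebuilds the predicted finish times of transfers still queued on one link from the time the link was actually observed to become free (for example, the KV ack of the transfer that just completed). Without this step, the original reservations would carry forward any earlier estimation error. `PendingTransfer` and `recompute_pending` are hypothetical names, and the FIFO-per-link ordering is our assumption.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PendingTransfer:
    task_id: str
    transfer_s: float             # predicted service time on the link
    predicted_finish: float = 0.0


def recompute_pending(queue: deque, link_free_at: float) -> None:
    """Re-derive predicted finish times for transfers still queued on one link.

    `link_free_at` is the observed time the link became free, so an under- or
    over-estimate for the completed transfer does not propagate to the rest.
    """
    t = link_free_at
    for item in queue:  # FIFO: each transfer starts when the previous one ends
        t += item.transfer_s
        item.predicted_finish = t
```

Suppose the head transfer was expected to free the link at 1.0 s but its ack arrives at 1.5 s. Calling `recompute_pending(queue, 1.5)` shifts every queued task by the observed 0.5 s. This is the kind of sustained underestimation the recompute is meant to reduce.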
4) Stronger observability and timing diagnostics
- Added/expanded task-level trace fields and timing logs for:
  - prepare prefix,
  - KV link wait,
  - KV transfer estimate,
  - KV ack timestamps,
  - full prepare correction points.
- This improves reproducibility and supports strategy research.
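Trace fields like these can be compared offline. The sketch below derives the observed link wait and transfer time from timestamps, reports the error against the predicted values, and formats a one-line timing log. All field names (`prefix_done`, `link_acquired`, `kv_ack`, `kv_link_wait_est_s`, `kv_transfer_est_s`) are hypothetical and may not match CacheRoute's actual trace keys.

```python
from dataclasses import dataclass


@dataclass
class KvTimingTrace:
    """Hypothetical per-task trace record; all values in seconds."""
    task_id: str
    prefix_done: float        # prepare prefix finished; task starts waiting for the link
    link_acquired: float      # KV transfer actually started on the KDN link
    kv_ack: float             # instance acknowledged the KV transfer
    kv_link_wait_est_s: float
    kv_transfer_est_s: float

    def observed_link_wait_s(self) -> float:
        return self.link_acquired - self.prefix_done

    def observed_transfer_s(self) -> float:
        return self.kv_ack - self.link_acquired

    def prediction_error_s(self) -> float:
        """Positive means the prediction underestimated the real wait + transfer time."""
        observed = self.observed_link_wait_s() + self.observed_transfer_s()
        predicted = self.kv_link_wait_est_s + self.kv_transfer_est_s
        return observed - predicted

    def log_line(self) -> str:
        return (f"task={self.task_id} link_wait={self.observed_link_wait_s():.3f}s "
                f"transfer={self.observed_transfer_s():.3f}s err={self.prediction_error_s():+.3f}s")
```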
What’s unchanged in v0.1.8
- Text-mode serving behavior remains compatible.
- Ready-queue core flow and decode-path logic are preserved.
- Existing Redis load predictor input convention remains unchanged.
- Existing KV size modeling convention remains unchanged.
Why this matters
v0.1.8 is a foundation release for strategy research:
- it improves timing signal quality,
- keeps behavior incremental and observable,
- and prepares the scheduler/proxy stack for policy-level routing decisions.
Next Stage (Planned)
A) Injection-intent decision strategy
Build upper-layer policy logic to dynamically choose among:
- kvcache
- text
- hybrid
based on queue state, transfer cost, compute cost, and resource availability.
B) Prefix-cache-aware routing
Develop routing policies that consider:
- prefix reuse potential,
- cache affinity,
- and routing cost/benefit tradeoffs across instances.
Goal: improve TTFT/cost efficiency while maintaining throughput and experiment stability.
Version
Release: v0.1.8
CacheRoute-v0.1.7
This release formalizes the queue-prediction path around Prefill-first (TTFT-centric) behavior and adds explicit task stage lifecycle markers for future Decode modeling.
🚀 Highlights
Prefill-focused queue prediction remains the mainline behavior
Queue reservation/recompute continues to advance by first_token in the prefill chain, preserving TTFT-oriented queue semantics and avoiding decode-driven queue contamination.
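A minimal sketch of TTFT-centric queue prediction follows. It makes three assumptions of its own: each instance keeps reservations for predicted prefill time only, prefills are treated as serial, and a reservation is released when the task emits its first token. Under these assumptions decode time never inflates the queue estimate. `PrefillQueue`, `reserve`, `on_first_token` and `predicted_wait` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class PrefillQueue:
    """Hypothetical per-instance queue model that only tracks prefill (TTFT) work."""
    reservations: dict = field(default_factory=dict)  # task_id -> predicted prefill seconds

    def reserve(self, task_id: str, prefill_s: float) -> float:
        """Reserve prefill time; return predicted TTFT (queued prefill work + own prefill)."""
        ahead = sum(self.reservations.values())
        self.reservations[task_id] = prefill_s
        return ahead + prefill_s

    def on_first_token(self, task_id: str) -> None:
        # The queue advances at first_token: the task leaves the prefill chain here,
        # and its later decode work is deliberately not counted.
        self.reservations.pop(task_id, None)

    def predicted_wait(self) -> float:
        return sum(self.reservations.values())
```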
Stage-aware task metadata is now explicit
ProxyTask includes predict_stage (default: "prefill"), and stage transitions can be represented directly in task state and trace fields.
Decode lifecycle tracking is introduced (without decode scheduling impact)
On the first token, tasks transition into the decode stage and are added to per-instance active decode tracking; on the forward end, they are removed, and the decode end timestamp is recorded. This is lifecycle observability only and does not alter prefill scheduling decisions.
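The stage lifecycle can be pictured with the sketch below. Only `ProxyTask`, `predict_stage` and its `"prefill"` default come from these notes. The literal `"decode"` stage value, the `DecodeTracker` class and the `decode_end_ts` field are hypothetical. As in v0.1.7, the tracker only records state and does not feed back into prefill scheduling.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProxyTask:
    task_id: str
    instance_id: str
    predict_stage: str = "prefill"          # default stage, per the v0.1.7 notes
    decode_end_ts: Optional[float] = None   # hypothetical field name


class DecodeTracker:
    """Hypothetical per-instance active-decode bookkeeping (observability only)."""

    def __init__(self) -> None:
        self.active: dict = {}  # instance_id -> set of task_ids currently decoding

    def on_first_token(self, task: ProxyTask) -> None:
        task.predict_stage = "decode"
        self.active.setdefault(task.instance_id, set()).add(task.task_id)

    def on_forward_end(self, task: ProxyTask) -> None:
        self.active.get(task.instance_id, set()).discard(task.task_id)
        task.decode_end_ts = time.time()  # record when decode finished

    def active_decode_count(self, instance_id: str) -> int:
        return len(self.active.get(instance_id, ()))
```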
Timing observability expanded
Existing timing logs now include stage visibility and active decode count for easier experiment analysis and future model extension planning.
✅ Compatibility
No queue-mainline redesign.
No decode batch-size tracker was introduced in this version.
No change to the forward request flow contract.
Existing prefill-driven queue prediction behavior is preserved while adding decode-stage observability hooks.
🔭 What this enables next
v0.1.7 establishes clean extension points for upcoming KVCache-based injection predictors while keeping the current prefill queue behavior stable and reproducible.
CacheRoute-v0.1.6
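CacheRoute-v0.1.6 is a milestone release for the Scheduler-side CacheRoute strategy. Without introducing a black-box weighted scoring scheme, it delivers the key capabilities that move the system from "runnable" to "explainable and verifiable". These include joint KDN/Proxy decision-making, strategy observability, and unified experiment entry points.
Key Highlights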
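- New CacheRoute scheduling strategy (Scheduler)
This release introduces the cacheroute strategy. It uses non-weighted lexicographic ordering rules, which avoids weight parameters that are hard to justify:
KDN selection: text_full -> not_overloaded -> kv_cover_len -> load/tie-break.
Proxy selection: topology_best_group -> load_safe_window -> knowledge_affinity -> load/tie-break.
The implementation consumes the Scheduler's internal context directly and does not rely on tasks being pre-split by injection mode.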
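To make the non-weighted lexicographic rule concrete, the sketch below expresses KDN selection as a tuple sort key. Candidates are compared on `text_full` first, then `not_overloaded`, then `kv_cover_len`, and only then on load, with the KDN id as a deterministic final tie-break. The `KdnCandidate` fields and their meanings are our assumptions, not the strategy's actual data model. For example, we read `text_full` as "the KDN holds the full requested text" and `kv_cover_len` as "prompt tokens covered by cached KV".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class KdnCandidate:
    kdn_id: str
    text_full: bool       # assumed: KDN holds the complete requested knowledge text
    overloaded: bool
    kv_cover_len: int     # assumed: number of prompt tokens covered by cached KV
    load: float           # lower is better


def select_kdn(candidates: List[KdnCandidate]) -> KdnCandidate:
    """Pick a KDN by lexicographic comparison; no weights are involved.

    Python compares tuples element by element, so an earlier criterion always
    dominates later ones. Values are negated where "bigger is better".
    """
    return min(
        candidates,
        key=lambda c: (
            not c.text_full,     # 1. prefer text_full
            c.overloaded,        # 2. prefer not overloaded
            -c.kv_cover_len,     # 3. prefer longer KV coverage
            c.load,              # 4. prefer lower load
            c.kdn_id,            # 5. deterministic tie-break
        ),
    )
```

Proxy selection follows the same pattern with a different key (topology group, then load-safe window, then knowledge affinity, then load). Because no criterion can be "outvoted" by a large value in a later one, each decision can be explained by naming the first criterion that separated the winner.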
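- Observability enhancements: loaded-strategy display + decision snapshots
On startup, the Scheduler explicitly prints the strategy it has loaded (e.g., cacheroute).
/debug/status adds a strategy field for quickly confirming which strategy is in effect.
/debug/strategy returns a snapshot of the most recent strategy decision (candidates and the final selection).
By default, CacheRoute emits a concise one-line log (request ID, selected KDN/Proxy, candidate count). This log can be switched off.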
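The snapshot-plus-log behavior can be sketched as a small recorder. It keeps only the latest decision, which is what a `/debug/strategy`-style endpoint would serialize, and it optionally emits one compact log line. `DecisionRecorder`, its snapshot keys, the logger name and the log format are all hypothetical. The real `/debug/strategy` payload may differ.

```python
import json
import logging
from typing import List, Optional

logger = logging.getLogger("cacheroute.sketch")  # hypothetical logger name


class DecisionRecorder:
    """Hypothetical holder for the most recent strategy decision."""

    def __init__(self, strategy: str, log_enabled: bool = True) -> None:
        self.strategy = strategy
        self.log_enabled = log_enabled  # the one-line log can be switched off
        self.last: Optional[dict] = None

    def record(self, request_id: str, kdn_candidates: List[str],
               proxy_candidates: List[str], kdn: str, proxy: str) -> None:
        # Overwrite the previous snapshot: only the latest decision is kept.
        self.last = {
            "strategy": self.strategy,
            "request_id": request_id,
            "kdn_candidates": kdn_candidates,
            "proxy_candidates": proxy_candidates,
            "selected": {"kdn": kdn, "proxy": proxy},
        }
        if self.log_enabled:
            logger.info("[%s] req=%s kdn=%s proxy=%s cands=%d/%d", self.strategy,
                        request_id, kdn, proxy, len(kdn_candidates), len(proxy_candidates))

    def snapshot_json(self) -> str:
        """What a debug endpoint could return; an empty object before the first decision."""
        return json.dumps(self.last or {})
```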
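- KDN↔Proxy topology support (static tiers)
KDN-to-Proxy topology tier information (e.g., bandwidth/latency tier) can be supplied via meta.kdn_links, and CacheRoute uses it when selecting a proxy. The feature stays backward compatible: if the information is not provided, selection automatically falls back to ignoring topology.
The demo entry points also provide convenience flags:
demo_scheduler.py --cacheroute
demo_proxy.py --kdn-links-json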
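The fallback behavior of topology-aware proxy grouping can be illustrated as follows. We assume each proxy's meta carries a `kdn_links` mapping from KDN id to `bandwidth_tier`/`latency_tier` integers, where a lower tier is better. This JSON shape is our guess, not the documented format of `--kdn-links-json`. The proxies sharing the best tier toward the chosen KDN form the group. If no proxy reports topology, every proxy is returned, so the later criteria decide as if topology were absent.

```python
import json
from typing import Dict, List, Tuple

UNKNOWN_TIER = (float("inf"), float("inf"))  # proxies without topology info sort last


def link_tier(proxy_meta: dict, kdn_id: str) -> Tuple[float, float]:
    """(bandwidth_tier, latency_tier) for proxy->KDN; lower is assumed better."""
    link = proxy_meta.get("kdn_links", {}).get(kdn_id)
    if not link:
        return UNKNOWN_TIER
    return (link.get("bandwidth_tier", float("inf")), link.get("latency_tier", float("inf")))


def topology_best_group(proxies: Dict[str, dict], kdn_id: str) -> List[str]:
    """Return the proxies sharing the best tier toward `kdn_id` (all of them if none report it)."""
    tiers = {pid: link_tier(meta, kdn_id) for pid, meta in proxies.items()}
    best = min(tiers.values(), default=UNKNOWN_TIER)
    return sorted(pid for pid, t in tiers.items() if t == best)


if __name__ == "__main__":
    # Hypothetical JSON, e.g. what might be passed via --kdn-links-json.
    raw = '{"kdn-a": {"bandwidth_tier": 1, "latency_tier": 1}}'
    proxies = {"p1": {"kdn_links": json.loads(raw)}, "p2": {}}
    print(topology_best_group(proxies, "kdn-a"))  # ['p1']
```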
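- Documentation updates: milestone summary and verification workflow
scheduler/README.md and the main README now include the milestone summary, minimal verification commands, and key observability fields. The team can therefore follow a consistent verification workflow after release.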
What's Changed
- Add CacheRoute strategy, KDN knowledge index and pass request context to strategy.select by @zhy1658858023 in #1
- Add CacheRoute strategy and KDN knowledge indexing; propagate request_ctx to scheduler and strategies by @zhy1658858023 in #2
- Add CacheRoute strategy: per-KDN index, topology-aware proxy selection, and demo flags by @zhy1658858023 in #3
New Contributors
- @zhy1658858023 made their first contribution in #1
Full Changelog: v0.1.5...v0.1.6
CacheRoute-v0.1.5
The CacheRoute prototype now supports concurrent knowledge injection for tasks. This release improves support for concurrent client requests and batch knowledge registration on the KDN, and it ships with a basic knowledge base and task set. Two limitations remain: the fixed hash block key issue is not yet resolved, and dedicated resource maintenance and knowledge-policy integration are still missing.
CacheRoute-v0.1.4
The processing workflows for text-based and KVCache-based injection are complete. The next steps are resource maintenance and upper-level policies, together with parallel task processing on the proxy and instances.
CacheRoute-v0.1.3
The basic CacheRoute architecture.
Completed the KDN_server.
Completed the interface between the scheduler and KDN servers. During its lifespan, the scheduler runs a control plane (port 7002) that listens for registration requests from KDNs. The scheduler maintains the knowledge in the KDN pool dynamically.
Completed the interface between the scheduler and the proxy. The scheduler maintains a dynamic proxy pool in the control plane. Improved scheduler output, strategy, and resource maintenance.
Integrated the scheduler with the KDN and proxy selection strategies, supporting simple round-robin scheduling.
Completed the interface between the proxy and the instance. The proxy maintains a dynamic instance pool in its control plane (port 8002).
Built the prepare-ready parallel task queue for the proxy: the prepare queue handles knowledge injection, and the ready queue forwards tasks to instances.
Knowledge-oriented routing in the scheduler strategy: TBD.
Task status maintenance in the scheduler: TBD.
Proxy parallel queue strategy: TBD.
Scheduler/proxy resource updater: TBD.
KDN UI for ease of use: TBD.
Instance resource collector on top of vLLM: TBD.
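The proxy's prepare-ready task queue can be sketched with `asyncio`. Prepare workers run knowledge injection concurrently, and only successfully prepared tasks enter the ready queue, from which a forward worker sends them to instances. `run_pipeline`, the `prepare`/`forward` callbacks and the worker count are hypothetical. In particular, dropping a task whose preparation fails is a simplification of ours.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_pipeline(
    task_ids: List[str],
    prepare: Callable[[str], Awaitable[None]],
    forward: Callable[[str], Awaitable[None]],
    n_prepare_workers: int = 2,
) -> None:
    """Hypothetical two-stage proxy pipeline: prepare (injection) -> ready -> forward."""
    prepare_q: asyncio.Queue = asyncio.Queue()
    ready_q: asyncio.Queue = asyncio.Queue()
    for tid in task_ids:
        prepare_q.put_nowait(tid)

    async def prepare_worker() -> None:
        while True:
            tid = await prepare_q.get()
            try:
                await prepare(tid)          # e.g. inject text or KVCache for this task
            except Exception:
                pass                        # sketch only: a failed task is dropped
            else:
                await ready_q.put(tid)      # only prepared tasks become forwardable
            finally:
                prepare_q.task_done()

    async def forward_worker() -> None:
        while True:
            tid = await ready_q.get()
            try:
                await forward(tid)          # send the prepared request to an instance
            except Exception:
                pass                        # sketch only: forwarding errors are ignored
            finally:
                ready_q.task_done()

    workers = [asyncio.create_task(prepare_worker()) for _ in range(n_prepare_workers)]
    workers.append(asyncio.create_task(forward_worker()))
    await prepare_q.join()   # every task was prepared (or dropped) and enqueued as ready
    await ready_q.join()     # every ready task has been forwarded
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
```

Separating the two queues lets slow injection for one task overlap with forwarding of tasks that are already prepared.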
CacheRoute-v0.1.2
The basic CacheRoute architecture.
Completed the KDN_server.
Completed the interface between the scheduler and KDN servers. During its lifespan, the scheduler runs a control plane (port 7002) that listens for registration requests from KDNs. The scheduler maintains the knowledge in the KDN pool dynamically.
Completed the interface between the scheduler and the proxy. The scheduler maintains a dynamic proxy pool in the control plane.
Integrated the scheduler with the KDN and proxy selection strategies, supporting simple round-robin scheduling.
Completed the interface between the proxy and the instance. The proxy maintains a dynamic instance pool in its control plane (port 8002).
Knowledge-oriented routing in the scheduler strategy: TBD.
Task status maintenance in the scheduler: TBD.
Proxy parallel queue strategy: TBD.
Scheduler/proxy resource updater: TBD.
KDN UI for ease of use: TBD.
Instance resource collector on top of vLLM: TBD.
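Simple round-robin selection over the scheduler's dynamic KDN and proxy pools has to keep working as members register and leave. The sketch below keeps a "next" index and adjusts it when an earlier member is removed, so the rotation neither skips nor repeats a member. `RoundRobinPool` and its methods are hypothetical names, not CacheRoute's API.

```python
from typing import List, Optional


class RoundRobinPool:
    """Hypothetical round-robin selector over a pool whose members change at runtime."""

    def __init__(self) -> None:
        self.members: List[str] = []
        self._next = 0  # index of the member to return on the next select()

    def register(self, member_id: str) -> None:
        if member_id not in self.members:
            self.members.append(member_id)

    def unregister(self, member_id: str) -> None:
        if member_id in self.members:
            idx = self.members.index(member_id)
            self.members.pop(idx)
            if idx < self._next:
                self._next -= 1  # keep pointing at the same "next" member

    def select(self) -> Optional[str]:
        if not self.members:
            return None  # empty pool: nothing to route to
        self._next %= len(self.members)
        chosen = self.members[self._next]
        self._next += 1
        return chosen
```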
CacheRoute-v0.1.1
The basic CacheRoute architecture.
Completed the KDN_server.
Completed the interface between the scheduler and KDN servers. The scheduler can snapshot knowledge from the KDN server during its lifespan and update it dynamically.
Completed the interface between the scheduler and the proxy. The scheduler maintains a dynamic proxy pool in the control plane.
Completed the interface between the proxy and the instance. The proxy maintains a dynamic instance pool in the control plane.
Knowledge-oriented routing in the scheduler strategy: TBD.
Task status maintenance in the scheduler: TBD.
Proxy parallel queue strategy: TBD.
Scheduler/proxy resource updater: TBD.
KDN UI for ease of use: TBD.
Instance resource collector on top of vLLM: TBD.
CacheRoute-v0.1.0
The basic CacheRoute architecture.
Completed the KDN_server.
Scheduler strategy: TBD.
Proxy queue strategy: TBD.
Scheduler/proxy resource updater: TBD.