CacheRoute-v0.1.7
This release formalizes the queue-prediction path around prefill-first, TTFT-centric (time-to-first-token) behavior and adds explicit task-stage lifecycle markers in preparation for future decode modeling.
🚀 Highlights
Prefill-focused queue prediction remains the mainline behavior
Queue reservation and recomputation continue to advance on first_token in the prefill chain, preserving TTFT-oriented queue semantics and keeping decode progress from contaminating the queue prediction.
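The queue rule above can be illustrated with a small model. This is a minimal sketch, not CacheRoute's implementation: the class `PrefillQueueModel`, its methods, and the idea of holding a per-task reserved prefill time are hypothetical. It assumes a task's reservation is released when its first token arrives, so the predicted wait depends only on outstanding prefill work and decode progress never feeds back into it.

```python
from dataclasses import dataclass, field


@dataclass
class PrefillQueueModel:
    """Hypothetical per-instance queue model driven only by prefill work."""

    # task_id -> estimated prefill seconds still reserved on this instance
    reserved: dict[str, float] = field(default_factory=dict)

    def reserve(self, task_id: str, est_prefill_s: float) -> None:
        # A newly routed task reserves its estimated prefill time.
        self.reserved[task_id] = est_prefill_s

    def on_first_token(self, task_id: str) -> None:
        # Prefill is finished once the first token arrives, so the
        # reservation is released: the queue advances on first_token.
        self.reserved.pop(task_id, None)

    def on_decode_progress(self, task_id: str, tokens: int) -> None:
        # Deliberately a no-op: decode progress must not change the
        # prefill queue estimate.
        pass

    def predicted_wait_s(self) -> float:
        # TTFT-oriented estimate: sum of outstanding prefill reservations.
        return sum(self.reserved.values())
```

With two reserved tasks of 0.5 s and 0.25 s, the predicted wait is 0.75 s; it stays there while the first task decodes and drops to 0.25 s only when that task's first token is reported.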
Stage-aware task metadata is now explicit
ProxyTask now includes a predict_stage field (default: "prefill"), so stage transitions can be represented directly in task state and trace fields.
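A stage-aware task record might look like the sketch below. Only `ProxyTask`, `predict_stage`, and its `"prefill"` default come from these notes; `task_id`, `stage_trace`, and `set_stage` are hypothetical names used to show state and trace being updated together.

```python
import time
from dataclasses import dataclass, field
from typing import Literal

Stage = Literal["prefill", "decode"]


@dataclass
class ProxyTask:
    # predict_stage and its default come from the release notes;
    # task_id and stage_trace are illustrative fields.
    task_id: str
    predict_stage: Stage = "prefill"
    # Ordered (stage, monotonic timestamp) pairs, one per stage entry.
    stage_trace: list[tuple[str, float]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Record the initial stage so the trace is never empty.
        self.stage_trace.append((self.predict_stage, time.monotonic()))

    def set_stage(self, stage: Stage) -> None:
        # Update task state and trace together; repeated calls are no-ops.
        if stage != self.predict_stage:
            self.predict_stage = stage
            self.stage_trace.append((stage, time.monotonic()))
```

Keeping the trace inside the task means a single object carries both the current stage and its history, which is convenient when dumping per-task traces after an experiment.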
Decode lifecycle tracking is introduced (without decode scheduling impact)
When a task's first token arrives, the task transitions into the decode stage and is added to its instance's active-decode tracking; when the forward ends, it is removed and its decode end timestamp is recorded. This is lifecycle observability only and does not alter prefill scheduling decisions.
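The lifecycle described above can be sketched as a standalone tracker. This is an assumption-laden illustration, not CacheRoute's code: `Task` is a minimal stand-in for ProxyTask, and `DecodeLifecycleTracker`, `instance_id`, and `decode_end_ts` are hypothetical names. The tracker never touches any queue state, mirroring the observability-only behavior in this release.

```python
import time
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    # Minimal stand-in for ProxyTask; only predict_stage is from the notes.
    task_id: str
    instance_id: str
    predict_stage: str = "prefill"
    decode_end_ts: Optional[float] = None


class DecodeLifecycleTracker:
    """Hypothetical observability-only tracker of active decodes per instance."""

    def __init__(self) -> None:
        # instance_id -> ids of tasks currently in the decode stage
        self._active: defaultdict[str, set[str]] = defaultdict(set)

    def on_first_token(self, task: Task) -> None:
        # First token: move the task into decode and start tracking it.
        task.predict_stage = "decode"
        self._active[task.instance_id].add(task.task_id)

    def on_forward_end(self, task: Task) -> None:
        # Forward end: stop tracking and stamp the decode end time.
        self._active[task.instance_id].discard(task.task_id)
        task.decode_end_ts = time.time()

    def active_decode_count(self, instance_id: str) -> int:
        # .get avoids creating empty entries for unknown instances.
        return len(self._active.get(instance_id, ()))
```

Because the tracker only reads and annotates tasks, it can be added or removed without changing any routing or queue decision.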
Timing observability expanded
Existing timing logs now report each task's stage and the active decode count, making experiments easier to analyze and future model extensions easier to plan.
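One way to make such log lines easy to analyze is a flat key=value layout. The sketch below is hypothetical: the actual CacheRoute log format is not given here, and the function names and field keys are illustrative only.

```python
def format_timing_record(task_id: str, instance_id: str, stage: str,
                         active_decodes: int, elapsed_ms: float) -> str:
    # Hypothetical key=value layout; field names are illustrative only.
    return (
        f"task={task_id} instance={instance_id} stage={stage} "
        f"active_decode={active_decodes} elapsed_ms={elapsed_ms:.1f}"
    )


def parse_timing_record(line: str) -> dict[str, str]:
    # Split each whitespace-separated token on its first '='.
    return dict(token.split("=", 1) for token in line.split())
```

For example, `format_timing_record("t1", "gpu0", "decode", 3, 12.34)` yields `task=t1 instance=gpu0 stage=decode active_decode=3 elapsed_ms=12.3`, and `parse_timing_record` turns it back into a dict, so filtering runs by stage or grouping by active decode count needs no custom parser.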
✅ Compatibility
No redesign of the queue mainline.
No decode batch-size tracker in this version.
No change to the forward request flow contract.
Existing prefill-driven queue prediction behavior is preserved while adding decode-stage observability hooks.
🔭 What this enables next
v0.1.7 establishes clean extension points for upcoming KVCache-based injection predictors while keeping the current prefill queue behavior stable and reproducible.