v0.14.0 — Performance & Benchmarking
Performance & Benchmarking
This release focuses on profiling hot paths across the framework and delivering measurable performance improvements, along with expanded benchmark coverage for recently added components.
Performance Improvements
| Area | Metric | Improvement |
|---|---|---|
| ToolRegistry spec generation | uncached 50-tool lookup | -33% (10.4 µs → 6.9 µs) |
| Chain transforms | 3-stage pipeline | -30% (287 ns → 210 ns) |
| HotSwapAgent prompt | simple prompt | -18% (2.5 µs → 1.8 µs) |
| HotSwapAgent swap | model swap | -26% (140 ns → 112 ns) |
| DAG fan-out | 3-way fan-out + merge | -11% (10.9 µs → 10.5 µs) |
What Changed
- ToolRegistry: generation-based cache invalidation. `tool_specs()` uses a generation counter to detect stale caches, avoiding redundant recomputation.
- Memory: contiguous slice clone. `SlidingWindowMemory` and `TokenWindowMemory` now use `make_contiguous().to_vec()` instead of `iter().cloned().collect()`, producing a single memcpy.
- SlidingWindowMemory: single-pop eviction. Replaced the `while` loop with a single `if` check, since only one message is added at a time.
- ReAct loop: reduced cloning. Tool calls are moved with `std::mem::take` instead of copied with `.to_vec()`; middleware short-circuit paths move messages instead of cloning them.
- MiddlewareStack: early return when empty. All three middleware pipeline methods return `Continue` immediately when no middleware is registered.
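
The generation-counter idea from the ToolRegistry item can be sketched as follows. This is a minimal illustration, not the framework's code: `Registry`, `ToolSpec`, `register`, and `rebuild_count` are hypothetical names, and only `tool_specs()` is taken from the notes above.

```rust
use std::cell::{Cell, RefCell};
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for a tool specification.
#[derive(Clone, Debug, PartialEq)]
pub struct ToolSpec {
    pub name: String,
    pub description: String,
}

/// Hypothetical registry that tags its cached spec list with the
/// generation it was built from.
#[derive(Default)]
pub struct Registry {
    tools: BTreeMap<String, String>,
    generation: u64,
    // (generation the cache was built at, cached specs)
    cache: RefCell<Option<(u64, Vec<ToolSpec>)>>,
    // Counts rebuilds so the caching behavior can be observed.
    rebuilds: Cell<u32>,
}

impl Registry {
    pub fn register(&mut self, name: &str, description: &str) {
        self.tools.insert(name.to_string(), description.to_string());
        // Every mutation bumps the generation; the cache is never cleared
        // explicitly, it simply stops matching.
        self.generation += 1;
    }

    pub fn tool_specs(&self) -> Vec<ToolSpec> {
        {
            let cache = self.cache.borrow();
            if let Some((built_at, specs)) = cache.as_ref() {
                if *built_at == self.generation {
                    // Cache is current: skip recomputation.
                    return specs.clone();
                }
            }
        }
        // Cache is missing or stale: rebuild and record the generation.
        let specs: Vec<ToolSpec> = self
            .tools
            .iter()
            .map(|(name, description)| ToolSpec {
                name: name.clone(),
                description: description.clone(),
            })
            .collect();
        self.rebuilds.set(self.rebuilds.get() + 1);
        *self.cache.borrow_mut() = Some((self.generation, specs.clone()));
        specs
    }

    pub fn rebuild_count(&self) -> u32 {
        self.rebuilds.get()
    }
}
```

In this sketch, repeated `tool_specs()` calls with no intervening `register` reuse the cached list, and a staleness check costs one integer comparison.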
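
The two memory changes (contiguous clone and single-pop eviction) can be illustrated with a small `VecDeque`-backed buffer. `Window` and its methods are hypothetical stand-ins, assuming the real memory types store messages in a `VecDeque`; `make_contiguous` is the standard-library method named in the notes.

```rust
use std::collections::VecDeque;

/// Hypothetical sliding-window buffer; not the framework's actual type.
pub struct Window {
    messages: VecDeque<String>,
    capacity: usize,
}

impl Window {
    pub fn new(capacity: usize) -> Self {
        Self {
            messages: VecDeque::with_capacity(capacity),
            capacity,
        }
    }

    pub fn push(&mut self, message: String) {
        self.messages.push_back(message);
        // The buffer never exceeds `capacity` between calls and exactly one
        // message was just added, so at most one eviction is needed.
        if self.messages.len() > self.capacity {
            self.messages.pop_front();
        }
    }

    pub fn snapshot(&mut self) -> Vec<String> {
        // Rearranges the ring buffer into one slice, then clones that slice
        // in a single pass instead of walking an iterator over two halves.
        self.messages.make_contiguous().to_vec()
    }
}
```

The single `if` is only safe because `push` is the sole way to grow the buffer; a bulk-insert method would need the loop back. Note also that for element types that own heap data, such as `String`, each element is still cloned individually; a plain memory copy is only possible for `Copy` elements.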
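
The ReAct loop change relies on `std::mem::take`, which swaps a value out of a mutable reference and leaves its `Default` behind. A minimal sketch, with `Response` and `drain_tool_calls` as hypothetical names:

```rust
/// Hypothetical model response; field names are illustrative only.
#[derive(Debug, Default)]
pub struct Response {
    pub text: String,
    pub tool_calls: Vec<String>,
}

/// Moves the tool calls out of the response without copying them.
/// The response is left holding an empty `Vec`.
pub fn drain_tool_calls(response: &mut Response) -> Vec<String> {
    std::mem::take(&mut response.tool_calls)
}
```

Compared with `response.tool_calls.to_vec()`, this transfers ownership of the existing allocation rather than cloning every element. It is only appropriate when the caller no longer needs the calls on the original response.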
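
The MiddlewareStack early return might look roughly like the sketch below. All names here (`Flow`, `Context`, `Middleware`, `Stack`, `BlockEmpty`) are hypothetical, and the per-call `Context` allocation is an assumed example of the setup work that an empty stack can skip.

```rust
/// Hypothetical pipeline outcome, loosely modeled on the `Continue` result.
#[derive(Debug, PartialEq)]
pub enum Flow {
    Continue,
    ShortCircuit(String),
}

/// Hypothetical per-call context that costs an allocation to build.
pub struct Context {
    pub input: String,
}

pub trait Middleware {
    fn before(&self, ctx: &Context) -> Flow;
}

#[derive(Default)]
pub struct Stack {
    layers: Vec<Box<dyn Middleware>>,
}

impl Stack {
    pub fn push(&mut self, layer: Box<dyn Middleware>) {
        self.layers.push(layer);
    }

    pub fn run_before(&self, input: &str) -> Flow {
        // Fast path: with no middleware registered, skip building the context.
        if self.layers.is_empty() {
            return Flow::Continue;
        }
        let ctx = Context { input: input.to_string() };
        for layer in &self.layers {
            if let Flow::ShortCircuit(reason) = layer.before(&ctx) {
                return Flow::ShortCircuit(reason);
            }
        }
        Flow::Continue
    }
}

/// Example middleware that rejects empty input.
pub struct BlockEmpty;

impl Middleware for BlockEmpty {
    fn before(&self, ctx: &Context) -> Flow {
        if ctx.input.is_empty() {
            Flow::ShortCircuit("empty input".to_string())
        } else {
            Flow::Continue
        }
    }
}
```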
New Benchmarks
- `HotSwapAgent` (prompt, swap_model)
- `InProcessBroker` (submit/receive/complete roundtrip)
- `InProcessEventBus` (publish/receive)
- `InMemoryCheckpoint` (save/load)
- `SerializableStreamEvent` (serialize/deserialize)
Full Changelog: v0.13.0...v0.14.0