v0.14.0 — Performance & Benchmarking

@quinnjr quinnjr released this 04 Mar 20:34
3d170a9

Performance & Benchmarking

This release focuses on profiling hot paths across the framework and delivering measurable performance improvements, along with expanded benchmark coverage for recently added components.

Performance Improvements

| Area | Metric | Improvement |
| --- | --- | --- |
| ToolRegistry spec generation | uncached 50-tool lookup | -33% (10.4 µs → 6.9 µs) |
| Chain transforms | 3-stage pipeline | -30% (287 ns → 210 ns) |
| HotSwapAgent prompt | simple prompt | -18% (2.5 µs → 1.8 µs) |
| HotSwapAgent swap | model swap | -26% (140 ns → 112 ns) |
| DAG fan-out | 3-way fan-out + merge | -11% (10.9 µs → 10.5 µs) |

What Changed

  • ToolRegistry: generation-based cache invalidation — tool_specs() uses a generation counter to detect stale caches, avoiding redundant recomputation.
  • Memory: contiguous slice clone — SlidingWindowMemory and TokenWindowMemory now use make_contiguous().to_vec() instead of iter().cloned().collect(), producing a single memcpy.
  • SlidingWindowMemory: single-pop eviction — replaced while loop with single if check since only one message is added at a time.
  • ReAct loop: reduced cloning — tool calls moved with std::mem::take instead of .to_vec(); middleware short-circuit paths move messages instead of cloning.
  • MiddlewareStack: early return when empty — all three middleware pipeline methods return Continue immediately when no middleware is registered.
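
The first few changes above can be illustrated with a minimal, standard-library-only sketch. It is not the framework's actual code: `Registry`, `Window`, `take_calls`, and all of their fields are hypothetical stand-ins, and the real `tool_specs()` signature may differ. It shows a generation counter guarding a cache, a single-`if` eviction combined with `make_contiguous().to_vec()`, and `std::mem::take` moving a vector out instead of cloning it.

```rust
use std::collections::VecDeque;
use std::mem;

/// Hypothetical registry; field and method names are illustrative only.
struct Registry {
    tools: Vec<String>,
    generation: u64,        // bumped on every mutation
    cached_at: Option<u64>, // generation the cache was built from
    cached: Vec<String>,
    rebuilds: u32,          // counts recomputations, for demonstration
}

impl Registry {
    fn new() -> Self {
        Registry {
            tools: Vec::new(),
            generation: 0,
            cached_at: None,
            cached: Vec::new(),
            rebuilds: 0,
        }
    }

    fn register(&mut self, name: &str) {
        self.tools.push(name.to_string());
        self.generation += 1; // invalidates any cache built earlier
    }

    /// Rebuilds the specs only when the cache is missing or stale.
    fn tool_specs(&mut self) -> &[String] {
        if self.cached_at != Some(self.generation) {
            self.cached = self.tools.iter().map(|t| format!("spec:{t}")).collect();
            self.cached_at = Some(self.generation);
            self.rebuilds += 1;
        }
        &self.cached
    }
}

/// Hypothetical sliding window over messages.
struct Window {
    messages: VecDeque<String>,
    capacity: usize,
}

impl Window {
    fn new(capacity: usize) -> Self {
        Window { messages: VecDeque::new(), capacity }
    }

    fn push(&mut self, msg: &str) {
        self.messages.push_back(msg.to_string());
        // One message is added per call, so at most one eviction is needed.
        if self.messages.len() > self.capacity {
            self.messages.pop_front();
        }
    }

    /// Rearranges the ring buffer into one slice, then clones from that slice.
    fn snapshot(&mut self) -> Vec<String> {
        self.messages.make_contiguous().to_vec()
    }
}

/// Moves the pending calls out, leaving an empty Vec behind (no clone).
fn take_calls(pending: &mut Vec<String>) -> Vec<String> {
    mem::take(pending)
}

fn main() {
    let mut reg = Registry::new();
    reg.register("search");
    reg.tool_specs();
    reg.tool_specs(); // cache hit: generation unchanged
    assert_eq!(reg.rebuilds, 1);
    reg.register("calc");
    assert_eq!(reg.tool_specs().len(), 2);
    assert_eq!(reg.rebuilds, 2);

    let mut window = Window::new(2);
    for m in ["a", "b", "c"] {
        window.push(m);
    }
    assert_eq!(window.snapshot(), vec!["b", "c"]);

    let mut pending = vec!["call_1".to_string()];
    let moved = take_calls(&mut pending);
    assert_eq!(moved.len(), 1);
    assert!(pending.is_empty());
}
```

In this sketch, comparing a single integer is enough to decide whether the cached specs are still valid, so repeated reads between mutations do no recomputation. The single-`if` eviction relies on the invariant that the window never grows by more than one message per call; code that adds messages in bulk would still need a loop.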

New Benchmarks

  • HotSwapAgent (prompt, swap_model)
  • InProcessBroker (submit/receive/complete roundtrip)
  • InProcessEventBus (publish/receive)
  • InMemoryCheckpoint (save/load)
  • SerializableStreamEvent (serialize/deserialize)

Full Changelog: v0.13.0...v0.14.0