Skip to content

v0.2.0

Choose a tag to compare

@quinnjr quinnjr released this 03 Mar 20:18
· 13 commits to develop since this release

Added

  • Provider prompt caching across all backends:
    • Anthropic: with_prompt_caching() now correctly injects cache_control blocks on the system message and the last tool definition, enabling actual cache hits. Parses cache_creation_input_tokens and cache_read_input_tokens from usage with tracing.
    • OpenAI: Parses prompt_tokens_details.cached_tokens from the usage response (automatic prompt caching).
    • Azure OpenAI: Same as OpenAI — parses prompt_tokens_details.cached_tokens.
    • Google Gemini: New with_cached_content(name) builder method to reference a cachedContents/<id> resource. Parses cachedContentTokenCount from usage metadata.
    • AWS Bedrock: New with_prompt_caching() builder inserts CachePoint blocks after system messages and tool definitions. Parses cache_read_input_tokens from the Converse API response.
    • Ollama: New with_keep_alive(duration) builder to control KV cache retention (e.g. "5m", "1h", "0").
  • Usage::cached_tokens field — number of input tokens served from the provider's cache (subset of input_tokens, defaults to 0).

Fixed

  • Anthropic caching was non-functional: the with_prompt_caching() flag sent the beta header but never added cache_control content blocks, so no caching actually occurred. Now correctly marks the system prompt and last tool definition as cache breakpoints.