Add provider-specific prompt caching support in owrap.ai.js (OpenAI + Anthropic) #1841
Merged
Agent-Logs-Url: https://github.com/OpenAF/openaf/sessions/62a5d58d-75bd-4f94-a820-746c4b047a40
Co-authored-by: nmaguiar <11761746+nmaguiar@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Improve prompt caching for GPT provider implementations" to "Add provider-specific prompt caching support in owrap.ai.js (OpenAI + Anthropic)" (May 21, 2026)
Pull request overview
Extends owrap.ai.js GPT provider implementations to expose provider-specific prompt-caching token accounting (OpenAI + Anthropic) and adds opt-in Anthropic prompt caching controls, alongside new focused regression tests and small ODoc updates.
Changes:
- OpenAI: `_captureStats` now captures cache/audio/reasoning token details from nested `usage.*_tokens_details` fields.
- Anthropic: introduces a `promptCaching` option that sends the beta header and injects `cache_control` markers; `_captureStats` now captures cache read/creation token counters.
- Adds automated tests covering OpenAI stats capture and Anthropic prompt-caching header/body/stat behavior; updates ODoc to document the new option and stats fields.
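The stats mapping in the first two bullets can be sketched as a small standalone function. This is a minimal illustration under stated assumptions, not the actual `_captureStats` code. The name `captureUsageStats` and its `provider` argument are hypothetical. The `usage.*` source fields and the `tokens.*` target keys are the ones listed in this PR.

```javascript
// Hypothetical standalone sketch of the cache-aware usage mapping.
// The real logic lives in each provider's _captureStats in owrap.ai.js.
function captureUsageStats(provider, usage) {
  const tokens = {};
  if (!usage || typeof usage !== "object") return tokens;

  // Copy a value only when the provider actually returned a number.
  const copy = (target, value) => {
    if (typeof value === "number") tokens[target] = value;
  };

  if (provider === "openai") {
    // Nested details are optional and depend on the model.
    const prompt = usage.prompt_tokens_details || {};
    const completion = usage.completion_tokens_details || {};
    copy("cached", prompt.cached_tokens);
    copy("audio", prompt.audio_tokens);
    copy("reasoning", completion.reasoning_tokens);
  } else if (provider === "anthropic") {
    // Flat counters reported alongside the regular usage fields.
    copy("cacheCreation", usage.cache_creation_input_tokens);
    copy("cacheRead", usage.cache_read_input_tokens);
  }
  return tokens;
}

// Usage: an OpenAI response where part of the prompt was served from cache.
console.log(captureUsageStats("openai", {
  prompt_tokens: 2006,
  prompt_tokens_details: { cached_tokens: 1920, audio_tokens: 0 },
  completion_tokens_details: { reasoning_tokens: 0 }
}));
// prints { cached: 1920, audio: 0, reasoning: 0 }
```

As a design choice in this sketch, values are copied only when present. A field the provider or model did not report stays absent rather than showing up as `0`.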
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `js/owrap.ai.js` | Adds OpenAI nested usage capture; Anthropic opt-in prompt caching header + payload shaping; documents `promptCaching` and expanded stats. |
| `tests/autoTestAll.AI.js` | Adds unit tests validating OpenAI cached/audio/reasoning token capture and Anthropic prompt-caching header/body/stat handling. |
| `tests/autoTestAll.AI.yaml` | Wires the new AI tests into the test job list and TODO list. |
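The Anthropic "header + payload shaping" listed for `js/owrap.ai.js` can be pictured with the minimal sketch below. It is not the PR's implementation. `shapeAnthropicRequest` and its `{ headers, body }` return value are hypothetical. The phrase "avoiding tool_result-only blocks" is interpreted here as skipping user messages whose content consists solely of `tool_result` blocks. Only the beta header value and the `cache_control: { type: "ephemeral" }` marker come from the PR description.

```javascript
// Hypothetical sketch of opt-in Anthropic prompt-caching request shaping.
// Only the header value and the cache_control marker come from the PR text.
const ephemeral = () => ({ type: "ephemeral" });

// True when a message's content is made up only of tool_result blocks.
function isToolResultOnly(content) {
  return Array.isArray(content) && content.length > 0 &&
    content.every(b => b && b.type === "tool_result");
}

// Normalize string or array content into a fresh array of block copies.
function toBlocks(content) {
  if (typeof content === "string") return [{ type: "text", text: content }];
  return Array.isArray(content) ? content.map(b => Object.assign({}, b)) : [];
}

function shapeAnthropicRequest(body, options) {
  const headers = {};
  // Default behavior is unchanged unless promptCaching is explicitly true.
  if (!options || options.promptCaching !== true) return { headers, body };

  headers["anthropic-beta"] = "prompt-caching-2024-07-31";
  const out = Object.assign({}, body, {
    messages: (body.messages || []).map(m => Object.assign({}, m))
  });

  // Emit the system prompt as content blocks and mark the last block.
  if (out.system) {
    const sys = toBlocks(out.system);
    if (sys.length > 0) {
      sys[sys.length - 1].cache_control = ephemeral();
      out.system = sys;
    }
  }

  // Mark the last eligible user message as the cache boundary.
  for (let i = out.messages.length - 1; i >= 0; i--) {
    const m = out.messages[i];
    if (m.role !== "user" || isToolResultOnly(m.content)) continue;
    const blocks = toBlocks(m.content);
    if (blocks.length === 0) continue;
    blocks[blocks.length - 1].cache_control = ephemeral();
    m.content = blocks;
    break;
  }
  return { headers, body: out };
}
```

The early return for `promptCaching !== true` reflects the PR's note that `cache_control` should only be sent on explicit opt-in. Messages and blocks are copied, so the caller's original request object is never mutated.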
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This PR extends GPT provider handling in `js/owrap.ai.js` to properly surface prompt-cache token accounting and to enable Anthropic prompt caching controls without changing default behavior. Gemini/Ollama behavior remains unchanged aside from documentation clarity.

OpenAI: capture cache-aware usage stats
- Updated `_captureStats` to include nested usage fields when present:
  - `usage.prompt_tokens_details.cached_tokens` → `tokens.cached`
  - `usage.prompt_tokens_details.audio_tokens` → `tokens.audio`
  - `usage.completion_tokens_details.reasoning_tokens` → `tokens.reasoning`

Anthropic: opt-in prompt caching support
- Added `aOptions.promptCaching` (default `false`).
- When enabled, `_request` and `_requestStream` send `anthropic-beta: prompt-caching-2024-07-31`.
- Extended `_captureStats` with:
  - `usage.cache_creation_input_tokens` → `tokens.cacheCreation`
  - `usage.cache_read_input_tokens` → `tokens.cacheRead`

Anthropic: cache boundary hints in payload
- When `promptCaching` is enabled:
  - `system` is emitted as content blocks with `cache_control: { type: "ephemeral" }`.
  - The last user message content is marked with `cache_control: { type: "ephemeral" }` (while avoiding tool_result-only blocks).

ODoc updates (`ow.ai.gpt` + `$gpt`)
- Documented the `promptCaching` option for Anthropic.
- Noted that `getLastStats()` now includes OpenAI cached-token accounting and Anthropic cache read/creation token counters.

Focused AI tests
- Added tests covering OpenAI nested usage capture and Anthropic beta-header handling, `cache_control` payload shaping and cache stats extraction.

Original prompt
Overview
Improve `js/owrap.ai.js` to handle prompt caching for each of the GPT provider implementations (`openai`, `gemini`, `anthropic`, `ollama`).

Changes Required

1. OpenAI: Capture cached token stats
OpenAI returns `prompt_tokens_details.cached_tokens` in the usage response for models that support prompt caching (e.g. `gpt-4o`). Update `_captureStats` in the `openai` implementation to capture this.

Also capture `completion_tokens_details` if present (e.g. `reasoning_tokens` for `o1`/`o3` models).

2. Anthropic: Full prompt caching support
Anthropic's prompt caching requires opt-in via a beta header and explicit `cache_control` markers on content blocks.

a) New option: `aOptions.promptCaching` (boolean, default `false`)
When `true`:
- Add the header `"anthropic-beta": "prompt-caching-2024-07-31"` to both `_request` and `_requestStream`.

b) Capture cache stats in `_captureStats`

c) Allow `cache_control` on the system prompt
When `promptCaching` is enabled and the `system` field is set, wrap the system content as a content block array with `cache_control` on the last block.

d) Allow `cache_control` on messages
When `promptCaching` is enabled, mark the last user message (or the last large content block) with `cache_control: { type: "ephemeral" }` to hint to Anthropic where the cache boundary should be.

3. Gemini: Already captures `cachedContentTokenCount` ✅
No changes are required for stats capture. Optionally, add a note in the `addSystemPrompt` ODoc that Gemini performs implicit caching on large system instructions.

4. Ollama: No changes needed
Ollama uses its local KV cache implicitly. No API-level prompt caching changes are needed.
5. Update ODoc comments
Update the `ow.ai.gpt(aType, aOptions)` ODoc to document the new `promptCaching` option for Anthropic and note that OpenAI cached token stats are now captured in `getLastStats()`. Update the `$gpt` ODoc similarly.

Files to Modify
- `js/owrap.ai.js`

Notes
- `promptCaching` defaults to `false`.
- The `tokens.cached` field in `getLastStats()` for OpenAI and `tokens.cacheRead`/`tokens.cacheCreation` for Anthropic allow callers to track cache efficiency.
- `cache_control` support should only be applied when `aOptions.promptCaching === true` to avoid sending unsupported fields to older API versions or incompatible models.

The following is the prior conversation context from the user's chat exploration (may be truncated):
User: Could `owrap.ai.js` be improved for each of the GPT implementations to handle prompt caching in the different providers?

A: Let me start by finding and reading the `owrap.ai.js` file in the repository.

A: Here's a detailed analysis of how prompt caching could be improved for each provider in `js/owrap.ai.js`:

Current State

✅ Gemini: Partially handled
Gemini already captures `cachedContentTokenCount` in `_captureStats` (line 875). However, explicit Context Cache creation (Gemini's `cachedContents` API for large system prompts) is not supported, so users must rely on implicit caching only.

❌ OpenAI: Not captured
OpenAI returns `prompt_tokens_details.cached_tokens` when prompt caching kicks in (for models that support it, e.g. `gpt-4o`), but `_captureStats` only reads the top-level `usage` fields. The fix would be: