Feature hasn't been suggested before.
Describe the enhancement you want to request
Specifically, I have verified this is not a duplicate of #5416, #4317 or similar. These features are about generic prompt caching, while my suggestion specifically targets tools and skills.
Disclaimer: I did use Claude Sonnet 4.6 to help me structure the request. I certify that I (the human) am the owner and respondent of the idea.
Problem
We are seeing the adoption of long-running agents with a wide range of functionalities provided by tools and skills. As users build out more complex agents, the number of registered tools and skills can grow significantly.
With that many tools and skills:
- Every LLM step carries the full catalog in the context window: tool definitions through the AI SDK's `tools` dict, and skill names/descriptions through the `SkillTool` description string. This grows linearly with the number of tools/skills and is re-sent on every step.
- Manually controlling the tools and skills becomes overwhelming for users. In fact, these agents are likely automatic, so humans may not even be in the loop.
Proposed Solution
I build on the 80/20 principle: 80% of the time, the LLM only needs 20% (or a small subset) of the available tools/skills to complete the task. By keeping the full catalog in a discoverable but not immediately visible "L2 cache", we can reduce context size and improve relevance without sacrificing capability.
An L1/L2 cache layer sitting between the raw tool/skill discovery and the LLM call:
- L1: Actively used items. Injected normally (full schema for tools, listed in the `SkillTool` description for skills).
- L2: Registered but cold items. Hidden from the LLM's default context. The LLM is told a discovery mechanism exists and uses it when standard tools don't suffice.
- LRU promotion: Every successful tool/skill use updates `last_used_at`. On each promotion, if L1 exceeds the configured max, the least recently used L1 item is demoted to L2 (see the sketch after this list).
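To make the promotion/demotion flow concrete, here is a minimal sketch of what such a cache layer could look like. The names (`ToolCache`, `CachedItem`, `record_use`) are hypothetical and not part of any existing SDK API; the actual integration point would depend on how the maintainers want to hook into tool/skill registration.

```python
# Minimal sketch of the proposed L1/L2 tool cache with LRU promotion.
# All class/method names here are illustrative, not an existing API.
import time
from dataclasses import dataclass, field


@dataclass
class CachedItem:
    name: str
    definition: dict          # full tool schema or skill description
    last_used_at: float = 0.0


@dataclass
class ToolCache:
    max_l1: int = 10
    l1: dict[str, CachedItem] = field(default_factory=dict)  # hot: injected into context
    l2: dict[str, CachedItem] = field(default_factory=dict)  # cold: discoverable only

    def record_use(self, name: str) -> None:
        """Update last_used_at and promote to L1 on every successful use."""
        item = self.l1.get(name) or self.l2.pop(name, None)
        if item is None:
            return  # uncached tools/skills pass through untouched
        item.last_used_at = time.time()
        self.l1[name] = item
        self._evict_if_needed()

    def _evict_if_needed(self) -> None:
        """Demote the least recently used L1 item once L1 exceeds max_l1."""
        while len(self.l1) > self.max_l1:
            lru_name = min(self.l1, key=lambda n: self.l1[n].last_used_at)
            self.l2[lru_name] = self.l1.pop(lru_name)

    def active_tools(self) -> list[dict]:
        """Definitions injected into the LLM context on each step (L1 only)."""
        return [item.definition for item in self.l1.values()]
```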
This does not change how tools/skills that are not registered in the cache behave. An uncached tool (e.g. one loaded through a local `skill` folder or defined inside the JSON file) always passes through normally. This ensures 100% backward compatibility.
The LLM is provided with a single tool that allows it to perform a vector search over the L2 cache using a natural-language description of what it needs. The recall of the vector search can be improved over time with better algorithms.
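As an illustration of that single discovery tool, here is a sketch that builds on the hypothetical `ToolCache` above. The bag-of-words similarity is only a stand-in for a real embedding model; the function name `discover_tools` and its signature are assumptions, not an existing interface.

```python
# Sketch of the single discovery tool the LLM would call when L1 doesn't suffice.
# The "embedding" below is a placeholder; a real vector model would replace it.
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Placeholder bag-of-words representation standing in for a vector embedding.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def discover_tools(cache: ToolCache, query: str, top_k: int = 3) -> list[str]:
    """Search the L2 catalog with a natural-language description and return
    the best-matching tool/skill names, which can then be promoted to L1."""
    q = embed(query)
    scored = sorted(
        cache.l2.values(),
        key=lambda item: cosine(q, embed(f"{item.name} {item.definition.get('description', '')}")),
        reverse=True,
    )
    return [item.name for item in scored[:top_k]]
```

The key design point is that the LLM only ever sees this one extra tool, regardless of how many cold tools/skills sit in L2, so context growth stays constant rather than linear.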