Make Agents Cheaper

A Rust CLI and research toolkit for agent token-cost optimization.

The current focus is a Token Saver for coding agents: improve prompt-cache reuse at the harness layer, reduce paid uncached input, and keep task success measurable.

Phase 1 is Codex-first. The longer-term goal is to make the same cache-hit discipline useful for Claude Code, Cursor, custom agent runners, and multi-agent routers.

The idea is simple:

Do not remove context. Make repeated context cacheable.

In one sentence:

Make long-horizon coding agents cheaper by optimizing prompt-cache reuse outside the model.

What It Does

Audits local Codex config for cache-friendly provider, transport, session, and model settings.
Fingerprints prompt layers and tool schemas without printing private prompt text.
Normalizes Claude Code direct JSON or optional claude-trace logs into comparable JSONL.
Compares baseline vs cache-friendly runs with quality gates, per-slice tables, and paper-facing reports.
Generates reproducible pilot plans and bounded pilot runner scripts.

This is a harness-level counterpart to model-side efficiency work: model providers make token processing cheaper; this project tries to make repeated agent context cheaper to reuse. More background is in docs/project-positioning.md.

Product Map And Experimental Object

There are three related layers, but they should not be mixed:

make-agents-cheaper: the Rust audit/eval tool. This is the experiment and measurement engine. It fingerprints prompt layers, checks tool schema stability, analyzes cache breakpoints, records token usage, and compares baseline vs cache-friendly runs.
skills/cheaper-skill-for-claude: the reusable Claude Code skill adapter. A skill turns the method into instructions and runbooks that another agent can apply, but the skill itself is not the primary measurement instrument.
cheapcode or a future cheaper agent: a possible full agent harness that would own prompt assembly, tools, memory, and routing directly. This is a later product direction, not the current experimental object.

In current experiments, Codex is the development environment used to build the tooling and write the reports. The studied harness is Claude Code, and the backend model/provider in the current setup is MiMo, such as mimo-v2.5-pro. The paper should therefore describe the object of study as a Claude Code harness running on a MiMo-compatible model route, with make-agents-cheaper used as the audit/eval instrumentation.

So yes: experiments use the audit/eval layer, not the skill layer, as evidence. The skill layer lives in skills/ for reuse and deployment of the same cache-friendly discipline after the method has been made explicit and measurable.

Why This Can Be Cheap

Coding agents are expensive in long sessions because every turn can resend a large repeated prefix:

system and developer instructions
tool definitions and JSON schemas
repo rules such as AGENTS.md
stable project context
previous session and conversation identifiers

Prompt caching can make that repeated prefix cheaper, but only when the provider sees the same beginning of the request again. The cache is strict: similar text is not enough; the prefix has to stay stable enough to match.

make-agents-cheaper helps with the parts a user can control:

Stable provider: do not bounce the same task between providers or upstream keys.
Stable transport: prefer one agent path for the task, especially Responses API for Codex.
Stable session: WebSocket mode and session-aware routing make it easier for later turns to land near existing cache.
Stable model settings: model and reasoning effort changes can create different request buckets.
Stable static context: keep repeated rules and tool context stable; avoid injecting changing bridge text before it.

The savings come from the provider charging or processing cached input more cheaply than uncached input. This project does not hide context from the agent, truncate important instructions, or rewrite the model's task. It makes the official cache path easier to hit.

The rough mental model is:

same long prefix + same session route + compatible transport
  -> higher prompt-cache hit probability
  -> less repeated prefill work
  -> lower repeated-input cost and latency

In other words:

It reduces paid uncached input, not necessarily total input.

Fixed-Prefix Explainer

If you need to explain the method to a non-technical audience, the shortest version is:

We are not making tokens disappear. We are keeping the repeated beginning of each request stable, so the provider can recognize and reuse it through prompt cache. Then the expensive uncached part is mostly the smaller changing tail of the request.

The bad request shape puts changing state at the front:

request 1: current time A + tool state A + fixed system prompt + task
request 2: current time B + tool state B + fixed system prompt + task
request 3: current time C + tool state C + fixed system prompt + task

Even if most of the request is semantically similar, the early prefix has already changed. Strict prompt caches may not be able to reuse as much of the request.

The cache-friendly shape keeps the stable prefix first:

request 1: fixed system prompt + fixed tool description + current time A + task
request 2: fixed system prompt + fixed tool description + current time B + task
request 3: fixed system prompt + fixed tool description + current time C + task

Now the beginning is identical across requests, so more of the prefix can become cached input. Only the later changing portion needs to be paid as uncached input again.

In Chinese, the intuition is:

我们不是魔法般让 token 消失，而是把每次都一样的开头摆整齐，让模型服务端认出来并复用；这样重复的前缀变成 cached input，真正贵的是后面少量变化的 uncached input，所以花费会下降。

And the careful boundary is:

这个方法省的是未缓存输入成本，不是保证总 token 变少；前提是请求之间确实有一大段稳定前缀，而且服务商能观测并计费 prompt cache。

Claude Code Prefix-Cache Tutorial

Claude/Anthropic prompt caching should be explained at two levels:

API level: the official cumulative order is tools -> system -> messages.
Claude Code harness level: project context is a real assembled layer even though it is not an Anthropic API top-level field.

The important mechanism is:

A(B1+B2)PC  ->  AB1P(B2+C)

Where:

A  = tools
B1 = fixed system prompt / Claude Code fixed rules
B2 = dynamic system sections, such as cwd, env, git, memory paths
P  = project context, such as CLAUDE.md, repo rules, stable memory, skills context
C  = current task / current turn messages

The flag does not merely change parentheses. It moves volatile B2 from before the stable project context P to the later message tail, so a longer prefix A+B1+P can be reused.

This distinction matters. If the transformation were only:

A(B1+B2)C  ->  AB1(B2+C)

then the linear order would still be A -> B1 -> B2 -> C; the volatile B2 would still block everything after it. The useful transformation needs a stable project-context region between the dynamic system section and the current task:

A -> B1 -> B2 -> P -> C

becomes, in the cache-friendly target:

A -> B1 -> P -> B2 -> C

P is not an Anthropic API top-level field. It is a Claude Code harness concept: the stable project context that Claude Code assembles from files and runtime state, then serializes into the API system or messages content blocks. In practice, P is important because it can be large and stable: CLAUDE.md, repository rules, durable memory, and skill or workflow context can easily dominate the repeated prefix.

The source-shaped version below is condensed from a Claude Code-style rebuild and is included as a structural example. It shows why the baseline can be modeled as A(B1+B2)PC.

// systemContext contains dynamic local state such as git status.
const fullSystemPrompt = asSystemPrompt(
  appendSystemContext(systemPrompt, systemContext),
)

// userContext contains project context such as CLAUDE.md/currentDate and is
// prepended before the normal conversation messages.
messages: prependUserContext(messagesForQuery, userContext),
systemPrompt: fullSystemPrompt,

The relevant helpers have this shape:

function appendSystemContext(systemPrompt, context) {
  return [
    ...systemPrompt,
    Object.entries(context)
      .map(([key, value]) => `${key}: ${value}`)
      .join('\n'),
  ].filter(Boolean)
}

function prependUserContext(messages, context) {
  return [
    createUserMessage({
      content: `<system-reminder>
As you answer the user's questions, you can use the following context:
${Object.entries(context)
  .map(([key, value]) => `# ${key}\n${value}`)
  .join('\n')}
</system-reminder>`,
      isMeta: true,
    }),
    ...messages,
  ]
}

So in the observed Claude Code-style assembly, the baseline order is:

tools
-> fixed system prompt
-> dynamic systemContext, for example gitStatus
-> prepended userContext, for example CLAUDE.md/currentDate
-> conversation history and current task

That is the source-backed reason for the formula:

A(B1+B2)PC

When --exclude-dynamic-system-prompt-sections is used, Claude Code moves machine-local dynamic system sections into the first user message/messages area. The exact within-message ordering should be verified with raw request traces for the specific Claude Code release, but the cache-friendly target is:

AB1P(B2+C)

Read the claim in two strengths:

Weak claim, documented by the flag:
  B2 leaves system and moves into messages.
  This makes A+B1 more stable.

Strong claim, verified by request traces:
  B2 moves after P.
  This makes A+B1+P reusable.

The strong claim is the larger savings mechanism. It explains why the flag is more than a small system-prompt cleanup when project context is long and stable.

Prompt-cache blocks are API content blocks, not semantic labels like cwd or git. System strings are converted into text blocks:

function buildSystemPromptBlocks(systemPrompt, enablePromptCaching) {
  return splitSysPromptPrefix(systemPrompt).map(block => ({
    type: 'text',
    text: block.text,
    ...(enablePromptCaching && block.cacheScope !== null
      ? { cache_control: getCacheControl({ scope: block.cacheScope }) }
      : {}),
  }))
}

User strings are also converted into text blocks when cache control is applied:

function userMessageToMessageParam(message, addCache, enablePromptCaching) {
  if (addCache && typeof message.message.content === 'string') {
    return {
      role: 'user',
      content: [{
        type: 'text',
        text: message.message.content,
        ...(enablePromptCaching
          ? { cache_control: getCacheControl() }
          : {}),
      }],
    }
  }
}

The cache object is an accumulated prefix:

cache_prefix(block N) = tools + system + messages[0..N]

Therefore, if B2 changes before P, the prefix containing P changes too. Moving B2 later lets the stable, often large P participate in the reusable prefix. This increases the chance of higher cache hit rate and lower paid uncached input, but the claim still requires warm-up, measured calls, token accounting, and task validation.

To verify the strong version for a concrete Claude Code release, capture or normalize the raw request shape and check:

1. tools are byte-stable across the paired calls.
2. fixed system blocks are byte-stable.
3. dynamic cwd/env/git/memory sections are not inside the early system prefix.
4. project context P appears before the moved dynamic sections in the serialized
   prompt order, or at least before the cache breakpoint being tested.
5. token accounting exposes cached input and uncached input fields.
6. measured calls are compared after warm-up, and validation still passes.

Safe wording:

This improves the chance that a longer stable prefix is read from prompt cache
and reduces paid uncached input when provider accounting confirms it.

Avoid wording:

This always reduces total tokens.
This removes context from Claude.
This proves quality is unchanged without validation.

References:

Anthropic prompt caching documents the cumulative order tools -> system -> messages and content-block cache breakpoints: https://platform.claude.com/docs/en/build-with-claude/prompt-caching
Claude Code CLI documents --exclude-dynamic-system-prompt-sections as moving dynamic system-prompt sections into the first user message: https://code.claude.com/docs/en/cli-usage

Prefix-Cache Evidence Snapshot

The fixed V2 dynamic-drift diagnostic now supports the narrow prefix-cache claim: moving dynamic harness state later reduced paid uncached input while preserving task success.

Summary:

Cache hit rate improved from 91.66% to 97.67%.
Paid uncached input fell from 30,082 to 7,817 tokens (0.260x).
Observed cost fell from $0.366976 to $0.258237 (0.704x).
Validation and task success stayed at 3/3 vs 3/3.
Output tokens increased slightly, from 2,054 to 2,224, so tool-output optimization remains a separate future layer.

The earlier V2 mixed/negative pilot is retained as a regression case. Diagnosis found a behavioral outlier plus fixture Git-isolation leakage; after fixing absolute prompt paths and fixture-local Git state, the bounded 3-repeat diagnostic returned to the expected direction. This is an incremental prefix optimization: it reduces repeated paid input, but it does not guarantee that tool calls, output verbosity, or agent trajectory will become cheaper.

See docs/v2-prefix-fixed-diagnostic.md, docs/v2-regression-diagnosis.md, and docs/data/v2-prefix-fixed-diagnostic-summary.csv for the derived, commit-safe data. Raw run logs stay ignored under runs/.

Feature 1: Codex Cache-Hit Audit

Cache-aware routers can improve cache hits in the routing layer. This repository focuses on the missing client-side step:

Before blaming the router or model, verify that your local Codex config is actually cache-hit friendly.

The bundled Rust CLI is read-only by default. It inspects a Codex config.toml and reports:

whether the configured provider has a stable base_url
whether wire_api = "responses" is set
whether WebSocket mode is enabled when you expect long sessions
whether env_key is configured and present in the current shell
whether model and reasoning settings are stable enough for repeat sessions
whether the config looks likely to drift between providers or transport modes

It also prints HTTP and WebSocket configuration templates with placeholder router settings. Set MAKE_AGENTS_CHEAPER_EXPECTED_BASE_URL when you want the audit to verify a private endpoint without putting that endpoint in source control.

Quick Start

Install Or Run Locally

Prerequisites:

Rust toolchain with cargo.
Optional for paper builds: latexmk, pdflatex, bibtex, pdfinfo, and pdffonts.
Optional for Claude Code experiments: claude CLI.

On macOS, the CLI path is the same as Linux:

git clone https://github.com/3873225350/make-agents-cheaper.git
cd make-agents-cheaper
cargo test
cargo run --quiet -- --help

To install the binary from a local checkout:

cargo install --path .
make-agents-cheaper --help

Or install directly from GitHub:

cargo install --git https://github.com/3873225350/make-agents-cheaper.git
make-agents-cheaper --help

macOS release binaries and a Homebrew formula template are tracked for tagged releases; see docs/release.md.

Workflow 1: Audit Codex Config

Ask Codex:

Use $make-agents-cheaper to inspect my Codex config and tell me whether it is prompt-cache friendly.

Or run the CLI directly:

cargo run --quiet

Run explicit Codex config audit:

cargo run --quiet -- audit --config ~/.codex/config.toml

Print the recommended WebSocket template:

cargo run --quiet -- --print-ws-config

Print the simpler HTTP template:

cargo run --quiet -- --print-http-config

Inspect a custom config path:

cargo run --quiet -- --config /path/to/config.toml

Workflow 2: Compare Existing Run Logs

Try the bundled sanitized real example first:

cargo run --quiet -- eval \
  --baseline examples/baseline.jsonl \
  --candidate examples/cache-friendly.jsonl

cargo run --quiet -- task-report \
  --baseline examples/baseline.jsonl \
  --candidate examples/cache-friendly.jsonl

The default examples/ pair is derived from a real Claude Code + MiMo paired-drift run with raw trace paths removed. A fixed V2 diagnostic pair is also available as examples/v2-fixed-diagnostic-baseline.jsonl and examples/v2-fixed-diagnostic-cache-friendly.jsonl. The earlier mixed/negative V2 pilot remains available as examples/v2-mixed-baseline.jsonl and examples/v2-mixed-cache-friendly.jsonl for regression analysis.

Then use the same commands on normalized benchmark records:

cargo run --quiet -- eval \
  --baseline runs/<experiment>/baseline.jsonl \
  --candidate runs/<experiment>/cache-friendly.jsonl

cargo run --quiet -- task-report \
  --baseline runs/<experiment>/baseline.jsonl \
  --candidate runs/<experiment>/cache-friendly.jsonl

cargo run --quiet -- analysis-report \
  --baseline runs/<experiment>/baseline.jsonl \
  --candidate runs/<experiment>/cache-friendly.jsonl \
  --output runs/<experiment>/analysis-report.md

Extract tool-claimed code changes from a session or tool-history JSONL without reading the current worktree:

cargo run --quiet -- evidence-diff \
  --input runs/<experiment>/raw/session.jsonl \
  --output runs/<experiment>/code-changes.json

Read the result conservatively:

A win requires lower uncached input and no task-success regression.
Warm-up calls should stay out of measured JSONL.
If cache_accounting_observable=false, do not claim token-cost savings from that row.
Lower output tokens or fewer total tokens are not the main claim.

Workflow 3: Plan A Claude Code Pilot

Generate a reproducible experiment directory and a paired command plan from the V2 manifest:

cargo run --quiet -- init-experiment --dir runs/<date>-claude-mimo-real-coding-v2-pilot

cargo run --quiet -- pilot-plan \
  --manifest docs/task-suites/real-coding-ablation-v2.manifest.json \
  --task docs-token-accounting \
  --experiment-dir runs/<date>-claude-mimo-real-coding-v2-pilot \
  --slice dynamic-drift \
  --repeats 1

The generated plan prints the prompt file, warm-up calls, measured calls, validation logs, direct Claude JSON capture path, claude-json-import, eval, task-report, and analysis-report commands.

To generate a runnable script instead of only printing the plan:

cargo run --quiet -- run-pilot \
  --manifest docs/task-suites/real-coding-ablation-v2.manifest.json \
  --task docs-token-accounting \
  --experiment-dir runs/<date>-claude-mimo-real-coding-v2-pilot \
  --slice dynamic-drift \
  --repeats 1

By default, run-pilot only writes runs/<experiment>/notes/run-pilot.sh. Execute it manually with bash, or pass --execute true when you intentionally want the CLI to call Claude. Running it may incur model cost.

The V2 pilot manifest points at runs/fixtures/real-coding-v2, which is intentionally ignored as local experiment state. Fresh clones can use examples/ immediately; Claude pilot execution requires creating or restoring that fixture first.

Command Reference

These commands are the first executable pieces of the portable cache-hit layer for existing agents.

Fingerprint prompt or harness layers without printing private prompt text:

cargo run --quiet -- fingerprint --input layers.json
cargo run --quiet -- fingerprint --input current-layers.json --previous previous-layers.json

Inspect tool schema stability:

cargo run --quiet -- tool-schema --input tools.json
cargo run --quiet -- tool-schema --input current-tools.json --previous previous-tools.json

Inspect explicit cache_control breakpoint placement:

cargo run --quiet -- breakpoints --input request.json

Compare baseline and cache-friendly benchmark records:

cargo run --quiet -- eval --baseline baseline.jsonl --candidate cache-friendly.jsonl

Print per-task token usage:

cargo run --quiet -- task-report --baseline baseline.jsonl --candidate cache-friendly.jsonl

Write paper-facing Markdown tables and interpretation guardrails:

cargo run --quiet -- analysis-report \
  --baseline baseline.jsonl \
  --candidate cache-friendly.jsonl \
  --output runs/exp/analysis-report.md

Normalize direct Claude Code JSON output into the eval schema:

cargo run --quiet -- claude-json-import \
  --input runs/exp/raw/claude-json/run-1.json \
  --run-id run-1 \
  --task-id docs-token-accounting \
  --condition cache-friendly \
  --slice dynamic-drift \
  --repeat-id 1 \
  --phase measured \
  --output runs/exp/cache-friendly.jsonl \
  --validation-path runs/exp/validation/run-1.txt \
  --validation-passed true

Optional: if a raw claude-trace JSONL file exists, normalize it into the eval schema and request/layer/tool artifacts:

cargo run --quiet -- trace-import \
  --input runs/exp/raw/claude-trace/run-1.jsonl \
  --run-id run-1 \
  --task-id docs-token-accounting \
  --condition baseline \
  --slice dynamic-drift \
  --repeat-id 1 \
  --phase measured \
  --output runs/exp/baseline.jsonl \
  --artifacts-dir runs/exp \
  --validation-path runs/exp/validation/run-1.txt \
  --validation-passed true

The current roadmap uses direct Claude JSON as the default evidence path. It preserves usage/cost/validation accounting, but it cannot prove request-shape facts such as system/tool/message ordering.

Compare with provider prices, expressed as USD per million tokens:

cargo run --quiet -- eval \
  --baseline baseline.jsonl \
  --candidate cache-friendly.jsonl \
  --uncached-input-per-mtok <USD> \
  --cached-input-per-mtok <USD> \
  --output-per-mtok <USD>

Print a cache-aware compact / reactivation template:

cargo run --quiet -- compact-template

The expected JSONL benchmark record format is documented in docs/evaluation-metrics.md.

Initialize a reproducible experiment log directory:

cargo run --quiet -- init-experiment --dir runs/2026-05-09-claude-mimo-cache

Generate a paired pilot command plan from the V2 task manifest:

cargo run --quiet -- pilot-plan \
  --manifest docs/task-suites/real-coding-ablation-v2.manifest.json \
  --task docs-token-accounting \
  --experiment-dir runs/2026-05-09-claude-mimo-real-coding-v2-pilot \
  --slice dynamic-drift \
  --repeats 1

Generate the full task-matrix command plan:

cargo run --quiet -- matrix-plan \
  --manifest docs/task-suites/real-coding-ablation-v2.manifest.json \
  --experiment-dir runs/2026-05-09-claude-mimo-real-coding-v2-full \
  --repeats 3

Full protocol: docs/evaluation-protocol.md.

Recommended Codex WebSocket Config

Use this when you want stronger long-session continuity:

model_provider = "cache_router"
model = "gpt-5.4"
model_reasoning_effort = "xhigh"
plan_mode_reasoning_effort = "xhigh"
model_reasoning_summary = "none"
model_verbosity = "medium"
approval_policy = "never"
sandbox_mode = "danger-full-access"
suppress_unstable_features_warning = true

[model_providers.cache_router]
name = "OpenAI"
base_url = "https://router.example/v1"
wire_api = "responses"
requires_openai_auth = false
env_key = "CACHE_ROUTER_API_KEY"
supports_websockets = true

[features]
responses_websockets_v2 = true

Recommended Codex HTTP Config

Use this when you prefer a simpler, broadly compatible setup:

model_provider = "cache_router"
model = "gpt-5.4"
model_reasoning_effort = "xhigh"
plan_mode_reasoning_effort = "xhigh"
model_reasoning_summary = "none"
model_verbosity = "medium"
approval_policy = "never"
sandbox_mode = "danger-full-access"

[model_providers.cache_router]
name = "OpenAI"
base_url = "https://router.example/v1"
wire_api = "responses"
requires_openai_auth = false
env_key = "CACHE_ROUTER_API_KEY"

Cheapness Checklist

Keep static instructions, tool schemas, and repo rules stable.
Avoid switching providers, models, or transport modes mid-task.
Prefer Responses API for Codex-style workflows.
Use WebSocket mode for long interactive sessions when available.
Keep session and conversation continuity intact.
Put dynamic task details after stable context when you control prompt layout.
Do not chase artificial cache metrics by rewriting request semantics.

What It Does Not Do

It does not make every token cheap.
It does not train or fine-tune a model.
It does not cache model outputs or replay old answers.
It does not share cache across organizations.
It does not mutate ~/.codex/config.toml unless a future command explicitly implements that and you ask for it.
It does not print API keys.
It does not claim support for every agent yet; Codex is the first supported target.

Roadmap

Phase 1: Codex config audit and cache-aware router-friendly templates.
Phase 2: prefix fingerprinting, tool-schema drift checks, breakpoint analysis, benchmark comparison, and cache-aware compact templates.
Phase 3: package reusable agent skills for Codex-first workflows, then Claude Code and Cursor cache-friendliness checks where reliable local signals exist.
Phase 4: router and multi-agent workflow diagnostics.

Technical Report And Evaluation

LaTeX report: paper/main.tex
Evaluation metric spec: docs/evaluation-metrics.md
Full experiment protocol: docs/evaluation-protocol.md
Paired ablation runbook: docs/paired-ablation-runbook.md
Project positioning and origin: docs/project-positioning.md
First task-suite dataset: docs/task-suites/claude-cache-ablation-v1.md
Real coding-task suite: docs/task-suites/real-coding-ablation-v1.md
Phenomena analysis log: docs/phenomena-analysis.md
MiMo token accounting note: docs/mimo-token-accounting.md
V2 fixed prefix diagnostic: docs/v2-prefix-fixed-diagnostic.md
V2 direct-json regression snapshot: docs/v2-direct-json-pilot.md
V2 regression diagnosis: docs/v2-regression-diagnosis.md
RTK inspiration and next runtime layer: docs/rtk-inspiration.md
Session/tool evidence diff: docs/evidence-diff.md
Release and Homebrew plan: docs/release.md
Long-term task plan: taskplan/roadmap.md

The evaluation goal is not to show fewer total tokens. It is to show:

cached tokens go up
uncached paid input goes down
observed or estimated cost goes down when output/tool behavior is comparable
latency does not regress
task success does not regress

Build

cargo build --release

The binary will be available at:

target/release/make-agents-cheaper

Run validation:

cargo test

Install As A Skill

Ask Codex:

Install the make-agents-cheaper skill from https://github.com/3873225350/make-agents-cheaper

Or clone/copy this folder into your Codex skills directory as make-agents-cheaper.

Privacy And Safety

Report mode does not write files. It prints only configuration health and hides environment variable values. It never prints API keys.

If you share reports publicly, review local paths and provider names first.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
.claude/skills/claude-trace-recovery		.claude/skills/claude-trace-recovery
.github/workflows		.github/workflows
docs		docs
examples		examples
packaging/homebrew		packaging/homebrew
paper		paper
references		references
runs		runs
skills/cheaper-skill-for-claude		skills/cheaper-skill-for-claude
src		src
taskplan		taskplan
.gitignore		.gitignore
AGENTS.md		AGENTS.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
SKILL.md		SKILL.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Make Agents Cheaper

What It Does

Product Map And Experimental Object

Why This Can Be Cheap

Fixed-Prefix Explainer

Claude Code Prefix-Cache Tutorial

Prefix-Cache Evidence Snapshot

Feature 1: Codex Cache-Hit Audit

Quick Start

Install Or Run Locally

Workflow 1: Audit Codex Config

Workflow 2: Compare Existing Run Logs

Workflow 3: Plan A Claude Code Pilot

Command Reference

Recommended Codex WebSocket Config

Recommended Codex HTTP Config

Cheapness Checklist

What It Does Not Do

Roadmap

Technical Report And Evaluation

Build

Install As A Skill

Privacy And Safety

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Make Agents Cheaper

What It Does

Product Map And Experimental Object

Why This Can Be Cheap

Fixed-Prefix Explainer

Claude Code Prefix-Cache Tutorial

Prefix-Cache Evidence Snapshot

Feature 1: Codex Cache-Hit Audit

Quick Start

Install Or Run Locally

Workflow 1: Audit Codex Config

Workflow 2: Compare Existing Run Logs

Workflow 3: Plan A Claude Code Pilot

Command Reference

Recommended Codex WebSocket Config

Recommended Codex HTTP Config

Cheapness Checklist

What It Does Not Do

Roadmap

Technical Report And Evaluation

Build

Install As A Skill

Privacy And Safety

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages