Problem
In cbusillo/codex-skills, we are building a rollout-memory evaluation harness that compares local and cloud models over large private prompt shards. The current code llm request path is the right shape for a strict side-channel model call because it supports structured output and does not behave like a full agent, but it accepts the message only as the --message command-line argument.
On macOS, larger prompt shards exceed ARG_MAX:
- Quarter shard can fit and validate.
- Half and three-quarter shards are blocked by argv size before they can test the model.
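The transport block can be detected before anything is launched. The following is a minimal sketch, assuming a Python harness; the helper names argv_bytes and fits_in_argv and the 4096-byte headroom are illustrative and not part of the existing harness. It estimates the bytes that argv and the environment would occupy and compares the total against the host limit, which POSIX systems expose through os.sysconf("SC_ARG_MAX").

```python
import os


def argv_bytes(argv, env=None):
    """Approximate the bytes exec must copy for argv plus the environment.

    Counts each string plus its NUL terminator. Pointer arrays and other
    kernel bookkeeping are not counted; the caller's headroom covers them
    only roughly.
    """
    env = os.environ if env is None else env
    total = sum(len(os.fsencode(arg)) + 1 for arg in argv)
    # Each environment entry is stored as "KEY=VALUE\0".
    total += sum(
        len(os.fsencode(key)) + len(os.fsencode(value)) + 2
        for key, value in env.items()
    )
    return total


def fits_in_argv(argv, env=None, arg_max=None, headroom=4096):
    """Return True if the command is likely to fit under the host argv limit."""
    if arg_max is None:
        # POSIX only; pass arg_max explicitly on other platforms.
        arg_max = os.sysconf("SC_ARG_MAX")
    return argv_bytes(argv, env) + headroom <= arg_max
```

With arg_max=1048576 and an empty environment, a message of about 928k characters passes this check and one of about 1.94M characters does not, which matches the quarter and half shard results.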
Using code exec is not a good replacement for this benchmark because it behaves like an agent: it may inspect files, recover from truncated prompt context, and produce agent progress text instead of a clean one-shot structured response.
Why this matters
We need to run reliable, resumable model/budget comparisons without burning through rate limits or getting misleading results. The desired behavior is a strict one-shot model request that can ingest large prompt content from a file or stdin and produce structured output, so downstream tooling can validate exact candidate coverage.
The important product need is not specifically --message-file; that is just one possible implementation. A better implementation might be stdin support, request-body file support, direct Responses API file plumbing, or another design that fits Code's architecture.
Desired capability
A side-channel structured request command should support large message input without argv limits, while preserving the useful properties of code llm request:
- no agent/tool behavior
- no file inspection unless explicitly part of the prompt content
- compatible with --schema-file / strict structured output
- usable in scripts with deterministic stdout/stderr behavior
- able to return clear errors for context-limit, transport, or provider failures
Possible command shapes, purely illustrative:
code llm request --developer "..." --message-file prompt.txt --schema-file schema.json --format-strict --model gpt-5.4
or:
code llm request --developer "..." --message - --schema-file schema.json --format-strict --model gpt-5.4 < prompt.txt
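From the harness side, the stdin shape could be driven roughly as follows. This is a sketch under stated assumptions: reading the message from stdin via "--message -" is the proposed behavior and does not exist today, and build_command and run_request are hypothetical helper names. The point is that the argument list stays small no matter how large the prompt file is, because the prompt travels over a pipe.

```python
import subprocess


def build_command(schema_file, model, developer):
    """Build the argv for the proposed stdin shape.

    ASSUMPTION: "--message -" (read the message from stdin) is the
    illustrative shape from this issue, not an existing flag behavior.
    """
    return [
        "code", "llm", "request",
        "--developer", developer,
        "--message", "-",
        "--schema-file", schema_file,
        "--format-strict",
        "--model", model,
    ]


def run_request(prompt_path, schema_file, model, developer):
    """Stream the prompt file to the child process over stdin."""
    with open(prompt_path, "rb") as prompt:
        return subprocess.run(
            build_command(schema_file, model, developer),
            stdin=prompt,          # prompt content never enters argv
            capture_output=True,   # keep stdout and stderr separate for scripts
            check=False,           # let the caller classify failures
        )
```

The --message-file shape would look the same from the caller's side, with the path passed as an argument instead of the file being attached to stdin.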
Evidence from current testing
The rollout-memory matrix harness recorded:
- gpt-5.4 / quarter: passed via code llm request, about 928k prompt chars.
- gpt-5.4 / half: blocked before model call by argv size, about 1.94M estimated argv chars on a host with ARG_MAX=1048576.
- gpt-5.4 / three-quarter: same transport block, about 2.96M estimated argv chars.
This prevents testing the model's actual long-context behavior even when the model may support it.
Success criteria
- A script can send prompt content larger than host argv limits through a side-channel structured request.
- The output is a clean structured response suitable for JSON-schema validation.
- Failures distinguish provider/context/access errors from local transport limits.
- The implementation does not require using code exec or agent mode.
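The failure-separation criterion could be checked in the harness along these lines. This is a sketch, not the harness's actual code: classify and its outcome labels are hypothetical, and it assumes the command prints a single JSON document on stdout when it succeeds. It relies on one known fact: when an argument list is too large, process creation fails with errno E2BIG before the command ever runs.

```python
import errno
import json


def classify(run):
    """Run a zero-argument callable that launches the request and label the outcome.

    `run` is expected to return an object with returncode, stdout, and stderr,
    such as subprocess.CompletedProcess. The labels are illustrative.
    """
    try:
        proc = run()
    except OSError as exc:
        if exc.errno == errno.E2BIG:
            # The OS rejected the argument list; the model was never called.
            return ("local_transport_limit", None)
        raise
    if proc.returncode != 0:
        # Provider, context-limit, or access failure reported by the command.
        return ("request_failed", proc.stderr)
    try:
        payload = json.loads(proc.stdout)
    except json.JSONDecodeError:
        # Output that is not clean JSON cannot be schema-validated downstream.
        return ("unstructured_output", proc.stdout)
    return ("ok", payload)
```

Keeping the local limit as its own label lets a resumable matrix run record "never reached the model" separately from a genuine model or provider failure.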