
Support reliable large-prompt structured side-channel requests #336

@cbusillo


Problem

In cbusillo/codex-skills, we are building a rollout-memory evaluation harness that compares local and cloud models on large private prompt shards. The existing code llm request path is the right shape for a strict side-channel model call: it supports structured output and does not behave like a full agent. However, it only accepts the message as a command-line argument (--message).

On macOS, larger prompt shards hit ARG_MAX:

  • The quarter shard fits and validates.
  • The half and three-quarter shards are blocked by the argv size limit before the model is ever called.
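Before spawning the command, a harness can estimate whether a prompt would survive exec. This is a minimal sketch that assumes the limit behaves like macOS's ARG_MAX, which bounds the combined size of argument and environment strings. It counts only string bytes plus NUL terminators; kernels also charge pointer and alignment overhead (and Linux additionally caps each individual argument at 128 KiB with 4 KiB pages), hence the headroom factor. estimate_exec_bytes and fits_in_argv are hypothetical helper names.

```python
import os
from typing import Mapping, Optional, Sequence


def estimate_exec_bytes(argv: Sequence[str], env: Mapping[str, str]) -> int:
    """Bytes of argv and environment strings, each counted with its NUL terminator."""
    arg_bytes = sum(len(os.fsencode(arg)) + 1 for arg in argv)
    env_bytes = sum(len(os.fsencode(f"{key}={value}")) + 1 for key, value in env.items())
    return arg_bytes + env_bytes


def fits_in_argv(argv: Sequence[str], env: Optional[Mapping[str, str]] = None,
                 arg_max: Optional[int] = None, headroom: float = 0.9) -> bool:
    """Rough preflight: True if the estimate stays under a fraction of ARG_MAX."""
    if env is None:
        env = os.environ
    if arg_max is None:
        # Unix-only; the host in this report returned 1048576.
        arg_max = os.sysconf("SC_ARG_MAX")
    return estimate_exec_bytes(argv, env) <= int(arg_max * headroom)
```

With the half shard passed as the --message value, this check returns False on a host with ARG_MAX=1048576, so the harness can record a local transport block without attempting the call.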

code exec is not a suitable substitute for this benchmark because it behaves like an agent: it may inspect files, try to recover from truncated prompt context, and emit agent progress text instead of a clean one-shot structured response.

Why this matters

We need to run reliable, resumable model and budget comparisons without wasting rate limits or producing misleading results. The desired behavior is a strict one-shot model request that can ingest large prompt content from a file or stdin and return structured output, so that downstream tooling can validate exact candidate coverage.

The product need is not --message-file specifically; that is just one possible implementation. Stdin support, a request-body file, direct file plumbing through the Responses API, or another design that fits Code's architecture might work better.
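To make the file and stdin options concrete, here is a language-neutral sketch of how a CLI could resolve the message text from either a --message-file path or a - value for --message. It is not Code's implementation, neither option exists today, and resolve_message and its parameters are hypothetical names.

```python
import sys
from pathlib import Path
from typing import Optional, TextIO


def resolve_message(message: Optional[str], message_file: Optional[str],
                    stdin: TextIO = sys.stdin) -> str:
    """Hypothetical resolution of --message / --message-file into prompt text."""
    if message is not None and message_file is not None:
        raise ValueError("pass either --message or --message-file, not both")
    if message_file is not None:
        # Only the short path travels through argv; the prompt is read from disk.
        return Path(message_file).read_text(encoding="utf-8")
    if message == "-":
        # Conventional "-" means: read the whole prompt from standard input.
        return stdin.read()
    if message is not None:
        # Existing behavior: the prompt itself is the argument.
        return message
    raise ValueError("a message is required")
```

One trade-off of the - convention is that a literal - can no longer be sent as a message; a dedicated file flag avoids that ambiguity.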

Desired capability

A side-channel structured request command should accept large message input without hitting argv limits, while preserving the useful properties of code llm request:

  • no agent/tool behavior
  • no file inspection unless explicitly part of the prompt content
  • compatible with --schema-file / strict structured output
  • usable in scripts with deterministic stdout/stderr behavior
  • able to return clear errors for context-limit, transport, or provider failures

Possible command shapes, purely illustrative:

code llm request --developer "..." --message-file prompt.txt --schema-file schema.json --format-strict --model gpt-5.4

or:

code llm request --developer "..." --message - --schema-file schema.json --format-strict --model gpt-5.4 < prompt.txt
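Either shape keeps argv small and constant regardless of prompt size. The sketch below builds the argument list for both from a harness; it assumes the two proposed forms (--message-file, and - as a --message value meaning stdin), neither of which exists yet, and copies the remaining flags from the illustrative commands above. build_request is a hypothetical helper.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def build_request(prompt_path: str, *, developer: str, schema_path: str,
                  model: str, via: str = "stdin") -> Tuple[List[str], Optional[str]]:
    """Return (argv, stdin_text) for one of the illustrative command shapes."""
    argv = ["code", "llm", "request", "--developer", developer,
            "--schema-file", schema_path, "--format-strict", "--model", model]
    if via == "message-file":
        # Proposed flag: only the path goes through argv.
        return argv + ["--message-file", prompt_path], None
    if via == "stdin":
        # Proposed "-" convention: the prompt goes through a pipe instead of argv.
        return argv + ["--message", "-"], Path(prompt_path).read_text(encoding="utf-8")
    raise ValueError(f"unknown transport: {via}")
```

The result can be passed to subprocess.run(argv, input=stdin_text, text=True, capture_output=True); run feeds input through communicate(), so a multi-megabyte prompt does not deadlock on a full pipe.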

Evidence from current testing

The rollout-memory matrix harness recorded:

  • gpt-5.4 / quarter: passed via code llm request, about 928k prompt chars.
  • gpt-5.4 / half: blocked by argv size before the model call, about 1.94M estimated argv chars on a host with ARG_MAX=1048576.
  • gpt-5.4 / three-quarter: same transport block, about 2.96M estimated argv chars.
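Dividing the recorded figures by the host limit shows that these are not marginal overflows. The inputs are the rounded values above, and treating one character as one byte is an assumption that holds only for ASCII prompt text:

```math
\frac{1.94\times 10^{6}}{1\,048\,576} \approx 1.85 \ \text{(half)}, \qquad
\frac{2.96\times 10^{6}}{1\,048\,576} \approx 2.82 \ \text{(three-quarter)}, \qquad
\frac{0.928\times 10^{6}}{1\,048\,576} \approx 0.89 \ \text{(quarter, prompt only)}
```

The half shard overshoots by roughly 0.9M bytes, so it cannot be made to fit by trimming the other arguments or the environment.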

This prevents testing the model's actual long-context behavior, even though the model may support that context length.

Success criteria

  • A script can send prompt content larger than host argv limits through a side-channel structured request.
  • The output is a clean structured response suitable for JSON-schema validation.
  • Failures distinguish provider/context/access errors from local transport limits.
  • The implementation does not require using code exec or agent mode.
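On the harness side, the failure classes above can be separated coarsely. This sketch assumes that exec rejecting the argument list (errno E2BIG) means a local transport block, that a nonzero exit means the request ran and failed, and that stdout should hold a single JSON document. Code's exit codes and stderr format are not described in this issue, so provider, context, and access errors are left as raw stderr for further parsing. RequestOutcome and run_structured_request are hypothetical names.

```python
import errno
import json
import subprocess
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class RequestOutcome:
    kind: str  # "ok", "local_transport", "request_failed", or "invalid_output"
    detail: str = ""
    payload: Any = None


def run_structured_request(argv: List[str], stdin_text: Optional[str] = None,
                           timeout: Optional[float] = None) -> RequestOutcome:
    """Run one side-channel request and sort the result into coarse failure classes."""
    try:
        proc = subprocess.run(argv, input=stdin_text, capture_output=True,
                              text=True, timeout=timeout)
    except OSError as exc:
        if exc.errno == errno.E2BIG:
            # exec() rejected the argument list: the model was never called.
            return RequestOutcome("local_transport", "argument list too long (E2BIG)")
        raise
    if proc.returncode != 0:
        # The command ran and failed; keep stderr for provider/context/access parsing.
        return RequestOutcome("request_failed", proc.stderr.strip())
    try:
        payload = json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        # stdout was not one JSON document, so schema validation cannot start.
        return RequestOutcome("invalid_output", str(exc))
    return RequestOutcome("ok", payload=payload)
```

Keeping "local_transport" separate from "request_failed" means an argv block is never recorded as a model or context failure in the matrix.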
