Support reliable large prompt structured side-channel requests

## Problem

In `cbusillo/codex-skills`, we are building a rollout-memory evaluation harness that compares local and cloud models over large private prompt shards. The current `code llm request` path is the right shape for a strict side-channel model call because it supports structured output and does not behave like a full agent, but it only accepts `--message` as a command-line argument.

On macOS, larger prompt shards exceed `ARG_MAX`, so the command fails before any request is sent:

- The quarter shard fits and validates.
- The half and three-quarter shards are blocked by argv size before the model is ever called.
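For reference, a harness can detect this case up front instead of discovering it at spawn time. The following is a minimal sketch, not part of Code: `argv_headroom` is a hypothetical helper, and the byte accounting is only an estimate (it counts argument and environment strings plus their NUL terminators and ignores pointer overhead and per-argument limits that some kernels impose).

```python
import os


def argv_headroom(message, other_args, arg_max=None, env=None):
    """Estimate the bytes left under ARG_MAX if `message` is passed via argv.

    A negative result means the exec call is expected to fail before the
    model is ever contacted. This is an approximation, not an exact rule.
    """
    if arg_max is None:
        # Ask the host for its limit (available on Unix-like systems).
        arg_max = os.sysconf("SC_ARG_MAX")
    if env is None:
        env = dict(os.environ)

    # Each argv string is stored with a trailing NUL byte.
    argv_bytes = sum(len(a.encode("utf-8")) + 1 for a in [*other_args, message])
    # Each environment entry is stored as "KEY=VALUE\0".
    env_bytes = sum(
        len(os.fsencode(k)) + len(os.fsencode(v)) + 2 for k, v in env.items()
    )
    return arg_max - argv_bytes - env_bytes


if __name__ == "__main__":
    base = ["code", "llm", "request", "--message"]
    for label, size in [("quarter", 928_000), ("half", 1_940_000)]:
        left = argv_headroom("x" * size, base, arg_max=1_048_576, env={})
        print(label, "fits" if left > 0 else "blocked", left)
```

With the sizes reported in this issue and `ARG_MAX=1048576`, the quarter shard leaves positive headroom and the half shard does not, which matches what the harness observed.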

Using `code exec` is not a good replacement for this benchmark because it behaves like an agent: it may inspect files, recover from truncated prompt context, and produce agent progress text instead of a clean one-shot structured response.

## Why this matters

We need to run reliable, resumable model/budget comparisons without wasting rate-limit quota or producing misleading results. The desired behavior is a strict one-shot model request that can ingest large prompt content from a file or stdin and produce structured output, so downstream tooling can validate exact candidate coverage.

The important product need is not specifically `--message-file`; that is just one possible implementation. A better implementation might be stdin support, request-body file support, direct Responses API file plumbing, or another design that fits Code's architecture.

## Desired capability

A side-channel structured request command should support large message input without argv limits, while preserving the useful properties of `code llm request`:

- no agent/tool behavior
- no file inspection unless explicitly part of the prompt content
- compatible with `--schema-file` / strict structured output
- usable in scripts with deterministic stdout/stderr behavior
- able to return clear errors for context-limit, transport, or provider failures

Possible command shapes, purely illustrative:

```bash
code llm request --developer "..." --message-file prompt.txt --schema-file schema.json --format-strict --model gpt-5.4
```

or:

```bash
code llm request --developer "..." --message - --schema-file schema.json --format-strict --model gpt-5.4 < prompt.txt
```
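From the harness side, the stdin shape could be driven roughly as follows. This is a sketch under stated assumptions: `--message -` does not exist today and is taken from the illustrative command above, and `build_stdin_request` / `run_request` are hypothetical helper names. The point is only that the prompt travels over a pipe, so argv stays small regardless of shard size.

```python
import subprocess


def build_stdin_request(prompt, schema_file, model, developer):
    """Return (argv, stdin_bytes) for the hypothetical `--message -` shape."""
    argv = [
        "code", "llm", "request",
        "--developer", developer,
        "--message", "-",  # hypothetical: read the message body from stdin
        "--schema-file", schema_file,
        "--format-strict",
        "--model", model,
    ]
    # The prompt is sent on stdin, so it never counts against ARG_MAX.
    return argv, prompt.encode("utf-8")


def run_request(argv, stdin_bytes, timeout_s=600.0):
    """Run the command once, capturing stdout and stderr separately."""
    return subprocess.run(
        argv,
        input=stdin_bytes,
        capture_output=True,
        timeout=timeout_s,
        check=False,  # let the caller inspect the exit code
    )
```

A `--message-file` variant would look the same from the caller's point of view, with the file path in argv and no stdin payload.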

## Evidence from current testing

The rollout-memory matrix harness recorded:

- `gpt-5.4 / quarter`: passed via `code llm request`, about 928k prompt chars.
- `gpt-5.4 / half`: blocked before the model call by argv size, about 1.94M estimated argv chars on a host with `ARG_MAX=1048576`.
- `gpt-5.4 / three-quarter`: blocked by the same transport limit, about 2.96M estimated argv chars.

This prevents testing the model's actual long-context behavior even when the model may support it.

## Success criteria

- A script can send prompt content larger than host argv limits through a side-channel structured request.
- The output is a clean structured response suitable for JSON-schema validation.
- Failures distinguish provider/context/access errors from local transport limits.
- The implementation does not require using `code exec` or agent mode.
