Skip to content

inference proxy: no end-to-end test coverage for the three truncation → SSE-error paths in route_inference_request #846

@mjamiv

Description

@mjamiv

Agent Diagnostic

Pointed an agent at the repo to validate follow-up coverage for PR #834 (fix(inference): prevent silent truncation of large streaming responses, merged 2026-04-14). Read crates/openshell-router/src/backend.rs::StreamingProxyResponse, crates/openshell-router/tests/backend_integration.rs, and crates/openshell-sandbox/src/proxy.rs::route_inference_request. Confirmed the gap by tracing the test mock down through next_chunk()StreamingBody::Buffered(Option<bytes::Bytes>) returns the entire body in exactly one next_chunk() call, then None. No per-chunk timing control, no mid-stream error injection, and no byte-cap exhaustion path reachable without a live reqwest backend. Verified against master 355d845.

Description

PR #834 added four code paths inside route_inference_request (crates/openshell-sandbox/src/proxy.rs:1263..1317) that each emit an SSE proxy_stream_error event before closing the chunked stream:

  1. Upstream byte limit exceeded (total_bytes > MAX_STREAMING_BODY) — line 1267
  2. Upstream next_chunk error (Ok(Err(e))) — line 1283
  3. Upstream next_chunk idle timeout (Err(_)) — line 1299
  4. Normal completion (Ok(Ok(None))) — line 1282 (the happy path)

The PR ships two new tests in crates/openshell-router/tests/backend_integration.rs (streaming_proxy_completes_despite_exceeding_route_timeout, buffered_proxy_enforces_route_timeout) that correctly assert the router-level timeout behaviour — the streaming path omits the total request timeout, and the buffered path still enforces it. Those are good tests.

But the three sandbox-level truncation → SSE injection round-trips added to route_inference_request — the byte-limit path, the upstream-error path, and the chunk-idle-timeout path — are not exercised end-to-end. The integration harness in crates/openshell-sandbox/tests/system_inference.rs uses InferenceContext with a mock route, which flows through Router::proxy_with_candidates_streamingStreamingProxyResponse::from_bufferedStreamingBody::Buffered. That body drains in one next_chunk() call and never reaches any of the three error paths.

Concretely, the helper that makes the test dead-end is in crates/openshell-router/src/backend.rs:

enum StreamingBody {
    /// Live upstream response — call `chunk().await` to read incrementally.
    Live(reqwest::Response),
    /// Pre-buffered body (e.g. from mock routes). Drained on first `next_chunk()`.
    Buffered(Option<bytes::Bytes>),
}

impl StreamingProxyResponse {
    pub async fn next_chunk(&mut self) -> Result<Option<bytes::Bytes>, RouterError> {
        match &mut self.body {
            StreamingBody::Live(response) => response.chunk().await.map_err(|e| {
                RouterError::UpstreamProtocol(format!("failed to read response chunk: {e}"))
            }),
            StreamingBody::Buffered(buf) => Ok(buf.take()),
        }
    }
}

There is no way to construct a StreamingProxyResponse whose next_chunk() returns a deliberate Err(...), hangs past CHUNK_IDLE_TIMEOUT, or yields multiple chunks whose total exceeds MAX_STREAMING_BODY — without standing up a real HTTP server.

Why it matters

The practical consequence is that the format_sse_error(...) emissions added by #834 — the operator-facing truncation signal clients rely on to detect dropped content — have no regression guard in CI. If a future refactor accidentally skips the let _ = write_all(...) before break in any of those three match arms, or re-introduces the BufWriter that the PR's second commit explicitly reverted, tests stay green.

I have also been catching related format_sse_error correctness gaps in a separate PR (#842) — that fix is unit-level, but a solid round-trip test would let both correctness classes be exercised together.

Reproduction Steps

  1. Check out master (git checkout 355d845).
  2. Read crates/openshell-router/src/backend.rs:58..83 — confirm StreamingBody::Buffered(Option<bytes::Bytes>) has no per-chunk control.
  3. Read crates/openshell-sandbox/src/proxy.rs:1263..1317 — identify the three truncation branches that each emit format_sse_error then break.
  4. Read crates/openshell-sandbox/tests/system_inference.rs — see that the integration harness builds InferenceContext from mock routes, which land on StreamingBody::Buffered and cannot reach any of the three branches.
  5. Try to write a test that asserts "client receives a parseable SSE proxy_stream_error event when the upstream errors mid-stream". It is not possible with the existing primitives.

Environment

  • Master 355d845
  • cargo 1.94.1, Linux x86_64
  • Static analysis / code inspection only; no runtime repro is needed to see the gap

Proposed fix

Add a mock-streaming variant to StreamingBody that drives the three paths deterministically:

#[derive(Debug)]
pub enum MockStep {
    /// Yield these bytes on the next `next_chunk()` call.
    Chunk(bytes::Bytes),
    /// Sleep this duration on the next call (to trigger the sandbox's
    /// `CHUNK_IDLE_TIMEOUT`).
    Delay(std::time::Duration),
    /// Fail the next call with this router error.
    Error(RouterError),
}

enum StreamingBody {
    Live(reqwest::Response),
    Buffered(Option<bytes::Bytes>),
    /// Test-only: deterministic chunk sequence with optional delays and errors.
    Mock(std::collections::VecDeque<MockStep>),
}

next_chunk() pops the head of the deque on each call. End of deque → Ok(None). This keeps the existing public API unchanged and gated behind #[cfg(test)] or a pub(crate) constructor if you prefer not to expose it in the public API. Then add one integration test in crates/openshell-sandbox/tests/system_inference.rs per branch:

  • truncation_on_byte_limit_injects_sse_error — yield chunks until MAX_STREAMING_BODY is exceeded, assert the client stream contains data: {"error":{"message":"response truncated: exceeded maximum streaming body size",...}}.
  • truncation_on_upstream_read_error_injects_sse_error — mock Err(...) mid-stream, assert data: {"error":{"message":"response truncated: upstream read error",...}}.
  • truncation_on_idle_timeout_injects_sse_error — mock Delay(CHUNK_IDLE_TIMEOUT + 1s), assert data: {"error":{"message":"response truncated: chunk idle timeout exceeded",...}}.

I have a working sketch on a local branch. Happy to convert this into a PR once a maintainer confirms the approach is acceptable — I want to check whether the mock variant should live under a #[cfg(test)] or a pub(crate) fn from_mock(...) constructor, since it technically lives in a non-test module.

Logs

n/a — this is a test-gap analysis, not a runtime bug.

Agent-First Checklist

  • I pointed my agent at the repo and had it investigate this issue
  • I loaded relevant skills (read backend.rs, proxy.rs, system_inference.rs, inference.rs, traced the StreamingBody variants end-to-end)
  • My agent could not resolve this on its own — the diagnostic above explains the gap and proposes a concrete fix direction

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions