Agent Diagnostic
Pointed an agent at the repo to validate follow-up coverage for PR #834 (fix(inference): prevent silent truncation of large streaming responses, merged 2026-04-14). Read crates/openshell-router/src/backend.rs::StreamingProxyResponse, crates/openshell-router/tests/backend_integration.rs, and crates/openshell-sandbox/src/proxy.rs::route_inference_request. Confirmed the gap by tracing the test mock down through next_chunk() — StreamingBody::Buffered(Option<bytes::Bytes>) returns the entire body in exactly one next_chunk() call, then None. No per-chunk timing control, no mid-stream error injection, and no byte-cap exhaustion path reachable without a live reqwest backend. Verified against master 355d845.
Description
PR #834 added four code paths inside route_inference_request (crates/openshell-sandbox/src/proxy.rs:1263..1317) that each emit an SSE proxy_stream_error event before closing the chunked stream:
- Upstream byte limit exceeded (
total_bytes > MAX_STREAMING_BODY) — line 1267
- Upstream
next_chunk error (Ok(Err(e))) — line 1283
- Upstream
next_chunk idle timeout (Err(_)) — line 1299
- Normal completion (
Ok(Ok(None))) — line 1282 (the happy path)
The PR ships two new tests in crates/openshell-router/tests/backend_integration.rs (streaming_proxy_completes_despite_exceeding_route_timeout, buffered_proxy_enforces_route_timeout) that correctly assert the router-level timeout behaviour — the streaming path omits the total request timeout, and the buffered path still enforces it. Those are good tests.
But the three sandbox-level truncation → SSE injection round-trips added to route_inference_request — the byte-limit path, the upstream-error path, and the chunk-idle-timeout path — are not exercised end-to-end. The integration harness in crates/openshell-sandbox/tests/system_inference.rs uses InferenceContext with a mock route, which flows through Router::proxy_with_candidates_streaming → StreamingProxyResponse::from_buffered → StreamingBody::Buffered. That body drains in one next_chunk() call and never reaches any of the three error paths.
Concretely, the helper that makes the test dead-end is in crates/openshell-router/src/backend.rs:
enum StreamingBody {
/// Live upstream response — call `chunk().await` to read incrementally.
Live(reqwest::Response),
/// Pre-buffered body (e.g. from mock routes). Drained on first `next_chunk()`.
Buffered(Option<bytes::Bytes>),
}
impl StreamingProxyResponse {
pub async fn next_chunk(&mut self) -> Result<Option<bytes::Bytes>, RouterError> {
match &mut self.body {
StreamingBody::Live(response) => response.chunk().await.map_err(|e| {
RouterError::UpstreamProtocol(format!("failed to read response chunk: {e}"))
}),
StreamingBody::Buffered(buf) => Ok(buf.take()),
}
}
}
There is no way to construct a StreamingProxyResponse whose next_chunk() returns a deliberate Err(...), hangs past CHUNK_IDLE_TIMEOUT, or yields multiple chunks whose total exceeds MAX_STREAMING_BODY — without standing up a real HTTP server.
Why it matters
The practical consequence is that the format_sse_error(...) emissions added by #834 — the operator-facing truncation signal clients rely on to detect dropped content — have no regression guard in CI. If a future refactor accidentally skips the let _ = write_all(...) before break in any of those three match arms, or re-introduces the BufWriter that the PR's second commit explicitly reverted, tests stay green.
I have also been catching related format_sse_error correctness gaps in a separate PR (#842) — that fix is unit-level, but a solid round-trip test would let both correctness classes be exercised together.
Reproduction Steps
- Check out master (
git checkout 355d845).
- Read
crates/openshell-router/src/backend.rs:58..83 — confirm StreamingBody::Buffered(Option<bytes::Bytes>) has no per-chunk control.
- Read
crates/openshell-sandbox/src/proxy.rs:1263..1317 — identify the three truncation branches that each emit format_sse_error then break.
- Read
crates/openshell-sandbox/tests/system_inference.rs — see that the integration harness builds InferenceContext from mock routes, which land on StreamingBody::Buffered and cannot reach any of the three branches.
- Try to write a test that asserts "client receives a parseable SSE
proxy_stream_error event when the upstream errors mid-stream". It is not possible with the existing primitives.
Environment
- Master
355d845
cargo 1.94.1, Linux x86_64
- Static analysis / code inspection only; no runtime repro is needed to see the gap
Proposed fix
Add a mock-streaming variant to StreamingBody that drives the three paths deterministically:
#[derive(Debug)]
pub enum MockStep {
/// Yield these bytes on the next `next_chunk()` call.
Chunk(bytes::Bytes),
/// Sleep this duration on the next call (to trigger the sandbox's
/// `CHUNK_IDLE_TIMEOUT`).
Delay(std::time::Duration),
/// Fail the next call with this router error.
Error(RouterError),
}
enum StreamingBody {
Live(reqwest::Response),
Buffered(Option<bytes::Bytes>),
/// Test-only: deterministic chunk sequence with optional delays and errors.
Mock(std::collections::VecDeque<MockStep>),
}
next_chunk() pops the head of the deque on each call. End of deque → Ok(None). This keeps the existing public API unchanged and gated behind #[cfg(test)] or a pub(crate) constructor if you prefer not to expose it in the public API. Then add one integration test in crates/openshell-sandbox/tests/system_inference.rs per branch:
truncation_on_byte_limit_injects_sse_error — yield chunks until MAX_STREAMING_BODY is exceeded, assert the client stream contains data: {"error":{"message":"response truncated: exceeded maximum streaming body size",...}}.
truncation_on_upstream_read_error_injects_sse_error — mock Err(...) mid-stream, assert data: {"error":{"message":"response truncated: upstream read error",...}}.
truncation_on_idle_timeout_injects_sse_error — mock Delay(CHUNK_IDLE_TIMEOUT + 1s), assert data: {"error":{"message":"response truncated: chunk idle timeout exceeded",...}}.
I have a working sketch on a local branch. Happy to convert this into a PR once a maintainer confirms the approach is acceptable — I want to check whether the mock variant should live under a #[cfg(test)] or a pub(crate) fn from_mock(...) constructor, since it technically lives in a non-test module.
Logs
n/a — this is a test-gap analysis, not a runtime bug.
Agent-First Checklist
Agent Diagnostic
Pointed an agent at the repo to validate follow-up coverage for PR #834 (
fix(inference): prevent silent truncation of large streaming responses, merged 2026-04-14). Readcrates/openshell-router/src/backend.rs::StreamingProxyResponse,crates/openshell-router/tests/backend_integration.rs, andcrates/openshell-sandbox/src/proxy.rs::route_inference_request. Confirmed the gap by tracing the test mock down throughnext_chunk()—StreamingBody::Buffered(Option<bytes::Bytes>)returns the entire body in exactly onenext_chunk()call, thenNone. No per-chunk timing control, no mid-stream error injection, and no byte-cap exhaustion path reachable without a live reqwest backend. Verified against master355d845.Description
PR #834 added four code paths inside
route_inference_request(crates/openshell-sandbox/src/proxy.rs:1263..1317) that each emit an SSEproxy_stream_errorevent before closing the chunked stream:total_bytes > MAX_STREAMING_BODY) — line 1267next_chunkerror (Ok(Err(e))) — line 1283next_chunkidle timeout (Err(_)) — line 1299Ok(Ok(None))) — line 1282 (the happy path)The PR ships two new tests in
crates/openshell-router/tests/backend_integration.rs(streaming_proxy_completes_despite_exceeding_route_timeout,buffered_proxy_enforces_route_timeout) that correctly assert the router-level timeout behaviour — the streaming path omits the total request timeout, and the buffered path still enforces it. Those are good tests.But the three sandbox-level truncation → SSE injection round-trips added to
route_inference_request— the byte-limit path, the upstream-error path, and the chunk-idle-timeout path — are not exercised end-to-end. The integration harness incrates/openshell-sandbox/tests/system_inference.rsusesInferenceContextwith a mock route, which flows throughRouter::proxy_with_candidates_streaming→StreamingProxyResponse::from_buffered→StreamingBody::Buffered. That body drains in onenext_chunk()call and never reaches any of the three error paths.Concretely, the helper that makes the test dead-end is in
crates/openshell-router/src/backend.rs:There is no way to construct a
StreamingProxyResponsewhosenext_chunk()returns a deliberateErr(...), hangs pastCHUNK_IDLE_TIMEOUT, or yields multiple chunks whose total exceedsMAX_STREAMING_BODY— without standing up a real HTTP server.Why it matters
The practical consequence is that the
format_sse_error(...)emissions added by #834 — the operator-facing truncation signal clients rely on to detect dropped content — have no regression guard in CI. If a future refactor accidentally skips thelet _ = write_all(...)beforebreakin any of those three match arms, or re-introduces theBufWriterthat the PR's second commit explicitly reverted, tests stay green.I have also been catching related
format_sse_errorcorrectness gaps in a separate PR (#842) — that fix is unit-level, but a solid round-trip test would let both correctness classes be exercised together.Reproduction Steps
git checkout 355d845).crates/openshell-router/src/backend.rs:58..83— confirmStreamingBody::Buffered(Option<bytes::Bytes>)has no per-chunk control.crates/openshell-sandbox/src/proxy.rs:1263..1317— identify the three truncation branches that each emitformat_sse_errorthenbreak.crates/openshell-sandbox/tests/system_inference.rs— see that the integration harness buildsInferenceContextfrom mock routes, which land onStreamingBody::Bufferedand cannot reach any of the three branches.proxy_stream_errorevent when the upstream errors mid-stream". It is not possible with the existing primitives.Environment
355d845cargo 1.94.1, Linux x86_64Proposed fix
Add a mock-streaming variant to
StreamingBodythat drives the three paths deterministically:next_chunk()pops the head of the deque on each call. End of deque →Ok(None). This keeps the existing public API unchanged and gated behind#[cfg(test)]or apub(crate)constructor if you prefer not to expose it in the public API. Then add one integration test incrates/openshell-sandbox/tests/system_inference.rsper branch:truncation_on_byte_limit_injects_sse_error— yield chunks untilMAX_STREAMING_BODYis exceeded, assert the client stream containsdata: {"error":{"message":"response truncated: exceeded maximum streaming body size",...}}.truncation_on_upstream_read_error_injects_sse_error— mockErr(...)mid-stream, assertdata: {"error":{"message":"response truncated: upstream read error",...}}.truncation_on_idle_timeout_injects_sse_error— mockDelay(CHUNK_IDLE_TIMEOUT + 1s), assertdata: {"error":{"message":"response truncated: chunk idle timeout exceeded",...}}.I have a working sketch on a local branch. Happy to convert this into a PR once a maintainer confirms the approach is acceptable — I want to check whether the mock variant should live under a
#[cfg(test)]or apub(crate) fn from_mock(...)constructor, since it technically lives in a non-test module.Logs
n/a — this is a test-gap analysis, not a runtime bug.
Agent-First Checklist
backend.rs,proxy.rs,system_inference.rs,inference.rs, traced theStreamingBodyvariants end-to-end)