
fix: prevent context blowup by preserving previous_response_id on bridge recovery - IMPORTANT FIX FOR CONTEXT BLOWING UP#397

Merged
Soju06 merged 7 commits into Soju06:main from balakumardev:fix/http-bridge-context-blowup
Apr 14, 2026

Conversation

@balakumardev
Contributor

@balakumardev balakumardev commented Apr 13, 2026

Problem

When an upstream websocket session is lost (session.closed) or a request fails mid-flight, the HTTP bridge recovery paths introduced in v1.12.0 return 400 previous_response_not_found errors. This tells the Codex CLI that the server-side conversation state is gone, causing it to drop previous_response_id and resend the entire conversation history as input.

The result is catastrophic for context window usage:

| Metric | Before (v1.11.x) | After (v1.12.0) |
| --- | --- | --- |
| Per-turn growth | ~2-3K tokens | ~70K tokens |
| Turns to fill 876K window | 200+ | 17 |
| Compactions per session | 0 | 2+ |
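The arithmetic behind these numbers can be sketched with a tiny helper (illustrative only, not project code; it ignores compaction and per-turn variance, which is why the observed turn counts differ slightly from the idealised model):

```python
import math

def turns_to_fill(window_tokens: int, per_turn_growth: int) -> int:
    """Turns until cumulative per-turn input growth fills the context window."""
    return math.ceil(window_tokens / per_turn_growth)

WINDOW = 876_000  # gpt-5.4 effective context window

# With previous_response_id intact, each turn adds only the new delta.
healthy = turns_to_fill(WINDOW, 2_500)   # 351 turns -> comfortably "200+"
# After a 400, the CLI replays the full history every turn.
broken = turns_to_fill(WINDOW, 70_000)   # 13 turns -> window gone almost immediately
```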

Root Cause

Three recovery paths in the bridge return 400 previous_response_not_found instead of 502 upstream_unavailable:

  1. Mid-request websocket failure - when the upstream websocket dies during send() and previous_response_id is present, the error handler returns 400 previous_response_not_found instead of 502 upstream_unavailable.

  2. _retry_http_bridge_request_on_fresh_upstream - when called with send_request=True and previous_response_id is present, immediately marks the request as previous_response_not_found without attempting recovery.

  3. _retry_http_bridge_precreated_request - same pattern.

When the CLI receives 400 previous_response_not_found, it interprets this as "the server doesn't have this conversation anymore" and resends the full conversation history without previous_response_id, inflating per-turn context by ~20x.

Fix

  • When session.closed and previous_response_id is present, raise 502 immediately (matching pre-v1.12.0 behaviour) so the client retries with previous_response_id intact.
  • In the error handler, always raise 502 instead of 400 previous_response_not_found.
  • In the retry helpers, return False without marking previous_response_not_found so callers raise a retriable 502.

Test Results

7 regression tests added in test_bridge_context_blowup.py. Ran against both broken and fixed code:

| Test | Broken (pre-fix) | Fixed |
| --- | --- | --- |
| test_mid_request_failure_with_previous_response_id_raises_502 | FAIL (returns 400) | PASS (returns 502) |
| test_retry_with_previous_response_id_returns_false_without_marking_error | FAIL (sets previous_response_not_found) | PASS |
| test_closed_session_with_previous_response_id_raises_502 | PASS | PASS |
| test_closed_session_without_previous_response_id_attempts_recovery | PASS | PASS |
| test_healthy_growth_rate_stays_within_budget | PASS | PASS |
| test_broken_growth_rate_fills_window_fast | PASS | PASS |
| test_error_code_determines_growth_rate | PASS | PASS |

All 44 existing test_proxy_http_bridge.py tests pass with zero regressions.
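The idea behind the growth-rate tests can be approximated in a few lines (a standalone sketch of the model, not the contents of test_bridge_context_blowup.py):

```python
def per_turn_input_tokens(turn: int, server_keeps_state: bool) -> int:
    """Rough model of per-turn input size.

    With previous_response_id honoured (server keeps state), each turn
    sends only the new delta; after a 400 previous_response_not_found,
    the client replays every prior turn, so input grows linearly.
    """
    delta = 2_500  # assumed per-turn delta, in tokens
    return delta if server_keeps_state else delta * turn

# 502 -> client retries with previous_response_id intact -> flat growth
assert per_turn_input_tokens(17, server_keeps_state=True) == 2_500
# 400 -> full-history replay -> linear blowup by turn 17
assert per_turn_input_tokens(17, server_keeps_state=False) == 42_500
```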

Live Verification

Tested on a single-instance deployment with gpt-5.4 (876K effective context window):

  • v1.12.0 (broken): 4 user messages -> 853K tokens in 17 turns -> compaction triggered, avg 70K tok/turn
  • v1.11.x (reverted): 4 user messages -> ~80K tokens in 17 turns -> no compaction, avg 518 tok/turn

@balakumardev balakumardev changed the title fix: prevent context blowup by preserving previous_response_id on bridge recovery fix: prevent context blowup by preserving previous_response_id on bridge recovery - IMPORTANT FIX FOR CONTEXT BLOWING UP Apr 13, 2026
balakumardev and others added 3 commits April 14, 2026 10:51
…dge recovery

When an upstream websocket session is lost (session.closed) or a request
fails mid-flight, the HTTP bridge recovery paths were returning 400
previous_response_not_found errors. This tells the Codex CLI that the
server-side conversation state is gone, causing it to drop
previous_response_id and resend the entire conversation history as input.

The result is catastrophic for context window usage: per-turn input
growth jumps from ~2-3K tokens (with previous_response_id) to ~70K
tokens (full conversation replay), filling the 876K context window in
just 17 turns instead of 200+.

Root cause analysis:
- _submit_http_bridge_request: when session.closed is True, the new code
  attempts to reconnect the upstream websocket. But the reconnected
  session is a fresh server-side session that doesn't recognise the old
  previous_response_id. When the upstream returns 'not found', the
  bridge marks the request as previous_response_not_found (400) instead
  of upstream_unavailable (502).
- _retry_http_bridge_request_on_fresh_upstream: when called with
  send_request=True and previous_response_id is present, immediately
  marks the request as previous_response_not_found without even trying.
- _retry_http_bridge_precreated_request: same pattern.

Fix:
- When session.closed and previous_response_id is present, raise 502
  immediately (matching pre-v1.12.0 behaviour) so the client retries
  with previous_response_id intact on a new session.
- In error handler, always raise 502 instead of 400
  previous_response_not_found so the client treats the failure as
  transient and retries properly.
- In retry helpers, return False without marking
  previous_response_not_found so callers can raise a retriable 502.

Before: 4 user messages -> 853K tokens in 17 turns -> compaction
After:  4 user messages -> ~80K tokens in 17 turns -> no compaction
Proves the bug exists on broken code and is fixed:

BROKEN (pre-fix):
  - test_mid_request_failure_with_previous_response_id_raises_502: FAIL
    (returns 400 previous_response_not_found instead of 502)
  - test_retry_with_previous_response_id_returns_false_without_marking_error: FAIL
    (sets error_code_override to previous_response_not_found)

FIXED:
  - All 7 tests PASS
  - 44 existing bridge tests still PASS (zero regressions)

Test coverage:
  - Closed session + previous_response_id -> 502 (not 400)
  - Closed session without previous_response_id -> recovery attempted
  - Mid-request websocket failure + previous_response_id -> 502 (not 400)
  - Retry helper preserves previous_response_id (no error override)
  - Context growth modelling: healthy (2.3K/turn) vs broken (70K/turn)
  - Error code semantics: 502 vs 400 determines CLI behavior
- P1: Allow closed-session reconnect (send_request=False) before raising
  502, so transient between-turn disconnects can still recover
- P2: In _retry_http_bridge_request_on_fresh_upstream, reconnect but skip
  send when previous_response_id is present (effective_send flag)
- Update integration tests: 400 previous_response_not_found -> 502
  upstream_unavailable/stream_incomplete/upstream_request_timeout
@Soju06
Owner

Soju06 commented Apr 14, 2026

Thank you for this important fix @balakumardev! 🎉

The context blowup issue is a critical regression — great catch with the session log analysis showing 70K vs 2.3K tokens/turn.

I've rebased your changes on main and applied Codex adversarial review fixes (preserving the reconnect recovery path for between-turn disconnects). The reviewed version is in #401.

Closing this in favor of #401 which includes your original fix + review improvements.

@Soju06 Soju06 closed this Apr 14, 2026
@Soju06 Soju06 reopened this Apr 14, 2026
@Soju06 Soju06 force-pushed the fix/http-bridge-context-blowup branch from f562ab9 to 9a9004a on April 14, 2026 02:14
Soju06 added 4 commits April 14, 2026 11:14
…vious_response_id

- P1: Reconnect-only retry returned True leaving request stranded; now
  returns False so caller raises retriable 502 immediately
- P3: Fix test mock send -> send_text to match production code path
…signal

Send failure on owner-side bridge now raises bridge_owner_unreachable
instead of generic upstream_unavailable, so non-owner replicas trigger
local rebind via _http_bridge_should_attempt_local_previous_response_recovery.
@Soju06 Soju06 merged commit 85802e6 into Soju06:main Apr 14, 2026
18 checks passed
@Soju06
Owner

Soju06 commented Apr 14, 2026

Thank you for catching this critical context blowup regression @balakumardev! 🎉

The session log analysis (2.3K vs 70K tokens/turn) was incredibly helpful in diagnosing the root cause. The Codex adversarial review identified some edge cases around multi-replica recovery and reconnect paths which were resolved together.

@all-contributors please add @balakumardev for code and test

@allcontributors
Contributor

@Soju06

I've put up a pull request to add @balakumardev! 🎉

xirothedev added a commit to xirothedev/codex-lb that referenced this pull request Apr 24, 2026
Brings upstream Soju06/codex-lb v1.15.0 into fork:
- feat(proxy): GPT-5.5 + GPT-5.5 Pro model support (Soju06#477)
- fix(proxy): prevent admission semaphore leak + raise concurrency (Soju06#466)
- fix(proxy): inject session-level previous_response_id (Soju06#456)
- fix(proxy): harden continuity recovery + WS replay + bridge lifecycle (Soju06#415)
- fix: replace reject-fast admission with wait-then-reject (Soju06#413)
- fix(auth): accept API keys on /api/codex/usage (Soju06#417)
- fix(auth): allow explicit unauthenticated proxy client CIDRs (Soju06#399)
- fix(proxy): websocket connect-phase failover (Soju06#396)
- fix(proxy): preserve previous_response_id on bridge recovery (Soju06#397)
- fix(proxy): budget-safe routing + image-gen compat (Soju06#421)
- feat(dashboard): show account plan in request logs table (Soju06#425)
- chore: archive 36 OpenSpec changes + type cleanup (Soju06#400)

Conflict resolutions per deploy plan:
- app/modules/proxy/service.py: accept upstream websocket connect helpers
  verbatim; defer fork endpoint concurrency limits from d66db88 pending
  follow-up rebase on wait-then-reject admission base
- app/modules/proxy/api.py: drop fork _reject_or_acquire_http_proxy_endpoint_concurrency
  path in codex_usage handler; accept upstream api_key dep
- app/core/metrics/prometheus.py: keep both fork + upstream metric counters
- tests: accept upstream new tests; keep fork assertions where additive
- frontend recent-requests-table.tsx: preserve fork column gating, insert
  upstream Plan column
- openspec specs: preserve fork database-migrations requirements; merge
  upstream database-backends additions
- uv.lock: regenerated via uv lock (v1.13.1 -> v1.15.0)

Fork features temporarily off in this merge commit (follow-up PR required):
- proxy endpoint concurrency limits (d66db88)
