Conversation

ammar-agent commented Oct 19, 2025

Summary

Adds an integration test to verify that stream error recovery preserves context (no amnesia bug).

Changes

  • Debug IPC for testing: Added a DEBUG_TRIGGER_STREAM_ERROR IPC channel
  • StreamManager debug method: debugTriggerStreamError() triggers artificial stream errors that follow the same code path as real errors (see the sketch below)
  • Integration test: A single error + resume scenario verifies context preservation via structured markers
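
A rough sketch of how these pieces could fit together, assuming an Electron-style `ipcMain` handler (the channel constant and StreamManager method come from this PR; the channel string value, handler signature, `workspaceId` parameter, and `streamManager` wiring are illustrative assumptions):

```ts
// Illustrative sketch only: the channel constant and StreamManager method are
// from this PR, but the handler signature and wiring are assumptions.
import { ipcMain } from "electron";

declare const streamManager: {
  debugTriggerStreamError(workspaceId: string): Promise<void>;
};

const DEBUG_TRIGGER_STREAM_ERROR = "debug:trigger-stream-error";

ipcMain.handle(DEBUG_TRIGGER_STREAM_ERROR, async (_event, workspaceId: string) => {
  // Follows the same path as a real failure: abort the active stream, persist
  // the accumulated parts with error metadata, then emit the error event.
  await streamManager.debugTriggerStreamError(workspaceId);
});
```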

Test Design

Structured-marker approach for precise validation:

Test Flow:

  1. Generate unique nonce for test run (random 10-char identifier)
  2. Model counts 1-100 using a structured format: ${nonce}-<n>: <word> (e.g. ai7qcnc20g-1: one); see the marker sketch after this list
  3. Collect stream deltas until ≥10 complete markers are detected
  4. Trigger artificial network error mid-stream
  5. Resume stream and wait for completion
  6. Verify final message has both properties:
    • (a) Prefix preservation: Starts with exact pre-error streamed text
    • (b) Exact continuation: Contains next sequential marker (${nonce}-11) shortly after prefix
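
A minimal sketch of the marker scheme referenced in step 2 (the nonce generator and helper name are illustrative, not the test's exact code):

```ts
// Generate a random identifier (roughly 10 chars) so markers from this run
// cannot collide with anything the model would produce on its own.
const nonce = Math.random().toString(36).slice(2, 12);

// Markers look like "<nonce>-<n>: <word>", e.g. "ai7qcnc20g-1: one".
const markerPattern = new RegExp(`^${nonce}-(\\d+): \\S+$`, "gm");

// Count complete marker lines in the text streamed so far; the test triggers
// the artificial error once at least 10 complete markers have been seen.
function completeMarkerCount(streamedText: string): number {
  return [...streamedText.matchAll(markerPattern)].length;
}
```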

Validation:

  • Pre-error content captured from stream-delta events (user-visible data path)
  • Stable prefix truncated to the last complete marker line (no partial markers); see the sketch after this list
  • Assertions directly prove both amnesia-prevention properties
  • No coupling to internal storage formats or metadata
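
A sketch of how the two assertions could be expressed (function and variable names are assumptions about the test's shape, not its exact code):

```ts
// Truncate the pre-error text to the last complete marker line so the
// expected prefix never ends in a partially streamed marker.
function stablePrefix(preErrorText: string): string {
  const lastNewline = preErrorText.lastIndexOf("\n");
  return lastNewline === -1 ? "" : preErrorText.slice(0, lastNewline + 1);
}

// Check both amnesia-prevention properties against the final message text.
function verifyNoAmnesia(finalText: string, preErrorText: string, nonce: string): void {
  const prefix = stablePrefix(preErrorText);

  // (a) Prefix preservation: the final message starts with the exact text
  //     the user already saw before the error.
  if (!finalText.startsWith(prefix)) {
    throw new Error("pre-error streamed text was not preserved");
  }

  // (b) Exact continuation: the next sequential marker appears shortly after
  //     the preserved prefix (marker 11, since ≥10 markers were streamed).
  const tail = finalText.slice(prefix.length, prefix.length + 500);
  if (!tail.includes(`${nonce}-11`)) {
    throw new Error("stream did not continue from the interruption point");
  }
}
```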

Why this approach:

  • Precise: Detects exact continuation (not just "some work done")
  • Unambiguous: Random nonce makes false positives virtually impossible
  • Robust: Structured format less likely to confuse model than natural language
  • Fast: Haiku 4.5 completes in ~18-21 seconds

Bug Fix

Also fixed an event collection bug in collectStreamUntil: consumed deltas are now tracked so the same event is not returned multiple times. The previous logic returned the first matching event on every poll, causing duplicate processing.
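
A sketch of the cursor-based consumption the fix describes (the collectStreamUntil name comes from this PR; the event shape and polling details are assumptions):

```ts
// Keep a cursor into the buffered events so each poll only processes deltas
// that arrived since the previous poll, instead of re-reading from the start.
async function collectStreamUntil(
  events: { type: string; text?: string }[],
  isDone: (collected: string) => boolean,
): Promise<string> {
  let collected = "";
  let consumed = 0; // index of the next unconsumed event
  for (;;) {
    for (; consumed < events.length; consumed++) {
      const event = events[consumed];
      if (event.type === "stream-delta") {
        collected += event.text ?? "";
      }
    }
    if (isDone(collected)) return collected;
    // A real test would also enforce an overall timeout here.
    await new Promise((resolve) => setTimeout(resolve, 50)); // poll interval
  }
}
```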

Related

Follow-up to #331, which fixed the amnesia bug by preserving accumulated parts on error.

Test Results

✅ Test passes reliably in ~18-21 seconds
✅ Validates exact prefix preservation and continuation
✅ No flaky failures from timing issues
✅ Integration tests pass: 1 passed, 1 total

Generated with cmux

- Add DEBUG_TRIGGER_STREAM_ERROR IPC channel for testing
- Add debugTriggerStreamError method to StreamManager that:
  - Aborts active stream
  - Writes partial with accumulated parts and error metadata
  - Emits error event (same as real errors)
- Add integration tests that verify context preservation:
  - Test 1: Single stream error + resume
  - Test 2: Three consecutive stream errors + resume
- Tests use Haiku 4.5 for speed
- Tests verify accumulated parts are preserved in partial.json
- Tests verify resumed streams complete successfully with context

Generated with `cmux`
chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

- Fix: After triggering debug error, update streamInfo.initialMetadata with error/errorType
- This ensures subsequent flushPartialWrite() calls preserve the error metadata
- Prevents cleanup code from overwriting error-marked partial with clean partial

- Remove direct filesystem access (partial.json reads)
- Remove metadata structure verification
- Use existing readChatHistory helper instead of custom implementation
- Create user-focused helpers: waitForStreamWithContent, triggerStreamError, resumeAndWaitForSuccess
- Verify behavioral outcomes (content delivered, topic-relevant) not internal state
- Tests now read like user journeys instead of implementation checks
- Add comprehensive documentation explaining test approach

Makes tests resilient to refactoring - they verify the behavioral contract
(no amnesia after stream errors) rather than implementation details.

-18 lines, improved readability

Changed from essay/explanation tasks to counting 1-100 for more robust verification:
- Extract and validate number sequences from responses
- Verify sequence continuity proves context preservation
- Check progress past error points to confirm no amnesia
- Disable all tools via toolPolicy to ensure pure text responses

Deterministic validation is less flaky than keyword matching and
provides stronger proof that context was actually preserved through errors.

Changes:
- Fixed counting task with descriptions for slower streaming
- Adjusted validation to handle realistic model behavior (may restart count)
- Removed flaky multi-error test (model completes too fast for multiple interruptions)
- Single error test proves amnesia fix works correctly
- Test now passes reliably in ~23s

Validation now checks for substantial work (range, unique numbers) rather than
perfect ascending sequence, which is more realistic for error recovery scenarios.

Replace the previous "substantial work" test with a more precise test that
validates both prefix preservation and exact continuation after stream errors.

Key improvements:
- Use structured markers (nonce + line numbers) to detect exact continuation
- Capture pre-error streamed text from stream-delta events (user-visible path)
- Interrupt mid-stream after detecting stable prefix (≥10 complete markers)
- Assert: (a) final text starts with exact pre-error prefix, (b) contains next
  sequential marker shortly after prefix
- Fix event collection bug: properly track consumed deltas to avoid duplicates

The test now directly proves both properties of "no amnesia" recovery:
1. Pre-error streamed content is preserved in history (prefix preservation)
2. Resumed stream continues from exact point (exact continuation)

No internal storage coupling - uses only stream events and final history.

ammario added this pull request to the merge queue Oct 19, 2025
Merged via the queue into main with commit 2fbcd36, Oct 19, 2025
8 checks passed
ammario deleted the test-stream-error-amnesia branch October 19, 2025 23:26