
fix: release lock before streaming to prevent orphan on client disconnect#234

Closed
beran-t wants to merge 1 commit into e2b-dev:main from beran-t:fix/lock-orphan-on-disconnect

Conversation

@beran-t beran-t commented Mar 23, 2026

Summary

Fixes #213: asyncio.Lock in messaging.py not released on client disconnect, causing cascading timeouts.

  • Narrows the lock scope in ContextWebSocket.execute() so it only covers the prepare+send phase (Phase A), not result streaming (Phase B)
  • Adds try/finally around streaming to clean up the _executions entry even if the generator is abandoned
  • Uses dict.pop() instead of del for defensive cleanup
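
The narrowed scope can be sketched as follows. Class and attribute names (`ContextWebSocket`, `_executions`) follow the PR description, but the queue plumbing and `_send_execute_request` body are illustrative stand-ins, not the actual implementation:

```python
import asyncio
import itertools


class ContextWebSocket:
    """Illustrative sketch of the fix; method bodies are stand-ins."""

    def __init__(self):
        self._lock = asyncio.Lock()
        self._executions = {}          # message_id -> result queue
        self._ids = itertools.count()

    async def _send_execute_request(self, code):
        # Stand-in for serializing and sending the request to the kernel.
        msg_id = str(next(self._ids))
        queue = asyncio.Queue()
        queue.put_nowait(f"result of {code}")
        queue.put_nowait(None)         # sentinel: execution finished
        return msg_id, queue

    async def execute(self, code):
        # Phase A: prepare + send, guarded by the lock.
        async with self._lock:
            msg_id, queue = await self._send_execute_request(code)
            self._executions[msg_id] = queue
        # Phase B: stream results WITHOUT the lock, so an abandoned
        # generator cannot leave it orphaned.
        try:
            while (item := await queue.get()) is not None:
                yield item
        finally:
            # pop() instead of del: safe even if the entry is already gone.
            self._executions.pop(msg_id, None)


async def demo():
    ws = ContextWebSocket()
    results = [r async for r in ws.execute("1+1")]
    return ws, results


ws, results = asyncio.run(demo())
print(results)            # ['result of 1+1']
print(ws._executions)     # {} -- cleaned up by the finally block
```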

Root Cause

The lock was held for the entire async generator lifetime, including `_wait_for_result()`, which blocks on `await queue.get()`. When a client disconnects (e.g. an SDK timeout), Starlette abandons the generator while it is stuck at `await queue.get()`; `aclose()` is never called, because Starlette only detects disconnects when trying to write. The lock stays held until the kernel finishes internally, blocking all subsequent executions.
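
The failure mode can be reproduced in isolation with plain asyncio (a self-contained toy, not the project's code): a generator suspended at `await queue.get()` inside `async with lock` keeps the lock, and if the task driving it is simply abandoned, nothing ever releases it:

```python
import asyncio


async def old_execute(lock, queue):
    # Old behavior: lock held for the ENTIRE generator lifetime.
    async with lock:
        while (item := await queue.get()) is not None:
            yield item


async def drive_once(gen):
    return await gen.__anext__()


async def demo():
    lock = asyncio.Lock()
    stuck = asyncio.Queue()                 # kernel still busy: no results yet
    gen = old_execute(lock, stuck)
    driver = asyncio.create_task(drive_once(gen))
    await asyncio.sleep(0)                  # generator grabs the lock, then
                                            # suspends at queue.get()
    # "Client disconnects": nothing cancels `driver`, aclose() never runs.
    try:
        # Next execution on the same context blocks behind the orphaned lock.
        await asyncio.wait_for(lock.acquire(), timeout=0.1)
        outcome = "acquired"
    except asyncio.TimeoutError:
        outcome = "timed out"
    driver.cancel()
    return outcome


outcome = asyncio.run(demo())
print(outcome)  # timed out
```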

Why This Fix Is Safe

Streaming results (Phase B) doesn't need the lock because:

  • Results are routed by unique message_id in _process_message() — no shared state during streaming
  • The kernel serializes execution internally — the lock doesn't need to enforce this
  • _global_env_vars (lazy init) and _cleanup_task (lifecycle management) are only accessed during Phase A
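
Why per-`message_id` routing makes lock-free streaming safe can be shown with a small stand-in for `_process_message()` (the field names such as `parent_msg_id` are assumptions for this sketch, not the project's wire format):

```python
import asyncio


class Router:
    """Toy version of result routing by message_id; not the real code."""

    def __init__(self):
        self._executions = {}  # message_id -> per-execution result queue

    def _process_message(self, message):
        # Each kernel message carries the id of the request it answers,
        # so concurrent streams never interleave even without a lock.
        queue = self._executions.get(message["parent_msg_id"])
        if queue is not None:  # results for abandoned executions are dropped
            queue.put_nowait(message["content"])


router = Router()
router._executions["a"] = asyncio.Queue()
router._executions["b"] = asyncio.Queue()
router._process_message({"parent_msg_id": "a", "content": "out-1"})
router._process_message({"parent_msg_id": "b", "content": "out-2"})
router._process_message({"parent_msg_id": "gone", "content": "dropped"})

out_a = router._executions["a"].get_nowait()
out_b = router._executions["b"].get_nowait()
print(out_a, out_b)  # out-1 out-2
```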

Reproduction & Verification

Tested on live E2B sandboxes with a custom code-interpreter-dev template built from this branch.

Before fix (stock code-interpreter template):

sleep(30) + 5s SDK timeout → print('hello') blocked 25.1s (orphaned lock)
sleep(60) + 3s timeout → 3 retries all timed out (cascade)

After fix (code-interpreter-dev template):

sleep(15) + 3s SDK timeout → print('hello') executes in 0.21s after kernel finishes
All 8 test cases pass: baseline, state persistence, env vars, error handling,
multiple contexts, JavaScript, lock-not-orphaned, no-cascade

Test plan

  • Baseline execution: print('hello') works in <1s
  • State persistence: variables survive across run_code calls
  • Environment variables: envs parameter works correctly
  • Error handling: ZeroDivisionError returned properly
  • Multiple contexts: independent state
  • JavaScript execution: console.log works
  • Lock not orphaned: after timeout+disconnect, next execution succeeds once kernel is free (0.21s, not 25s)
  • No cascade: retries after disconnect don't pile up behind each other

…t disconnect

Narrow the lock scope in ContextWebSocket.execute() so that the lock
is only held during the prepare+send phase, not during result streaming.

Previously, the lock was held for the entire generator lifetime including
the _wait_for_result() streaming loop.  When a client disconnected
(e.g. SDK timeout), Starlette abandoned the generator while it was
blocked at `await queue.get()`.  The lock stayed held until the kernel
finished internally, blocking all subsequent executions on the same
context and causing cascading timeouts.

The fix moves the streaming phase (Phase B) outside the `async with
self._lock` block.  This is safe because results are routed by unique
message_id in _process_message() — no shared state is accessed during
streaming.  A try/finally ensures the execution entry is cleaned up
even if the generator is abandoned.

Fixes e2b-dev#213

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 15ad5ab02e


Comment on lines 382 to 383
# Clean up env vars in a separate request after the main code has run
if env_vars:


P1 Badge Serialize env-var cleanup before allowing next execute

_cleanup_task is now created after the lock is released, so a concurrent call can enter execute() first, observe no pending cleanup, and queue its own code while the previous call’s per-request env_vars are still active. In practice, if request A sets env_vars and request B arrives before this assignment runs, B can execute with A’s environment, which is a regression from the old behavior that always waited for cleanup before starting the next execution.
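
One way to close that gap, matching the referenced follow-up commit's approach of scheduling the cleanup task before the lock is released, could look like this sketch (names and bodies are illustrative stand-ins):

```python
import asyncio


class ContextWebSocket:
    """Sketch of serializing env-var cleanup; bodies are stand-ins."""

    def __init__(self):
        self._lock = asyncio.Lock()
        self._cleanup_task = None
        self.log = []

    async def _clear_env_vars(self):
        await asyncio.sleep(0)  # stand-in for the cleanup request
        self.log.append("cleanup")

    async def execute(self, code, env_vars=None):
        async with self._lock:
            # Wait for the previous call's cleanup before sending anything,
            # so this request can never run with stale per-request env vars.
            if self._cleanup_task is not None:
                await self._cleanup_task
                self._cleanup_task = None
            self.log.append(f"send {code}")
            if env_vars:
                # Schedule cleanup BEFORE releasing the lock: a concurrent
                # execute() can no longer slip in and observe "no pending
                # cleanup" while these env vars are still in effect.
                self._cleanup_task = asyncio.create_task(self._clear_env_vars())
        # Phase B: stream results outside the lock (elided here).
        yield f"ran {code}"


async def demo():
    ws = ContextWebSocket()
    [_ async for _ in ws.execute("a=1", env_vars={"K": "V"})]
    [_ async for _ in ws.execute("b=2")]
    return ws


ws = asyncio.run(demo())
print(ws.log)  # ['send a=1', 'cleanup', 'send b=2']
```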


devin-ai-integration bot added a commit that referenced this pull request Mar 24, 2026
Fixes #213 — asyncio.Lock in messaging.py not released on client disconnect,
causing cascading timeouts.

Changes:
- Narrow lock scope in ContextWebSocket.execute() to only cover the
  prepare+send phase (Phase A), releasing it before result streaming
  (Phase B). This prevents orphaned locks on client disconnect.
- Schedule env var cleanup task under the lock (before release) to
  avoid the race condition flagged in PRs #234/#235.
- Add POST /contexts/{id}/interrupt endpoint that calls Jupyter's
  kernel interrupt API, allowing clients to stop long-running code
  without restarting the kernel (preserves state).
- Add interrupt_code_context/interruptCodeContext to Python and JS SDKs.

Co-Authored-By: vasek <vasek.mlejnsky@gmail.com>


Development

Successfully merging this pull request may close these issues.

asyncio.Lock in messaging.py not released on client disconnect → cascading timeouts
