Fix code context state isolation #249
Conversation
Each code context now gets a dedicated executor process from creation to deletion, preventing state leakage between contexts. Removed maximum pool size limits to allow organic scaling based on context count.
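The lifecycle described here can be sketched as follows. All class and method names below are illustrative, not the SDK's actual internals: each context gets its own executor at creation, and that executor lives until the context is deleted.

```typescript
// Sketch of 1:1 context-to-executor binding (hypothetical names,
// not the SDK's real implementation).
type Executor = { id: string; alive: boolean };

class ContextRegistry {
  private contextExecutors = new Map<string, Executor>();
  private nextId = 0;

  // Each context gets a dedicated executor at creation...
  createContext(): string {
    const contextId = `ctx-${this.nextId++}`;
    const executor: Executor = { id: `exec-${contextId}`, alive: true };
    this.contextExecutors.set(contextId, executor);
    return contextId;
  }

  getExecutor(contextId: string): Executor | undefined {
    return this.contextExecutors.get(contextId);
  }

  // ...and the executor is torn down only when the context is deleted.
  deleteContext(contextId: string): void {
    const executor = this.contextExecutors.get(contextId);
    if (executor) executor.alive = false;
    this.contextExecutors.delete(contextId);
  }
}

const registry = new ContextRegistry();
const a = registry.createContext();
const b = registry.createContext();
// Two contexts never share an executor, so no state can leak between them.
console.log(registry.getExecutor(a)!.id !== registry.getExecutor(b)!.id);
```

Because no pool cap applies, the number of executors simply tracks the number of live contexts.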
🦋 Changeset detected. Latest commit: 2feea4e. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
Deploying with

| Status | Name | Latest Commit | Preview URLs | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! | sandbox-site | 8c8999c | Commit Preview URL, Branch Preview URL | Nov 25 2025, 09:05 PM |
Add documentation explaining that each code context is completely isolated with a dedicated executor process. Includes an example showing concurrent execution across multiple contexts. Related to cloudflare/sandbox-sdk#249, which fixed a state isolation bug.
🐳 Docker Image Published: `FROM cloudflare/sandbox:0.0.0-pr-249-562e595`. You can use this Docker image with the preview package from this PR.
Add atomic concurrent execution checks, exit handler cleanup, error handling for executor release failures, per-language process limits, and language validation. Include concurrent execution test.
Adds documentation for improved code context isolation:

- Each context now gets a dedicated executor process (1:1 binding)
- Contexts prevent concurrent execution on the same context
- Complete state isolation between contexts guaranteed

Related to cloudflare/sandbox-sdk#249

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Remove redundant try-catch blocks that just rethrow errors, use try-finally pattern for guaranteed cleanup, simplify comments about atomicity, and enhance concurrent execution test to verify actual state isolation rather than just error counts.
Replace isAvailable flag with executionPromise to fix race condition. The promise acts as an atomic lock preventing concurrent execution on the same context.
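A minimal sketch of the promise-as-lock idea described in this commit (illustrative shape, not the PR's exact code): the stored promise doubles as a busy flag, and because the check and the assignment happen in the same synchronous turn of the event loop, a second execution cannot slip in between them.

```typescript
// Sketch: a stored promise acts as an atomic lock against concurrent
// execution on the same context (illustrative, not the PR's exact code).
class CodeContext {
  private executionPromise: Promise<string> | null = null;

  async execute(run: () => Promise<string>): Promise<string> {
    if (this.executionPromise) {
      // Another execution is already in flight on this context.
      throw new Error("context busy: concurrent execution rejected");
    }
    // Assigned before any await, so check-and-set is atomic in one
    // event-loop turn; the old isAvailable flag had a gap here.
    this.executionPromise = run().finally(() => {
      this.executionPromise = null;
    });
    return this.executionPromise;
  }
}
```

A second `execute()` call while the first is pending rejects immediately; once the stored promise settles, the context is free again.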
Revert Dockerfile platform change that should not be committed. Increase concurrent test requests to 20 for better stress testing.
Use async-mutex to prevent race conditions when concurrent requests target the same context. Execute code before creating stream to hold mutex during actual execution.
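The per-context mutex approach could look roughly like this. `async-mutex` is an npm package whose `Mutex.runExclusive()` serializes async callbacks; to keep the sketch dependency-free it substitutes a minimal promise-chain mutex with the same `runExclusive` shape. All other names are illustrative.

```typescript
// Minimal promise-chain mutex with the same runExclusive() shape as
// the async-mutex library's Mutex (sketch only).
class SimpleMutex {
  private tail: Promise<void> = Promise.resolve();

  runExclusive<T>(fn: () => Promise<T>): Promise<T> {
    const result = this.tail.then(fn);
    // Keep the chain alive even if fn rejects.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

// One lock per context: concurrent requests targeting the same
// context serialize instead of racing.
const contextLocks = new Map<string, SimpleMutex>();

function getContextLock(contextId: string): SimpleMutex {
  let lock = contextLocks.get(contextId);
  if (!lock) {
    lock = new SimpleMutex();
    contextLocks.set(contextId, lock);
  }
  return lock;
}

async function executeInContext(
  contextId: string,
  code: () => Promise<string>
): Promise<string> {
  // Hold the mutex for the whole execution; streaming of results
  // would happen after the lock is released.
  return getContextLock(contextId).runExclusive(code);
}
```

Executing the code inside `runExclusive` (rather than releasing the lock before streaming) is what keeps the mutex held during the actual execution, per the commit message.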
Prevents race conditions when multiple contexts reserve executors concurrently, and detects crashed executors before execution.
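Detecting a crashed executor before execution might look like the sketch below; the shape is hypothetical, since the real implementation tracks actual child processes and their exit handlers.

```typescript
// Sketch: skip executors whose process has already exited, so a
// crashed executor is never handed out. Names are illustrative.
interface ExecutorProcess {
  id: string;
  exited: boolean; // set to true by the process exit handler
}

function reserveExecutor(pool: ExecutorProcess[]): ExecutorProcess {
  while (pool.length > 0) {
    const candidate = pool.pop()!;
    if (!candidate.exited) {
      return candidate; // healthy: reserve it
    }
    // Crashed executor: discard and keep looking.
  }
  throw new Error("no healthy executor available");
}
```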
Documents the fix for the code context isolation bug where contexts leaked state after 10 executions. Each context now has dedicated executor resources throughout its lifetime, ensuring complete isolation.

Changes:
- Add context isolation note explaining dedicated resources
- Emphasize isolation guarantees in createCodeContext()
- Add cleanup best practice for deleteCodeContext()
- Remove implied pool size limitations

Related to cloudflare/sandbox-sdk#249

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add cleanup on context creation failure to prevent executor leaks
- Serialize stateless executions with per-language mutex
Moved all mutex locking from InterpreterService to ProcessPoolManager, using per-executor locks instead of per-context locks. This enables parallel stateless execution via borrow/return pattern and simplifies code by centralizing all concurrency control in one layer.
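The borrow/return pattern for stateless execution might be sketched like this (illustrative names; the real pool manages child processes and per-executor mutexes). Borrowing removes the executor from the available list, so two stateless executions run in parallel on different executors rather than queueing behind a shared lock.

```typescript
// Sketch: borrow/return pattern for stateless execution (hypothetical
// names). Borrowed executors are invisible to other callers until
// returned, so parallel executions never share a process.
class ExecutorPool {
  constructor(private available: string[]) {}

  private borrow(): string {
    const id = this.available.pop();
    if (!id) throw new Error("no executor available");
    return id;
  }

  private giveBack(id: string): void {
    this.available.push(id);
  }

  async withExecutor<T>(fn: (executorId: string) => Promise<T>): Promise<T> {
    const id = this.borrow();
    try {
      return await fn(id);
    } finally {
      this.giveBack(id); // returned to the pool even if fn throws
    }
  }
}
```

Centralizing this in the pool layer, as the commit describes, means callers never touch locks directly.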
Adds E2E test that verifies state isolation is maintained when executing code concurrently across 12 different contexts. This validates that the executor locking mechanism prevents race conditions while maintaining proper context isolation.
The test validates concurrent execution isolation, not concurrent creation. Creating contexts concurrently causes mutex-serialized spawning that times out in CI. Sequential creation matches the working test pattern and still validates the intended behavior.
Claude Code Review
The isolation fix architecture is sound and the two-level locking strategy is well-designed. However, there are two critical bugs in the locking implementation that could cause the exact state leakage this PR aims to fix.
Critical Issues (Must Fix Before Merge)
1. **Race condition in `getExecutorLock()` creates duplicate mutexes** - Two concurrent executions can create separate mutex instances for the same executor, allowing concurrent execution and defeating the isolation guarantee. See inline comment.
2. **Memory leak: executor locks never cleaned up** - The exit handler doesn't remove mutexes from the `executorLocks` map, causing unbounded growth in long-running containers. See inline comment.
Important Issues
1. **Test coverage gap** - The test at line 582 doesn't actually verify isolation (it only checks for errors). The code would pass even with the bug. See inline comment for a better test approach.
2. **Documentation needs** - Force-delete behavior and unlimited process defaults should be documented prominently to prevent user confusion and resource exhaustion.
Architecture Assessment
The 1:1 context-to-executor binding is the right design for stateful code interpreter use cases. The borrow/return pattern for stateless execution is correct. The per-language pool locks properly protect structural changes. Once the critical locking bugs are fixed, this will be production-ready.
Verdict: Needs fixes before merge - the race condition in executor lock creation can cause state isolation failures.
```typescript
  });
}

private getExecutorLock(executorId: string): Mutex {
```
CRITICAL: Race condition in mutex creation
Two concurrent calls with the same `executorId` can both create separate `Mutex` instances:

1. Thread A checks `executorLocks.get(id)` → undefined
2. Thread B checks `executorLocks.get(id)` → undefined
3. Thread A creates Mutex1, sets it
4. Thread B creates Mutex2, sets it (overwrites Mutex1)
5. Thread A acquires Mutex1, Thread B acquires Mutex2
6. Both execute concurrently on the same process → state corruption

This defeats the entire isolation fix.
Fix:

```typescript
private executorLockCreationMutex = new Mutex();

private async getExecutorLock(executorId: string): Promise<Mutex> {
  let mutex = this.executorLocks.get(executorId);
  if (mutex) return mutex;
  return await this.executorLockCreationMutex.runExclusive(() => {
    // Double-check after acquiring the lock
    let mutex = this.executorLocks.get(executorId);
    if (!mutex) {
      mutex = new Mutex();
      this.executorLocks.set(executorId, mutex);
    }
    return mutex;
  });
}
```

Update all call sites from `getExecutorLock(id)` to `await getExecutorLock(id)` (lines 219, 227).
```typescript
  lastUsed: new Date()
};

// Register exit handler for cleanup (prevents memory leaks)
```
CRITICAL: Memory leak - executor locks never cleaned up
The exit handler removes the executor from all tracking structures but doesn't remove its mutex from `executorLocks`. Over time, this map grows unbounded with mutexes for dead processes.

Fix: add cleanup after removing from available executors:

```typescript
const available = this.availableExecutors.get(language);
if (available) {
  const index = available.indexOf(interpreterProcess);
  if (index > -1) available.splice(index, 1);
}
// Clean up executor lock
this.executorLocks.delete(interpreterProcess.id);
```

Note: `releaseExecutorForContext` at line 578 already has this cleanup, but the exit handler needs it too for cases where processes crash.
```typescript
  );
}, 120000);

test('should maintain isolation across many contexts (12+)', async () => {
```
Test coverage gap: Doesn't verify isolation
This test creates 12 contexts sequentially and executes `const value = 2;` in each. It only checks that executions succeed without errors.

Problem: the test would pass even if contexts shared state, because:

- Each context declares a new `const value` in its own scope
- JavaScript allows this even if the process is shared
- There is no verification that state from context N doesn't leak to context N+1
To actually test isolation:

```typescript
for (let i = 0; i < 12; i++) {
  const context = await createContext();
  // Set unique value in global scope
  await executeCode(context, `globalThis.testValue_${i} = ${i};`);
  // Verify no leakage from previous contexts
  for (let j = 0; j < i; j++) {
    const checkResult = await executeCode(
      context,
      `typeof globalThis.testValue_${j}`
    );
    expect(checkResult.results[0].text).toContain('undefined');
  }
  // Verify own value is set
  const ownResult = await executeCode(
    context,
    `globalThis.testValue_${i}`
  );
  expect(ownResult.results[0].text).toContain(String(i));
  await deleteContext(context);
}
```

This would catch the original bug where contexts shared executors after 10 executions.
```typescript
  return this.availableExecutors.get(language) || [];
}

async shutdown(): Promise<void> {
```
Memory leak in shutdown
The `shutdown()` method clears pools but doesn't clear the other tracking maps:

```typescript
async shutdown(): Promise<void> {
  if (this.cleanupInterval) {
    clearInterval(this.cleanupInterval);
  }
  // Kill all processes
  for (const [executor, pool] of this.pools.entries()) {
    for (const process of pool) {
      process.process.kill();
    }
  }
  // Clear all tracking structures
  this.pools.clear();
  this.contextExecutors.clear();
  this.availableExecutors.clear();
  this.executorLocks.clear();
  this.poolLocks.clear();
}
```

Not critical for production (singleton pattern) but causes memory leaks in test suites.
Fixes state leakage between code contexts by assigning each context a dedicated executor process from creation to deletion. Removes maximum pool size limits to allow organic scaling.
Changes