Tags: performance, optimization, bug, sandbox
Quality Rating: ⭐ 9/10
Reporter: xiaoan
Description
The local sandbox (bwrap) can cause PID exhaustion in container environments, leading to crashes and service degradation.
Alert Context
Container: fybclaw-backend
PID Usage: 80%
Current Processes: 241 / 300
Alert Threshold: 80%
Detected: 2026-05-31 10:15:01
Root Cause Analysis
Factor 1: Each code execution forks multiple processes
Every execute_code tool call creates a process chain:
uvicorn (main process)
└── bwrap (sandbox process, with --unshare-pid and new namespace)
    └── python3 / bash / node (actual code execution)
        └── User code may fork additional child processes
A single execute_code call consumes roughly 3-5 PIDs, and more if user code forks additional children.
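To put the per-call cost in context, the following minimal sketch estimates how many further execute_code calls fit into the remaining PID budget. It uses the figures from the alert (241 of 300 PIDs in use) and the 3-5 PIDs per call estimated above. The function name max_concurrent_calls is hypothetical and not part of the codebase.

```python
def max_concurrent_calls(pid_limit: int, pids_in_use: int, pids_per_call: int) -> int:
    """Return how many more execute_code calls fit before the PID limit is hit."""
    headroom = max(pid_limit - pids_in_use, 0)
    return headroom // pids_per_call


# Figures from the alert: 241 of 300 PIDs already in use, so 59 PIDs remain.
best_case = max_concurrent_calls(300, 241, 3)   # 59 // 3
worst_case = max_concurrent_calls(300, 241, 5)  # 59 // 5
print(best_case, worst_case)  # 19 11
```

At the alert level, somewhere between 11 and 19 simultaneous calls would be enough to exhaust the quota, before counting any leaked processes.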
Factor 2: Orphaned child processes after timeout kill
Current timeout handling in source code:
except asyncio.TimeoutError:
    proc.kill()               # Only kills the bwrap process
    await proc.communicate()  # Waits for exit
Problem: proc.kill() sends SIGKILL only to the bwrap process itself. Although bwrap is launched with the --die-with-parent option, in certain edge cases (e.g., when bwrap itself is force-killed) its child processes may be orphaned and never reaped, so they continue to count against the PID quota.
Recommended Optimizations
1. Process Tree Cleanup
Use process group killing instead of single process:
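# Send SIGTERM to the entire process group. This assumes bwrap was started in
# its own group (e.g. start_new_session=True); otherwise the group also contains uvicorn.
os.killpg(os.getpgid(proc.pid), signal.SIGTERM)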
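A minimal sketch of how group-based cleanup could fit into the timeout path, assuming the sandbox is started through asyncio.create_subprocess_exec. The helper name run_with_group_kill is hypothetical, and the sketch uses SIGKILL on timeout rather than SIGTERM so that nothing in the group can ignore the signal. Processes that call setsid() themselves leave the group and would not be covered.

```python
import asyncio
import os
import signal


async def run_with_group_kill(cmd: list[str], timeout: float) -> tuple[int, bytes]:
    """Run cmd in its own session and kill the whole process group on timeout."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
        start_new_session=True,  # child becomes leader of a new session and group
    )
    try:
        out, _ = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        try:
            # The child is a session leader, so its process group ID equals its PID.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group has already exited
        out, _ = await proc.communicate()  # reap the child and drain its output
    return proc.returncode, out
```

Starting the child in a new session is what makes killpg safe here: without it, the child shares the server's process group and the signal would reach uvicorn as well.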
2. Enhanced Orphan Process Detection
Implement a periodic cleanup task that:
- Detects orphaned processes belonging to terminated bwrap instances
- Reaps zombie processes with wait() or waitpid()
- Reports PID usage metrics for monitoring
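The reaping step could look roughly like the sketch below, which collects every already-exited child without blocking. The function name reap_exited_children is hypothetical. Two caveats apply: waitpid() can only reap children of the calling process, so orphans are reaped by whichever process adopts them (normally PID 1 of the container or a subreaper), and calling waitpid(-1) inside a process that also manages children through asyncio or subprocess can take exit statuses those libraries expect to collect. The sketch therefore suits a dedicated init or reaper process better than the uvicorn worker itself.

```python
import os


def reap_exited_children() -> list[int]:
    """Reap every already-exited child without blocking; return their PIDs."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped
```

A periodic task could call this on a timer and report the number of reaped PIDs as a metric.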
3. Process Pool / Reuse Strategy
Consider implementing a process pool for sandbox execution to reduce fork overhead and improve PID efficiency.
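A full process pool is beyond a short example. As a simpler stand-in (not a pool), the sketch below caps how many sandbox runs are in flight at once, which bounds worst-case PID usage without reusing processes. The class name SandboxSlots and the job callable are hypothetical.

```python
import asyncio


class SandboxSlots:
    """Caps concurrent sandbox runs so their combined PID use stays bounded."""

    def __init__(self, max_concurrent: int) -> None:
        self._sem = asyncio.Semaphore(max_concurrent)
        self.active = 0  # runs currently holding a slot
        self.peak = 0    # highest number of simultaneous runs observed

    async def run(self, job):
        """Await job() once a slot is free; extra callers wait their turn."""
        async with self._sem:
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                return await job()
            finally:
                self.active -= 1
```

With roughly 5 PIDs per call, a cap of N slots keeps sandbox PID usage near 5 × N as long as timed-out runs are fully cleaned up.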
4. PID Quota Monitoring
Add proactive monitoring:
- Warn when PID usage exceeds 60%
- Alert when approaching 80% threshold
- Auto-trigger cleanup when exceeding threshold
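The thresholds above can be sketched as a small classifier. This assumes the container exposes the cgroup v2 pids controller files pids.current and pids.max under /sys/fs/cgroup; that path depends on how the container runtime mounts cgroups and is an assumption here. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional, Tuple

WARN_RATIO = 0.60   # from the report: warn above 60%
ALERT_RATIO = 0.80  # from the report: alert at the 80% threshold


def classify_pid_usage(current: int, limit: int) -> str:
    """Map a PID count and limit to 'ok', 'warn', or 'alert'."""
    ratio = current / limit
    if ratio >= ALERT_RATIO:
        return "alert"
    if ratio > WARN_RATIO:
        return "warn"
    return "ok"


def read_cgroup_pids(base: Path = Path("/sys/fs/cgroup")) -> Tuple[int, Optional[int]]:
    """Read (current, limit) from cgroup v2 files; limit is None if unlimited."""
    current = int((base / "pids.current").read_text())
    raw_max = (base / "pids.max").read_text().strip()
    return current, (None if raw_max == "max" else int(raw_max))


print(classify_pid_usage(241, 300))  # alert
```

A monitoring loop could call read_cgroup_pids() periodically and trigger cleanup whenever the result is "alert".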
Expected Behavior
- PID usage should remain stable during normal operations
- No orphaned processes after timeout or errors
- Graceful degradation when approaching PID limits
Actual Behavior
- PID usage continuously grows due to accumulating orphaned processes
- Container eventually reaches PID limit and crashes
Additional Context
- Affects: fybclaw-backend container
- Related to: execute_code tool, bwrap sandbox implementation
- Impact: Service availability and stability