This project provides a robust, API-driven Python code execution sandbox, engineered for high security, stateful session management, and strong performance. It utilizes a centralized API Gateway and a dynamic Worker Pool architecture, providing each user with a completely isolated and persistent Python session.
A key technical feature of this project is its "Virtual-Disk-per-Worker" architecture. At runtime, the system dynamically creates, formats, and mounts a dedicated virtual disk (an `.img` file attached via `losetup`) for each worker container. This solves the hard problem of reliably managing dynamic block devices in a concurrent containerized environment, and it provides the filesystem and I/O isolation that is fundamental to the system's security posture.
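For illustration, a per-worker disk could be provisioned roughly as follows. The helper below is hypothetical, not the project's actual `WorkerManager` code, but `truncate`, `mkfs.ext4`, `losetup`, and `mount` are the standard Linux tools involved (root is required):

```python
import subprocess

def provision_worker_disk(img_path: str, mount_point: str, size_mb: int = 500) -> str:
    """Hypothetical sketch: create, format, and mount a per-worker virtual disk."""
    # Allocate a sparse image file of the requested size.
    subprocess.run(["truncate", "-s", f"{size_mb}M", img_path], check=True)
    # Format it with ext4 (-F forces mkfs to operate on a regular file).
    subprocess.run(["mkfs.ext4", "-q", "-F", img_path], check=True)
    # Attach the image to a free loop device and capture the device path.
    loop_dev = subprocess.check_output(
        ["losetup", "--find", "--show", img_path], text=True
    ).strip()
    # Mount the loop device at the worker's sandbox directory.
    subprocess.run(["mount", loop_dev, mount_point], check=True)
    return loop_dev
```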
Each worker is sandboxed within a multi-layered security model: strict resource limits, a zero-trust network policy, and runtime privilege reduction. Inside the sandbox, an internal Jupyter Kernel preserves the complete execution context (variables, imports, functions) across multiple API calls, providing session continuity without compromising security or performance.
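To illustrate why a resident kernel gives true statefulness, here is a minimal sketch using the `jupyter_client` library directly (an illustrative assumption; the worker's internal wiring may differ). One long-lived kernel executes every snippet, so names defined in one call remain visible in the next:

```python
from jupyter_client import KernelManager

# Start one long-lived kernel; it holds all session state in memory.
km = KernelManager()
km.start_kernel()
kc = km.client()
kc.start_channels()
kc.wait_for_ready()

# Two separate executions share the same namespace, just like two API calls.
kc.execute_interactive("a = 100")
kc.execute_interactive("print(f'The value of a is {a}')")  # prints 100

kc.stop_channels()
km.shutdown_kernel()
```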
| Feature | Our Approach | Standard Implementations (Common Trade-offs) |
|---|---|---|
| 🚀 Performance | A pre-warmed pool of idle workers ensures instant session allocation. The fully asynchronous design (FastAPI, httpx) delivers high throughput (~32.8 RPS) and low latency. | High latency due to on-demand container/environment startup for each new session. Often synchronous, leading to poor concurrency under load. |
| 🔒 Security | A multi-layered, zero-trust security model: no internet access (`internal: true`), inter-worker firewalls (`iptables`), gateway inaccessibility, and privilege drop (`root` to `sandbox`). | Basic containerization often allows outbound internet access, lacks inter-worker firewalls (risk of lateral movement), and may run code with excessive privileges. |
| 🔄 Statefulness | True session persistence. Each user is mapped to a dedicated worker with a persistent Jupyter Kernel, maintaining the full execution context across all API calls. | Stateless (each call is a new environment) or emulated statefulness (e.g., saving/loading state via serialization), which is often slow and incomplete. |
| 🛠️ Reliability | A "Cattle, not Pets" fault-tolerance model. The gateway enforces hard timeouts and monitors worker health. Any failed, hung, or crashed worker is instantly destroyed and replaced (see the sketch after this table). | Workers are often treated as stateful "pets" that require complex recovery logic, increasing the risk of contaminated or inconsistent states persisting. |
| 💡 I/O Isolation | Virtual-Disk-per-Worker architecture. Each worker gets its own dynamically mounted block device, providing true filesystem and I/O isolation from the host and other workers. | Often relies on shared host volumes (risk of cross-talk and security breaches) or has no persistent, isolated storage at all. |
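A rough sketch of the "Cattle, not Pets" idea follows; the `pool` object and its `destroy`/`spawn_replacement` methods are hypothetical placeholders, not this project's API. The point is that an unhealthy worker is never repaired, only replaced:

```python
import httpx

async def reap_unhealthy(workers: dict[str, str], pool) -> None:
    """Destroy and replace any worker that fails its health probe.

    `pool` and its methods are hypothetical placeholders.
    """
    async with httpx.AsyncClient(timeout=2.0) as client:
        for worker_id, url in list(workers.items()):
            try:
                # A slow, hung, or crashed worker fails this probe.
                (await client.get(f"{url}/health")).raise_for_status()
            except httpx.HTTPError:
                await pool.destroy(worker_id)    # cattle: destroy, don't repair
                await pool.spawn_replacement()   # keep the idle pool topped up
```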
The system was stress-tested on a mid-range desktop to validate its performance and scalability under realistic conditions.
- Test Scenario: 25 concurrent users, each sending 100 stateful requests (with result verification at each step).
- Total Requests: 2,500
- Throughput (RPS): ~32.8 req/s
- Request Success Rate: 100%
- State Verification Success Rate: 100%
- P95 Latency: 496.50 ms
- Test Parameters: The benchmark was conducted with the following runtime configuration (a reproduction sketch follows this list):
  - `MinIdleWorkers`: 5
  - `MaxTotalWorkers`: 30
  - `WorkerCPU`: 1.0 core
  - `WorkerRAM_MB`: 1024 MB
  - `WorkerDisk_MB`: 500 MB
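A comparable run can be approximated with a short `httpx` script against the documented `/execute` endpoint. This is a sketch, not the original benchmark harness; the counter-based state verification below is illustrative:

```python
import asyncio
import time
import uuid

import httpx

GATEWAY_URL = "http://127.0.0.1:3874"
TOKEN = "<your-token>"  # see the token-retrieval step below

async def user_session(client: httpx.AsyncClient, n_requests: int) -> None:
    uid = str(uuid.uuid4())
    for i in range(n_requests):
        # Each request increments a counter living in the worker's kernel,
        # so a correct answer proves state survived the previous call.
        code = "counter = globals().get('counter', 0) + 1\nprint(counter)"
        r = await client.post("/execute", json={"user_uuid": uid, "code": code}, timeout=60.0)
        r.raise_for_status()
        assert r.json()["result_text"].strip() == str(i + 1), "state verification failed"

async def main(users: int = 25, requests_per_user: int = 100) -> None:
    headers = {"X-Auth-Token": TOKEN}
    async with httpx.AsyncClient(base_url=GATEWAY_URL, headers=headers) as client:
        start = time.perf_counter()
        await asyncio.gather(*(user_session(client, requests_per_user) for _ in range(users)))
        elapsed = time.perf_counter() - start
        total = users * requests_per_user
        print(f"{total} requests in {elapsed:.1f}s -> {total / elapsed:.1f} RPS")

asyncio.run(main())
```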
- API Gateway: The single, authenticated entry point. Its `WorkerManager` handles the entire lifecycle of worker instances, including the dynamic creation of their virtual disks, and acts as the trusted control plane.
- Worker Instance: An untrusted, disposable code execution unit. It runs a `Supervisor` that manages two processes (a FastAPI service and a Jupyter Kernel) as a non-root `sandbox` user. At startup, a script configures an `iptables` firewall to accept traffic only from the Gateway before mounting the worker's dedicated virtual disk and dropping root privileges (see the sketch below).
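As an illustration of that startup step, here is a minimal sketch of such a lockdown. The gateway address and exact rule set are assumptions, not the project's real script:

```python
import subprocess

def lock_down_network(gateway_ip: str) -> None:
    """Accept inbound traffic only from the gateway; drop everything else.

    Hypothetical sketch: must run as root, before privileges are dropped.
    """
    rules = [
        # Keep loopback and already-established connections working.
        ["iptables", "-A", "INPUT", "-i", "lo", "-j", "ACCEPT"],
        ["iptables", "-A", "INPUT", "-m", "conntrack",
         "--ctstate", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
        # New inbound connections are allowed only from the gateway.
        ["iptables", "-A", "INPUT", "-s", gateway_ip, "-j", "ACCEPT"],
        # Everything else, including sibling workers, is dropped.
        ["iptables", "-P", "INPUT", "DROP"],
    ]
    for rule in rules:
        subprocess.run(rule, check=True)

lock_down_network("172.28.0.2")  # assumed gateway IP on the internal network
```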
- Docker and Docker Compose installed and running.
- An HTTP client (e.g., cURL, Postman, or Python's `httpx`).
Convenience scripts are provided to start the environment. You can customize the resource allocation and pool size via command-line arguments.
- Linux / macOS: `sh start.sh [options]`
- Windows (PowerShell): `.\start.ps1 [options]`

The gateway will listen on `http://127.0.0.1:3874`.
You can pass the following parameters to the startup scripts to configure the system's behavior.
| Parameter | Shell (`.sh`) | PowerShell (`.ps1`) | Default | Description |
|---|---|---|---|---|
| Min Idle Workers | `--min-idle-workers` | `-MinIdleWorkers` | 5 | The minimum number of idle, pre-warmed workers to keep ready in the pool. |
| Max Total Workers | `--max-total-workers` | `-MaxTotalWorkers` | 50 | The absolute maximum number of concurrent worker containers the system is allowed to create. |
| Worker CPU Limit | `--worker-cpu` | `-WorkerCPU` | 1.0 | The number of CPU cores to allocate to each worker container (e.g., 1.5 for one and a half cores). |
| Worker RAM Limit | `--worker-ram-mb` | `-WorkerRAM_MB` | 1024 | The amount of RAM, in megabytes, to allocate to each worker container. |
| Worker Disk Size | `--worker-disk-mb` | `-WorkerDisk_MB` | 500 | The size, in megabytes, of the virtual disk created for each worker's sandboxed filesystem. |
Example (Linux/macOS):

```sh
# Start with a larger pool and more powerful workers
sh start.sh --min-idle-workers 10 --worker-cpu 2.0 --worker-ram-mb 2048
```

Example (Windows PowerShell):

```powershell
# Start with a lightweight configuration for a low-resource environment
.\start.ps1 -MinIdleWorkers 2 -MaxTotalWorkers 10 -WorkerCPU 0.5 -WorkerRAM_MB 512
```
Retrieve the auto-generated token from the running gateway container:

```sh
docker exec code-interpreter_gateway cat /gateway/auth_token.txt
```
For a quick UI test, open the included `test.html` file in your browser, paste the token, and click "New Session".
- Linux / macOS: `sh stop.sh`
- Windows (PowerShell): `.\stop.ps1`
All requests require the `X-Auth-Token: <your-token>` header.
Executes Python code within a user's stateful session.
- Request Body: `{ "user_uuid": "string", "code": "string" }`
- Success Response (200 OK): `{ "result_text": "string | null", "result_base64": "string | null" }`
- Timeout/Crash Response (503/504): Indicates a fatal error. The environment has been destroyed and recycled.
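Because a 503/504 means all session state is gone, a client can treat it as a signal to rebuild state before retrying. Below is a sketch of one possible recovery pattern; the re-run-setup approach and `SETUP_CODE` are illustrative assumptions, not something prescribed by the API:

```python
import httpx

SETUP_CODE = "import pandas as pd"  # hypothetical per-session setup

async def execute_with_recovery(client: httpx.AsyncClient, user_uuid: str, code: str) -> dict:
    """Rebuild session state after a 503/504, then retry the request once."""
    payload = {"user_uuid": user_uuid, "code": code}
    resp = await client.post("/execute", json=payload, timeout=30.0)
    if resp.status_code in (503, 504):
        # The worker was destroyed; any state the code depends on
        # must be recreated in the fresh environment first.
        await client.post("/execute", json={"user_uuid": user_uuid, "code": SETUP_CODE}, timeout=30.0)
        resp = await client.post("/execute", json=payload, timeout=30.0)
    resp.raise_for_status()
    return resp.json()
```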
Proactively terminates a user's session and destroys its worker.
- Request Body: `{ "user_uuid": "string" }`
- Success Response (200 OK): `{ "status": "ok", "detail": "..." }`
Returns a summary of the worker pool's status for monitoring.
```python
import asyncio
import base64
import subprocess
import uuid

import httpx

GATEWAY_URL = "http://127.0.0.1:3874"
USER_ID = str(uuid.uuid4())

def get_auth_token():
    """Read the auto-generated token from the running gateway container."""
    try:
        return subprocess.check_output(
            ["docker", "exec", "code-interpreter_gateway", "cat", "/gateway/auth_token.txt"],
            text=True,
        ).strip()
    except Exception:
        print("❌ Could not fetch Auth Token. Is the service running?")
        return None

async def execute_code(client: httpx.AsyncClient, code: str):
    print(f"\n--- Executing ---\n{code.strip()}")
    payload = {"user_uuid": USER_ID, "code": code}
    try:
        response = await client.post(f"{GATEWAY_URL}/execute", json=payload, timeout=30.0)
        response.raise_for_status()
        data = response.json()
        if data.get("result_text"):
            print(">>> Text Result:\n" + data["result_text"])
        if data.get("result_base64"):
            print(">>> Image generated! (output.png saved)")
            with open("output.png", "wb") as f:
                f.write(base64.b64decode(data["result_base64"]))
    except httpx.HTTPStatusError as e:
        print(f"Execution failed: {e.response.status_code} - {e.response.text}")

async def main():
    token = get_auth_token()
    if not token:
        return
    headers = {"X-Auth-Token": token}
    async with httpx.AsyncClient(headers=headers) as client:
        # Step 1: Define a variable
        await execute_code(client, "a = 100")
        # Step 2: Use the variable from the previous step (state is maintained)
        await execute_code(client, "print(f'The value of a is {a}')")

if __name__ == "__main__":
    asyncio.run(main())
```
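If the service is running, the second call prints `The value of a is 100`, confirming that the variable defined in the first call survived in the same worker session.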