-
My thoughts are that we'll probably want something closer to per-session sandboxes. We're building towards much more autonomous systems where sessions are spun up by events like tickets/issues, create PRs, and are then retriggered by comments on the PR or by pipeline failures. In that world we likely just want a fresh working environment each time. Another consideration is that creating bespoke per-user sandboxes starts to overlap with the functionality of agentic coding tools. That being said, using CAIPE to build an agentic factory, where a team of agents shares a single sandbox and works on things concurrently, is also a compelling story. With that in mind, I think there's a progression to follow here:
I think starting small, but building with items 2 and 3 in mind, will let us ship simpler sandboxes faster and then drive the development of 2 and 3 from real-world experience, feedback, and scenarios once 1 has shipped.
-
I think the agent should be granted sandbox tools in the agent builder, with the system prompt determining the behavior. This is simple and allows self-service creation of different functionality for different use cases. We might need to build more generic "ask before executing" tool functionality to make this more robust, though. This way we would treat and administer sandboxes just like any other tool.
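A generic "ask before executing" capability could be a thin wrapper around any tool, so sandbox tools stay ordinary tools in the agent builder. This is a minimal sketch assuming tools are plain Python callables invoked with keyword arguments; `require_approval`, `ApprovalDenied`, `run_command`, and the `approve` callback (standing in for a confirmation prompt in the chat UI) are hypothetical names, not existing CAIPE APIs.

```python
import functools
from typing import Any, Callable


class ApprovalDenied(Exception):
    """Raised when a gated tool call is declined (hypothetical)."""


def require_approval(approve: Callable[[str, dict], bool]) -> Callable:
    """Gate a tool behind `approve(tool_name, kwargs)`.

    `approve` is a hypothetical hook; a real one might show a confirmation
    prompt in the chat UI and block until the user answers.
    """
    def decorator(tool: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(tool)
        def gated(**kwargs: Any) -> Any:
            if not approve(tool.__name__, kwargs):
                raise ApprovalDenied(f"{tool.__name__} was not approved: {kwargs}")
            return tool(**kwargs)
        return gated
    return decorator


def cautious_approver(tool_name: str, args: dict) -> bool:
    # Stand-in for asking the user: auto-decline any command containing "rm ".
    return "rm " not in args.get("command", "")


@require_approval(cautious_approver)
def run_command(command: str) -> str:
    # Placeholder for executing `command` inside the attached sandbox.
    return f"ran: {command}"
```

Because approval lives in the wrapper rather than in each tool, the same gate could be applied to any tool an agent is granted, which keeps sandbox tools administered like everything else.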
-
I think the agent's permissions in a sandbox should flow from the agent. For example, if the agent has access to some repos via MCP, it should only have access to those repos in the sandbox. This probably ties into some of Splunk's RBAC/authentication asks as well, with "per-agent credentials". Some agents will need different permissions from others, so there's probably an "agent credentials" resource type that needs to be designed and built into the platform so that users can self-serve it in some way (lots TBD there). This does get difficult if you have multiple agents in a single sandbox, but I think that scoping will be really important to keep larger teams of agents reliable. For example, a reviewer agent that can't push code or write files can't decide to just make the changes itself instead.
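One way to keep scoping reliable when several agents share a sandbox is to authorize each action against the calling agent's own credentials rather than against a single sandbox identity. This is a minimal sketch; `AgentCredentials`, `authorize`, `PermissionDenied`, and the action names are hypothetical, since the "agent credentials" resource type is still to be designed.

```python
from dataclasses import dataclass


class PermissionDenied(Exception):
    pass


@dataclass(frozen=True)
class AgentCredentials:
    """Hypothetical per-agent credential resource."""
    agent_id: str
    repos: frozenset[str]    # repos the agent can reach (e.g. mirrored from its MCP access)
    actions: frozenset[str]  # e.g. {"read"} or {"read", "write", "push"}


def authorize(creds: AgentCredentials, action: str, repo: str) -> None:
    """Raise unless the calling agent's own credentials cover `action` on `repo`."""
    if repo not in creds.repos:
        raise PermissionDenied(f"{creds.agent_id} has no access to {repo}")
    if action not in creds.actions:
        raise PermissionDenied(f"{creds.agent_id} is not allowed to {action} {repo}")


# Two agents sharing one sandbox still keep their own scopes.
reviewer = AgentCredentials("reviewer", frozenset({"org/app"}), frozenset({"read"}))
coder = AgentCredentials("coder", frozenset({"org/app"}), frozenset({"read", "write", "push"}))
```

Here the reviewer can read `org/app`, but a push attempt is rejected even though the coder in the same sandbox is allowed to push.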
-
Isolated execution environments for agents need to balance security with performance. Some patterns from running 200+ agents in production:
Warm pool of pre-initialized sandboxes: Instead of creating a sandbox on demand (slow cold start), maintain a pool of pre-warmed containers with the common dependencies already loaded. When an agent needs a sandbox, it claims one from the pool. After use, the sandbox is reset and returned to the pool. This brought our sandbox acquisition time from ~30s to ~2s.
Tiered isolation by trust level: Not every agent needs the same isolation strength. Platform-verified agents (Ed25519 identity confirmed by the platform) get lighter isolation (shared kernel, cgroup limits). Unsigned/untrusted agents get full VM-level isolation. This reduces overhead for the common case while maintaining security for the risky case.
Network policy as capability enforcement: Instead of just blocking all network access, define per-agent network policies based on their declared capabilities. An agent with the "web search" capability gets outbound HTTP access. An agent with the "code execution" capability gets no network access at all. Tie these policies to the delegation chain: a child agent can't get more network access than its parent.
Cost attribution through the sandbox: The sandbox itself tracks resource usage (CPU-seconds, memory-hours, network bytes) and attributes the cost to the agent's budget. This creates natural economic pressure for agents to be efficient with compute.
Related write-up on multi-tenant isolation: https://blog.kinthai.ai/openclaw-multi-tenancy-why-vm-per-user-doesnt-scale
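The warm-pool pattern boils down to a small data structure: claim a ready sandbox if one exists, fall back to a cold start otherwise, and reset sandboxes before returning them. This is a single-threaded sketch (a pool shared across workers would need locking); `WarmPool`, `create`, and `reset` are hypothetical, with `create`/`reset` standing in for hooks into whatever backend provisions sandboxes, and plain dicts standing in for real sandboxes.

```python
import itertools
from collections import deque
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class WarmPool(Generic[T]):
    """Pool of pre-initialized sandboxes (single-threaded sketch)."""

    def __init__(self, create: Callable[[], T], reset: Callable[[T], None], size: int) -> None:
        self._create = create  # slow path: provision a sandbox with common deps preloaded
        self._reset = reset    # wipe per-use state so the sandbox can be reused
        self._size = size
        self._ready: deque = deque(create() for _ in range(size))
        self.cold_starts = 0

    def acquire(self) -> T:
        """Claim a pre-warmed sandbox, or cold-start one if the pool is empty."""
        if self._ready:
            return self._ready.popleft()
        self.cold_starts += 1
        return self._create()

    def release(self, sandbox: T) -> None:
        """Reset a used sandbox and return it to the pool if there is room."""
        self._reset(sandbox)
        if len(self._ready) < self._size:
            self._ready.append(sandbox)
        # Otherwise the sandbox is dropped; a real pool would tear it down here.


# Usage with dicts standing in for sandboxes.
ids = itertools.count()
pool = WarmPool(create=lambda: {"id": next(ids), "files": []},
                reset=lambda sb: sb["files"].clear(),
                size=2)
sb = pool.acquire()
sb["files"].append("scratch.py")
pool.release(sb)
```

A background job would normally top the pool back up after bursts, so that `cold_starts` stays near zero in steady state.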
-
Authors: @artdroz, @subbaksh
Summary
We're exploring the idea of giving agents access to isolated environments where they can execute code, run CLI commands, and interact with file systems.
The key design principle: the lifecycle of a sandbox is separate from the agent runtime lifecycle. Sandboxes are long-lived, user-managed resources that can be attached to and detached from agent conversations independently.
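To make the lifecycle separation concrete: sandboxes can live in their own registry, with a conversation only holding a reference to one. This sketch assumes a chat has at most one sandbox attached at a time; `Sandbox`, `SandboxRegistry`, and its methods are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sandbox:
    sandbox_id: str
    owner: str


class SandboxRegistry:
    """Sandboxes are owned by users; conversations only reference them."""

    def __init__(self) -> None:
        self._sandboxes: dict[str, Sandbox] = {}
        self._attachments: dict[str, str] = {}  # conversation id -> sandbox id

    def create(self, sandbox_id: str, owner: str) -> Sandbox:
        sandbox = Sandbox(sandbox_id, owner)
        self._sandboxes[sandbox_id] = sandbox
        return sandbox

    def attach(self, conversation_id: str, sandbox_id: str) -> None:
        if sandbox_id not in self._sandboxes:
            raise KeyError(f"unknown sandbox {sandbox_id}")
        self._attachments[conversation_id] = sandbox_id

    def detach(self, conversation_id: str) -> None:
        # Dropping the reference (or ending the chat) never destroys the sandbox.
        self._attachments.pop(conversation_id, None)

    def sandbox_for(self, conversation_id: str) -> Optional[Sandbox]:
        sandbox_id = self._attachments.get(conversation_id)
        return self._sandboxes.get(sandbox_id) if sandbox_id is not None else None

    def delete(self, sandbox_id: str) -> None:
        """Explicit user action: remove the sandbox and any attachments to it."""
        self._sandboxes.pop(sandbox_id)
        self._attachments = {c: s for c, s in self._attachments.items() if s != sandbox_id}
```

Only `delete` removes a sandbox; attaching and detaching change nothing but the conversation's reference.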
Motivation
Today, agents operate without persistent, isolated execution environments. When an agent needs to run code or interact with a file system, it relies on the runtime's own environment, which is ephemeral and shared. This creates several limitations:
Sandboxes solve this by providing managed, isolated environments that agents can use on demand.
Architecture
Sandbox Service (Decoupled)
We envision a separate sandbox service that handles provisioning, connectivity, and lifecycle management. The service sits alongside the agent runtime. This separation is deliberate: the sandbox service could be replaced by the management plane of an upstream sandbox provider without any changes to the agent or UI layer.
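The replaceable backend could be expressed as a narrow provider interface that the sandbox service codes against, so that switching between agent-sandbox, OpenShell, or an upstream management plane means writing a new adapter rather than touching the agent or UI layer. `SandboxProvider`, its method names, `InMemoryProvider`, and `SandboxService` are hypothetical; they do not reflect the actual API of either technology.

```python
import itertools
from typing import Protocol


class SandboxProvider(Protocol):
    """Hypothetical contract every sandbox backend adapter would implement."""

    def create(self, owner: str) -> str: ...
    def exec(self, sandbox_id: str, command: list[str]) -> str: ...
    def delete(self, sandbox_id: str) -> None: ...


class InMemoryProvider:
    """Toy backend for tests; a real adapter would call the chosen backend's APIs."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._owners: dict[str, str] = {}

    def create(self, owner: str) -> str:
        sandbox_id = f"sbx-{next(self._ids)}"
        self._owners[sandbox_id] = owner
        return sandbox_id

    def exec(self, sandbox_id: str, command: list[str]) -> str:
        if sandbox_id not in self._owners:
            raise KeyError(sandbox_id)
        return f"[{sandbox_id}] $ {' '.join(command)}"  # pretend to run the command

    def delete(self, sandbox_id: str) -> None:
        self._owners.pop(sandbox_id, None)


class SandboxService:
    """Agent- and UI-facing layer; depends only on the SandboxProvider protocol."""

    def __init__(self, provider: SandboxProvider) -> None:
        self._provider = provider

    def create(self, owner: str) -> str:
        return self._provider.create(owner)

    def run(self, sandbox_id: str, command: list[str]) -> str:
        return self._provider.exec(sandbox_id, command)
```

Keeping the protocol this small is a design choice: each extra method is something every future backend adapter has to support.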
Technologies Under Exploration
Both candidates, OpenShell and agent-sandbox, provide isolated execution environments with network and filesystem isolation. We're evaluating which fits best as the backend for our sandbox service.
User Experience
Creating Sandboxes
Users create and manage sandboxes directly from the UI, separate from any agent. Sandboxes have their own:
Attaching Sandboxes to Agents
When configuring a Custom Agent, creators choose a sandbox mode:
Mockups:
In-Chat Experience
When an agent is configured with "User Chooses" mode, the user sees a sandbox picker on the new chat welcome screen. They can:
Once attached, the sandbox appears in the agent's context panel with an option to detach it.
Mockups:
Hibernation
Initial research into agent-sandbox suggests the pods do not have to run all the time: the sandbox's storage can be persisted to disk via PVCs (e.g., backed by EBS) and the pod spun down. The pod is started and the sandbox resumed only when a user interacts with it.
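Hibernation amounts to a two-state machine per sandbox: stop the pod after an idle period while keeping the PVC, and start it again on the next interaction. In this sketch `HibernatingSandbox` is a hypothetical name, `start_pod` and `stop_pod` are hypothetical hooks (for example, scaling the sandbox's pod to one or zero replicas), and the 15-minute default idle timeout is an arbitrary assumption rather than a researched value. Timestamps are passed in explicitly so the logic is easy to test.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class State(Enum):
    RUNNING = "running"
    HIBERNATED = "hibernated"


@dataclass
class HibernatingSandbox:
    """Sandbox whose pod is stopped when idle while its PVC-backed storage is kept.

    `start_pod` / `stop_pod` are hypothetical backend hooks; the volume is never deleted here.
    """
    start_pod: Callable[[], None]
    stop_pod: Callable[[], None]
    idle_timeout_s: float = 900.0  # assumed default, not from the proposal
    state: State = State.RUNNING
    last_active: float = 0.0

    def on_user_activity(self, now: float) -> None:
        """Resume the pod if it is hibernated, then record the interaction time."""
        if self.state is State.HIBERNATED:
            self.start_pod()
            self.state = State.RUNNING
        self.last_active = now

    def maybe_hibernate(self, now: float) -> bool:
        """Called periodically; stops the pod once it has been idle for `idle_timeout_s`."""
        if self.state is State.RUNNING and now - self.last_active >= self.idle_timeout_s:
            self.stop_pod()
            self.state = State.HIBERNATED
            return True
        return False
```

A periodic reconciler would call `maybe_hibernate` for each running sandbox, while any chat or terminal interaction calls `on_user_activity` before forwarding the request.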
Policy Layer and RBAC
A configurable policy layer would govern sandbox behavior:
Sandboxes would also carry RBAC/visibility policies, similar to skills and agents today.
Open Questions
Sandbox strategy: Should sandboxes be per user, per session (expensive), per agent (insecure), or do we provide that choice to the agent creator like in the mockups?
Cleanup strategy for "Fresh Per Chat": What's the right default retention period? Should it be tied to the chat's archive status?
Slack integration: How would this work with the Slack or Webex integrations? Should the agent ask to connect to a sandbox, or send the user a link to CAIPE to attach one?
Backend selection: OpenShell vs. agent-sandbox vs. a pluggable provider interface. What's the right abstraction?