Sandbox Integration: Isolated Execution Environments for Agents #1091

subbaksh · 2026-04-01T17:40:11Z

subbaksh
Apr 1, 2026
Maintainer

Summary

We're exploring the idea of giving agents access to isolated environments where they can execute code, run CLI commands, and interact with file systems.

The key design principle: the lifecycle of a sandbox is separate from the agent runtime lifecycle. Sandboxes are long-lived, user-managed resources that can be attached to and detached from agent conversations independently.

Motivation

Today, agents operate without persistent, isolated execution environments. When an agent needs to run code or interact with a file system, it relies on the runtime's own environment, which is ephemeral and shared. This creates limitations:

No safe way for agents to execute arbitrary code or CLI commands
No way to run file-heavy operations without loading the files into memory in LangChain
No way for two agents to share state

Sandboxes solve this by providing managed, isolated environments that agents can use on demand.

Architecture

Sandbox Service (Decoupled)

We envision a separate sandbox service that handles provisioning, connectivity, and lifecycle management. The service sits alongside the agent runtime. This separation is deliberate: the sandbox service could be replaced by the management plane of an upstream sandbox provider without any changes to the agent or UI layer.
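One way to picture this decoupling is a thin provider interface that the sandbox service calls into. The sketch below is only an illustration under stated assumptions: `SandboxProvider`, `SandboxHandle`, `InMemoryProvider`, and `SandboxService` are hypothetical names, and none of the methods correspond to a real OpenShell or agent-sandbox API.

```python
from dataclasses import dataclass
from typing import List, Protocol, Set


@dataclass(frozen=True)
class SandboxHandle:
    sandbox_id: str
    provider: str


class SandboxProvider(Protocol):
    """Hypothetical backend contract; not an API of any real provider."""

    def provision(self, name: str) -> SandboxHandle: ...
    def exec(self, handle: SandboxHandle, command: List[str]) -> str: ...
    def destroy(self, handle: SandboxHandle) -> None: ...


class InMemoryProvider:
    """Stand-in backend used only to exercise the interface."""

    def __init__(self) -> None:
        self.live: Set[str] = set()

    def provision(self, name: str) -> SandboxHandle:
        self.live.add(name)
        return SandboxHandle(sandbox_id=name, provider="in-memory")

    def exec(self, handle: SandboxHandle, command: List[str]) -> str:
        if handle.sandbox_id not in self.live:
            raise KeyError(f"unknown sandbox: {handle.sandbox_id}")
        # A real backend would run the command inside the isolated environment.
        return "ran: " + " ".join(command)

    def destroy(self, handle: SandboxHandle) -> None:
        self.live.discard(handle.sandbox_id)


class SandboxService:
    """Agent and UI code talk to this class, never to a backend directly."""

    def __init__(self, provider: SandboxProvider) -> None:
        self._provider = provider

    def create(self, name: str) -> SandboxHandle:
        return self._provider.provision(name)

    def exec(self, handle: SandboxHandle, command: List[str]) -> str:
        return self._provider.exec(handle, command)

    def delete(self, handle: SandboxHandle) -> None:
        self._provider.destroy(handle)
```

Swapping the backend then means passing a different provider object to `SandboxService`, with no change to callers.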

Technologies Under Exploration

NVIDIA OpenShell — Provides containerized shell environments with agent-friendly APIs
Kubernetes Agent Sandbox (SIG) — Kubernetes-native sandbox provisioning for AI agents

Both provide isolated execution environments with network and filesystem isolation. We're evaluating which fits best as the backend for our sandbox service.

User Experience

Creating Sandboxes

Users create and manage sandboxes directly from the UI, separate from any agent. Sandboxes have their own:

Name and description
Visibility (private, team, or global)
Status (active or hibernating)
Type (e.g., openshell — extensible to other providers)
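As a rough sketch, the attributes above could map onto a record like the following. The `owner` and `team` fields and the `can_view` helper are assumptions added for illustration; the proposal only names the four attributes listed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class Visibility(Enum):
    PRIVATE = "private"
    TEAM = "team"
    GLOBAL = "global"


class Status(Enum):
    ACTIVE = "active"
    HIBERNATING = "hibernating"


@dataclass
class Sandbox:
    name: str
    description: str
    owner: str  # assumed field, not named in the proposal
    visibility: Visibility = Visibility.PRIVATE
    status: Status = Status.ACTIVE
    sandbox_type: str = "openshell"  # free-form so other providers can be added
    team: Optional[str] = None  # assumed field, used only for TEAM visibility


def can_view(sandbox: Sandbox, user: str, user_teams: Set[str]) -> bool:
    """Hypothetical visibility check: owner always sees their own sandbox."""
    if sandbox.visibility is Visibility.GLOBAL or sandbox.owner == user:
        return True
    if sandbox.visibility is Visibility.TEAM:
        return sandbox.team is not None and sandbox.team in user_teams
    return False
```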

Attaching Sandboxes to Agents

When configuring a Custom Agent, creators choose a sandbox mode:

Mode	Behavior
No Sandbox	Agent runs without an isolated environment. Best for simple agents that don't need file system or shell access.
Shared Sandbox	A single sandbox is used for all users and chats. Ideal for read-only environments or shared team workspaces.
User Chooses	Users pick which sandbox to use (or skip) when starting a new chat. Flexible — users can select an existing sandbox or create a new one.
Fresh Per Chat	A new sandbox is automatically created for each conversation. Maximum isolation — each chat gets a clean environment.
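The four modes in the table can be read as a small dispatch at chat start. This is a minimal sketch, assuming a hypothetical `resolve_sandbox` function and a caller-supplied `create_sandbox` callback; the identifiers and string values are illustrative only.

```python
from enum import Enum
from typing import Callable, Optional


class SandboxMode(Enum):
    NO_SANDBOX = "no_sandbox"
    SHARED = "shared_sandbox"
    USER_CHOOSES = "user_chooses"
    FRESH_PER_CHAT = "fresh_per_chat"


def resolve_sandbox(
    mode: SandboxMode,
    *,
    create_sandbox: Callable[[], str],
    shared_id: Optional[str] = None,
    user_choice: Optional[str] = None,
) -> Optional[str]:
    """Return the sandbox id to attach to a new chat, or None for no sandbox."""
    if mode is SandboxMode.NO_SANDBOX:
        return None
    if mode is SandboxMode.SHARED:
        if shared_id is None:
            raise ValueError("shared mode requires a configured sandbox")
        return shared_id
    if mode is SandboxMode.USER_CHOOSES:
        return user_choice  # None means the user chose to skip
    if mode is SandboxMode.FRESH_PER_CHAT:
        return create_sandbox()  # one new sandbox per conversation
    raise ValueError(f"unknown mode: {mode}")
```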

Mockups:

In-Chat Experience

When an agent is configured with "User Chooses" mode, the user sees a sandbox picker on the new chat welcome screen. They can:

Select from their existing sandboxes
Proceed without a sandbox
Remember their choice for future chats with that agent

Once attached, the sandbox appears in the agent's context panel with an option to detach it.

Mockups:

Hibernation

From initial research into agent-sandbox, the pods do not have to run all the time: storage can be persisted to disk via PVCs (e.g. EBS) and the pod spun down. The pod is started and the sandbox resumed only when a user interacts with it.
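The resume-on-interaction behavior can be sketched as a two-state machine. Everything here is an assumption for illustration: the 15-minute idle timeout, the `touch` and `hibernate_if_idle` names, and the in-memory state. A real implementation would start or stop the pod and reattach the PVC at the marked points.

```python
from dataclasses import dataclass

IDLE_TIMEOUT_S = 15 * 60  # assumed value; the proposal does not specify one


@dataclass
class SandboxState:
    status: str = "hibernating"
    last_used: float = 0.0


def touch(sandbox: SandboxState, now: float) -> None:
    """Record a user interaction, resuming the sandbox if it was hibernating."""
    if sandbox.status == "hibernating":
        # Real code would start the pod and remount its persistent volume here.
        sandbox.status = "active"
    sandbox.last_used = now


def hibernate_if_idle(sandbox: SandboxState, now: float) -> bool:
    """Spin the sandbox down once it has been idle for the timeout."""
    if sandbox.status == "active" and now - sandbox.last_used >= IDLE_TIMEOUT_S:
        # Real code would stop the pod; storage stays on the volume.
        sandbox.status = "hibernating"
        return True
    return False
```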

Policy Layer and RBAC

A configurable policy layer would govern sandbox behavior:

Allowed/blocked commands or binaries
Network egress rules and usual pod security boundaries
CPU/Mem limits via K8s

RBAC/visibility policy would also be set on the sandbox, similar to skills and agents today.
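A minimal sketch of the command allow/block part of such a policy, assuming a hypothetical `SandboxPolicy` shape. It only inspects the first token of a command line, so it would not catch pipelines, subshells, or binaries invoked indirectly; network egress and resource limits would be enforced by the platform rather than in this check.

```python
import shlex
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class SandboxPolicy:
    # An empty allow list means "anything not blocked" in this sketch.
    allowed_binaries: FrozenSet[str] = frozenset()
    blocked_binaries: FrozenSet[str] = frozenset()
    cpu_limit: str = "1"       # illustrative values in Kubernetes quantity style
    memory_limit: str = "1Gi"


def is_command_allowed(policy: SandboxPolicy, command_line: str) -> bool:
    """Check the binary named by the first token against the policy."""
    try:
        argv = shlex.split(command_line)
    except ValueError:  # unbalanced quotes and similar parse errors
        return False
    if not argv:
        return False
    binary = argv[0].rsplit("/", 1)[-1]  # strip any leading path
    if binary in policy.blocked_binaries:
        return False
    return not policy.allowed_binaries or binary in policy.allowed_binaries
```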

Open Questions

Sandbox strategy: Should sandboxes be per user, per session (expensive), per agent (insecure), or do we provide that choice to the agent creator like in the mockups?
Cleanup strategy for "Fresh Per Chat": What's the right default retention period? Should it be tied to the chat archive status?
Slack integration: How would this work with Slack or Webex integration? Should the agent ask to connect to a sandbox, or send the user a link to CAIPE to attach?
Backend selection — OpenShell vs. agent-sandbox vs. pluggable provider interface — what's the right abstraction?

cisco-erilutz · 2026-04-01T20:37:09Z

cisco-erilutz
Apr 1, 2026
Maintainer

Sandbox strategy: Should sandboxes be per user, per session (expensive), per agent (insecure), or do we provide that choice to the agent creator like in the mockups?

My thoughts are that we'll probably want something closer to per session sandboxes. We're building towards much more autonomous systems where sessions are spun up by events like tickets/issues, create PRs, and then are retriggered by comments on the PR or pipeline failures. In that world we likely just want fresh working environments each time.

Another side of things is that creating per-user bespoke sandboxes also starts to create overlapping functionality with agentic coding tools.

That being said, using CAIPE to build an agentic factory where a team of agents share a single sandbox and work together on things concurrently is also a compelling story.

With that in mind, I think there's probably a progression to follow here:

1. Shorter-lived, more ephemeral per-session sandboxes that unlock features like autonomous PR creation for targeted use cases. I think we're at a point with agentic development where we can do 1-shot development like that, or development with some back and forth, autonomously and provide some value. But we're not yet at the point where we can reliably point agents at larger changes and have them run autonomously to the end without it being better to run the work locally with coding tools like Claude Code.
2. More persistent user sandboxes that users can attach agents to. This will allow us to do more things in CAIPE, especially leveraging a lot of the connections already built into CAIPE that users wouldn't have to set up locally. However, we still run into the overlap with tools like Claude Code, but with harder authentication problems, so we'd need to be clear about use cases.
3. Shared sandboxes that teams of agents can attach to and detach from to orchestrate larger feature work autonomously, enabling large features to be shipped agentically through CAIPE. Multiple agents with multiple sessions per agent may all share and collaborate in the same sandbox to develop, test, and ship a larger change.

I think starting small, but building with items 2/3 in mind, will allow us to ship simpler sandboxes faster and drive the development of 2/3 using real world experience/feedback/scenarios to draw from after 1 is shipped.

1 reply

subbaksh Apr 8, 2026
Maintainer Author

Completely agree on staged rollout.
I think once we have 1 working, 2 and 3 are trivial; it's mostly UI/config changes.
I think with 1. instead of "create a sandbox per session", should it be "let agent create sandboxes as needed"?

cisco-erilutz · 2026-04-01T20:40:08Z

cisco-erilutz
Apr 1, 2026
Maintainer

Slack integration: How would this work with Slack or Webex integration? Should the agent ask to connect to a sandbox, or send the user a link to CAIPE to attach?

I think the agent should be granted sandbox tools in the agent builder, and the system prompt can determine the behavior. This is simple and allows for self-service creation of different functionality for different use-cases.

We might need to build more generic "ask before executing" tool functionality to make this more robust though.

This way we would treat and administer sandboxes just like any other tool.

2 replies

subbaksh Apr 8, 2026
Maintainer Author

Yep, from the comment above:

I think with 1. instead of "create a sandbox per session", should it be "let agent create sandboxes as needed"?

The only concern for me here is when to tear down the sandbox. Relying on the agent to tear it down on its own is gonna be dangerous :)

Or maybe not, if we have a time-based expiry, like a max of 4h after which all ephemeral sandboxes are gone.
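The time-based expiry idea could look roughly like the sweep below. It is a sketch under assumptions: `SandboxRecord` and `expired` are hypothetical names, and the 4-hour cap is taken from the comment above as an example rather than a decided value.

```python
from dataclasses import dataclass
from typing import Iterable, List

MAX_EPHEMERAL_AGE_S = 4 * 60 * 60  # the "max 4h" example from the discussion


@dataclass
class SandboxRecord:
    sandbox_id: str
    created_at: float  # seconds since epoch
    ephemeral: bool    # True for agent-created or per-session sandboxes


def expired(records: Iterable[SandboxRecord], now: float) -> List[str]:
    """Return ids of ephemeral sandboxes that have reached the age cap."""
    return [
        r.sandbox_id
        for r in records
        if r.ephemeral and now - r.created_at >= MAX_EPHEMERAL_AGE_S
    ]
```

A periodic job could call `expired` and tear down each returned id, so cleanup never depends on the agent remembering to do it.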

subbaksh Apr 8, 2026
Maintainer Author

"ask before executing"

Yes, a permission/approval event. This is a generic CAIPE agent runtime issue. Currently we only do this for input forms, but we could add a middleware that checks what's being executed and which tool calls are made, and asks the user for approval for certain ones.
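Such a middleware might be sketched as a wrapper around tool functions. This is not the CAIPE runtime's API: `with_approval`, `ApprovalDenied`, and the `ask_user` callback are hypothetical names used to show the shape of the check.

```python
from typing import Any, Callable, Dict, Set


class ApprovalDenied(Exception):
    """Raised when the user declines a guarded tool call."""


def with_approval(
    tool_name: str,
    tool: Callable[..., Any],
    needs_approval: Set[str],
    ask_user: Callable[[str, Dict[str, Any]], bool],
) -> Callable[..., Any]:
    """Wrap a tool so that listed tools ask the user before executing."""

    def wrapped(**kwargs: Any) -> Any:
        if tool_name in needs_approval and not ask_user(tool_name, kwargs):
            raise ApprovalDenied(f"user declined call to {tool_name}")
        return tool(**kwargs)

    return wrapped
```

In a chat UI, `ask_user` would render an approval prompt; in Slack or Webex it could post a message with approve and deny actions.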

cisco-erilutz · 2026-04-01T20:44:35Z

cisco-erilutz
Apr 1, 2026
Maintainer

Agent permissions in sandboxes

I think the agent's permissions in a sandbox should flow from the agent... e.g. if the agent has access to some repos via MCP, it should only have access to those repos in the sandbox.

This probably plays into some of Splunk's RBAC/Authentication asks as well, with "per-agent credentials". Some agents will need different permissions from other agents, and there's probably some sort of "agent credentials" resource type that needs to be designed and built into the platform so that users can self service it in some way (lots TBD there)

This does get difficult if you have multiple agents in a single sandbox though... but I think that scoping will be really important to build out to keep larger teams of agents reliable. E.g. a reviewer agent that doesn't have access to push code or write files can't decide to just do the changes itself instead.

1 reply

subbaksh Apr 8, 2026
Maintainer Author

Agreed, this is probably related to @sriaradhyula's RBAC work.

Multiple agents in a sandbox is okay as long as it's just one user.
If multiple users share a sandbox, then we need to think about credentials further.

kinthaiofficial · 2026-04-28T23:59:08Z

kinthaiofficial
Apr 28, 2026

Isolated execution environments for agents need to balance security with performance. Some patterns from running 200+ agents in production:

Warm pool of pre-initialized sandboxes: Instead of creating a sandbox on-demand (slow cold start), maintain a pool of pre-warmed containers with the common dependencies already loaded. When an agent needs a sandbox, it claims one from the pool. After use, the sandbox gets reset and returned to the pool. This brought our sandbox acquisition time from ~30s to ~2s.
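The claim, reset, and return cycle described above could be sketched as follows. `WarmPool` and its callbacks are hypothetical; the sketch is single-threaded and falls back to a cold start when the pool is empty, which is one possible choice rather than the commenter's stated design.

```python
from collections import deque
from typing import Callable, Deque


class WarmPool:
    """Keeps pre-created sandboxes ready so acquisition skips the cold start."""

    def __init__(
        self,
        size: int,
        create: Callable[[], str],
        reset: Callable[[str], None],
    ) -> None:
        self._create = create
        self._reset = reset
        self._idle: Deque[str] = deque(create() for _ in range(size))

    def acquire(self) -> str:
        # Claim a warm sandbox if one is idle, otherwise pay for a cold start.
        return self._idle.popleft() if self._idle else self._create()

    def release(self, sandbox_id: str) -> None:
        # Wipe state before the sandbox becomes available to another agent.
        self._reset(sandbox_id)
        self._idle.append(sandbox_id)
```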

Tiered isolation by trust level: Not every agent needs the same isolation strength. Platform-verified agents (Ed25519 identity confirmed by the platform) get lighter isolation (shared kernel, cgroup limits). Unsigned/untrusted agents get full VM-level isolation. This reduces overhead for the common case while maintaining security for the risky case.

Network policy as capability enforcement: Instead of just blocking all network access, define per-agent network policies based on their declared capabilities. An agent with "web search" capability gets outbound HTTP access. An agent with "code execution" capability gets no network access at all. Tie these policies to the delegation chain — a child agent can't get more network access than its parent.
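The delegation rule (a child never gets more network access than its parent) amounts to a set intersection. The sketch below assumes a hypothetical capability-to-egress table and treats unknown capabilities as granting nothing; how an agent holding both capabilities should be handled is not stated in the comment, and this sketch simply takes the union before applying the parent's limit.

```python
from typing import Dict, FrozenSet, Iterable, Optional

# Hypothetical mapping from declared capability to permitted egress protocols.
CAPABILITY_EGRESS: Dict[str, FrozenSet[str]] = {
    "web_search": frozenset({"http", "https"}),
    "code_execution": frozenset(),  # no network access at all
}


def egress_for(
    capabilities: Iterable[str],
    parent_egress: Optional[FrozenSet[str]] = None,
) -> FrozenSet[str]:
    """Compute allowed egress, capped by the parent's allowance if any."""
    allowed: FrozenSet[str] = frozenset().union(
        *(CAPABILITY_EGRESS.get(c, frozenset()) for c in capabilities)
    )
    if parent_egress is not None:
        allowed = allowed & parent_egress  # child cannot exceed its parent
    return allowed
```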

Cost attribution through the sandbox: The sandbox itself tracks resource usage (CPU-seconds, memory-hours, network bytes) and attributes the cost to the agent's budget. This creates natural economic pressure for agents to be efficient with compute.
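A minimal sketch of charging sandbox usage against an agent budget. The rates are made-up placeholders chosen for easy arithmetic, and `Usage`, `AgentBudget`, and `BudgetExceeded` are hypothetical names; the comment above does not describe its actual accounting scheme.

```python
from dataclasses import dataclass

# Placeholder rates in arbitrary budget units; not real prices.
RATE_PER_CPU_SECOND = 0.5
RATE_PER_MEMORY_GB_HOUR = 0.25
RATE_PER_NETWORK_GIB = 0.125


@dataclass
class Usage:
    cpu_seconds: float = 0.0
    memory_gb_hours: float = 0.0
    network_bytes: int = 0


def cost(usage: Usage) -> float:
    """Convert measured resource usage into budget units."""
    return (
        usage.cpu_seconds * RATE_PER_CPU_SECOND
        + usage.memory_gb_hours * RATE_PER_MEMORY_GB_HOUR
        + usage.network_bytes / 2**30 * RATE_PER_NETWORK_GIB
    )


class BudgetExceeded(Exception):
    """Raised when a charge would take the budget below zero."""


@dataclass
class AgentBudget:
    remaining: float

    def charge(self, usage: Usage) -> float:
        amount = cost(usage)
        if amount > self.remaining:
            raise BudgetExceeded("sandbox usage exceeds the agent's budget")
        self.remaining -= amount
        return amount
```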

Related write-up on multi-tenant isolation: https://blog.kinthai.ai/openclaw-multi-tenancy-why-vm-per-user-doesnt-scale

0 replies
