Add E2BSandboxToolset integration for Haystack agents #448

Draft
tholor wants to merge 10 commits into main from claude/e2b-sandbox-integration-3cWWo

@tholor (Member) commented Mar 11, 2026

Introduces E2BSandboxToolset, a Haystack Toolset subclass that
connects to an E2B cloud sandbox and exposes four tools to any
Haystack Agent: run_bash_command, read_file, write_file, and
list_directory.

Key design points:

  • Sandbox connection is established lazily via warm_up(), which is
    called automatically by the Haystack pipeline/agent before the first
    tool invocation and is idempotent.
  • close() shuts down the sandbox and releases resources.
  • API key is managed via Haystack's Secret (defaults to the
    E2B_API_KEY environment variable).
  • Full to_dict / from_dict serialisation support; the live sandbox
    instance is not serialised and is re-created on warm_up().
  • e2b added as an optional test dependency in pyproject.toml.
  • 38 unit tests covering init, warm-up lifecycle, each tool operation,
    error handling, and round-trip serialisation.

https://claude.ai/code/session_01DwDqKPEtssXgxqEaArcXiN
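
The lifecycle in the design points above (lazy connection, idempotent `warm_up()`, `close()`, and serialisation that skips the live handle) can be sketched roughly as follows. This is an illustration only, not the code from this PR: `FakeSandbox` and `LazySandboxOwner` are hypothetical stand-ins, the E2B SDK and Haystack's `Secret` handling are deliberately left out, and the `"base"` / `120` values are placeholders rather than confirmed class defaults.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


class FakeSandbox:
    """Hypothetical stand-in for a remote sandbox handle; not the E2B SDK."""

    def __init__(self, template: str, timeout: int) -> None:
        self.template = template
        self.timeout = timeout
        self.closed = False

    def kill(self) -> None:
        self.closed = True


@dataclass
class LazySandboxOwner:
    """Illustrative owner: connects on warm_up() and never serialises the handle."""

    sandbox_template: str = "base"  # placeholder value, assumed for the sketch
    timeout: int = 120  # placeholder value, assumed for the sketch
    _sandbox: Optional[FakeSandbox] = field(default=None, init=False, repr=False)

    def warm_up(self) -> None:
        # Idempotent: a second call reuses the existing connection.
        if self._sandbox is None:
            self._sandbox = FakeSandbox(self.sandbox_template, self.timeout)

    def close(self) -> None:
        # Release the handle; a later warm_up() would create a fresh one.
        if self._sandbox is not None:
            self._sandbox.kill()
            self._sandbox = None

    def to_dict(self) -> Dict[str, Any]:
        # Only configuration is stored; the live handle is left out on purpose.
        return {"sandbox_template": self.sandbox_template, "timeout": self.timeout}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "LazySandboxOwner":
        # The restored object starts disconnected until warm_up() is called.
        return cls(**data)
```

With this shape, an object restored through `from_dict` carries the same configuration but holds no connection until `warm_up()` runs again.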

@tholor tholor requested a review from a team as a code owner March 11, 2026 07:46
@tholor tholor requested review from anakin87 and removed request for a team March 11, 2026 07:46
@tholor tholor requested a review from Copilot March 11, 2026 07:56
@tholor tholor removed the request for review from anakin87 March 11, 2026 08:01
@sjrl (Contributor) commented Mar 11, 2026

Hey @tholor thanks for working on this!

Quick high-level question/suggestion: I wonder if we should consider creating separate pre-made tools for the four tools you mention (run_bash_command, read_file, write_file, and list_directory). In theory this would allow the most customization (e.g. some users might want only a subset of the tools), instead of requiring all four tools to always be loaded into the Agent through the custom Toolset class.

Copilot AI left a comment

Pull request overview

Adds an E2B-backed Haystack Toolset to allow Haystack agents/pipelines to execute bash commands and perform basic filesystem operations inside an E2B cloud sandbox, with lazy lifecycle management and serialization support.

Changes:

  • Introduces E2BSandboxToolset with warm_up()/close() lifecycle, 4 tools, and to_dict()/from_dict() serialization.
  • Adds unit tests covering lifecycle, tool behavior, error wrapping, and serialization round-trips.
  • Adds e2b to the test environment’s optional dependencies.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `haystack_experimental/tools/e2b/sandbox_toolset.py` | Implements the new E2BSandboxToolset and its tools, lifecycle, and serialization. |
| `haystack_experimental/tools/e2b/__init__.py` | Exposes E2BSandboxToolset via lazy import structure. |
| `haystack_experimental/tools/__init__.py` | Introduces the tools package (license header). |
| `test/tools/e2b/test_sandbox_toolset.py` | Adds unit tests for initialization, lifecycle, tool calls, and serialization. |
| `test/tools/e2b/__init__.py` | Makes test.tools.e2b a package (license header). |
| `test/tools/__init__.py` | Makes test.tools a package (license header). |
| `pyproject.toml` | Adds e2b to test env extra dependencies. |

Comment on lines +43 to +48

    def test_default_parameters(self):
        toolset = _make_toolset()
        assert toolset.sandbox_template == "base"
        assert toolset.timeout == 120
        assert toolset.environment_vars == {}
        assert toolset._sandbox is None

Copilot AI commented Mar 11, 2026

test_default_parameters isn’t actually validating the toolset’s real defaults because _make_toolset() always overrides timeout (120) and sandbox_template ("base"). This can let regressions slip through if the class defaults change. Consider instantiating E2BSandboxToolset directly (only passing api_key) for the default-parameters test, or make _make_toolset() not override non-essential defaults.

@tholor (Member, Author) commented Mar 11, 2026

> Hey @tholor thanks for working on this!
>
> Quick high-level question/suggestion: I wonder if we should consider creating separate pre-made tools for the four tools you mention (run_bash_command, read_file, write_file, and list_directory). In theory this would allow the most customization (e.g. some users might want only a subset of the tools), instead of requiring all four tools to always be loaded into the Agent through the custom Toolset class.

good point, thanks! let's do that

@sjrl (Contributor) commented Mar 11, 2026

> > Hey @tholor thanks for working on this!
> >
> > Quick high-level question/suggestion: I wonder if we should consider creating separate pre-made tools for the four tools you mention (run_bash_command, read_file, write_file, and list_directory). In theory this would allow the most customization (e.g. some users might want only a subset of the tools), instead of requiring all four tools to always be loaded into the Agent through the custom Toolset class.
>
> good point, thanks! let's do that

As a starting point for the implementation, you could take a look at how we did this for the pre-made GitHub tools: https://github.com/deepset-ai/haystack-core-integrations/tree/main/integrations/github/src/haystack_integrations/tools/github

claude added 2 commits March 11, 2026 08:31
Address reviewer feedback (sjrl, tholor) to expose individual pre-made
Tool objects instead of a monolithic Toolset, so users can load any
subset of the four tools into their agent.

Changes:
- Replace E2BSandboxToolset (Toolset subclass) with E2BSandbox (plain
  dataclass) that manages the sandbox lifecycle (warm_up / close /
  to_dict / from_dict).
- Add four individual tool factory functions:
    create_run_bash_command_tool(sandbox)
    create_read_file_tool(sandbox)
    create_write_file_tool(sandbox)
    create_list_directory_tool(sandbox)
- Add create_e2b_tools() convenience factory that returns (sandbox, tools)
  so callers can pass any subset; all tools share the same E2BSandbox
  instance, preserving filesystem / process state across invocations.
- Update __init__.py to export the new public names.
- Rewrite tests to match the new API and fix the Copilot review comment:
  test_class_defaults now instantiates E2BSandbox with only api_key to
  validate the real class defaults rather than helper-overridden values.

https://claude.ai/code/session_01DwDqKPEtssXgxqEaArcXiN
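
The factory-function layout described in this commit message might look roughly like the sketch below. It is an assumption-laden illustration, not the committed code: `SketchSandbox` and `SketchTool` are hypothetical in-memory stand-ins (no E2B or Haystack imports), `create_sketch_tools()` stands in for the convenience factory, and the two per-tool factories reuse names from the commit message while their bodies are invented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SketchSandbox:
    """Hypothetical in-memory stand-in for the shared sandbox object."""

    files: Dict[str, str] = field(default_factory=dict)


@dataclass
class SketchTool:
    """Minimal stand-in for a tool: a name plus the callable it wraps."""

    name: str
    function: Callable[..., str]


def create_write_file_tool(sandbox: SketchSandbox) -> SketchTool:
    def write_file(path: str, content: str) -> str:
        # The closure captures the shared sandbox, so state outlives the call.
        sandbox.files[path] = content
        return f"wrote {len(content)} characters to {path}"

    return SketchTool(name="write_file", function=write_file)


def create_read_file_tool(sandbox: SketchSandbox) -> SketchTool:
    def read_file(path: str) -> str:
        return sandbox.files[path]

    return SketchTool(name="read_file", function=read_file)


def create_sketch_tools() -> Tuple[SketchSandbox, List[SketchTool]]:
    # One sandbox instance is shared by every tool returned here.
    sandbox = SketchSandbox()
    tools = [create_read_file_tool(sandbox), create_write_file_tool(sandbox)]
    return sandbox, tools


# Usage: build everything, then hand any subset of the tools to the caller.
sandbox, tools = create_sketch_tools()
by_name = {tool.name: tool for tool in tools}
by_name["write_file"].function(path="notes.txt", content="hello")
read_only_subset = [tool for tool in tools if tool.name == "read_file"]
```

Because both closures hold the same sandbox object, a file written through one tool is visible to the other, which is the state-sharing property the commit message calls out.
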
@tholor tholor marked this pull request as draft March 11, 2026 08:53
claude added 3 commits March 11, 2026 08:57
RunBashCommandTool, ReadFileTool, WriteFileTool, ListDirectoryTool now
subclass haystack.tools.Tool directly. Users instantiate them with a
shared E2BSandbox instance, mirroring how chat generators are passed to
Agent. The create_e2b_tools() convenience function is kept and updated
to use the new classes.
- e2b_sandbox.py: E2BSandbox
- bash_tool.py: RunBashCommandTool
- read_file_tool.py: ReadFileTool
- write_file_tool.py: WriteFileTool
- list_directory_tool.py: ListDirectoryTool
- sandbox_toolset.py: create_e2b_tools (convenience function only)
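
The class-based variant, where each tool subclasses a common base and receives the shared sandbox in its constructor, could be sketched like this. The names `WriteFileTool` and `ListDirectoryTool` come from the commit message, but everything else is assumed: `SketchToolBase` is a hypothetical stand-in rather than `haystack.tools.Tool`, `InMemorySandbox` replaces the real sandbox, and the method bodies are invented for illustration.

```python
from typing import Any, Callable, Dict


class SketchToolBase:
    """Hypothetical stand-in for a tool base class; not haystack.tools.Tool."""

    def __init__(self, name: str, description: str, function: Callable[..., str]) -> None:
        self.name = name
        self.description = description
        self.function = function

    def invoke(self, **kwargs: Any) -> str:
        return self.function(**kwargs)


class InMemorySandbox:
    """Hypothetical in-memory replacement for the shared sandbox."""

    def __init__(self) -> None:
        self.files: Dict[str, str] = {}


class WriteFileTool(SketchToolBase):
    def __init__(self, sandbox: InMemorySandbox) -> None:
        self.sandbox = sandbox
        super().__init__(
            name="write_file",
            description="Write text to a file in the sandbox.",
            function=self._write,
        )

    def _write(self, path: str, content: str) -> str:
        self.sandbox.files[path] = content
        return f"wrote {path}"


class ListDirectoryTool(SketchToolBase):
    def __init__(self, sandbox: InMemorySandbox) -> None:
        self.sandbox = sandbox
        super().__init__(
            name="list_directory",
            description="List files stored under a directory in the sandbox.",
            function=self._list,
        )

    def _list(self, path: str) -> str:
        # Treat stored keys as absolute paths and return names under the prefix.
        prefix = path.rstrip("/") + "/"
        names = sorted(p[len(prefix):] for p in self.sandbox.files if p.startswith(prefix))
        return "\n".join(names)
```

A caller would construct one sandbox, pass it to whichever tool classes are needed, and hand only those instances to the agent.
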
@anakin87 (Member)

@tholor feel free to evaluate whether it would make sense to create these new classes directly in haystack-core-integrations. We could create an E2B integration and release it with very experimental release numbers (e.g. 0.0.1).

In case you go this route, we recently added a scaffolding script that makes life easier for contributors: https://github.com/deepset-ai/haystack-core-integrations/blob/main/CONTRIBUTING.md#create-a-new-integration

@julian-risch (Member)

I agree that this fits better into haystack-core-integrations. We have an open issue for a similar integration, but with exec-sandbox: deepset-ai/haystack-core-integrations#2933
Related to our earlier conversation, Malte: I looked into agentfs for isolated editing of files for agents. I had in mind something similar to https://www.llamaindex.ai/blog/making-coding-agents-safe-using-llamaindex and worked on a draft locally, but it doesn't cover run_bash_command, so I'll pause that work.
