Agentix

The universal bridge between agents and environments.

Evaluate agents, run RL rollouts, and collect rollout data across any agent and any sandbox — one API, no bespoke microservice per pairing.

Docs · Quickstart · Cookbook · Roadmap

Any agent Claude Code · Codex · Aider · OpenHands · your own Expose as `async def run(...) -> Result`.	Any environment SWE-bench images · custom Docker · Daytona · E2B · your own backend Pick a sandbox — or bring your own.
⇣ bridged by ⇣ await sandbox.remote(fn, args, *kwargs)

Two ideas

Agentix is small on purpose. The whole framework is two operations:

	You write	You get
Bundle	`agentix build [path]`	A deploy-ready image with your code and its dependencies
Remote call	`await sandbox.remote(fn, ...)`	The return value of `fn`, executed inside the sandbox

fn is any importable Python callable — an agent, a shell helper, a scorer, or a whole multi-step rollout. Args travel in, the typed return value comes back out. There is no fixed RPC surface to conform to and no base class for your code to inherit.

from app import run

result = await sandbox.remote(run, input="hello")

Side traffic rides along automatically: stdlib logging from inside the sandbox replays into your host logs, and OTel-shaped /trace spans capture every step — ready for eval dashboards and RL buffers.

Quickstart

pip install agentixx agentix-runtime-basic agentix-provider-docker

Build a bundle once (takes a few minutes), then every remote call is seconds. From examples/hello-world:

cd examples/hello-world
uv sync
uv run agentix build . --output dist/hello-world.bundle.tar
BUNDLE=$(uv run agentix deploy docker dist/hello-world.bundle.tar --format json | jq -r .bundle)
uv run python main.py --bundle "$BUNDLE"

The host code is just provider → session → remote call:

from agentix.bash import run
from agentix.provider.base import SandboxConfig
from agentix.provider.docker import DockerProvider

config = SandboxConfig(image="python:3.13-slim", bundle=BUNDLE)

async with DockerProvider().session(config) as sandbox:
    result = await sandbox.remote(run, command="echo hello from $(uname -a)")

Build a cross-arch bundle by passing --platform linux/amd64 to both agentix build and agentix deploy. Full walkthrough: quickstart.

What you can call

The point of one call surface is that an eval or RL loop wires together out of the same primitive — the agent, the environment setup, and the scorer are all just functions you remote-call:

You have	You expose	You call
An agent (Claude Code, Codex, OpenHands, …)	`async def run(...) -> RunResult`	`await sandbox.remote(run, ...)`
Shell, files, repo setup	`async def run(command: str) -> BashResult`	`await sandbox.remote(bash_run, ...)`
A benchmark or reward model	`async def score(...) -> Score`	`await sandbox.remote(score, ...)`

examples/run-swe-rollouts is the full loop end to end: sandbox agent run → patch extraction → SWE-bench harness score → one rollout log per instance.

How it compares

vs. sandbox runners (swe-rex, E2B, Daytona, Harbor). A runner hands you a box and a fixed way to reach into it — a predefined RPC surface, or "run a shell / docker exec command" plus a vendor SDK. Anything richer means squeezing your logic through that narrow hole. Agentix inverts it: the bundle installs your real Python, and sandbox.remote(fn, ...) calls any importable function and returns its typed value. A backend decides where the box runs; Agentix decides what you can call inside it — so you layer it on top of Docker, E2B, or Daytona.

	swe-rex · E2B · Daytona · Harbor	Agentix
Reach into the sandbox	Fixed RPC surface, or shell / `docker exec` + vendor SDK	`await sandbox.remote(fn, ...)` — any importable function
Sandbox logs & stdout	Scrape command output	stdlib `logging` auto-bridged to the host over `/log`
Observability	Bring your own	`/trace` spans (OTel-shaped) for every step
Model under test	Whatever the agent's SDK speaks	`abridge` translates Claude ⇄ OpenAI ⇄ Gemini — any agent on any model

vs. rollout-as-a-service (ProRL-Agent-Server). ProRL popularized an HTTP server with task-specific handlers and token trajectories for RL trainers. Agentix shares the decoupling — training stays separate from rollout execution — with a lighter surface.

	ProRL-Agent-Server	Agentix
Add a new task	Implement a handler, register it	Write a function, install it
Call a rollout	HTTP request to the service	`await sandbox.remote(fn, ...)`
Trajectories	Token-in / token-out over the service API	Captured by `abridge` as rollout logs
Sweet spot	HPC-scale multi-turn RL fleets	Teams wiring eval + RL data without a platform team

Both designs are powerful at HPC scale. Agentix targets the much larger set of research and product teams that want await remote(fn) with fewer moving parts.

What you get

One API for everything. Agent, tool, or scorer — the same await sandbox.remote(fn, ...).
Bundles from a normal Python project. agentix build reads pyproject.toml; an optional default.nix adds system binaries.
Backends you choose. Local Docker/Podman, Daytona, E2B, Apptainer, or your own SandboxProvider.
Sandbox logs on the host. print and stdlib logging from any remote call replay into your host logging tree over /log — no scraping command output.
Tracing built in. OTel-shaped /trace spans for every step, the same across agents and environments; ship them anywhere with agentix-trace-otel.
Any model behind any agent. abridge translates between Claude, OpenAI, and Gemini, so an agent that speaks one provider can be evaluated against any model — and the host captures the trajectory (token-in / token-out) for RL.

Ecosystem

One monorepo, separate PyPI packages. The core is agentixx; everything else is an optional plugin under plugins/.

Package	Role
`agentix-runtime-basic`	`agentix.bash`, file ops, sandbox primitives
`agentix-provider-docker` · `-daytona` · `-e2b` · `-apptainer`	Sandbox backends
`agentix-runner`	`run_rollouts(...)` — batch eval/rollout orchestration
`agentix-dataset-swe`	SWE-bench task images + official-harness scoring
`agentix-agent-claude-code` · `-mini-swe-agent` · `-qwen-code`	Agent adapters
`agentix-bridge`	Model translation + rollout → RL buffer capture (abridge)
`agentix-trace-otel`	Export `/trace` spans to any OTLP backend

Drop a directory under plugins/ and it becomes a workspace member; uv sync --all-packages installs it editable.

Development

git clone https://github.com/Agentiix/Agentix
cd Agentix
uv sync --all-packages --all-extras
uv run pytest
uv run ruff check agentix/ tests/

This repo is a uv workspace — core, plugins, and examples share one lockfile, so editing any member is live in the shared venv with no publish cycle. See ARCHITECTURE.md for how bundles and remote calls work under the hood.

Links

_{MIT licensed · built on uv workspaces}

Name		Name	Last commit message	Last commit date
Latest commit History 363 Commits
.claude/skills/agentix-ray-build		.claude/skills/agentix-ray-build
.github/workflows		.github/workflows
agentix		agentix
docs		docs
examples		examples
plugins		plugins
tests		tests
.gitignore		.gitignore
AGENTS.md		AGENTS.md
ARCHITECTURE.md		ARCHITECTURE.md
CLAUDE.md		CLAUDE.md
LICENSE		LICENSE
README.md		README.md
ROADMAP.md		ROADMAP.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Agentix

The universal bridge between agents and environments.

Any agent

Any environment

Two ideas

Quickstart

What you can call

How it compares

What you get

Ecosystem

Development

Links

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Agentix

The universal bridge between agents and environments.

Any agent

Any environment

Two ideas

Quickstart

What you can call

How it compares

What you get

Ecosystem

Development

Links

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages