Skip to content

Agentix-Project/Agentix

Repository files navigation

Agentix

The universal bridge between agents and environments.

Evaluate agents, run RL rollouts, and collect rollout data across any agent and any sandbox — one API, no bespoke microservice per pairing.

GitHub Stars Python 3.11+ Docs License

Docs · Quickstart · Cookbook · Roadmap


Any agent

Claude Code · Codex · Aider · OpenHands · your own
Expose as async def run(...) -> Result.

Any environment

SWE-bench images · custom Docker · Daytona · E2B · your own backend
Pick a sandbox — or bring your own.

⇣   bridged by   ⇣

await sandbox.remote(fn, *args, **kwargs)

Two ideas

Agentix is small on purpose. The whole framework is two operations:

You write You get
Bundle agentix build [path] A deploy-ready image with your code and its dependencies
Remote call await sandbox.remote(fn, ...) The return value of fn, executed inside the sandbox

fn is any importable Python callable — an agent, a shell helper, a scorer, or a whole multi-step rollout. Args travel in, the typed return value comes back out. There is no fixed RPC surface to conform to and no base class for your code to inherit.

from app import run

result = await sandbox.remote(run, input="hello")

Side traffic rides along automatically: stdlib logging from inside the sandbox replays into your host logs, and OTel-shaped /trace spans capture every step — ready for eval dashboards and RL buffers.

Quickstart

pip install agentixx agentix-runtime-basic agentix-provider-docker

Build a bundle once (takes a few minutes), then every remote call is seconds. From examples/hello-world:

cd examples/hello-world
uv sync
uv run agentix build . --output dist/hello-world.bundle.tar
BUNDLE=$(uv run agentix deploy docker dist/hello-world.bundle.tar --format json | jq -r .bundle)
uv run python main.py --bundle "$BUNDLE"

The host code is just provider → session → remote call:

from agentix.bash import run
from agentix.provider.base import SandboxConfig
from agentix.provider.docker import DockerProvider

config = SandboxConfig(image="python:3.13-slim", bundle=BUNDLE)

async with DockerProvider().session(config) as sandbox:
    result = await sandbox.remote(run, command="echo hello from $(uname -a)")

Build a cross-arch bundle by passing --platform linux/amd64 to both agentix build and agentix deploy. Full walkthrough: quickstart.

What you can call

The point of one call surface is that an eval or RL loop wires together out of the same primitive — the agent, the environment setup, and the scorer are all just functions you remote-call:

You have You expose You call
An agent (Claude Code, Codex, OpenHands, …) async def run(...) -> RunResult await sandbox.remote(run, ...)
Shell, files, repo setup async def run(command: str) -> BashResult await sandbox.remote(bash_run, ...)
A benchmark or reward model async def score(...) -> Score await sandbox.remote(score, ...)

examples/run-swe-rollouts is the full loop end to end: sandbox agent run → patch extraction → SWE-bench harness score → one rollout log per instance.

How it compares

vs. sandbox runners (swe-rex, E2B, Daytona, Harbor). A runner hands you a box and a fixed way to reach into it — a predefined RPC surface, or "run a shell / docker exec command" plus a vendor SDK. Anything richer means squeezing your logic through that narrow hole. Agentix inverts it: the bundle installs your real Python, and sandbox.remote(fn, ...) calls any importable function and returns its typed value. A backend decides where the box runs; Agentix decides what you can call inside it — so you layer it on top of Docker, E2B, or Daytona.

swe-rex · E2B · Daytona · Harbor Agentix
Reach into the sandbox Fixed RPC surface, or shell / docker exec + vendor SDK await sandbox.remote(fn, ...) — any importable function
Sandbox logs & stdout Scrape command output stdlib logging auto-bridged to the host over /log
Observability Bring your own /trace spans (OTel-shaped) for every step
Model under test Whatever the agent's SDK speaks abridge translates Claude ⇄ OpenAI ⇄ Gemini — any agent on any model

vs. rollout-as-a-service (ProRL-Agent-Server). ProRL popularized an HTTP server with task-specific handlers and token trajectories for RL trainers. Agentix shares the decoupling — training stays separate from rollout execution — with a lighter surface.

ProRL-Agent-Server Agentix
Add a new task Implement a handler, register it Write a function, install it
Call a rollout HTTP request to the service await sandbox.remote(fn, ...)
Trajectories Token-in / token-out over the service API Captured by abridge as rollout logs
Sweet spot HPC-scale multi-turn RL fleets Teams wiring eval + RL data without a platform team

Both designs are powerful at HPC scale. Agentix targets the much larger set of research and product teams that want await remote(fn) with fewer moving parts.

What you get

  • One API for everything. Agent, tool, or scorer — the same await sandbox.remote(fn, ...).
  • Bundles from a normal Python project. agentix build reads pyproject.toml; an optional default.nix adds system binaries.
  • Backends you choose. Local Docker/Podman, Daytona, E2B, Apptainer, or your own SandboxProvider.
  • Sandbox logs on the host. print and stdlib logging from any remote call replay into your host logging tree over /log — no scraping command output.
  • Tracing built in. OTel-shaped /trace spans for every step, the same across agents and environments; ship them anywhere with agentix-trace-otel.
  • Any model behind any agent. abridge translates between Claude, OpenAI, and Gemini, so an agent that speaks one provider can be evaluated against any model — and the host captures the trajectory (token-in / token-out) for RL.

Ecosystem

One monorepo, separate PyPI packages. The core is agentixx; everything else is an optional plugin under plugins/.

Package Role
agentix-runtime-basic agentix.bash, file ops, sandbox primitives
agentix-provider-docker · -daytona · -e2b · -apptainer Sandbox backends
agentix-runner run_rollouts(...) — batch eval/rollout orchestration
agentix-dataset-swe SWE-bench task images + official-harness scoring
agentix-agent-claude-code · -mini-swe-agent · -qwen-code Agent adapters
agentix-bridge Model translation + rollout → RL buffer capture (abridge)
agentix-trace-otel Export /trace spans to any OTLP backend

Drop a directory under plugins/ and it becomes a workspace member; uv sync --all-packages installs it editable.

Development

git clone https://github.com/Agentiix/Agentix
cd Agentix
uv sync --all-packages --all-extras
uv run pytest
uv run ruff check agentix/ tests/

This repo is a uv workspace — core, plugins, and examples share one lockfile, so editing any member is live in the shared venv with no publish cycle. See ARCHITECTURE.md for how bundles and remote calls work under the hood.

Links

MIT licensed · built on uv workspaces

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors