RoboSandbox

A sim-first sandbox for robot manipulation. Bring your own arm, objects, and tasks.

RoboSandbox is a small manipulation sandbox built around MuJoCo. You can load a robot from a URDF or MJCF, spawn a few objects, define a task, run a planner or policy, and record the result. The point is to make the stack small enough to inspect and easy enough to modify.

Try It

git clone https://github.com/amarrmb/robosandbox.git
cd robosandbox
uv sync
uv pip install -e 'packages/robosandbox-core[viewer]'

uv run robo-sandbox viewer
# → open http://localhost:8000
# → pick a task, type "pick up the red cube", click Run
# → frames stream to the browser; hit Record to save for training

No API key, no model download. The stub planner handles a small but useful grammar: pick the <obj>, pick the <obj> and put it on <obj2>, stack <obj> on <obj2>, push the <obj> <dir>, go home.
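To make the grammar above concrete, here is a minimal sketch of how a regex-based planner could map those phrases to skill calls. It is an illustration only, not the project's StubPlanner: the `SkillCall` stand-in, the argument names, the optional "up", and the set of accepted directions are all assumptions.

```python
import re
from typing import NamedTuple


class SkillCall(NamedTuple):
    """Hypothetical stand-in for the project's SkillCall type."""
    name: str
    args: dict


# Ordered most-specific first, so "pick ... and put it on ..." wins over plain "pick ...".
# The direction words are an assumption; the README only says <dir>.
_PATTERNS = [
    (re.compile(r"^pick (?:up )?the (?P<obj>.+?) and put it on (?:the )?(?P<obj2>.+)$"),
     lambda m: [SkillCall("pick", {"object": m["obj"]}),
                SkillCall("place_on", {"target": m["obj2"]})]),
    (re.compile(r"^stack (?:the )?(?P<obj>.+?) on (?:the )?(?P<obj2>.+)$"),
     lambda m: [SkillCall("stack", {"object": m["obj"], "target": m["obj2"]})]),
    (re.compile(r"^push (?:the )?(?P<obj>.+) (?P<dir>forward|backward|left|right)$"),
     lambda m: [SkillCall("push", {"object": m["obj"], "direction": m["dir"]})]),
    (re.compile(r"^pick (?:up )?the (?P<obj>.+)$"),
     lambda m: [SkillCall("pick", {"object": m["obj"]})]),
    (re.compile(r"^go home$"),
     lambda m: [SkillCall("home", {})]),
]


def parse_command(text: str) -> list[SkillCall]:
    """Return the skill calls for a command, or an empty list if nothing matches."""
    normalized = " ".join(text.lower().split())  # lowercase, collapse whitespace
    for pattern, build in _PATTERNS:
        match = pattern.match(normalized)
        if match:
            return build(match)
    return []
```

For example, `parse_command("pick the red cube and put it on the green cube")` yields a `pick` call followed by a `place_on` call, while an unrecognized sentence yields an empty list.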

Why RoboSandbox Exists

A lot of robotics tooling is either very low-level or very heavy. If you are new, that means a steep learning curve before you can make anything move. If you are experienced, that often means too much setup just to test one idea.

RoboSandbox sits in the middle. It is a small manipulation sandbox for learning, prototyping, and integration work. You can run it, read it, and modify it without committing to a heavyweight simulation workflow.

This project is intentionally a starting point, not an end state. The goal is not to replace MuJoCo, Isaac Sim, LeRobot, or your team's internal stack. The goal is to help you get oriented, get something working, and make the seams visible before you invest in a larger system.

If you start with RoboSandbox and later move to MuJoCo, Isaac Sim, LeRobot training workflows, or real hardware, that is success, not failure.

Who it helps

If you are new to robotics, use it to learn how a manipulation stack fits together. Start with a working example, then trace the path from task text to skills to motion to recorded artifacts without getting buried in framework complexity.

If you already do robotics but not simulation, use it as a fast prototyping environment: a lightweight place to test a robot, task, recorder, or policy integration without first committing to a heavyweight simulator workflow.

If you already use simulation tools, use it as a small integration harness: a good place to isolate interface questions, build minimal reproductions, and validate a seam before moving the idea into MuJoCo, Isaac Sim, or your internal stack.

When to use it

Use RoboSandbox when you want to learn how a manipulation stack works end to end, prototype a new robot/task/policy, test recording-export-replay workflows, debug interface contracts, or build a minimal reproducible manipulation demo.

You will probably want a heavier stack when you need photorealistic rendering, richer sensor simulation, large scenes or multi-robot setups, industrial-scale simulation workflows, or production-grade deployment infrastructure.

How It Works

user: "pick up the red cube and put it on the green cube"
       │
       ▼
 planner ─► [pick(red_cube), place_on(green_cube)]
       │
       ▼
 perception (VLM or ground truth) locates both in 3D
       │
       ▼
 motion (DLS Jacobian IK + Cartesian interpolation) executes
       │
       ▼
 recorder writes runs/<id>/video.mp4 + events.jsonl
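The motion step in the diagram names damped least-squares (DLS) Jacobian IK with Cartesian interpolation. As a rough illustration of the technique only, and not of RoboSandbox's own motion code, the sketch below applies the DLS update `dq = Jᵀ (J Jᵀ + λ² I)⁻¹ e` to a planar two-link arm. The link lengths, damping value, and all function names are assumptions chosen for the example.

```python
import numpy as np

LINKS = (0.3, 0.25)  # assumed link lengths in metres


def fk(q):
    """Planar 2-link forward kinematics: joint angles -> end-effector (x, y)."""
    l1, l2 = LINKS
    return np.array([
        l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
        l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1]),
    ])


def jacobian(q):
    """Analytic Jacobian d(x, y) / d(q0, q1) of fk."""
    l1, l2 = LINKS
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-l1 * s1 - l2 * s12, -l2 * s12],
        [l1 * c1 + l2 * c12, l2 * c12],
    ])


def dls_step(q, target, damping=0.05):
    """One damped least-squares update toward the Cartesian target."""
    error = target - fk(q)
    J = jacobian(q)
    # The damping term keeps the solve well-conditioned near singularities.
    dq = J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(2), error)
    return q + dq


def solve_ik(q0, target, iters=200, tol=1e-4):
    """Iterate DLS steps until the position error drops below tol."""
    q = np.asarray(q0, dtype=float)
    for _ in range(iters):
        if np.linalg.norm(target - fk(q)) < tol:
            break
        q = dls_step(q, target)
    return q


def cartesian_waypoints(start, goal, n=10):
    """Straight-line interpolation between two Cartesian points."""
    return [start + (goal - start) * t for t in np.linspace(0.0, 1.0, n)]
```

A caller would typically interpolate waypoints from the current end-effector position to the goal and run `solve_ik` on each one, seeding every solve with the previous solution so the joint path stays continuous.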

Deeper dives live under docs/site/:

How it works in 3 minutes — the four-layer architecture
Running the agent — CLI entry points, recorded artifacts, provider switch
VLM tool-calling — how text becomes SkillCalls
Reachability pre-flight — catch bad object placements before physics runs
Replan loop — ReAct recovery when skills fail

Make It Yours

Providers

robo-sandbox run takes a --vlm-provider flag. Pick one:

Provider	Command	Setup
`stub` (default)	`uv run robo-sandbox run "pick up the red cube"`	none — regex-based planner
`ollama`	`uv run robo-sandbox run --vlm-provider ollama "pick up the blue cube and put it on the green cube"`	`ollama pull llama3.2-vision && ollama serve &`
`openai`	`uv run robo-sandbox run --vlm-provider openai "stack all three cubes by colour — red on green on blue"`	`export OPENAI_API_KEY=sk-...`
`custom`	`uv run robo-sandbox run --vlm-provider custom --base-url https://... ...`	any OpenAI-compatible endpoint (together.ai, vLLM, ...)

Override the model with --model (defaults: llama3.2-vision for ollama, gpt-4o-mini for openai). For richer reasoning on open-ended tasks, try --model gpt-4o.

System prerequisites

Requires Python 3.10–3.13. MuJoCo 3.2+ comes in as a dependency; no GPU needed.

macOS (Apple Silicon or Intel): works out of the box — no GL configuration needed.

Linux (Ubuntu 22.04 / 24.04): CI-tested platform. Headless GL is required for rendering:

sudo apt-get install -y libosmesa6 libosmesa6-dev libgl1-mesa-dri
export MUJOCO_GL=osmesa    # or `egl` if a GPU is available
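
If you drive the sandbox from a Python script rather than a shell, the same setting can be applied in-process. This is a small sketch under the assumption that it runs before the first `import mujoco`, since MuJoCo's Python bindings choose the GL backend from `MUJOCO_GL` when they are loaded. The helper name is hypothetical.

```python
import os

_VALID_BACKENDS = {"osmesa", "egl", "glfw"}


def configure_headless_gl(backend: str = "osmesa") -> str:
    """Set MUJOCO_GL unless it is already set; return the value in effect."""
    if backend not in _VALID_BACKENDS:
        raise ValueError(f"unknown MUJOCO_GL backend: {backend!r}")
    # setdefault leaves an existing user choice (e.g. from the shell) untouched.
    return os.environ.setdefault("MUJOCO_GL", backend)


# Call this before importing mujoco (or anything that imports it).
active_backend = configure_headless_gl("osmesa")
```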

Windows: not directly supported. WSL2 running Ubuntu 22.04 works; follow the Linux path inside WSL.

Bring your own…

Bring your own robot — URDF + sidecar YAML
Bring your own object — YCB meshes + BYO OBJ
Bring your own task — author a task YAML, randomize, score
Add a skill — extend the agent's vocabulary

Extras

Each extra is two lines: install the optional dependency, then run the command.

Benchmark

uv run robo-sandbox-bench                           # run all default tasks
uv run robo-sandbox-bench --seeds 50                # randomize and aggregate
uv run robo-sandbox-bench --vlm-provider ollama     # use a real VLM

Tasks with a randomize: block get per-seed perturbations. Seed 0 is the deterministic baseline; seeds ≥ 1 apply uniform jitter keyed on the seed. With multiple seeds the summary reports mean ± stderr. Results append to benchmark_results.json locally for regression tracking (the file is gitignored).
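
The seeding and aggregation rules above can be sketched in a few lines. This is an illustration of the described behaviour, not the benchmark runner's code: the function names and the way jitter is applied to a single scalar are assumptions.

```python
import math
import random
import statistics


def jittered(value: float, half_width: float, seed: int) -> float:
    """Seed 0 returns the baseline; seeds >= 1 add uniform jitter keyed on the seed."""
    if seed == 0:
        return value
    rng = random.Random(seed)  # same seed -> same perturbation
    return value + rng.uniform(-half_width, half_width)


def mean_and_stderr(samples: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (sample stdev / sqrt(n))."""
    mean = statistics.fmean(samples)
    if len(samples) < 2:
        return mean, 0.0  # stderr is undefined for one sample; report 0
    return mean, statistics.stdev(samples) / math.sqrt(len(samples))


# Four seeds with success recorded as 1.0 and failure as 0.0.
mean, stderr = mean_and_stderr([1.0, 0.0, 1.0, 1.0])
print(f"success: {mean:.2f} ± {stderr:.2f}")  # success: 0.75 ± 0.25
```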

Nine default tasks ship under packages/robosandbox-core/src/robosandbox/tasks/definitions/ (plus one experimental):

Task	What it exercises
`home`	Skill dispatch with no spatial reasoning
`pick_cube`	Single-object pick (core reliability)
`pick_cube_franka`	URDF-import path — bundled Franka picks a cube
`pick_cube_scrambled`	Pick under per-seed pose/size/mass/rgba randomization
`pick_from_three`	Perception disambiguation by colour name
`pick_ycb_mug`	Mesh-import path — bundled YCB mug picked by Franka
`pour_can_into_bowl`	Long-horizon composite (pick → pour)
`push_forward`	Non-pick manipulation, verifies directional displacement
`open_drawer`	First articulated primitive — drawer + `OpenDrawer` skill

_experimental_stack_two is excluded from default runs because stacking is still open work.

Browser live viewer

uv pip install -e 'packages/robosandbox-core[viewer]'
uv run robo-sandbox viewer
# → open http://localhost:8000

Pick a task, click Run. Events log to the sidebar; frames stream at ~15–50 fps depending on how fast the sim is stepping. Pass --task pick_cube_franka to preload a specific scene, --host 0.0.0.0 to expose it on your LAN.

Documentation preview

uv pip install -e 'packages/robosandbox-core[docs]'
uv run mkdocs serve -f docs/site/mkdocs.yml           # live preview
uv run mkdocs build --strict -f docs/site/mkdocs.yml  # one-shot build

If you're reading this on GitHub, start at docs/site/docs/index.md.

Bring-your-own meshes

The sandbox decomposes user OBJ/STL files with CoACD and caches the hulls at ~/.cache/robosandbox/mesh_hulls/:

uv pip install -e 'packages/robosandbox-core[meshes]'    # pulls in coacd

SceneObject(
    id="widget",
    kind="mesh",
    mesh_path=Path("/abs/path/to/widget.obj"),
    collision="coacd",                # or "hull" (skip decomp if mesh is already convex)
    pose=Pose(xyz=(0.4, 0.0, 0.05)),
    mass=0.1,
)

collision="hull" is a cheap fallback for already-convex meshes — no CoACD install required, but the sandbox does not compute a hull for you; it trusts the mesh is convex. For concave objects, always use collision="coacd".
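
Because collision="hull" trusts the mesh, it can be worth checking convexity yourself before choosing it. The sketch below is a standalone heuristic for closed triangle meshes and is not part of RoboSandbox: for every face, it checks that all vertices lie on one side of that face's plane. The function name and tolerance are assumptions.

```python
import numpy as np


def is_convex(vertices, faces, eps=1e-9):
    """Return True if no face plane has mesh vertices strictly on both sides."""
    V = np.asarray(vertices, dtype=float)
    for i, j, k in faces:
        normal = np.cross(V[j] - V[i], V[k] - V[i])
        length = np.linalg.norm(normal)
        if length == 0.0:
            continue  # skip degenerate (zero-area) triangles
        # Signed distance of every vertex from this face's plane.
        dist = (V - V[i]) @ (normal / length)
        if dist.max() > eps and dist.min() < -eps:
            return False  # vertices on both sides -> concave
    return True


# A tetrahedron is convex.
tetra_v = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
tetra_f = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

# A pyramid whose underside is pushed inward (vertex 5) is concave.
dent_v = [(1, 1, 0), (-1, 1, 0), (-1, -1, 0), (1, -1, 0), (0, 0, 1), (0, 0, 0.5)]
dent_f = [(0, 1, 4), (1, 2, 4), (2, 3, 4), (3, 0, 4),
          (1, 0, 5), (2, 1, 5), (3, 2, 5), (0, 3, 5)]
```

Here `is_convex(tetra_v, tetra_f)` is true and `is_convex(dent_v, dent_f)` is false, so the second mesh would call for collision="coacd". The test does not depend on triangle winding, but it assumes the mesh is a single closed surface.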

Pre-decompose once for a bundled asset with the authoring tool:

uv run python scripts/decompose_mesh.py \
  --input /path/to/drill.obj \
  --out-dir assets/objects/custom/drill \
  --name drill --mass 0.3 --center-bottom

Bundled Assets

Robots

packages/robosandbox-core/src/robosandbox/assets/robots/franka_panda/ ships a trimmed copy of the Franka Emika Panda, adapted from mujoco_menagerie under Apache 2.0. The visual meshes have been removed (collision-only, ~160 KB), and the tendon-driven gripper actuator was replaced with a simple position actuator on finger_joint1 so the standard RobotSpec interface (open_qpos / closed_qpos) applies directly. See LICENSE in that directory for menagerie's attribution.

To bring your own robot:

Scene(
    robot_urdf=Path("/path/to/ur5.urdf"),     # .urdf or .xml
    robot_config=Path("/path/to/ur5.robosandbox.yaml"),  # optional — sibling auto-discovered
    objects=(...),
)

The sidecar YAML tells RoboSandbox which joint is the primary finger, where the end-effector TCP sits, the home pose, and gripper open/closed qpos. See packages/robosandbox-core/src/robosandbox/assets/robots/franka_panda/panda.robosandbox.yaml for the schema.

Objects

packages/robosandbox-core/src/robosandbox/assets/objects/ycb/ ships 10 pre-decomposed YCB benchmark objects: a visual OBJ + N CoACD convex hulls + per-object sidecar YAML each.

YCB id	Description	Mass (kg)
`003_cracker_box`	cracker box	0.411
`005_tomato_soup_can`	tomato soup can	0.349
`006_mustard_bottle`	mustard bottle	0.603
`011_banana`	banana	0.066
`013_apple`	apple	0.068
`024_bowl`	bowl (hollow; 11 hulls)	0.147
`025_mug`	mug (handled; 15 hulls)	0.118
`035_power_drill`	power drill	0.895
`042_adjustable_wrench`	adjustable wrench	0.252
`055_baseball`	baseball	0.148

Drop any of them into a task with the @ycb: shorthand:

objects:
  - id: box_1
    kind: mesh
    mesh: "@ycb:003_cracker_box"
    pose: {xyz: [0.4, 0.0, 0.08]}
  - id: soup
    kind: mesh
    mesh: "@ycb:005_tomato_soup_can"
    pose: {xyz: [0.4, 0.15, 0.06]}

Or discover the bundled catalog from Python:

from robosandbox.tasks.loader import list_builtin_ycb_objects
list_builtin_ycb_objects()
# ['003_cracker_box', '005_tomato_soup_can', ..., '055_baseball']

See packages/robosandbox-core/src/robosandbox/assets/objects/ycb/LICENSE for the YCB project's terms.

Architecture

The codebase is deliberately small. Most extension points are plain Protocols, so the seams are easy to find and reason about.

packages/robosandbox-core/
├── src/robosandbox/
│   ├── types.py          Pose, Scene, Observation, Grasp, SkillResult
│   ├── protocols.py      SimBackend, Perception, GraspPlanner,
│   │                     MotionPlanner, RecordSink, VLMClient, Skill
│   ├── sim/              MuJoCo backend (built-in 6-DOF arm + URDF robots)
│   ├── scene/            MJCF builder + URDF/mesh loaders — spawns any Scene into MuJoCo
│   ├── perception/       ground_truth (sim cheat), vlm_pointer (VLM)
│   ├── grasp/            analytic top-down (v0.1)
│   ├── motion/           DLS Jacobian IK + Cartesian interpolation
│   ├── skills/           Pick, PlaceOn, Push, Home, Pour, Tap,
│   │                     OpenDrawer, CloseDrawer, Stack
│   ├── agent/            Planner protocol, VLMPlanner, StubPlanner,
│   │                     ReAct-style Agent with replan loop
│   ├── policy/           Policy protocol + LeRobotPolicyAdapter
│   ├── vlm/              OpenAI-compatible client + JSON recovery
│   ├── recorder/         MP4 + JSONL per episode; `export-lerobot` CLI
│   ├── backends/         RealRobotBackend (sim-to-real Protocol stub)
│   ├── tasks/            Task loader + benchmark runner
│   ├── cli.py            `robo-sandbox` entry point
│   ├── demo.py           Scripted pick (no VLM, no API)
│   └── agentic_demo.py   Full agent loop
└── tests/                Test suite covering types, IK, skills, agent,
                          planner, JSON recovery, VLM pointer projection,
                          URDF import, mesh import, policy adapter,
                          real-backend contract, reachability pre-flight.

Agent loop

IDLE → PLAN → EXECUTE (one skill at a time) → EVALUATE →
                   │ success                      │ failure
                   ▼                              ▼
                 next in plan                   REPLAN ─► (max N times)
                   │                              │
                   ▼                              ▼
                 DONE                           FAILED

One important seam is the planner:

class Planner(Protocol):
    def plan(
        self,
        task: str,
        obs: Observation,
        prior_attempts: list[dict],
    ) -> tuple[list[SkillCall], int]:
        """Returns (plan, n_model_calls). Empty plan == 'already done'."""

VLMPlanner talks to an OpenAI-compatible endpoint with tool-calling and image input. StubPlanner is a regex parser.
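
To show how the planner seam and the loop diagram fit together, here is a minimal sketch of a plan/execute/replan loop. It is a simplification under stated assumptions, not the project's Agent: the `SkillCall` stand-in, `run_agent`, the shape of `prior_attempts` entries, and the callables are all hypothetical, and the `obs` argument from the Protocol is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SkillCall:
    """Hypothetical stand-in for the project's SkillCall type."""
    name: str
    args: dict = field(default_factory=dict)


PlanFn = Callable[[str, list[dict]], tuple[list[SkillCall], int]]
ExecuteFn = Callable[[SkillCall], bool]


def run_agent(
    task: str,
    plan_fn: PlanFn,
    execute_fn: ExecuteFn,
    max_replans: int = 2,
) -> tuple[str, list[dict]]:
    """Run PLAN -> EXECUTE -> EVALUATE, replanning up to max_replans times."""
    prior_attempts: list[dict] = []
    for _ in range(max_replans + 1):  # the initial plan plus N replans
        plan, _n_model_calls = plan_fn(task, prior_attempts)
        failed = None
        for call in plan:  # one skill at a time
            if not execute_fn(call):
                failed = call
                break
        if failed is None:
            # Every skill succeeded, or the plan was empty ("already done").
            return "DONE", prior_attempts
        # Record the failure so the next plan() call can take it into account.
        prior_attempts.append({"failed_skill": failed.name, "args": failed.args})
    return "FAILED", prior_attempts
```

Passing the accumulated failures back into `plan_fn` mirrors the `prior_attempts` parameter of the Protocol: a VLM-backed planner can use them to try a different approach, while a regex planner can simply ignore them.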

Skills as tools

Each skill exposes name, description, and a JSON parameters_schema. VLMPlanner turns that into tool definitions; the model's tool calls become skill dispatches. If you want to add a skill, register it at the robosandbox.skills entry point.
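
As an illustration of that mapping, the sketch below wraps a skill's three fields in the tool-definition shape used by OpenAI-compatible chat endpoints, and unpacks a returned tool call into a name and argument dict. The helper names and the example `pick` schema are assumptions; VLMPlanner's actual code may differ.

```python
import json


def skill_to_tool(name: str, description: str, parameters_schema: dict) -> dict:
    """Wrap a skill's metadata as an OpenAI-style function tool definition."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters_schema,
        },
    }


def tool_call_to_dispatch(tool_call: dict) -> tuple[str, dict]:
    """Extract (skill_name, arguments) from one tool call in a model response."""
    function = tool_call["function"]
    # The arguments field arrives as a JSON-encoded string.
    return function["name"], json.loads(function["arguments"])


# Hypothetical schema for a pick skill.
pick_tool = skill_to_tool(
    name="pick",
    description="Pick up the named object.",
    parameters_schema={
        "type": "object",
        "properties": {"object": {"type": "string"}},
        "required": ["object"],
    },
)
```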

Status

This is still an early project, but the core shape is there. Most moving parts are narrow Protocols, so swapping in a different robot, object set, planner, recorder, or policy is a small integration job instead of a rewrite. The current stack is solid on pick/push/pour/drawer-style tasks. Stacking is still rougher than the rest and remains open work.

The roadmap is the best place to see what already ships and what is still deferred. The short version: better stacking, collision-aware planning, a cleaner real-policy path, and a concrete SO-101 hardware backend are the main next steps.

Development

uv sync --extra dev --extra viewer --extra meshes

uv run ruff check packages/
uv run pytest packages/robosandbox-core/tests/ -q
uv run robo-sandbox-bench --tasks pick_cube pick_cube_franka home pick_ycb_mug

These are the exact commands CI runs on every PR (see .github/workflows/ci.yml).

License

Core: Apache 2.0.

Optional contrib/ plugins carry their own licenses; research-licensed grasp predictors and similar components live there. They are opt-in installs and are not pulled in by the base source install from packages/robosandbox-core.
