
RoboSandbox

A sim-first sandbox for robot manipulation. Bring your own arm, objects, and tasks.

Franka picks a red cube from a natural-language command

RoboSandbox is a small manipulation sandbox built around MuJoCo. You can load a robot from a URDF or MJCF, spawn a few objects, define a task, run a planner or policy, and record the result. The point is to make the stack small enough to inspect and easy enough to modify.

Try It

git clone https://github.com/amarrmb/robosandbox.git
cd robosandbox
uv sync
uv pip install -e 'packages/robosandbox-core[viewer]'

uv run robo-sandbox viewer
# → open http://localhost:8000
# → pick a task, type "pick up the red cube", click Run
# → frames stream to the browser; hit Record to save for training

No API key, no model download. The stub planner handles a small but useful grammar: pick the <obj>, pick the <obj> and put it on <obj2>, stack <obj> on <obj2>, push the <obj> <dir>, go home.
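The grammar above can be captured with a handful of regular expressions. A minimal sketch in that spirit — the skill names and patterns here are illustrative, not the actual StubPlanner implementation in robosandbox/agent:

```python
import re

# Ordered rules: first fullmatch wins. Longer patterns come first so
# "pick ... and put it on ..." is not swallowed by the bare "pick ..." rule.
RULES = [
    (r"pick (?:up )?the (?P<obj>[\w ]+?) and put it on (?:the )?(?P<dest>[\w ]+)",
     lambda m: [("pick", m["obj"]), ("place_on", m["dest"])]),
    (r"stack (?:the )?(?P<obj>[\w ]+?) on (?:the )?(?P<dest>[\w ]+)",
     lambda m: [("stack", m["obj"], m["dest"])]),
    (r"push (?:the )?(?P<obj>[\w ]+?) (?P<dir>forward|back|left|right)",
     lambda m: [("push", m["obj"], m["dir"])]),
    (r"pick (?:up )?the (?P<obj>[\w ]+)",
     lambda m: [("pick", m["obj"])]),
    (r"go home",
     lambda m: [("home",)]),
]

def parse(command: str):
    """Return a list of skill-call tuples, or None if the command is unparseable."""
    for pattern, build in RULES:
        m = re.fullmatch(pattern, command.strip(), re.IGNORECASE)
        if m:
            return build(m)
    return None

parse("pick up the red cube")
# -> [("pick", "red cube")]
```

A stub like this keeps the rest of the pipeline testable with zero model calls; anything it cannot parse simply returns None instead of hallucinating a plan.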

Why RoboSandbox Exists

A lot of robotics tooling is either very low-level or very heavy. If you are new, that means a steep learning curve before you can make anything move. If you are experienced, that often means too much setup just to test one idea.

RoboSandbox sits in the middle. It is a small manipulation sandbox for learning, prototyping, and integration work. You can run it, read it, and modify it without committing to a heavyweight simulation workflow.

This project is intentionally a starting point, not an end state. The goal is not to replace MuJoCo, Isaac Sim, LeRobot, or your team's internal stack. The goal is to help you get oriented, get something working, and make the seams visible before you invest in a larger system.

If you start with RoboSandbox and later move to MuJoCo, Isaac Sim, LeRobot training workflows, or real hardware, that is success, not failure.

Who it helps

If you are new to robotics — use it to learn how a manipulation stack fits together. Start with a working example, then trace the path from task text to skills to motion to recorded artifacts without getting buried in framework complexity.

If you already do robotics but not simulation — use it as a fast prototyping environment. A lightweight place to test a robot, task, recorder, or policy integration without first committing to a heavyweight simulator workflow.

If you already use simulation tools — use it as a small integration harness. A good place to isolate interface questions, build minimal reproductions, and validate a seam before moving the idea into MuJoCo, Isaac Sim, or your internal stack.

When to use it

Use RoboSandbox when you want to learn how a manipulation stack works end to end, prototype a new robot/task/policy, test recording → export → replay workflows, debug interface contracts, or build a minimal reproducible manipulation demo.

You will probably want a heavier stack when you need photorealistic rendering, richer sensor simulation, large scenes or multi-robot setups, industrial-scale simulation workflows, or production-grade deployment infrastructure.

How It Works

user: "pick up the red cube and put it on the green cube"
       │
       ▼
 planner ─► [pick(red_cube), place_on(green_cube)]
       │
       ▼
 perception (VLM or ground truth) locates both in 3D
       │
       ▼
 motion (DLS Jacobian IK + Cartesian interpolation) executes
       │
       ▼
 recorder writes runs/<id>/video.mp4 + events.jsonl
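The motion stage's damped-least-squares step can be sketched in a few lines of NumPy. This is an illustrative sketch of the standard DLS formula, not the project's actual motion module:

```python
import numpy as np

def dls_step(jacobian, error, damping=0.05):
    """One damped-least-squares IK step: dq = J^T (J J^T + lambda^2 I)^-1 e.

    The damping term keeps the solve stable near singular configurations,
    where a plain pseudo-inverse would blow up. Damping value and any step
    clamping are assumptions here.
    """
    J = np.asarray(jacobian)
    e = np.asarray(error)
    JJt = J @ J.T
    return J.T @ np.linalg.solve(JJt + (damping ** 2) * np.eye(JJt.shape[0]), e)

# Toy 2-DOF planar arm: take a small Cartesian step toward a target.
J = np.array([[1.0, 0.5],
              [0.0, 1.0]])
dq = dls_step(J, np.array([0.01, 0.0]))
```

Running this repeatedly along interpolated Cartesian waypoints is the essence of "DLS Jacobian IK + Cartesian interpolation" in the diagram above.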

Deeper dives live under docs/site/.

Make It Yours

Providers

robo-sandbox run takes a --vlm-provider flag. Pick one:

stub (default)
  Command: uv run robo-sandbox run "pick up the red cube"
  Setup:   none — regex-based planner

ollama
  Command: uv run robo-sandbox run --vlm-provider ollama "pick up the blue cube and put it on the green cube"
  Setup:   ollama pull llama3.2-vision && ollama serve &

openai
  Command: uv run robo-sandbox run --vlm-provider openai "stack all three cubes by colour — red on green on blue"
  Setup:   export OPENAI_API_KEY=sk-...

custom
  Command: uv run robo-sandbox run --vlm-provider custom --base-url https://... ...
  Setup:   any OpenAI-compatible endpoint (together.ai, vLLM, ...)

Override the model with --model (defaults: llama3.2-vision for ollama, gpt-4o-mini for openai). For richer reasoning on open-ended tasks, try --model gpt-4o.
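All four providers reduce to the same wire format. A sketch of the chat-completions payload such a request boils down to — field names follow the OpenAI API, the image string is a placeholder, and the project's actual vlm client layers tool-calling and JSON recovery on top:

```python
import json

def build_request(task: str, image_b64: str, model: str = "gpt-4o-mini") -> str:
    """Assemble an OpenAI-compatible chat request carrying the task text
    plus one camera frame as a base64 data URL."""
    return json.dumps({
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": task},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    })

payload = json.loads(build_request("pick up the red cube", "iVBOR..."))
```

Because the format is shared, swapping between ollama, openai, and a vLLM endpoint is just a base-URL and model-name change.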

System prerequisites

Requires Python 3.10–3.13. MuJoCo 3.2+ comes in as a dependency; no GPU needed.

macOS (Apple Silicon or Intel): works out of the box — no GL configuration needed.

Linux (Ubuntu 22.04 / 24.04): CI-tested platform. Headless GL is required for rendering:

sudo apt-get install -y libosmesa6 libosmesa6-dev libgl1-mesa-dri
export MUJOCO_GL=osmesa    # or `egl` if a GPU is available

Windows: not directly supported. WSL2 running Ubuntu 22.04 works; follow the Linux path inside WSL.

Bring your own…

Extras

Each extra is two lines: install the optional dependency, then run the command.

Benchmark

uv run robo-sandbox-bench                           # run all default tasks
uv run robo-sandbox-bench --seeds 50                # randomize and aggregate
uv run robo-sandbox-bench --vlm-provider ollama     # use a real VLM

Tasks with a randomize: block get per-seed perturbations. Seed 0 is the deterministic baseline; seeds ≥ 1 apply uniform jitter keyed on the seed. With multiple seeds the summary reports mean ± stderr. Results append to benchmark_results.json locally for regression tracking (the file is gitignored).
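The seeding scheme above can be sketched in a few lines. Jitter ranges, field names, and the seed-keyed RNG construction here are assumptions for illustration, not the benchmark runner's actual code:

```python
import random
import statistics

def jitter_xy(base_xy, seed, radius=0.03):
    """Seed 0: deterministic baseline. Seeds >= 1: uniform jitter keyed on
    the seed, so every re-run of a given seed sees the identical scene."""
    if seed == 0:
        return base_xy
    rng = random.Random(seed)
    return (base_xy[0] + rng.uniform(-radius, radius),
            base_xy[1] + rng.uniform(-radius, radius))

def summarize(successes):
    """Mean +/- standard error over per-seed success indicators (0/1)."""
    mean = statistics.mean(successes)
    n = len(successes)
    stderr = statistics.stdev(successes) / n ** 0.5 if n > 1 else 0.0
    return mean, stderr
```

Keying the RNG on the seed rather than on global state is what makes per-seed results reproducible for regression tracking.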

Nine default tasks ship under packages/robosandbox-core/src/robosandbox/tasks/definitions/ (plus one experimental):

Task                  What it exercises
home                  Skill dispatch with no spatial reasoning
pick_cube             Single-object pick (core reliability)
pick_cube_franka      URDF-import path — bundled Franka picks a cube
pick_cube_scrambled   Pick under per-seed pose/size/mass/rgba randomization
pick_from_three       Perception disambiguation by colour name
pick_ycb_mug          Mesh-import path — bundled YCB mug picked by Franka
pour_can_into_bowl    Long-horizon composite (pick → pour)
push_forward          Non-pick manipulation, verifies directional displacement
open_drawer           First articulated primitive — drawer + OpenDrawer skill

_experimental_stack_two is excluded from default runs because stacking is still open work.

Browser live viewer

uv pip install -e 'packages/robosandbox-core[viewer]'
uv run robo-sandbox viewer
# → open http://localhost:8000

Pick a task, click Run. Events log to the sidebar; frames stream at ~15–50 fps depending on how fast the sim is stepping. Pass --task pick_cube_franka to preload a specific scene, --host 0.0.0.0 to expose it on your LAN.

Documentation preview

uv pip install -e 'packages/robosandbox-core[docs]'
uv run mkdocs serve -f docs/site/mkdocs.yml           # live preview
uv run mkdocs build --strict -f docs/site/mkdocs.yml  # one-shot build

If you're reading this on GitHub, start at docs/site/docs/index.md.

Bring-your-own meshes

The sandbox decomposes user OBJ/STL files with CoACD and caches the hulls at ~/.cache/robosandbox/mesh_hulls/:

uv pip install -e 'packages/robosandbox-core[meshes]'    # pulls in coacd
SceneObject(
    id="widget",
    kind="mesh",
    mesh_path=Path("/abs/path/to/widget.obj"),
    collision="coacd",                # or "hull" (skip decomp if mesh is already convex)
    pose=Pose(xyz=(0.4, 0.0, 0.05)),
    mass=0.1,
)

collision="hull" is a cheap fallback for already-convex meshes — no CoACD install required, but the sandbox does not compute a hull for you; it trusts the mesh is convex. For concave objects, always use collision="coacd".

Pre-decompose once for a bundled asset with the authoring tool:

uv run python scripts/decompose_mesh.py \
  --input /path/to/drill.obj \
  --out-dir assets/objects/custom/drill \
  --name drill --mass 0.3 --center-bottom

Bundled Assets

Robots

packages/robosandbox-core/src/robosandbox/assets/robots/franka_panda/ ships a trimmed copy of the Franka Emika Panda model, adapted from mujoco_menagerie under Apache 2.0. Visual meshes are removed (collision-only, ~160 KB), and the tendon-driven gripper actuator was replaced with a simple position actuator on finger_joint1 so the standard RobotSpec interface (open_qpos / closed_qpos) applies directly. See the LICENSE in that directory for menagerie's attribution.

To bring your own robot:

Scene(
    robot_urdf=Path("/path/to/ur5.urdf"),     # .urdf or .xml
    robot_config=Path("/path/to/ur5.robosandbox.yaml"),  # optional — sibling auto-discovered
    objects=(...),
)

The sidecar YAML tells RoboSandbox which joint is the primary finger, where the end-effector TCP sits, the home pose, and gripper open/closed qpos. See packages/robosandbox-core/src/robosandbox/assets/robots/franka_panda/panda.robosandbox.yaml for the schema.
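To make the shape concrete, here is a hypothetical sidecar sketch covering the four things the text lists. Every key name below is an illustration, not the real schema — the authoritative example is panda.robosandbox.yaml:

```yaml
# HYPOTHETICAL sketch — key names are illustrative.
# See franka_panda/panda.robosandbox.yaml for the actual schema.
gripper:
  finger_joint: finger_joint1   # the primary finger joint
  open_qpos: 0.04               # gripper open position
  closed_qpos: 0.0              # gripper closed position
tcp:
  site: hand_tcp                # where the end-effector TCP sits
home_qpos: [0.0, -0.8, 0.0, -2.4, 0.0, 1.6, 0.8]   # the home pose
```

If the YAML sits next to the URDF with the matching .robosandbox.yaml suffix, it is auto-discovered and robot_config can be omitted.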

Objects

packages/robosandbox-core/src/robosandbox/assets/objects/ycb/ ships 10 pre-decomposed YCB benchmark objects, each with a visual OBJ, a set of CoACD convex hulls, and a sidecar YAML.

YCB id                 Description               Mass (kg)
003_cracker_box        cracker box               0.411
005_tomato_soup_can    tomato soup can           0.349
006_mustard_bottle     mustard bottle            0.603
011_banana             banana                    0.066
013_apple              apple                     0.068
024_bowl               bowl (hollow; 11 hulls)   0.147
025_mug                mug (handled; 15 hulls)   0.118
035_power_drill        power drill               0.895
042_adjustable_wrench  adjustable wrench         0.252
055_baseball           baseball                  0.148

Drop any of them into a task with the @ycb: shorthand:

objects:
  - id: box_1
    kind: mesh
    mesh: "@ycb:003_cracker_box"
    pose: {xyz: [0.4, 0.0, 0.08]}
  - id: soup
    kind: mesh
    mesh: "@ycb:005_tomato_soup_can"
    pose: {xyz: [0.4, 0.15, 0.06]}

Or discover the bundled catalog from Python:

from robosandbox.tasks.loader import list_builtin_ycb_objects
list_builtin_ycb_objects()
# ['003_cracker_box', '005_tomato_soup_can', ..., '055_baseball']

See packages/robosandbox-core/src/robosandbox/assets/objects/ycb/LICENSE for the YCB project's terms.

Architecture

The codebase is deliberately small. Most extension points are plain Protocols, so the seams are easy to find and reason about.

packages/robosandbox-core/
├── src/robosandbox/
│   ├── types.py          Pose, Scene, Observation, Grasp, SkillResult
│   ├── protocols.py      SimBackend, Perception, GraspPlanner,
│   │                     MotionPlanner, RecordSink, VLMClient, Skill
│   ├── sim/              MuJoCo backend (built-in 6-DOF arm + URDF robots)
│   ├── scene/            MJCF builder + URDF/mesh loaders — spawns any Scene into MuJoCo
│   ├── perception/       ground_truth (sim cheat), vlm_pointer (VLM)
│   ├── grasp/            analytic top-down (v0.1)
│   ├── motion/           DLS Jacobian IK + Cartesian interpolation
│   ├── skills/           Pick, PlaceOn, Push, Home, Pour, Tap,
│   │                     OpenDrawer, CloseDrawer, Stack
│   ├── agent/            Planner protocol, VLMPlanner, StubPlanner,
│   │                     ReAct-style Agent with replan loop
│   ├── policy/           Policy protocol + LeRobotPolicyAdapter
│   ├── vlm/              OpenAI-compatible client + JSON recovery
│   ├── recorder/         MP4 + JSONL per episode; `export-lerobot` CLI
│   ├── backends/         RealRobotBackend (sim-to-real Protocol stub)
│   ├── tasks/            Task loader + benchmark runner
│   ├── cli.py            `robo-sandbox` entry point
│   ├── demo.py           Scripted pick (no VLM, no API)
│   └── agentic_demo.py   Full agent loop
└── tests/                Test suite covering types, IK, skills, agent,
                          planner, JSON recovery, VLM pointer projection,
                          URDF import, mesh import, policy adapter,
                          real-backend contract, reachability pre-flight.

Agent loop

IDLE → PLAN → EXECUTE (one skill at a time) → EVALUATE →
                   │ success                      │ failure
                   ▼                              ▼
                 next in plan                   REPLAN ─► (max N times)
                   │                              │
                   ▼                              ▼
                 DONE                           FAILED

One important seam is the planner:

class Planner(Protocol):
    def plan(
        self,
        task: str,
        obs: Observation,
        prior_attempts: list[dict],
    ) -> tuple[list[SkillCall], int]:
        """Returns (plan, n_model_calls). Empty plan == 'already done'."""

VLMPlanner talks to an OpenAI-compatible endpoint with tool-calling and image input. StubPlanner is a regex parser.
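Because the seam is a Protocol, a custom planner is just a class with a matching plan method — no inheritance required. A minimal sketch with stand-in types (the real SkillCall and Observation live in robosandbox.types and carry more fields):

```python
from dataclasses import dataclass, field

# Stand-in types so the sketch is self-contained.
@dataclass
class SkillCall:
    name: str
    params: dict = field(default_factory=dict)

@dataclass
class Observation:
    objects: list = field(default_factory=list)

class GoHomePlanner:
    """Trivial Planner: plans a single 'home' skill with zero model calls.
    Useful as a smoke test for the agent loop."""

    def plan(self, task: str, obs: Observation,
             prior_attempts: list) -> tuple:
        if prior_attempts:
            return [], 0          # empty plan == 'already done' (or give up)
        return [SkillCall("home")], 0

plan, n_calls = GoHomePlanner().plan("go home", Observation(), [])
```

Any object with that signature structurally satisfies the Planner Protocol, so it can be dropped into the agent loop in place of VLMPlanner or StubPlanner.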

Skills as tools

Each skill exposes name, description, and a JSON parameters_schema. VLMPlanner turns that into tool definitions; the model's tool calls become skill dispatches. If you want to add a skill, register it at the robosandbox.skills entry point.
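The skill → tool bridge maps those three attributes directly onto the OpenAI tool-calling format. A sketch — the skill attributes are from the text above, but the exact conversion inside VLMPlanner may differ:

```python
class PushSkill:
    """Example skill exposing the three attributes the planner needs."""
    name = "push"
    description = "Push an object a short distance in a direction."
    parameters_schema = {
        "type": "object",
        "properties": {
            "object_id": {"type": "string"},
            "direction": {"type": "string",
                          "enum": ["forward", "back", "left", "right"]},
        },
        "required": ["object_id", "direction"],
    }

def to_tool(skill) -> dict:
    """Wrap a skill as an OpenAI-style tool definition; the model's tool
    calls then come back as (skill.name, arguments) pairs to dispatch."""
    return {
        "type": "function",
        "function": {
            "name": skill.name,
            "description": skill.description,
            "parameters": skill.parameters_schema,
        },
    }

tool = to_tool(PushSkill())
```

Registering a new skill at the robosandbox.skills entry point is then enough for the planner to advertise it to the model automatically.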

Status

This is still an early project, but the core shape is there. Most moving parts are narrow Protocols, so swapping in a different robot, object set, planner, recorder, or policy is a small integration job instead of a rewrite. The current stack is solid on pick/push/pour/drawer-style tasks. Stacking is still rougher than the rest and remains open work.

The roadmap is the best place to see what already ships and what is still deferred. The short version: better stacking, collision-aware planning, a cleaner real-policy path, and a concrete SO-101 hardware backend are the main next steps.

Development

uv sync --extra dev --extra viewer --extra meshes

uv run ruff check packages/
uv run pytest packages/robosandbox-core/tests/ -q
uv run robo-sandbox-bench --tasks pick_cube pick_cube_franka home pick_ycb_mug

These are the exact commands CI runs on every PR (see .github/workflows/ci.yml).

License

Core: Apache 2.0.

Optional contrib/ plugins carry their own licenses — research-licensed grasp predictors and similar live there; they are opt-in installs, not pulled in by the base source install from packages/robosandbox-core.
