Merged
8 changes: 8 additions & 0 deletions README.md
@@ -64,6 +64,10 @@ We provide fully functioning $\pi_{0.5}$ checkpoints trained with high success r

| Model Checkpoint | Description | Success Rate (%) |
|-------------------------------|---------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|
| [TensorAuto/Robocasa_navigatekitchen][12] | A $\pi_{0.5}$ model checkpoint trained on the Navigate to Kitchen objects task in RoboCasa. | 97% |
| [TensorAuto/Robocasa_Closeupdown][11] | A $\pi_{0.5}$ model checkpoint trained on the Close Oven, Close Toaster, and Close Dishwasher tasks in RoboCasa. | Close Oven: 90% <br> Close Toaster: 70% <br> Close Dishwasher: 90% |
| [TensorAuto/robocasa_Closesideways][10] | A $\pi_{0.5}$ model checkpoint trained on the Close Microwave, Close Cabinet, and Close Fridge tasks in RoboCasa. | Close Microwave: 97% <br> Close Cabinet: 65% <br> Close Fridge: 80% |
| [TensorAuto/pi05_libero_continuous_state][9] | A $\pi_{0.5}$ model checkpoint trained on the LIBERO dataset with continuous states (raw proprioceptive states projected to the model's latent dimension). | 92% |
| [TensorAuto/moka_pot_libero_sft][6] <br> [TensorAuto/moka_pot_RECAP_R0][7] <br> [TensorAuto/moka_pot_RECAP_R1][8] | A $\pi_{0}$ RECAP model checkpoint trained on moka pot task on libero. | 83% <br> 89% <br> 90% |
| [TensorAuto/tPi0.5-libero][2] | A $\pi_{0.5}$ model checkpoint trained on the LIBERO dataset with discrete actions and knowledge insulation. | 98.4% (10) <br> 97.6% (Goal) <br> 100% (Object) <br> 98% (Spatial) |
| [TensorAuto/pi05_base][5] | A $\pi_{0.5}$ model checkpoint converted from the official openpi checkpoint, with language embeddings added. | N/A |
@@ -81,3 +85,7 @@ This project builds on the $\pi$ series of [papers][3] and many other open-sourc
[6]: https://huggingface.co/TensorAuto/moka_pot_libero_sft
[7]: https://huggingface.co/TensorAuto/moka_pot_RECAP_R0
[8]: https://huggingface.co/TensorAuto/moka_pot_RECAP_R1
[9]: https://huggingface.co/TensorAuto/pi05_libero_continuous_state
[10]: https://huggingface.co/TensorAuto/robocasa_Closesideways
[11]: https://huggingface.co/TensorAuto/Robocasa_Closeupdown
[12]: https://huggingface.co/TensorAuto/Robocasa_navigatekitchen
1 change: 1 addition & 0 deletions docs/source/tutorials.rst
@@ -16,3 +16,4 @@ This section provides step-by-step guides for common tasks in OpenTau, including
RL
tutorials/human_demo
tutorials/ros_conversion
tutorials/robocasa
162 changes: 162 additions & 0 deletions docs/source/tutorials/robocasa.rst
@@ -0,0 +1,162 @@
.. _robocasa:

.. _robocasa_client_gist: https://gist.github.com/akshay18iitg/4d299c135c2d384ceb9a283b745baa01

RoboCasa setup and rollout client
=================================

This page explains how to set up **RoboCasa** (kitchen simulation) alongside **OpenTau**, run the **policy WebSocket server** that serves an OpenTau checkpoint, and run the **rollout client** against that server.

The rollout client code **is not shipped in the OpenTau repository**. Use the reference implementation in `robocasa_client_gist`_ (RoboCasa policy client: ``client`` and ``client_async``).

.. note::
   Complete the base :doc:`/installation` steps first. RoboCasa itself is installed **outside** the OpenTau package. OpenTau provides the **policy server**; you run the **client** inside your RoboCasa install (files from the gist, or equivalent).

Overview
--------

The workflow is usually split across machines or terminals:

1. **OpenTau host** — runs the WebSocket policy server, loads ``policy.pretrained_path`` from a training config, and returns **action chunks** via MessagePack.
2. **RoboCasa host** — runs the kitchen sim, JPEG-encodes cameras, and talks to the server. Parallel rollouts use a threaded **async** client that **batches** observations for workers that need a new chunk.

**In this repo**

* ``opentau.scripts.robocasa.server`` — WebSocket server (single-observation or batched requests; replies are **action chunks** per request row).

**Outside this repo**

* ``robocasa.scripts.client`` / ``robocasa.scripts.client_async`` — reference rollout scripts from `robocasa_client_gist`_ (place them under your ``robocasa`` package tree or run them as you prefer).

Server dependencies (``websockets``, ``msgpack``) are in OpenTau’s ``pyproject.toml``. The server needs **OpenCV** (``cv2``) to decode JPEG camera inputs.


Prerequisites
-------------

**Hardware and OS**

* Linux with an NVIDIA GPU is recommended for both RoboCasa (MuJoCo) and OpenTau inference.
* Follow GPU guidance in :doc:`/installation`.

**Python**

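* OpenTau targets **Python 3.10** (see ``requires-python`` in the repo root ``pyproject.toml``). If your RoboCasa environment uses a different Python version, either align the two or keep them in separate environments; the server and client only communicate over WebSocket.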

**RoboCasa simulation**

RoboCasa is not fully installed by ``pip install opentau``. Install the simulator and assets from upstream:

* `RoboCasa installation <https://robocasa.ai/docs/introduction/installation.html>`_

**OpenTau**

Install OpenTau as in :doc:`/installation` (e.g. ``uv sync`` or ``pip install -e .``).


Policy server (OpenTau)
-----------------------

The server listens on WebSocket and uses **MessagePack** for request and response bodies.

**Inference**

* Each successful call uses ``policy.sample_actions`` (not ``select_action``): the model predicts a **temporal chunk** of actions. The last dimension is trimmed or zero-padded to ``--robocasa_action_dim``.
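
The trim-or-pad step on the last dimension can be illustrated with a small sketch. The helper name ``fit_action_dim`` is hypothetical and not part of OpenTau, and the real server may operate on tensors rather than nested lists; the example only shows the intended effect on each action row.

.. code-block:: python

   from typing import List

   Chunk = List[List[float]]


   def fit_action_dim(chunk: Chunk, action_dim: int) -> Chunk:
       """Trim or zero-pad each action row to exactly ``action_dim`` values."""
       if action_dim <= 0:
           raise ValueError("action_dim must be positive")
       fitted = []
       for row in chunk:
           row = list(row)[:action_dim]            # drop extra trailing values
           row += [0.0] * (action_dim - len(row))  # zero-pad rows that are too short
           fitted.append(row)
       return fitted


   wide = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
   print(fit_action_dim(wide, 2))  # [[0.1, 0.2], [0.4, 0.5]]
   print(fit_action_dim(wide, 4))  # [[0.1, 0.2, 0.3, 0.0], [0.4, 0.5, 0.6, 0.0]]

Trimming silently discards trailing values, so a width that does not match training is more likely to show up as poor rollouts than as an error.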

**Requests**

* **Single observation:** top-level dict with ``images`` (JPEG bytes per camera name), ``state`` (list of floats), ``prompt`` (string).
* **Batch:** ``{ "batch": true, "items": [ { ... same fields ... }, ... ] }``.

**Responses**

* **Single:** one chunk as nested lists: ``[[float, ...], ...]`` — shape ``(T, action_dim)`` with ``T`` equal to the policy’s predicted horizon (e.g. ``n_action_steps``).
* **Batch:** ``[ chunk_0, chunk_1, ... ]`` — one chunk per ``items`` row, same order.
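
The request and response layouts above can be sketched in plain Python. The helper names (``make_item``, ``make_batch``, ``check_batch_reply``) and the camera name are hypothetical; MessagePack encoding and the WebSocket round trip are left out so that the example stays self-contained.

.. code-block:: python

   from typing import Any, Dict, List


   def make_item(images: Dict[str, bytes], state: List[float], prompt: str) -> Dict[str, Any]:
       """Build one observation with the three fields the server expects."""
       return {"images": images, "state": [float(x) for x in state], "prompt": prompt}


   def make_batch(items: List[Dict[str, Any]]) -> Dict[str, Any]:
       """Wrap several observations in the batch envelope."""
       return {"batch": True, "items": items}


   def check_batch_reply(reply: Any, n_items: int, action_dim: int) -> List[int]:
       """Validate a decoded batch reply; return the chunk length T of each row."""
       if not isinstance(reply, list) or len(reply) != n_items:
           raise ValueError("expected one chunk per request item")
       for chunk in reply:
           if not chunk or any(len(step) != action_dim for step in chunk):
               raise ValueError("chunk is not shaped (T, action_dim)")
       return [len(chunk) for chunk in reply]


   # Placeholder bytes stand in for a real JPEG; "camera_0" is an assumed name.
   item = make_item({"camera_0": b"\xff\xd8\xff"}, [0.0, 1.0], "close the oven")
   payload = make_batch([item, item])

   # A fabricated reply with T=10 and action_dim=16, used only to show the check.
   fake_reply = [[[0.0] * 16 for _ in range(10)] for _ in range(2)]
   print(check_batch_reply(fake_reply, n_items=2, action_dim=16))  # [10, 10]

Checking the reply shape on the client side gives an early, readable error when ``--robocasa_action_dim`` and the environment disagree.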

**Entry point**

.. code-block:: bash

   python -m opentau.scripts.robocasa.server \
       --config_path /path/to/train_config.json

**RoboCasa-specific flags** (must appear **before** normal OpenTau config flags; they are parsed first and stripped from ``sys.argv``):

.. list-table::
   :header-rows: 1
   :widths: 28 72

   * - Flag
     - Meaning
   * - ``--robocasa_host``
     - Bind address (default ``0.0.0.0``). Use ``127.0.0.1`` to listen only locally.
   * - ``--robocasa_port``
     - TCP port (default ``8765``).
   * - ``--robocasa_action_dim``
     - Flat action width for reply padding/trimming (default ``16``; align with RoboCasa env and training).
   * - ``--robocasa_torch_compile``
     - ``true`` / ``false`` — whether to compile ``sample_actions`` when supported (default ``true``).
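
One common way to parse a group of flags first and hand the rest to another parser is ``argparse.ArgumentParser.parse_known_args``. The sketch below is an assumption about how such pre-parsing could look, not a copy of the server's code; the function names are hypothetical, and the defaults are taken from the table above.

.. code-block:: python

   import argparse
   from typing import List, Tuple


   def parse_bool(text: str) -> bool:
       """Accept the ``true`` / ``false`` spelling used on the command line."""
       lowered = text.strip().lower()
       if lowered in ("true", "1", "yes"):
           return True
       if lowered in ("false", "0", "no"):
           return False
       raise argparse.ArgumentTypeError(f"expected true or false, got {text!r}")


   def split_robocasa_args(argv: List[str]) -> Tuple[argparse.Namespace, List[str]]:
       """Return the RoboCasa options and the arguments left for the config parser."""
       parser = argparse.ArgumentParser(add_help=False, allow_abbrev=False)
       parser.add_argument("--robocasa_host", default="0.0.0.0")
       parser.add_argument("--robocasa_port", type=int, default=8765)
       parser.add_argument("--robocasa_action_dim", type=int, default=16)
       parser.add_argument("--robocasa_torch_compile", type=parse_bool, default=True)
       return parser.parse_known_args(argv)


   opts, remaining = split_robocasa_args(
       ["--robocasa_port", "9000", "--config_path", "/path/to/train_config.json"]
   )
   print(opts.robocasa_port)  # 9000
   print(remaining)           # ['--config_path', '/path/to/train_config.json']
   # A real entry point could then set sys.argv[1:] = remaining before config parsing.

Because the real server documents an ordering requirement, keep the ``--robocasa_*`` flags first on the command line even though this sketch would accept them anywhere.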

**Example**

.. code-block:: bash

   python -m opentau.scripts.robocasa.server \
       --robocasa_host 0.0.0.0 \
       --robocasa_port 8765 \
       --robocasa_action_dim 16 \
       --config_path /path/to/train_config.json

The training config must define ``policy.pretrained_path``, and its remaining settings must be compatible with your checkpoint.


Rollout client (RoboCasa environment)
-------------------------------------

Get the client sources from `robocasa_client_gist`_.

Typical layout after copying into a RoboCasa checkout:

* ``robocasa/scripts/client.py`` — single-env style client (if provided in the gist).
* ``robocasa/scripts/client_async.py`` — threaded client that **batches** observations for workers that need a **new action chunk**, sends one WebSocket message per batch, receives one chunk per batch row, then **steps the simulator for every action in each chunk** before querying the server again.
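
The chunked stepping described above (query once, then execute every action in the returned chunk) can be sketched with a toy environment. ``CountingEnv``, ``run_episode``, and ``fake_policy`` are hypothetical stand-ins; the gist client talks to a real simulator and a real server instead.

.. code-block:: python

   from typing import Callable, List

   Chunk = List[List[float]]


   class CountingEnv:
       """Toy simulator: counts steps and ends the episode after ``max_steps``."""

       def __init__(self, max_steps: int) -> None:
           self.max_steps = max_steps
           self.steps = 0

       def observe(self) -> dict:
           return {"state": [float(self.steps)]}

       def step(self, action: List[float]) -> bool:
           self.steps += 1
           return self.steps >= self.max_steps  # True once the episode is done


   def run_episode(env: CountingEnv, query_policy: Callable[[dict], Chunk]) -> int:
       """Run one episode and return how many times the policy was queried."""
       queries = 0
       done = False
       while not done:
           chunk = query_policy(env.observe())
           if not chunk:
               raise ValueError("policy returned an empty chunk")
           queries += 1
           for action in chunk:  # consume the whole chunk before asking again
               done = env.step(action)
               if done:
                   break
       return queries


   def fake_policy(obs: dict) -> Chunk:
       return [[0.0] * 16 for _ in range(10)]  # T=10 steps, action_dim=16


   env = CountingEnv(max_steps=25)
   print(run_episode(env, fake_policy), env.steps)  # 3 25

With a chunk length of 10, a 25-step episode needs only three server round trips, which is the main reason to consume whole chunks.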

If your PandaOmron-style env expects actions in a particular layout, the gist may include a ``convert_action_pi05`` helper (or equivalent); wire it to match ``create_env`` / your task.

**Example (async / batched client)**

.. code-block:: bash

   python -m robocasa.scripts.client_async ENV_NAME \
       --host localhost \
       --port 8765

Replace ``ENV_NAME`` with a registered RoboCasa kitchen task. Common options (see the gist for the exact CLI):

* ``--num-rollouts`` — total episodes.
* ``--num-parallel`` — parallel env threads (batch size is at most the count of workers requesting a chunk at once).
* ``--seed``, ``--split``, ``--output-dir``, ``--max-episode-steps``, ``--render``, ``--jpeg-quality``.

**Environment variables** (if supported by the gist client)

* ``ROBOCASA_POLICY_HOST`` — default host.
* ``ROBOCASA_POLICY_PORT`` — default port.
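
If the client honours these variables, resolving them could look like the sketch below. The function name ``resolve_server`` and the fallback values (``localhost`` and ``8765``, taken from the examples on this page) are assumptions; check the gist for the actual behavior.

.. code-block:: python

   import os
   from typing import Mapping, Tuple


   def resolve_server(env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
       """Read the policy server address, falling back to assumed defaults."""
       host = env.get("ROBOCASA_POLICY_HOST", "localhost")
       port = int(env.get("ROBOCASA_POLICY_PORT", "8765"))
       return host, port


   # Pass an explicit mapping here so the example does not depend on the shell.
   host, port = resolve_server({"ROBOCASA_POLICY_PORT": "9000"})
   print(f"ws://{host}:{port}")  # ws://localhost:9000

Explicit ``--host`` and ``--port`` arguments would normally take precedence over values read this way.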


Protocol and outputs (summary)
------------------------------

* **Transport:** WebSocket binary frames, MessagePack.
* **Client → server (batch):** ``{ "batch": true, "items": [ { "images": {...}, "state": [...], "prompt": "..." }, ... ] }``.
* **Server → client (batch):** list of action chunks; each chunk is ``(T, action_dim)`` as nested lists.
* **Rollout output:** directory with ``rollouts.json`` and, when not rendering on screen, per-rollout MP4s per camera (behavior as implemented in the gist).

For server implementation details, see ``src/opentau/scripts/robocasa/server.py``. For client behavior and options, see `robocasa_client_gist`_.


Troubleshooting
---------------

* **Import errors for ``robocasa``** — Install RoboCasa per upstream docs; run the client from that environment.
* **Server JPEG decode errors** — Install OpenCV for Python on the server (``cv2``).
* **Port in use** — Change ``--robocasa_port`` / client ``--port``.
* **Action shape / chunk mismatch** — Align ``--robocasa_action_dim`` with training and env; ensure the client consumes **chunks** (multiple steps per server reply) if you use chunking inference.
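
For the port-in-use case, a quick check before starting the server is to try binding the port yourself. This is a generic standard-library sketch, not an OpenTau utility, and ``port_is_free`` is a hypothetical name.

.. code-block:: python

   import socket


   def port_is_free(host: str, port: int) -> bool:
       """Return True if a TCP socket can currently bind to (host, port)."""
       with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
           try:
               sock.bind((host, port))
           except OSError:
               return False
       return True


   if __name__ == "__main__":
       print(port_is_free("127.0.0.1", 8765))

The result is only a snapshot: another process can still claim the port between the check and the server start.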
13 changes: 13 additions & 0 deletions src/opentau/scripts/robocasa/__init__.py
@@ -0,0 +1,13 @@
# Copyright 2026 Tensor Auto Inc. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.