Closed
19 commits
baa9786
perf: parallelize ffmpeg trims in segment_lerobot_dataset (#166)
bananaSnail Apr 22, 2026
cdfc9a8
Adding script to parse subtask.json and add response field to parquet…
akshay18iitg Apr 22, 2026
00b7054
feat: attach metadata and expose optional standard data format keys (…
shuheng-liu Apr 23, 2026
8cc47ad
perf(pi05): 2.9x LIBERO training throughput via DDP + fused AdamW (#176)
shuheng-liu Apr 24, 2026
2e22da9
feat: per-dataset validation loss reporting (#169)
shuheng-liu Apr 24, 2026
2009918
fix(train): avoid bf16 optimizer state under DDP (#181)
claude Apr 24, 2026
d259616
refactor: fp32 master weights wrapper for DDP path (#181)
claude Apr 24, 2026
467ec0a
fix(verify): live script reproduces the bug + DeepSpeed workaround
claude Apr 24, 2026
1a542af
fix(verify): unconditional assignment of train_micro_batch_size_per_gpu
claude Apr 24, 2026
d5fb2c3
fix(verify): route through accelerator.backward + walk to populated s…
claude Apr 24, 2026
50f5577
chore: remove one-time verification scripts after #181 lands
claude Apr 24, 2026
dd1154e
fix(master_weights): migrate fp32 masters to live device after prepare
shuheng-liu Apr 25, 2026
3be0f52
test(master_weights): add device-migration regressions for #181
claude Apr 25, 2026
8be2cd1
fix(master_weights): subclass Optimizer + rebind LR scheduler (#181)
claude Apr 25, 2026
2a42db2
feat(pi05): add sdpa attention backend behind attention_implementatio…
claude Apr 23, 2026
3f645b2
feat(pi05): gradient checkpointing behind gradient_checkpointing flag
claude Apr 23, 2026
0bfab17
fix(profile_step): wrap optimizer with MasterWeightOptimizer for fair…
shuheng-liu Apr 25, 2026
ed06b54
fix(master_weights): migrate fp32 masters to live device after prepare
shuheng-liu Apr 25, 2026
5bb7f30
fix(profile_step): rebind LR scheduler after MasterWeightOptimizer wrap
shuheng-liu Apr 25, 2026
22 changes: 22 additions & 0 deletions README.md
@@ -45,6 +45,7 @@ OpenTau ($\tau$) is a tool developed by *[Tensor][1]* to bridge this gap, and we
| Visualize dataset with URDF models | ❌ | ❌ | ✅ |
| Simulation Environments for Evaluating Models | ❌ | ✅ | ✅ |
| Create Validation Splits During Training | ❌ | ❌ | ✅ |
| Drop-in Training Profiler & Unused-Param Auditor | ❌ | ❌ | ✅ |
| $\pi^{*}_{0.6}$ style Reinforcement Learning Pipeline | ❌ | ❌ | ✅ |
| Post-training on Human Data| ❌ | ❌ | ✅ |
| Framework | Jax / PyTorch | PyTorch | PyTorch |
@@ -59,6 +60,27 @@ For using local notebooks to train and evaluate models, find the notebooks at [n

For using the Google Colab notebooks to train and evaluate models, find the colab notebooks here: [pi05_training](https://colab.research.google.com/drive/1DeU0lNnEzs1KHo0Nkgh4YKBr-xu9moBM?usp=sharing) and [pi05_evaluation_only](https://colab.research.google.com/drive/1U_AyuH9WYMT4anEWvsOtIT7g01jA0WGm?usp=sharing) respectively.

## Training Diagnostics

OpenTau ships three drop-in scripts under `src/opentau/scripts/` to help you figure out where a training run is spending its time. Each reads the same `TrainPipelineConfig` as `opentau-train`, so they reproduce your exact model / dataset / batch size — no reconfiguration needed.

| Script | What it answers |
|---|---|
| [`profile_step.py`](https://github.com/TensorAuto/OpenTau/blob/main/src/opentau/scripts/profile_step.py) | Where does each training step's wall-clock go? (forward / backward / optimizer / sync phases, with mean / median / p95) |
| [`profile_dataloader.py`](https://github.com/TensorAuto/OpenTau/blob/main/src/opentau/scripts/profile_dataloader.py) | Is the dataloader keeping up with the GPUs? (pure input-pipeline ceiling, no model, no collective) |
| [`find_unused_params.py`](https://github.com/TensorAuto/OpenTau/blob/main/src/opentau/scripts/find_unused_params.py) | Are any parameters dead? (lists params DDP would refuse to sync with `find_unused_parameters=False`) |

A one-command example — see where your training time is going:

```bash
accelerate launch \
--config_file configs/examples/accelerate_ddp_config.yaml \
src/opentau/scripts/profile_step.py \
--config_path=<your_training_config.json>
```

Full tutorial with annotated example output and env-var knobs: [docs/tutorials/benchmarking](https://opentau.readthedocs.io/en/latest/tutorials/benchmarking.html). A worked example investigation that used these tools to find and fix a 2.9× throughput regression is tracked in [issue #177](https://github.com/TensorAuto/OpenTau/issues/177).
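
If you collect per-phase timings yourself, the same `mean / median / p95` summary can be computed with the standard library. The sketch below is a hypothetical helper, not the implementation inside `profile_step.py`; the function name `summarize_phase`, the sample timings, and the nearest-rank percentile rule are all assumptions made for illustration.

```python
import math
import statistics


def summarize_phase(durations_ms):
    """Return mean, median, and nearest-rank p95 for one phase's step timings.

    Hypothetical helper for illustration; profile_step.py may compute its
    percentiles differently (for example with interpolation).
    """
    if not durations_ms:
        raise ValueError("need at least one timing sample")
    ordered = sorted(durations_ms)
    # Nearest-rank percentile: smallest sample with at least 95% of samples <= it.
    rank = math.ceil(0.95 * len(ordered))
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
    }


# Made-up backward-pass timings in milliseconds, with one slow outlier.
backward_ms = [41.0, 40.5, 42.1, 39.8, 40.9, 41.3, 40.2, 95.0, 41.1, 40.7]
summary = summarize_phase(backward_ms)
print(summary)
```

A p95 far above the median, as in this made-up sample, points to occasional slow steps rather than a uniformly slow phase.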

## Checkpoints
We provide fully functioning $\pi_{0.5}$ checkpoints trained with high success rates. We plan to release more models in the near future.

5 changes: 2 additions & 3 deletions configs/examples/accelerate_ddp_config.yaml
@@ -3,12 +3,11 @@ debug: false
 distributed_type: MULTI_GPU
 downcast_bf16: 'no'
 enable_cpu_affinity: false
-gpu_ids: 0,1
 machine_rank: 0
 main_training_function: main
-mixed_precision: 'no'
+mixed_precision: bf16
 num_machines: 1
-num_processes: 2
+num_processes: 8
 rdzv_backend: static
 same_network: true
 tpu_env: []
8 changes: 8 additions & 0 deletions configs/examples/add_subtask_response.json
@@ -0,0 +1,8 @@
{
    "datasets": [
        {
            "repo_id": "TensorAuto/ice-lemonade",
            "root": "/path/to/local/dataset"
        }
    ]
}
50 changes: 50 additions & 0 deletions configs/examples/attach_metadata_annotations.json
@@ -0,0 +1,50 @@
[
    {
        "episode_id": 0,
        "quality": 3,
        "segments": [
            {
                "start": 0,
                "subtask": "approach the blue cup",
                "success": false
            },
            {
                "start": 50,
                "subtask": "pick up the blue cup",
                "success": true
            },
            {
                "start": 120,
                "subtask": "place the cup in the tray",
                "success": true
            }
        ]
    },
    {
        "episode_id": 1,
        "quality": 5,
        "segments": [
            {
                "start": 0,
                "subtask": "pick up the bottle",
                "success": true
            },
            {
                "start": 80,
                "subtask": "place the bottle in the tray",
                "success": true
            }
        ]
    },
    {
        "episode_id": 7,
        "quality": 4,
        "segments": [
            {
                "start": 0,
                "subtask": "reset the workspace",
                "success": true
            }
        ]
    }
]
11 changes: 2 additions & 9 deletions configs/libero/reproduce_pi05_libero_accelerate_config.yaml
@@ -1,20 +1,13 @@
 compute_environment: LOCAL_MACHINE
 debug: false
-deepspeed_config:
-  gradient_accumulation_steps: 1
-  gradient_clipping: 10
-  offload_optimizer_device: none
-  offload_param_device: none
-  zero3_init_flag: false
-  zero_stage: 2
-distributed_type: DEEPSPEED
+distributed_type: MULTI_GPU
 downcast_bf16: 'no'
 enable_cpu_affinity: false
 machine_rank: 0
 main_training_function: main
 mixed_precision: bf16
 num_machines: 1
-num_processes: 4
+num_processes: 8
 rdzv_backend: static
 same_network: true
 tpu_env: []
135 changes: 135 additions & 0 deletions docs/source/concepts.rst
@@ -40,6 +40,8 @@ Metadata is crucial for defining the structure and statistics of a dataset. Hand

Metadata is stored in JSON files (``info.json``, ``stats.json``) and JSONL files (``tasks.jsonl``) within the dataset directory.

.. _standard-data-format:

Standard Data Format
--------------------
To ensure compatibility across different datasets and policies, OpenTau introduces the **Standard Data Format**.
@@ -105,6 +107,139 @@ The following fields are set in ``DatasetMixtureConfig``:

Cameras should be labeled in order of importance (e.g. camera0 is the most important camera, camera1 is the second most important camera, etc.). The model dataset will select the most important cameras to use if num_cams is less than the number of cameras in the dataset.

.. _standard-data-format-optional-keys:

Optional Standard-Format Keys
-----------------------------

On top of the core fields above, ``__getitem__`` emits several *optional*
keys. Some carry the segment metadata attached to a dataset (see
:doc:`tutorials/attach_metadata`); the others are subgoal images sampled
from future video frames. Each optional key is **always present**. Numeric
and image keys are paired with a ``{key}_is_pad`` boolean flag: a
zero-filled value with the flag set to True means "unavailable or masked".
String keys (``response``, ``memory``, ``next_memory``) have no separate
flag: the empty string ``""`` is itself the pad signal, which also keeps
the default PyTorch collate function working (each string key collates to
a list of strings with one entry per sample in the batch).

.. code-block:: python

    {
        # ... core keys above ...

        "memory": str,       # Cumulative subtask summary for the current frame's segment.
                             # Empty string ("") when memory_raw is absent
                             # (legacy / unannotated dataset).
        "next_memory": str,  # Memory string for frame t+1 (same as `memory` within a
                             # segment, differs at segment boundaries). Clipped at episode
                             # end. Empty string when unavailable.

        "speed": torch.LongTensor,         # Scalar; episode length in frames rounded to the nearest
                                           # multiple of 500 (so short <250-frame episodes bucket to 0).
                                           # Populated unconditionally from ``info.json``, so it is
                                           # available on every LeRobotDataset regardless of whether
                                           # the dataset went through ``attach_metadata``. Name is
                                           # historical; think "episode-length bucket".
        "speed_is_pad": torch.BoolTensor,  # True only when the dataset has no episode-length metadata
                                           # (pure VQA / legacy fake datasets) or when the metadata drop
                                           # rolls in _emit_optional_keys fire at training time.

        "mistake": torch.BoolTensor,       # Scalar; True iff the current segment's success flag is False.
        "mistake_is_pad": torch.BoolTensor,

        "quality": torch.LongTensor,       # Scalar in {1,2,3,4,5}; episode-level quality score.
        "quality_is_pad": torch.BoolTensor,

        "subgoal0": torch.Tensor,            # shape (3, H, W), values in [0,1]. A single future frame
                                             # from camera0 sampled either at end-of-segment (with
                                             # probability `subgoal_end_of_segment_prob`) or uniformly
                                             # in [t, t+4 seconds].
        # ...
        "subgoal{num_cams-1}": torch.Tensor,
        "subgoal_is_pad": torch.BoolTensor,  # Single flag covering every `subgoalK`. Subgoals are either
                                             # all present (annotated dataset, not dropped this step) or
                                             # all padded (legacy dataset, or `subgoal_drop_prob` fired).

        # `response` (already in the core fields) may be replaced with ""
        # when `response_drop_prob` fires; consumers read "" as masked,
        # same convention as `memory` / `next_memory`.
    }
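
Two of the scalar keys above are simple enough to sketch in plain Python. The helpers below are hypothetical (``speed_bucket`` and ``mistake_flag`` are not OpenTau functions) and rest on two assumptions the text does not settle: ties such as exactly 250 frames round up, and each annotated segment runs from its ``start`` frame until the next segment's ``start``.

```python
from bisect import bisect_right

SPEED_BUCKET = 500  # bucket width in frames, taken from the description above


def speed_bucket(episode_length):
    """Round an episode length to the nearest multiple of 500 frames.

    Assumption: ties (exactly 250, 750, ... frames) round up; the docs only
    state that episodes shorter than 250 frames bucket to 0.
    """
    return (episode_length + SPEED_BUCKET // 2) // SPEED_BUCKET * SPEED_BUCKET


def mistake_flag(segments, frame_index):
    """Return True iff the segment containing frame_index has success == False.

    `segments` follows the layout of attach_metadata_annotations.json: a list
    of dicts sorted by "start", each carrying a boolean "success".
    """
    starts = [seg["start"] for seg in segments]
    # Index of the last segment whose start is <= frame_index.
    idx = bisect_right(starts, frame_index) - 1
    if idx < 0:
        raise ValueError("frame_index precedes the first segment")
    return not segments[idx]["success"]


# Segments of episode 0 from the example annotation file.
episode0 = [
    {"start": 0, "subtask": "approach the blue cup", "success": False},
    {"start": 50, "subtask": "pick up the blue cup", "success": True},
    {"start": 120, "subtask": "place the cup in the tray", "success": True},
]
print(speed_bucket(249), speed_bucket(1300))  # 0 1500
print(mistake_flag(episode0, 10), mistake_flag(episode0, 60))  # True False
```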

Subgoals are always rank-3 ``(3, H, W)`` regardless of
``n_obs_history`` — they represent a single future target frame, not a
temporal window. All camera slots share a single ``subgoal_is_pad``
flag because subgoals are all-or-none.

Subgoal image **paths** are read from ``meta/info.json`` under the
``subgoals`` key. When the key is absent (the state of every LeRobot
dataset today), ``_load_subgoal_frames`` returns ``{}`` and every
``subgoalK`` tensor comes out zero-filled with ``subgoal_is_pad=True``.
Datasets opt in to subgoals by adding the key; the loader then uses the
frame-selection machinery (end-of-segment vs. uniform ``[t, t+4 s]``)
described below.
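
The two frame-selection modes can be sketched as follows. This is a minimal illustration under stated assumptions, not the loader's code: ``pick_subgoal_frame`` is a hypothetical name, positions are expressed as frame indices, the 4 s window is converted to frames with a caller-supplied ``fps``, and the end-of-segment choice is also clipped to the episode end.

```python
import random


def pick_subgoal_frame(t, segment_end, episode_end, fps,
                       end_of_segment_prob=0.25, horizon_s=4.0, rng=random):
    """Pick the frame index of a subgoal image for the current frame index t.

    Hypothetical helper: t, segment_end and episode_end are frame indices,
    fps is frames per second, rng is anything with random() and randint().
    """
    # Never look past the segment, and never past the episode.
    last_allowed = min(segment_end, episode_end)
    if rng.random() < end_of_segment_prob:
        return last_allowed
    # Otherwise sample uniformly in [t, t + 4 s], clipped to the allowed range.
    upper = min(t + int(round(horizon_s * fps)), last_allowed)
    return rng.randint(t, max(t, upper))


rng = random.Random(0)
frame = pick_subgoal_frame(t=100, segment_end=150, episode_end=400, fps=30, rng=rng)
print(frame)
```

Passing a seeded ``random.Random`` instance, as in the usage line, keeps the choice reproducible in tests.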

Training-time dropout
^^^^^^^^^^^^^^^^^^^^^

Six probability fields on ``DatasetMixtureConfig`` control how often the
optional keys are masked, and where subgoal frames are sampled from,
during a single ``__getitem__`` call; a seventh, boolean field decides
whether the masking also applies to the validation split. Rolls are
independent per sample (each call rolls fresh). ``DataLoader`` workers
seed their own torch RNG, so samples drawn by different workers are
masked independently; seed globally via ``torch.manual_seed(...)`` for
reproducibility.

.. list-table::
   :header-rows: 1
   :widths: 34 14 52

   * - Field
     - Default
     - Effect
   * - ``history_state_drop_prob``
     - ``0.3``
     - Zero-fills ``state`` and historical camera frames (when
       ``n_obs_history > 1``); sets ``obs_history_is_pad`` all True.
   * - ``subgoal_drop_prob``
     - ``0.75``
     - Zero-fills every ``subgoal{K}`` image together and sets the single
       shared ``subgoal_is_pad`` flag to True.
   * - ``subgoal_end_of_segment_prob``
     - ``0.25``
     - Probability that a *present* subgoal is sourced from the end of
       the current segment. Otherwise sampled uniformly in time from
       the current timestamp through ``t + 4 s`` (clipped at segment
       end, then episode end).
   * - ``response_drop_prob``
     - ``0.3``
     - Replaces ``response`` with the empty string. Only rolled when
       subgoals are NOT dropped (dropping both response and subgoals
       would remove the primary task signal).
   * - ``metadata_drop_all_prob``
     - ``0.15``
     - Masks ``speed``, ``mistake``, and ``quality`` together.
   * - ``metadata_drop_each_prob``
     - ``0.05``
     - Per-field independent mask roll for each of ``speed``,
       ``mistake``, ``quality``. Only rolled when the shared drop did
       not fire.
   * - ``val_enable_optional_key_dropout``
     - ``False``
     - Whether the five drop rolls above also fire on the **validation**
       split. Default is ``False`` so validation metrics aren't
       artificially noisy. Set to ``True`` if you want the validation
       distribution to match training. Subgoal *frame* selection
       (end-of-segment vs. uniform in the next 4 s) stays random either
       way; only the masking logic is gated.

``make_dataset`` enforces the ``val_enable_optional_key_dropout`` setting
by giving the validation subset its own shallow-copied dataset instance
with ``enable_optional_key_dropout`` set accordingly; the underlying
``meta`` / ``hf_dataset`` objects are still shared with the training
subset, so the extra copy is cheap.
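
The gating between the drop rolls in the table can be sketched in a few lines. This is an illustration only: ``DropProbs`` and ``roll_drops`` are hypothetical names, the order of the rolls is an assumption, and the real logic lives in ``_emit_optional_keys``.

```python
import random
from dataclasses import dataclass


@dataclass
class DropProbs:
    """Defaults copied from the table above."""
    history_state_drop_prob: float = 0.3
    subgoal_drop_prob: float = 0.75
    response_drop_prob: float = 0.3
    metadata_drop_all_prob: float = 0.15
    metadata_drop_each_prob: float = 0.05


def roll_drops(p, rng, enabled=True):
    """Return a dict of booleans saying which keys are masked for one sample."""
    meta = ("speed", "mistake", "quality")
    if not enabled:  # e.g. validation with val_enable_optional_key_dropout=False
        return {k: False for k in ("history_state", "subgoal", "response", *meta)}
    drops = {"history_state": rng.random() < p.history_state_drop_prob}
    drops["subgoal"] = rng.random() < p.subgoal_drop_prob
    # The response roll only happens when the subgoals survived.
    drops["response"] = (not drops["subgoal"]) and rng.random() < p.response_drop_prob
    if rng.random() < p.metadata_drop_all_prob:
        drops.update({k: True for k in meta})
    else:
        # Per-field rolls only when the shared drop did not fire.
        drops.update({k: rng.random() < p.metadata_drop_each_prob for k in meta})
    return drops


print(roll_drops(DropProbs(), random.Random(0)))
```

If the rolls are independent as sketched, the gating lowers the effective ``response`` drop rate under the defaults to ``(1 - 0.75) * 0.3 = 0.075``, because the response roll only happens for the 25% of samples that keep their subgoals.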

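Legacy datasets that have not been passed through
:mod:`opentau.scripts.attach_metadata` still load: every
annotation-derived optional key appears zero-filled with its ``_is_pad``
flag set to True (or as the empty string for string keys), while
``speed`` is still populated from ``info.json`` as noted above, so
policies that consume these fields can train without gating on dataset
provenance.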
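
On the consumer side, honouring the pad convention comes down to skipping flagged entries. The sketch below uses plain Python lists in place of tensors so it runs without PyTorch; ``masked_mean``, ``string_pad_mask``, and the example batch are hypothetical and only illustrate the convention described in this section.

```python
def masked_mean(values, is_pad):
    """Average `values` over the entries whose pad flag is False.

    Returns None when every entry is padded, so callers can skip the term.
    """
    kept = [v for v, pad in zip(values, is_pad) if not pad]
    return sum(kept) / len(kept) if kept else None


def string_pad_mask(strings):
    """String keys carry no separate flag: the empty string is the pad signal."""
    return [s == "" for s in strings]


# A hypothetical collated batch of three samples; the middle one is padded.
batch = {
    "quality": [3, 0, 5],
    "quality_is_pad": [False, True, False],
    "response": ["pick up the blue cup", "", "place the cup in the tray"],
}
print(masked_mean(batch["quality"], batch["quality_is_pad"]))  # 4.0
print(string_pad_mask(batch["response"]))  # [False, True, False]
```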

Configs
-------
Configuration management is handled using `Draccus <https://github.com/dlwh/draccus>`_.
4 changes: 3 additions & 1 deletion docs/source/installation.rst
@@ -126,4 +126,6 @@ Configure accelerate for your distributed training setup:

accelerate config

-This will create an accelerate config file at `~/.cache/huggingface/accelerate/default_config.yaml`. We are currently using DeepSpeed ZeRO2 for model parallelism distributed training. For an accelerate config example, see `this config file <https://github.com/TensorAuto/OpenTau/blob/main/configs/examples/accelerate_deepspeed_config.yaml>`_ used for our CI pipelines.
+This will create an accelerate config file at `~/.cache/huggingface/accelerate/default_config.yaml`. The recommended setup for models that fit in GPU memory (including the pi05 reference policy) is plain DDP with bf16 mixed precision. For an example, see `configs/examples/accelerate_ddp_config.yaml <https://github.com/TensorAuto/OpenTau/blob/main/configs/examples/accelerate_ddp_config.yaml>`_.
+
+A DeepSpeed ZeRO-2 config is also available at `configs/examples/accelerate_deepspeed_config.yaml <https://github.com/TensorAuto/OpenTau/blob/main/configs/examples/accelerate_deepspeed_config.yaml>`_ for memory-constrained scenarios (very large models, long sequences), but note that it can be significantly slower than DDP on mid-sized policies with many small parameter tensors due to per-parameter gradient-reduce hooks. See issue #177 for benchmarks.
2 changes: 2 additions & 0 deletions docs/source/tutorials.rst
@@ -10,8 +10,10 @@ This section provides step-by-step guides for common tasks in OpenTau, including
tutorials/training
tutorials/inference
tutorials/evaluation
tutorials/benchmarking
tutorials/deployment
tutorials/datasets
tutorials/attach_metadata
tutorials/visualization
RL
tutorials/human_demo