2 changes: 2 additions & 0 deletions CLAUDE.md
@@ -32,6 +32,8 @@ These override defaults — read them before running anything.
- **Per-rank branch decisions that fire collectives must be OR-reduced first.** When a `forward` takes a Python-level branch based on what the local micro-batch contains (e.g. `if has_response: embed_language_tokens(...)` in `embed_prefix`), use `_global_or_branch_decisions` in `src/opentau/policies/pi07/low_level/modeling_pi07_low_level.py` — one SUM all-reduce that both OR-reduces the per-rank decisions and asserts cross-rank presence agreement. Adding a new optional branch in distributed `forward` without going through it (or an equivalent pre-branch all-reduce) is the same bug.
- **Composite forward units must be a single `nn.Module`.** Bundle multi-component decoder steps (e.g. a backbone layer paired with an action-expert layer) into one `nn.Module` so FSDP's all-gather hook prefetches every sub-component together — like `InterleavedDecoderLayer` in `src/opentau/policies/pi07/gemma3_with_expert.py`. Calling sub-components directly on a separately-wrapped layer (`layer.input_layernorm(...)`, `layer.self_attn.q_proj(...)`) bypasses the hook and triggers mismatched all-gather sizes across ranks.

6. **Tests that mutate module-level state must save and restore it via `try`/`finally`.** Module-level dedup flags like `_CONTROL_MODE_WARNED` (set) and `_SKIP_TIMESTAMP_WARNED` (bool) in `src/opentau/datasets/lerobot_dataset.py` persist across tests within the same pytest-xdist worker process. A test that flips the flag to exercise the "first-time" branch and then leaves it flipped will silently mask any later test that wants to assert the warning fires again — a regression that won't show up locally but can flake under different `pytest-xdist` shard distributions. Pattern: capture the original up-front, mutate inside `try`, restore in `finally`. See `test_skip_timestamp_warning_emitted_once_per_process` in `tests/datasets/test_datasets.py` for the canonical shape.
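
The capture, mutate, restore shape described in item 6 can be sketched as follows. This is a minimal illustration, not the repository's test: `fake_dataset` is a hypothetical stand-in for `lerobot_dataset.py`, and `warn_skip_timestamp_once` is an assumed helper that mimics a warn-once dedup flag.

```python
import types
import warnings

# Hypothetical stand-in module carrying a warn-once dedup flag. The real
# flags live in src/opentau/datasets/lerobot_dataset.py.
fake_dataset = types.ModuleType("fake_dataset")
fake_dataset._SKIP_TIMESTAMP_WARNED = False


def warn_skip_timestamp_once(mod=fake_dataset):
    """Assumed helper: emit the warning only on the first call per process."""
    if not mod._SKIP_TIMESTAMP_WARNED:
        warnings.warn("skipping timestamp check", UserWarning)
        mod._SKIP_TIMESTAMP_WARNED = True


def test_warning_emitted_once():
    # Capture the original value before touching anything.
    original = fake_dataset._SKIP_TIMESTAMP_WARNED
    try:
        # Force the "first-time" branch regardless of earlier tests.
        fake_dataset._SKIP_TIMESTAMP_WARNED = False
        with warnings.catch_warnings(record=True) as caught:
            warnings.simplefilter("always")
            warn_skip_timestamp_once()
            warn_skip_timestamp_once()
        assert len(caught) == 1
    finally:
        # Restore even if the assertion above fails.
        fake_dataset._SKIP_TIMESTAMP_WARNED = original
```

For a set-valued flag such as `_CONTROL_MODE_WARNED`, the captured original should be a copy (for example `set(flag)`), since keeping a reference to the same set object would not preserve its earlier contents.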

## Project overview

OpenTau is Tensor's open-source PyTorch training toolchain for vision-language-action (VLA) models — a fork of LeRobot with extra capabilities (heterogeneous-dataset co-training, discrete actions for π₀.₅, knowledge insulation, dropout in PaliGemma, π*₀.₆-style RL, validation splits, profilers). Any LeRobot-compliant policy and dataset works directly. Pinned to **Python 3.10**.