Skip to content

Pim/feat/g1 groot wbc part 3#2225

Closed
Nabla7 wants to merge 45 commits into
mainfrom
pim/feat/g1-groot-wbc-part-3
Closed

Pim/feat/g1 groot wbc part 3#2225
Nabla7 wants to merge 45 commits into
mainfrom
pim/feat/g1-groot-wbc-part-3

Conversation

@Nabla7

@Nabla7 Nabla7 commented May 22, 2026

Copy link
Copy Markdown
Collaborator

Problem

Closes DIM-XXX

Solution

How to Test

Contributor License Agreement

  • I have read and approved the CLA.

Nabla7 and others added 30 commits May 6, 2026 21:35
…era_pose, _publish_tf refactor, spec.py docstring expansion)
The DDS callback registered via ChannelSubscriber.Init(handler, 10)
doesn't fire reliably on macOS, so the adapter timed out waiting for
the first LowState to capture mode_machine.  Switch to passive-mode
Init(None, 0) and Read() per tick.  mode_machine is now hardcoded to
the static value for the 29-DOF G1 instead of read-back-then-echoed.

Also adds has_motor_states() to satisfy the post-refactor
WholeBodyAdapter Protocol, makes _release_sport_mode null-tolerant
(CheckMode returns (status, None) once nothing is active), drops the
now-unused _on_low_state callback.

WebsocketVisModule: restore the arm / disarm / set_dry_run socket.io
handlers and the matching activate / dry_run Out[Bool] ports that an
earlier cleanup pass dropped — the dashboard buttons now publish on
the LCM topics the coordinator already subscribes to.

TransportWholeBodyAdapter: trivial Protocol fix — implement read_odom
returning None so isinstance(adapter, WholeBodyAdapter) passes the
runtime_checkable type check in ControlCoordinator._setup_hardware.
The earlier cleanup pass over-deleted: openarm manipulator support,
the dimos/utils/workspace.py module, the audio STT node_whisper
edits, CI configs, README, command-center package-lock churn, and
two go2 LFS DBs.  None of those are part of the GR00T WBC story.

Restore each to origin/dev's content.  Drop one path that was added
on this branch but doesn't exist on dev (api/requirements.txt).

PR diff is now 24 files / +2623 / -210, all GR00T-WBC-relevant.
Both Mustafa's #1954 (already on dev) and the GR00T-WBC PR added
this method.  After the merge, both definitions live in
coordinator.py — Python uses the second, the first becomes dead
code, and the transport adapter (which depends on hardware_id
being passed) silently picks up its default ``"wholebody"`` topic
prefix instead of the component's id.

Collapse to one definition that passes the union of kwargs both
adapters need: dof, hardware_id, network_interface, domain_id,
address, plus **adapter_kwargs.  Adapters tolerate extras via
their constructor's **_ catch-all.
Switch unitree-g1-groot-wbc to the architecture Mustafa landed in
#1954 (G1WholeBodyConnection Module + TransportWholeBodyAdapter via
LCM bridge), matching unitree-g1-coordinator.  Single G1 low-level
pattern in the codebase now.

  * Delete dimos/hardware/whole_body/unitree/g1/adapter.py — the
    UnitreeG1LowLevelAdapter (single-process DDS) is gone.
  * Rewrite unitree_g1_groot_wbc.py to compose G1WholeBodyConnection
    + ControlCoordinator(adapter_type="transport_lcm") + dashboard
    via autoconnect.
  * Patch wholebody_connection.py to apply the macOS-cyclonedds fixes
    the dropped monolith was carrying:
      - Init(None, 0) instead of Init(callback, 10)
      - Hardcode mode_machine = 5 (the static value for 29-DOF G1)
      - publish_loop polls subscriber.Read() per tick
      - Drop the now-unused _on_low_state callback
_MJCF_PATH was a relative path "data/mujoco_sim/g1_gear_wbc.xml"
which only resolved when dimos was launched from the repo root.
The mujoco viewer subprocess inherits CWD from the launching shell,
so running ``dimos run unitree-g1-groot-wbc-sim`` from any other
directory tripped FileNotFoundError on viewer startup.

get_data("mujoco_sim") resolves the absolute install path and
auto-extracts the LFS tarball on first run.
  1. unitree_g1_groot_wbc_sim.py: resolve _MJCF_PATH via get_data so
     both MujocoSimModule and the DIMOS_MUJOCO_VIEW=1 subprocess open
     the file regardless of shell CWD.  Earlier the relative path
     "data/mujoco_sim/g1_gear_wbc.xml" only worked from the repo root.

  2. wholebody_connection.py: hardcoded mode_machine = 5 has no
     fallback if a future G1 firmware revision changes the value.
     Add a one-shot warning the first time we read a LowState whose
     mode_machine doesn't match.  Self-disables after the first log
     line so it doesn't spam the operator.

  3. test_unitree_g1_groot_wbc.py: composition smoke test verifying
     the three deployed modules, bridge adapter binding (transport_lcm,
     confirming the migration off the deleted unitree_g1 monolith),
     real-hw safety profile (auto_arm=False, auto_dry_run=True,
     default_ramp_seconds=10.0), servo_arms defaults, and all bridge
     + dashboard LCM topics.  No DDS, no hardware — just module
     imports.
Two safety fixes for back-to-back ``dimos run`` cycles on real
hardware:

  1. ``stop()`` now sends a final ``LowCmd_`` with ``mode=0x00``
     (motors disabled) for every motor slot before closing the DDS
     publisher.  Previously the last commanded pose lingered in the
     motor controllers — when the next process opened a publisher
     and re-released sport mode, those stale stiff commands fought
     the new participant during the handoff window, producing
     audible gearbox stress.

  2. ``_release_sport_mode()`` bails immediately if the first
     ``CheckMode()`` reports nothing active (``result is None`` or
     ``{"name": ""}``).  A clean second start now logs "Sport mode
     already released — skipping ReleaseMode" and skips the
     loop-and-poll dance entirely, removing the handoff window.
``dimos/project/test_no_sections.py`` forbids ``# -----`` separator
banners and ``# --- text ---`` inline section headers as a project
style rule.  Three files in this PR carried the pattern from earlier
drafts; convert inline-section to plain ``# text`` and drop pure
separators.

No code-behaviour change.
  * tick_loop.py / groot_wbc_task.py: import the class
    (``from dimos.msgs.geometry_msgs.PoseStamped import PoseStamped``,
    same for ``Twist``), not the module — module-as-type tripped
    mypy ``[valid-type]``.
  * coordinator.py: when constructing ``GrootWBCTask``, narrow ``hw``
    to ``ConnectedWholeBody`` via isinstance before passing
    ``hw.adapter`` (else mypy unions the manipulator + whole-body
    adapter types).  Also raises a clear TypeError if a non-whole-body
    component is referenced by a groot_wbc task.
  * test_unitree_g1_groot_wbc.py: type ``_coordinator_kwargs() ->
    dict[str, Any]`` so list/iter usages on the values typecheck.

``mypy dimos`` is clean (670 files).
Phase 1 — trivial cleanups:
- Delete test_unitree_g1_groot_wbc.py (the "tests don't run anything"
  blueprint test the reviewer asked to drop).
- MujocoEngine.data public property; mujoco_sim_module stops reaching
  into engine._data.
- JointServoTask.start() resets _last_update_time so a non-zero-timeout
  caller doesn't time out on the first tick.
- Split wholebody_connection's two-tasks-in-one publish loop into
  _drain_low_state, _verify_mode_machine_once, _snapshot_motor_imu,
  _publish_motor_state_and_imu helpers; dropped the `sub is not None`
  guard.
- MujocoSimModuleConfig.imu_gyro_sensor_names + imu_accel_sensor_names
  configurable instead of G1-hardcoded; default list still covers the
  common humanoid pelvis names.

Phase 2 — renames and moves:
- groot_wbc_task.py -> g1_groot_wbc_task.py; GrootWBCTask ->
  G1GrootWBCTask (Paul: G1-specific values, name it).
- Fold _groot_wbc_common.py (joint lists, gain tables, ARM_DEFAULT_POSE)
  into g1_groot_wbc_task.py and delete the common file.
- Move dimos/hardware/whole_body/mujoco/g1/adapter.py ->
  dimos/simulation/adapters/whole_body/g1.py. Adapter registry now
  scans both roots.

Phase 3 — safety: G1GrootWBCTask no longer falls back to zero
  when a joint is missing from CoordinatorState. Adds last-known-good
  caches (_cached_q_29, _cached_dq_29, _cached_q_15) and a _state_seen
  guard that refuses to emit a command until at least one fully-
  populated state has been observed. Stops the "packet drop -> snap
  to zero pose -> robot falls" failure mode.

Phase 4 — task registry: new dimos/control/tasks/registry.py.
  Each task module exposes a register(registry) hook; coordinator's
  _create_task_from_config drops 100+ lines of if/elif and becomes
  a single control_task_registry.create(cfg.type, cfg, hardware=...)
  call. Task type "g1_groot_wbc" is the new canonical name;
  "groot_wbc" kept as an alias.

Phase 5 — activate / dry_run from streams to RPC:
  - Drop the `activate: In[Bool]` and `dry_run: In[Bool]` ports on
    ControlCoordinator. Replaced with @rpc set_activated(engaged) and
    @rpc set_dry_run(enabled) -- these are one-shot configuration
    events, not continuous signals.
  - WebsocketVisModule's arm/disarm/set_dry_run socket.io handlers
    call the coordinator's RPC directly via RPCClient(None,
    ControlCoordinator); the matching Out[DimosBool] ports go away.
  - Blueprints lose their now-dead `("activate", DimosBool)` /
    `("dry_run", DimosBool)` LCM transport entries.

Phase 6 — hardware addressing: _create_whole_body_adapter passes a
  single canonical `address` (matching the manipulator/twist-base
  pattern) instead of overloaded address + network_interface.
  Adapter __init__ signatures updated accordingly.

Phase 7 — MuJoCo subprocess hygiene:
  - Move the standalone viewer's logic from mujoco_view_subprocess.py
    into mujoco_engine.view_main() with `python -m
    dimos.simulation.engines.mujoco_engine <mjcf>` entry point.
    One MuJoCo entry-point file, not two. Adds the missing lock
    around `latest_*` shared state, try-finally for LCM teardown,
    and tick-aware render sleep so 60 Hz holds under load.
  - New MujocoViewerModule wraps the subprocess lifecycle as a
    proper dimos Module (start() spawns, stop() terminates, atexit
    fallback). Blueprints declare it instead of running
    subprocess.Popen at import time. Configurable via -o
    mujocoviewermodule.enabled=true; no more DIMOS_MUJOCO_VIEW env
    var.

Phase 8 — one blueprint, --simulation flag:
  - Merge unitree_g1_groot_wbc.py + unitree_g1_groot_wbc_sim.py into
    a single unitree_g1_groot_wbc.py that branches on
    global_config.simulation. Real-hw path: G1WholeBodyConnection +
    transport_lcm adapter, 500 Hz, unarmed + dry-run, servo_arms.
    Sim path: MujocoSimModule + sim_mujoco_g1 adapter, 50 Hz,
    auto-arm, optional MujocoViewerModule.
  - Drop the unitree-g1-groot-wbc-sim registry entry; CLI runs
    `dimos --simulation run unitree-g1-groot-wbc`.
  - Replace get_data() at import time with LfsPath() (Paul's
    don't-block-imports note).
  - Drop ROBOT_INTERFACE / GROOT_MODEL_DIR / DIMOS_DDS_DOMAIN /
    DIMOS_MUJOCO_VIEW env-var reads from blueprints; users override
    via `-o module.field=value` like every other dimos blueprint.

Deferred (Mustafa flagged for separate PRs, captured as TODOs):
- CoordinatorState should carry IMU data so the task doesn't have to
  reach into the adapter for read_imu (TODO in g1_groot_wbc_task.py).
- Drop read_odom from WholeBodyAdapter Protocol -- most adapters
  shouldn't be a source of base pose (TODO in whole_body/spec.py).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Protocol

Promoted the two follow-up TODOs from the PR #2033 review (Mustafa
:381 and :91) into this PR after re-reading the actual scope:

1) IMU on CoordinatorState (Mustafa :381):
   - Add ``imu: dict[hardware_id, IMUState]`` field to CoordinatorState.
   - TickLoop polls every ConnectedWholeBody's read_imu() each tick via
     a new _read_all_imu() helper and stuffs the dict on the state.
   - G1GrootWBCTask reads from state.imu first, falls back to
     self._adapter.read_imu() only when state.imu is empty (e.g. unit
     tests that build a bare CoordinatorState). The state path is the
     non-coupled one going forward.

2) Drop read_odom from WholeBodyAdapter Protocol (Mustafa :91):
   - Remove read_odom from spec.py — the WholeBodyAdapter Protocol no
     longer promises base pose. Most adapters never did (real-hw G1's
     was hardcoded None, transport_lcm's likewise); the few that did
     (sim_mujoco_g1) reported pose the engine module already publishes
     itself, so the adapter polling was redundant.
   - Drop the read_odom impls from SimMujocoG1WholeBodyAdapter and
     TransportWholeBodyAdapter.
   - Drop ControlCoordinatorConfig.publish_odom + ControlCoordinator's
     ``odom: Out[PoseStamped]`` port + TickLoop._publish_odom +
     odom_callback wiring. ~50 lines deleted.
   - Sim path: MujocoSimModule's ``odom`` Out is now the sole producer
     of ``/odom``. Removed the previous ``/sim/odom`` topic override
     since the autoconnect default for ``(odom, PoseStamped)`` is
     ``/odom``.
   - Real-hw path: nothing publishes /odom (matches current behaviour
     — read_odom returned None). A future estimator Module subscribes
     to motor_states + imu and publishes to /odom directly, which is
     the architecturally correct seam.

Not addressed in this PR (genuinely pre-existing, not introduced here):
   Paul's "ideal solution" of converting mujoco_engine.py to run as
   the main thread in its own process (so mujoco_process.py and
   mujoco_engine.py can collapse to one). That's a refactor of the
   pre-existing two-MuJoCo-implementations situation; the PR review
   only flagged "don't add a third", which Phase 7 of the previous
   commit already addressed by folding the standalone viewer into
   mujoco_engine.view_main(). Unifying mujoco_process.py with
   mujoco_engine.py touches the manipulator stack (xarm/piper) and
   is out of scope for this G1-WBC PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Paul's "ideal solution" from the PR #2033 review (paraphrased):
"convert mujoco_engine.py to work as the main thread in a new process
if possible." Done.

WholeBodySimHooks (new):
- Extracts the per-step SHM-bridge logic out of MujocoSimModule into
  a standalone class: pre_step pulls motor commands from SHM and
  applies PD-with-feedforward; post_step writes motor state + gripper
  back to SHM. The Module's _apply_shm_commands + _publish_shm_state
  methods, the latched _latest_pd_* state, and _gripper_joint_to_ctrl
  all move here. Same hook bodies now run regardless of whether the
  engine lives in a thread or a subprocess.

mujoco_engine.engine_main (new entry point):
- Standalone "whole-body sim subprocess" loaded with `python -m
  dimos.simulation.engines.mujoco_engine <mjcf> <shm_key> <dof> [--view]`.
  Owns: MujocoEngine, WholeBodySimHooks, an LCM publisher for /odom
  and /imu so non-adapter consumers (viser viewer etc.) still get
  those topics, plus signal handlers that tear down SHM + LCM
  cleanly on SIGTERM/SIGINT.
- Replaces the previous view-only view_main(): the subprocess now
  runs full physics + viewer instead of just mirroring state from LCM.

MujocoSimModule:
- New config field `engine_mode: Literal["thread", "subprocess"]`
  (default "thread"). Thread mode: today's behaviour, untouched
  except for delegating SHM hooks to WholeBodySimHooks instead of
  inline methods. Subprocess mode: skips in-process engine entirely,
  spawns `engine_main` via subprocess.Popen, terminates it cleanly
  on stop(). macOS uses mjpython for the subprocess (Cocoa main-
  thread requirement); Linux uses sys.executable.
- Subprocess + cameras combo is intentionally rejected with a clear
  error — no cross-process frame buffer yet; users either disable
  cameras or pick thread mode.

MujocoViewerModule deleted:
- Now redundant. The engine subprocess can launch its own viewer
  directly with viewer.launch_passive (on its own main thread) when
  the Module is configured headless=False + engine_mode="subprocess".
  Blueprint loses its standalone _viewer module entry; the engine
  subprocess handles viewing.

What this does NOT do (deliberately):
- Touch the manipulator stack (dimos/simulation/mujoco/mujoco_process.py).
  Paul didn't ask for that — he asked specifically for mujoco_engine.py
  to support subprocess+main-thread, which is what landed. The pre-
  existing two-MuJoCo-implementations situation can collapse to one
  in a separate refactor that addresses xarm + piper too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…t import

Blueprints can inspect ``global_config`` at module top level to pick
between real-hw and sim backends (``unitree_g1_groot_wbc.py`` branches
on ``global_config.simulation`` to decide which WholeBodyAdapter to
wire). Until now, the ``run`` command only handed the overrides off to
``ModuleCoordinator.build`` *after* the blueprint was already imported —
so the import-time branch read the stale defaults and the resulting
BlueprintConfig was missing the modules the user was trying to
configure via ``-o module.field=value``. Repro before this fix:

    $ dimos --simulation run unitree-g1-groot-wbc \\
        -o mujocosimmodule.engine_mode=subprocess
    ValidationError: mujocosimmodule
      Extra inputs are not permitted

(Real-hw branch was chosen at import; ``mujocosimmodule`` slot doesn't
exist in the resulting config.)

Apply the same ``global_config.update(**cli_config_overrides)`` the
``show_config`` command already uses, but earlier — before
``get_by_name_or_exit`` triggers the blueprint module import.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov

codecov Bot commented May 22, 2026

Copy link
Copy Markdown

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
2039 1 2038 69
View the top 1 failed test(s) by shortest run time
dimos.types.test_timestamped::test_timestamp_alignment
Stack Traces | 20.1s run time
test_scheduler = <reactivex.scheduler.threadpoolscheduler.ThreadPoolScheduler object at 0x7c4535fd7470>

    @pytest.mark.self_hosted
    @pytest.mark.skipif_macos_bug
    def test_timestamp_alignment(test_scheduler) -> None:
        speed = 5.0
    
        # ensure that lfs package is downloaded
        get_data("unitree_office_walk")
    
        raw_frames = []
    
        def spy(image):
            raw_frames.append(image.ts)
            print(image.ts)
            return image
    
        # sensor reply of raw video frames
        video_raw = (
            LegacyPickleStore(
                "unitree_office_walk/video", autocast=lambda x: Image.from_numpy(x).to_rgb()
            )
            .stream(speed)
            .pipe(ops.take(30))
        )
    
        processed_frames = []
    
        def process_video_frame(frame):
            processed_frames.append(frame.ts)
            time.sleep(0.5 / speed)
            return frame
    
        # fake reply of some 0.5s processor of video frames that drops messages
        # Pass the scheduler to backpressure to manage threads properly
        fake_video_processor = backpressure(
            video_raw.pipe(ops.map(spy)), scheduler=test_scheduler
        ).pipe(ops.map(process_video_frame))
    
        aligned_frames = align_timestamped(fake_video_processor, video_raw).pipe(ops.to_list()).run()
    
        assert len(raw_frames) == 30
        assert len(processed_frames) >= 2
>       assert len(aligned_frames) >= 2
E       AssertionError: assert 1 >= 2
E        +  where 1 = len([(Image(shape=(720, 1280, 3), format=RGB, dtype=uint8, frame_id='', ts=1779491439.0002713), Image(shape=(720, 1280, 3), format=RGB, dtype=uint8, frame_id='', ts=1779491438.9322538))])

aligned_frames = [(Image(shape=(720, 1280, 3), format=RGB, dtype=uint8, frame_id='', ts=1779491439.0002713), Image(shape=(720, 1280, 3), format=RGB, dtype=uint8, frame_id='', ts=1779491438.9322538))]
fake_video_processor = <reactivex.observable.observable.Observable object at 0x7c4535fe0f50>
process_video_frame = <function test_timestamp_alignment.<locals>.process_video_frame at 0x7c4535fc47c0>
processed_frames = [1779491439.0002713, 1779491439.002054, 1779491440.0448275]
raw_frames = [1779491439.0002713, 1779491439.002054, 1779491439.0034208, 1779491439.8667595, 1779491439.9635835, 1779491439.9690166, ...]
speed      = 5.0
spy        = <function test_timestamp_alignment.<locals>.spy at 0x7c4535fc5300>
test_scheduler = <reactivex.scheduler.threadpoolscheduler.ThreadPoolScheduler object at 0x7c4535fd7470>
video_raw  = <reactivex.observable.observable.Observable object at 0x7c4535fe2a80>

dimos/types/test_timestamped.py:325: AssertionError

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

Comment thread dimos/robot/cli/dimos.py

# this is a workaround until we have a proper way to have delayed-module-choice in blueprints
# ex: vis_module(viewer=global_config.viewer) is wrong (viewer will always be default value) without this patch
# Apply CLI config before importing blueprints that branch on global_config.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this here, unnecessary comment. Undo.

Comment thread dimos/control/task.py

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mustafab0 is it ok if imu lives here?


joint_states = self._read_all_hardware()
state = CoordinatorState(joints=joint_states, t_now=t_now, dt=dt)
imu_states = self._read_all_imu()

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mustafab0 , same here getting imu state is now part of the control tick loop because I need to get it from somewhere and controlcoordinator is the canonical interface for hardware, it makes sense that it would provide imu bit it does extend the responsibility of controlcoordinator beyond that of pure joint control, what are your thoughts here?

)


class _WholeBodySimHooks:

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does bloat mujoco sim module, probably needs to be moved elsewhere along with other robot specific logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants