Hosted Teleop Overview #2372

@ruthwikdasyam

Cloud-relayed VR/keyboard teleop. Operator's browser → Cloudflare (CF) Realtime SFU → robot, all over WebRTC, with the broker handling setup only. No port forwarding, no LAN, no public IP: the robot is just an outbound HTTPS/UDP client.

1. Big picture

┌─────────────┐    HTTPS    ┌───────────────────┐   HTTPS+REST   ┌──────────────┐
│ Operator    │ ──────────► │ dimensional-teleop│ ─────────────► │ Cloudflare   │
│ (browser)   │   X-Bearer  │   broker (EC2)    │   CF app key   │ Realtime SFU │
└──────┬──────┘             │   FastAPI         │                └──────┬───────┘
       │                    └───────────────────┘                       │
       │                              ▲                                 │
       │  HTTPS (X-Robot-API-Key) ────┘                                  │
       │                                                                 │
       │           WebRTC (data + video) ◄──────────────────────────────►│
       │                                                                 │
       ▼                                                                 ▼
   Both peers connect to CF; CF bridges them.            ┌──────────────────────────┐
   Broker only handles SETUP — once channels are         │ HostedTeleopModule       │
   bridged, broker is off the data path.                 │ (this package)           │
                                                         └──────────────────────────┘

Two roles for the broker:

  • Auth (operator JWT login + robot API key validation).
  • Session setup (creates CF sessions, bridges DataChannels, pulls video).

Everything operational (PoseStamped, Joy, Twist, video, telemetry) flows
directly: operator's browser ↔ Cloudflare edge ↔ robot. The broker is not a
WebRTC peer.

3. The module — HostedTeleopModule

One DimOS module wraps the entire WebRTC + control plane. It has:

class HostedTeleopModule(Module):
    color_image: In[Image]
    left_controller_output: Out[PoseStamped]
    right_controller_output: Out[PoseStamped]
    buttons: Out[Buttons]
    cmd_vel_stamped: Out[TwistStamped]
    # Operator-side video health (sampled in the browser via getStats(), relayed back).
    video_stats: Out[VideoStats]

  • Four threads, separated so async + sync + hard-rate cycles don't fight:

    Thread                 Runs                                                         Cadence
    ---------------------  -----------------------------------------------------------  -----------
    _loop_thread           asyncio event loop (aiortc + httpx callbacks, frame pulls)   continuous
    _heartbeat_thread      POST /heartbeat → react to SCTP ids in ack                   1 Hz
    _telemetry_thread      snapshot LiveStreamStats → push as JSON                      3 Hz
    _control_loop_thread   engage → publish pose deltas + buttons                       50 Hz fixed

    Sync threads talk to the asyncio loop via run_coroutine_threadsafe(...) (blocking
    on the returned future's result) or call_soon_threadsafe(...) (fire-and-forget).
    All loops watch a shared _stop_event: threading.Event so stop() exits them promptly.
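
The two bridging styles can be illustrated with a stdlib-only sketch. This is not the module's code: `LoopBridge`, `_post_heartbeat`, `heartbeat_blocking` and `send_fire_and_forget` are hypothetical names, and the coroutine only stands in for the real HTTP call.

```python
import asyncio
import threading


class LoopBridge:
    """Owns an asyncio loop on a background thread (stand-in for _loop_thread)."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self.stop_event = threading.Event()
        self.sent: list[str] = []
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def start(self) -> None:
        self._thread.start()

    async def _post_heartbeat(self, seq: int) -> dict:
        # Hypothetical coroutine; a real one would await an HTTP POST here.
        await asyncio.sleep(0)
        return {"ok": True, "seq": seq}

    def heartbeat_blocking(self, seq: int, timeout: float = 2.0) -> dict:
        # Blocking style: submit the coroutine, then wait on the
        # concurrent.futures.Future that run_coroutine_threadsafe returns.
        fut = asyncio.run_coroutine_threadsafe(self._post_heartbeat(seq), self.loop)
        return fut.result(timeout=timeout)

    def send_fire_and_forget(self, msg: str) -> None:
        # Fire-and-forget style: schedule a plain callback on the loop thread.
        self.loop.call_soon_threadsafe(self.sent.append, msg)

    def stop(self) -> None:
        self.stop_event.set()  # sync loops poll this and exit
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join(timeout=2.0)
        self.loop.close()
```

Callbacks scheduled from other threads run in submission order, so a fire-and-forget send issued before a blocking call has already run by the time that call returns.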


4. Transports — what flows where

Channel               Direction          Reliability                 Carries
--------------------  -----------------  --------------------------  ------------------------------------------
cmd_unreliable        operator → robot   unordered, no retransmits   LCM-encoded PoseStamped, Joy, TwistStamped
state_reliable        operator → robot   ordered, reliable           JSON: ping, clock_report, video_stats
state_reliable_back   robot → operator   ordered, reliable           JSON: pong, robot_telemetry
video track           robot → operator   unreliable, paced by camera H.264-encoded frames

All three DataChannels share one SCTP association (the PC is built with MAX_BUNDLE).
SCTP ids are assigned by CF, not by us — robot learns them via the heartbeat
ack; operator learns them in the /bridge-datachannel HTTP response.
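
A minimal sketch of turning a heartbeat ack into negotiated-channel settings, assuming a hypothetical ack shape of `{"sctp_ids": {label: id}}` (the real field names are not given here). The reliability options come from the table above; the keyword names are meant to line up with aiortc's `createDataChannel` parameters (`negotiated`, `id`, `ordered`, `maxRetransmits`), which should be checked against the installed version.

```python
from typing import Any

# Reliability settings taken from the transports table.
CHANNEL_OPTIONS: dict[str, dict[str, Any]] = {
    "cmd_unreliable": {"ordered": False, "maxRetransmits": 0},
    "state_reliable": {"ordered": True},
    "state_reliable_back": {"ordered": True},
}


def channels_to_open(ack: dict[str, Any], already_open: set[str]) -> list[dict[str, Any]]:
    """Return keyword arguments for each known channel that is not open yet.

    Assumes the hypothetical ack shape {"sctp_ids": {label: int, ...}}.
    """
    specs = []
    for label, sctp_id in sorted(ack.get("sctp_ids", {}).items()):
        if label in already_open or label not in CHANNEL_OPTIONS:
            continue  # skip channels we already opened or do not recognise
        specs.append(
            {"label": label, "negotiated": True, "id": sctp_id, **CHANNEL_OPTIONS[label]}
        )
    return specs
```

Each returned dict could then be passed as `pc.createDataChannel(**spec)` on the loop thread. Tracking `already_open` keeps the 1 Hz heartbeat from re-opening channels on every ack.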

Inbound demux is by LCM fingerprint (first 8 bytes of every payload).
One _decoders dict maps fingerprint → typed handler. No envelope needed.
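
The fingerprint demux can be sketched as below. `FingerprintDemux`, `register` and `dispatch` are hypothetical names, and the fingerprint in the usage example is made up; real values come from the generated LCM message classes.

```python
import struct
from typing import Callable

FINGERPRINT_LEN = 8  # LCM prefixes every encoded message with an 8-byte hash


class FingerprintDemux:
    """Routes raw payloads to handlers keyed by their leading 8 bytes."""

    def __init__(self) -> None:
        self._decoders: dict[bytes, Callable[[bytes], None]] = {}

    def register(self, fingerprint: bytes, handler: Callable[[bytes], None]) -> None:
        if len(fingerprint) != FINGERPRINT_LEN:
            raise ValueError("fingerprint must be exactly 8 bytes")
        self._decoders[fingerprint] = handler

    def dispatch(self, payload: bytes) -> bool:
        """Call the matching handler; return False for unknown or short payloads."""
        if len(payload) < FINGERPRINT_LEN:
            return False
        handler = self._decoders.get(payload[:FINGERPRINT_LEN])
        if handler is None:
            return False
        # Pass the full payload: an LCM decode() expects the fingerprint prefix.
        handler(payload)
        return True


# Usage with a made-up fingerprint (big-endian 64-bit value).
received = []
demux = FingerprintDemux()
pose_fp = struct.pack(">Q", 0x0102030405060708)
demux.register(pose_fp, lambda raw: received.append(("pose", len(raw))))
demux.dispatch(pose_fp + b"\x00" * 4)
```

Unknown fingerprints are dropped instead of raising, which suits an unreliable channel where a stray or truncated packet should not stall the callback.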


5. Session lifecycle

start()
  ├─ subscribe color_image → _video_track.set_latest
  ├─ _start_event_loop()                    # spawn _loop_thread + asyncio loop
  ├─ _connect_blocking()                    # sync wrapper around async _connect
  │     └─ on _loop_thread:
  │           1. build PC (MAX_BUNDLE) + addTrack(video) + throwaway DC id=0
  │           2. createOffer / setLocalDescription / wait full ICE gather
  │           3. POST /api/v1/sessions to broker
  │              → {session_id, sdp_answer, cf_session_id}
  │           4. propagate_bundle_candidates(answer)  ← SDP workaround
  │           5. setRemoteDescription(answer) → PC reaches "connected"
  │           6. on connected: _video_track.arm() (discard buffered frames)
  ├─ _start_heartbeat()                     # 1 Hz POST /heartbeat
  ├─ _start_telemetry()                     # 3 Hz JSON push
  └─ _start_control_loop()                  # 50 Hz fixed-rate
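
Step 3 of the connect sequence can be sketched with the standard library alone (the module itself uses httpx). The request body field `sdp_offer` and the broker URL are assumptions. The `X-Robot-API-Key` header name is taken from the diagram in section 1, and its use on this particular endpoint is also an assumption.

```python
import json
import urllib.request

REQUIRED_FIELDS = {"session_id", "sdp_answer", "cf_session_id"}


def build_session_request(broker_url: str, robot_api_key: str, sdp_offer: str) -> urllib.request.Request:
    """Build (but do not send) the POST /api/v1/sessions request."""
    body = json.dumps({"sdp_offer": sdp_offer}).encode("utf-8")  # field name assumed
    return urllib.request.Request(
        url=broker_url.rstrip("/") + "/api/v1/sessions",
        data=body,
        method="POST",
        headers={
            "X-Robot-API-Key": robot_api_key,
            "Content-Type": "application/json",
        },
    )


def parse_session_response(raw: bytes) -> dict:
    """Decode the broker reply and check the three documented fields exist."""
    reply = json.loads(raw)
    missing = REQUIRED_FIELDS - reply.keys()
    if missing:
        raise ValueError(f"session response missing fields: {sorted(missing)}")
    return reply
```

Failing on a missing `sdp_answer` here surfaces a broker error before setRemoteDescription is attempted.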

Steady state — operator joins:
  Heartbeat ack returns 3 SCTP ids → robot opens negotiated DataChannels.
  cmd_unreliable callback decodes bytes → updates _current_poses / _controllers.
  Control loop reads snapshot → publishes Out streams.
  Telemetry thread snapshots stats → JSON over state_reliable_back.
  Robot's camera frames → _video_track → aiortc encoder → operator browser.
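
One way to hold a fixed rate while still reacting quickly to stop() is to pace against absolute deadlines and use the stop event's wait() as the sleep. This is a sketch of that general technique, not the module's actual loop; `run_fixed_rate` and `tick` are hypothetical names.

```python
import threading
import time
from typing import Callable


def run_fixed_rate(tick: Callable[[], None], hz: float, stop_event: threading.Event) -> int:
    """Call tick() at roughly `hz` until stop_event is set; return the tick count."""
    period = 1.0 / hz
    next_deadline = time.monotonic()
    ticks = 0
    while not stop_event.is_set():
        tick()
        ticks += 1
        # Advance by a fixed period so small delays do not accumulate as drift.
        next_deadline += period
        delay = next_deadline - time.monotonic()
        if delay > 0:
            # wait() doubles as the sleep, so setting the event wakes it at once.
            if stop_event.wait(delay):
                break
        else:
            # The tick overran its slot: resynchronise instead of bursting.
            next_deadline = time.monotonic()
    return ticks
```

For the control loop this would be called with `hz=50.0` and a `tick` that reads the latest pose snapshot and publishes the Out streams.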

6. Subclasses + blueprints

HostedTeleopModule leaves actuation abstract. Two concrete subclasses supply it:

Subclass                  For            Adds
------------------------  -------------  ------------------------------------------------------------
HostedArmTeleopModule     arm robots     task_names: dict[str, str] config; stamps frame_id so the
                                         coordinator routes IK targets; packs analog triggers into
                                         Buttons.
HostedTwistTeleopModule   mobile bases   linear_speed / angular_speed config; scales [-1,1] keyboard
                                         → m/s + rad/s.
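
The keyboard scaling in HostedTwistTeleopModule amounts to a clamp and a multiply. A minimal sketch, assuming the two axes arrive as floats and that out-of-range input should be clamped (the function name and the clamping are assumptions):

```python
def scale_keyboard_axes(
    forward: float,
    turn: float,
    linear_speed: float,
    angular_speed: float,
) -> tuple[float, float]:
    """Map normalised [-1, 1] axes to (m/s, rad/s)."""

    def clamp(value: float) -> float:
        # Keep malformed or out-of-range input inside [-1, 1].
        return max(-1.0, min(1.0, value))

    return clamp(forward) * linear_speed, clamp(turn) * angular_speed
```

With `linear_speed=0.5` and `angular_speed=2.0`, full forward with a half left turn, `scale_keyboard_axes(1.0, 0.5, 0.5, 2.0)`, gives 0.5 m/s and 1.0 rad/s.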

CLI-runnable blueprints:

dimos run teleop-hosted-xarm7                            # arm teleop
dimos run teleop-hosted-go2                              # mobile-base teleop
dimos run teleop-hosted-xarm7 hosted-teleop-recorder     # + record to .db

Operator-facing streams (controller/joy/twist) ride WebRTC via the broker.
Robot-internal streams (coordinator commands, recorder inputs) stay on LCM.
