Cloud-relayed VR/keyboard teleop. Operator's browser → broker (CF SFU) → robot, all over WebRTC. No port forwarding, no LAN, no public IP — the robot is just an outbound HTTPS/UDP client.
1. Big picture
┌─────────────┐    HTTPS     ┌───────────────────┐   HTTPS+REST   ┌──────────────┐
│  Operator   │ ──────────►  │ dimensional-teleop│ ─────────────► │  Cloudflare  │
│  (browser)  │   X-Bearer   │   broker (EC2)    │   CF app key   │ Realtime SFU │
└──────┬──────┘              │      FastAPI      │                └──────┬───────┘
       │                     └───────────────────┘                       │
       │                               ▲                                 │
       │   HTTPS (X-Robot-API-Key) ────┘                                 │
       │                                                                 │
       │   WebRTC (data + video) ◄──────────────────────────────────────►│
       │                                                                 │
       ▼                                                                 ▼
Both peers connect to CF; CF bridges them.                  ┌──────────────────────────┐
Broker only handles SETUP; once channels are                │   HostedTeleopModule     │
bridged, broker is off the data path.                       │   (this package)         │
                                                            └──────────────────────────┘
Two roles for the broker:
- Auth (operator JWT login + robot API key validation).
- Session setup (creates CF sessions, bridges DataChannels, pulls video).
Everything operational (PoseStamped, Joy, Twist, video, telemetry) flows
direct: operator's browser ↔ Cloudflare edge ↔ robot. Broker is not a
WebRTC peer.
3. The module — HostedTeleopModule
One DimOS module wraps the entire WebRTC + control plane. It has:
class HostedTeleopModule(Module):
    color_image: In[Image]
    left_controller_output: Out[PoseStamped]
    right_controller_output: Out[PoseStamped]
    buttons: Out[Buttons]
    cmd_vel_stamped: Out[TwistStamped]
    video_stats: Out[VideoStats]  # operator-side video health (sampled in browser via getStats(), relayed back)
Four threads, separated so async + sync + hard-rate cycles don't fight:
| Thread | Runs | Cadence |
|---|---|---|
| _loop_thread | asyncio event loop (aiortc + httpx callbacks, frame pulls) | continuous |
| _heartbeat_thread | POST /heartbeat → react to SCTP ids in ack | 1 Hz |
| _telemetry_thread | snapshot LiveStreamStats → push as JSON | 3 Hz |
| _control_loop_thread | engage → publish pose deltas + buttons | 50 Hz fixed |
Sync threads talk to the asyncio loop via run_coroutine_threadsafe(...) (blocking)
or call_soon_threadsafe(...) (fire-and-forget). All loops watch a shared
_stop_event: threading.Event so stop() exits them promptly.
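The handoff between sync threads and the asyncio loop can be sketched with the standard library alone. This is a minimal illustration, not the module's code: `LoopRunner`, `call_blocking`, `fire_and_forget`, and `add` are hypothetical names, and the real module's threads do more than this.

```python
import asyncio
import threading


class LoopRunner:
    """Owns an asyncio loop on a background thread (hypothetical helper)."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self.stop_event = threading.Event()  # shared flag the sync loops would poll
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def start(self) -> None:
        self._thread.start()

    def call_blocking(self, coro, timeout: float = 5.0):
        """Submit a coroutine and block the calling thread until it finishes."""
        future = asyncio.run_coroutine_threadsafe(coro, self.loop)
        return future.result(timeout=timeout)

    def fire_and_forget(self, fn, *args) -> None:
        """Schedule a plain callback on the loop without waiting for it."""
        self.loop.call_soon_threadsafe(fn, *args)

    def stop(self) -> None:
        self.stop_event.set()
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join(timeout=2.0)
        self.loop.close()


async def add(a: int, b: int) -> int:
    await asyncio.sleep(0)  # stand-in for real async I/O
    return a + b


def demo() -> tuple[int, list[str]]:
    runner = LoopRunner()
    runner.start()
    seen: list[str] = []
    total = runner.call_blocking(add(2, 3))
    runner.fire_and_forget(seen.append, "ping")
    # Loop callbacks run in FIFO order, so this second blocking call
    # only returns after the callback above has executed.
    runner.call_blocking(add(0, 0))
    runner.stop()
    return total, seen
```

Note that `run_coroutine_threadsafe` itself returns immediately with a `concurrent.futures.Future`; the blocking part is the `.result()` call on it.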
4. Transports — what flows where
| Channel | Direction | Reliability | Carries |
|---|---|---|---|
| cmd_unreliable | operator → robot | unordered, no retransmits | LCM-encoded PoseStamped, Joy, TwistStamped |
| state_reliable | operator → robot | ordered, reliable | JSON: ping, clock_report, video_stats |
| state_reliable_back | robot → operator | ordered, reliable | JSON: pong, robot_telemetry |
| video track | robot → operator | unreliable, paced by camera | H.264-encoded frames |
All three datachannels share one SCTP association (because MAX_BUNDLE).
SCTP ids are assigned by CF, not by us — robot learns them via the heartbeat
ack; operator learns them in the /bridge-datachannel HTTP response.
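As a rough sketch of how the reliability column could map onto negotiated DataChannels, the helper below builds keyword arguments in the shape accepted by aiortc's `RTCPeerConnection.createDataChannel` (`ordered`, `maxRetransmits`, `negotiated`, `id`). The `ChannelSpec` and `channel_kwargs` names and the `{label: sctp_id}` shape of the ack are assumptions for illustration; the broker's actual ack format is not shown in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChannelSpec:
    label: str
    ordered: bool
    max_retransmits: Optional[int]  # None means fully reliable


# Reliability settings taken from the transports table.
SPECS = {
    "cmd_unreliable": ChannelSpec("cmd_unreliable", ordered=False, max_retransmits=0),
    "state_reliable": ChannelSpec("state_reliable", ordered=True, max_retransmits=None),
    "state_reliable_back": ChannelSpec("state_reliable_back", ordered=True, max_retransmits=None),
}


def channel_kwargs(label: str, sctp_id: int) -> dict:
    """Build createDataChannel kwargs for a channel whose id was assigned remotely."""
    spec = SPECS[label]
    return {
        "label": spec.label,
        "ordered": spec.ordered,
        "maxRetransmits": spec.max_retransmits,
        "negotiated": True,  # both sides agree on the id out of band
        "id": sctp_id,
    }


def kwargs_from_ack(ack: dict[str, int]) -> list[dict]:
    """Hypothetical ack shape: {channel label: SCTP id}."""
    return [channel_kwargs(label, sctp_id) for label, sctp_id in sorted(ack.items())]
```

With a live peer connection, each dict would be passed as `pc.createDataChannel(**kwargs)`.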
Inbound demux is by LCM fingerprint (first 8 bytes of every payload).
One _decoders dict maps fingerprint → typed handler. No envelope needed.
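A minimal sketch of fingerprint-based demux follows. The fingerprint value and the toy two-double body are placeholders, not real LCM hashes or message layouts, and `FingerprintDemux` is a hypothetical name; in practice the registered handlers would presumably be the `decode` functions of LCM-generated types.

```python
import struct
from typing import Any, Callable, Optional

FINGERPRINT_LEN = 8  # LCM prefixes every encoded message with an 8-byte hash


class FingerprintDemux:
    """Routes raw payloads to typed handlers by their leading 8 bytes."""

    def __init__(self) -> None:
        self._decoders: dict[bytes, Callable[[bytes], Any]] = {}

    def register(self, fingerprint: bytes, handler: Callable[[bytes], Any]) -> None:
        if len(fingerprint) != FINGERPRINT_LEN:
            raise ValueError("fingerprint must be exactly 8 bytes")
        self._decoders[fingerprint] = handler

    def dispatch(self, payload: bytes) -> Optional[Any]:
        if len(payload) < FINGERPRINT_LEN:
            return None  # too short to carry a fingerprint
        handler = self._decoders.get(payload[:FINGERPRINT_LEN])
        if handler is None:
            return None  # unknown type: drop silently
        return handler(payload)


# Placeholder fingerprint and toy body, for illustration only.
TOY_TWIST_FP = bytes.fromhex("0102030405060708")


def decode_toy_twist(payload: bytes) -> tuple[float, float]:
    return struct.unpack(">dd", payload[FINGERPRINT_LEN:])


demux = FingerprintDemux()
demux.register(TOY_TWIST_FP, decode_toy_twist)
packet = TOY_TWIST_FP + struct.pack(">dd", 0.5, -0.25)
```

Because the type is identified by the payload itself, several message types can share one channel without a wrapper envelope.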
5. Session lifecycle
start()
 ├─ subscribe color_image → _video_track.set_latest
 ├─ _start_event_loop()        # spawn _loop_thread + asyncio loop
 ├─ _connect_blocking()        # sync wrapper around async _connect
 │   └─ on _loop_thread:
 │       1. build PC (MAX_BUNDLE) + addTrack(video) + throwaway DC id=0
 │       2. createOffer / setLocalDescription / wait full ICE gather
 │       3. POST /api/v1/sessions to broker
 │          → {session_id, sdp_answer, cf_session_id}
 │       4. propagate_bundle_candidates(answer)   ← SDP workaround
 │       5. setRemoteDescription(answer) → PC reaches "connected"
 │       6. on connected: _video_track.arm() (discard buffered frames)
 ├─ _start_heartbeat()         # 1 Hz POST /heartbeat
 ├─ _start_telemetry()         # 3 Hz JSON push
 └─ _start_control_loop()      # 50 Hz fixed-rate
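The `set_latest` / `arm()` pair in the lifecycle above suggests a single-slot frame holder. The sketch below is one plausible reading, not the module's implementation: `LatestFrameSlot` and `take` are hypothetical names, and the exact semantics of "discard buffered frames" are assumed.

```python
import threading
from typing import Any, Optional


class LatestFrameSlot:
    """Keeps only the newest frame; hands nothing out until armed."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._frame: Optional[Any] = None
        self._armed = False

    def set_latest(self, frame: Any) -> None:
        # Producer side: overwrite rather than queue, so stale frames never pile up.
        with self._lock:
            self._frame = frame

    def arm(self) -> None:
        # Called once the connection is up: drop whatever was captured before.
        with self._lock:
            self._frame = None
            self._armed = True

    def take(self) -> Optional[Any]:
        # Consumer side: return the newest frame at most once.
        with self._lock:
            if not self._armed:
                return None
            frame, self._frame = self._frame, None
            return frame
```

Overwriting instead of queueing keeps latency bounded: the encoder always sees the most recent camera frame, at the cost of dropping frames it could not keep up with.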
Steady state, once the operator joins:
- Heartbeat ack returns 3 SCTP ids → robot opens negotiated DataChannels.
- cmd_unreliable callback decodes bytes → updates _current_poses / _controllers.
- Control loop reads snapshot → publishes Out streams.
- Telemetry thread snapshots stats → JSON over state_reliable_back.
- Robot's camera frames → _video_track → aiortc encoder → operator browser.
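A fixed-rate loop that also exits promptly on a stop event might look like the sketch below. `run_fixed_rate` and its parameters are hypothetical names, and the resync-on-overrun policy is an assumption; the module's actual control loop may handle missed deadlines differently.

```python
import threading
import time
from typing import Callable


def run_fixed_rate(
    step: Callable[[], None],
    stop_event: threading.Event,
    rate_hz: float = 50.0,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Call step() at rate_hz until stop_event is set; return the tick count."""
    period = 1.0 / rate_hz
    next_tick = clock()
    ticks = 0
    while not stop_event.is_set():
        step()
        ticks += 1
        # Schedule against absolute deadlines so per-tick jitter does not accumulate.
        next_tick += period
        delay = next_tick - clock()
        if delay > 0:
            # Event.wait doubles as an interruptible sleep.
            stop_event.wait(delay)
        else:
            # step() overran its slot: resync instead of bursting to catch up.
            next_tick = clock()
    return ticks
```

Using `stop_event.wait(delay)` rather than `time.sleep(delay)` means a stop request wakes the loop immediately instead of after the current period.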
6. Subclasses + blueprints
HostedTeleopModule is abstract for actuation. Two concrete subclasses:
| Subclass | For | Adds |
|---|---|---|
| HostedArmTeleopModule | arm robots | task_names: dict[str, str] config; stamps frame_id so the coordinator routes IK targets; packs analog triggers into Buttons. |
| HostedTwistTeleopModule | mobile bases | linear_speed / angular_speed config; scales [-1,1] keyboard → m/s + rad/s. |
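The twist scaling can be illustrated with a small sketch. `TwistScale`, `clamp_unit`, and `scale_twist` are hypothetical names, and clamping out-of-range input to [-1, 1] is an assumption; only the `linear_speed` / `angular_speed` config names come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwistScale:
    linear_speed: float   # m/s at full deflection
    angular_speed: float  # rad/s at full deflection


def clamp_unit(x: float) -> float:
    """Clamp a normalized axis value to [-1, 1] (assumed safety step)."""
    return max(-1.0, min(1.0, x))


def scale_twist(forward: float, turn: float, cfg: TwistScale) -> tuple[float, float]:
    """Map normalized keyboard axes to (linear m/s, angular rad/s)."""
    return (
        clamp_unit(forward) * cfg.linear_speed,
        clamp_unit(turn) * cfg.angular_speed,
    )
```

For example, with `TwistScale(linear_speed=0.5, angular_speed=1.0)`, a full-forward, half-left-turn input of `(1.0, -0.5)` maps to `(0.5, -0.5)`.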
CLI-runnable blueprints:
dimos run teleop-hosted-xarm7 # arm teleop
dimos run teleop-hosted-go2 # mobile-base teleop
dimos run teleop-hosted-xarm7 hosted-teleop-recorder # + record to .db
Operator-facing streams (controller/joy/twist) ride WebRTC via the broker.
Robot-internal streams (coordinator commands, recorder inputs) stay on LCM.