feat(realtime): moving to livekit #122

Merged
VerioN1 merged 92 commits into main from alon/livekit-prod
May 19, 2026

Conversation

Contributor

@VerioN1 VerioN1 commented Apr 29, 2026

Note

High Risk
High risk because it replaces the core realtime transport (removing the prior WebRTC signaling/manager implementation) and changes the public realtime surface (client.realtime.connect/subscribe) plus connection/telemetry behaviors.

Overview
Realtime streaming is migrated to a LiveKit-backed architecture. The SDK realtime client now uses a control WebSocket (livekit_join/livekit_room_info) plus LiveKit room media via a new StreamSession, adds onConnectionChange/onQueuePosition callbacks and corresponding events, and exposes getSubscribeToken()/subscribeToken for viewers.

Adds a dedicated viewer/subscribe client. createDecartClient().realtime becomes { connect, subscribe }, and subscribe now calls a new watch-stream HTTP endpoint to fetch LiveKit credentials and connects via livekit-client to receive remote tracks.

Removes the legacy WebRTC transport and trims/updates observability. The old webrtc-connection.ts/webrtc-manager.ts and WebRTC-specific diagnostics types are deleted, telemetry/stat collection is reconfigured via REALTIME_CONFIG, and new unit tests cover LiveKit handshake/initial-state gating, queue behavior, and token decoding.

Examples and tooling are updated for the new protocol. The ws proxy is reframed as a LiveKit control proxy (types/logging/e2e updated, connect URL changed to /v1/stream), the browser example adds connection/queue callbacks, the SDK README and demo index.html add “watch stream”/subscribe UI, and .gitignore ignores deploy-to-staging.md.

Reviewed by Cursor Bugbot for commit 3e11c0e. Bugbot is set up for automated code reviews on this repo. Configure here.

@pkg-pr-new

pkg-pr-new Bot commented Apr 29, 2026

npm i https://pkg.pr.new/@decartai/sdk@122

commit: 3e11c0e

Comment thread examples/ws-signaling-proxy/README.md Outdated
```
            signaling                 signaling
             control                   control
Client <----WebSocket----> Proxy <----WebSocket----> Decart
```
Actually, there is an additional WebSocket to LiveKit for signaling.

@VerioN1 VerioN1 force-pushed the alon/livekit-prod branch from 8c321b6 to 3a296a6 Compare April 29, 2026 12:46
Comment thread packages/sdk/src/realtime/livekit-connection.ts Outdated
Comment thread packages/sdk/src/realtime/types.ts
@nagar-decart
Contributor

@VerioN1
Also note that this will break the queue in the demo.
The demo currently uses an ad hoc version: https://github.com/DecartAI/lucy14b-rt-demo/blob/af61c87f1f21836d99e9bcfc76ede86ce67a1f28/pnpm-lock.yaml#L176-L178
https://github.com/DecartAI/sdk/commits/opencode/crisp-nebula/
Maybe we should deploy it to the demo earlier to get early testing and feedback.
@AdirAmsalem

Comment thread packages/sdk/src/realtime/types.ts Outdated
Comment thread packages/sdk/src/realtime/client.ts Outdated
Comment thread packages/sdk/src/realtime/observability/diagnostics.ts Outdated
@VerioN1 VerioN1 marked this pull request as ready for review May 3, 2026 09:44
Comment thread packages/sdk/src/index.ts Outdated
Comment thread packages/sdk/index.html Outdated
Comment thread packages/sdk/src/realtime/livekit-connection.ts Outdated
Comment thread packages/sdk/src/realtime/livekit-connection.ts Outdated
Comment thread packages/sdk/src/utils/platform.ts
Comment thread packages/sdk/src/realtime/livekit-connection.ts Outdated
Comment thread packages/sdk/src/realtime/livekit-connection.ts Outdated
Comment thread packages/sdk/src/realtime/livekit-manager.ts Outdated
Comment thread packages/sdk/src/realtime/observability/webrtc-stats.ts
Comment thread packages/sdk/src/realtime/livekit-connection.ts Outdated
Comment thread packages/sdk/src/realtime/signaling-channel.ts Outdated
Comment thread examples/sdk-core/realtime/connection-events.ts Outdated
Comment thread packages/sdk/src/realtime/signaling-channel.ts
Comment thread packages/sdk/src/realtime/media-channel.ts
@VerioN1 VerioN1 force-pushed the alon/livekit-prod branch from 0666b36 to e4238db Compare May 14, 2026 15:27
nagar-decart and others added 8 commits May 14, 2026 18:42
Side-by-side WebRTC transport support for the inference server's new
livekit path. aiortc stays the default and is fully back-compat.

- packages/sdk/src/realtime/transports/livekit.ts: new
  LiveKitConnection. Public surface (connect/send/cleanup/
  getPeerConnection/websocketMessagesEmitter/setImageBase64/state)
  matches WebRTCConnection so WebRTCManager can swap implementations.
  Control WS is identical (prompt / set_image / session_id / tick acks);
  the only differences are the media handshake (livekit_join →
  livekit_room_info, then Room.connect + publishTrack).

- packages/sdk/src/realtime/transports/index.ts: shared TransportKind
  type + re-exports.

- packages/sdk/src/realtime/webrtc-manager.ts: gains an optional
  transport: "aiortc" | "livekit" field in WebRTCConfig. The constructor
  dispatches to LiveKitConnection when opted in, WebRTCConnection
  otherwise. All manager state machine logic (reconnect, buffer, emit)
  is transport-agnostic.

- packages/sdk/src/realtime/client.ts: RealTimeClientConnectOptions now
  accepts `transport`; it's threaded into the manager config.

- package.json: adds livekit-client ^2.0.0.
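
The dispatch described in the bullets above could look roughly like the following. This is a minimal sketch, not the SDK's code: `RealtimeTransport`, `createTransport`, and the two stub classes are hypothetical stand-ins, and only the `"aiortc" | "livekit"` option and the aiortc default are taken from the commit message.

```typescript
// Hypothetical stand-ins for the SDK's connection classes (assumption).
type TransportKind = "aiortc" | "livekit";

interface RealtimeTransport {
  readonly kind: TransportKind;
  // Returns a description of the handshake instead of opening real sockets.
  describeHandshake(url: string): string;
}

class AiortcTransportStub implements RealtimeTransport {
  readonly kind = "aiortc" as const;
  describeHandshake(url: string): string {
    return `offer/answer exchange via ${url}`;
  }
}

class LiveKitTransportStub implements RealtimeTransport {
  readonly kind = "livekit" as const;
  describeHandshake(url: string): string {
    return `livekit_join -> livekit_room_info via ${url}`;
  }
}

interface ManagerConfig {
  // Optional on purpose: omitting it keeps the aiortc default.
  transport?: TransportKind;
}

// The manager only picks an implementation; reconnect/buffer/emit logic
// would stay transport-agnostic and talk to the shared interface.
function createTransport(config: ManagerConfig): RealtimeTransport {
  return config.transport === "livekit"
    ? new LiveKitTransportStub()
    : new AiortcTransportStub();
}
```

Because both stubs satisfy one interface, callers that pass no `transport` keep the previous behavior unchanged.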

Typecheck passes; all 145 existing unit tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
index.html now has aiortc | livekit radios that feed
realtime.connect({ transport }), so the dev demo at sdk.decart.local
can flip between the two transports without a code change. Default
stays aiortc so existing sanity tests are unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Inference server gained an opt-in periodic `{"type": "server_metrics"}`
WS emission (DecartAI/api PR forthcoming) that the webrtc-bench tool
subscribes to for per-session fps / latency / queue-depth numbers.
Surface it through the SDK so consumers can do:

    rtClient.on("serverMetrics", (msg) => ...)

Changes:
- types.ts: new ServerMetricsMessage type; added to IncomingWebRTCMessage.
- webrtc-connection.ts (aiortc): parse `type: "server_metrics"` and emit
  on the internal websocketMessagesEmitter.
- transports/livekit.ts: same, inside handleControlMessage switch.
- client.ts: add `serverMetrics` to public Events, wire the listener so
  the internal emitter fans out to the public RealTimeClient.on surface.

Default off — the server only emits when the client's realtime URL has
`?emit_server_metrics=1`. Normal SDK consumers see nothing unless they
explicitly opt in.
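
As an illustration of the plumbing above, here is a small self-contained sketch of routing a `server_metrics` control frame to a public `serverMetrics` listener. The `MetricsEmitter` class and the open-ended payload shape are assumptions for this example; the commit message only names the `type` discriminator and the event name.

```typescript
// Assumed payload shape: only the `type` discriminator comes from the commit.
interface ServerMetricsMessage {
  type: "server_metrics";
  [field: string]: unknown;
}

type ServerMetricsListener = (msg: ServerMetricsMessage) => void;

// Hypothetical emitter standing in for the SDK's internal/public event path.
class MetricsEmitter {
  private listeners: ServerMetricsListener[] = [];

  on(_event: "serverMetrics", listener: ServerMetricsListener): void {
    this.listeners.push(listener);
  }

  // Parses one raw control-WebSocket frame.
  // Returns true only when the frame was a server_metrics message.
  handleRawMessage(raw: string): boolean {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      return false; // not JSON: ignore
    }
    if (typeof parsed !== "object" || parsed === null) return false;
    if ((parsed as { type?: unknown }).type !== "server_metrics") return false;
    for (const listener of this.listeners) {
      listener(parsed as ServerMetricsMessage);
    }
    return true;
  }
}
```

Consumers that never register a listener, or never opt in on the URL, simply see no events.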

Typecheck passes; 145/145 unit tests still green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Forwards the inference server's E2E pixel-latency handshake (message
type "marker_config") to SDK consumers. Symmetric with serverMetrics —
opt-in via ?pixel_latency=1 on the realtime WS URL.

The webrtc-bench tool uses this to align its PixelMarkerReader's search
window with the server's actual stamp dimensions (which can differ from
the client stamp dims when the server transcodes). Normal consumers
ignore the event.

- types.ts: MarkerConfigMessage + add to IncomingWebRTCMessage union.
- webrtc-connection.ts + transports/livekit.ts: parse type == "marker_config"
  and emit on the transport's websocketMessagesEmitter.
- client.ts: expose as a public markerConfig event on RealTimeClient,
  via the same emitOrBuffer path as serverMetrics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
E2E pixel-latency no longer negotiates stamp dimensions between client
and server — both sides use a fixed protocol and auto-detect the
received scale. The marker_config WS message is gone, so drop the
MarkerConfigMessage type and the event plumbing across client.ts,
webrtc-connection.ts, transports/livekit.ts, and types.ts.

Reverts the prior markerConfig addition on this branch; the webrtc-bench
tool in api#1095 handles scale detection inside its PixelMarkerReader.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rate

Two fixes that let non-aiortc transports see the same `stats` event stream
and that keep the reported outbound bitrate sensible under simulcast:

1. Transport-agnostic stats source.

   Introduce `StatsProvider`: `{ getStats(): Promise<RTCStatsReport> }`.
   `RTCPeerConnection` already satisfies it (aiortc path, back-compat);
   LiveKitConnection now supplies an aggregator that walks every local
   and remote track in the Room, calls `track.getRTCStatsReport()`, and
   merges the per-track reports into one RTCStatsReport-shaped Map.
   That's the minimum surface `WebRTCStatsCollector.parse()` needs — it
   iterates with `.forEach` and keys off `report.type`.

   Before: LiveKitConnection.getPeerConnection() returned null, so the
   SDK never started its stats collector for livekit sessions and no
   `stats` events fired. Now livekit sessions emit stats on the same
   cadence (and with the same payload shape) as aiortc.

   Client code (`startStatsCollection` / `handleConnectionStateChange`)
   now consults `manager.getStatsProvider()` instead of
   `manager.getPeerConnection()`. The identity check (so we don't
   restart the collector on every state change) still works because
   both the provider and the PC are stable references per connection.

2. Simulcast-safe outbound bitrate.

   Simulcast emits one `outbound-rtp` report per spatial layer (3 layers
   is typical). The parser used to overwrite `outboundVideo` with
   whichever layer `forEach` visited last — each layer has its own
   `bytesSent` counter, so across ticks the "last visited" layer would
   alternate and `bytesSent - prevBytesSentVideo` went violently
   negative. We saw `bitrateOutKbps` down to -6589 in bench results.

   Accumulate `bytesSent` + `packetsSent` across every outbound-rtp
   video report; compute the bitrate once, after the forEach, against
   the summed total. Also clamp the result to `Math.max(0, ...)` since
   `bytesSent` can transiently drop when tracks are added/removed
   mid-session (new simulcast layer ramping up, publisher swap).

   For scalar fields (resolution, fps, qualityLimitationReason), pick
   the highest-resolution active layer so reported frame dimensions
   match what's on the wire.
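
A rough sketch of the aggregator from item 1, assuming structural types in place of the DOM `RTCStatsReport` so it runs anywhere. `mergeReports`, `createTrackStatsProvider`, and `TrackLike` are hypothetical names, and the collision handling (prefixing a duplicate key with the report index) is an assumption, not something the commit specifies.

```typescript
// Minimal structural types so the sketch does not depend on DOM typings.
interface StatsEntry {
  id: string;
  type: string;
  [field: string]: unknown;
}

interface StatsReportLike {
  forEach(cb: (value: StatsEntry, key: string) => void): void;
}

interface StatsProvider {
  getStats(): Promise<StatsReportLike>;
}

// Hypothetical track shape: only the per-track stats getter matters here.
interface TrackLike {
  getRTCStatsReport(): Promise<StatsReportLike | undefined>;
}

// Merge per-track reports into one Map that can be iterated with forEach.
function mergeReports(
  reports: Array<StatsReportLike | undefined>,
): Map<string, StatsEntry> {
  const merged = new Map<string, StatsEntry>();
  reports.forEach((report, index) => {
    report?.forEach((value, key) => {
      // Assumption: on an id collision, keep both entries under distinct keys.
      const uniqueKey = merged.has(key) ? `${index}:${key}` : key;
      merged.set(uniqueKey, value);
    });
  });
  return merged;
}

function createTrackStatsProvider(tracks: TrackLike[]): StatsProvider {
  return {
    async getStats(): Promise<StatsReportLike> {
      const reports = await Promise.all(
        tracks.map((track) => track.getRTCStatsReport()),
      );
      return mergeReports(reports);
    },
  };
}
```

A parser that only calls `.forEach` and reads `report.type` can consume the merged Map without knowing which transport produced it.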

Verified against staging: 3-region x 2-transport smoke produces 0
negative `bitrateOutKbps` samples and livekit scenarios now report
bitrate/fps/rtt/jitter/resolution alongside aiortc.
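
The simulcast fix in item 2 can be sketched as a pure function: sum the counters across layers, compute the bitrate once against the summed total, clamp at zero, and take scalar fields from the highest-resolution active layer. The type and function names below are hypothetical, and the caller is assumed to track the previous summed `bytesSent` and the elapsed time between ticks.

```typescript
// Subset of outbound-rtp video fields used by this sketch.
interface OutboundVideoLayer {
  bytesSent: number;
  packetsSent: number;
  frameWidth?: number;
  frameHeight?: number;
  framesPerSecond?: number;
  active?: boolean;
}

interface OutboundVideoSummary {
  bytesSent: number;
  packetsSent: number;
  bitrateKbps: number;
  width: number | null;
  height: number | null;
  fps: number | null;
}

function summarizeOutboundVideo(
  layers: OutboundVideoLayer[],
  prevBytesSent: number, // summed total from the previous tick
  elapsedMs: number,
): OutboundVideoSummary {
  let bytesSent = 0;
  let packetsSent = 0;
  let best: OutboundVideoLayer | null = null;
  let bestArea = -1;

  for (const layer of layers) {
    // Accumulate across every layer instead of overwriting per layer.
    bytesSent += layer.bytesSent;
    packetsSent += layer.packetsSent;
    const area = (layer.frameWidth ?? 0) * (layer.frameHeight ?? 0);
    if (layer.active !== false && area > bestArea) {
      best = layer;
      bestArea = area;
    }
  }

  // bits per millisecond equals kilobits per second.
  const deltaBits = (bytesSent - prevBytesSent) * 8;
  const bitrateKbps = elapsedMs > 0 ? Math.max(0, deltaBits / elapsedMs) : 0;

  return {
    bytesSent,
    packetsSent,
    bitrateKbps,
    width: best?.frameWidth ?? null,
    height: best?.frameHeight ?? null,
    fps: best?.framesPerSecond ?? null,
  };
}
```

With three layers the delta is taken against one monotonic total, so the "last visited layer" ordering no longer affects the result.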
Bench callers (and presumably other stats consumers) need to know which
ICE candidate path the current session is using — relayed TURN vs
direct UDP, the local/remote IPs and port, the transport protocol.
That signal disappeared when an earlier refactor projected the parser's
output down to just `currentRoundTripTime` + `availableOutgoingBitrate`
on `connection`.

Restore it:

- `WebRTCStats.connection.selectedCandidatePairs: Array<{ local, remote }>`
  exposing { candidateType, address, port, protocol } per side.
- Parser now collects `localCandidateId` / `remoteCandidateId` from
  succeeded candidate-pair reports and, after the main forEach, looks
  each ID up in rawStats to produce the resolved pair (rawStats entry
  order isn't guaranteed — the pair may appear before its referenced
  candidates).
- Handles both the older `ip` and newer `address` fields on
  `local-candidate` / `remote-candidate` reports.

Net effect: bench's `SdkStatsCollector.onStats` (which already
defensively reads `stats.connection.selectedCandidatePairs`) will now
populate `iceCandidate` for every session. Before this change, that
field was always undefined under the SDK transport, so every bench
run logged `iceCandidate: None` and diagnosing relay vs direct
sessions was impossible.
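
The two-pass lookup described above might be sketched as follows. The field names (`localCandidateId`, `remoteCandidateId`, `candidateType`, `address`, `ip`, `port`, `protocol`) are standard WebRTC stats fields; `resolveSelectedPairs`, the nullable output fields, and the plain `Map` input are assumptions made for this example.

```typescript
interface RawStat {
  id: string;
  type: string;
  [field: string]: unknown;
}

interface CandidateInfo {
  candidateType: string | null;
  address: string | null;
  port: number | null;
  protocol: string | null;
}

interface SelectedCandidatePair {
  local: CandidateInfo | null;
  remote: CandidateInfo | null;
}

function asString(value: unknown): string | null {
  return typeof value === "string" ? value : null;
}

function toCandidateInfo(stat: RawStat | undefined): CandidateInfo | null {
  if (!stat) return null;
  return {
    candidateType: asString(stat.candidateType),
    // Newer reports use `address`; older ones use `ip`.
    address: asString(stat.address) ?? asString(stat.ip),
    port: typeof stat.port === "number" ? stat.port : null,
    protocol: asString(stat.protocol),
  };
}

function resolveSelectedPairs(rawStats: Map<string, RawStat>): SelectedCandidatePair[] {
  // Pass 1: collect candidate IDs only, since the candidates
  // themselves may appear later in iteration order.
  const pending: Array<{ localId: string | null; remoteId: string | null }> = [];
  rawStats.forEach((stat) => {
    if (stat.type === "candidate-pair" && stat.state === "succeeded") {
      pending.push({
        localId: asString(stat.localCandidateId),
        remoteId: asString(stat.remoteCandidateId),
      });
    }
  });

  // Pass 2: resolve each ID once every report has been seen.
  return pending.map(({ localId, remoteId }) => ({
    local: toCandidateInfo(localId ? rawStats.get(localId) : undefined),
    remote: toCandidateInfo(remoteId ? rawStats.get(remoteId) : undefined),
  }));
}
```

A `candidateType` of `relay` on the local side would indicate a TURN-relayed session, which is the distinction the bench tool needs.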
Consumers (benchmark/observability) need the full set of fields that
the WebRTC spec exposes via `RTCInboundRtpStreamStats` /
`RTCOutboundRtpStreamStats` / `RemoteInboundRtpStreamStats`. The SDK's
parser previously projected those down to a small curated set
(bitrate, fps, jitter, freezes) and dropped everything diagnostic —
so downstream code that tried to read e.g. `stats.video.avgJitterBufferMs`
silently got undefined for months.

Restored fields (inbound video):
- framesReceived, keyFramesDecoded
- nackCount, nackCountDelta, pliCount, firCount
- avgDecodeTimeMs (totalDecodeTime / framesDecoded)
- avgProcessingDelayMs (totalProcessingDelay / framesDecoded)
- avgJitterBufferMs (jitterBufferDelay / jitterBufferEmittedCount)
- avgInterFrameDelayMs (totalInterFrameDelay / framesDecoded)
- interFrameDelayVarianceMs (σ from total+totalSquared — tells you
  how much the decoder's inter-frame arrival is jittering)
- jitterBufferTargetDelayMs, jitterBufferMinimumDelayMs (current
  target vs minimum buffer depth — answers "is Chrome running a
  deep adaptive buffer?")
- decoderImplementation

Restored fields (outbound video):
- targetBitrateKbps (BWE's current target — separate from the
  actual-bytes-sent-derived `bitrate` field)
- avgEncodeTimeMs, avgPacketSendDelayMs, avgQp
- nackCount, pliCount, firCount (received from remote — recovery
  request counters)
- retransmittedBytesSent, retransmittedPacketsSent
- encoderImplementation

New block:
- `remoteInbound { fractionLost, jitter, roundTripTime }` from the
  remote-inbound-rtp report. Tells you "what does the remote side
  think about its reception of our outbound" — independent of our
  own observations.

Simulcast aggregation unchanged: the outbound-rtp block still
accumulates per-spatial-layer totals for bytesSent/packetsSent/retransmit
counters, picks scalar fields (resolution, fps, quality-limit,
targetBitrate, avgEncodeTime, encoderImplementation) from the
highest-resolution layer.

All derived averages return null instead of 0 when the denominator is
0 (before any frames decode). Avoids the ambiguity of `avg = 0` meaning
either "genuinely instant" or "no samples yet".

Unblocks bench-side diagnosis of bimodal session behavior: the jitter
buffer depth + inter-frame delay variance + targetBitrate signals,
together, let you tell whether a bad session is running with a deep
receive buffer, irregular decoder input timing, or a BWE that didn't
adapt — each of which points to a different root cause.
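
The "null instead of 0" rule and the inter-frame delay σ can be illustrated with two small helpers. This is a sketch under assumptions: the helper names are hypothetical, and the inputs are assumed to be the cumulative totals in seconds as reported by the browser's inbound-rtp stats.

```typescript
// Average in milliseconds, or null when there are no samples yet.
function safeAvgMs(
  totalSeconds: number | undefined,
  count: number | undefined,
): number | null {
  if (totalSeconds === undefined || !count) return null;
  return (totalSeconds / count) * 1000;
}

// Standard deviation of inter-frame delay in milliseconds:
// sqrt(E[x^2] - E[x]^2), computed from the two cumulative totals.
function interFrameDelayStdDevMs(
  totalInterFrameDelay: number | undefined,
  totalSquaredInterFrameDelay: number | undefined,
  framesDecoded: number | undefined,
): number | null {
  if (
    totalInterFrameDelay === undefined ||
    totalSquaredInterFrameDelay === undefined ||
    !framesDecoded
  ) {
    return null;
  }
  const mean = totalInterFrameDelay / framesDecoded;
  const meanOfSquares = totalSquaredInterFrameDelay / framesDecoded;
  // Clamp to guard against tiny negative values from floating-point error.
  const variance = Math.max(0, meanOfSquares - mean * mean);
  return Math.sqrt(variance) * 1000;
}
```

For example, two frames with delays of 20 ms and 40 ms give a mean of 30 ms and a σ of 10 ms, while a report with `framesDecoded` of 0 yields `null` rather than a misleading 0.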
VerioN1 and others added 4 commits May 14, 2026 18:43
Add diagnostic logging across the signaling layer so silent retries and
ack-stalls become visible in customer logs and telemetry. Emits
phaseTiming for websocket, initial-prompt, and initial-image phases.
Logs WS close (code, reason, pending acks), room-info timeout, ack
timeouts, server errors, and WS open timeout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Capture the DisconnectReason argument from RoomEvent.Disconnected, log
it at warn, and propagate it via the disconnected event payload so
upstream consumers can distinguish benign reconnects from real failures.
Also emit phaseTiming for the webrtc-handshake phase covering
room.connect plus local track publish.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ession

Surface every pRetry attempt and reconnect cycle that was previously
swallowed silently. Logs each failed attempt with attemptNumber and
retriesLeft, logs permanent-error short-circuit, logs every connection
state transition, and logs scheduleReconnect entry with the cause.
Emits a reconnect diagnostic per attempt (with success flag) and a
phaseTiming for the total connect duration. Threads a logger into
SignalingChannel and MediaChannel so their internal logs share the
same logger as the client.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread packages/sdk/src/realtime/signaling-channel.ts
Comment thread packages/sdk/src/realtime/remote-stream-exposure.ts Outdated
@VerioN1 VerioN1 force-pushed the alon/livekit-prod branch from 83f1828 to c12fde8 Compare May 14, 2026 15:56
@VerioN1 VerioN1 force-pushed the alon/livekit-prod branch from c12fde8 to dfd8c1f Compare May 14, 2026 16:00
Comment thread packages/sdk/src/realtime/stream-session.ts
VerioN1 and others added 4 commits May 14, 2026 19:04
The set_image/prompt ack timer (30s requestTimeoutMs) was running in
parallel with the queue wait, so long queue holds triggered a spurious
"Image send timed out", teardown, and reconnect cycle. Only the
room_info wait paused on queue_position; the ack timer did not. Sending
initial state after room_info ensures the ack timer covers session time
only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary

- Desktop Safari has interop issues with the SDK's default H.264; on
desktop Safari, switch the video pipeline to VP8 in both directions.
- Adds an `isDesktopSafari()` UA helper that excludes iOS / iPadOS
desktop-mode (`MacIntel` + `maxTouchPoints > 1`) and non-Safari engines
(Chrome / Chromium / Edge / Firefox / Opera / Android).
- Publish leg: `videoCodec` threads through `StreamSession` →
`MediaChannel` so `getDefaultVideoPublishOptions()` returns `videoCodec:
"vp8"` when Safari is detected.
- Server-encode leg: prepends `livekit_server_codec=vp8` to the WS query
string. The bouncer already accepts this query param
(`bouncer/src/realtime/stream.py:491`) and forwards it to the inference
server (`inference_server/base_rt_server.py:245` →
`rt/livekit/conn.py`).
- Caller-supplied `queryParams.livekit_server_codec` still wins (spread
order preserves the escape hatch).
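
To make the detection and the spread-order escape hatch concrete, here is a hedged sketch. The regular expressions and the `NavigatorLike` and `buildQueryParams` names are assumptions for illustration; the real `isDesktopSafari()` helper may use different checks. Only the exclusion rules and the `livekit_server_codec=vp8` parameter come from the summary above.

```typescript
// Subset of `navigator` fields used here, passed in so the sketch is testable.
interface NavigatorLike {
  userAgent: string;
  platform: string;
  maxTouchPoints: number;
}

function isDesktopSafari(nav: NavigatorLike): boolean {
  const ua = nav.userAgent;
  // iPadOS in desktop mode reports a Mac platform but has a touch screen.
  if (nav.platform === "MacIntel" && nav.maxTouchPoints > 1) return false;
  if (/iPhone|iPad|iPod/.test(ua)) return false;
  // Other engines also include "Safari" in their UA string.
  if (/Chrome|Chromium|CriOS|Edg|Firefox|FxiOS|OPR|Opera|Android/.test(ua)) {
    return false;
  }
  return /Safari/.test(ua);
}

function buildQueryParams(
  nav: NavigatorLike,
  callerParams: Record<string, string> = {},
): Record<string, string> {
  const defaults: Record<string, string> = isDesktopSafari(nav)
    ? { livekit_server_codec: "vp8" }
    : {};
  // Caller params are spread last, so an explicit value overrides the default.
  return { ...defaults, ...callerParams };
}
```

Passing `navigator` in as a parameter keeps the check pure; in a browser the call site would simply pass the global `navigator`.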

## Test plan

- [x] `pnpm typecheck` clean
- [x] `pnpm test` — all 183 unit tests pass
- [x] `pnpm build` succeeds
- [ ] Manual desktop Safari: WS URL contains `livekit_server_codec=vp8`;
`getStats()` shows VP8 on both inbound + outbound video tracks
- [ ] Manual iOS Safari / iPadOS regression: WS URL has no
`livekit_server_codec`; tracks remain H.264 (unchanged)
- [ ] Manual Chrome regression: WS URL has no `livekit_server_codec`;
tracks remain H.264

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Changes WebRTC codec negotiation and server-side codec selection for a
specific browser, which can affect realtime media connectivity and video
quality if detection or codec support is wrong.
> 
> **Overview**
> For **desktop Safari only**, the realtime SDK now forces VP8
end-to-end to avoid H.264 interop issues.
> 
> It adds an `isDesktopSafari()` platform helper and, when detected, (1)
appends `livekit_server_codec=vp8` to the session URL query and (2)
threads a `videoCodec` override through `StreamSession` → `MediaChannel`
so LiveKit publishes local video with VP8 instead of the default
(`h264`).
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
bafa97d. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolve conflicts by keeping the LiveKit branch's implementation:
- Observability folder already evolved beyond main's PR1 orchestrator
- webrtc-connection / webrtc-manager removed (replaced by stream-session,
  media-channel, signaling-channel)
- index.ts, client.ts, subscribe-client.ts, unit.test.ts kept on the
  LiveKit side

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread packages/sdk/src/realtime/observability/webrtc-stats.ts Outdated
VerioN1 and others added 6 commits May 18, 2026 13:57
…agnostic

Replace per-phase phaseTiming diagnostics with one aggregated breakdown
log per connect attempt, capturing per-phase durations and wall-clock
total. Phase boundaries (websocket-open, room-join, initial-state-handshake,
webrtc-handshake, publish-local-track) are tagged from inside the
signaling/media channels; the orchestrator buffers them and emits one
diagnostic on success or failure.
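
One way to picture the buffering is the sketch below: channels tag phase durations as they finish, and the orchestrator emits a single record per connect attempt. The phase names are taken from the commit message; `ConnectBreakdownRecorder`, its methods, and the injectable clock are hypothetical.

```typescript
type ConnectPhase =
  | "websocket-open"
  | "room-join"
  | "initial-state-handshake"
  | "webrtc-handshake"
  | "publish-local-track";

interface ConnectionBreakdown {
  success: boolean;
  totalMs: number; // wall-clock total for the attempt
  phases: Partial<Record<ConnectPhase, number>>;
}

class ConnectBreakdownRecorder {
  private phases: Partial<Record<ConnectPhase, number>> = {};
  private now: () => number;
  private startedAt: number;

  // The clock is injectable so the recorder can be tested deterministically.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
    this.startedAt = now();
  }

  // Called from inside the signaling/media channels when a phase completes.
  markPhase(phase: ConnectPhase, durationMs: number): void {
    this.phases[phase] = durationMs;
  }

  // Called once by the orchestrator, on success or on failure.
  finish(success: boolean): ConnectionBreakdown {
    return {
      success,
      totalMs: this.now() - this.startedAt,
      phases: { ...this.phases },
    };
  }
}
```

On a failed attempt the phases that never ran are simply absent from the record, which shows where the attempt stopped.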

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Capture the decoded base64 size of the initial image (in KB) and surface
it on the client-session-connection-breakdown diagnostic. null when no
image was provided.
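
The decoded size can be derived from the base64 string length without decoding it. The helper below is a sketch: the name `decodedBase64SizeKb`, the optional data-URL prefix handling, and the 1 KB = 1024 bytes convention are assumptions not stated in the commit.

```typescript
// Returns the decoded payload size in KB, or null when no image was provided.
function decodedBase64SizeKb(
  imageBase64: string | null | undefined,
): number | null {
  if (!imageBase64) return null;

  // Assumption: tolerate an optional "data:...;base64," prefix.
  const commaIndex = imageBase64.indexOf(",");
  const payload = commaIndex >= 0 ? imageBase64.slice(commaIndex + 1) : imageBase64;

  // Every 4 base64 characters encode 3 bytes; '=' padding encodes none.
  const padding = payload.endsWith("==") ? 2 : payload.endsWith("=") ? 1 : 0;
  const bytes = Math.floor((payload.length * 3) / 4) - padding;
  return bytes / 1024;
}
```

For instance, `"aGVsbG8="` has 8 characters and one padding character, so it decodes to 5 bytes.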

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit 20c5324. Configure here.

Comment thread packages/sdk/index.html
Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread packages/sdk/src/realtime/client.ts
Comment thread CONTEXT.md Outdated
Comment thread packages/sdk/README.md
Comment on lines +101 to +106
    realtimeClient.on("connectionChange", (state) => {
      if ((state === "connected" || state === "generating") && realtimeClient.subscribeToken) {
        const url = new URL("/watch", window.location.origin);
        url.searchParams.set("token", realtimeClient.subscribeToken);
        setShareUrl(url.toString());
      }
How does this connect to the subscribe snippet below? It looks like it only needs the token (as it did previously).

VerioN1 and others added 2 commits May 19, 2026 17:34
Co-authored-by: Cursor <cursoragent@cursor.com>
@AdirAmsalem AdirAmsalem changed the title moving to livekit feat(realtime): moving to livekit May 19, 2026
@VerioN1 VerioN1 merged commit 78b46bf into main May 19, 2026
5 checks passed
@VerioN1 VerioN1 deleted the alon/livekit-prod branch May 19, 2026 15:22
4 participants