feat: bring livepeer runner Kafka events to parity with cloud-relay #969

emranemran merged 1 commit into main
Conversation
PR #956 started publishing `websocket_connected` / `websocket_disconnected` from the livepeer fal wrapper using the orchestrator-provided `manifest_id`. But the rest of the session lifecycle (`pipeline_loaded`, `session_created`, `stream_started`, `stream_heartbeat`, `stream_stopped`, `playback_ready`, and the error variants) continued to either not fire or fire with null `user_id` / `connection_id` in livepeer mode, because the runner built FrameProcessor without those fields and never persisted `manifest_id`.

- Add `manifest_id` / `session_id` / `connection_info` fields to LivepeerSession and populate them right after parsing the job_info (`src/scope/cloud/livepeer_app.py`).
- Thread `user_id`, `session_id`, `manifest_id` (as `connection_id`), and `connection_info` into FrameProcessor so every event it emits matches the wrapper's `websocket_connected`.
- Explicitly publish `session_created` after `FrameProcessor.start()` and `session_closed` after `stop()`, mirroring the shape of the existing `webrtc.py` emissions — livepeer mode doesn't hit the WebRTC offer handler, so this has to happen here.
- Swap the pipeline/load body injection to use `manifest_id` instead of the runner's random internal UUID, so `pipeline_loaded` correlates too; pass `connection_info` along.
- Allow `NOMAD_DC` / `FAL_JOB_ID` / `FAL_RUNNER_ID` / `FAL_LOG_LABELS` / `FAL_MACHINE_TYPE` through the runner subprocess `env_allowlist` so `_build_connection_info()` can reconstruct the same dict the wrapper uses.

After this, ClickHouse queries filtered by `user_id` or `connection_id` (= `manifest_id`) see the full session lifecycle for livepeer mode, not just the two wrapper-layer events.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: emranemran <emran.mah@gmail.com>
```python
media_publishes: list[MediaPublish | None] = field(default_factory=list)
user_id: str | None = None
connection_id: str | None = None
manifest_id: str | None = None
```
Pretty sure manifest_id is the same as the connection_id above
... oh dear it's not. Well that's a simple change then
I made manifest_id the same as connection_id, and the old connection_id should be gone now.
I guess the manifest_id does not need to be passed in anymore if that's the case? Would simplify some of the ad hoc checks around here.
Yeah, that's true. I'll clean this up in a follow-up and file a ticket so I don't forget.
The previous iteration of this test false-positively passed. It polled any <video> for playback, which always finds the local input preview playing even when the browser↔local-scope WebRTC never completes and no frames ever reach the cloud. The result: ClickHouse saw only `websocket_connected` / `pipeline_loaded` / `websocket_disconnected` — nothing that requires a real round-trip through the livepeer runner.

Two fixes:

1. Feed the browser a synthetic camera via `--use-fake-device-for-media-stream` (plus the Camera input toggle in the UI). This lets `getUserMedia()` succeed and a real WebRTC peer connection between browser and local scope complete end to end, which triggers `CloudTrack._start()` → `LivepeerClient.start_media()` and the "start_stream" trickle control message the runner needs.
2. Assert on the video inside the "Video Output" card, not any <video>. That element only renders when a remoteStream is set, so waiting on its visibility and currentTime > 0 is a true round-trip signal.

After frames start flowing, idle 15s so `stream_heartbeat` events (~every 10s on the runner side) have a chance to fire.

Verified locally: the test passes in ~2.8 min against scope-livepeer-emran with passthrough. The full event set lands in ClickHouse when paired with the parity PR (#969).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: emranemran <emran.mah@gmail.com>
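The second fix boils down to selection logic, which can be sketched as a pure helper. This is illustrative, not the actual Playwright test; the list-of-dicts shape and the card-title matching are assumptions made for the example.

```python
# Sketch of the fixed assertion: only a visible video inside the
# "Video Output" card whose currentTime has advanced counts as proof of a
# real round-trip. The local input preview, which is what the old test kept
# matching, no longer satisfies the check.
def real_roundtrip(videos: list) -> bool:
    return any(
        v.get("card") == "Video Output"
        and v.get("visible")
        and v.get("current_time", 0) > 0
        for v in videos
    )
```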
Squash of feat/test-cloud-connect-tooling (PR #962) onto this branch so we can exercise the parity changes end-to-end via Playwright + the skill-driven "test cloud" flow. This commit is a throwaway for verification: once the parity code is signed off, revert this single commit before opening PR #969 for review so the diff stays focused.

Squashed from:

- feat: add end-to-end cloud-connect test harness and skill
- fix(e2e): update cloud-streaming test for graph-mode UI redesign
- fix(e2e): actually exercise the livepeer trickle path
- feat: lead SKILL with Playwright + fix run-app.sh env var quoting
- docs: make the testing-livepeer-fal-deploy skill discoverable
- docs: route all "test cloud" prompts to the livepeer skill
- feat: skill asks for fal app+env, deploys, then runs Playwright

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signed-off-by: emranemran <emran.mah@gmail.com>
Force-pushed from 9f655d2 to 291c5c9
Summary
Follow-up to #956. That PR added `websocket_connected` / `websocket_disconnected` from the livepeer fal wrapper, but the rest of the session lifecycle (`pipeline_loaded`, `session_created`, `stream_started`, `stream_heartbeat`, `stream_stopped`, `playback_ready`, `error` variants) was either not firing or firing with null identifiers in livepeer mode. Cloud-relay mode (the `fal_app.py` path) already had these events working; this PR brings livepeer mode to the same shape.

Root causes (all in the runner, `src/scope/cloud/livepeer_app.py`)
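Since livepeer mode never reaches the WebRTC offer handler, the runner has to emit the session bracket itself. A minimal sketch of that ordering, with made-up names (`run_session`, `publish`) standing in for the real runner code:

```python
# Hypothetical sketch: mirror webrtc.py's emission shape in the runner.
# session_created fires right after FrameProcessor.start(); session_closed
# fires after stop(), even if processing raises.
def run_session(processor, publish, ids: dict) -> None:
    processor.start()
    publish("session_created", ids)
    try:
        processor.process()
    finally:
        processor.stop()
        publish("session_closed", ids)
```

The try/finally guarantees a `session_closed` lands in ClickHouse even when the processing loop dies, so lifecycle queries never see a dangling open session.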
Changes
`src/scope/cloud/livepeer_app.py`
`src/scope/cloud/livepeer_fal_app.py`
Test plan
Not in scope
🤖 Co-authored with Claude Code