Delay A/V in WebRTC until both tracks have been seen. #1001
Merged
Conversation
For LTX-2 the cloud runs video VAE decode → audio decode sequentially, so audio is ready before video, often by several hundred milliseconds. In aiortc WebRTC, the RTCP Sender Report (SR) anchors the PTS-to-wallclock mapping to the time the first frame was seen. Since both tracks start at PTS=0, this caused a permanent desync whenever one track arrived late relative to the other. Fix this by waiting until both tracks have been seen.
Delay anchoring audio playback until the first buffered frame is available, so startup and bursty generation do not emit premature silence that can throw off AV sync.
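The gating idea described above can be sketched as follows. This is an illustrative sketch, not the PR's actual implementation: `TrackGate` and `GatedTrack` are hypothetical names, and the real code wraps aiortc track classes rather than plain objects. The point is that neither track releases its first frame (and thus anchors the RTCP SR PTS-to-wallclock mapping) until both tracks have produced one.

```python
import asyncio

class TrackGate:
    """Shared between the audio and video tracks of one session."""

    def __init__(self):
        self._seen = {"audio": asyncio.Event(), "video": asyncio.Event()}

    def mark_seen(self, kind: str):
        self._seen[kind].set()

    async def wait_both(self):
        # Block until a first frame has been observed on each track.
        await asyncio.gather(*(e.wait() for e in self._seen.values()))

class GatedTrack:
    """Wraps a source track; holds the first frame until both tracks exist."""

    def __init__(self, kind: str, source, gate: TrackGate):
        self.kind, self.source, self.gate = kind, source, gate
        self._anchored = False

    async def recv(self):
        frame = await self.source.recv()
        if not self._anchored:
            self.gate.mark_seen(self.kind)
            # Hold PTS=0 here so the wallclock anchor is taken only once
            # both tracks have been seen, not when the early one arrives.
            await self.gate.wait_both()
            self._anchored = True
        return frame
```

After the first frame on each track, `recv()` passes frames through untouched, so steady-state latency is unaffected; only the initial anchor is delayed.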
BuffMcBigHuge approved these changes on Apr 28, 2026.
leszko added a commit that referenced this pull request on Apr 30, 2026:
## Summary

- CI test runs were timing out at ~2h (recent runs cancelled at 1h43m / 1h56m / 2h0m). Root cause: PR #1001 ("Delay A/V in WebRTC until both tracks have been seen.") changed `AudioProcessingTrack.recv()` so the first call now blocks in `_wait_for_initial_audio()` until enough audio has been buffered, but three tests in `tests/test_audio_processing_track.py` still constructed tracks with `started=False` and fed no, insufficient, or paused audio, so `recv()` looped forever.
- Drop `started=False` from the silence-fallback tests (`test_no_audio_returns_silence`, `test_paused_returns_silence`, `test_undersized_chunk_returns_silence`) so they exercise the post-anchor `recv()` path, which is the path these silence-fallback assertions actually care about.
- Local `pytest` now completes in ~70s instead of hanging.

## Test plan

- [x] `uv run pytest tests/test_audio_processing_track.py -v`: 30/30 pass in ~1.2s
- [x] `uv run pytest`: full suite completes in ~70s (vs. previously hanging on `test_no_audio_returns_silence`)
- [x] `uv run ruff check tests/test_audio_processing_track.py` clean
- [x] `uv run ruff format --check tests/test_audio_processing_track.py` clean

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Rafał Leszko <rafal@livepeer.org>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
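The hang this commit fixes can be reproduced in miniature. The sketch below is illustrative only: `FakeTrack` is a stand-in for the real `AudioProcessingTrack`, and the event-based wait is an assumed simplification of `_wait_for_initial_audio()`. A `recv()` that blocks until initial audio arrives never returns when a test feeds no audio, which is why the pre-anchor path hangs and the post-anchor path does not.

```python
import asyncio

class FakeTrack:
    """Stand-in for a track whose first recv() waits for initial audio."""

    def __init__(self, started: bool):
        self._have_audio = asyncio.Event()
        if started:
            # Already anchored: skip the initial-audio wait entirely.
            self._have_audio.set()

    async def recv(self):
        # With started=False and no audio ever pushed, this waits forever.
        await self._have_audio.wait()
        return "frame"

async def main():
    # started=False + no audio fed: recv() blocks indefinitely, so a
    # test exercising it only ends when the CI job itself is killed.
    hung = FakeTrack(started=False)
    try:
        await asyncio.wait_for(hung.recv(), timeout=0.1)
        hung_result = "returned"
    except asyncio.TimeoutError:
        hung_result = "timed out"

    # Dropping started=False puts the test on the post-anchor path,
    # which returns immediately.
    ok = FakeTrack(started=True)
    return hung_result, await ok.recv()

print(asyncio.run(main()))  # → ('timed out', 'frame')
```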
Delay A/V in WebRTC until both tracks have been seen.
For LTX-2 the cloud runs video VAE decode → audio decode sequentially,
so audio is ready before video, often by several hundred ms.
In aiortc WebRTC, RTCP SR anchors the PTS-to-wallclock mapping based
on the time the first frame was seen. Since both tracks start at
PTS=0, this caused a permanent desync if one track arrived late
relative to the other.
Fix this by waiting until both tracks have been seen.
Then a small refactor to how audio tracks are read:
Wait for audio before starting AV sync
Delay anchoring audio playback until the first buffered frame is available,
so startup and bursty generation do not emit premature silence that can
throw off AV sync.
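The anchoring behavior described above can be sketched as follows, under the assumption that `recv()` should block until the buffer holds a first real frame and only fall back to silence after playback has been anchored. `BufferedAudioTrack`, its queue-based buffer, and the byte-string frames are simplified stand-ins, not the real aiortc frame objects.

```python
import asyncio

# One 20 ms chunk of 48 kHz mono s16 silence, as an illustrative frame size.
SILENCE = b"\x00" * 960 * 2

class BufferedAudioTrack:
    def __init__(self):
        self._buf: asyncio.Queue = asyncio.Queue()
        self._anchored = False

    def push(self, chunk: bytes):
        self._buf.put_nowait(chunk)

    async def recv(self) -> bytes:
        if not self._anchored:
            # Before anchoring: block until real audio exists, so we never
            # emit leading silence that would skew the wallclock mapping.
            chunk = await self._buf.get()
            self._anchored = True
            return chunk
        try:
            # After anchoring: never stall playback; substitute silence
            # when bursty generation lets the buffer run dry.
            return self._buf.get_nowait()
        except asyncio.QueueEmpty:
            return SILENCE
```

The design choice is the asymmetry: blocking is acceptable only before the anchor exists, because once the PTS-to-wallclock mapping is fixed, a stalled `recv()` would delay every subsequent frame.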