Fix VOD-audio session races (v0.1.8) #24
Merged
The egress supervisor wakes every ~2s and, for a Twitch destination with VOD audio enabled, asks Twitch's API to allocate an IVS session and then points egress at the returned URL. The old code spawned that API call on every tick until one came back. When Twitch took longer than 2s to answer, the calls piled up: each allocated a different IVS session and each changed the override URL, which restarted egress. A single stream then showed up as several short sessions in Twitch Inspector, and the first one came up Source-Only.

Two guards, both per-destination so multistreaming is unaffected:

- Single-flight latch (try_claim_vod_fetch): only one fetch runs at a time. The claim re-checks the override under its mutex, so a fetch that just landed can't let a duplicate slip through.
- Session epoch: bumped on every publisher disconnect, under the same mutex that holds the override. A fetch records the epoch when it starts and only writes its result if the epoch still matches. A request that returns after OBS disconnected, with an IVS token now bound to a dead session, is discarded instead of poisoning the next stream.

Non-Twitch destinations never enter this branch. The synchronous /obs/multitrack-config proxy is untouched: it runs inside OBS's blocking request before the session starts, so it was never exposed to either race.

Also syncs Cargo.lock to the package version (was left at 0.1.6).
complete_vod_fetch applies the session result (epoch-guarded) and releases the in-flight latch in one place, so the whole latch lifecycle (claim in try_claim_vod_fetch, release here) lives on the type that owns it instead of being split into the supervisor closure. The closure is now just fetch, complete, log.

Tests: release-on-every-path for complete_vod_fetch (success, stale discard, failure), single-flight under real thread contention, and an apply-vs-disconnect stress test that asserts the override is never left stale regardless of which thread wins.
- Transcoded ladder is only guaranteed for Twitch Partners; Affiliates get it opportunistically and far less so above ~6 Mbps. Note that this is Twitch's allocation behaviour, not the proxy (OBS's native Twitch preset auto-caps bitrate, the Custom-server path InstantClone uses does not).
- Longest validated run is ~5h, ~10h cumulative across sessions.
- Reframe sync ring-append disk I/O as a deliberate trade-off (page cache is the async buffer) rather than a planned fix.
- Bump test count 185 -> 197.
What this does
Fixes a race in the Twitch VOD-audio path that could make a stream go live
Source-Only and show several short sessions in Twitch Inspector instead of
one.
The egress supervisor wakes every ~2s and, for a Twitch destination with
VOD audio enabled, asks Twitch's API to allocate an IVS session, then
points egress at the returned URL. The old code spawned that API call on
every tick until one came back. When Twitch answered slower than the 2s
tick, the calls stacked: each allocated a separate IVS session and each
rewrote the override URL, which restarted egress. That is the
multi-session, wrong-broadcast-type behaviour in Inspector (one session
Transmuxed Source-Only, a later one Transcoded).
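To make the stacking concrete, here is a rough back-of-the-envelope model, not code from this PR. It assumes ticks fire at exactly t = 0, 2s, 4s, ... and that the API latency is constant; `stacked_fetches` is a hypothetical helper name used only for illustration.

```rust
/// Rough model of the old behaviour: one fetch is spawned on every tick
/// until the first response lands. Assumes evenly spaced ticks and a
/// constant API latency, both simplifications for illustration.
fn stacked_fetches(latency_ms: u64, tick_ms: u64) -> u64 {
    assert!(tick_ms > 0, "tick interval must be non-zero");
    // Ticks that fire strictly before the first response: ceil(latency / tick).
    let ticks_before_response = (latency_ms + tick_ms - 1) / tick_ms;
    // The tick at t = 0 always spawns one fetch, even with zero latency.
    ticks_before_response.max(1)
}

fn main() {
    for latency_ms in [500, 1_900, 2_100, 5_000] {
        println!(
            "latency {latency_ms} ms -> {} fetch(es) started",
            stacked_fetches(latency_ms, 2_000)
        );
    }
}
```

Under this model any latency at or below the tick interval starts a single fetch, while a 5s response starts three, each of which would allocate its own IVS session.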
Two guards, both per-destination so multistreaming is unaffected:
- Single-flight latch: only one session fetch is ever in flight. The claim
  re-checks the override under its lock, so a fetch that just completed
  can't let a duplicate slip through.
- Session epoch: bumped on every publisher disconnect, under the same lock
  that holds the override. A fetch records the epoch when it starts and
  only applies its result if the epoch still matches. A request that
  returns after OBS disconnected gets discarded instead of writing a
  dead-session IVS URL into the next stream.
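The two guards can be sketched together as below. This is a minimal illustration, not the code in this PR. Only the names `try_claim_vod_fetch` and `complete_vod_fetch` come from the commit messages; the `Destination` and `VodState` types, their fields, `on_publisher_disconnect`, the return types, and the example URL are assumptions. It also assumes a disconnect clears the override, which the PR does not state explicitly.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Per-destination state behind a single mutex (field names are hypothetical).
#[derive(Default)]
struct VodState {
    override_url: Option<String>,
    epoch: u64,
    fetch_in_flight: bool,
}

#[derive(Default)]
struct Destination {
    vod: Mutex<VodState>,
}

impl Destination {
    /// Claim the single fetch slot. Returns the epoch the fetch started in,
    /// or None if a fetch is already running or an override is already set.
    fn try_claim_vod_fetch(&self) -> Option<u64> {
        let mut s = self.vod.lock().unwrap();
        if s.fetch_in_flight || s.override_url.is_some() {
            return None;
        }
        s.fetch_in_flight = true;
        Some(s.epoch)
    }

    /// Release the latch on every path; write the URL only if the epoch
    /// still matches. Returns true when the override was applied.
    fn complete_vod_fetch(&self, started_epoch: u64, result: Result<String, String>) -> bool {
        let mut s = self.vod.lock().unwrap();
        s.fetch_in_flight = false;
        match result {
            Ok(url) if s.epoch == started_epoch => {
                s.override_url = Some(url);
                true
            }
            _ => false, // failed fetch or stale epoch: discard
        }
    }

    /// Publisher disconnect: bump the epoch and (assumed) clear the override,
    /// both under the same lock.
    fn on_publisher_disconnect(&self) {
        let mut s = self.vod.lock().unwrap();
        s.epoch += 1;
        s.override_url = None;
    }
}

fn main() {
    let dest = Arc::new(Destination::default());

    // Eight concurrent "ticks" race for the latch; exactly one may win.
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let d = Arc::clone(&dest);
            thread::spawn(move || d.try_claim_vod_fetch())
        })
        .collect();
    let claims: Vec<u64> = handles
        .into_iter()
        .filter_map(|h| h.join().unwrap())
        .collect();
    assert_eq!(claims.len(), 1);

    // OBS disconnects before the response arrives: the result is discarded.
    dest.on_publisher_disconnect();
    let applied = dest.complete_vod_fetch(claims[0], Ok("rtmps://example.invalid/app".into()));
    assert!(!applied);
    assert!(dest.vod.lock().unwrap().override_url.is_none());

    // The latch was released, so the next session can claim and apply.
    let epoch = dest.try_claim_vod_fetch().expect("latch should be free");
    assert!(dest.complete_vod_fetch(epoch, Ok("rtmps://example.invalid/app".into())));
}
```

Keeping the latch, the epoch, and the override behind one mutex is what lets the claim and the apply each make their decision atomically; splitting them across separate locks would reopen the window between check and write.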
Notes for the reviewer
proxy (Enhanced Broadcasting via the registered service, and VOD+EB via
the Launch button) is awaited inside OBS's own blocking request, so it
always sets the override before the session begins and was never exposed
to either race. Left as-is on purpose.
custom RTMP are unchanged.
publisher_token for the staleness check, but it is read by the egress
pump for sequence-header re-sends and does not cover the gap between
disconnect and reconnect, so a dedicated per-destination epoch is cleaner
and keeps that path alone.