fix: vision mixer compositor stall due to start-time-selection race #480

Merged
srperens merged 1 commit into main from fix/vision-mixer-cpu-caps-negotiation on Apr 8, 2026

Collaborator

@srperens srperens commented Apr 8, 2026

Summary

  • Fix intermittent compositor stall in vision mixer CPU path (also affects GPU path)
  • The multiview compositor (mv_comp) would sometimes never produce output, causing the entire downstream chain (videoenc → WHEP output) to remain unnegotiated
  • Root cause: a race condition in GStreamer 1.26 with start-time-selection=first. When the aggregator srcpad task runs before any buffer arrives, it selects the absolute monotonic clock time as the start time instead of ~0, so the compositor waits for a deadline of roughly 2× system uptime

Root Cause Analysis

With start-time-selection=first, the aggregator's wait_and_check calls gst_aggregator_get_first_buffer_start_time(). If no buffer is available yet (a race with srcpad task startup), the code falls through and uses the absolute clock time as the start time. The deadline then becomes base_time + start_time ≈ 2× system_uptime, which is not reached in any reasonable time.

The stall is intermittent because the outcome depends on ordering. If a buffer arrives before the srcpad task's first iteration, the correct start time (~0) is selected. If the task runs first, the broken start time (absolute clock) is selected.
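The deadline arithmetic above can be illustrated with a small model. This is only a sketch of the behaviour described in this PR, not GStreamer's actual code: `select_start_time` and `wait_deadline` are hypothetical helpers, and the uptime value is an illustrative number.

```python
from typing import Optional

SECOND = 1_000_000_000  # nanoseconds, the unit GStreamer uses for clock times


def select_start_time(first_buffer_running_time: Optional[int], clock_now: int) -> int:
    """Model of start-time-selection=first as described in this PR.

    If a buffer is already queued, its running time is used. Otherwise the
    model falls through to the absolute clock time (the faulty branch).
    """
    if first_buffer_running_time is not None:
        return first_buffer_running_time
    return clock_now


def wait_deadline(base_time: int, start_time: int) -> int:
    """Absolute clock time the aggregator would wait for."""
    return base_time + start_time


# Illustrative values: the monotonic clock reads about 1h14m40s (system
# uptime), and base_time was sampled from that same clock.
uptime = (1 * 3600 + 14 * 60 + 40) * SECOND
base_time = uptime

good = wait_deadline(base_time, select_start_time(7_000_000, uptime))
bad = wait_deadline(base_time, select_start_time(None, uptime))

# Remaining wait, in seconds, for each ordering of the race.
print("buffer first:", (good - uptime) / SECOND)
print("task first:  ", (bad - uptime) / SECOND)
```

In this model the "task first" case has to wait one more full uptime before the deadline passes, which matches the 2× system uptime figure.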

Verified via GST_DEBUG logs:

  • mixer (PGM): start time = 0:00:00.007
  • mv_comp (MV): start time = 1:14:40.730 (= system uptime) ✗
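As a rough illustration of how the two logged values could be compared, the sketch below converts the time strings to seconds and flags a start time that is far from zero. The helper names and the one-second threshold are assumptions made for this example; they are not part of GStreamer or of this change.

```python
def parse_gst_time(text: str) -> float:
    """Convert an 'H:MM:SS.fraction' time string to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def looks_like_absolute_clock(start_time_s: float, threshold_s: float = 1.0) -> bool:
    """Heuristic (an assumption): right after startup, a start time well
    above zero suggests the absolute clock time was selected."""
    return start_time_s > threshold_s


# Start times as reported in the summary above.
start_times = {"mixer": "0:00:00.007", "mv_comp": "1:14:40.730"}

for name, text in start_times.items():
    seconds = parse_gst_time(text)
    print(name, seconds, looks_like_absolute_clock(seconds))
```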

Fix

Changed start-time-selection from first to zero for vision mixer compositors. With force-live=true, running time always starts at 0 regardless of clock type (monotonic, PTP, pipeline default).
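For reference, the resulting settings can be written as a gst-launch style element description. This is a minimal sketch: the project's actual code is not shown in this PR, `compositor_description` is a hypothetical helper, and the `compositor` factory name is an assumption for the CPU path.

```python
def compositor_description(name: str, factory: str = "compositor") -> str:
    """Build an element description carrying the settings from this fix.

    The factory name defaults to the CPU compositor; the GPU path would
    use a different element (assumption, not shown in this PR).
    """
    props = {
        "name": name,
        "force-live": "true",
        "start-time-selection": "zero",
    }
    return factory + " " + " ".join(f"{key}={value}" for key, value in props.items())


print(compositor_description("mv_comp"))
```

A string like this could be embedded in a larger description passed to `Gst.parse_launch`, assuming the Python GStreamer bindings are available.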

Test plan

  • Start vision mixer flow with CPU compositor in Docker — verify both PGM and multiview outputs negotiate and produce video
  • Restart flow multiple times to confirm no intermittent stalls
  • Verify GPU path still works (same code path for start-time-selection)
  • Verify with PTP clock (base_time=0 configuration)

🤖 Generated with Claude Code

With start-time-selection=first, a race condition in GStreamer 1.26
causes the multiview compositor to stall indefinitely. When the
aggregator srcpad task runs before any buffer arrives, it falls
through to using the absolute monotonic clock time as start time.
This makes the compositor wait for a deadline of 2× system uptime
(base_time + start_time where both equal the monotonic clock),
which is never reached in a reasonable time.

The bug is intermittent: if a buffer arrives before the srcpad task's
first iteration, the correct near-zero start time is selected.
If the task runs first, the compositor is stuck forever.

Switching to zero is correct for force-live compositors where all
inputs start at running time 0, and works with all clock types
(monotonic, PTP, pipeline default).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@srperens srperens merged commit 8007abe into main Apr 8, 2026
3 checks passed
@srperens srperens deleted the fix/vision-mixer-cpu-caps-negotiation branch April 8, 2026 11:08