feat: add vision mixer block with PVW/PGM workflow #463

Merged
srperens merged 20 commits into main from
feat/vision-mixer
Mar 27, 2026
@srperens

Summary

  • Adds a full vision mixer block with preview/program (PVW/PGM) workflow, multiview output, and web-based control UI
  • Supports multi-source groups, CUT/AUTO transitions, DSK (downstream key) overlays, fade-to-black, and configurable background source
  • Includes a dedicated web control page (/vision-mixer.html) with keyboard shortcuts, WHEP preview, and WebSocket state synchronization
  • Cairo-based multiview overlay with clock, source labels, tally borders, and dirty-checking for minimal render overhead
  • All shared types placed in strom-types, OpenAPI schema registered, WebSocket events with ToSchema annotations

Test plan

  • Create a vision mixer block with multiple inputs and verify multiview output renders correctly
  • Test CUT and AUTO transitions between PVW and PGM sources via the web UI
  • Test DSK toggle and fade-to-black via web UI and verify overlay renders
  • Verify WebSocket state resync on page reload
  • Verify background source selection works
  • Run cargo test: all 17 tests pass, including the OpenAPI snapshot
  • Verify clippy and build are clean

🤖 Generated with Claude Code

Per Enstedt and others added 20 commits March 26, 2026 10:05
New broadcast production block for video source selection and transitions.
Takes 2-10 video inputs, outputs a high-res PGM stream and a multiview
monitor with cairo-drawn overlays (borders, labels, clock).

Block architecture:
- Inputs tee'd to two compositors (distribution + multiview)
- GPU (glvideomixer) primary with CPU (compositor) fallback
- Multiview: 2-5-5 grid layout with PVW/PGM large panels and thumbnails
- Cairo overlay draws colored borders, labels, and local wall clock
- PVW/PGM swap on take (standard broadcast workflow)
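
A minimal sketch of the PVW/PGM swap on take, for illustration only. The `BusState` type and its methods are hypothetical names, not the types used in this PR; the sketch assumes each bus holds a single source index.

```rust
/// Hypothetical bus state: which input is on preview and which is on program.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BusState {
    pub pvw: usize,
    pub pgm: usize,
}

impl BusState {
    /// Select a new preview source; the program bus is untouched.
    pub fn select_preview(&mut self, input: usize) {
        self.pvw = input;
    }

    /// Take: the preview source goes on air and the previous program
    /// source becomes the new preview.
    pub fn take(&mut self) {
        std::mem::swap(&mut self.pvw, &mut self.pgm);
    }
}
```

Starting from PGM = 0 and PVW = 1, selecting input 3 on preview and taking leaves PGM = 3 and PVW = 0, so the operator can cut straight back to the previous source.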

API:
- POST /api/flows/{id}/blocks/{id}/preview - select preview source
- Reuses existing transition endpoint for take (cut/fade/dip)
- VisionMixerStateChanged WebSocket event for multi-client sync

Web control UI:
- /player/vision-mixer/{flow_id} serves a switcher control page
- WHEP multiview player + PGM/PVW bus buttons + transition controls
- Keyboard shortcuts: 1-9,0 = PVW select, Space = AUTO, X = CUT

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add "Vision Mixer" action button in block inspector (opens control page)
- Fix button scaling: use flex layout with number-only labels and hover tooltips
- Add connection health indicators (Video/WS status dots with green/yellow/red)
- Add video stall detection (frozen frame warning)
- Add WHEP auto-reconnect on disconnect
- Remove focus outline on buttons to prevent sticky selection ring

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ck to open

Vision mixer improvements:
- DSK (Downstream Keyer): 0-4 graphics overlay inputs on dist compositor
  with high zorder, 1-based API (POST .../dsk {"dsk": 1, "enabled": true}),
  toggle buttons in web UI with clear ON/OFF state
- Pipeline restructure: multiview PGM big display now fed from tee_pgm
  (dist compositor output), so transitions and DSK are visible in multiview
- PVW candidate pads shifted to offset N+1 to accommodate PGM tee pad
- Alpha cleanup during transitions skips DSK pads (index >= num_video_inputs)
- Double-click vision mixer block in graph opens control page in browser
- DSK link order fix: DSK pads created after video inputs for correct indexing
- Push transition type added to web UI
- Default input labels changed to "In 1", "In 2", etc.
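
The DSK indexing described above can be sketched as follows. This assumes DSK pads sit directly after the video input pads on the mixer, which is what "index >= num_video_inputs" suggests; `dsk_pad_index` and `is_dsk_pad` are hypothetical helper names.

```rust
/// Map a 1-based DSK number from the API to a 0-based mixer pad index.
/// Assumption: DSK pads directly follow the video input pads.
pub fn dsk_pad_index(dsk: usize, num_video_inputs: usize, num_dsk: usize) -> Option<usize> {
    if dsk == 0 || dsk > num_dsk {
        return None; // 0 and out-of-range numbers are invalid in a 1-based API
    }
    Some(num_video_inputs + dsk - 1)
}

/// True when a pad index refers to a DSK pad rather than a video input,
/// so alpha cleanup during transitions can skip it.
pub fn is_dsk_pad(index: usize, num_video_inputs: usize) -> bool {
    index >= num_video_inputs
}
```

With 4 video inputs and 2 DSK inputs, `{"dsk": 1, ...}` would address pad 4 and `{"dsk": 2, ...}` pad 5 under this assumption.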

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- FTB (Fade to Black): toggle all mixer pads (video + DSK) to alpha=0,
  F keyboard shortcut, pulsing red button when active, auto-un-FTB before
  transitions, "FTB" indicator in multiview PGM via cairo overlay
- VisionMixerFtbChanged WebSocket event for multi-client FTB sync
- VisionMixerDskChanged WebSocket event for multi-client DSK sync
- Fix transition alpha bug: use server's authoritative PGM state instead
  of client-provided from_input to prevent stale alpha values
- Fix button text centering: all buttons use flexbox centering
- DSK initial state: buttons correctly show ON at startup

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Track DSK enabled states in VisionMixerOverlayState (Vec<AtomicBool>)
- Inject ftb_active and dsk_states into page config JSON from overlay state
- JS initializes dskEnabled and ftbActive from server state, not hardcoded
- Call updateFtbButton() on init to reflect correct FTB button state

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…anup

- DSK inputs start disabled (alpha=0) instead of enabled
- FTB is now an animated fade (ease-in-out) instead of instant cut,
  with 500ms default duration in web UI
- FTB un-fade respects each DSK's individual enabled state
- Transitions auto-cancel FTB server-side, avoiding client race condition
- All pad properties (alpha, xpos, ypos, width, height) and control
  bindings are reset before each transition, fixing push/slide leaving
  pads off-screen
- Capsfilters pin pixel-aspect-ratio=1/1 to prevent autovideoconvert
  from negotiating non-square pixels on GPU systems
- Multiview clock shows timezone abbreviation (e.g. CET), refreshed
  every 60s to handle DST transitions, with zero heap allocations
- Thumbnail layout: video at top of slot, label below, border wraps
  full slot area
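
For the animated FTB, a rough sketch of an ease-in-out alpha ramp. The exact easing curve is not stated in this commit; smoothstep is used here as an assumption, and `ease_in_out` / `ftb_alpha` are hypothetical names.

```rust
/// Smoothstep easing (an assumption; the actual curve may differ).
/// Maps t in [0, 1] to [0, 1] with zero slope at both ends.
pub fn ease_in_out(t: f64) -> f64 {
    let t = t.clamp(0.0, 1.0);
    t * t * (3.0 - 2.0 * t)
}

/// Alpha of one pad `elapsed_ms` into a fade to black lasting `duration_ms`.
/// `start_alpha` is the pad's alpha before the fade began.
pub fn ftb_alpha(start_alpha: f64, elapsed_ms: u64, duration_ms: u64) -> f64 {
    if duration_ms == 0 {
        return 0.0; // degenerate case: instant cut to black
    }
    let t = elapsed_ms as f64 / duration_ms as f64;
    start_alpha * (1.0 - ease_in_out(t))
}
```

Un-fading would run the same curve in reverse toward each pad's target alpha, which for a DSK pad depends on whether that DSK is enabled.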

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace unsafe libc::localtime_r with chrono for Windows compatibility.
Remove orphaned dsk_comp from CPU pipeline that double-linked mixer.src,
making CPU path match GPU path topology.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The cairo-rs crate requires libcairo2-dev at build time, which is not
pulled in by the GStreamer dev packages.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nversion

Add queue_pgm_mv to decouple the distribution and multiview compositors
onto separate threads (both GPU and CPU paths).

Replace CPU videoconvert with glcolorconvert + GL memory capsfilter in
the GPU multiview output chain. This forces BGRA format conversion to
happen on the GPU before gldownload, eliminating the per-frame CPU
videoconvert. Cairo's ARGB32 maps to GStreamer's BGRA on little-endian.

Also update openapi_snapshot.json to match current API schema.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace per-frame cairooverlay with an appsrc-based overlay that pushes
RGBA frames into the MV compositor as a separate GPU-composited pad.
The overlay only re-renders on state changes (~1/sec for clock, instant
on PGM/PVW/FTB switch), eliminating all per-frame CPU overlay work.
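
One way to picture the "re-render only on state change" rule is a snapshot comparison, sketched below. `OverlaySnapshot` and `DirtyChecker` are hypothetical names and the field set is an assumption; the real overlay state carries more (labels, groups, DSK).

```rust
/// Hypothetical snapshot of everything the overlay draws.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct OverlaySnapshot {
    pub pgm: usize,
    pub pvw: usize,
    pub ftb: bool,
    /// Wall-clock seconds shown on the multiview clock.
    pub clock_secs: u64,
}

/// Remembers the last rendered snapshot and reports whether a new render
/// is needed.
#[derive(Debug, Default)]
pub struct DirtyChecker {
    last: Option<OverlaySnapshot>,
}

impl DirtyChecker {
    /// Returns true (and records the snapshot) only when it differs from
    /// the one rendered last.
    pub fn needs_render(&mut self, current: &OverlaySnapshot) -> bool {
        if self.last.as_ref() == Some(current) {
            return false;
        }
        self.last = Some(current.clone());
        true
    }
}
```

With a 1 Hz clock tick, an idle mixer under this scheme renders roughly once per second instead of once per frame.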

Pipeline threading improvements:
- Add queues after each input tee (3 per input) to decouple from
  compositor backpressure
- Add queue after mv_comp to decouple from downstream
- Add queue for overlay appsrc (solves GL caps negotiation timing)

Overlay rendering:
- Cairo renders with R/B-swapped colors, producing correct RGBA output
  without per-pixel byte swapping (1-2ms vs 15-24ms at 720p)
- OverlayRenderer pushes via appsrc with push_sample (caps deferred
  until pipeline PLAYING, matching WHIP input pattern)
- 1Hz timer for clock + trigger_overlay_update for instant state changes
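
Why the R/B swap works, as a small model that does not call cairo itself: Cairo's ARGB32 stores each pixel as a native-endian 32-bit value with alpha in the top byte, so on little-endian the bytes in memory are B, G, R, A. Passing a color with red and blue exchanged therefore yields R, G, B, A in memory. The function names are hypothetical and premultiplied alpha is ignored (the color is treated as opaque).

```rust
/// Model of how an ARGB32 pixel lands in memory on a little-endian host:
/// the 32-bit value is 0xAARRGGBB, stored least-significant byte first.
pub fn cairo_argb32_bytes(r: u8, g: u8, b: u8, a: u8) -> [u8; 4] {
    let px = ((a as u32) << 24) | ((r as u32) << 16) | ((g as u32) << 8) | (b as u32);
    px.to_le_bytes() // memory order: B, G, R, A
}

/// Hand the color to the ARGB32 model with red and blue swapped, so the
/// resulting bytes read as R, G, B, A without touching the pixel buffer.
pub fn rgba_bytes_via_swapped_color(r: u8, g: u8, b: u8, a: u8) -> [u8; 4] {
    cairo_argb32_bytes(b, g, r, a)
}
```

The swap happens once per drawing color rather than once per pixel, which is where the saving over per-pixel byte swapping comes from.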

Bug fixes:
- Fix capsfilter lookup in transition reset (capsfilter -> capsfilter_dist)
  which caused all pads to reset to 1920x1080 after every transition
- Fix hardcoded 1920x1080 in TransitionController, now reads from
  capsfilter_dist dynamically
- Extract dist_canvas_size() helper to eliminate resolution fallback
  duplication, defaults from DEFAULT_PGM_RESOLUTION
- Fix resolution fallback in properties::parse_resolution to use the
  provided default parameter instead of hardcoded 1920x1080
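
The resolution fallback fix can be illustrated with a sketch like the one below. The signature is an assumption and not necessarily that of properties::parse_resolution; the point is only that a malformed value falls back to the caller's default rather than a hardcoded 1920x1080.

```rust
/// Parse a "WIDTHxHEIGHT" string, falling back to `default` when the value
/// is malformed or has a zero dimension. (Hypothetical signature.)
pub fn parse_resolution(value: &str, default: (u32, u32)) -> (u32, u32) {
    let parsed = value.split_once('x').and_then(|(w, h)| {
        let w = w.trim().parse::<u32>().ok()?;
        let h = h.trim().parse::<u32>().ok()?;
        // Reject zero-sized canvases instead of passing them downstream.
        (w > 0 && h > 0).then_some((w, h))
    });
    parsed.unwrap_or(default)
}
```

A caller configured for a 4K canvas would pass (3840, 2160) as the default and get that back for bad input.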

Reduce default compositor latency from 200ms to 20ms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move default latency values (20ms) to strom-types constants
(DEFAULT_LATENCY_MS, DEFAULT_MIN_UPSTREAM_LATENCY_MS) instead of
hardcoding in builder.rs and definition.rs separately.

Lower overlay timing logs from info to debug level.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move all hardcoded magic numbers to strom-types vision_mixer constants:
- MV_THUMBNAIL_ZORDER, MV_BIG_DISPLAY_ZORDER, DIST_DSK_BASE_ZORDER,
  MV_OVERLAY_ZORDER
- OVERLAY_FRAMERATE
- TIMEZONE_REFRESH_SECS
- TRANSITION_KEYFRAMES

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…YFRAMES constant

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…r control UI

Overlay text, borders, and padding were hardcoded for 720p. Add a scale
factor (canvas_height / 720) and apply it to border widths, label padding,
and remove restrictive font size clamps so the overlay scales correctly
from tiny canvases up to 4K.
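
The scale factor amounts to the following sketch. `REFERENCE_HEIGHT`, `overlay_scale`, and `scaled` are hypothetical names; only the canvas_height / 720 ratio comes from this commit.

```rust
/// Canvas height the overlay metrics were originally designed for.
pub const REFERENCE_HEIGHT: f64 = 720.0;

/// Scale factor relative to the 720p design size.
pub fn overlay_scale(canvas_height: u32) -> f64 {
    canvas_height as f64 / REFERENCE_HEIGHT
}

/// Scale a metric (border width, label padding, font size) that was
/// specified in pixels at 720p.
pub fn scaled(base_at_720p: f64, canvas_height: u32) -> f64 {
    base_at_720p * overlay_scale(canvas_height)
}
```

A 4 px border at 720p becomes 6 px at 1080p and 12 px at 2160p.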

Vision mixer control page: remove unused webrtc.css import that leaked
body padding (gray border), move DSK buttons below input rails, add
minimize mode (M key) with localStorage persistence.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extend the vision mixer to support ordered groups of 1-4 sources for
both PVW and PGM. Groups are composited as split-screen layouts:
fullscreen (1), side-by-side (2), 2-top+1-bottom (3), or 2x2 grid (4).
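
A rough sketch of how the four layouts could map to sub-rectangles on the canvas. The exact geometry is an assumption: tiles here are half-size so sources keep their aspect ratio, the side-by-side pair is vertically centered, and the third source in the 3-up layout is horizontally centered. `Rect` and `group_layout` are hypothetical names.

```rust
/// Position and size of one group member on the output canvas, in pixels.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Rect {
    pub x: i32,
    pub y: i32,
    pub w: i32,
    pub h: i32,
}

/// Sub-rectangles for a group of `count` sources on a `cw` x `ch` canvas.
/// Returns an empty Vec for unsupported group sizes.
pub fn group_layout(count: usize, cw: i32, ch: i32) -> Vec<Rect> {
    let (hw, hh) = (cw / 2, ch / 2);
    match count {
        1 => vec![Rect { x: 0, y: 0, w: cw, h: ch }],
        // Side-by-side: two half-size tiles, vertically centered.
        2 => vec![
            Rect { x: 0, y: (ch - hh) / 2, w: hw, h: hh },
            Rect { x: hw, y: (ch - hh) / 2, w: hw, h: hh },
        ],
        // Two on top, one centered below.
        3 => vec![
            Rect { x: 0, y: 0, w: hw, h: hh },
            Rect { x: hw, y: 0, w: hw, h: hh },
            Rect { x: (cw - hw) / 2, y: hh, w: hw, h: hh },
        ],
        // 2x2 grid.
        4 => vec![
            Rect { x: 0, y: 0, w: hw, h: hh },
            Rect { x: hw, y: 0, w: hw, h: hh },
            Rect { x: 0, y: hh, w: hw, h: hh },
            Rect { x: hw, y: hh, w: hw, h: hh },
        ],
        _ => Vec::new(),
    }
}
```

Each rectangle would then be applied to the corresponding compositor pad's xpos, ypos, width, and height properties.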

Source groups use a packed AtomicU64 representation for lock-free reads
from the overlay renderer thread. The distribution compositor positions
each group member at a sub-rectangle, and the multiview PVW display
shows the group layout within the PVW area.
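
As an illustration of the packed representation, one possible encoding is sketched below. The bit layout is an assumption (four 16-bit slots, each holding source index + 1, with 0 meaning "empty"), and `pack_group`, `unpack_group`, and `SourceGroup` are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Maximum number of sources in one group.
pub const MAX_GROUP: usize = 4;

/// Pack up to four ordered source indices into one u64.
/// Each 16-bit slot stores index + 1 so that 0 can mean "empty slot".
pub fn pack_group(sources: &[u8]) -> u64 {
    sources
        .iter()
        .take(MAX_GROUP)
        .enumerate()
        .fold(0u64, |acc, (slot, &src)| acc | ((src as u64 + 1) << (slot * 16)))
}

/// Recover the ordered source indices from a packed value.
pub fn unpack_group(packed: u64) -> Vec<u8> {
    (0..MAX_GROUP)
        .map(|slot| (packed >> (slot * 16)) & 0xFFFF)
        .take_while(|&v| v != 0)
        .map(|v| (v - 1) as u8)
        .collect()
}

/// A group that one thread can update while another reads it without a lock.
pub struct SourceGroup(AtomicU64);

impl SourceGroup {
    pub fn new(sources: &[u8]) -> Self {
        Self(AtomicU64::new(pack_group(sources)))
    }

    pub fn store(&self, sources: &[u8]) {
        self.0.store(pack_group(sources), Ordering::Release);
    }

    pub fn load(&self) -> Vec<u8> {
        unpack_group(self.0.load(Ordering::Acquire))
    }
}
```

Because the whole group fits in a single atomic word, a reader always sees either the old group or the new one, never a mix of the two.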

Frontend: hold G + press number keys to toggle sources in/out of the
PVW group. Release G for normal single-select mode. Transitions
between groups use cross-fade; single-to-single transitions use the
existing optimized path unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a background source layer that sits fullscreen behind PGM group
sources at a lower z-order. Visible through gaps in split-screen
layouts (pillarboxing, empty quadrants). Background is exclusive: a
source used as background cannot be selected for PVW/PGM groups.

Controls: hold B + number key to set background (toggle), shown as
yellow border + BG badge on multiview thumbnails and control UI buttons.

Also resync full vision mixer state (PGM/PVW groups, background, FTB,
DSK) from the server on every WebSocket reconnect, preventing stale UI
after connection drops.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cairo is now a build dependency for the vision mixer multiview overlay.
Add it to README, DEVELOPMENT, CONTRIBUTING, and cross-compile docs.
Also add vision mixer to the Features and Blocks lists in README.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The install script downloads pre-built binaries and only needs runtime
GStreamer libraries, not development headers and pkg-config files.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The GStreamer 1.x libnice plugin package on Fedora is
libnice-gstreamer1, not libnice-gstreamer.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@srperens srperens merged commit eca0c60 into main Mar 27, 2026
7 checks passed
@srperens srperens deleted the feat/vision-mixer branch March 27, 2026 13:45