fix(linux): multi-GPU segfault + wlr GPU auto selection, DMA-BUF metadata planes and revert wlr vulkan support #5030

Merged
ReenigneArcher merged 5 commits into LizardByte:master from neatnoise:vulkan-fixes on Apr 21, 2026

Conversation

@neatnoise
Contributor

@neatnoise neatnoise commented Apr 19, 2026

Description

Fixed a crash on multi-GPU systems when Sunshine tries to detect which GPU to use for Vulkan encoding.

Fixed wlr capture picking the wrong GPU on multi-GPU systems, which caused failed imports and memory leaks.

Fixed Vulkan encoding failing after long streaming sessions on AMD GPUs. The compositor can add extra buffer planes for compression, and the encoder now correctly ignores them.

Removed Vulkan encoder support from wlr capture. The buffer formats (linear, non-linear) for various drivers/gpus break wlr screencopy. Vulkan encoding still works with KMS capture and portal.

Screenshot

Issues Fixed or Closed

Roadmap Issues

Type of Change

  • feat: New feature (non-breaking change which adds functionality)
  • fix: Bug fix (non-breaking change which fixes an issue)
  • docs: Documentation only changes
  • style: Changes that do not affect the meaning of the code (white-space, formatting, missing semicolons, etc.)
  • refactor: Code change that neither fixes a bug nor adds a feature
  • perf: Code change that improves performance
  • test: Adding missing tests or correcting existing tests
  • build: Changes that affect the build system or external dependencies
  • ci: Changes to CI configuration files and scripts
  • chore: Other changes that don't modify src or test files
  • revert: Reverts a previous commit
  • BREAKING CHANGE: Introduces a breaking change (can be combined with any type above)

Checklist

  • Code follows the style guidelines of this project
  • Code has been self-reviewed
  • Code has been commented, particularly in hard-to-understand areas
  • Code docstring/documentation-blocks for new or existing methods/components have been added or updated
  • Unit tests have been added or updated for any new or modified functionality

AI Usage

  • None: No AI tools were used in creating this PR
  • Light: AI provided minor assistance (formatting, simple suggestions)
  • Moderate: AI helped with code generation or debugging specific parts
  • Heavy: AI generated most or all of the code changes

@neatnoise neatnoise mentioned this pull request Apr 19, 2026
2 tasks
@neatnoise neatnoise changed the title fix(linux/vulkan): use correct queue family and plane aspects for DMA-BUF barriers fix(linux): Vulkan DMA-BUF barriers, wlroots GBM allocation, and multi-GPU device selection Apr 19, 2026
@neatnoise neatnoise force-pushed the vulkan-fixes branch 5 times, most recently from 41a97fe to 27e14ef Compare April 19, 2026 13:01
Comment thread src/platform/linux/vulkan_encode.cpp Outdated
@codecov

codecov Bot commented Apr 19, 2026

Bundle Report

Bundle size has no change ✅

@codecov

codecov Bot commented Apr 20, 2026

Codecov Report

❌ Patch coverage is 0% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 18.16%. Comparing base (baa1a91) to head (ff14fee).
⚠️ Report is 1 commits behind head on master.
✅ All tests successful. No failed tests found.

Files with missing lines                Patch %   Lines
src/platform/linux/vulkan_encode.cpp    0.00%     26 Missing ⚠️
src/platform/linux/wayland.cpp          0.00%     4 Missing ⚠️
src/platform/linux/wlgrab.cpp           0.00%     2 Missing ⚠️
Additional details and impacted files


@@            Coverage Diff             @@
##           master    #5030      +/-   ##
==========================================
- Coverage   18.18%   18.16%   -0.02%     
==========================================
  Files         109      109              
  Lines       23536    23545       +9     
  Branches    10387    10391       +4     
==========================================
- Hits         4280     4278       -2     
- Misses      16130    17621    +1491     
+ Partials     3126     1646    -1480     
Flag                     Coverage Δ
Archlinux                11.57% <0.00%> (-0.01%) ⬇️
FreeBSD-14.3-aarch64     ?
FreeBSD-14.3-amd64       13.75% <0.00%> (-0.01%) ⬇️
Homebrew-ubuntu-22.04    13.94% <0.00%> (-0.01%) ⬇️
Linux-AppImage           12.52% <0.00%> (-0.01%) ⬇️
Windows-AMD64            14.88% <ø> (-0.01%) ⬇️
Windows-ARM64            13.21% <ø> (ø)
macOS-arm64              19.01% <ø> (ø)
macOS-x86_64             18.37% <ø> (+0.01%) ⬆️


Files with missing lines                Coverage Δ
src/platform/linux/wayland.h            0.00% <ø> (ø)
src/platform/linux/wlgrab.cpp           0.00% <0.00%> (ø)
src/platform/linux/wayland.cpp          3.10% <0.00%> (+0.11%) ⬆️
src/platform/linux/vulkan_encode.cpp    0.33% <0.00%> (-0.02%) ⬇️

... and 58 files with indirect coverage changes


Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@neatnoise neatnoise mentioned this pull request Apr 20, 2026
2 tasks
@neatnoise neatnoise force-pushed the vulkan-fixes branch 2 times, most recently from 4c7fe2b to c2145d9 Compare April 20, 2026 09:39
@kabooHD

kabooHD commented Apr 20, 2026

Tested build-Linux-AppImage from the CI artifacts (commit 5d68c4b) against my original reproducer — dual RTX 5090, NixOS 25.11, Hyprland, NVIDIA 595.58.03 open driver, kernel 6.19.10.
Ran via appimage-run (and separately via steam-run, with the same result), using the AppImage build from here: https://github.com/LizardByte/Sunshine/actions/runs/24646068696?pr=5030#artifacts
Two issues observed:

  • NVENC path fails immediately with Couldn't initialize EGL display: [00003001] (EGL_NOT_INITIALIZED). Likely a NixOS + AppImage runtime quirk around EGL exposure rather than your code.
  • Sunshine falls through to Vulkan encoder. h264_vulkan initializes, reports Streaming bitrate is 1000000, then segfaults before reaching the main loop.

Backtrace from the Vulkan segfault:
Stack trace of thread 4592:
#0 0x00007f8ab898f918 __strchrnul_evex (/nix/store/km4g87jxsqxvcq344ncyb8h1i6f3cqxh-glibc-2.40-218/lib/libc.so.6 + 0x18f918)
#1 0x00007f8ab886778e __printf_buffer (/nix/store/km4g87jxsqxvcq344ncyb8h1i6f3cqxh-glibc-2.40-218/lib/libc.so.6 + 0x6778e)
#2 0x00007f8ab888ead3 __vsnprintf_internal (/nix/store/km4g87jxsqxvcq344ncyb8h1i6f3cqxh-glibc-2.40-218/lib/libc.so.6 + 0x8ead3)
#3 0x00007f8ab95d3841 n/a (libvulkan.so.1 + 0x2e841)
#4 0x00007f8ab95ed5da vkPhysDevExtTermin95 (libvulkan.so.1 + 0x485da)
ELF object binary architecture: AMD x86-64
vsnprintf crashing mid-format call from the Vulkan loader iterating extension terminators — looks like a NULL string pointer being passed somewhere in device/extension name handling. I reproduced the same crash yesterday compiling the branch directly from source outside NixOS, so the ad-hoc build wasn't the cause.
Didn't get far enough to exercise the wlgrab multi-GPU fix from my original issue. Happy to retest when the Vulkan side is sorted out, or try a debug variant if you want more info from the core dump.
Full log attached
sunshine-pr5030-test.log

@neatnoise
Contributor Author

@kabooHD I added a fix for the segfault.

@kabooHD

kabooHD commented Apr 20, 2026

Thanks for the quick fix! @ReenigneArcher can you please trigger CI again so I can grab a fresh build-Linux-AppImage artifact? Building the branch directly on NixOS turned out to be more trouble than it was worth, and the CI artifact from last round tested cleanly (up until the segfault they just patched).


@neatnoise neatnoise changed the title fix(linux): Vulkan DMA-BUF barriers, wlroots GBM allocation, and multi-GPU device selection fix(linux/vulkan): DMA-BUF barriers, multi-GPU device selection, and revert wlr vulkan support Apr 20, 2026
@neatnoise neatnoise changed the title fix(linux/vulkan): DMA-BUF barriers, multi-GPU device selection, and revert wlr vulkan support fix(linux/vulkan): multi-GPU segfault, DMA-BUF metadata planes, and revert wlr vulkan support Apr 20, 2026
…vice matching

The Vulkan instance was created without enabling VK_EXT_physical_device_drm,
but VkPhysicalDeviceDrmPropertiesEXT was chained into vkGetPhysicalDeviceProperties2.
On multi-GPU systems the Vulkan loader's physical device terminator would
crash in vsnprintf when dispatching the unrecognized pNext struct.

Enable the extension at instance creation with a fallback for loaders
that don't advertise it.
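
The enable-with-fallback step described in this commit message could be sketched roughly as below. This is not the PR's code: `choose_extensions` and `can_query_drm_properties` are hypothetical helpers, and the plain string lists stand in for whatever the Vulkan loader reports. Only the extension name comes from the commit message.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: build the list of extensions to request. Required
// extensions are always kept; an optional one is kept only when advertised.
std::vector<std::string> choose_extensions(
    const std::vector<std::string> &advertised,
    const std::vector<std::string> &required,
    const std::vector<std::string> &optional) {
  std::vector<std::string> enabled = required;
  for (const auto &name : optional) {
    if (std::find(advertised.begin(), advertised.end(), name) != advertised.end()) {
      enabled.push_back(name);
    }
  }
  return enabled;
}

// Hypothetical guard: the DRM properties struct should only be chained into
// the properties query when the matching extension was actually enabled.
bool can_query_drm_properties(const std::vector<std::string> &enabled) {
  return std::find(enabled.begin(), enabled.end(),
                   std::string("VK_EXT_physical_device_drm")) != enabled.end();
}
```

With a guard like this, a loader that does not advertise the extension simply skips the DRM query instead of receiving a pNext struct it cannot dispatch.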

Query the Vulkan driver for the expected plane count for the given
format+modifier combination, instead of blindly counting all DMA-BUF
file descriptors as image planes. This fixes vkCreateImage failures
on AMD GPUs when the compositor exports buffers with DCC compression
metadata planes.
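
A minimal sketch of the plane-count idea from this commit message, assuming the driver's answer has already been collected into a list. `ModifierInfo`, `expected_planes` and `planes_to_import` are hypothetical names. In Vulkan itself the per-modifier count is reported as `drmFormatModifierPlaneCount` in `VkDrmFormatModifierPropertiesEXT`, but whether the PR reads exactly that field is an assumption here, as is the clamping with `std::min`.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical mirror of one driver-reported entry for a fixed format:
// a DRM format modifier and the number of planes the driver expects for it.
struct ModifierInfo {
  std::uint64_t modifier;
  std::uint32_t plane_count;
};

// Look up the plane count the driver expects for `modifier`.
std::optional<std::uint32_t> expected_planes(
    const std::vector<ModifierInfo> &reported, std::uint64_t modifier) {
  for (const auto &info : reported) {
    if (info.modifier == modifier) {
      return info.plane_count;
    }
  }
  return std::nullopt;  // format+modifier combination not reported
}

// Number of planes to describe when importing the buffer: never more than the
// driver expects, even if more file descriptors were handed over.
std::optional<std::uint32_t> planes_to_import(
    const std::vector<ModifierInfo> &reported, std::uint64_t modifier,
    std::uint32_t fd_count) {
  const auto expected = expected_planes(reported, modifier);
  if (!expected) {
    return std::nullopt;
  }
  return std::min(*expected, fd_count);
}
```

The point of the sketch is only the contrast with the old approach, where the number of file descriptors alone decided the plane count.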
…t available

Use platf::resolve_render_device() to select the correct GPU for
wlroots screencopy DMA-BUF import. Previously, init_gbm() iterated
DRM devices and picked the first render node, which on multi-GPU
systems could target the wrong GPU.
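
The difference between the two selection strategies can be illustrated with a small sketch. It does not reproduce `platf::resolve_render_device()`, whose signature is not shown in this thread. `first_render_node` and `pick_render_node` are hypothetical, and failing cleanly instead of falling back to another node is an assumption.

```cpp
#include <optional>
#include <string>
#include <vector>

// Old behaviour as described in the commit message: take the first render
// node found, regardless of which GPU it belongs to.
std::optional<std::string> first_render_node(const std::vector<std::string> &nodes) {
  if (nodes.empty()) {
    return std::nullopt;
  }
  return nodes.front();
}

// Hypothetical replacement: use the already-resolved device when it is one of
// the known render nodes, and report failure otherwise rather than silently
// targeting a different GPU.
std::optional<std::string> pick_render_node(
    const std::vector<std::string> &nodes, const std::string &resolved) {
  for (const auto &node : nodes) {
    if (node == resolved) {
      return node;
    }
  }
  return std::nullopt;
}
```

On a two-GPU machine with `/dev/dri/renderD128` and `/dev/dri/renderD129`, the first function always yields `renderD128`, while the second yields whichever node was resolved.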

Vulkan encoding requires linear DMA-BUF buffers, but adding
GBM_BO_USE_LINEAR breaks screencopy on some drivers. Revert all
vulkan-related changes to wlr capture to restore compatibility.
@neatnoise neatnoise changed the title fix(linux/vulkan): multi-GPU segfault, DMA-BUF metadata planes, and revert wlr vulkan support fix(linux): multi-GPU segfault + wlr GPU auto selection, DMA-BUF metadata planes and revert wlr vulkan support Apr 20, 2026
After the RGB→YUV compute shader dispatch, layout and access were
updated on the AVVkFrame but queue_family was not. FFmpeg's
vulkan_encode_issue() reads queue_family[0] as srcQueueFamilyIndex
in the image barrier when transitioning to VIDEO_ENCODE_SRC layout.

On exclusive sharing mode configurations (single queue family), the
stale value would cause an incorrect queue family ownership transfer,
potentially leading to encode corruption.

Currently masked on NVIDIA (concurrent sharing mode), but this is a
correctness fix per the Vulkan spec.
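
A simplified model of the bookkeeping described in this commit message, assuming one image per frame. `FrameState`, `BarrierSource`, `mark_after_compute` and `next_barrier_source` are hypothetical stand-ins, not the AVVkFrame or FFmpeg API. The sketch only shows why the queue family has to be updated together with layout and access.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for the per-image state tracked on a frame.
struct FrameState {
  std::uint32_t layout;
  std::uint32_t access;
  std::uint32_t queue_family;
};

// Source side of the next image barrier, derived from the tracked state.
struct BarrierSource {
  std::uint32_t old_layout;
  std::uint32_t src_access;
  std::uint32_t src_queue_family;
};

// Record what the compute dispatch left behind. Updating queue_family along
// with layout and access is the point of the fix described above.
void mark_after_compute(FrameState &frame, std::uint32_t layout,
                        std::uint32_t access, std::uint32_t compute_queue_family) {
  frame.layout = layout;
  frame.access = access;
  frame.queue_family = compute_queue_family;
}

// A later consumer builds its barrier from whatever state was recorded.
BarrierSource next_barrier_source(const FrameState &frame) {
  return {frame.layout, frame.access, frame.queue_family};
}
```

If `mark_after_compute` skipped the last assignment, `next_barrier_source` would hand the consumer a stale source queue family, which is the situation the commit message describes.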

@kabooHD

kabooHD commented Apr 21, 2026

Confirmed the wlgrab multi-GPU fix works for the original #5023 scenario.
Tested CI artifact on dual RTX 5090 / NixOS 25.11 / Hyprland / NVIDIA 595.58.03 open driver, with no adapter_name set and no environment workarounds.
Can't fully exercise streaming on this system because appimage-run's sandbox isolates libEGL from /run/opengl-driver/lib, so NVENC and Vulkan both fail to init and Sunshine falls back to software encoding which can't sustain 1080p60 here. But the important part is the leak check:
During a Moonlight connection attempt (which failed on the encoder side):
GPU 0 VRAM: 292 MiB
GPU 1 VRAM: 2 MiB
FDs on /dev/dri/renderD128: 0
FDs on /dev/dri/renderD129: 0
Pre-fix, #5023 would have shown hundreds of FDs on renderD129 within seconds of connect and GPU 1 VRAM climbing into gigabytes via the failed dmabuf import retry loop. None of that happens now — the capture-side code correctly selected a render device (or failed cleanly without targeting the wrong one) even under the encoder-failure codepath.
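
For anyone wanting to repeat the FD check, a count like the ones above can be obtained on Linux by reading the symlinks under /proc/<pid>/fd. The sketch below is one way to do it and not necessarily how the numbers in this comment were gathered. `count_fds_on` is a hypothetical helper name.

```cpp
#include <cstddef>
#include <filesystem>
#include <string>
#include <system_error>

// Count how many open file descriptors of process `pid` ("self" for the
// calling process) resolve to `target`, by reading the /proc/<pid>/fd links.
std::size_t count_fds_on(const std::string &pid, const std::filesystem::path &target) {
  namespace fs = std::filesystem;
  const fs::path dir = fs::path("/proc") / pid / "fd";
  std::size_t count = 0;
  std::error_code ec;
  // The error_code overloads avoid exceptions when the process has exited or
  // a descriptor is closed while the directory is being walked.
  for (fs::directory_iterator it(dir, ec), end; !ec && it != end; it.increment(ec)) {
    std::error_code link_ec;
    const fs::path resolved = fs::read_symlink(it->path(), link_ec);
    if (!link_ec && resolved == target) {
      ++count;
    }
  }
  return count;
}
```

Calling `count_fds_on(pid, "/dev/dri/renderD129")` with the Sunshine process ID before and during a connection attempt would show whether descriptors on that render node accumulate.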
I'll verify the full streaming path once this merges and lands in nixpkgs. Thanks for the fix.

(Please note: https://github.com/LizardByte/Sunshine/actions/runs/24665312160?pr=5030#artifacts didn't work for me
Instead I used https://github.com/LizardByte/Sunshine/actions/runs/24696466510)

Thank you

@neatnoise
Contributor Author

@kabooHD, the build that worked for you is the correct one: auto GPU selection plus the revert of wlgrab Vulkan support. The buffer formats (linear, non-linear) for various drivers/GPUs break wlr screencopy, so unfortunately I reverted Vulkan support in this PR. I managed to get the AMD GPU + wlgrab + Vulkan scenario working, but at the same time it broke Nvidia GPU + wlgrab + NVENC (according to user reports).

@ReenigneArcher Vulkan + wlgrab/wlr support can be a TODO, I guess, if somebody is interested in adding it (especially someone who owns both AMD and Nvidia GPUs to develop on). It seems tricky to support both. We have the commit history if it helps in the future (feat(linux): Add Vulkan video encoder (#4603)).

@ReenigneArcher
Member

@neatnoise fine with me. BTW, do you think this table should be updated to include Vulkan? https://github.com/LizardByte/Sunshine#-feature-compatibility

The intent is to track anything that varies between platforms or has incompatibilities in some cases. Can be a separate PR though.

This is probably good to go though, I see you got confirmation on 2 of the 3 issues.

@ReenigneArcher ReenigneArcher merged commit d14ccf2 into LizardByte:master Apr 21, 2026
72 of 73 checks passed
@neatnoise neatnoise deleted the vulkan-fixes branch April 21, 2026 21:18