fix(linux): multi-GPU segfault + wlr GPU auto selection, DMA-BUF metadata planes and revert wlr vulkan support #5030
Bundle Report: Bundle size has no change ✅
Codecov Report: ❌ Patch coverage is
Additional details and impacted files:
@@            Coverage Diff             @@
##           master    #5030      +/-   ##
==========================================
- Coverage   18.18%   18.16%   -0.02%
==========================================
  Files         109      109
  Lines       23536    23545       +9
  Branches    10387    10391       +4
==========================================
- Hits         4280     4278       -2
- Misses     16130    17621    +1491
+ Partials     3126     1646    -1480
... and 58 files with indirect coverage changes.
Tested build-Linux-AppImage from the CI artifacts (commit 5d68c4b) against my original reproducer: dual RTX 5090, NixOS 25.11, Hyprland, NVIDIA 595.58.03 open driver, kernel 6.19.10. The NVENC path fails immediately with "Couldn't initialize EGL display: [00003001] (EGL_NOT_INITIALIZED)". This is likely a NixOS + AppImage runtime quirk around EGL exposure rather than an issue in your code. Backtrace from the Vulkan segfault:
@kabooHD, I added a fix for the segfault.
Thanks for the quick fix! @ReenigneArcher can you please trigger CI again so I can grab a fresh build-Linux-AppImage artifact? Building the branch directly on NixOS turned out to be more trouble than it was worth, and the CI artifact from last round tested cleanly (up until the segfault they just patched). |
…vice matching
The Vulkan instance was created without enabling VK_EXT_physical_device_drm, but VkPhysicalDeviceDrmPropertiesEXT was chained into vkGetPhysicalDeviceProperties2. On multi-GPU systems, the Vulkan loader's physical device terminator would crash in vsnprintf when dispatching the unrecognized pNext struct. Enable the extension at instance creation, with a fallback for loaders that don't advertise it.
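The enable-with-fallback pattern described in this commit can be sketched without any Vulkan headers. This is a minimal illustration, not Sunshine's code: `select_extensions` and `ExtensionPlan` are hypothetical names, and the `available` list stands in for whatever the Vulkan enumeration calls report (vkEnumerateInstanceExtensionProperties or vkEnumerateDeviceExtensionProperties in real code). The point it models is that the DRM properties struct is only chained into the properties query when the extension was actually enabled.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Name of the extension whose properties struct triggered the crash.
const std::string kDrmExtension = "VK_EXT_physical_device_drm";

// Hypothetical result of extension selection (not a Vulkan type).
struct ExtensionPlan {
  std::vector<std::string> enabled;   // names that would be passed at creation time
  bool chain_drm_properties = false;  // true only if the DRM extension is enabled
};

// Enable each wanted extension only if it is advertised; skip the rest
// instead of failing creation (the "fallback" for older loaders).
ExtensionPlan select_extensions(const std::vector<std::string> &available,
                                const std::vector<std::string> &wanted) {
  ExtensionPlan plan;
  for (const auto &name : wanted) {
    if (std::find(available.begin(), available.end(), name) != available.end()) {
      plan.enabled.push_back(name);
    }
  }
  // Never chain a pNext struct whose extension was not enabled; chaining it
  // unconditionally is what the commit message identifies as the bug.
  plan.chain_drm_properties =
    std::find(plan.enabled.begin(), plan.enabled.end(), kDrmExtension) != plan.enabled.end();
  return plan;
}
```

In this sketch, a loader that does not advertise the extension still gets a working setup; the DRM-based GPU matching would simply be skipped rather than crashing.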
Query the Vulkan driver for the expected plane count for the given format+modifier combination, instead of blindly counting all DMA-BUF file descriptors as image planes. This fixes vkCreateImage failures on AMD GPUs when the compositor exports buffers with DCC compression metadata planes.
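A minimal sketch of the plane-count decision, assuming the driver's modifier list has already been queried. In the Vulkan API those values come from VkDrmFormatModifierPropertiesListEXT chained into vkGetPhysicalDeviceFormatProperties2; here a hypothetical `ModifierProps` struct mirrors the two relevant fields, and `image_plane_count` is an illustrative helper, not the PR's actual function.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Mirrors the two fields of VkDrmFormatModifierPropertiesEXT used here
// (the real struct also carries drmFormatModifierTilingFeatures).
struct ModifierProps {
  uint64_t modifier;     // drmFormatModifier
  uint32_t plane_count;  // drmFormatModifierPlaneCount
};

// Hypothetical helper: how many of the exported DMA-BUF planes to pass to
// image creation. Uses the driver-reported count rather than the fd count.
// Returns nullopt if the driver does not list the modifier, or if fewer
// planes were exported than the driver expects.
std::optional<uint32_t> image_plane_count(const std::vector<ModifierProps> &driver_list,
                                          uint64_t modifier, uint32_t exported_fds) {
  for (const auto &p : driver_list) {
    if (p.modifier == modifier) {
      if (exported_fds < p.plane_count) {
        return std::nullopt;  // not enough planes to describe the image
      }
      // Any exported planes beyond the driver's count are not passed on.
      return p.plane_count;
    }
  }
  return std::nullopt;  // modifier unknown to the driver
}
```

The Vulkan spec requires the plane count given in VkImageDrmFormatModifierExplicitCreateInfoEXT to equal the driver-reported drmFormatModifierPlaneCount, which is why counting file descriptors can make vkCreateImage fail when the exporter attaches more planes than the driver expects.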
…t available
Use platf::resolve_render_device() to select the correct GPU for wlroots screencopy DMA-BUF import. Previously, init_gbm() iterated over DRM devices and picked the first render node, which on multi-GPU systems could target the wrong GPU.
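To illustrate the difference between the old and new selection, here is a sketch over a hypothetical `DrmNode` list. `first_render_node` models the previous init_gbm() behavior as described in the commit. `pick_render_node` takes an already-resolved device path as a stand-in for whatever platf::resolve_render_device() returns; its real signature and matching logic are not shown in this PR page, so that part is an assumption. The sketch falls back to the first render node when nothing usable was resolved.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical DRM device entry (not a libdrm type).
struct DrmNode {
  std::string path;     // e.g. "/dev/dri/renderD128"
  bool is_render_node;  // true for /dev/dri/renderD* nodes
};

// Old behavior: the first render node wins, regardless of which GPU is
// actually driving the captured output.
std::optional<std::string> first_render_node(const std::vector<DrmNode> &nodes) {
  for (const auto &n : nodes) {
    if (n.is_render_node) {
      return n.path;
    }
  }
  return std::nullopt;
}

// Sketch of the new behavior: prefer the device resolved elsewhere
// (hypothetical stand-in for platf::resolve_render_device()), and fall back
// to the first render node only if it is missing or not present.
std::optional<std::string> pick_render_node(const std::vector<DrmNode> &nodes,
                                            const std::optional<std::string> &resolved) {
  if (resolved) {
    for (const auto &n : nodes) {
      if (n.is_render_node && n.path == *resolved) {
        return n.path;
      }
    }
  }
  return first_render_node(nodes);
}
```

On a two-GPU machine where the compositor runs on the second card, the old logic would still return renderD128, while the sketch returns whichever node was resolved.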
Vulkan encoding requires linear DMA-BUF buffers, but adding GBM_BO_USE_LINEAR breaks screencopy on some drivers. Revert all vulkan-related changes to wlr capture to restore compatibility.
After the RGB→YUV compute shader dispatch, layout and access were updated on the AVVkFrame but queue_family was not. FFmpeg's vulkan_encode_issue() reads queue_family[0] as srcQueueFamilyIndex in the image barrier when transitioning to VIDEO_ENCODE_SRC layout. On exclusive sharing mode configurations (single queue family), the stale value would cause an incorrect queue family ownership transfer, potentially leading to encode corruption. Currently masked on NVIDIA (concurrent sharing mode), but this is a correctness fix per the Vulkan spec.
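A simplified model of the bookkeeping fix, assuming the frame tracks per-image layout, access, and queue-family state as FFmpeg's AVVkFrame does (arrays of AV_NUM_DATA_POINTERS entries, which is 8 in FFmpeg). Plain integers stand in for the Vulkan enum types, and `FrameState`, `record_after_dispatch`, and `encoder_src_family` are hypothetical names; the real code writes into the AVVkFrame fields directly.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kMaxImages = 8;  // matches AV_NUM_DATA_POINTERS in FFmpeg

// Simplified mirror of AVVkFrame's layout[], access[] and queue_family[].
struct FrameState {
  std::array<uint32_t, kMaxImages> layout{};
  std::array<uint32_t, kMaxImages> access{};
  std::array<uint32_t, kMaxImages> queue_family{};
};

// Record the state our own compute dispatch left the images in. The bug
// described in the commit was updating layout and access but not queue_family.
void record_after_dispatch(FrameState &f, int images, uint32_t new_layout,
                           uint32_t new_access, uint32_t compute_family) {
  for (int i = 0; i < images; ++i) {
    f.layout[i] = new_layout;
    f.access[i] = new_access;
    f.queue_family[i] = compute_family;  // previously left stale
  }
}

// What the encoder's barrier would read as srcQueueFamilyIndex.
uint32_t encoder_src_family(const FrameState &f) {
  return f.queue_family[0];
}
```

Updating all three fields together keeps the next barrier's source queue family consistent with the family that last used the image, which is what matters under exclusive sharing mode.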
Confirmed the wlgrab multi-GPU fix works for the original #5023 scenario. (Please note: the build at https://github.com/LizardByte/Sunshine/actions/runs/24665312160?pr=5030#artifacts didn't work for me.) Thank you!
@kabooHD, the build that worked for you is the correct one: auto GPU selection plus the revert of wlgrab Vulkan support. The buffer formats (linear vs. non-linear) required by various drivers/GPUs break wlr screencopy, so unfortunately I reverted Vulkan support in this PR. I managed to get the AMD GPU + wlgrab + Vulkan scenario working, but at the same time it broke NVIDIA GPU + wlgrab + NVENC (according to users' reports). @ReenigneArcher, Vulkan + wlgrab/wlr support could be a TODO, I guess, if somebody is interested in adding it (especially someone who owns both AMD and NVIDIA GPUs to develop it). Supporting both seems tricky. We have the commit history if it helps in the future (feat(linux): Add Vulkan video encoder (#4603)).
@netnoise, fine with me. BTW, do you think this table should be updated to include Vulkan? https://github.com/LizardByte/Sunshine#-feature-compatibility The intent is to track anything that varies between platforms or is incompatible in some cases. That can be a separate PR. This is probably good to go; I see you got confirmation on 2 of the 3 issues.



Description
Fixed a crash on multi-GPU systems when Sunshine tries to detect which GPU to use for Vulkan encoding.
Fixed wlr capture picking the wrong GPU on multi-GPU systems, which caused failed imports and memory leaks.
Fixed Vulkan encoding failing after long streaming sessions on AMD GPUs. The compositor can add extra buffer planes for compression, and the encoder now correctly ignores them.
Removed Vulkan encoder support from wlr capture. The buffer formats (linear vs. non-linear) required by various drivers/GPUs break wlr screencopy. Vulkan encoding still works with KMS and portal capture.
Screenshot
Issues Fixed or Closed
Roadmap Issues
Type of Change
Checklist
AI Usage