segfault when using steam remote play within gamescope #775

Closed
vonzimr opened this issue Feb 5, 2023 · 21 comments

@vonzimr

vonzimr commented Feb 5, 2023

I'd like to use gamescope as a way of providing a 4k output when trying to stream games from my desktop (currently using an ultrawide). When I start to stream to my TV, gamescope crashes.

System details:
GPU is a Radeon 7900 xtx
Mesa:

        driverID                                             = DRIVER_ID_MESA_RADV
        driverName                                           = radv
        driverInfo                                           = Mesa 23.1.0-devel (git-68e914a4ca)

Pipewire

pipewire 1:0.3.65-6

Console output:

pipewire: format changed (size: 3440x1440, requested 1280x800, format 23, stride 3440, size: 7430400, dmabuf: 0)
gamescope: ../src/pipewire.cpp:477: void stream_handle_add_buffer(void*, pw_buffer*): Assertion `buffer->texture != nullptr' failed.
...
Aborted (core dumped)
X connection to :1 broken (explicit kill or server shutdown).
X connection to :1 broken (explicit kill or server shutdown).
X connection to :1 broken (explicit kill or server shutdown).
Fatal IO error 11 (Resource temporarily unavailable) on X server :1.
X connection to :1 broken (explicit kill or server shutdown).
wine: Unhandled page fault on read access to FFFFFFFFFFFFFFFF at address 00007F3605E39EEC (thread 00ec), starting debugger...
free(): corrupted unsorted chunks
src/common/pipes.cpp (665) : m_pInternalPipe->BRead failed
src/common/pipes.cpp (665) : m_pInternalPipe->BRead failed
assert_20230204175605_3.dmp[127403]: Uploading dump (out-of-process)
/tmp/dumps/assert_20230204175605_3.dmp
XIO:  fatal IO error 22 (Invalid argument) on X server ":1"
      after 99 requests (98 known processed) with 0 events remaining.
XIO:  fatal IO error 22 (Invalid argument) on X server ":1"
      after 15 requests (15 known processed) with 0 events remaining.
assert_20230204175605_3.dmp[127403]: Finished uploading minidump (out-of-process): success = yes
assert_20230204175605_3.dmp[127403]: response: Discarded=1
assert_20230204175605_3.dmp[127403]: file ''/tmp/dumps/assert_20230204175605_3.dmp'', upload yes: ''Discarded=1''

Some more info:

[currentuser@currentuser-b650iaorusultra gamescope]$ coredumpctl info 125192
           PID: 125192 (gamescope-wl)
           UID: 1000 (currentuser)
           GID: 1000 (currentuser)
        Signal: 6 (ABRT)
     Timestamp: Sat 2023-02-04 17:56:05 PST (2min 29s ago)
  Command Line: ./build/src/gamescope -f -e -w 3840 -W 3840 -H 2160 -h 2160 -- steam -tenfoot -steamos -fulldesktopres -nointro -pipewire
    Executable: /home/currentuser/git/gamescope/build/src/gamescope
 Control Group: /user.slice/user-1000.slice/user@1000.service/app.slice/app-org.kde.konsole-f7afb2a408194b28aee9fe1c95893633.scope
          Unit: user@1000.service
     User Unit: app-org.kde.konsole-f7afb2a408194b28aee9fe1c95893633.scope
         Slice: user-1000.slice
     Owner UID: 1000 (currentuser)
       Boot ID: a82848d6c7f24b89b20075c958e21c16
    Machine ID: c50f592b4c524c67a735ebdbcddebab9
      Hostname: currentuser-b650iaorusultra
       Storage: /var/lib/systemd/coredump/core.gamescope-wl.1000.a82848d6c7f24b89b20075c958e21c16.125192.1675562165000000.zst (present)
  Size on Disk: 1.9M
       Message: Process 125192 (gamescope-wl) of user 1000 dumped core.
                
                Stack trace of thread 125216:
                #0  0x00007f6ead3b964c n/a (libc.so.6 + 0x8864c)
                #1  0x00007f6ead369938 raise (libc.so.6 + 0x38938)
                #2  0x00007f6ead35353d abort (libc.so.6 + 0x2253d)
                #3  0x00007f6ead35345c n/a (libc.so.6 + 0x2245c)
                #4  0x00007f6ead362486 __assert_fail (libc.so.6 + 0x31486)
                #5  0x0000563186185334 n/a (/home/currentuser/git/gamescope/build/src/gamescope + 0xe5334)
                #6  0x00007f6ead8feaeb n/a (libpipewire-0.3.so.0 + 0x7caeb)
                #7  0x00007f6ead900576 pw_impl_port_use_buffers (libpipewire-0.3.so.0 + 0x7e576)
                #8  0x00007f6eacb6b67f n/a (libpipewire-module-client-node.so + 0x1167f)
                #9  0x00007f6eacb7a8c7 n/a (libpipewire-module-client-node.so + 0x208c7)
                #10 0x00007f6ea0045b5b n/a (libpipewire-module-protocol-native.so + 0x14b5b)
                #11 0x00007f6ea00461f0 n/a (libpipewire-module-protocol-native.so + 0x151f0)
                #12 0x00007f6eacbb0b27 n/a (libspa-support.so + 0x6b27)
                #13 0x0000563186185969 n/a (/home/currentuser/git/gamescope/build/src/gamescope + 0xe5969)
                #14 0x0000563186187468 n/a (/home/currentuser/git/gamescope/build/src/gamescope + 0xe7468)
                #15 0x00005631861873dd n/a (/home/currentuser/git/gamescope/build/src/gamescope + 0xe73dd)
                #16 0x000056318618734d n/a (/home/currentuser/git/gamescope/build/src/gamescope + 0xe734d)
                #17 0x0000563186187306 n/a (/home/currentuser/git/gamescope/build/src/gamescope + 0xe7306)
                #18 0x00005631861872ea n/a (/home/currentuser/git/gamescope/build/src/gamescope + 0xe72ea)
                #19 0x00007f6ead6d7283 execute_native_thread_routine (libstdc++.so.6 + 0xd7283)
                #20 0x00007f6ead3b78fd n/a (libc.so.6 + 0x868fd)
                #21 0x00007f6ead439d20 n/a (libc.so.6 + 0x108d20)
                
                Stack trace of thread 125192:
                #0  0x00007f6ead42c37f __poll (libc.so.6 + 0xfb37f)
                #1  0x00005631861200e9 n/a (/home/currentuser/git/gamescope/build/src/gamescope + 0x800e9)
                #2  0x000056318611c506 n/a (/home/currentuser/git/gamescope/build/src/gamescope + 0x7c506)
                #3  0x00007f6ead354290 n/a (libc.so.6 + 0x23290)
                #4  0x00007f6ead35434a __libc_start_main (libc.so.6 + 0x2334a)
                #5  0x00005631860c8115 n/a (/home/currentuser/git/gamescope/build/src/gamescope + 0x28115)
                ELF object binary architecture: AMD x86-64

@laverdone

me too, same error

@Samsagax

Same error here on RX 6900 XT.
https://github.com/Plagman/gamescope/blob/8a3fa88c7d4d7f98436f97eb5a1c3b590964dee5/src/pipewire.cpp#L476-L477

Maybe the function vulkan_acquire_screenshot_texture() needs an overhaul for headless setups?

@safijari

This is the exact issue that happens on Steam Deck ValveSoftware/SteamOS#1000

I thought it may be pipewire related as it does not happen if you use pipewire prior to this commit PipeWire/pipewire@7631316

(try with pipewire 0.3.59)

Even if it's an issue with pipewire, gamescope should not implode because of it.

@kisak-valve
Member

Hello @safijari, your regression should be reported upstream to pipewire's issue tracker so that those devs can ponder their change in behavior. I've looked around a bit and didn't find anything related on https://gitlab.freedesktop.org/pipewire/pipewire/-/issues.

@safijari

@kisak-valve thank you for your response. I'm genuinely not sure what the core issue is. I can certainly report that something is wrong to the pipewire devs but as best as I can tell this is only negatively impacting gamescope. I looked a bit more into it and the issue isn't related to pipewiresrc asking for the wrong image size. vulkan_acquire_screenshot_texture() returns null even if I forcibly inject the correct image size.

@Starsam80

This patch should fix the issue

--- a/src/pipewire.cpp
+++ b/src/pipewire.cpp
@@ -358,7 +358,7 @@ static void stream_handle_param_changed(void *data, uint32_t id, const struct sp
 	const struct spa_pod *buffers_param =
 		(const struct spa_pod *) spa_pod_builder_add_object(&builder,
 		SPA_TYPE_OBJECT_ParamBuffers, SPA_PARAM_Buffers,
-		SPA_PARAM_BUFFERS_buffers, SPA_POD_CHOICE_RANGE_Int(buffers, 1, 32),
+		SPA_PARAM_BUFFERS_buffers, SPA_POD_CHOICE_RANGE_Int(buffers, 1, 8),
 		SPA_PARAM_BUFFERS_blocks, SPA_POD_Int(1),
 		SPA_PARAM_BUFFERS_size, SPA_POD_Int(shm_size),
 		SPA_PARAM_BUFFERS_stride, SPA_POD_Int(state->shm_stride),

On my system, PipeWire was trying to use 16 buffers even though gamescope only has up to 8 available.
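
To illustrate why narrowing the range helps, here is a minimal sketch of the negotiation outcome. It is not gamescope's actual code: it assumes the peer may pick any count inside the advertised range, and the names `BufferRange`, `clamp_to_pool`, `range_fits_pool`, and `kScreenshotPoolSize` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the (default, min, max) triple passed to
// SPA_POD_CHOICE_RANGE_Int in the patch above.
struct BufferRange {
    int preferred;
    int min;
    int max;
};

// Assumed size of the fixed texture pool that backs the stream buffers.
constexpr int kScreenshotPoolSize = 8;

// Narrow an advertised range so that no count inside it exceeds the pool.
inline BufferRange clamp_to_pool(BufferRange r, int pool_size) {
    assert(pool_size >= 1);
    const int max = std::min(r.max, pool_size);
    const int min = std::min(r.min, max);
    const int preferred = std::max(min, std::min(r.preferred, max));
    return BufferRange{preferred, min, max};
}

// True when every count the peer may choose can be backed by a texture.
inline bool range_fits_pool(BufferRange r, int pool_size) {
    return r.min >= 1 && r.max <= pool_size;
}
```

With a range of 1 to 32, a peer that settles on 16 buffers asks for more textures than an 8-entry pool can provide; after clamping, the largest count it can choose is 8.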

@Joshua-Ashton
Collaborator

Oh wow, great catch. I didn't even know we got to choose that!

I have seen this issue before and worked around it locally by adding more screenshot buffers.

@Joshua-Ashton
Collaborator

Merged that patch, thanks.

@Samsagax

Doesn't work for me. Still segfaults. How can I tell if pipewire is using more buffers than it should?

@Starsam80

You can try debugging pipewire with

PIPEWIRE_DEBUG=D PIPEWIRE_LOG=>(grep pw_impl_port_use_buffers) build/src/gamescope -- vkcube

(you might need to omit the PIPEWIRE_LOG part depending on your shell/setup) and look for a line that says something like

[D][86964.931019] pw.port      | [     impl-port.c: 1632 pw_impl_port_use_buffers()] 0x56483a396410: 1:0.0: 8 buffers flags:1 state:2 n_mix:1
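
If you capture many such lines, a small helper can pull out the buffer count for you. This is only a sketch: it assumes the debug line keeps the `<number> buffers` shape shown above (which may differ between PipeWire versions), and `parse_buffer_count` and `exceeds_pool` are hypothetical names, not part of PipeWire or gamescope.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

// Extracts the buffer count from a pw_impl_port_use_buffers debug line,
// e.g. "... 1:0.0: 8 buffers flags:1 ...". Returns std::nullopt when the
// line does not contain "<number> buffers".
inline std::optional<int> parse_buffer_count(const std::string &line) {
    static const std::regex pattern(R"((\d+) buffers\b)");
    std::smatch match;
    if (!std::regex_search(line, match, pattern))
        return std::nullopt;
    return std::stoi(match[1].str());
}

// True when the negotiated count is larger than the number of textures
// available; pool_size is whatever your build allocates.
inline bool exceeds_pool(int buffer_count, int pool_size) {
    assert(pool_size > 0);
    return buffer_count > pool_size;
}
```

A count above the pool size would point at the buffer negotiation as the cause; a count at or below it suggests looking elsewhere.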

Also, is it the same error as this? A segfault != a failed assert

vulkan: Unable to acquire screenshot texture. Out of textures.
gamescope: ../src/pipewire.cpp:477: void stream_handle_add_buffer(void *, struct pw_buffer *): Assertion `buffer->texture != nullptr' failed.

If so, you can try increasing the number of textures/buffers allocated here and see if that helps:

--- a/src/rendervulkan.cpp
+++ b/src/rendervulkan.cpp
@@ -107,7 +107,7 @@ struct VulkanOutput_t
 
 	VkFormat outputFormat = VK_FORMAT_UNDEFINED;
 
-	std::array<std::shared_ptr<CVulkanTexture>, 8> pScreenshotImages;
+	std::array<std::shared_ptr<CVulkanTexture>, 32> pScreenshotImages;
 
 	// NIS and FSR
 	std::shared_ptr<CVulkanTexture> tmpOutput;
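
For anyone wondering why the array size matters, here is a rough sketch of how a fixed-size pool runs dry. It is not the real `vulkan_acquire_screenshot_texture()`: `Texture` and `TexturePool` are hypothetical, and treating a slot as free when the pool holds the only reference is an assumption made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical placeholder for a GPU texture; only the pooling logic matters.
struct Texture {
    int id = 0;
};

// Fixed-size pool. A slot counts as free when the pool holds the only
// reference to it, i.e. no stream buffer is still using the texture.
template <std::size_t N>
class TexturePool {
public:
    // Returns a shared reference to a free texture, or nullptr when all N
    // slots are in use (the "Out of textures" case).
    std::shared_ptr<Texture> acquire() {
        for (std::size_t i = 0; i < N; ++i) {
            auto &slot = slots_[i];
            if (!slot) {
                // Lazily create the texture the first time the slot is used.
                slot = std::make_shared<Texture>();
                slot->id = static_cast<int>(i);
                return slot;
            }
            if (slot.use_count() == 1)
                return slot;
        }
        return nullptr;
    }

private:
    std::array<std::shared_ptr<Texture>, N> slots_{};
};
```

If PipeWire negotiates more buffers than N, the call for buffer N+1 returns nullptr, which is the condition the assertion in `stream_handle_add_buffer` trips over. Raising N only moves the limit; keeping the negotiated count at or below N avoids reaching it.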

@Samsagax

Thanks @Starsam80. This is the error I get when trying to stream from my machine to another machine or to my phone with Steam Link:

vulkan: Unable to acquire screenshot texture. Out of textures.
gamescope: ../src/pipewire.cpp:477: void stream_handle_add_buffer(void *, struct pw_buffer *): Assertion `buffer->texture != nullptr' failed.

So it seems to be different from a segfault. I still can't stream from one machine to another, and gamescope crashes on the host.
Increasing the number of textures to 32 does help: I can start the stream, but nothing renders on the client side and the host locks up completely (I have to power it off physically). With 16 the behaviour seems to be the same, but the GPU recovers.
I got this in dmesg (with 32 textures):

[  218.577649] amdgpu 0000:08:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:24 vmid:1 pasid:32775, for process steam pid 962 thread steam:cs0 pid 1029)
[  218.577655] amdgpu 0000:08:00.0: amdgpu:   in page starting at address 0x000080010712a000 from client 0x12 (VMC)
[  218.577657] amdgpu 0000:08:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00101A31
[  218.577659] amdgpu 0000:08:00.0: amdgpu:      Faulty UTCL2 client ID: VCN1 (0xd)
[  218.577660] amdgpu 0000:08:00.0: amdgpu:      MORE_FAULTS: 0x1
[  218.577661] amdgpu 0000:08:00.0: amdgpu:      WALKER_ERROR: 0x0
[  218.577662] amdgpu 0000:08:00.0: amdgpu:      PERMISSION_FAULTS: 0x3
[  218.577663] amdgpu 0000:08:00.0: amdgpu:      MAPPING_ERROR: 0x0
[  218.577663] amdgpu 0000:08:00.0: amdgpu:      RW: 0x0
[  218.577666] amdgpu 0000:08:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:24 vmid:1 pasid:32775, for process steam pid 962 thread steam:cs0 pid 1029)
[  218.577668] amdgpu 0000:08:00.0: amdgpu:   in page starting at address 0x000080010712b000 from client 0x12 (VMC)
[  218.577669] amdgpu 0000:08:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[  218.577670] amdgpu 0000:08:00.0: amdgpu:      Faulty UTCL2 client ID: unknown (0x0)
[  218.577671] amdgpu 0000:08:00.0: amdgpu:      MORE_FAULTS: 0x0
[  218.577672] amdgpu 0000:08:00.0: amdgpu:      WALKER_ERROR: 0x0
[  218.577672] amdgpu 0000:08:00.0: amdgpu:      PERMISSION_FAULTS: 0x0
[  218.577673] amdgpu 0000:08:00.0: amdgpu:      MAPPING_ERROR: 0x0
[  218.577674] amdgpu 0000:08:00.0: amdgpu:      RW: 0x0
[  228.795805] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_enc_1.0 timeout, signaled seq=378, emitted seq=379
[  228.796037] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process steam pid 962 thread steam:cs0 pid 1029
[  228.796231] amdgpu 0000:08:00.0: amdgpu: GPU reset begin!
[  232.796104] amdgpu 0000:08:00.0: amdgpu: failed to suspend display audio
[  233.090070] [drm] Register(1) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
[  253.711043] usb 1-2: USB disconnect, device number 3
[  370.105457] INFO: task kworker/2:0H:33 blocked for more than 122 seconds.
[  370.105461]       Tainted: G           OE      6.3.4-zen1-1-zen #1
[  370.105462] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  370.105462] task:kworker/2:0H    state:D stack:0     pid:33    ppid:2      flags:0x00004000
[  370.105465] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[  370.105473] Call Trace:
[  370.105473]  <TASK>
[  370.105474]  __schedule+0xc18/0x1620
[  370.105478]  ? try_to_wake_up+0x69/0xcc0
[  370.105480]  schedule+0x5e/0xd0
[  370.105481]  schedule_timeout+0x30a/0x370
[  370.105483]  dma_fence_default_wait+0x22b/0x270
[  370.105485]  ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[  370.105486]  dma_fence_wait_timeout+0x157/0x1d0
[  370.105488]  dma_resv_wait_timeout+0xc3/0x1c0
[  370.105490]  ttm_bo_delayed_delete+0x32/0xd0 [ttm 9e5ca2359beed24a342cfc8f06479077fc0047c0]
[  370.105496]  process_one_work+0x252/0x460
[  370.105498]  worker_thread+0x55/0x4f0
[  370.105499]  ? __pfx_worker_thread+0x10/0x10
[  370.105500]  kthread+0xde/0x110
[  370.105501]  ? __pfx_kthread+0x10/0x10
[  370.105502]  ret_from_fork+0x2c/0x50
[  370.105505]  </TASK>
[  370.105511] INFO: task kworker/2:1H:110 blocked for more than 122 seconds.
[  370.105512]       Tainted: G           OE      6.3.4-zen1-1-zen #1
[  370.105512] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  370.105513] task:kworker/2:1H    state:D stack:0     pid:110   ppid:2      flags:0x00004000
[  370.105515] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[  370.105519] Call Trace:
[  370.105520]  <TASK>
[  370.105520]  __schedule+0xc18/0x1620
[  370.105522]  schedule+0x5e/0xd0
[  370.105524]  schedule_timeout+0x30a/0x370
[  370.105525]  dma_fence_default_wait+0x22b/0x270
[  370.105526]  ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[  370.105528]  dma_fence_wait_timeout+0x157/0x1d0
[  370.105529]  dma_resv_wait_timeout+0xc3/0x1c0
[  370.105530]  ttm_bo_delayed_delete+0x32/0xd0 [ttm 9e5ca2359beed24a342cfc8f06479077fc0047c0]
[  370.105535]  process_one_work+0x252/0x460
[  370.105537]  worker_thread+0x55/0x4f0
[  370.105538]  ? __pfx_worker_thread+0x10/0x10
[  370.105539]  kthread+0xde/0x110
[  370.105540]  ? __pfx_kthread+0x10/0x10
[  370.105541]  ret_from_fork+0x2c/0x50
[  370.105542]  </TASK>
[  370.105548] INFO: task kworker/3:2:231 blocked for more than 122 seconds.
[  370.105548]       Tainted: G           OE      6.3.4-zen1-1-zen #1
[  370.105549] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  370.105549] task:kworker/3:2     state:D stack:0     pid:231   ppid:2      flags:0x00004000
[  370.105551] Workqueue: events console_callback
[  370.105552] Call Trace:
[  370.105553]  <TASK>
[  370.105553]  __schedule+0xc18/0x1620
[  370.105555]  ? printk_get_next_message+0x8c/0x330
[  370.105557]  schedule+0x5e/0xd0
[  370.105558]  schedule_timeout+0x30a/0x370
[  370.105559]  ? update_load_avg.constprop.0+0x6e/0x4b0
[  370.105561]  __down+0xb3/0x1c0
[  370.105562]  down+0x47/0x60
[  370.105563]  console_lock+0x1a/0x40
[  370.105565]  console_callback+0x23/0x180
[  370.105566]  process_one_work+0x252/0x460
[  370.105567]  worker_thread+0x55/0x4f0
[  370.105568]  ? __pfx_worker_thread+0x10/0x10
[  370.105569]  kthread+0xde/0x110
[  370.105570]  ? __pfx_kthread+0x10/0x10
[  370.105571]  ret_from_fork+0x2c/0x50
[  370.105573]  </TASK>
[  370.105580] INFO: task systemd-logind:565 blocked for more than 122 seconds.
[  370.105581]       Tainted: G           OE      6.3.4-zen1-1-zen #1
[  370.105582] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  370.105582] task:systemd-logind  state:D stack:0     pid:565   ppid:1      flags:0x00004002
[  370.105583] Call Trace:
[  370.105583]  <TASK>
[  370.105584]  __schedule+0xc18/0x1620
[  370.105585]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
[  370.105588]  schedule_preempt_disabled+0x65/0xe0
[  370.105589]  __mutex_lock.constprop.0+0x474/0x770
[  370.105591]  amdgpu_dm_atomic_commit_tail+0x400/0x3b30 [amdgpu 0fae5242ccec342c9464305cb143ca64e69e9228]
[  370.105848]  ? dcn30_populate_dml_writeback_from_context+0x35/0x50 [amdgpu 0fae5242ccec342c9464305cb143ca64e69e9228]
[  370.106082]  ? dcn20_populate_dml_pipes_from_context+0x828/0xd80 [amdgpu 0fae5242ccec342c9464305cb143ca64e69e9228]
[  370.106300]  ? dcn30_validate_bandwidth+0x101/0x2b0 [amdgpu 0fae5242ccec342c9464305cb143ca64e69e9228]
[  370.106506]  ? dc_validate_global_state+0x3db/0x580 [amdgpu 0fae5242ccec342c9464305cb143ca64e69e9228]
[  370.106712]  ? __pfx_amdgpu_vram_mgr_compatible+0x10/0x10 [amdgpu 0fae5242ccec342c9464305cb143ca64e69e9228]
[  370.106897]  ? dma_resv_get_fences+0xa3/0x2c0
[  370.106898]  ? wait_for_completion_timeout+0x13e/0x170
[  370.106900]  ? wait_for_completion_interruptible+0x139/0x1e0
[  370.106900]  ? dm_plane_helper_prepare_fb+0x17f/0x2f0 [amdgpu 0fae5242ccec342c9464305cb143ca64e69e9228]
[  370.107107]  commit_tail+0x94/0x130
[  370.107109]  drm_atomic_helper_commit+0x11a/0x140
[  370.107110]  drm_atomic_commit+0x9a/0xd0
[  370.107112]  ? __pfx___drm_printfn_info+0x10/0x10
[  370.107113]  drm_client_modeset_commit_atomic.constprop.0+0x1c6/0x200
[  370.107114]  drm_client_modeset_commit_locked+0x55/0x190
[  370.107115]  drm_fb_helper_set_par+0x7f/0x100
[  370.107117]  fb_set_var+0x204/0x420
[  370.107119]  ? sock_def_readable+0x42/0xc0
[  370.107120]  ? unix_stream_sendmsg+0x6d9/0x780
[  370.107122]  ? start_flush_work.isra.0+0x20c/0x220
[  370.107124]  fbcon_blank+0x216/0x2e0
[  370.107125]  do_unblank_screen+0xb0/0x1b0
[  370.107126]  complete_change_console+0x58/0x1e0
[  370.107129]  vt_ioctl+0xd82/0x14b0
[  370.107130]  ? tty_ioctl+0x577/0x920
[  370.107131]  tty_ioctl+0x517/0x920
[  370.107133]  __x64_sys_ioctl+0x94/0xd0
[  370.107135]  do_syscall_64+0x60/0x90
[  370.107135]  ? syscall_exit_to_user_mode+0x1b/0x40
[  370.107137]  ? do_syscall_64+0x6c/0x90
[  370.107138]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  370.107139] RIP: 0033:0x7f63a071576f
[  370.107156] RSP: 002b:00007ffef5db5270 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  370.107158] RAX: ffffffffffffffda RBX: 0000000000000017 RCX: 00007f63a071576f
[  370.107158] RDX: 0000000000000001 RSI: 0000000000005605 RDI: 0000000000000017
[  370.107159] RBP: 0000000000000000 R08: 00007ffef5db5270 R09: 00007ffef5db52b0
[  370.107159] R10: 000056415b11d020 R11: 0000000000000246 R12: 000056415b11bc80
[  370.107160] R13: 00007ffef5db5348 R14: 00007ffef5db5350 R15: 000056415b11bc80
[  370.107161]  </TASK>
[  370.107174] INFO: task steamwebhelper:1065 blocked for more than 122 seconds.
[  370.107175]       Tainted: G           OE      6.3.4-zen1-1-zen #1
[  370.107175] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  370.107175] task:steamwebhelper  state:D stack:0     pid:1065  ppid:1051   flags:0x00004002
[  370.107176] Call Trace:
[  370.107177]  <TASK>
[  370.107177]  __schedule+0xc18/0x1620
[  370.107179]  schedule+0x5e/0xd0
[  370.107180]  schedule_timeout+0x30a/0x370
[  370.107181]  ? ttm_resource_move_to_lru_tail+0x131/0x1d0 [ttm 9e5ca2359beed24a342cfc8f06479077fc0047c0]
[  370.107186]  dma_fence_default_wait+0x22b/0x270
[  370.107187]  ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[  370.107188]  dma_fence_wait_timeout+0x157/0x1d0
[  370.107189]  amdgpu_vm_fini+0x13d/0x530 [amdgpu 0fae5242ccec342c9464305cb143ca64e69e9228]
[  370.107325]  ? amdgpu_ctx_mgr_fini+0x113/0x140 [amdgpu 0fae5242ccec342c9464305cb143ca64e69e9228]
[  370.107461]  amdgpu_driver_postclose_kms+0x153/0x2f0 [amdgpu 0fae5242ccec342c9464305cb143ca64e69e9228]
[  370.107587]  drm_file_free+0x21c/0x270
[  370.107589]  drm_release+0xcd/0x1c0
[  370.107590]  __fput+0x97/0x250
[  370.107591]  task_work_run+0x5d/0x90
[  370.107592]  do_exit+0x345/0xb90
[  370.107594]  do_group_exit+0x31/0x80
[  370.107595]  __x64_sys_exit_group+0x18/0x20
[  370.107596]  do_syscall_64+0x60/0x90
[  370.107596]  ? syscall_exit_to_user_mode+0x1b/0x40
[  370.107597]  ? do_syscall_64+0x6c/0x90
[  370.107597]  ? exit_to_user_mode_prepare+0x132/0x1e0
[  370.107599]  ? syscall_exit_to_user_mode+0x1b/0x40
[  370.107600]  ? do_syscall_64+0x6c/0x90
[  370.107600]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[  370.107601] RIP: 0033:0x7f1f22aeabad
[  370.107604] RSP: 002b:00007fff4f60c1b8 EFLAGS: 00000216 ORIG_RAX: 00000000000000e7
[  370.107605] RAX: ffffffffffffffda RBX: 0000062005629a68 RCX: 00007f1f22aeabad
[  370.107605] RDX: 00000000000000e7 RSI: ffffffffffffee48 RDI: 0000000000000022
[  370.107606] RBP: 00007fff4f60c1c0 R08: 0000000000000000 R09: 0000000000000000
[  370.107606] R10: 000000000000009a R11: 0000000000000216 R12: 000006200585c1c0
[  370.107607] R13: 00007fff4f60c410 R14: 00007fff4f60c208 R15: 0000000000000001
[  370.107608]  </TASK>
[  370.107619] INFO: task kworker/2:2H:11760 blocked for more than 122 seconds.
[  370.107620]       Tainted: G           OE      6.3.4-zen1-1-zen #1
[  370.107620] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  370.107621] task:kworker/2:2H    state:D stack:0     pid:11760 ppid:2      flags:0x00004000
[  370.107622] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[  370.107627] Call Trace:
[  370.107627]  <TASK>
[  370.107627]  __schedule+0xc18/0x1620
[  370.107629]  ? try_to_wake_up+0x69/0xcc0
[  370.107630]  schedule+0x5e/0xd0
[  370.107631]  schedule_timeout+0x30a/0x370
[  370.107632]  dma_fence_default_wait+0x22b/0x270
[  370.107633]  ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[  370.107634]  dma_fence_wait_timeout+0x157/0x1d0
[  370.107635]  dma_resv_wait_timeout+0xc3/0x1c0
[  370.107637]  ttm_bo_delayed_delete+0x32/0xd0 [ttm 9e5ca2359beed24a342cfc8f06479077fc0047c0]
[  370.107640]  process_one_work+0x252/0x460
[  370.107642]  worker_thread+0x55/0x4f0
[  370.107643]  ? __pfx_worker_thread+0x10/0x10
[  370.107643]  kthread+0xde/0x110
[  370.107644]  ? __pfx_kthread+0x10/0x10
[  370.107645]  ret_from_fork+0x2c/0x50
[  370.107646]  </TASK>
[  370.107647] INFO: task kworker/2:3H:11761 blocked for more than 122 seconds.
[  370.107647]       Tainted: G           OE      6.3.4-zen1-1-zen #1
[  370.107647] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  370.107648] task:kworker/2:3H    state:D stack:0     pid:11761 ppid:2      flags:0x00004000
[  370.107648] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[  370.107652] Call Trace:
[  370.107652]  <TASK>
[  370.107653]  __schedule+0xc18/0x1620
[  370.107654]  ? try_to_wake_up+0x69/0xcc0
[  370.107655]  schedule+0x5e/0xd0
[  370.107656]  schedule_timeout+0x30a/0x370
[  370.107657]  dma_fence_default_wait+0x22b/0x270
[  370.107658]  ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[  370.107659]  dma_fence_wait_timeout+0x157/0x1d0
[  370.107660]  dma_resv_wait_timeout+0xc3/0x1c0
[  370.107661]  ttm_bo_delayed_delete+0x32/0xd0 [ttm 9e5ca2359beed24a342cfc8f06479077fc0047c0]
[  370.107665]  process_one_work+0x252/0x460
[  370.107666]  worker_thread+0x55/0x4f0
[  370.107667]  ? __pfx_worker_thread+0x10/0x10
[  370.107668]  kthread+0xde/0x110
[  370.107668]  ? __pfx_kthread+0x10/0x10
[  370.107669]  ret_from_fork+0x2c/0x50
[  370.107670]  </TASK>
[  370.107670] INFO: task kworker/2:4H:11762 blocked for more than 122 seconds.
[  370.107671]       Tainted: G           OE      6.3.4-zen1-1-zen #1
[  370.107671] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  370.107671] task:kworker/2:4H    state:D stack:0     pid:11762 ppid:2      flags:0x00004000
[  370.107672] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[  370.107676] Call Trace:
[  370.107676]  <TASK>
[  370.107676]  __schedule+0xc18/0x1620
[  370.107678]  ? try_to_wake_up+0x69/0xcc0
[  370.107678]  schedule+0x5e/0xd0
[  370.107679]  schedule_timeout+0x30a/0x370
[  370.107681]  dma_fence_default_wait+0x22b/0x270
[  370.107681]  ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[  370.107682]  dma_fence_wait_timeout+0x157/0x1d0
[  370.107683]  dma_resv_wait_timeout+0xc3/0x1c0
[  370.107685]  ttm_bo_delayed_delete+0x32/0xd0 [ttm 9e5ca2359beed24a342cfc8f06479077fc0047c0]
[  370.107688]  process_one_work+0x252/0x460
[  370.107689]  worker_thread+0x55/0x4f0
[  370.107690]  ? __pfx_worker_thread+0x10/0x10
[  370.107691]  kthread+0xde/0x110
[  370.107691]  ? __pfx_kthread+0x10/0x10
[  370.107692]  ret_from_fork+0x2c/0x50
[  370.107693]  </TASK>
[  370.107694] INFO: task kworker/2:5H:11763 blocked for more than 122 seconds.
[  370.107694]       Tainted: G           OE      6.3.4-zen1-1-zen #1
[  370.107694] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  370.107695] task:kworker/2:5H    state:D stack:0     pid:11763 ppid:2      flags:0x00004000
[  370.107695] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[  370.107699] Call Trace:
[  370.107699]  <TASK>
[  370.107699]  __schedule+0xc18/0x1620
[  370.107701]  ? try_to_wake_up+0x69/0xcc0
[  370.107702]  schedule+0x5e/0xd0
[  370.107703]  schedule_timeout+0x30a/0x370
[  370.107704]  dma_fence_default_wait+0x22b/0x270
[  370.107705]  ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[  370.107706]  dma_fence_wait_timeout+0x157/0x1d0
[  370.107706]  dma_resv_wait_timeout+0xc3/0x1c0
[  370.107708]  ttm_bo_delayed_delete+0x32/0xd0 [ttm 9e5ca2359beed24a342cfc8f06479077fc0047c0]
[  370.107711]  process_one_work+0x252/0x460
[  370.107712]  worker_thread+0x55/0x4f0
[  370.107713]  ? __pfx_worker_thread+0x10/0x10
[  370.107714]  kthread+0xde/0x110
[  370.107715]  ? __pfx_kthread+0x10/0x10
[  370.107715]  ret_from_fork+0x2c/0x50
[  370.107717]  </TASK>
[  370.107717] INFO: task kworker/2:6H:11764 blocked for more than 122 seconds.
[  370.107717]       Tainted: G           OE      6.3.4-zen1-1-zen #1
[  370.107718] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  370.107718] task:kworker/2:6H    state:D stack:0     pid:11764 ppid:2      flags:0x00004000
[  370.107718] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[  370.107722] Call Trace:
[  370.107723]  <TASK>
[  370.107723]  __schedule+0xc18/0x1620
[  370.107724]  ? try_to_wake_up+0x69/0xcc0
[  370.107725]  schedule+0x5e/0xd0
[  370.107726]  schedule_timeout+0x30a/0x370
[  370.107727]  dma_fence_default_wait+0x22b/0x270
[  370.107728]  ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[  370.107729]  dma_fence_wait_timeout+0x157/0x1d0
[  370.107730]  dma_resv_wait_timeout+0xc3/0x1c0
[  370.107731]  ttm_bo_delayed_delete+0x32/0xd0 [ttm 9e5ca2359beed24a342cfc8f06479077fc0047c0]
[  370.107735]  process_one_work+0x252/0x460
[  370.107736]  worker_thread+0x55/0x4f0
[  370.107737]  ? __pfx_worker_thread+0x10/0x10
[  370.107738]  kthread+0xde/0x110
[  370.107738]  ? __pfx_kthread+0x10/0x10
[  370.107739]  ret_from_fork+0x2c/0x50
[  370.107740]  </TASK>
[  370.107740] Future hung task reports are suppressed, see sysctl kernel.hung_task_warnings

@Starsam80

I have no idea what is going on here, but my guess is gamescope (when it has more buffers, something I am still looking into...) is working as it should, but then the video encoding part (vcn_enc) of the GPU driver explodes for some other reason...

@Samsagax

I'll open a different issue to track that then, because the same doesn't happen on other standalone compositors (like mutter).

Thanks for your help @Starsam80

@timjrd

timjrd commented Jul 4, 2023

If so, you can try increasing the number of textures/buffers allocated here and see if that helps:

--- a/src/rendervulkan.cpp
+++ b/src/rendervulkan.cpp
@@ -107,7 +107,7 @@ struct VulkanOutput_t
 
 	VkFormat outputFormat = VK_FORMAT_UNDEFINED;
 
-	std::array<std::shared_ptr<CVulkanTexture>, 8> pScreenshotImages;
+	std::array<std::shared_ptr<CVulkanTexture>, 32> pScreenshotImages;
 
 	// NIS and FSR
 	std::shared_ptr<CVulkanTexture> tmpOutput;

This patch is working for me. Thanks!

  • 8 works for low-res
  • 16 or 32 works for 1080p depending on the game
  • 64 works for 1440p

I use 64.

@Starsam80

@timjrd (or anyone else with this problem) If you want a more proper fix, you can take a look at this commit I made: f13171e. Hopefully one day I will submit that (and a few other fixes) to the main repo, but I haven't had the time lately.

@Samsagax

Samsagax commented Jul 4, 2023

I'll try this tonight and report back. Would be great to have this fixed since currently remote play is broken.

@timjrd

timjrd commented Jul 4, 2023

After some quick tests, f13171e fixes Remote Play for me, thanks! 🙂

@ruineka

ruineka commented Jul 4, 2023

After some quick tests, f13171e fixes Remote Play for me, thanks! 🙂

Confirmed on my end here. This fixes streaming on ChimeraOS.

@laverdone

Confirmed, this fixed the same error on EndeavourOS.

@Samsagax

Samsagax commented Jul 4, 2023

I've tested the patch and it streamed the controller from my phone using Steam Link. But within a few seconds the GPU locked up the entire system with this dmesg:

[  112.079995] amdgpu 0000:08:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:24 vmid:1 pasid:32775, for process steam pid 962 thread steam:cs0 pid 1087)
[  112.080000] amdgpu 0000:08:00.0: amdgpu:   in page starting at address 0x000080010672a000 from client 0x12 (VMC)
[  112.080003] amdgpu 0000:08:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00105631
[  112.080004] amdgpu 0000:08:00.0: amdgpu:      Faulty UTCL2 client ID: VCN0 (0x2b)
[  112.080005] amdgpu 0000:08:00.0: amdgpu:      MORE_FAULTS: 0x1
[  112.080006] amdgpu 0000:08:00.0: amdgpu:      WALKER_ERROR: 0x0                     
[  112.080007] amdgpu 0000:08:00.0: amdgpu:      PERMISSION_FAULTS: 0x3
[  112.080008] amdgpu 0000:08:00.0: amdgpu:      MAPPING_ERROR: 0x0                     
[  112.080008] amdgpu 0000:08:00.0: amdgpu:      RW: 0x0
[  112.080011] amdgpu 0000:08:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:24 vmid:1 pasid:32775, for process steam pid 962 thread steam:cs0 pid 1087)
[  112.080013] amdgpu 0000:08:00.0: amdgpu:   in page starting at address 0x000080010672b000 from client 0x12 (VMC)
[  112.080015] amdgpu 0000:08:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
[  112.080016] amdgpu 0000:08:00.0: amdgpu:      Faulty UTCL2 client ID: unknown (0x0)
[  112.080016] amdgpu 0000:08:00.0: amdgpu:      MORE_FAULTS: 0x0                       
[  112.080017] amdgpu 0000:08:00.0: amdgpu:      WALKER_ERROR: 0x0
[  112.080018] amdgpu 0000:08:00.0: amdgpu:      PERMISSION_FAULTS: 0x0
[  112.080018] amdgpu 0000:08:00.0: amdgpu:      MAPPING_ERROR: 0x0
[  112.080019] amdgpu 0000:08:00.0: amdgpu:      RW: 0x0
[  122.101998] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_enc_0.0 timeout, signaled seq=260, emitted seq=261
[  122.102250] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process steam pid 962 thread steam:cs0 pid 1087
[  122.102473] amdgpu 0000:08:00.0: amdgpu: GPU reset begin!

What should I test next?

@Starsam80

@Samsagax I think that is issue #779
