Decky Loader XWayland Surface Instability #613

Open
Sterophonick opened this issue Sep 5, 2022 · 35 comments

@Sterophonick

Sterophonick commented Sep 5, 2022

Hello!

I've been experiencing various crashes on my Steam Deck in Game Mode. When these happen, the currently running game stops, and then gamescope and Steam restart. The power is not cut, as Bluetooth devices remain connected.

Most commonly, these crashes occur when using the in-game overlays, but there are a couple of cases where it happened while playing a game (shown in attached video).

I have run a memtest86 and everything came back as good.

20220902_115314_768x432.mp4

i am so good at doom that i crash my deck

Right here, I have some crash dumps and backtraces from two crashes from using the overlays.

(PID: 1142)
gamescope_1142_bt.log
gamescope_1142_info.log
gamescope_1142.zip (dump)

(PID: 3667)
gamescope_3667_bt.log
gamescope_3667_info.log
gamescope_3667.zip (dump)

My Steam Deck is currently on SteamOS 3.3.1 (20220817.1), and the problem, while uncommon, seems to persist even after switching OS branches or even refreshing the OS.

@Sterophonick Sterophonick changed the title from "Odd crashes on Steam Deck when using in-game overlays" to "Odd crashes on Steam Deck" on Sep 5, 2022
@Sterophonick
Author

Sterophonick commented Sep 7, 2022

Got another crash after opening the on-screen keyboard a handful of times.

gamescope_13003_bt.log
gamescope_13003_info.log
gamescope_13003.zip

Encountered on SteamOS 3.3.1 (20220812.101)

@1basti1

1basti1 commented Sep 13, 2022

Hello, I don't have any logs because I wouldn't know where to find them. I'm just an end user.

For the past 3 weeks or so, I've also been experiencing odd crashes, mostly when I'm in-game and close Quick Settings. I can't reproduce it; it's completely random (it has happened maybe 4 times in these 3 weeks, in roughly 30 hours of gaming).
The Deck restarts automatically, but the game is obviously closed.

I also sometimes get random black screens in the Game Mode main menu: the menu is there, I click something, and the screen goes black while controls and sound keep working. Good thing I remembered how to restart, because everything is fine after a restart.

@Sterophonick
Author

For me, it's happened way more often. I know I'm on an outdated build of the OS, but going on the Main branch leads to the integrated controllers hitching every so often, and it makes shooters unplayable for me.

@1basti1

1basti1 commented Sep 13, 2022

I'm on the latest beta build (iirc this all started with the current beta update, but could be wrong)

What do you mean by controllers hitching? I don't have any problems, or at least I don't notice anything.

I know there were performance problems after the stable build got 3.3, but these are OK now.

@Sterophonick
Author

This only happens with the integrated controller, but the problem is how every so often the state of the controller seizes up for a split second. I have proof of this.

Integrated test: https://youtu.be/ToFnC9TDkbo
DualSense test: https://youtu.be/49RnOXsGJc0
Trackpad demonstration: https://youtu.be/zU54BJ7IYGA

Notice how the pointer freezes up in the trackpad demonstration?

@1basti1

1basti1 commented Sep 13, 2022

Oh, I see. But I don't think I have that. I would notice it; I'm very sensitive to even small frame-time stutters. Strange indeed.

I would need to test it. Maybe I will later.

@Sterophonick
Author

This only happens under Main (20220830.1000) so I have no clue what the deal is. I'm kinda just waiting out SteamOS 3.4 and hoping for a fix. I don't really know who I can talk to at Valve about this.

@failzers

Have been experiencing the same exact issue.

@Sterophonick
Author

Sterophonick commented Sep 14, 2022

gamescope_12923_info.log
gamescope_12923_bt.log
gamescope_12923.zip

So I tested it under 20220912 (currently under the Main branch) and I decided to record what happened when it crashed.

Video link

Edit: still happens on 20220914.1000

@Sterophonick
Author

Sterophonick commented Sep 15, 2022

Seems to be fixed as of 7b51f59.

With it, I couldn't replicate the crash in either video. Keeping this open in case something happens though.

Update: Not fixed, false alarm.

@1basti1

1basti1 commented Sep 15, 2022

How do you get that? Automatically?

@Sterophonick
Author

sudo pacman -Syu

also i just got it to trigger again by accident, so not fixed. blegh.

@Sterophonick
Author

Update: It appears to be caused by Decky Loader (https://github.com/SteamDeckHomebrew/decky-loader)

@failzers

failzers commented Sep 16, 2022

> Update: It appears to be caused by Decky Loader (https://github.com/SteamDeckHomebrew/decky-loader)

It isn't as far as they're aware. People in their discord have spoken about encountering it without it installed, and I've had a couple of first hand encounters with people who've never had it installed, ever.

@Joshua-Ashton
Collaborator

Joshua-Ashton commented Sep 16, 2022

It's definitely related to Decky. I was in a VC with @Sterophonick and they went back and forth with it enabled and disabled several times, and it only reproduced with it enabled. It's definitely caused by that.

That's not to say it's the root cause; it may just be surfacing an existing problem or something. We probably shouldn't be crashing from a client either way.

The backtrace is very strange, there's a wlr_surface with a bad vtable (doesn't match surface implementation), which causes a crash when setting up the wl_id.

#0  0x00007fbad0ffad22 in raise () at /usr/lib/libc.so.6
#1  0x00007fbad0fe4862 in abort () at /usr/lib/libc.so.6
#2  0x00007fbad0fe4747 in _nl_load_domain.cold () at /usr/lib/libc.so.6
#3  0x00007fbad0ff3616 in  () at /usr/lib/libc.so.6
#4  0x0000556345db9290 in wlr_surface_from_resource (resource=0x556346b39d60) at ../subprojects/wlroots/types/wlr_surface.c:612
#5  0x0000556345d3a36e in gamescope_xwayland_server_t::set_wl_id(wlserver_x11_surface_info*, unsigned int) (this=0x5563469e4c40, surf=0x7fbabe37dfa8, id=56)
    at ../src/wlserver.cpp:1210
#6  0x0000556345d18e27 in handle_wl_surface_id(xwayland_ctx_t*, win*, uint32_t) (ctx=0x7fbabc0772b0, w=0x7fbabe37de90, surfaceID=56)
    at ../src/steamcompmgr.cpp:3569
#7  0x0000556345d19302 in handle_client_message(xwayland_ctx_t*, XClientMessageEvent*) (ctx=0x7fbabc0772b0, ev=0x7fbab77fd8a0)
    at ../src/steamcompmgr.cpp:3697
#8  0x0000556345d1cee4 in dispatch_x11(xwayland_ctx_t*) (ctx=0x7fbabc0772b0) at ../src/steamcompmgr.cpp:4827
#9  0x0000556345d1f1e9 in steamcompmgr_main(int, char**) (argc=28, argv=0x7ffc77d947b8) at ../src/steamcompmgr.cpp:5373
#10 0x0000556345d36a1a in steamCompMgrThreadRun(int, char**) (argc=28, argv=0x7ffc77d947b8) at ../src/main.cpp:578
#11 0x0000556345d37187 in std::__invoke_impl<void, void (*)(int, char**), int, char**>(std::__invoke_other, void (*&&)(int, char**), int&&, char**&&) (__f=
    @0x556346c493b8: 0x556345d369e0 <steamCompMgrThreadRun(int, char**)>) at /usr/include/c++/11.1.0/bits/invoke.h:61
#12 0x0000556345d3709e in std::__invoke<void (*)(int, char**), int, char**>(void (*&&)(int, char**), int&&, char**&&) (__fn=
    @0x556346c493b8: 0x556345d369e0 <steamCompMgrThreadRun(int, char**)>) at /usr/include/c++/11.1.0/bits/invoke.h:96
#13 0x0000556345d36fd1 in std::thread::_Invoker<std::tuple<void (*)(int, char**), int, char**> >::_M_invoke<0ul, 1ul, 2ul>(std::_Index_tuple<0ul, 1ul, 2ul>)
    (this=0x556346c493a8) at /usr/include/c++/11.1.0/bits/std_thread.h:253
#14 0x0000556345d36f6e in std::thread::_Invoker<std::tuple<void (*)(int, char**), int, char**> >::operator()() (this=0x556346c493a8)
    at /usr/include/c++/11.1.0/bits/std_thread.h:260
#15 0x0000556345d36f52 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)(int, char**), int, char**> > >::_M_run() (this=0x556346c493a0)
    at /usr/include/c++/11.1.0/bits/std_thread.h:211
#16 0x00007fbad13df3c4 in std::execute_native_thread_routine(void*) (__p=0x556346c493a0) at /build/gcc/src/gcc/libstdc++-v3/src/c++11/thread.cc:82
#17 0x00007fbad1193259 in start_thread () at /usr/lib/libpthread.so.0
#18 0x00007fbad10bc5e3 in clone () at /usr/lib/libc.so.6

I made an ASAN build of Gamescope and they were still able to reproduce so it's not memory corruption or bad memory, which was my initial hunch.
I have no idea what Decky does to cause this yet, but I guess I will give it an install and see where stuff falls apart..

@Joshua-Ashton
Collaborator

I toggled the overlay over 300 times automatically while running HL2 and didn't get any crashes.

I just installed Decky and ran the same script, and it crashed in several seconds. It's definitely related. :P

@Joshua-Ashton
Collaborator

Joshua-Ashton commented Sep 16, 2022

Okay, I may have found something after some more investigating with asan + Decky installed:
4f62e5d

This may fix the issue people are seeing. I am seemingly not getting crashes on overlay with it since...

NVM, was just good luck, ugh. Was still a problem though! :p

@failzers

> That's not to say it's the root cause; it may just be surfacing an existing problem or something. We probably shouldn't be crashing from a client either way.

True, true. Was just talking about some investigations the team has had where they've encountered it on an uninstalled system. Great to see that we're getting somewhere though.

@TrainDoctor

Please keep me and the rest of the decky-loader team posted on what we can do to help out. We're getting close to a full stable release and we'd love to address this issue before then.

@Joshua-Ashton
Collaborator

Joshua-Ashton commented Sep 16, 2022

I think it's just interfering with the timing of things, surfacing a bug that always had the potential to happen but normally doesn't.

When the overlay opens, two 1x1 windows are created and then destroyed by steamwebhelper:

wlserver: [types/wlr_surface.c:742] New wlr_surface 0x55998b7b2b10 (res 0x55998b7b1f40)
wlserver: [types/wlr_surface.c:695] Destroyed wlr_surface 0x55998b7b2b10 (res 0x55998b7b1f40)
wlserver: [types/wlr_surface.c:742] New wlr_surface 0x55998b7b2b10 (res 0x55998b7b1580)
wlserver: [types/wlr_surface.c:695] Destroyed wlr_surface 0x55998b7b2b10 (res 0x55998b7b1580)

In the bad case it ends up looking like this:

wlserver: [types/wlr_surface.c:742] New wlr_surface 0x55998b7b2ba0 (res 0x55998b7b26e0)
wlserver: [types/wlr_surface.c:695] Destroyed wlr_surface 0x55998b7b2ba0 (res 0x55998b7b26e0)
wlserver: [types/wlr_surface.c:742] New wlr_surface 0x55998b7b2ba0 (res 0x55998b7af960)
gamescope: types/wlr_surface.c:612: wlr_surface_from_resource: Assertion `wl_resource_instance_of(resource, &wl_surface_interface, &surface_implementation)' failed.
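The two excerpts above can be compared mechanically. A minimal sketch, assuming only the `New wlr_surface` / `Destroyed wlr_surface` line format quoted above; the helper name `unmatched_surfaces` is hypothetical and is not part of Gamescope or wlroots:

```python
import re

# Matches the wlserver debug lines quoted above, for example:
#   wlserver: [types/wlr_surface.c:742] New wlr_surface 0x... (res 0x...)
SURFACE_EVENT = re.compile(
    r"(New|Destroyed) wlr_surface (0x[0-9a-f]+) \(res (0x[0-9a-f]+)\)"
)


def unmatched_surfaces(log_text):
    """Return (surface, resource) pairs that were created but never destroyed."""
    live = []
    for line in log_text.splitlines():
        match = SURFACE_EVENT.search(line)
        if match is None:
            continue  # not a surface creation/destruction line
        event, surface, resource = match.groups()
        if event == "New":
            live.append((surface, resource))
        elif (surface, resource) in live:
            live.remove((surface, resource))
    return live


GOOD_LOG = """\
wlserver: [types/wlr_surface.c:742] New wlr_surface 0x55998b7b2b10 (res 0x55998b7b1f40)
wlserver: [types/wlr_surface.c:695] Destroyed wlr_surface 0x55998b7b2b10 (res 0x55998b7b1f40)
wlserver: [types/wlr_surface.c:742] New wlr_surface 0x55998b7b2b10 (res 0x55998b7b1580)
wlserver: [types/wlr_surface.c:695] Destroyed wlr_surface 0x55998b7b2b10 (res 0x55998b7b1580)
"""

BAD_LOG = """\
wlserver: [types/wlr_surface.c:742] New wlr_surface 0x55998b7b2ba0 (res 0x55998b7b26e0)
wlserver: [types/wlr_surface.c:695] Destroyed wlr_surface 0x55998b7b2ba0 (res 0x55998b7b26e0)
wlserver: [types/wlr_surface.c:742] New wlr_surface 0x55998b7b2ba0 (res 0x55998b7af960)
gamescope: types/wlr_surface.c:612: wlr_surface_from_resource: Assertion `wl_resource_instance_of(resource, &wl_surface_interface, &surface_implementation)' failed.
"""

print(unmatched_surfaces(GOOD_LOG))
print(unmatched_surfaces(BAD_LOG))
```

In the bad excerpt the only surface left alive at the time of the assertion is the second one, which tells us the crash happened between its creation and its destruction, not which object the failing lookup actually hit.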

I think what is happening here is the following:

  • [Client] The wl_surface (and therefore its resource) is created
  • [Gamescope] wlserver processes the surface creation, etc.
  • [Client] X sends us the wl_id to associate with the window via an atom property
  • [Client] The window and surface are deleted
  • [Gamescope] wlserver processes the surface/resource deletion
  • [Gamescope] steamcompmgr attempts to associate the window with a surface, but it's already been freed?

But that doesn't make sense, because in wl_resource_destroy it seems like it frees the existing resource and inserts a NULL at the id, and we are doing this all in a lock so it can't be halfway through doing that or something either...
https://github.com/wayland-project/wayland/blob/main/src/wayland-server.c#L754

So I am not too sure right now.
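The thread does not settle the root cause at this point. As a hedged illustration only, the toy model below shows one way a late lookup by id could hit a live object that is not a surface: the id has been handed out again for a different kind of object. That reuse is an assumption made for the sketch, and `Resource`, `ResourceMap`, `surface_from_id` and `replay_race` are hypothetical names, not Gamescope, wlroots or libwayland code:

```python
class Resource:
    """A protocol object owned by one client (toy stand-in, not wl_resource)."""

    def __init__(self, object_id, interface):
        self.object_id = object_id
        self.interface = interface


class ResourceMap:
    """Toy per-client object table in which a freed id can be handed out again."""

    def __init__(self):
        self._slots = {}

    def create(self, object_id, interface):
        self._slots[object_id] = Resource(object_id, interface)

    def destroy(self, object_id):
        # Clear the slot, analogous to storing NULL at that id.
        self._slots[object_id] = None

    def lookup(self, object_id):
        return self._slots.get(object_id)


def surface_from_id(resources, object_id):
    """Look up a surface by id and refuse anything that is not a wl_surface."""
    resource = resources.lookup(object_id)
    if resource is None:
        return None  # surface already gone, so the request can be ignored
    if resource.interface != "wl_surface":
        raise TypeError(f"id {object_id} now names a {resource.interface}")
    return resource


def replay_race(reuse_id):
    """Replay the hypothesised ordering and report what the late lookup sees."""
    resources = ResourceMap()
    resources.create(56, "wl_surface")  # client creates the surface
    queued_id = 56                      # X11 message carrying the id is queued
    resources.destroy(56)               # window and surface are destroyed
    if reuse_id:
        # ASSUMPTION: the same id is reused for an unrelated object.
        resources.create(56, "wl_region")
    try:
        surface = surface_from_id(resources, queued_id)
    except TypeError as error:
        return str(error)
    return "no surface" if surface is None else "associated"


print(replay_race(reuse_id=False))
print(replay_race(reuse_id=True))
```

Without reuse the late lookup finds an empty slot and can be ignored, which matches the reasoning about `wl_resource_destroy` above; only the reuse case produces a type mismatch comparable to the failed assertion.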

@Joshua-Ashton
Collaborator

I also tested whether we simply hadn't fully processed the creation, by flushing the Wayland events before set_wl_id was called, and that wasn't it either 🤔

@AAGaming00

AAGaming00 commented Sep 16, 2022

This may be caused by Decky's QAM injection causing the SP window to destroy the QAM window and create a new one. I can try and remove the window re-creation from Decky (it is just a side effect of how we inject into it) but this is likely still an issue in Gamescope as I've had it occur while in-game without ever opening menus.

@AAGaming00

AAGaming00 commented Sep 16, 2022

I have a stashed half-working version of this (the QAM tabs will disappear sometimes but the window is never re-created) that I can build for you if it would be helpful.

I can also provide a debug function to cause that window re-creation next time the quick access menu is opened.

@AAGaming00

Does #623 fix this issue or is it unrelated?

@Sterophonick Sterophonick changed the title from "Odd crashes on Steam Deck" to "Decky Loader XWayland Surface Instability" on Sep 16, 2022
@failzers

> but this is likely still an issue in Gamescope as I've had it occur while in-game without ever opening menus.

Yeah, it seems the most reproducible whilst opening overlays, but it does also crash whilst in game with no menus being displayed.

@Joshua-Ashton
Collaborator

> Does #623 fix this issue or is it unrelated?

It is unrelated.

@Joshua-Ashton
Collaborator

This protocol https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/163

This xwayland PR https://gitlab.freedesktop.org/xorg/xserver/-/merge_requests/976

and this Gamescope PR https://github.com/Plagman/gamescope/tree/new-surface-association

Should properly solve the problem.
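The thread does not describe how the linked changes work. As a minimal sketch of one design that would avoid a stale-id lookup, assume the window and the surface are matched on a serial that is never reused, so a late message can only ever match the object it was meant for. The class and method names below are hypothetical and do not correspond to the linked protocol, Xwayland, or Gamescope code:

```python
class SerialAssociator:
    """Pair X11 windows with surfaces by a serial that is never reused."""

    def __init__(self):
        self._waiting_windows = {}   # serial -> window still waiting for a surface
        self._waiting_surfaces = {}  # serial -> surface still waiting for a window
        self.pairs = {}              # window -> surface

    def window_announced(self, window, serial):
        surface = self._waiting_surfaces.pop(serial, None)
        if surface is None:
            self._waiting_windows[serial] = window
        else:
            self.pairs[window] = surface

    def surface_announced(self, surface, serial):
        window = self._waiting_windows.pop(serial, None)
        if window is None:
            self._waiting_surfaces[serial] = surface
        else:
            self.pairs[window] = surface

    def surface_destroyed(self, surface, serial):
        # Forget the surface everywhere; its serial is never handed out again.
        self._waiting_surfaces.pop(serial, None)
        for window, paired in list(self.pairs.items()):
            if paired == surface:
                del self.pairs[window]


def demo():
    """Replay a create/destroy/create sequence with a late window message."""
    assoc = SerialAssociator()
    assoc.surface_announced("surface-a", 1)
    assoc.surface_destroyed("surface-a", 1)
    assoc.surface_announced("surface-b", 2)
    assoc.window_announced("window-a", 1)   # late message for the dead surface
    after_late_message = dict(assoc.pairs)
    assoc.window_announced("window-b", 2)   # message for the live surface
    return after_late_message, dict(assoc.pairs)


print(demo())
```

In this toy version a window whose surface is already gone simply stays unpaired; a real implementation would also need to drop waiting entries when the window itself is destroyed.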

@dan3093

dan3093 commented Sep 24, 2022

@Joshua-Ashton Is your last comment a fix that I can deploy on my own steam deck? Do I just need to be patient and wait for the Decky Loader to get an update?

@Joshua-Ashton
Collaborator

I would just wait; there are still a lot of moving parts.

@infernn

infernn commented Sep 26, 2022

I'm sometimes getting a system reboot after I close a game, with a screen that says "verify installation". Is this problem related to Decky? Should I just disable CEF or uninstall Decky completely?

@dan3093

dan3093 commented Sep 26, 2022

@infernn I completely uninstalled decky after my last post and I have not experienced a single crash since doing so.

@infernn

infernn commented Sep 26, 2022

> @infernn I completely uninstalled decky after my last post and I have not experienced a single crash since doing so.

Can I just disable CEF in the options to try it, or do I have to uninstall Decky completely?

@Joshua-Ashton
Collaborator

You can just disable the CEF option.

@Sterophonick
Author

Decky Loader has just pushed a commit that fixed their QAM injection. Closing.

@Joshua-Ashton
Collaborator

This hasn't fixed the root cause.

8 participants