Xpra Server Crash with GLib >= 2.76 #3822
Comments
FYI: unless you also attach with a browser session and it is having problems with TIL:
GDK is choking on parsing of an X11 event. You may be able to get more details by running the xpra server with |
Thank you for the information! I just figured websockets would be better because of the bidirectional channel. So, I haven't been able to get it to crash in debug mode. I even started a server without debug mode again to check if the bug got resolved somehow, and it still crashed. I'm going to keep using my normal setup in debug mode to see if it ever crashes. I should have enough disk space to handle the log file. Any other things to try? |
They all are - xpra cannot work without.
Yes, the usual: disable every feature and every encoder (without debug mode) and see if it still crashes. |
It finally died while in debug mode. I attached the last 1000 lines of the large log file that was produced. It should contain all the context of the error. |
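For anyone else who needs to attach only the end of a very large debug log, here is a minimal sketch of extracting the last 1000 lines. The function names `tail_lines` and `write_tail` are hypothetical helpers for illustration and are not part of xpra; the log path is whatever your own server writes to.

```python
from collections import deque
from pathlib import Path


def tail_lines(path, count=1000):
    """Return the last `count` lines of the text file at `path`."""
    # errors="replace" keeps reading even if the log contains bytes
    # that are not valid UTF-8
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        # a bounded deque only ever holds `count` lines in memory,
        # so this also works on multi-gigabyte logs
        return list(deque(f, maxlen=count))


def write_tail(src, dst, count=1000):
    """Copy the last `count` lines of `src` into `dst`; return the line count."""
    lines = tail_lines(src, count)
    Path(dst).write_text("".join(lines), encoding="utf-8")
    return len(lines)
```

Calling `write_tail("server.log", "server-tail.txt")` (both file names are placeholders) would produce a file small enough to attach to the issue.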
Looks like you pressed control-C in window
Which caused a screen update:
And window
The window model cleanup code hits a bunch of errors we ignore since the window is already gone - I think that's safe:
Then we get some more screen updates ( What does look a little bit suspicious is a bunch of
Perhaps b1d2e9e introduced a bug - in which case the commit above should fix that. |
Often I just press control-c on my plotting to clear everything. However, it does crash sometimes when I close the window with the window's X close button. I did only have one other window open at the time, an xterm window. Your explanation of the log matches the observed behavior. When it crashes, the window I am trying to close closes, and the remaining windows flash before everything disappears. Naively, I would think it still expects the window that just closed to exist and is segfaulting on a null pointer for that window. This doesn't explain why it only happens sometimes, though. I just built xpra with that commit and installed it. It crashed using the same methodology... What I have been trying to figure out is what changed about my setup. I've been using this xpra setup for a couple of years now with no issues; then, all of a sudden, a month ago, this issue showed up. Is it a gtk issue? A gtk python binding issue? All these packages seem to have received small updates over the past month, so I couldn't narrow it down that way. |
I believe that Arch must have upgraded an X11 or GTK library that makes it much more likely to hit the race condition that triggers this bug. |
Race condition makes sense. It's also a lot more reliable in debug mode, so the act of printing out a log message must slow down the right thread preventing the race condition. Any other debug suggestions? Honestly, I'm happily running in debug mode right now. I have plenty of disk space to handle the large log file it produces and don't notice any performance hits. |
This may still trigger the bug and give us more details: XPRA_X11_DEBUG_EVENTS="*" xpra start -d x11,gtk ... |
I ran this |
I can trigger the same crash in Arch by opening and closing the SciTE editor. It happens all the time, though I can't tell how long this has been the case. It happens with xpra 4.4.4-1 on my Arch system: simply opening SciTE and then choosing File->Exit is enough to crash xpra on my machine. |
Assuming that I find the time to install ArchLinux in a VM, are there any applications in the default repositories that can trigger this bug? |
I think any program that spawns windows has the potential to trigger this bug. I ran gtkperf -a, installed from the AUR (https://aur.archlinux.org/gtkperf.git), inside a shell script while loop until it crashed. It took a minute but eventually exhibited the bug's behavior. Is there any other debug information I can provide you? If there is anything else you want to try, I would be happy to check out a development branch (or however you would want to distribute code) you create and test it. |
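The "run it in a loop until it crashes" approach can be sketched as below. This is an illustration only, not the script that was actually used: `run_until_failure` and its `still_alive` callback are hypothetical names, and how you detect that the xpra server has died (for example by checking that its process still exists) is left to the caller.

```python
import subprocess


def run_until_failure(command, still_alive, max_runs=1000):
    """Run `command` repeatedly until `still_alive()` returns False.

    Returns the number of the run after which the failure was detected,
    or None if `max_runs` runs completed without a failure.
    """
    for run in range(1, max_runs + 1):
        # The exit status of the test program is ignored on purpose:
        # what matters is whether the server survived the window
        # being opened and closed.
        subprocess.run(command, check=False)
        if not still_alive():
            return run
    return None
```

With `command` set to `["gtkperf", "-a"]` and the `DISPLAY` environment variable pointing at the xpra session, the return value gives a rough idea of how many open/close cycles it takes to hit the race.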
I can check this later too. I think (not a hundred percent sure) that Firefox also triggered it for me, but SciTE was 100% consistent; it was what led me to check whether this issue had been reported. |
Just did extensive testing: xpra segfaults pretty much whenever I close or exit any application. Examples: open xterm, type exit = crash. Sometimes it doesn't crash, but only rarely, about 1 time in 20. I can provide a full core dump if required via private link. At some point in the past it used to work fine in Arch, but since I do not work on my Arch box often, I have lost track of when it stopped and which version used to work. It was by chance that I tested it a couple of days ago in Arch. |
Can confirm such segfaults occur on the current Gentoo testing (~amd64), but not on the stable branch. This might help narrow down the version differences, e.g.: https://packages.gentoo.org/packages/dev-libs/glib |
Gentoo might be easier as a testbed. |
libX11 and xorg-server are the same in stable and testing, but I just tried updating only glib (and its dependency gdbus-codegen) from 2.74.6 (stable) to 2.76.1 (testing), and that alone was enough to trigger segmentation faults in libgdk. I've attached the relevant dmesg output and the complete dependency trees of the broken and working configurations, but the only differences should be glib and gdbus-codegen. This is using the older 4.3.4 release, but I had the same behaviour building from the current master branch. dmesg.txt EDIT: |
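To check quickly whether a given system falls on the crashing side of this comparison, a small version test can help. The sketch below assumes, based only on the reports in this thread (2.74.6 works, 2.76.1 crashes), that the 2.76 series is the first affected one; `parse_version` and `glib_is_affected` are hypothetical helper names, and the check only makes sense for xpra builds that do not yet contain a fix.

```python
# Assumption taken from the reports in this thread:
# glib 2.74.6 works, glib 2.76.1 segfaults.
FIRST_AFFECTED = (2, 76)


def parse_version(text):
    """Turn '2.76.1' into (2, 76, 1).

    A distribution package revision such as the '-2' in '2.76.1-2'
    is dropped before parsing.
    """
    upstream = text.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def glib_is_affected(version_text):
    """Return True if this glib version is in or after the 2.76 series."""
    return parse_version(version_text) >= FIRST_AFFECTED
```

Tuples compare element by element, so `(2, 76, 1) >= (2, 76)` holds while `(2, 74, 6) >= (2, 76)` does not, which matches the stable/testing split described above.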
@nentibusarchitectura @weingo2 can you try downgrading glib to see if that fixes it? I've scoured those gitlab links and there's nothing related to what we're doing here. For the record, the X11 threading change I was thinking about is this one: XInitThreads in library constructor breaks Motif! and I really don't think that this is relevant here: GTK does not call it and we never call X11 functions from other threads anyway, we even have thread checks in all the Cython bindings. |
My box uses glibc 2.37. The lowest I can downgrade to without breaking the system is 2.36, which doesn't fix the issue: the crashes are still there. I also tried to downgrade xpra. I can go as low as 4.3.3-r0 and the crashes are still there too. Anything lower than 4.3.3 gives me Python errors because the system no longer meets the requirements, due to the rolling nature of dependencies in Arch. Hope this helps. |
The arch changelog shows an upgrade from glib2 2.74.6-1 to glib2 2.76.1-2 over the last couple months. I, too, tried to downgrade glib2 according to https://wiki.archlinux.org/title/Downgrading_packages by using the package cache method. That didn't work, and I quickly wound up with too many dependency conflicts. I then used the Arch Linux Archive method from https://wiki.archlinux.org/title/Arch_Linux_Archive to downgrade me to March 1st, 2023. This date has the older glib2-2.74.6-1 package in it. I also had to remove libgirepository with Needless to say, I am out of commission until I solve the booting issues. If I don't get to it today, I won't be able to get to it until the Monday after next. |
So, I managed to recover my system and get the March 12th, 2023 Arch Linux Archive working. The day after this archive version, the glib library was upgraded. My system does not crash with this archive, I have been running |
I concur. |
Updating Arch installation with glib2 2.76.2-1 from "testing" repository does NOT solve the issue |
I got the same issue on Arch. Before, I was on a very old Ubuntu; I switched to Arch and now everything is broken: I cannot start any application. Constant crashes. This is my startup command:
Crash log:
The client shows the following:
|
create the corral window with the X11 bindings and dispose of it ourselves
So, I tried the GTK approach, which had the merit of being simple to implement but I didn't get very far with it because it caused too many intractable problems with focus and other events going MIA. @neuhalje @frej @chewi @thesamesam @weingo2 please give this commit a try, if I don't hear anything in the next 48 hours then 4.4.5 will be released with this fix as its headline. |
Can confirm that e2dfae9 fixes the segfault I experienced with glib 2.76.2. |
Unfortunately, the fix in e2dfae9, although it fixes the segfault experienced with glib 2.76.2, appears to prevent Firefox from creating its initial window.
This is with 4.4.4 [edit: corrected version] with e2dfae9 cherry picked, as well as with the complete v4.4.x branch at 4a4f45d. |
which is the root window. We don't hardcode 24 bit depth because it should be possible to run the seamless server against a 16-bit display
Saw this with
That's because Firefox (wrongly - but that's a different issue) uses a 32-bit window with transparency and these are hard to create with the correct arguments. |
The seg fault crash seems to be fixed but I also cannot open firefox, thunderbird, or google-chrome (from the aur). Basic guis I run like tcl/tk programs or runelite seem to be fine. Running gimp closed the xpra connection but the server was still in the LIVE state, requiring me to reconnect. But once I reconnected, it worked fine. So, I'd say this specific bug is fixed by a0cc722#diff-a4160a51eb3e2d697991cbebafcd5941baa16ee796c86f6282271fb1e3a41091 Are the firefox, thunderbird, chrome issues worth a separate bug or is it just something you will be aware of in the future? @totaam Thank you for the bug fix! |
@weingo2 you need c7ddee5 as per #3822 (comment) |
@totaam That was it. Thank you again for resolving this! |
v4.4.5 works for me, thanks for the quick fix @totaam! |
Thanks for the fix @totaam! I can no longer replicate the crash with current master 99731ac. But the log reports some errors that look partially caused by the fix/rework. |
@totaam I'll go ahead and report here; please let me know if you would prefer that I open new issues. Edit: moved to #1995 (comment) |
Late to the party, but my issues have been (nearly) resolved. Only very few (one window closing in about 50) crashes left. Thanks for the quick fix! |
I have this issue persisting very frequently. I guess some software is more prone to crashing (MATLAB in my case), but I also had other software cause the crash as well. Both the client and the server are at version 4.4.6. The end of the log reads the following after the crash:
@totaam Let me know if you need anything to squash the rest of the bug. It is somewhat unusable at the moment as I keep losing my work due to the crash. |
The refactoring that morphed gdk windows into |
Also broke |
Also: #4195 |
Describe the bug
I run xpra on Arch Linux on a desktop computer at home, so I can attach a thin client to it wherever I go. I start an xpra server with an xterm, then open and close windows from that xterm (running tmux too). It may seem a bit archaic, but I primarily use CLI tools that bring up plot windows. I also normally bind a websocket session to a port on the desktop machine, then forward that port over ssh to my local machine, then attach to that port. This method has been easier for me than using the native xpra ssh.
Recently, I updated Arch Linux on the desktop, and the xpra server started sporadically crashing when I open or close windows. When xpra crashes, it seems to be segfaulting somewhere inside the GTK library and specifically inside glib. The session becomes "UNKNOWN" and I have to restart the session.
The stack trace from one of these crashes is here:
gdb.txt
A crash does not occur with every window open or close but seems to occur randomly.
To Reproduce
Steps to reproduce the behavior:
xpra start --start=xterm --bind-ws=localhost:10001
ssh -L ...
xpra attach ws://localhost:10001
System Information (please complete the following information):