crash in eglFini from libglvnd libEGL #103
Comments
Our current workaround is to remove the libglvnd/libEGL.so.1*
Here's the backtrace for the gthumb crash with extra debug
Hm. I'll need to reproduce this locally to be sure, but it might be some problem related to loading both EGL and GLX in the same process.
kbrenneman, possibly.
@Hussamt @kbrenneman
@leigh123linux ok, thank you.
Ah, I've found the problem. It's a combination of having both EGL and GLX loaded, and the way entrypoint patching works. Both __eglFini and __glxFini start by calling into __glDispatchCheckMultithreaded. If the entrypoints are currently patched, then __glDispatchCheckMultithreaded will call the thread attached callback in the vendor. When GLX unloads, it unloads the vendor libraries, so that callback is no longer valid. But, simply clearing the current context doesn't unpatch the entrypoints anymore, because patching and unpatching on every MakeCurrent tended to cause a major performance drop on some badly-behaved programs. So, when libGLX unloads, it leaves dangling pointers to the entrypoint patching callbacks, which __eglFini ends up calling. I think I can fix this particular case just by adding an extra function to force libGLdispatch to unpatch everything. But, it would still run into the same problem if a different thread still had a current context.
Added a new function to libGLdispatch, __glDispatchForceUnpatch, which forces it to unpatch the OpenGL entrypoints before libEGL or libGLX can unload the vendor library that patched them. If a vendor patches the OpenGL entrypoints, libGLdispatch doesn't unpatch them when that vendor's context is no longer current, because that adds too much overhead to repeated MakeCurrent+LoseCurrent calls. But, that also means that the patch callbacks end up being dangling pointers after the vendor library is unloaded. This mainly shows up at process termination when a process loads both libEGL and libGLX, because __glxFini and __eglFini will both call the vendor's threadAttach callback. Fixes NVIDIA#103
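The failure mode and the fix described in this commit message can be sketched in a few lines of C. This is only a toy model, not libglvnd code: every name below (`patch_callbacks`, `dispatch_force_unpatch`, `glx_fini`, `egl_fini`, and so on) is hypothetical and merely stands in for `__glDispatchForceUnpatch`, `__glDispatchCheckMultithreaded`, `__glxFini` and `__eglFini`, and unloading the vendor library is simulated with a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the callbacks a vendor registers when it
 * patches the OpenGL entrypoints. */
typedef struct {
    void (*thread_attach)(void);
} patch_callbacks;

static const patch_callbacks *current_patch = NULL; /* non-NULL while patched */
static int vendor_loaded = 0;  /* simulated "vendor library is mapped" flag */
static int attach_calls = 0;

/* Vendor side. In the real crash this code is already unloaded when it
 * gets called; the sketch turns that into an assertion failure. */
static void vendor_thread_attach(void)
{
    assert(vendor_loaded);
    attach_calls++;
}

static const patch_callbacks vendor_callbacks = { vendor_thread_attach };

/* Dispatch side (hypothetical names). */
static void dispatch_patch(const patch_callbacks *cb) { current_patch = cb; }
static void dispatch_force_unpatch(void) { current_patch = NULL; }
int dispatch_is_patched(void) { return current_patch != NULL; }

static void dispatch_check_multithreaded(void)
{
    /* Calls into the vendor only while the entrypoints are patched. */
    if (current_patch != NULL && current_patch->thread_attach != NULL)
        current_patch->thread_attach();
}

/* Shared body of the two simulated library finalizers. */
static void fini_common(void)
{
    dispatch_check_multithreaded();
    dispatch_force_unpatch();  /* drop the callbacks before unloading */
    vendor_loaded = 0;         /* stands in for unloading the vendor library */
}

void glx_fini(void) { fini_common(); }
void egl_fini(void) { fini_common(); }

/* Patch once, then run both finalizers as happens at process exit.
 * Returns how many times the vendor callback was invoked. */
int demo_shutdown(void)
{
    attach_calls = 0;
    vendor_loaded = 1;
    dispatch_patch(&vendor_callbacks);
    glx_fini();
    egl_fini();
    return attach_calls;
}
```

In this model, deleting the `dispatch_force_unpatch()` call makes the second finalizer reach `vendor_thread_attach()` after the simulated unload and trip the assertion, which is the sketch's analogue of the segmentation fault in the backtrace.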
Okay, I think just unconditionally unpatching the OpenGL entrypoints before unloading the vendor libraries is enough to fix this. The only case I can think of where it would break is if another thread was trying to call an OpenGL function while it was being rewritten. But, that would mean that the other thread is trying to call an OpenGL function while the vendor library is being unloaded, so it's going to fall apart no matter what we do in libGLdispatch.
@kbrenneman I have just tested your commit and it fixes the crash, thank you.
@kbrenneman Your fix works for gnome apps but still seems to fail with kde apps
I have managed to reproduce the issue by running systemsettings5, navigating to the 'desktop effects' tab then closing the window.
@leigh123linux - To clarify, you're seeing the same crash in KDE without #105 as well? Or did #105 introduce the crash?
@kbrenneman kde had the issue before your commit and #105 doesn't fix it.
Okay. In that case, I'll check in the change to fix Gnome and see what the problem is in KDE. It might be something unrelated.
Okay, I think I've found the problem with KDE. DSO finalizers just get run in the reverse order of their constructors, including DSOs that were loaded with dlopen(). The sequence that we're getting in this case is that it calls the driver's _fini callback, then calls __eglFini, and then after that would call __glXFini. So, __eglFini tries to call the driver's thread attached callback after the driver has gone through all of its cleanup. So, I need to rearrange things so that it doesn't try to call into the vendor at all from any of the _fini functions.
Is that true? It seems counter to comments in glibc's source:
/* Lots of fun ahead. We have to call the destructors for all still
loaded objects, in all namespaces. The problem is that the ELF
specification now demands that dependencies between the modules
are taken into account. I.e., the destructor for a module is
called before the ones for any of its dependencies.
To make things more complicated, we cannot simply use the reverse
order of the constructors. Since the user might have loaded objects
using `dlopen' there are possibly several other modules with its
dependencies to be taken into account. Therefore we have to start
determining the order of the modules once again from the beginning. */ |
Oh, I just found what our problem is. We ran into exactly the same problem in GLX, fixed in commit b7d7542, but I never made the corresponding fix for EGL.
The latest commit fixes the kde/plasma apps crash, thanks again.
This bug was initially reported as https://bugzilla.rpmfusion.org/show_bug.cgi?id=4303
Some EGL applications are crashing with a segmentation fault with current libglvnd libEGL.
Reproduced with gthumb on Fedora 24:
Run gthumb - then exit.
Starting program: /usr/bin/gthumb
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7fffeb998700 (LWP 18735)]
[New Thread 0x7fffeab25700 (LWP 18736)]
[New Thread 0x7fffdd5c6700 (LWP 18738)]
[New Thread 0x7fffc5a3e700 (LWP 18739)]
[New Thread 0x7fffbf6a4700 (LWP 18741)]
Thread 1 "gthumb" received signal SIGSEGV, Segmentation fault.
0x00007fffe89d4d00 in ?? ()
(gdb) bt
#0 0x00007fffe89d4d00 in ()
#1 0x00007ffff366ed89 in __eglFini () at /usr/lib64/libglvnd/libEGL.so.1
#2 0x00007ffff7de94aa in _dl_fini () at /lib64/ld-linux-x86-64.so.2
#3 0x00007ffff45ae1e8 in __run_exit_handlers () at /lib64/libc.so.6
#4 0x00007ffff45ae235 in () at /lib64/libc.so.6
#5 0x00007ffff4595738 in __libc_start_main () at /lib64/libc.so.6
#6 0x0000555555588fb9 in _start ()
The libglvnd package was built with USE_ATTRIBUTE_CONSTRUCTOR enabled.