Fix random segfault on GTK icon functions #2277
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The segfaults were happening on GTK icon theme functions, which are called via the C++ interface functions such as Gtk::IconTheme::has_icon.
There are multiple modules and threads using this functions on the default icon theme by calling Gtk::IconTheme::get_default(), which returns the same object for all callers, and was causing concurrent access to the same internal data structures on the GTK lib. Even a seemingly read-only function such as has_icon can cause writes due to the internal icon cache being updated.
To avoid this issues, a program wide global mutex must be used to ensure a single thread is accessing the default icon theme instance.
This commit implements wrappers for the existing IconTheme function calls, ensuring the global lock is held while calling the underling GTK functions.
Additional details
I could only reproduce this consistently with icon=true for the sway/window module and on closing/opening the Telegram app (wayland native and does not have a tasbar icon), but from the root cause I was able determine, the issue may happen with other apps and also in other modules that uses the object returned by Gtk::IconTheme::get_default().
I've used the following command to reproduce the issue reliably:
while pidof waybar>/dev/null; do swaymsg '[title="Telegram"] kill'; sleep 0.2; Telegram; sleep 0.2; done
On the master branch, it normally takes just between 5 and 12 iterations to crash waybar.
With this fix it I ran up to thousands of iterations without ever crashing.
Attached the gdb backtrace for one of this segfaults: gdb_backtrace.txt
After some debugging it seemed to be a concurrency issue, which was feasible when verified that the get_default() object was the same across all the code base, and it was being used on different threads.
Using the valgrind tool hellgrind I was able to validate this data race: helgrind_has_icon_conflict.txt
About the solution, it fixes the what I'm confident to be the root cause, but I'm not an everyday C++ programmer and also not familiar with the Gtk lib, so this particular solution may not be best. But I'm happy to refactor it to better fit the project if needed.