ControllerInterface: Fix deadlock when Wii Remote disconnects #11635
In UpdateInput, lock m_devices_population_mutex before m_devices_mutex to be consistent with other ControllerInterface functions. Normally the former lock isn't needed in UpdateInput, but when a Wii Remote disconnects it calls RemoveDevice which results in the mutexes being locked in the wrong order.
FifoCI detected that this change impacts graphical rendering. Here are the behavior differences detected by the system:
So... yes, this deadlock is definitely possible, I've seen it before too. I'm not really happy with it dropping inputs if the lock is already held elsewhere, but I guess since it already does this for the
We could have Device::UpdateInput return a bool or enum indicating whether the device should be removed, then let the lock_guard on m_devices_mutex fall out of scope so we can call RemoveDevice on those devices normally. That's kind of cumbersome and kludgy though, so I'd rather not if we don't need to.
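For reference, the deferred-removal alternative could look roughly like the sketch below. This is only an illustration under assumptions: `UpdateResult`, `Registry` and this simplified `Device` are hypothetical stand-ins, not Dolphin's real classes. Only the mutex names and `RemoveDevice`/`UpdateInput` come from the discussion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical result type: the device reports a disconnect instead of
// calling RemoveDevice itself.
enum class UpdateResult { Keep, Remove };

// Hypothetical, heavily simplified device.
struct Device
{
  bool connected = true;
  UpdateResult UpdateInput() { return connected ? UpdateResult::Keep : UpdateResult::Remove; }
};

// Hypothetical stand-in for ControllerInterface.
class Registry
{
public:
  void AddDevice(std::shared_ptr<Device> device)
  {
    std::lock_guard<std::recursive_mutex> population_lock(m_devices_population_mutex);
    std::lock_guard<std::recursive_mutex> devices_lock(m_devices_mutex);
    m_devices.push_back(std::move(device));
  }

  void RemoveDevice(const Device* device)
  {
    // Usual order: population mutex first, then devices mutex.
    std::lock_guard<std::recursive_mutex> population_lock(m_devices_population_mutex);
    std::lock_guard<std::recursive_mutex> devices_lock(m_devices_mutex);
    m_devices.erase(std::remove_if(m_devices.begin(), m_devices.end(),
                                   [device](const auto& d) { return d.get() == device; }),
                    m_devices.end());
  }

  void UpdateInput()
  {
    std::vector<std::shared_ptr<Device>> to_remove;
    {
      std::lock_guard<std::recursive_mutex> devices_lock(m_devices_mutex);
      for (const auto& device : m_devices)
      {
        if (device->UpdateInput() == UpdateResult::Remove)
          to_remove.push_back(device);
      }
    }  // m_devices_mutex is released here.

    // No lock is held at this point, so RemoveDevice can take both mutexes
    // in the usual order.
    for (const auto& device : to_remove)
      RemoveDevice(device.get());
  }

  std::size_t DeviceCount()
  {
    std::lock_guard<std::recursive_mutex> devices_lock(m_devices_mutex);
    return m_devices.size();
  }

private:
  std::recursive_mutex m_devices_population_mutex;
  std::recursive_mutex m_devices_mutex;
  std::vector<std::shared_ptr<Device>> m_devices;
};
```

With two devices added and one marked as disconnected, a single `UpdateInput()` call leaves one device in the registry. The cost is the extra bookkeeping mentioned above: every device type has to report removal instead of performing it.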
It took me a good 5min to understand why your PR fixes the problem, so let me summarize things here:
Thread A
Note: m_devices_mutex and m_devices_population_mutex are both recursive mutexes, so it's not a problem to double lock from the same thread.
Thread B
The following interleaving is possible:
With that in mind, I think your fix is OK, but this design calls for a lot of simplification. I wonder if there's a strong reason to have two mutexes there; this is introducing a lot of complexity and bugs...
I had a go at implementing this; it seems to be working well and the code is really not that bad. Regarding the question as to why we have two mutexes (
This specific issue was already addressed by dolphin-emu#11635, though I felt there was more we could do, and I wasn't happy with how likely it was for device update calls to be skipped (because `m_devices_population_mutex` was locked). Most importantly, this is the first step towards no longer storing devices as shared pointers all across the code base, which has caused many problems with managing their lifetime and destructor (which needs to be called at a specific time for some device types). Devices now hold a unique handle, and they can always be retrieved from it. Currently the only way to retrieve them from the handle is as a shared pointer, but in the future we might add a function like `Device* GetDeviceFromHandle(int handle)` and ask the call site to lock `m_devices_population_mutex` for the duration of use, or add a new function like "PlatformPopulateDevices()" that takes a callback, which could internally handle the specified device safely. We can slowly adopt the new "standard" as new input backend work is done; for now this just adds the framework.
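To make the handle idea more concrete, here is a rough sketch of the callback variant mentioned above. It is not what this commit implements, and it rests on assumptions: `DeviceRegistry`, `WithDevice` and the simplified `Device` are hypothetical names. Only the `int` handle and `m_devices_population_mutex` are taken from the text.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical, heavily simplified device.
struct Device
{
  std::string name;
};

// Hypothetical registry that is the sole owner of every device.
class DeviceRegistry
{
public:
  // Takes ownership and returns a handle that is never reused by this registry.
  int AddDevice(std::unique_ptr<Device> device)
  {
    std::lock_guard<std::recursive_mutex> lock(m_devices_population_mutex);
    const int handle = m_next_handle++;
    m_devices.emplace(handle, std::move(device));
    return handle;
  }

  // The device's destructor runs inside this call, while the lock is held.
  bool RemoveDevice(int handle)
  {
    std::lock_guard<std::recursive_mutex> lock(m_devices_population_mutex);
    return m_devices.erase(handle) > 0;
  }

  // Runs the callback on the device while the lock is held.
  // Returns false if the handle no longer refers to a device.
  bool WithDevice(int handle, const std::function<void(Device&)>& callback)
  {
    std::lock_guard<std::recursive_mutex> lock(m_devices_population_mutex);
    const auto it = m_devices.find(handle);
    if (it == m_devices.end())
      return false;
    callback(*it->second);
    return true;
  }

private:
  std::recursive_mutex m_devices_population_mutex;
  std::unordered_map<int, std::unique_ptr<Device>> m_devices;
  int m_next_handle = 0;
};
```

Because callers only ever keep the `int`, nothing outside the registry can extend a device's lifetime, so the destructor runs exactly when `RemoveDevice` is called. A stale handle simply makes `WithDevice` return false.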
Fixes the deadlock below, which is the same bug as issue 13027 and (I think) issue 13154.
The following steps will cause a deadlock which soon leads to a hang:
At this point the deadlock has already happened although the GUI is still responsive for the moment. Any number of things will trigger the actual hang, including:
Explanation:
There are several ControllerInterface functions (including RemoveDevice) which lock both the m_devices_population_mutex and m_devices_mutex in that order. UpdateInput doesn't directly need m_devices_population_mutex and so previously only locked m_devices_mutex.
Most of the time that doesn't cause any issues, but ControllerInterface::UpdateInput calls WiimoteController::Device::UpdateInput, which calls ControllerInterface::RemoveDevice if a Wii Remote is disconnected. RemoveDevice tries to lock the two mutexes in the normal order, but because ControllerInterface::UpdateInput has already locked m_devices_mutex, a deadlock becomes possible. Worse, a number of input backends are trying to call RemoveDevice at the same time, which makes it highly likely that one of them will lock m_devices_population_mutex before the thread that locked m_devices_mutex (usually, but not always, the hotkey scheduler thread) manages to do so.
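The lock ordering described above can be illustrated with a small standalone sketch. It is a simplified model, not the actual patch: `Interface` and its callback-based `Device` are hypothetical, and the use of a try-lock in `UpdateInput` is an assumption based on the review comments about updates being skipped while the population mutex is held.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for ControllerInterface.
class Interface
{
public:
  struct Device
  {
    // Called from UpdateInput. A disconnecting Wii Remote would end up
    // calling RemoveDevice from inside this callback.
    std::function<void()> update;
  };

  void AddDevice(std::shared_ptr<Device> device)
  {
    std::lock_guard<std::recursive_mutex> population_lock(m_devices_population_mutex);
    std::lock_guard<std::recursive_mutex> devices_lock(m_devices_mutex);
    m_devices.push_back(std::move(device));
  }

  void RemoveDevice(const Device* device)
  {
    // Normal order: population mutex first, then devices mutex.
    std::lock_guard<std::recursive_mutex> population_lock(m_devices_population_mutex);
    std::lock_guard<std::recursive_mutex> devices_lock(m_devices_mutex);
    m_devices.erase(std::remove_if(m_devices.begin(), m_devices.end(),
                                   [device](const auto& d) { return d.get() == device; }),
                    m_devices.end());
  }

  // Returns false if the update was skipped because another thread holds
  // the population mutex (assumed try-lock behaviour).
  bool UpdateInput()
  {
    // Take the population mutex first so the order matches RemoveDevice.
    std::unique_lock<std::recursive_mutex> population_lock(m_devices_population_mutex,
                                                           std::try_to_lock);
    if (!population_lock.owns_lock())
      return false;
    std::lock_guard<std::recursive_mutex> devices_lock(m_devices_mutex);

    // Iterate over a copy so a device may remove itself during the loop.
    const auto snapshot = m_devices;
    for (const auto& device : snapshot)
      device->update();
    return true;
  }

  std::size_t DeviceCount()
  {
    std::lock_guard<std::recursive_mutex> devices_lock(m_devices_mutex);
    return m_devices.size();
  }

private:
  std::recursive_mutex m_devices_population_mutex;
  std::recursive_mutex m_devices_mutex;
  std::vector<std::shared_ptr<Device>> m_devices;
};
```

Since both mutexes are recursive, the nested `RemoveDevice` call re-locks them on the same thread without blocking. Any other thread calling `RemoveDevice` now waits on m_devices_population_mutex while holding nothing, so the circular wait cannot form.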