Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ControllerInterface: Fix deadlock when Wii Remote disconnects #11635

Merged

Conversation

Dentomologist
Copy link
Contributor

@Dentomologist Dentomologist commented Mar 10, 2023

Fixes the deadlock below, which is the same bug as issue 13027 and (I think) issue 13154.

The following steps will cause a deadlock which soon leads to a hang:

  1. Set Wii Remote 1 to Emulated Wii Remote
  2. Select Connect Wii Remotes for Emulated Controllers
  3. Pair a physical Wii Remote, which gets assigned to Wii Remote 1
  4. Use the Remote's power button to turn it off
  5. Wait until the console prints the error "Wiimote::IORead failed. Disconnecting Wii Remote".

At this point the deadlock has already happened although the GUI is still responsive for the moment. Any number of things will trigger the actual hang, including:

  • Changing a Wii Remote option
  • Starting a game
  • Quitting Dolphin

Explanation:

There are several ControllerInterface functions (including RemoveDevice) which lock both the m_devices_population_mutex and m_devices_mutex in that order. UpdateInput doesn't directly need m_devices_population_mutex and so previously only locked m_devices_mutex.

Most of the time that doesn't cause any issues but ControllerInterface::UpdateInput calls WiimoteController::Device::UpdateInput, which calls ControllerInterface::RemoveDevice if a Wii Remote is disconnected. This tries to lock the two mutexes in the normal order, but because ControllerInterface::UpdateInput already locked m_devices_mutex a deadlock becomes possible. Worse, a number of input backends are trying to call RemoveDevice at the same time which makes it highly likely one of them will lock m_devices_population_mutex before the thread that locked m_devices_mutex (which is usually but not always the hotkey scheduler thread) manages to do so.

In UpdateInput, lock m_devices_population_mutex before m_devices_mutex
to be consistent with other ControllerInterface functions. Normally the
former lock isn't needed in UpdateInput, but when a Wii Remote
disconnects it calls RemoveDevice which results in the mutexes being
locked in the wrong order.
@dolphin-emu-bot
Copy link
Contributor

FifoCI detected that this change impacts graphical rendering. Here are the behavior differences detected by the system:

  • xenoblade-menu on uberogl-lin-radeon: diff

automated-fifoci-reporter

@AdmiralCurtiss
Copy link
Contributor

AdmiralCurtiss commented Mar 11, 2023

So... yes, this deadlock is definitely possible, I've seen it before too. I'm not really happy with it dropping inputs if the lock is already held elsewhere, but I guess since it already does this for the m_devices_mutex the extra dropped inputs in practice are pretty negligible. I do wish we had a better system here though...

@Dentomologist
Copy link
Contributor Author

We could have Device::UpdateInput return a bool or enum indicating whether the device should be removed or not, and letting the lock_guard on m_devices_mutex fall out of scope so we can call RemoveDevice on any devices normally. That's kind of cumbersome and kludgy though, so I'd rather not if we don't need to.

@delroth
Copy link
Member

delroth commented Mar 14, 2023

It took me a good 5min to understand why your PR fixes the problem, so let me summarize things here:

Thread A

  1. ControllerInterface::UpdateInput locks m_devices_mutex
  2. ControllerInterface::UpdateInput calls (indirectly) ControllerInterface::RemoveDevice
  3. ControllerInterface::RemoveDevice locks m_devices_population_mutex then m_devices_mutex

Note: m_devices_mutex and m_devices_population_mutex are both recursive mutexes, so it's not a problem to double lock from the same thread.

Thread B

  1. ControllerInterface::RemoveDevice locks m_devices_population_mutex then m_devices_mutex

The following interlacing is possible:

  • A: ControllerInterface::UpdateInput locks m_devices_mutex
  • B: ControllerInterface::RemoveDevice locks m_devices_population_mutex
  • B: ControllerInterface::RemoveDevice waits for m_devices_mutex to get unlocked
  • A: ControllerInterface::UpdateInput calls (indirectly) ControllerInterface::RemoveDevice
  • A: ControllerInterface::RemoveDevice waits for m_devices_population_mutex to get unlocked

With that in mind, I think your fix is OK, but this design calls for a lot of simplification. I wonder if there's a strong reason to have two mutexes there, this is introducing a lot of complexity and bugs...

@delroth delroth merged commit 3783bed into dolphin-emu:master Mar 14, 2023
14 checks passed
@Dentomologist Dentomologist deleted the wiimote_fix_disconnection_deadlock branch March 14, 2023 01:46
@Filoppi
Copy link
Contributor

Filoppi commented May 24, 2023

We could have Device::UpdateInput return a bool or enum indicating whether the device should be removed or not, and letting the lock_guard on m_devices_mutex fall out of scope so we can call RemoveDevice on any devices normally. That's kind of cumbersome and kludgy though, so I'd rather not if we don't need to.

I gave a go at implementing this, it seems to be working well and the code is really not that bad.
Most importantly, to implement it without adding any further locks, I had to add a unique handle to each ciface::Core::Device object, which will lay the first stone towards getting rid of the necessity of our devices being held in shared pointers (std::shared_ptr<Device>), which gave us so much trouble in the past.

Regarding the question as to why we have two mutexes (m_devices_mutex and m_devices_population_mutex), there's a couple reasons, but I think the main one is explained in a comment within the code:

// We lock m_devices_population_mutex here to make everything simpler.
// Multiple devices classes have their own "hotplug" thread, and can add/remove devices at any
// time, while actual writes to "m_devices" are safe, the order in which they happen is not. That
// means a thread could be adding devices while we are removing them, or removing them as we are
// populating them (causing missing or duplicate devices).
std::lock_guard lk_population(m_devices_population_mutex);

Filoppi added a commit to Filoppi/dolphin that referenced this pull request Jun 1, 2023
This specific issue was already addressed by dolphin-emu#11635
though I felt like there was something more we could do, and wasn't too happy with the
likelihood of devices update calls being skipped (due to `m_devices_population_mutex` being locked).

Most importantly, this is the first step in getting rid of the fact that devices are stored as
shared pointers all across the code base, as this created many problems with handling their
lifetime and destructor (which needs to be called at a specific time for some device types).
Devices now hold a unique handle, and they can always be retrieved from it.

Currently there's only a way to retrieve them as shared pointer from the handle,
but in the future we might make a function like this:
`Device* GetDeviceFromHandle(int handle)` and asking to the calling site to lock
`m_devices_population_mutex` for the usage duration,
or add a new function like "PlatformPopulateDevices()" that takes a callback,
which could internally handle the specified device safely.

We can slowly adapt to the new "standard" when new input backends work is done,
for now this just adds the framework.
Filoppi added a commit to Filoppi/dolphin that referenced this pull request Jun 1, 2023
This specific issue was already addressed by dolphin-emu#11635
though I felt like there was something more we could do, and wasn't too happy with the
likelihood of devices update calls being skipped (due to `m_devices_population_mutex` being locked).

Most importantly, this is the first step in getting rid of the fact that devices are stored as
shared pointers all across the code base, as this created many problems with handling their
lifetime and destructor (which needs to be called at a specific time for some device types).
Devices now hold a unique handle, and they can always be retrieved from it.

Currently there's only a way to retrieve them as shared pointer from the handle,
but in the future we might make a function like this:
`Device* GetDeviceFromHandle(int handle)` and asking to the calling site to lock
`m_devices_population_mutex` for the usage duration,
or add a new function like "PlatformPopulateDevices()" that takes a callback,
which could internally handle the specified device safely.

We can slowly adapt to the new "standard" when new input backends work is done,
for now this just adds the framework.
Filoppi added a commit to Filoppi/dolphin that referenced this pull request Jun 1, 2023
This specific issue was already addressed by dolphin-emu#11635
though I felt like there was something more we could do, and wasn't too happy with the
likelihood of devices update calls being skipped (due to `m_devices_population_mutex` being locked).

Most importantly, this is the first step in getting rid of the fact that devices are stored as
shared pointers all across the code base, as this created many problems with handling their
lifetime and destructor (which needs to be called at a specific time for some device types).
Devices now hold a unique handle, and they can always be retrieved from it.

Currently there's only a way to retrieve them as shared pointer from the handle,
but in the future we might make a function like this:
`Device* GetDeviceFromHandle(int handle)` and asking to the calling site to lock
`m_devices_population_mutex` for the usage duration,
or add a new function like "PlatformPopulateDevices()" that takes a callback,
which could internally handle the specified device safely.

We can slowly adapt to the new "standard" when new input backends work is done,
for now this just adds the framework.
Filoppi added a commit to Filoppi/dolphin that referenced this pull request Jun 1, 2023
This specific issue was already addressed by dolphin-emu#11635
though I felt like there was something more we could do, and wasn't too happy with the
likelihood of devices update calls being skipped (due to `m_devices_population_mutex` being locked).

Most importantly, this is the first step in getting rid of the fact that devices are stored as
shared pointers all across the code base, as this created many problems with handling their
lifetime and destructor (which needs to be called at a specific time for some device types).
Devices now hold a unique handle, and they can always be retrieved from it.

Currently there's only a way to retrieve them as shared pointer from the handle,
but in the future we might make a function like this:
`Device* GetDeviceFromHandle(int handle)` and asking to the calling site to lock
`m_devices_population_mutex` for the usage duration,
or add a new function like "PlatformPopulateDevices()" that takes a callback,
which could internally handle the specified device safely.

We can slowly adapt to the new "standard" when new input backends work is done,
for now this just adds the framework.
@Filoppi
Copy link
Contributor

Filoppi commented Jun 1, 2023

I gave a go at implementing this
#11870

Filoppi added a commit to Filoppi/dolphin that referenced this pull request Jun 1, 2023
This specific issue was already addressed by dolphin-emu#11635
though I felt like there was something more we could do, and wasn't too happy with the
likelihood of devices update calls being skipped (due to `m_devices_population_mutex` being locked).

Most importantly, this is the first step in getting rid of the fact that devices are stored as
shared pointers all across the code base, as this created many problems with handling their
lifetime and destructor (which needs to be called at a specific time for some device types).
Devices now hold a unique handle, and they can always be retrieved from it.

Currently there's only a way to retrieve them as shared pointer from the handle,
but in the future we might make a function like this:
`Device* GetDeviceFromHandle(int handle)` and asking to the calling site to lock
`m_devices_population_mutex` for the usage duration,
or add a new function like "PlatformPopulateDevices()" that takes a callback,
which could internally handle the specified device safely.

We can slowly adapt to the new "standard" when new input backends work is done,
for now this just adds the framework.
Filoppi added a commit to Filoppi/dolphin that referenced this pull request Jun 4, 2023
This specific issue was already addressed by dolphin-emu#11635
though I felt like there was something more we could do, and wasn't too happy with the
likelihood of devices update calls being skipped (due to `m_devices_population_mutex` being locked).
Filoppi added a commit to Filoppi/dolphin that referenced this pull request Jun 4, 2023
This specific issue was already addressed by dolphin-emu#11635
though I felt like there was something more we could do, and wasn't too happy with the
likelihood of devices update calls being skipped (due to `m_devices_population_mutex` being locked).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
5 participants