Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NetKDRequestDevice: Fix sporadic crashes during emulation shutdown #12463

Merged

Conversation

Dentomologist
Copy link
Contributor

In the last few months I've had a handful of random crashes when shutting down emulation which happened in NetKDRequestDevice::KDDownload().

The first was the result of using-after-freeing of some of NetKDRequestDevices's members by its WorkQueueThreads, which this PR resolves by calling the latter's Shutdown function in NetKDRequestDevice's destructor so the threads are finished before destroying any members.

The other crash was caused by one of two possible nullptr dereferences. The first is when calling GetIOS() in KDDownload(), which could return nullptr if called after IOS::HLE::Shutdown had reset s_ios. The other happened with the following sequence:

  • The work thread called GetIOS and got the non-null s_ios ptr.
  • The CPU thread called s_ios.reset(), which runs EmulationKernel's destructor and clears m_device_map.
  • The work thread called GetDeviceByName(), which failed to find NetKDTimeDevice in the now-empty map

This crash is resolved by storing a shared_ptr to NetKDTimeDevice inside NetKDRequestDevice, which skips the lookup via GetIOS and ensures the NetKDTimeDevice will exist through NetKDRequestDevice's lifespan.

Explicitly shut down work queues in NetKDRequestDevice's destructor to
prevent their threads from accessing members after they've been freed.

This crash would occur sporadically if NetKDRequestDevice's periodic
download or mail checks happened to overlap with emulation shutdown in
the wrong way.

An example sequence of events that could cause the crash:
* m_scheduler_timer_thread queues a periodic Download event in
  m_scheduler_work_queue, then waits for m_shutdown_event.
* A request to stop emulation results in s_ios being reset by the CPU
  thread. This triggers NetKDRequestDevice's destructor which sets
  m_shutdown_event and joins m_scheduler_timer_thread.
* m_scheduler_timer_thread wakes from m_shutdown_event and returns from
  its thread function, ending the thread.
* The CPU thread resumes execution at the end of NetKDRequestDevice's
  destructor and begins destroying NetKDRequestDevice's members in
  reverse declaration order.
* m_http is declared after m_scheduler_work_queue and is therefore
  destroyed earlier.
* m_scheduler_work_queue's destructor calls its Shutdown function, which
  by default finishes the work items in the queue.
* The queued Download event calls KDDownload which calls m_http.Get()
  which calls Fetch() which passes garbage data from the freed m_curl
  into curl_easy_setopt().
* Curl promptly crashes.

Shutting down the work queues manually in the destructor prevents the
above because m_http and the other members don't get freed until after
the queue threads finish.
Keep a shared_ptr to NetKDTimeDevice inside NetKDRequestDevice.

This allows the KDDownload task to finish its work without potentially
trying to dereference nullptr, which can potentially come from either
GetIOS() or GetDeviceByName() if EmulationKernel's destructor has
started running.
Comment on lines +194 to +195
m_scheduler_work_queue.Shutdown();
m_work_queue.Shutdown();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling Shutdown(true) would cancel the remaining items and be a bit faster, but I'm not familiar enough with NetKD to know if that would cause any problems.

@AdmiralCurtiss
Copy link
Contributor

Makes sense to me, though I suspect it's a code smell that our IOS Device lifetimes aren't well-defined during destruction. I don't think it should be necessary for individual IOS Devices to hold shared_ptrs to eachother, the main Kernel should manage the lifetimes... You don't need to fix this here but yeah.

Comment on lines 19 to +21
#include "Core/IOS/Network/KD/NWC24Config.h"
#include "Core/IOS/Network/KD/NWC24DL.h"
#include "Core/IOS/Network/KD/NetKDTime.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems clang-format doesn't sort it alphabetically correctly (or rather its sorting is not case insensitive, probably ASCII based since lowercase is after uppercase). Feel free to ignore this comment since there is not much you can do as it isn't related to this PR.

Copy link
Contributor

@sepalani sepalani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM, untested.

@AdmiralCurtiss AdmiralCurtiss merged commit e212d1c into dolphin-emu:master Dec 27, 2023
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants