New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NetKDRequestDevice: Fix sporadic crashes during emulation shutdown #12463
NetKDRequestDevice: Fix sporadic crashes during emulation shutdown #12463
Conversation
Explicitly shut down work queues in NetKDRequestDevice's destructor to prevent their threads from accessing members after they've been freed. This crash would occur sporadically if NetKDRequestDevice's periodic download or mail checks happened to overlap with emulation shutdown in the wrong way. An example sequence of events that could cause the crash: * m_scheduler_timer_thread queues a periodic Download event in m_scheduler_work_queue, then waits for m_shutdown_event. * A request to stop emulation results in s_ios being reset by the CPU thread. This triggers NetKDRequestDevice's destructor which sets m_shutdown_event and joins m_scheduler_timer_thread. * m_scheduler_timer_thread wakes from m_shutdown_event and returns from its thread function, ending the thread. * The CPU thread resumes execution at the end of NetKDRequestDevice's destructor and begins destroying NetKDRequestDevice's members in reverse declaration order. * m_http is declared after m_scheduler_work_queue and is therefore destroyed earlier. * m_scheduler_work_queue's destructor calls its Shutdown function, which by default finishes the work items in the queue. * The queued Download event calls KDDownload which calls m_http.Get() which calls Fetch() which passes garbage data from the freed m_curl into curl_easy_setopt(). * Curl promptly crashes. Shutting down the work queues manually in the destructor prevents the above because m_http and the other members don't get freed until after the queue threads finish.
Keep a shared_ptr to NetKDTimeDevice inside NetKDRequestDevice. This allows the KDDownload task to finish its work without potentially trying to dereference nullptr, which can potentially come from either GetIOS() or GetDeviceByName() if EmulationKernel's destructor has started running.
m_scheduler_work_queue.Shutdown(); | ||
m_work_queue.Shutdown(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling Shutdown(true) would cancel the remaining items and be a bit faster, but I'm not familiar enough with NetKD to know if that would cause any problems.
Makes sense to me, though I suspect it's a code smell that our IOS Device lifetimes aren't well-defined during destruction. I don't think it should be necessary for individual IOS Devices to hold shared_ptrs to eachother, the main Kernel should manage the lifetimes... You don't need to fix this here but yeah. |
#include "Core/IOS/Network/KD/NWC24Config.h" | ||
#include "Core/IOS/Network/KD/NWC24DL.h" | ||
#include "Core/IOS/Network/KD/NetKDTime.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems clang-format doesn't sort it alphabetically correctly (or rather its sorting is not case insensitive, probably ASCII based since lowercase is after uppercase). Feel free to ignore this comment since there is not much you can do as it isn't related to this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM, untested.
In the last few months I've had a handful of random crashes when shutting down emulation which happened in NetKDRequestDevice::KDDownload().
The first was the result of using-after-freeing of some of NetKDRequestDevices's members by its WorkQueueThreads, which this PR resolves by calling the latter's Shutdown function in NetKDRequestDevice's destructor so the threads are finished before destroying any members.
The other crash was caused by one of two possible nullptr dereferences. The first is when calling GetIOS() in KDDownload(), which could return nullptr if called after IOS::HLE::Shutdown had reset s_ios. The other happened with the following sequence:
This crash is resolved by storing a shared_ptr to NetKDTimeDevice inside NetKDRequestDevice, which skips the lookup via GetIOS and ensures the NetKDTimeDevice will exist through NetKDRequestDevice's lifespan.