Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DolphinQt: Avoid leaking the GameListModel instance to gracefully shutdown the GameTracker and prevent a crash on exit #7714

Merged
merged 7 commits into from Nov 18, 2020

Conversation

cristian64
Copy link
Contributor

@cristian64 cristian64 commented Jan 16, 2019

Because the GameListModel was never destroyed, its GameTracker instance outlived the Config module, which led to some crashes on shutdown.

The fix consists of binding the lifespan of GameListModel to the GameList instance, ensuring that the GameTracker is indeed destructed.

In order to be able to destroy the GameTracker instance, some race conditions, deadlocks and other crashes had to be addressed. These issues were all caused by a bad combination of the GameTracker's worker thread and the RunOnObject() function.

To summarize the list of changes:

  • All references to the GameListModel have been removed from Settings.
  • The GameListModel instance is now owned by the GameList instance.
  • Components that need to access the GameListModel instance are provided with a const reference in their constructors.
  • The GameTracker's internal worker thread is now flushed on shutdown, to avoid it accessing stale data, and to favor responsiveness.
  • RunOnObject() has been replaced with QueueOnObject() in GameTracker, to prevent a deadlock on shutdown.
  • (Extra) Refresh button and Purge Game List Cache menu entry have now their enabled state synced correctly, and they are not much more responsive.

The crash could be reproduced with the following steps:

  • Open Dolphin.
  • Start copying (or downloading) a game into any of the directories monitored by Dolphin. This will make the game list to be refreshed every now and then, until the game is fully copied (or downloaded).
  • Exit Dolphin by clicking on the close button, or selecting File > Exit in the top bar menu.
  • The application will potentially crash on shutdown:
Thread 10 "dolphin-emu" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc59f6700 (LWP 6006)]
0x000055555566b41b in bool Config::Get<bool>(Config::ConfigInfo<bool> const&) ()
(gdb) bt
#0  0x000055555566b41b in bool Config::Get<bool>(Config::ConfigInfo<bool> const&) ()
#1  0x00005555559a1c3f in UICommon::GameFile::CustomCoverChanged() ()
#2  0x00005555559a8442 in UICommon::GameFileCache::UpdateAdditionalMetadata(std::shared_ptr<UICommon::GameFile>*) [clone .constprop.227] ()
#3  0x00005555559a9363 in UICommon::GameFileCache::AddOrGet(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool*) ()
#4  0x000055555571bd1b in GameTracker::LoadGame(QString const&) [clone .part.191] ()
#5  0x000055555571e08b in GameTracker::UpdateFileInternal(QString const&) ()
#6  0x000055555571fb8d in std::_Function_handler<void (GameTracker::Command), GameTracker::Command(QObject*)::{lambda(GameTracker::Command)#3}>::_M_invoke(std::_Any_data const&, GameTracker::Command&&) ()
#7  0x00005555557200a8 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<Common::WorkQueueThread<GameTracker::Command>::Reset(std::function<void (GameTracker::Command)>)::{lambda()#1}> > >::_M_run() ()
#8  0x00007ffff09eb57f in  () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007ffff0cbe6db in start_thread (arg=0x7fffc59f6700) at pthread_create.c:463
#10 0x00007ffff02c088f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Source/Core/DolphinQt/NetPlay/NetPlaySetupDialog.cpp Outdated Show resolved Hide resolved
Source/Core/DolphinQt/Settings.h Outdated Show resolved Hide resolved
Source/Core/DolphinQt/GameList/GameList.cpp Outdated Show resolved Hide resolved
@cristian64
Copy link
Contributor Author

@Techjar Thanks for the feedback! Much appreciated.

I wasn't too sure about whether using shared pointers was a good fit for this case. Since these are all UI components, I assumed (maybe incorrectly) that all this would be running in the UI thread, therefore using shared pointers to prevent race conditions would be an overkill.

As of this pull request, the GameListModel is constructed with GameList as a parent, so it lives just about as long as the GameList instance. However, it relies on GameList::~GameList() to reset the pointer via Settings::SetGameListModel(nullptr), which feels pretty weak.

By using a combination of std::shared_ptr and std::weak_ptr, we can still make the GameList own the GameListModel instance, and whilst the pointer in Settings would be auto reset when the GameList instance is destroyed. I will amend the pull request later today.

Another good question... Why is Settings in charged of storing the GameListModel single instance pointer (or, after this pull request, a pointer to the single instance)? Thoughts?

@BhaaLseN
Copy link
Member

I don't see any reason why it is actually stored in Settings. It feels like they should be passed around from something at the top (preferably using constructors) instead of having every component grab it from somewhere else.

@cristian64
Copy link
Contributor Author

Following @BhaaLseN 's idea, I ended up removing the cheeky Settings::GetGameListModel() function. Now the GameListModel reference is passed in in the constructors of those components that need it.

However (1), after doing so, I've then discovered that there was probably a reason why it was decided to leak the GameListModel/GameTracker instances... Basically, if the reproduction steps are followed (as described in the pull request's description), there will be a big chance of deadlock when the GameTracker's destructor is executed. I've managed to get around the deadlock by calling QCoreApplication::processEvents(). There is a commit for it, where I attempted to describe the issue in depth (if you have not looked at that commit, yet, do it now, please, so that the following makes sense).

However (2), this work-around has only surfaced more issues... The code running inside the RunOnObject() may try to access some of the GameTracker's ancestors (such as the GameListModel), which have already been destroyed, and will cause another crash.

More work is required... My feeling is that both RunOnObject() and the GameTracker's worker thread need to be reconsidered. I don't think RunOnObject() is required; a combination of signal/slots should be used instead. When it comes to the worker thread, we'll see.

@BhaaLseN
Copy link
Member

I also think it would be better to use callbacks (or Qt signals/slots) in the classes that need access to the model. As far as I could tell, they only need the model to get a list of all games; and for that they could simply raise a signal to have someone else fill in the blanks. Could look like this (somewhat pseudo, assuming it works similarly to events in C#; at least docs say that code after emit runs once all the slots have finished):

// in NetPlaySetupDialog (and the other places):
signals:
void RequestGameList(std::vector<GameInfo>& list_of_games_to_be_shown);

public:
void PopulateGameList()
{
  std::vector<GameInfo> list_of_games_to_show_here;
  emit RequestGameList(list_of_games_to_show_here);
  for (auto game : list_of_games_to_show_here) { /* ... */ }
}

// in GameList (or any other place that might be appropriate):
slots:
void FillRequestedGameList(std::vector<GameInfo>& list_of_games_to_be_shown)
{
  // add all the games that should be shown, potentially filtering them first
  list_of_games_to_be_shown.push_back(game1);
  list_of_games_to_be_shown.push_back(game2);
  list_of_games_to_be_shown.push_back(game7);
}

I'm just hoping that disconnection is automatic (or at least good enough) and won't cause any problems there. As far as I can tell, signals without connected slots will just do nothing and leave the vector empty (so the worst that happens is an empty list instead of a crash).

@cristian64 cristian64 force-pushed the avoid_leaking_gamelistmodel branch 2 times, most recently from 79d9c46 to 4227627 Compare January 19, 2019 01:48
@cristian64 cristian64 changed the title Avoid leaking the GameListModel instance Avoid leaking the GameListModel instance to gracefully shutdown the GameTracker and prevent a crash on exit Jan 19, 2019
@cristian64
Copy link
Contributor Author

cristian64 commented Jan 19, 2019

I also think it would be better to use callbacks (or Qt signals/slots) in the classes that need access to the model. As far as I could tell, they only need the model to get a list of all games; and for that they could simply raise a signal to have someone else fill in the blanks. Could look like this (somewhat pseudo, assuming it works similarly to events in C#; at least docs say that code after emit runs once all the slots have finished):

// in NetPlaySetupDialog (and the other places):
signals:
void RequestGameList(std::vector<GameInfo>& list_of_games_to_be_shown);

public:
void PopulateGameList()
{
  std::vector<GameInfo> list_of_games_to_show_here;
  emit RequestGameList(list_of_games_to_show_here);
  for (auto game : list_of_games_to_show_here) { /* ... */ }
}

// in GameList (or any other place that might be appropriate):
slots:
void FillRequestedGameList(std::vector<GameInfo>& list_of_games_to_be_shown)
{
  // add all the games that should be shown, potentially filtering them first
  list_of_games_to_be_shown.push_back(game1);
  list_of_games_to_be_shown.push_back(game2);
  list_of_games_to_be_shown.push_back(game7);
}

I'm just hoping that disconnection is automatic (or at least good enough) and won't cause any problems there. As far as I can tell, signals without connected slots will just do nothing and leave the vector empty (so the worst that happens is an empty list instead of a crash).

@BhaaLseN After removing the GameTracker's worker thread and the RunOnObject() usage, we are off the hook. All good now.

Since this has grown a bit too large, I'd like to leave those other (pertinent) changes that you suggested for a follow-up pull request. Actually... I'm having second thoughts... Your suggestion should be implemented now, as otherwise it all relies on the GameList living longer than the NetPlay and CheatManager dialogs. At the moment that is the case, but that might change in the future. So, let's implement those signal/slots as part of this pull request...

@cristian64
Copy link
Contributor Author

cristian64 commented Jan 19, 2019

@BhaaLseN In the end, QPointer has been used instead. Having to define several signals and slots in several components was a bit convoluted, specially around things like the RequestGameList() signal/slot you suggested. emit is a fire-and-forget action, so you would not expect anything to be returned (or assigned to the output parameter).

@BhaaLseN
Copy link
Member

I'm pretty sure emit is synchronous, ie. it is not fire&forget (but instead more like a multi-cast with 0..n handlers to be run in succession). But yeah, it was mostly an idea, not necessarily a "you must do it that way" thing.

@cristian64
Copy link
Contributor Author

It seems it depends on the connection type (https://doc.qt.io/qt-5/signalsandslots.html#signals). In this case, you are right, slots would be executed immediately as I believe they'd be running in the same thread and all (i.e. direct connection type).

There was some discussion here: https://stackoverflow.com/questions/8455887/stack-object-qt-signal-and-parameter-as-reference. I guess I wouldn't use signals with output parameters in my own application, just to be safe.

@Tilka
Copy link
Member

Tilka commented Feb 3, 2019

With this PR I get an empty game list at startup. After about 3 seconds the complete list plops in. This didn't happen before.

@cristian64
Copy link
Contributor Author

Hi @Tilka , thanks for giving the patch a go and the feedback.

In my tests, the list (30 games) is loaded immediately as the main window shows up. How large is your list? Platform?

The only change that I can think of that may have affected performance negatively is the thread's priority, which I decided to set to the very lowest, QThread::IdlePriority. That may have been a mistake. I'll change the priority to QThread::InheritPriority (default) and update the PR shortly.

Would you mind giving it another go?

@Tilka
Copy link
Member

Tilka commented Feb 6, 2019

There are currently 427 items in my game list. You could probably simulate this with a few hundred dol files. The thread priority change didn't improve the speed. I'm on Linux. The games are on a network drive mounted via local WiFi, but this shouldn't matter much, on master it's fast. I'll try to figure out what's going on...

@Tilka
Copy link
Member

Tilka commented Feb 6, 2019

Is it possible that master is just showing the stale cached game list and updates it in the background as needed whereas you are only showing the cache once it has been validated? That would explain why master crashes for me if I close it right away (probably the crash you are fixing here).

@cristian64
Copy link
Contributor Author

cristian64 commented Feb 6, 2019

That is exactly what I'm looking at at the moment. Now I'm making sure that my local gamelist.cache file is erased. I'm also adding some sleep() in GameFileCache::Update() to mock a slow drive. So far I have NOT reproduced the problem you're seeing (but I've seem other problems such as duplicated entries in the list).

@cristian64
Copy link
Contributor Author

cristian64 commented Feb 6, 2019

I think I have spotted one(the?) problem. In short: you were right.

I messed the order of the lines m_cache.Update(...) and m_cache.ForEach(...) in GameTracker::StartSlot(). Basically they have to be swapped around, or else the cache is not really used, as you mentioned.

In master, the m_cache.ForEach(...) line is executed just after the cache has loaded, and before the m_cache.Update(...) line. Note the m_initial_games_emitted_event condition variable, which effectively delays the Update() call until the ForEach() has concluded.

I'll amend the PR once you confirm that restoring the correct order of those lines solves the issue for you. Ended up updating the PR.

@Tilka
Copy link
Member

Tilka commented Feb 7, 2019

Now the list shows up immediately, but when I close Dolphin right away I get this:

QThread: Destroyed while thread is still running

Thread 1 "dolphin-emu" received signal SIGABRT, Aborted.
0x00007fffef64dd7f in raise () from /usr/lib/libc.so.6
=> 0x00007fffef64dd7f <raise+271>:	48 8b 8c 24 08 01 00 00	mov    rcx,QWORD PTR [rsp+0x108]
(gdb) bt
#0  0x00007fffef64dd7f in raise () at /usr/lib/libc.so.6
#1  0x00007fffef638672 in abort () at /usr/lib/libc.so.6
#2  0x00007ffff46127fc in  () at /usr/lib/libQt5Core.so.5
#3  0x00007ffff4613055 in  () at /usr/lib/libQt5Core.so.5
#4  0x0000555555779d84 in GameListModel::~GameListModel() (this=0x5555579b1060, __in_chrg=<optimized out>) at /usr/include/qt/QtCore/qlist.h:75
#5  0x0000555555779d84 in GameListModel::~GameListModel() (this=0x5555579b1060, __in_chrg=<optimized out>) at ../Source/Core/DolphinQt/GameList/GameListModel.h:25
#6  0x00007ffff48317ab in QObjectPrivate::deleteChildren() () at /usr/lib/libQt5Core.so.5
#7  0x00007ffff7aa328f in QWidget::~QWidget() () at /usr/lib/libQt5Widgets.so.5
#8  0x00005555557187e9 in GameList::~GameList() (this=0x5555574b0930, __in_chrg=<optimized out>) at /usr/include/c++/8.2.1/bits/atomic_base.h:303
#9  0x0000555555718879 in GameList::~GameList() (this=0x5555574b0930, __in_chrg=<optimized out>) at ../Source/Core/DolphinQt/GameList/GameList.cpp:168
#10 0x00007ffff48317ab in QObjectPrivate::deleteChildren() () at /usr/lib/libQt5Core.so.5
#11 0x00007ffff7aa328f in QWidget::~QWidget() () at /usr/lib/libQt5Widgets.so.5
#12 0x00007ffff7aa34aa in QWidget::~QWidget() () at /usr/lib/libQt5Widgets.so.5
#13 0x00007ffff48317ab in QObjectPrivate::deleteChildren() () at /usr/lib/libQt5Core.so.5
#14 0x00007ffff7aa328f in QWidget::~QWidget() () at /usr/lib/libQt5Widgets.so.5
#15 0x00007ffff7c0f67a in QStackedWidget::~QStackedWidget() () at /usr/lib/libQt5Widgets.so.5
#16 0x00007ffff48317ab in QObjectPrivate::deleteChildren() () at /usr/lib/libQt5Core.so.5
#17 0x00007ffff7aa328f in QWidget::~QWidget() () at /usr/lib/libQt5Widgets.so.5
#18 0x000055555567ff59 in MainWindow::~MainWindow() (this=0x7fffffffe1a0, __in_chrg=<optimized out>) at /usr/include/c++/8.2.1/bits/unique_ptr.h:347
#19 0x0000555555652259 in main(int, char**) (argc=<optimized out>, argv=<optimized out>) at ../Source/Core/DolphinQt/Main.cpp:217

@Tilka
Copy link
Member

Tilka commented Feb 7, 2019

It's not consistent though. Also, shutting down takes longer (as long as the cache verification I guess).

@cristian64
Copy link
Contributor Author

Right... I don't think there is much to do here (but I may be wrong, as your stacktrace is not giving us all the info).

In GameTracker::~GameTracker() the thread is stopped. If it does not return within 5 seconds (arbitrary value), it attempts to terminate the thread (I believe that action does nothing on Linux; it does terminate the thread on Windows, but, again, I may be wrong).

The thread is potentially blocked by one of the cache functions, which you reckon may take several seconds to complete... If it takes longer than 5, bad things are going to happen... Ideally, we could try to identify what function of the cache is taking long (is it Update()? is it ForEach()? is it Load()?) and pass a flag by reference, so we can try cancel the slow-function's internal loop, therefore ensuring that shutting Dolphin down is very responsive.

Another option would be to increment the timeout from 5 seconds to 10 seconds... That should be a good value for everybody... :trollface:

@cristian64
Copy link
Contributor Author

I have added the aforementioned flag to GameFileCache::Update(), since that is the one that does the bulk of the work. Also, for completeness, I've added the flag to GameFileCache::UpdateAdditionalMetadata().

In theory, you should be able to close Dolphin at any time (even when the cache is still being built up for the first time), and it should react quickly, without crashing, and without leaking the GameListModel or the GameTracker.

@cristian64
Copy link
Contributor Author

Thanks for giving it a go, @iwubcode . After 20 months, this PR has become my nemesis.

I've just realized that the failed pr:android check is legit. The signature of a function in Source/Core/UICommon/GameFileCache.h has (unnecessarily) changed, and that function is used in also in Source/Android/jni/GameList/GameFileCache.cpp.

(The fact that the last error/warning in the log points to a missing fmt library led me astray! 🤦)

I'll update the PR to keep the signature as it was.

@cristian64
Copy link
Contributor Author

Updated to avoid modifying signature of functions in Source/Core/UICommon/GameFileCache.h, as it was affecting the Android build.

@jordan-woyak
Copy link
Member

I don't like how the "Dolphin could not find any [games]" message is shown for a second when dolphin launches before the list populates.
Ideally that message should be delayed until scanning is done.
Maybe I'm nitpicking. :P

@cristian64
Copy link
Contributor Author

I don't like how the "Dolphin could not find any [games]" message is shown for a second when dolphin launches before the list populates.
Ideally that message should be delayed until scanning is done.
Maybe I'm nitpicking. :P

Well spotted... Let me correct that...

(I enforced the connections to be Qt::QueuedConnection, thinking it'd be nicer, but that means some calls that could be invoked straight away are delayed (delayed enough to let the "could not find any games" message to be displayed).)

@cristian64
Copy link
Contributor Author

I don't like how the "Dolphin could not find any [games]" message is shown for a second when dolphin launches before the list populates.
Ideally that message should be delayed until scanning is done.
Maybe I'm nitpicking. :P

Well spotted... Let me correct that...

(I enforced the connections to be Qt::QueuedConnection, thinking it'd be nicer, but that means some calls that could be invoked straight away are delayed (delayed enough to let the "could not find any games" message to be displayed).)

It turned out these changes were not as straight forward as I thought they would be...

I ended up reverting some of my previous changes, and fixing the original problem (i.e. crashes and/or deadlocks on shutdown) in a different way. (In fact, now the diff is actually smaller than before.)

So, overall, with these changes:

  • Closing Dolphin while a game is being copied/downloaded should always be responsive.
  • Closing Dolphin while a game is being copied/downloaded should not crash.
  • Closing Dolphin while refreshing/purging the game list should always be responsive.
  • Closing Dolphin while refreshing/purging the game list should not crash.
  • Refresh button and Purge Game List Cache action should now be synced reliable (i.e. they become disabled when a refresh is ongoing). Previously, only the Refresh button would sync, in some cases, but not all cases.

@cristian64
Copy link
Contributor Author

Updated to address formatting.

@cristian64
Copy link
Contributor Author

  • Rebased.
  • Addressed conflicts.

@cristian64
Copy link
Contributor Author

@jordan-woyak , Although I mentioned I'd create a separate PR for the extra improvements you suggested, I ended up adding the commit to this PR. I was not too sure when this PR was going to progress, and I didn't want to create a PR with a dependency to this PR. Arguably, the last two commits of this PR could be moved to their own PR.

With this final commit, both the Refresh button and the Purge Game List Cache action will be disabled only briefly. As soon as the refresh request is processed, they will be re-enabled, allowing the user to refresh/purge the list again mid-way.

Current behavior in Dolphin 5.0-12716 (dirty):

Dolphin 5.0-12716 (dirty)

Demo of this PR:

PR 7714)

(For the sake of recording the videos, I added an artificial delay, so that the list would update slowly.)

Copy link
Member

@jordan-woyak jordan-woyak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to behave well now and code looks good.

@cristian64
Copy link
Contributor Author

Seems to behave well now and code looks good.

Thanks for approving and your time. Much appreciated.

@cristian64
Copy link
Contributor Author

@jordan-woyak Is there anything I can do to speed things up and get this through? In the past, the PRs I worked on were merged shortly after being approved, but I have a gut feeling that this one may fall through the cracks.

@leoetlino leoetlino merged commit 31d7be5 into dolphin-emu:master Nov 18, 2020
10 checks passed
@cristian64
Copy link
Contributor Author

Thanks for merging. Very nice.

@cristian64 cristian64 deleted the avoid_leaking_gamelistmodel branch November 22, 2020 20:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
7 participants