New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid leaking the GameListModel instance to gracefully shutdown the GameTracker and prevent a crash on exit #7714

Open
wants to merge 7 commits into
base: master
from

Conversation

4 participants
@cristian64
Copy link
Contributor

cristian64 commented Jan 16, 2019

Because the GameListModel was never destroyed, its GameTracker instance outlived the Config module, which led to some crashes on shutdown.

The fix consists of binding the lifespan of GameListModel to the GameList instance, ensuring that the GameTracker is indeed destructed.

In order to be able to destroy the GameTracker instance, some race conditions, deadlocks and other crashes had to be addressed. These issues were all caused by a bad combination of the now-removed GameTracker's worker thread and the RunOnObject() function.

To summarize the list of changes:

  • All reference to the GameListModel has been removed from Settings.
  • The GameListModel instance is now owned by the GameList instance.
  • Components that need to access the GameListModel instance are provided with a reference in their constructors.
  • The GameTracker's worker thread and the RunOnObject() usage has been removed.
  • Now the GameTracker's functions run in a separate QThread, and all communication is performed via signal/slot.

The crash could be reproduced with the following steps:

  • Open Dolphin.
  • Start copying (or downloading) a game into any of the directories monitored by Dolphin. This will make the game list to be refreshed every now and then, until the game is fully copied (or downloaded).
  • Exit Dolphin by clicking on the close button, or selecting File > Exit in the top bar menu.
  • The application will potentially crash on shutdown:
Thread 10 "dolphin-emu" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc59f6700 (LWP 6006)]
0x000055555566b41b in bool Config::Get<bool>(Config::ConfigInfo<bool> const&) ()
(gdb) bt
#0  0x000055555566b41b in bool Config::Get<bool>(Config::ConfigInfo<bool> const&) ()
#1  0x00005555559a1c3f in UICommon::GameFile::CustomCoverChanged() ()
#2  0x00005555559a8442 in UICommon::GameFileCache::UpdateAdditionalMetadata(std::shared_ptr<UICommon::GameFile>*) [clone .constprop.227] ()
#3  0x00005555559a9363 in UICommon::GameFileCache::AddOrGet(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool*) ()
#4  0x000055555571bd1b in GameTracker::LoadGame(QString const&) [clone .part.191] ()
#5  0x000055555571e08b in GameTracker::UpdateFileInternal(QString const&) ()
#6  0x000055555571fb8d in std::_Function_handler<void (GameTracker::Command), GameTracker::Command(QObject*)::{lambda(GameTracker::Command)#3}>::_M_invoke(std::_Any_data const&, GameTracker::Command&&) ()
#7  0x00005555557200a8 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<Common::WorkQueueThread<GameTracker::Command>::Reset(std::function<void (GameTracker::Command)>)::{lambda()#1}> > >::_M_run() ()
#8  0x00007ffff09eb57f in  () at /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#9  0x00007ffff0cbe6db in start_thread (arg=0x7fffc59f6700) at pthread_create.c:463
#10 0x00007ffff02c088f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
Show resolved Hide resolved Source/Core/DolphinQt/NetPlay/NetPlaySetupDialog.cpp Outdated
Show resolved Hide resolved Source/Core/DolphinQt/Settings.h Outdated
Show resolved Hide resolved Source/Core/DolphinQt/GameList/GameList.cpp
@cristian64

This comment has been minimized.

Copy link
Contributor Author

cristian64 commented Jan 17, 2019

@Techjar Thanks for the feedback! Much appreciated.

I wasn't too sure about whether using shared pointers was a good fit for this case. Since these are all UI components, I assumed (maybe incorrectly) that all this would be running in the UI thread, therefore using shared pointers to prevent race conditions would be an overkill.

As of this pull request, the GameListModel is constructed with GameList as a parent, so it lives just about as long as the GameList instance. However, it relies on GameList::~GameList() to reset the pointer via Settings::SetGameListModel(nullptr), which feels pretty weak.

By using a combination of std::shared_ptr and std::weak_ptr, we can still make the GameList own the GameListModel instance, and whilst the pointer in Settings would be auto reset when the GameList instance is destroyed. I will amend the pull request later today.

Another good question... Why is Settings in charged of storing the GameListModel single instance pointer (or, after this pull request, a pointer to the single instance)? Thoughts?

@BhaaLseN

This comment has been minimized.

Copy link
Member

BhaaLseN commented Jan 17, 2019

I don't see any reason why it is actually stored in Settings. It feels like they should be passed around from something at the top (preferably using constructors) instead of having every component grab it from somewhere else.

@cristian64 cristian64 force-pushed the cristian64:avoid_leaking_gamelistmodel branch from 0fe21c0 to a8c16ab Jan 18, 2019

@cristian64

This comment has been minimized.

Copy link
Contributor Author

cristian64 commented Jan 18, 2019

Following @BhaaLseN 's idea, I ended up removing the cheeky Settings::GetGameListModel() function. Now the GameListModel reference is passed in in the constructors of those components that need it.

However (1), after doing so, I've then discovered that there was probably a reason why it was decided to leak the GameListModel/GameTracker instances... Basically, if the reproduction steps are followed (as described in the pull request's description), there will be a big chance of deadlock when the GameTracker's destructor is executed. I've managed to get around the deadlock by calling QCoreApplication::processEvents(). There is a commit for it, where I attempted to describe the issue in depth (if you have not looked at that commit, yet, do it now, please, so that the following makes sense).

However (2), this work-around has only surfaced more issues... The code running inside the RunOnObject() may try to access some of the GameTracker's ancestors (such as the GameListModel), which have already been destroyed, and will cause another crash.

More work is required... My feeling is that both RunOnObject() and the GameTracker's worker thread need to be reconsidered. I don't think RunOnObject() is required; a combination of signal/slots should be used instead. When it comes to the worker thread, we'll see.

@BhaaLseN

This comment has been minimized.

Copy link
Member

BhaaLseN commented Jan 18, 2019

I also think it would be better to use callbacks (or Qt signals/slots) in the classes that need access to the model. As far as I could tell, they only need the model to get a list of all games; and for that they could simply raise a signal to have someone else fill in the blanks. Could look like this (somewhat pseudo, assuming it works similarly to events in C#; at least docs say that code after emit runs once all the slots have finished):

// in NetPlaySetupDialog (and the other places):
signals:
void RequestGameList(std::vector<GameInfo>& list_of_games_to_be_shown);

public:
void PopulateGameList()
{
  std::vector<GameInfo> list_of_games_to_show_here;
  emit RequestGameList(list_of_games_to_show_here);
  for (auto game : list_of_games_to_show_here) { /* ... */ }
}

// in GameList (or any other place that might be appropriate):
slots:
void FillRequestedGameList(std::vector<GameInfo>& list_of_games_to_be_shown)
{
  // add all the games that should be shown, potentially filtering them first
  list_of_games_to_be_shown.push_back(game1);
  list_of_games_to_be_shown.push_back(game2);
  list_of_games_to_be_shown.push_back(game7);
}

I'm just hoping that disconnection is automatic (or at least good enough) and won't cause any problems there. As far as I can tell, signals without connected slots will just do nothing and leave the vector empty (so the worst that happens is an empty list instead of a crash).

@cristian64 cristian64 force-pushed the cristian64:avoid_leaking_gamelistmodel branch 2 times, most recently from 79d9c46 to 4227627 Jan 19, 2019

@cristian64 cristian64 changed the title Avoid leaking the GameListModel instance Avoid leaking the GameListModel instance to gracefully shutdown the GameTracker and prevent a crash on exit Jan 19, 2019

@cristian64

This comment has been minimized.

Copy link
Contributor Author

cristian64 commented Jan 19, 2019

I also think it would be better to use callbacks (or Qt signals/slots) in the classes that need access to the model. As far as I could tell, they only need the model to get a list of all games; and for that they could simply raise a signal to have someone else fill in the blanks. Could look like this (somewhat pseudo, assuming it works similarly to events in C#; at least docs say that code after emit runs once all the slots have finished):

// in NetPlaySetupDialog (and the other places):
signals:
void RequestGameList(std::vector<GameInfo>& list_of_games_to_be_shown);

public:
void PopulateGameList()
{
  std::vector<GameInfo> list_of_games_to_show_here;
  emit RequestGameList(list_of_games_to_show_here);
  for (auto game : list_of_games_to_show_here) { /* ... */ }
}

// in GameList (or any other place that might be appropriate):
slots:
void FillRequestedGameList(std::vector<GameInfo>& list_of_games_to_be_shown)
{
  // add all the games that should be shown, potentially filtering them first
  list_of_games_to_be_shown.push_back(game1);
  list_of_games_to_be_shown.push_back(game2);
  list_of_games_to_be_shown.push_back(game7);
}

I'm just hoping that disconnection is automatic (or at least good enough) and won't cause any problems there. As far as I can tell, signals without connected slots will just do nothing and leave the vector empty (so the worst that happens is an empty list instead of a crash).

@BhaaLseN After removing the GameTracker's worker thread and the RunOnObject() usage, we are off the hook. All good now.

Since this has grown a bit too large, I'd like to leave those other (pertinent) changes that you suggested for a follow-up pull request. Actually... I'm having second thoughts... Your suggestion should be implemented now, as otherwise it all relies on the GameList living longer than the NetPlay and CheatManager dialogs. At the moment that is the case, but that might change in the future. So, let's implement those signal/slots as part of this pull request...

@cristian64 cristian64 force-pushed the cristian64:avoid_leaking_gamelistmodel branch from 4227627 to 363ee48 Jan 19, 2019

@cristian64

This comment has been minimized.

Copy link
Contributor Author

cristian64 commented Jan 19, 2019

@BhaaLseN In the end, QPointer has been used instead. Having to define several signals and slots in several components was a bit convoluted, specially around things like the RequestGameList() signal/slot you suggested. emit is a fire-and-forget action, so you would not expect anything to be returned (or assigned to the output parameter).

@BhaaLseN

This comment has been minimized.

Copy link
Member

BhaaLseN commented Jan 19, 2019

I'm pretty sure emit is synchronous, ie. it is not fire&forget (but instead more like a multi-cast with 0..n handlers to be run in succession). But yeah, it was mostly an idea, not necessarily a "you must do it that way" thing.

@cristian64

This comment has been minimized.

Copy link
Contributor Author

cristian64 commented Jan 19, 2019

It seems it depends on the connection type (https://doc.qt.io/qt-5/signalsandslots.html#signals). In this case, you are right, slots would be executed immediately as I believe they'd be running in the same thread and all (i.e. direct connection type).

There was some discussion here: https://stackoverflow.com/questions/8455887/stack-object-qt-signal-and-parameter-as-reference. I guess I wouldn't use signals with output parameters in my own application, just to be safe.

@cristian64 cristian64 force-pushed the cristian64:avoid_leaking_gamelistmodel branch from 363ee48 to 7cb4152 Jan 25, 2019

@Tilka

This comment has been minimized.

Copy link
Member

Tilka commented Feb 3, 2019

With this PR I get an empty game list at startup. After about 3 seconds the complete list plops in. This didn't happen before.

@cristian64

This comment has been minimized.

Copy link
Contributor Author

cristian64 commented Feb 6, 2019

Hi @Tilka , thanks for giving the patch a go and the feedback.

In my tests, the list (30 games) is loaded immediately as the main window shows up. How large is your list? Platform?

The only change that I can think of that may have affected performance negatively is the thread's priority, which I decided to set to the very lowest, QThread::IdlePriority. That may have been a mistake. I'll change the priority to QThread::InheritPriority (default) and update the PR shortly.

Would you mind giving it another go?

@cristian64 cristian64 force-pushed the cristian64:avoid_leaking_gamelistmodel branch from 7cb4152 to 272844f Feb 6, 2019

@Tilka

This comment has been minimized.

Copy link
Member

Tilka commented Feb 6, 2019

There are currently 427 items in my game list. You could probably simulate this with a few hundred dol files. The thread priority change didn't improve the speed. I'm on Linux. The games are on a network drive mounted via local WiFi, but this shouldn't matter much, on master it's fast. I'll try to figure out what's going on...

@Tilka

This comment has been minimized.

Copy link
Member

Tilka commented Feb 6, 2019

Is it possible that master is just showing the stale cached game list and updates it in the background as needed whereas you are only showing the cache once it has been validated? That would explain why master crashes for me if I close it right away (probably the crash you are fixing here).

@cristian64

This comment has been minimized.

Copy link
Contributor Author

cristian64 commented Feb 6, 2019

That is exactly what I'm looking at at the moment. Now I'm making sure that my local gamelist.cache file is erased. I'm also adding some sleep() in GameFileCache::Update() to mock a slow drive. So far I have NOT reproduced the problem you're seeing (but I've seem other problems such as duplicated entries in the list).

@cristian64

This comment has been minimized.

Copy link
Contributor Author

cristian64 commented Feb 6, 2019

I think I have spotted one(the?) problem. In short: you were right.

I messed the order of the lines m_cache.Update(...) and m_cache.ForEach(...) in GameTracker::StartSlot(). Basically they have to be swapped around, or else the cache is not really used, as you mentioned.

In master, the m_cache.ForEach(...) line is executed just after the cache has loaded, and before the m_cache.Update(...) line. Note the m_initial_games_emitted_event condition variable, which effectively delays the Update() call until the ForEach() has concluded.

I'll amend the PR once you confirm that restoring the correct order of those lines solves the issue for you. Ended up updating the PR.

@cristian64 cristian64 force-pushed the cristian64:avoid_leaking_gamelistmodel branch from 272844f to 0ee2781 Feb 7, 2019

@Tilka

This comment has been minimized.

Copy link
Member

Tilka commented Feb 7, 2019

Now the list shows up immediately, but when I close Dolphin right away I get this:

QThread: Destroyed while thread is still running

Thread 1 "dolphin-emu" received signal SIGABRT, Aborted.
0x00007fffef64dd7f in raise () from /usr/lib/libc.so.6
=> 0x00007fffef64dd7f <raise+271>:	48 8b 8c 24 08 01 00 00	mov    rcx,QWORD PTR [rsp+0x108]
(gdb) bt
#0  0x00007fffef64dd7f in raise () at /usr/lib/libc.so.6
#1  0x00007fffef638672 in abort () at /usr/lib/libc.so.6
#2  0x00007ffff46127fc in  () at /usr/lib/libQt5Core.so.5
#3  0x00007ffff4613055 in  () at /usr/lib/libQt5Core.so.5
#4  0x0000555555779d84 in GameListModel::~GameListModel() (this=0x5555579b1060, __in_chrg=<optimized out>) at /usr/include/qt/QtCore/qlist.h:75
#5  0x0000555555779d84 in GameListModel::~GameListModel() (this=0x5555579b1060, __in_chrg=<optimized out>) at ../Source/Core/DolphinQt/GameList/GameListModel.h:25
#6  0x00007ffff48317ab in QObjectPrivate::deleteChildren() () at /usr/lib/libQt5Core.so.5
#7  0x00007ffff7aa328f in QWidget::~QWidget() () at /usr/lib/libQt5Widgets.so.5
#8  0x00005555557187e9 in GameList::~GameList() (this=0x5555574b0930, __in_chrg=<optimized out>) at /usr/include/c++/8.2.1/bits/atomic_base.h:303
#9  0x0000555555718879 in GameList::~GameList() (this=0x5555574b0930, __in_chrg=<optimized out>) at ../Source/Core/DolphinQt/GameList/GameList.cpp:168
#10 0x00007ffff48317ab in QObjectPrivate::deleteChildren() () at /usr/lib/libQt5Core.so.5
#11 0x00007ffff7aa328f in QWidget::~QWidget() () at /usr/lib/libQt5Widgets.so.5
#12 0x00007ffff7aa34aa in QWidget::~QWidget() () at /usr/lib/libQt5Widgets.so.5
#13 0x00007ffff48317ab in QObjectPrivate::deleteChildren() () at /usr/lib/libQt5Core.so.5
#14 0x00007ffff7aa328f in QWidget::~QWidget() () at /usr/lib/libQt5Widgets.so.5
#15 0x00007ffff7c0f67a in QStackedWidget::~QStackedWidget() () at /usr/lib/libQt5Widgets.so.5
#16 0x00007ffff48317ab in QObjectPrivate::deleteChildren() () at /usr/lib/libQt5Core.so.5
#17 0x00007ffff7aa328f in QWidget::~QWidget() () at /usr/lib/libQt5Widgets.so.5
#18 0x000055555567ff59 in MainWindow::~MainWindow() (this=0x7fffffffe1a0, __in_chrg=<optimized out>) at /usr/include/c++/8.2.1/bits/unique_ptr.h:347
#19 0x0000555555652259 in main(int, char**) (argc=<optimized out>, argv=<optimized out>) at ../Source/Core/DolphinQt/Main.cpp:217
@Tilka

This comment has been minimized.

Copy link
Member

Tilka commented Feb 7, 2019

It's not consistent though. Also, shutting down takes longer (as long as the cache verification I guess).

@cristian64

This comment has been minimized.

Copy link
Contributor Author

cristian64 commented Feb 7, 2019

Right... I don't think there is much to do here (but I may be wrong, as your stacktrace is not giving us all the info).

In GameTracker::~GameTracker() the thread is stopped. If it does not return within 5 seconds (arbitrary value), it attempts to terminate the thread (I believe that action does nothing on Linux; it does terminate the thread on Windows, but, again, I may be wrong).

The thread is potentially blocked by one of the cache functions, which you reckon may take several seconds to complete... If it takes longer than 5, bad things are going to happen... Ideally, we could try to identify what function of the cache is taking long (is it Update()? is it ForEach()? is it Load()?) and pass a flag by reference, so we can try cancel the slow-function's internal loop, therefore ensuring that shutting Dolphin down is very responsive.

Another option would be to increment the timeout from 5 seconds to 10 seconds... That should be a good value for everybody... :trollface:

@cristian64 cristian64 force-pushed the cristian64:avoid_leaking_gamelistmodel branch from 0ee2781 to 3c7e7b1 Feb 7, 2019

@cristian64

This comment has been minimized.

Copy link
Contributor Author

cristian64 commented Feb 7, 2019

I have added the aforementioned flag to GameFileCache::Update(), since that is the one that does the bulk of the work. Also, for completeness, I've added the flag to GameFileCache::UpdateAdditionalMetadata().

In theory, you should be able to close Dolphin at any time (even when the cache is still being built up for the first time), and it should react quickly, without crashing, and without leaking the GameListModel or the GameTracker.

@Tilka

This comment has been minimized.

Copy link
Member

Tilka commented Feb 7, 2019

Seems to work well now but I think the code can be improved. I'll look at it this weekend.

Edit: I mostly slept instead, sorry.

cristian64 added some commits Jan 17, 2019

GameListModel instance ownership transferred back to the GameList ins…
…tance. The GameListModel instance will be passed as a constructor parameter where needed.
Removed worker thread in GameTracker, also removed the RunOnObject() …
…dependency, and also removed the "...Internal" version of the functions.
GameTracker no longer inherits from QFileSystemWatcher. Instead, a QF…
…ileSystemWatcher member has been added (prefer compositing over inheritance).
Added a flag to GameFileCache::Update() and GameFileCache::UpdateAddi…
…tionalMetadata(), so the functions can be aborted in a responsive manner.

@cristian64 cristian64 force-pushed the cristian64:avoid_leaking_gamelistmodel branch from 3c7e7b1 to 14518ad Feb 13, 2019

@cristian64

This comment has been minimized.

Copy link
Contributor Author

cristian64 commented Feb 13, 2019

Just updated the PR with a few changes:

  • Rebased on master.
  • Removed the Slot suffix from certain functions names. Instead, an On prefix has been added: StartSlot() -> OnStart(). This matches Dolphin's style a bit better.
  • Changed bool* aborted to const bool* aborted, as it is an optional, read-only parameter.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment