Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Add synchronization to state changes (Fix Frame Step and FIFO Player - Issue 8718) #3794

Merged
merged 3 commits into from
May 22, 2016

Conversation

EmptyChaos
Copy link
Contributor

@EmptyChaos EmptyChaos commented Apr 24, 2016

[Needs PR #3769 as well]

Alternative to #3712 that fixes the underlying race. The basic problem is that CPU::EnableStepping is a non-atomic function that was only designed to be called from the Host thread so it races when called from the CPU/GPU or any other system thread. One thread will go through and stop the CPU, mute the audio and pause the FIFO while the other unpauses the CPU, unmutes the audio and unpauses the FIFO resulting in a random final state (e.g. CPU running with no video, CPU and video running with no audio). Other functions which depend on EnableStepping like Core::SetState and CPU::Break inherit those bugs.

This PR makes the implicit design choices explicit: CPU::EnableStepping, Core::SetState and Core::PauseAndLock are now Host thread only. CPU::Break has been rewritten as a threadsafe way for system threads to do EnableStepping(true) only.


I've removed the anti-deadlock hack (lock timeout) for testing purposes but that needs to go back in before this can be merged. A known source of deadlocks is wxMsgAlert (CPU PanicAlerts, GUI tries to CORE_PAUSE/PauseAndLock/EnableStepping and deadlocks waiting for the CPU waiting for the GUI waiting for the CPU...), this is somewhat intractable to fix since it requires one of 1) removing wxMsgAlert entirely, 2) shifting Core::SetState/CPU::EnableStepping/Core::PauseAndLock calls onto a dedicated background thread so that the GUI never calls a blocking Core function directly, or 3) spawning an external helper process to display the messages.


This change is Reviewable

@JMC47
Copy link
Contributor

JMC47 commented Apr 24, 2016

This looks like a great alternative to saying "HACK IT."

@lioncash
Copy link
Member

Review status: 0 of 28 files reviewed at latest revision, 9 unresolved discussions.


Source/Core/Core/Core.cpp, line 737 [r1] (raw file):
Leftovers?


Source/Core/Core/Core.h, line 55 [r1] (raw file):
Spaces for comment alignment


Source/Core/Core/Movie.cpp, line 1436 [r1] (raw file):
Is a thread main sounds kind of ambiguous with regards to meaning


Source/Core/Core/Movie.cpp, line 1458 [r1] (raw file):
ditto


Source/Core/Core/HW/CPU.cpp, line 25 [r1] (raw file):
s_cpu_step_instruction


Source/Core/Core/HW/DVDInterface.cpp, line 481 [r1] (raw file):
Leftovers?


Source/Core/Core/PowerPC/PowerPC.cpp, line 259 [r1] (raw file):
spaces instead of tabs in comments for alignment


Source/Core/Core/PowerPC/PowerPC.cpp, line 275 [r1] (raw file):
spaces instead of tabs in comments for alignment


Source/Core/Core/PowerPC/PowerPC.cpp, line 287 [r1] (raw file):
spaces instead of tabs in comments for alignment


Comments from Reviewable

@EmptyChaos
Copy link
Contributor Author

Review status: 0 of 28 files reviewed at latest revision, 9 unresolved discussions.


Source/Core/Core/Core.cpp, line 737 [r1] (raw file):
It's a statement of precondition in the form of a commented assertion (Ideally it would be a real assertion instead but Core::IsHostThread does not exist), I'll change it to English instead of pseudo-code.


Source/Core/Core/Core.h, line 55 [r1] (raw file):
Done.


Source/Core/Core/Movie.cpp, line 1436 [r1] (raw file):
Changed to "entrypoint for own thread" which is less ambiguous?


Source/Core/Core/Movie.cpp, line 1458 [r1] (raw file):
Done.


Source/Core/Core/HW/CPU.cpp, line 25 [r1] (raw file):
Done.


Source/Core/Core/HW/DVDInterface.cpp, line 481 [r1] (raw file):
Done.


Source/Core/Core/PowerPC/PowerPC.cpp, line 259 [r1] (raw file):
Done.


Source/Core/Core/PowerPC/PowerPC.cpp, line 275 [r1] (raw file):
Done.


Source/Core/Core/PowerPC/PowerPC.cpp, line 287 [r1] (raw file):
Done.


Comments from Reviewable

@BhaaLseN
Copy link
Member

Hint, putting squash! <start of the commit message> or fixup! <start of the commit message> lets git rebase -i automatically move them in the right place (as long as you don't specify --no-autosquash, which by default is off but some clients like GitExtensions add it). Squash lets you keep both messages while Fixup discards the second one.


Reviewed 20 of 28 files at r1, 6 of 6 files at r2.
Review status: 26 of 28 files reviewed at latest revision, 6 unresolved discussions.


Source/Core/Core/Core.cpp, line 529 [r2] (raw file):
"That is not going to work" - why? Does it hurt including that bit in the comment?
Such half-assed comments usually always come back to bite you later, like "why is this not going to work?", and should be avoided in general (even when it may not be easy in first place)


Source/Core/Core/Core.h, line 85 [r2] (raw file):
Think you could add some spaces around the = if you happen to come back to that file?


Source/Core/Core/HW/CPU.cpp, line 20 [r2] (raw file):
Mh, should those have a s_ prefix instead for being static?


Source/Core/Core/PowerPC/PowerPC.cpp, line 36 [r2] (raw file):
s_ prefix?


Source/Core/Core/PowerPC/PowerPC.cpp, line 307 [r2] (raw file):
lol, but this really shouldn't be here. Have you looked into how complex deriving from CPUCoreBase would be?


Source/Core/DolphinWX/MainNoGUI.cpp, line 42 [r2] (raw file):
Have you considered using something like this inside Core (or elsewhere) rather than having the Host_RunOnHostThread function? One function to queue events (from any thread), another that runs on the appropriate thread and works through the list.


Comments from Reviewable

@RisingFog
Copy link
Member

@dolphin-emu-bot rebuild

@EmptyChaos
Copy link
Contributor Author

EmptyChaos commented Apr 26, 2016

Review status: 7 of 27 files reviewed at latest revision, 2 unresolved discussions.


Source/Core/Core/Core.cpp, line 529 [r2] (raw file):
The first sentence was meant to be the explanation but I may have leaned to heavily on knowledge of the context (how the FifoPlayer works). The problem is that the FifoPlayer replaces the CPU, so what exactly is supposed to happen when PowerPC::SingleStep is called without a program?

The problem is no longer relevant as it's been resolved through other changes.


Source/Core/Core/Core.h, line 85 [r2] (raw file):
Done.


Source/Core/Core/HW/CPU.cpp, line 20 [r2] (raw file):
Done.


Source/Core/Core/PowerPC/PowerPC.cpp, line 36 [r2] (raw file):
Done.


Source/Core/Core/PowerPC/PowerPC.cpp, line 307 [r2] (raw file):
I wanted to RFC the idea first. I've implemented the change now.


Source/Core/DolphinWX/MainNoGUI.cpp, line 42 [r2] (raw file):
Done. It's an improvement I think; it only needs a Host message (WM_USER_JOB_DISPATCH) to prompt draining of the queue.


Comments from Reviewable

@BhaaLseN
Copy link
Member

Review status: 7 of 27 files reviewed at latest revision, 1 unresolved discussion.


Source/Core/Core/FifoPlayer/FifoPlayer.h, line 55 [r6] (raw file):
Is this Play comment still relevant here?


Comments from Reviewable

@EmptyChaos EmptyChaos force-pushed the frame-advance-race branch 2 times, most recently from 366562a to 5838481 Compare April 27, 2016 12:34
@EmptyChaos
Copy link
Contributor Author

Review status: 7 of 27 files reviewed at latest revision, 1 unresolved discussion.


Source/Core/Core/FifoPlayer/FifoPlayer.h, line 55 [r6] (raw file):
I've clarified it.


Comments from Reviewable

@RisingFog
Copy link
Member

This will need a rebase

@EmptyChaos EmptyChaos force-pushed the frame-advance-race branch 2 times, most recently from da51314 to bc368e8 Compare April 30, 2016 06:54
@mimimi085181
Copy link
Contributor

Just to confirm, i can't reproduce the framestop unpause issue anymore with this. Tested with a fifo log and the intro of twilight princess(wii), where i can reproduce the issue reliably on Dolphin 4.0-9251, but not on the buiild from the buildbot based on Dolphin 4.0-9251 and this pr.

@EmptyChaos EmptyChaos force-pushed the frame-advance-race branch 4 times, most recently from d9c402e to dfc3c8f Compare May 2, 2016 08:04
@EmptyChaos
Copy link
Contributor Author

I've adjusted the architecture since the original version. I elided PowerPC::Start/Stop/Pause into CPU::EnableStepping/Break/Stop and moved PowerPC::state into CPU which simplifies things since the state variable can now be controlled by the CPU state lock. This makes most of CPU threadsafe which wasn't a goal but was basically free.

Has anyone tested the Android build? I don't have a testing setup for it and the change is mildly complex.

I haven't removed PowerPC::GetState/GetStatePtr, they just forward to CPU currently. I'm undecided on what to do with it. Other than that, I believe this is ready to be reviewed now.

@RisingFog
Copy link
Member

This seems to need another rebase

@mimimi085181
Copy link
Contributor

What's the status of this? This change does look really complicated. Is that what's holding it up? Should it be merged after 5.0?

@lioncash and @BhaaLseN: Do you see any issues with this?

@lioncash
Copy link
Member

Review status: 2 of 28 files reviewed at latest revision, 6 unresolved discussions.


Source/Core/Core/FifoPlayer/FifoPlayer.cpp, line 66 [r12] (raw file):

{
public:
  CPUCore(FifoPlayer* parent)
explicit CPUCore(FifoPlayer* parent)

Source/Core/Core/HW/CPU.cpp, line 63 [r12] (raw file):

// Requires holding s_state_change_lock
static inline void FlushStepSyncEventLocked()

Inline isn't necessary here, since it's static.


Source/Core/Core/HW/CPU.cpp, line 151 [r12] (raw file):

// Requires holding s_state_change_lock
static inline void RunAdjacentSystems(bool running)

Ditto


Source/Core/Core/HW/CPU.cpp, line 211 [r12] (raw file):

// Requires s_state_change_lock
static inline bool SetStateLocked(State s)

Ditto


Source/Core/Core/PowerPC/PowerPC.h, line 167 [r12] (raw file):

// detect state changes.
using CPUState = CPU::State;
constexpr CPUState CPU_RUNNING   = CPU::CPU_RUNNING;

Is there any reason why these are here in the header instead of being called/accessed 'long-form'? (or being put in the .cpp file) PowerPC.h is used in a lot of places, and some of them don't actually need these within the namespace (especially if it's pulling things in from another namespace).


Source/Core/DolphinQt2/Main.cpp, line 26 [r12] (raw file):

  // Whenever the event loop is about to go to sleep, dispatch the jobs
  // queued in the Core first.
  QObject::connect( QAbstractEventDispatcher::instance(),

Unnecessary space


Comments from Reviewable

@lioncash
Copy link
Member

@mimimi085181 I'm not opposed to it being merged, provided adequate testing by others for potential subtle regressions is done (it would be nice to include in 5.0, to be honest, since it'd make for a better stable release for TASers).

Some external testing from a few different people might be nice (for example, via other TASers, just to make sure it quashes all of their problems with frame stepping, etc).

Previously, mimimi085181 wrote…

What's the status of this? This change does look really complicated. Is that what's holding it up? Should it be merged after 5.0?

@lioncash and @BhaaLseN: Do you see any issues with this?


Comments from Reviewable

@BhaaLseN
Copy link
Member

I don't see much of an issue source wise, other than the ones @lioncash already pointed out. But tbh, I only skimmed thru them and don't have as much time to spend as I'd like...

Previously, lioncash (Mat M.) wrote…

@mimimi085181 I'm not opposed to it being merged, provided adequate testing by others for potential subtle regressions is done (it would be nice to include in 5.0, to be honest, since it'd make for a better stable release for TASers).

Some external testing from a few different people might be nice (for example, via other TASers, just to make sure it quashes all of their problems with frame stepping, etc).


Review status: 2 of 28 files reviewed at latest revision, 6 unresolved discussions.


Comments from Reviewable

@mimimi085181
Copy link
Contributor

@EmptyChaos: Can you address the latest comments and rebase on top of lastest master, after pr 3810 was merged? I'd like somebody to post a build of this in the TAS community, so both TAS related changes can be tested with one build.

@MoskovchenkoD
Copy link

If someone needs testing, I'll do it. I'm a TASer from TASVideos and ready to test your Work-in-Progress builds. (Sorry if my english is bad)

Fix Frame Advance and FifoPlayer pause/unpause/stop.

CPU::EnableStepping is not atomic but is called from multiple threads
which races and leaves the system in a random state; also instruction
stepping was unstable, m_StepEvent had an almost random value because
of the dual purpose it served which could cause races where CPU::Run
would SingleStep when it was supposed to be sleeping.

FifoPlayer never FinishStateMove()d which was causing it to deadlock.
Rather than partially reimplementing CPU::Run, just use CPUCoreBase
and then call CPU::Run(). More DRY and less likely to have weird bugs
specific to the player (i.e the previous freezing on pause/stop).

Refactor PowerPC::state into CPU since it manages the state of the
CPU Thread which is controlled by CPU, not PowerPC. This simplifies
the architecture somewhat and eliminates races that can be caused by
calling PowerPC state functions directly instead of using CPU's
(because they bypassed the EnableStepping lock).
Fix TASInputDlg which was trying to access the GUI without the GUI
lock from the CPU Thread.
EndPlayInput runs on the CPU thread so it can't directly call
UpdateWantDeterminism. PlayController also tries to ChangeDisc
from the CPU Thread which is also invalid. It now just pauses
execution and posts a request to the Host to fix it instead.

The Core itself also did dodgy things like PauseAndLock-ing
from the CPU Thread and SetState from EmuThread which have been
removed.
@mimimi085181
Copy link
Contributor

@EmptyChaos: Thank you very much.

@Dimon12321: Now it's ready for testing. You can download the windows version of the test build from here:
https://dl.dolphin-emu.org/prs/pr-3794-dolphin-latest-x64.7z

Please test if frame advance works properly, if you set frame advance to a hotkey, and use that hotkey instead of the menu option. Also, if you have the Wii U Gamecube Controller adapter, please test with that as well.

@EmptyChaos
Copy link
Contributor Author

This problem is the 'gift' that keeps on giving. I found a potential race I missed earlier in Core::PauseAndLock so that's been partially rewritten and needs to be reviewed again.

Previously, mimimi085181 wrote…

@EmptyChaos: Thank you very much.

@Dimon12321: Now it's ready for testing. You can download the windows version of the test build from here:

https://dl.dolphin-emu.org/prs/pr-3794-dolphin-latest-x64.7z

Please test if frame advance works properly, if you set frame advance to a hotkey, and use that hotkey instead of the menu option. Also, if you have the Wii U Gamecube Controller adapter, please test with that as well.


Review status: 2 of 36 files reviewed at latest revision, 6 unresolved discussions.


Source/Core/Core/FifoPlayer/FifoPlayer.cpp, line 66 [r12] (raw file):

Previously, lioncash (Mathew Maidment) wrote…

explicit CPUCore(FifoPlayer* parent)
```</details>
Done.

Source/Core/Core/HW/CPU.cpp, line 63 [r12] (raw file):

Previously, lioncash (Mathew Maidment) wrote…

Inline isn't necessary here, since it's static.


Done.


Source/Core/Core/HW/CPU.cpp, line 151 [r12] (raw file):

Previously, lioncash (Mathew Maidment) wrote…

Ditto


Done.


Source/Core/Core/HW/CPU.cpp, line 211 [r12] (raw file):

Previously, lioncash (Mathew Maidment) wrote…

Ditto


Done.


Source/Core/Core/PowerPC/PowerPC.h, line 167 [r12] (raw file):

Previously, lioncash (Mathew Maidment) wrote…

Is there any reason why these are here in the header instead of being called/accessed 'long-form'? (or being put in the .cpp file) PowerPC.h is used in a lot of places, and some of them don't actually need these within the namespace (especially if it's pulling things in from another namespace).


I've removed the block and made the consumers include HW/CPU.h directly instead.


Source/Core/DolphinQt2/Main.cpp, line 26 [r12] (raw file):

Previously, lioncash (Mathew Maidment) wrote…

Unnecessary space


That was for alignment. Removed.


Comments from Reviewable

@MoskovchenkoD
Copy link

A very small group of people from TASVideos (and me too) have tested your build already. We don't know what else are you going to change in the build, but now Frame Advance works just fine! We got NO random unpauses. Probably RisingFog will add his word there if needed.

@RisingFog
Copy link
Member

What's left to be done before this is ready to merge?

@EmptyChaos
Copy link
Contributor Author

@Dimon12321 Thanks for testing.

@RisingFog I think (hope) this is done now, I've resolved all the races I could find. There could be more of course, but I haven't identified any other issues currently. The MsgHandler problem is a legacy architecture issue that will need to be fixed later.

@JMC47
Copy link
Contributor

JMC47 commented May 19, 2016

Everyone has been testing and reviewing this for a while and I don't know of any big flaws. TASers have tested it, I've verified it to fix the issues I've run into, and we do know that it fixes some serious bugs and race conditions.

The question that remains is: can this affect things that we're not anticipating to cause regressions? Considering that we're trying to gear up for 5.0 and this fixes a blocker, a decision has to be made.

If we do decide to merge it, the sooner we merge it the more likely we are to catch any of those potential unanticipated bugs.

@lioncash
Copy link
Member

Review status: 2 of 36 files reviewed at latest revision, 1 unresolved discussion.


Source/Core/Core/HW/CPU.cpp, line 166 [r13] (raw file):

  s_state = CPU_POWERDOWN;
  s_state_cpu_cvar.notify_one();
  // FIXME: MsgHandler can cause this to deadlock the GUI Thread. Remove the timeout.

Now that I look over this again, it's slightly worrying to potentially include this in 5.0 with the small possibility of deadlocks (if this was pre-existing with the old code to begin with, then I don't particularly have a problem here).


Comments from Reviewable

@JMC47
Copy link
Contributor

JMC47 commented May 20, 2016

I could deadlock the GUI thread right now through that. I run into the bug all the time. It's not a regression in this Pull Request.

@EmptyChaos
Copy link
Contributor Author

Review status: 2 of 36 files reviewed at latest revision, 1 unresolved discussion.


Source/Core/Core/HW/CPU.cpp, line 166 [r13] (raw file):

Previously, lioncash (Mat M.) wrote…

Now that I look over this again, it's slightly worrying to potentially include this in 5.0 with the small possibility of deadlocks (if this was pre-existing with the old code to begin with, then I don't particularly have a problem here).

Like JMC47 said, the `MsgHandler` problem is a pre-existing flaw that I don't see an easy, clean way to fix. IMHO, the best fix is to separate the GUI from the Core using an asynchronous intermediary background thread but implementing that is a large, disruptive patch in itself.

Overall, this PR handles the deadlock better than master because master lacks a timeout in CPU::PauseAndLock so it freezes permanently whereas this code can recover if you wait 10 seconds for the timeout.


Comments from Reviewable

@lioncash
Copy link
Member

Alright, given this fixes quite a few races and underlying issues, as well as making stable a much better experience for TASers (and if any regressions related to TASing happens in later dev versions, at least they have a more modern stable release to fall back to as a last resort), I'd like to include this in 5.0. Particularly because it handles a deadlock case more gracefully than what master currently does.

I'll give everyone 24 hours to leave concerns or arguments against doing this.

Previously, JMC47 wrote…

I could deadlock the GUI thread right now through that. I run into the bug all the time. It's not a regression in this Pull Request.


Review status: 2 of 36 files reviewed at latest revision, 1 unresolved discussion.


Source/Core/Core/HW/CPU.cpp, line 166 [r13] (raw file):

Previously, EmptyChaos wrote…

Like JMC47 said, the MsgHandler problem is a pre-existing flaw that I don't see an easy, clean way to fix. IMHO, the best fix is to separate the GUI from the Core using an asynchronous intermediary background thread but implementing that is a large, disruptive patch in itself.

Overall, this PR handles the deadlock better than master because master lacks a timeout in CPU::PauseAndLock so it freezes permanently whereas this code can recover if you wait 10 seconds for the timeout.

OK, that's more reassuring. I think I'd be more comfortable including this in 5.0 in that case.

Comments from Reviewable

@JMC47
Copy link
Contributor

JMC47 commented May 21, 2016

Triggered the gui"crash" in this build; it recovered after a while and I didn't lost all my progress in "Bratz: Rock Angelz"

I'm now heavily, heavily favoring merging this. It works around a crash I see 3 - 5 times a week in Dolphin!

@lioncash lioncash merged commit 08d45b9 into dolphin-emu:master May 22, 2016
@lioncash
Copy link
Member

Grace period is over with no objections.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
7 participants