
Mupen64Plus core crash can result in an infinite blocking operation on the main thread #1214

Open
NarryG opened this issue Jun 11, 2018 · 14 comments

@NarryG (Contributor) commented Jun 11, 2018

In the case of Mupen64Plus, when calling the unmanaged frame-advance function from the C# interface, a method named `HackyPinvokeWaitOne` is used to wait for the response from the unmanaged core. This method is essentially a modified version of `WaitHandle.WaitOne()`, which blocks the thread until the WaitHandle is signaled.

Within the main interface, `BizHawk.Emulation.Cores/Consoles/Nintendo/N64/NativeApi/mupen64plusCoreApi.cs`, the `Frame_Advance` method consists of an infinite loop whose main operation is the call:

```csharp
BizHawk.Common.Win32ThreadHacks.HackyPinvokeWaitOne(m64pEvent);
```

If Mupen64Plus crashes internally, that WaitHandle is never signaled, and thus the main emulation thread is blocked forever, as the default timeout value is Int32.MaxValue:

```csharp
public static bool HackyPinvokeWaitOne(WaitHandle handle)
{
    // 0xFFFFFFFF is INFINITE, so this never times out;
    // WAIT_OBJECT_0 (0) means the handle was signaled.
    return NativeMethods.WaitForSingleObject(handle.SafeWaitHandle, 0xFFFFFFFF) == 0;
}
```

Even when you change the timeout to something such as 1000 ms, the issue persists: the BizHawk side isn't aware that the core has crashed and keeps iterating through the Frame_Advance() loop.

This could be fixed by changing the timeout to something reasonable and then breaking out of the loop, but I'm not sure whether there's a better solution to the problem, as I don't know much about managed-to-unmanaged interop.
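The timeout-plus-liveness-check idea above can be sketched as follows. This is only an illustration, not BizHawk code: `frame_done` stands in for `m64pEvent`, and `core_alive` is a hypothetical callback for checking whether the core thread is still running.

```python
import threading

def frame_advance_wait(frame_done: threading.Event, core_alive, timeout=1.0) -> bool:
    """Wait for the core to signal frame completion, but bail out if the
    core thread has died instead of blocking forever.

    Returns True when the core responds, False if the core died while we
    were waiting (so the caller can break out of its Frame_Advance loop).
    """
    while True:
        if frame_done.wait(timeout=timeout):
            return True       # core responded normally
        if not core_alive():  # hypothetical liveness check
            return False      # core crashed; stop looping
        # timeout elapsed but the core is still running: keep waiting
```

The key difference from the current behavior is that a timeout alone doesn't abort the wait (a slow frame is not a crash); only timeout combined with a dead core does.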

NarryG referenced this issue Nov 2, 2018

does the n64 jit have a concept of crashing, or does it just go in an infinite and possibly infinitely recursing loop? I dunno, but I changed at least one infinite loop to a kind of administrative 'end frame' so at least it doesn't hang the emulator. If it has a way of crashing or halting, we should use that instead, but I couldn't figure it out.

fixes #1362
@zeromus (Contributor) commented Nov 2, 2018

Define "crashing internally".
If I can see it happen, I can manage the signals properly.

@NarryG (Contributor, Author) commented Nov 2, 2018

When I say crashing internally, I mean any form of exception within the core that causes the thread to abort.
The two I've run into are memory access violations stemming from `m64pCoreSaveState(buffer)` and `m64pCoreDoCommandPtr(m64p_command.M64CMD_EXECUTE, 0, Marshal.GetFunctionPointerForDelegate(cb))`.

I don't have any good ways to replicate it during normal gameplay, but I have run into it before.

I can provide some savestates which will kill the mupen thread and result in the main thread being blocked forever (intentional memory corruption within the emulated game which results in those exceptions being thrown) if those would suffice

@zeromus (Contributor) commented Nov 2, 2018

Memory access violations from `m64pCoreDoCommandPtr(m64p_command.M64CMD_EXECUTE, 0, Marshal.GetFunctionPointerForDelegate(cb))` are just "program crashes" without any more specific detail.

If you crash the core, you WILL bring the emulator down. We're not fixing this for N64, because this problem affects any unmanaged core (except for waterboxed cores). The emulator may corrupt its static state, even if we managed to survive the thread crashing. I do not care to address this.

Fixing individual causes of crashes does interest me a little.

`m64pCoreSaveState` should never crash. If that crashes, we should fix that crash.

@NarryG (Contributor, Author) commented Nov 2, 2018

When I have some time, I'll try to pull together some savestates that result in crashing on load. I'll also see if I can throw together a script that can induce the `m64pCoreSaveState` crash on the main branch, just to be 100% sure that the custom code we've slapped on top of BizHawk isn't inducing any of these issues.
A basic script to insert garbage into the emulated RDRAM should do the trick to spawn both of these issues (the savestate crash happens when it makes the rewind state), but I'll see if I can find something more reliable that doesn't involve external interference.

@zeromus (Contributor) commented Nov 2, 2018

Well, if you're inserting garbage, then I would hope it would crash sooner, so I can say I won't even investigate it. But if it happens during saving, I feel it's something we should look at regardless, since a simple mistake there could magnify otherwise benign stuff into additional crashes.

@zeromus (Contributor) commented Nov 2, 2018

nullbytes just gave me a scenario essentially identical to the one I was using to test the infinite loop, and it made the jit crash simply due to it not being robust enough to handle buggy games. I certainly won't be fixing that one, even though it seems like one in the category that could be fixed (just for this core) by trying to catch the dead thread. That's the category I don't care about.

@zeromus (Contributor) commented Nov 2, 2018

OK, he said it crashed the interpreter too, and that offended me enough to investigate it. I then discovered it was reading the ROM out of range, and that REALLY offended me. So I'm fixing it.

@zeromus (Contributor) commented Nov 2, 2018

addressed by d4aceb2

@NarryG (Contributor, Author) commented Nov 3, 2018

I haven't been able to land the savestate crash (I know it's possible, as I've caught it in the debugger before, and I'm testing against a build I caught it in), but here are two situations I found in the process which completely kill the interpreter (pure interpreter mode).

Banjo-Kazooie 1.0 (U):
1. Load up the game.
2. Once Banjo pulls out his banjo, set 0x145146 within the RDRAM to E8.
3. An AccessViolationException for r/w-protected memory will be thrown within Mupen.

Wave Race 64 1.0 (U):
1. Load into practice mode with all the default settings (just mash A).
2. Set 0xB196 in the RDRAM to 4C.
3. An AccessViolationException is thrown.

Obviously it'd be better if I could find actual in-game cases that cause the situation, but since you said the interpreter crashing offended you enough to investigate what was going on internally, I thought I might as well throw these two cases at you while I hunt down a case that triggers the savestate issue.

zeromus added a commit that referenced this issue Nov 3, 2018

trap more cases where n64 emulation can halt when running buggy code and shunt through my don't-halt-after-all logic. these are now all associated with error messages; it would be nice if someone knew a way to get those error messages out to c#. then again, that might make buggy code become hopelessly slow due to error spew

re #1214
@zeromus (Contributor) commented Nov 3, 2018

I didn't check your cases yet; this was inspired by another one from nullbytes. See if this helps.

@NarryG (Contributor, Author) commented Nov 3, 2018

The Banjo case is fixed. Wave Race didn't crash with those exact steps, but I did run into other issues with Wave Race.
Obviously I'm writing garbage, so you'd expect the game to crash, but it's led to exceptions in two different modules (albeit very close in memory):

```
Exception thrown at 0x00007FFA76F11114 (mupen64plus-rsp-hle.dll) in EmuHawk.exe: 0xC0000005: Access violation writing location 0x00007FFA76F28118.
Exception thrown at 0x00007FFA67A2C369 (msvcr120.dll) in EmuHawk.exe: 0xC0000005: Access violation writing location 0x00007FFA76F26000.
```

Same setup. It seems like if you just load up the game, load up a stage (mash A), open the hex editor, go to 0xB120, then hold down F so it writes a ton of FF, you'll run into access violations within about a second. The exact address isn't 100% consistent from run to run, but that range seems to be volatile.

@zeromus (Contributor) commented Nov 4, 2018

In 2 tries I got 2 different crashes inside the RSP HLE. About half a second of inspection reveals its code is as solid a mass of crash bugs as Buzzy Beetle drunk-driving a Mario Kart. There's no way I'm fixing this; it's just fundamentally not engineered for any resilience at all. You might get some mileage if you spent several hours hacking at it, but you'll probably still miss something. In one of those cases the crash was due to an assert from a misaligned access. OK, so someone has to figure out what to do in case of a misaligned access? This just needs to be part of the LAWS OF TAS, Subchapter on Core Quality: cores should be resilient against fuzzing. cc'ing @adelikat because someone should make note of that, in their head if nothing else.

@NarryG (Contributor, Author) commented Nov 4, 2018

Well, of the unmanaged, non-waterboxed cores, the only really bad ones are Mupen and VBA-Next.
QuickNES has crashed with an exception on me a whole two times in two years.
Gambatte had a few, but they were fixed.
mGBA and the Mednafen-based cores are 100% clean.

On the managed side, some of the less cared-for cores are pretty bad

PCEHawk dies nearly instantly from fuzzing the 21-bit bus (you do have to override read-only, so maybe that's good enough?) #1363
Virtu (Apple II) will die from fuzzing. Open issue from a while ago here: #1169
Atari2600Hawk will die from fuzzing relatively quickly (generic death message with no useful call stack, so I couldn't open an issue).

Since I'm using the RTC to do this testing, I am on an outdated build of BizHawk (based on 2.2.1, with the relevant changes merged on top), but it looks like none of those managed cores have been touched where the issues show.
I've also overridden various read-only protections, since the entire point is to break games by writing garbage (so we don't care about read-only). There's a chance no game could ever realistically hit these cases, but you never know if a buggy game could run into them.
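The manual testing described above (poke random bytes into a RAM region, advance frames, see if the core dies) can be sketched as a tiny fuzz loop. This is an illustration only: `write_byte` and `frame_advance` are hypothetical stand-ins for an emulator memory-poke API and a frame-advance call that reports whether the core survived the frame.

```python
import random

def fuzz_ram(write_byte, frame_advance, region_start, region_len,
             frames=60, seed=0):
    """Write one random byte into the region per frame; return True if the
    core survived the whole run, False if it died mid-fuzz."""
    rng = random.Random(seed)  # seeded so a crash can be reproduced
    for _ in range(frames):
        addr = region_start + rng.randrange(region_len)
        write_byte(addr, rng.randrange(256))
        if not frame_advance():  # assumed to return False on core death
            return False         # core crashed under fuzzing
    return True
```

Seeding the RNG matters here: an unseeded fuzz run that kills a core is almost useless for filing an issue, while a seeded one gives a repro.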

@zeromus (Contributor) commented Nov 8, 2018

FYI, the past few commits should have addressed this pretty thoroughly.

@YoshiRulz YoshiRulz added this to the Mupen64+ Update milestone Feb 4, 2019
