Mupen64Plus core crash can result in an infinite blocking operation on the main thread #1214

NarryG · 2018-06-11T04:14:30Z

In the case of Mupen64Plus, when calling the unmanaged frame advance function from within the C# interface, a method named HackyPinvokeWaitOne is used to wait for the response from the unmanaged core. This method is essentially just a modified version of WaitHandle.WaitOne(), which blocks the thread until the WaitHandle receives a response.

Within the main interface, BizHawk.Emulation.Cores/Consoles/Nintendo/N64/NativeApi/mupen64plusCoreApi.cs, you have the Frame_Advance method which consists of an infinite loop. The main operation within this frame advance is calling the method
BizHawk.Common.Win32ThreadHacks.HackyPinvokeWaitOne(m64pEvent);

In the case of Mupen64Plus crashing internally, that WaitHandle() never gets a response, and thus, the main emulation thread is blocked forever as the default timeout value is Int32.MaxValue

public static bool HackyPinvokeWaitOne(WaitHandle handle)
{
     NativeMethods.WaitForSingleObject(handle.SafeWaitHandle, 0xFFFFFFFF);
}

Even when you change the timeout to something such as 1000ms, the issue persists as the Bizhawk side isn't aware that the core has crashed and keeps going through the Frame_Advance() loop.

This could be fixed by changing the timeout to something reasonable and then breaking out of the loop, but I'm not sure if there's a better solution to the problem as I don't know much about managed > unmanaged interop.

The text was updated successfully, but these errors were encountered:

… infinite and possibly infinitely recursing loop? I dunno, but I changed at least one infinite loop to a kind of administrative 'end frame' so at least it doesn't hang the emulator. If it has a way of crashing or halting, we should use that instead, but I couldn't figure it out.txt fixes #1362

zeromus · 2018-11-02T03:08:17Z

Define "crashing internally".
If I can see it happen, I can manage the signals properly

NarryG · 2018-11-02T03:14:59Z

When I say crashing internally, I mean any form of exception within the core that causes the thread to abort.
The two I've run into are memory access violations which stem from m64pCoreSaveState(buffer) and m64pCoreDoCommandPtr(m64p_command.M64CMD_EXECUTE, 0, Marshal.GetFunctionPointerForDelegate(cb));

I don't have any good ways to replicate it during normal gameplay, but I have run into it before.

I can provide some savestates which will kill the mupen thread and result in the main thread being blocked forever (intentional memory corruption within the emulated game which results in those exceptions being thrown) if those would suffice

zeromus · 2018-11-02T03:29:46Z

memory access violations from m64pCoreDoCommandPtr(m64p_command.M64CMD_EXECUTE, 0, Marshal.GetFunctionPointerForDelegate(cb)); are just "program crashes" without any more specific detail.

If you crash the core, you WILL bring the emulator down. We're not fixing this for N64, because this problem affects any unmanaged core (except for waterboxed cores). The emulator may corrupt its static state, even if we managed to survive the thread crashing. I do not care to address this.

Fixing individual causes of crashes does interest me a little.

m64pCoreSaveState should never crash. if that crashes, we should fix that crash.

NarryG · 2018-11-02T04:58:15Z

When I have some time, I'll try and pull together some savestates that result in crashing on load. I'll also see if I can throw together a script that can induce the m64pCoreSaveState crash on the main branch just to be 100% sure that the custom code we've slapped on top of bizhawk isn't inducing any of these issues.
A basic script to insert garbage into the emulated RDRAM should do the trick to spawn both of these issues (the savestate crash happens when it makes the rewind state), but I'll see if I can find something more reliable that doesn't involve external interference.

zeromus · 2018-11-02T05:16:12Z

Well, if you're inserting garbage than I would hope that it would crash sooner so I can say I won't even investigate it. But if it happens during saving, I feel it's something we should look at, regardless, since it could be making a simple mistake to magnify otherwise benign stuff into additional crashes.

zeromus · 2018-11-02T20:01:39Z

nullbytes just gave me a scenario essentially identical to the one I was using to test the infinite loop, and it made the jit crash simply due to it not being robust enough to handle buggy games. I certainly won't be fixing that one, even though it seems like one in the category that could be fixed (just for this core) by trying to catch the dead thread. That's the category I don't care about.

zeromus · 2018-11-02T20:26:08Z

OK, he said it crashed the interpreter too, and that offended me enough to investigate it. I then discovered it was reading the rom out of range and that REALLY offended me. So I'm fixing it

zeromus · 2018-11-02T20:29:07Z

addressed by d4aceb2

NarryG · 2018-11-03T02:05:37Z

I haven't been able to land the savestate crash (I know it's possible as I've caught it in the debugger before, and I'm testing against a build I caught it in), but here's two situations I found in the process which completely kill the interpreter (pure interpreter mode).

Load up Banjo Kazooie 1.0 (U)
Once banjo pulls out his banjo, set 0x145146 within the RDRAM to E8
An AccessViolationException for r/w protected memory will be thrown within Mupen

Wave Race 64 1.0 U
Load into practice mode with all the default settings (just mash a)
Set 0xB196 in the RDRam to 4C
AccessViolationException

Obviously it'd be better if I could find actual in-game cases which caused the situation, but since you said the interpreter crashing offended you enough to investigate what was going on internally, thought I might as well throw these two cases at you while I see if I can hunt down a case to trigger the savestate issue.

…and shunt through my don't-halt-after-all logic. these are now all associated with error messages; it would be nice if someone knew a way to get those error messages out to c#. then again, that might make buggy code become hopelessly slow due to error spew re #1214

zeromus · 2018-11-03T18:04:59Z

I didnt check your cases yet, this was inspired by another one from nullbytes. see if this helps

NarryG · 2018-11-03T23:40:44Z

The banjo case is fixed. Wave Race didn't crash with those exact steps, but I did run into other issues with wave race.
Obviously I'm writing garbage so you'd expect the game to crash, but it's lead to exceptions in two different modules (albeit very close in memory)

Exception thrown at 0x00007FFA76F11114 (mupen64plus-rsp-hle.dll) in EmuHawk.exe: 0xC0000005: Access violation writing location 0x00007FFA76F28118.
and
Exception thrown at 0x00007FFA67A2C369 (msvcr120.dll) in EmuHawk.exe: 0xC0000005: Access violation writing location 0x00007FFA76F26000.

Same setup. It seems like if you just load up the game, load up a stage (mash a), open the hex editor go to 0xB120, then hold down F so it writes a ton of FF you'll run into access violations within around a second. It's not 100% consistent in address from run to run, but that range seems to be volatile.

zeromus · 2018-11-04T10:49:46Z

In 2 tries I got 2 different crashes inside the RSP HLE. About half a second of inspection reveals its code is as solid a mass of crash bugs as buzzy beetle drunk driving a mario kart. There's no way I'm fixing this. It's just fundamentally not engineered for any resilience at all. You might get some mileage if you spent several hours hacking at it, but youll probably still miss something. In one of those cases the crash was due to an assert from a misaligned access. OK, so someone has to figure out what to do in case of a misaligned access? This just needs to be part of LAWS OF TAS Subchapter on Core Quality: cores should be resilient against fuzzing. cc'ing @adelikat because someone should make note of that, in their head if nothing else

NarryG · 2018-11-04T18:15:36Z

Well of the unmanaged, non-waterbox cores, the only really bad ones are Mupen and VBA-Next.
QuickNES has crashed with an exception on me a whole two times in two years
Gambatte had a few but they were fixed.
mGBA and the Mednafen based cores are 100% clean

On the managed side, some of the less cared-for cores are pretty bad

PCEHawk dies nearly instantly from fuzzing the 21 bit bus (you do have to override read-only so maybe that's good enough?) #1363
Virtu (Apple II) will die from fuzzing. Open issue from a while ago here #1169
Atarti2600Hawk will die from fuzzing relatively quickly (generic death message with no useful call stack so I couldn't open an issue)

Since I'm using the RTC to do this testing, I am on an outdated build of Bizhawk (based on 2.2.1 and has relevant changes merged on top of it), but it looks like none of those managed cores have been touched where the issues show.
I've also overridden various read-only protections since the entire point is to break games by writing garbage (so we don't care about read-only). There's a chance no game could ever realistically hit these cases, but you never know if a buggy game could run into them.

zeromus · 2018-11-08T00:59:47Z

FYI the past few commits should have addressed this pretty thoroughly

MrCheeze · 2019-12-17T02:58:04Z

Current fixes seem to have overcompensated in the reverse direction: Bizhawk no longer hard crashes in cases where the N64 would halt, but it doesn't halt execution like it should either.

zeromus · 2019-12-17T03:02:02Z

so change "bail if it did" to some kind of new logic in the n64 core that marks it as dead and turns everything into no-ops

YoshiRulz · 2022-05-02T14:39:57Z

I'm reading "no longer crashes EmuHawk" which, while low, is the bar we're aiming for. Closing.

zeromus added the Core: Mupen64Plus Nintendo 64 (N64) core label Nov 2, 2018

YoshiRulz added the Repro: Patch pending Potentially fixed in dev build, see readme for download label Jan 31, 2019

YoshiRulz added this to the Mupen64+ Update milestone Feb 4, 2019

YoshiRulz closed this as completed May 2, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Mupen64Plus core crash can result in an infinite blocking operation on the main thread #1214

Mupen64Plus core crash can result in an infinite blocking operation on the main thread #1214

NarryG commented Jun 11, 2018 •

edited

zeromus commented Nov 2, 2018

NarryG commented Nov 2, 2018 •

edited

zeromus commented Nov 2, 2018

NarryG commented Nov 2, 2018 •

edited

zeromus commented Nov 2, 2018

zeromus commented Nov 2, 2018

zeromus commented Nov 2, 2018

zeromus commented Nov 2, 2018 •

edited

NarryG commented Nov 3, 2018 •

edited

zeromus commented Nov 3, 2018

NarryG commented Nov 3, 2018 •

edited

zeromus commented Nov 4, 2018

NarryG commented Nov 4, 2018 •

edited

zeromus commented Nov 8, 2018

MrCheeze commented Dec 17, 2019

zeromus commented Dec 17, 2019

YoshiRulz commented May 2, 2022

Mupen64Plus core crash can result in an infinite blocking operation on the main thread #1214

Mupen64Plus core crash can result in an infinite blocking operation on the main thread #1214

Comments

NarryG commented Jun 11, 2018 • edited

zeromus commented Nov 2, 2018

NarryG commented Nov 2, 2018 • edited

zeromus commented Nov 2, 2018

NarryG commented Nov 2, 2018 • edited

zeromus commented Nov 2, 2018

zeromus commented Nov 2, 2018

zeromus commented Nov 2, 2018

zeromus commented Nov 2, 2018 • edited

NarryG commented Nov 3, 2018 • edited

zeromus commented Nov 3, 2018

NarryG commented Nov 3, 2018 • edited

zeromus commented Nov 4, 2018

NarryG commented Nov 4, 2018 • edited

zeromus commented Nov 8, 2018

MrCheeze commented Dec 17, 2019

zeromus commented Dec 17, 2019

YoshiRulz commented May 2, 2022

NarryG commented Jun 11, 2018 •

edited

NarryG commented Nov 2, 2018 •

edited

NarryG commented Nov 2, 2018 •

edited

zeromus commented Nov 2, 2018 •

edited

NarryG commented Nov 3, 2018 •

edited

NarryG commented Nov 3, 2018 •

edited

NarryG commented Nov 4, 2018 •

edited