New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mupen64Plus core crash can result in an infinite blocking operation on the main thread #1214
Comments
… infinite and possibly infinitely recursing loop? I dunno, but I changed at least one infinite loop to a kind of administrative 'end frame' so at least it doesn't hang the emulator. If it has a way of crashing or halting, we should use that instead, but I couldn't figure it out.txt fixes #1362
Define "crashing internally". |
When I say crashing internally, I mean any form of exception within the core that causes the thread to abort. I don't have any good ways to replicate it during normal gameplay, but I have run into it before. I can provide some savestates which will kill the mupen thread and result in the main thread being blocked forever (intentional memory corruption within the emulated game which results in those exceptions being thrown) if those would suffice |
memory access violations from If you crash the core, you WILL bring the emulator down. We're not fixing this for N64, because this problem affects any unmanaged core (except for waterboxed cores). The emulator may corrupt its static state, even if we managed to survive the thread crashing. I do not care to address this. Fixing individual causes of crashes does interest me a little. m64pCoreSaveState should never crash. if that crashes, we should fix that crash. |
When I have some time, I'll try and pull together some savestates that result in crashing on load. I'll also see if I can throw together a script that can induce the m64pCoreSaveState crash on the main branch just to be 100% sure that the custom code we've slapped on top of bizhawk isn't inducing any of these issues. |
Well, if you're inserting garbage than I would hope that it would crash sooner so I can say I won't even investigate it. But if it happens during saving, I feel it's something we should look at, regardless, since it could be making a simple mistake to magnify otherwise benign stuff into additional crashes. |
nullbytes just gave me a scenario essentially identical to the one I was using to test the infinite loop, and it made the jit crash simply due to it not being robust enough to handle buggy games. I certainly won't be fixing that one, even though it seems like one in the category that could be fixed (just for this core) by trying to catch the dead thread. That's the category I don't care about. |
OK, he said it crashed the interpreter too, and that offended me enough to investigate it. I then discovered it was reading the rom out of range and that REALLY offended me. So I'm fixing it |
addressed by d4aceb2 |
I haven't been able to land the savestate crash (I know it's possible as I've caught it in the debugger before, and I'm testing against a build I caught it in), but here's two situations I found in the process which completely kill the interpreter (pure interpreter mode). Load up Banjo Kazooie 1.0 (U) Wave Race 64 1.0 U Obviously it'd be better if I could find actual in-game cases which caused the situation, but since you said the interpreter crashing offended you enough to investigate what was going on internally, thought I might as well throw these two cases at you while I see if I can hunt down a case to trigger the savestate issue. |
…and shunt through my don't-halt-after-all logic. these are now all associated with error messages; it would be nice if someone knew a way to get those error messages out to c#. then again, that might make buggy code become hopelessly slow due to error spew re #1214
I didnt check your cases yet, this was inspired by another one from nullbytes. see if this helps |
The banjo case is fixed. Wave Race didn't crash with those exact steps, but I did run into other issues with wave race.
Same setup. It seems like if you just load up the game, load up a stage (mash a), open the hex editor go to 0xB120, then hold down F so it writes a ton of FF you'll run into access violations within around a second. It's not 100% consistent in address from run to run, but that range seems to be volatile. |
In 2 tries I got 2 different crashes inside the RSP HLE. About half a second of inspection reveals its code is as solid a mass of crash bugs as buzzy beetle drunk driving a mario kart. There's no way I'm fixing this. It's just fundamentally not engineered for any resilience at all. You might get some mileage if you spent several hours hacking at it, but youll probably still miss something. In one of those cases the crash was due to an assert from a misaligned access. OK, so someone has to figure out what to do in case of a misaligned access? This just needs to be part of LAWS OF TAS Subchapter on Core Quality: cores should be resilient against fuzzing. cc'ing @adelikat because someone should make note of that, in their head if nothing else |
Well of the unmanaged, non-waterbox cores, the only really bad ones are Mupen and VBA-Next. On the managed side, some of the less cared-for cores are pretty bad PCEHawk dies nearly instantly from fuzzing the 21 bit bus (you do have to override read-only so maybe that's good enough?) #1363 Since I'm using the RTC to do this testing, I am on an outdated build of Bizhawk (based on 2.2.1 and has relevant changes merged on top of it), but it looks like none of those managed cores have been touched where the issues show. |
FYI the past few commits should have addressed this pretty thoroughly |
Current fixes seem to have overcompensated in the reverse direction: Bizhawk no longer hard crashes in cases where the N64 would halt, but it doesn't halt execution like it should either. |
so change "bail if it did" to some kind of new logic in the n64 core that marks it as dead and turns everything into no-ops |
I'm reading "no longer crashes EmuHawk" which, while low, is the bar we're aiming for. Closing. |
In the case of Mupen64Plus, when calling the unmanaged frame advance function from within the C# interface, a method named HackyPinvokeWaitOne is used to wait for the response from the unmanaged core. This method is essentially just a modified version of WaitHandle.WaitOne(), which blocks the thread until the WaitHandle receives a response.
Within the main interface,
BizHawk.Emulation.Cores/Consoles/Nintendo/N64/NativeApi/mupen64plusCoreApi.cs
, you have the Frame_Advance method which consists of an infinite loop. The main operation within this frame advance is calling the methodBizHawk.Common.Win32ThreadHacks.HackyPinvokeWaitOne(m64pEvent);
In the case of Mupen64Plus crashing internally, that WaitHandle() never gets a response, and thus, the main emulation thread is blocked forever as the default timeout value is Int32.MaxValue
Even when you change the timeout to something such as 1000ms, the issue persists as the Bizhawk side isn't aware that the core has crashed and keeps going through the Frame_Advance() loop.
This could be fixed by changing the timeout to something reasonable and then breaking out of the loop, but I'm not sure if there's a better solution to the problem as I don't know much about managed > unmanaged interop.
The text was updated successfully, but these errors were encountered: