Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Lua] Memory Corruption on 1.13.0 #867

Closed
Isotarge opened this issue Jun 4, 2017 · 19 comments
Closed

[Lua] Memory Corruption on 1.13.0 #867

Isotarge opened this issue Jun 4, 2017 · 19 comments
Labels
Heisenbug Bug that can be reproduced, but not consistently Meta Relating to code organisation or to things that aren't code re: Lua API/scripting Relating to EmuHawk's Lua API (not the Lua Console)

Comments

@Isotarge
Copy link
Contributor

Isotarge commented Jun 4, 2017

Just downloaded BizHawk 1.13.0 and tried running ScriptHawk with it. It runs fine for a couple of seconds and then throws this exception in the Lua console:

LuaInterface.LuaScriptException: [string "main"]:1370: attempt to index a nil value

The line referenced is an emu.yield() call inside a while true loop at the bottom of ScriptHawk.lua used to refresh the ScriptHawk OSD when the emulator is paused.

Running the TAS safe version of ScriptHawk completely locks up the emulator and requires a force-quit.

This happens with N64, SMS, and GBC modules. Both versions of ScriptHawk worked fine in 1.12.2.

I've looked over the changelog for 1.13.0 and didn't see any Lua API changes that would break ScriptHawk so I'm fairly sure it's something broken on BizHawk's end. Happy to change something on ScriptHawk's end though if I've done something I shouldn't have.

Cheers, Isotarge.

@zeromus
Copy link
Contributor

zeromus commented Jun 4, 2017

This script can repro the issue:
while true do input.getmouse(); emu.yield(); end
The cause is memory corruption at some point.
You can change getmouse to this to make it crash immediately due to stressing someone's GC harder

public int GetMouse() {
  var buttons = Lua.NewTable();
  for(int i=0;i<10000;i++) buttons[i.ToString()] = 0;
  return 0;
}

Our lua integration is garbage.

8bc067c seems to cause this.

Fuck this, lua is broken. Don't use lua. Give up. I'll figure this out some day after I cool down. Revert that commit if you'd rather have lua whose brokenness you havent located yet, and crashes due to memory leaks, instead of luas that crash after 2 seconds.

@Isotarge Isotarge changed the title [Lua] emu.yield() exception with ScriptHawk on 1.13.0 [Lua] Memory Corruption on 1.13.0 Jun 15, 2017
@PikalaxALT
Copy link

PikalaxALT commented Jun 18, 2017

Not sure if related, but emu.frameadvance() and emu.yield() return a .NET exception on the CGB core when called inside an event callback:

error running function attached by the event OnMemoryExecute
Error message: A .NET exception occured in user-code

@zeromus
Copy link
Contributor

zeromus commented Jul 10, 2017

I've begun testing NLua+KopiLua in place of LuaInterface+Lua.
Please evaluate.

@zeromus zeromus added re: Lua API/scripting Relating to EmuHawk's Lua API (not the Lua Console) Priority: Critical Bug (GoogleCode) Grandfathered label from GoogleCode. Use the other bug labels. labels Jul 10, 2017
@vadosnaprimer
Copy link
Contributor

vadosnaprimer commented Jul 10, 2017

It doesn't seem to like dropping optional parameters (tried with gui.drawBox and gui.pixelText).

text(1, 1, emu.framecount(), "white", "black") won't work, it wants tostring(emu.framecount()) for text output.

Also right now lua is 4 times slower than with LuaInterface. My script that had heavy calculations and was sped up by caching them, runs at 40fps before 8bc067c, and now it runs at 10.

@Isotarge
Copy link
Contributor Author

The good news:

  • No longer crashes or hard-locks
  • luanet works now!

The bad news:

  • Performance takes a big hit, I tested on BizHawk 1.12.2 + DK64 + ScriptHawk (pure interpreter, GLideN64, 320x240) and get 90 FPS, which drops to 45 FPS on the current dev build compiled in release mode with the same core settings
  • The current dev build (with ScriptHawk + DK64) leaks roughly 6MB/sec of RAM, not sure what is causing that. Rewind is definitely disabled, and this doesn’t happen on 1.12.2. There’s no leak on the current dev build without ScriptHawk open
  • forms.settext is strict with datatypes now, needs to be passed a string (could previously be a number)

I've committed some small changes to ScriptHawk to get the DK64 module booting with the current dev build for testing

@zeromus
Copy link
Contributor

zeromus commented Jul 10, 2017

I wrestled with it a bit more and I think I solved all the issues youve pointed out, except for the performance, which I don't promise we can ever fix. It is not appropriate to have an unmanaged scripting environment running with the potential of stomping all over our deterministic cores, is it? That's just life. (but: please test again, maybe i committed a debug dll last time)

At the present I haven't got any clues about why the old luainterface+lua based code was crashing. I was hoping I could see nlua+kopilua crashing in a more comprehensible way and get clues, but it hasn't worked out that way.

The only tiny clue I have is maybe finalizing lua objects was causing the interpreter to get touched (from another thread concurrently with main thread)... but I had previously added code to make the finalizer put work in a queue to get done in the main thread, so.... well, if we investigate any more, lets look extra hard for that.

@vadosnaprimer
Copy link
Contributor

Optional arguments and strings are fixed, but speed is the same.

@zeromus
Copy link
Contributor

zeromus commented Jul 10, 2017

I managed to get nlua+keralua+lua running, but it mysteriously crashes after some time, same as the old luainterface+lua.

While watching this happen i remember I decided maybe there were bugs in luainterface (and therefore in nlua) which made it think it had a reference to something it didnt, such that lua's GC is cleaning it up without the bridge part knowing. Because now I remember getting angry at luainterface and thinking "geeze I should just rewrite this myself entirely". So why doesn't it happen with nlua+kopilua?? I dont know. Maybe because the dead objects still exist in the .net heap, whereas in unmanaged lua, dead objects are pointing at the live stack, where other things may be parked. So it's possible (likely, even) nlua+kopilua is just avoiding some kind of bug (in nlua).

Could we add diagnostics to kopilua to detect when a supposedly lua-erased object is touched from nlua (because it thinks it's still alive?)

But for now, my best effort is a slow lua that doesnt crash.

You guys are on your own if you want faster luas that crash or leak memory -- you need only revert my commits over the past year or so

@vadosnaprimer
Copy link
Contributor

vadosnaprimer commented Jul 11, 2017

I mean what was the actual problem 8bc067c was trying to fix (as in, how do I reproduce it)? I don't even remember having bugs with reloading scripts I created (and I did tons of reloading).

@zeromus
Copy link
Contributor

zeromus commented Jul 11, 2017

You didnt do a few dozen, then. This particular case was 100% rock solid reproducible at about 45 restarts, due to lua's default stack getting blown. For ANY script.

@Isotarge
Copy link
Contributor Author

Strings are back to normal so I reverted the ScriptHawk commit from yesterday. No sign of a memory leak anymore too.

The performance issue is the dealbreaker for me here, several of my users are on weaker machines and even with the much faster LuaInterface + Lua builds ScriptHawk was only just usable for them. With the current state of performance I don’t think I can get my work done on this new backend.

I appreciate the hard work you've put in to implementing the new backend and it certainly was an interesting and worthwhile experiment, but if nothing can be done to improve the performance I’d prefer reverting back to before 8bc067c until the “few dozen reloads” bug can be fixed without crashes, hardlocks, or crippled performance.

If you don’t want to revert I can understand, and in that case I will distribute a custom build with the old backend until performance improves, and continue recommending 1.12.2 if my users don’t want to touch a custom build.

@vadosnaprimer
Copy link
Contributor

vadosnaprimer commented Jul 11, 2017

Pretty much what Isotarge said. Only basic scripts run at full speed, anything more complex can't.

You didnt do a few dozen, then. This particular case was 100% rock solid reproducible at about 45 restarts, due to lua's default stack getting blown. For ANY script.

I took the script I linked above and this build, and reloaded it 500 times in one go. With rewind disabled, in 10k frames it took some 25 MB of RAM, about as much as it takes without reloading. Several other approaches with about 500 reloads total, and no crash.

@zeromus
Copy link
Contributor

zeromus commented Jul 11, 2017

https://pastebin.com/tMUShQV9
@projectrevotpp - 20 or 30 times,
me - about 20 times
me later - about 45 times exactly every time for a very predictable reason of default lua stack size
(immediately after this)
commit 8bc067c and crashes stop

The outcome of this is not going to be "well nobody can reproduce the crash, so let's revert that commit".

If you people can't stand the performance, the outcome is most likely going to be replacing lua with javascript or something managed

But there's still a chance we can debug the problem in the most latest luainterface+lua version with kopilua's help

Another possibility in theory is rewriting luainterface/nlua. But I'm not going to do that. if I go to that much work, it'll be to end up with a secure, managed scripting language.

@zeromus
Copy link
Contributor

zeromus commented Jul 11, 2017

update: check latest build, the dlls were unoptimized. speed may be repaired.

@vadosnaprimer
Copy link
Contributor

After a test I get 25% of luainterface speed is now 50%.

@Isotarge
Copy link
Contributor Author

There's some speed boost in latest master, 45 -> 65 on my machine.

I'm keen on the idea discussed in IRC: Giving the user the choice between backends, the default should be slow + reliable but having the option available to swap back to before 8bc067c with an emulator reboot solves every one of my concerns.

@vadosnaprimer
Copy link
Contributor

Recent commits fix these bugs:
If you need speed and same experience as with 1.12.x, pick LuaInterface in Customize dialog. If you want no memleaks or memcorruption, at the cost of lower speed, pick NLua there.

@Isotarge
Copy link
Contributor Author

Thanks for all of your hard work on this issue

@vadosnaprimer
Copy link
Contributor

And with the current master I can't even get luainterface to crash to memleaks with the method zeromus provided. This has probably been fully fixed finally!

@YoshiRulz YoshiRulz added Meta Relating to code organisation or to things that aren't code Heisenbug Bug that can be reproduced, but not consistently and removed Bug (GoogleCode) Grandfathered label from GoogleCode. Use the other bug labels. Priority: Critical labels Oct 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Heisenbug Bug that can be reproduced, but not consistently Meta Relating to code organisation or to things that aren't code re: Lua API/scripting Relating to EmuHawk's Lua API (not the Lua Console)
Projects
None yet
Development

No branches or pull requests

5 participants