Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

computer freezes after jumping with Warp Drive mod #2598

Closed
KyeDuo opened this issue Oct 24, 2017 · 11 comments
Closed

computer freezes after jumping with Warp Drive mod #2598

KyeDuo opened this issue Oct 24, 2017 · 11 comments
Assignees

Comments

@KyeDuo
Copy link

KyeDuo commented Oct 24, 2017

I use Warp Drive 1.3.34 for MC 1.7.10 with OC 1.7.0.1085 and Forge .1614 I get the following behaviors all in SP:

Using a T3 computer with T3 parts and screen the computer will seemingly freeze and become unresponsive after performing a warp jump of any distance. At the very least the screen freezes. Breaking the screen and replacing it will cause the program to reload restoring responsiveness.

Using a server, T3 parts, remote terminal and jumping, the number key tied commands will not update the screen but the program will still respond to the inputs. letter triggered commands still work fine and will update the screen.

First loading into the world also has this behavior.

@LemADEC
Copy link

LemADEC commented Oct 25, 2017

Using:
1.7.10-Forge10.13.4.1614-1.7.10
OpenComputers-MC1.7.10-1.7.0.1085-universal
WarpDrive-1.7.10-1.3.34.-

In single player:
Create new Creative world
Place an Enantiomorphic Reactor core
Target a block next to it and type /oc_spawnComputer
Install OpenOS, then reboot
Install Reactor core, then reboot
Press 1, expecting page changing => ok
Press 0, expecting page changing => ok
Press f, expecting red message in bottom about unsupported key => ok
After a couple seconds, red message disappears => ok

Press Esc > Save and quit to title
Reload same world

Press 1, expecting page changing => NOK, page does not change (1)
Note: periodic update is still able to update screen as we see green percent appearing
Press 0, expecting page changing => NOK, page does not change (1)
Press f, expecting red message in bottom about unsupported key => ok (2)
After a couple seconds, red message disappears => ok (2)

Break/replace the screen
=> system (partially ?) reboots and recovers

The event (2) and pages (1) handlers are both implemented through an array of functions.
From above results, we can see event registry is restored properly (keys are detected, script is aware that page changes), but page registry is 'corrupted' (display is no longer refreshing).
The main difference is the data structure for those registries. Here's an extract on how they're filled:

local function page_register(index, callbackDisplay, callbackKey)
  page_handlers[index] = { display = callbackDisplay, key = callbackKey }
end

local function event_register(eventName, callback)
  event_handlers[eventName] = callback
end

Event registry is a straight array of functions.
Page registry is a nested array of functions and keys.
When issue is present, the function is still defined (not nil), as seen in the library line 1028.
When issue is present, I don't know if the function is empty or missing some resources.
I've tried to reproduce it without going through a LUA library but didn't succeed.

According to players, using the legacy OpenOS (from OC1.5) with OC1.6 doesn't reproduce the issue.

@LemADEC
Copy link

LemADEC commented Oct 25, 2017

In other words, you don't need to jump, just reloading the chunk or restarting the game is enough.

@LemADEC
Copy link

LemADEC commented Oct 25, 2017

Issue is also reproduced:

  • in dev space with OpenComputers-MC1.7.10-1.6.2-dev
  • in dev space with OpenComputers-MC1.7.10-1.7.0-dev
  • in single player after removing WarpDrive (you just need the scripts on the computer)

@payonel
Copy link
Member

payonel commented Oct 25, 2017

Thanks for all the details!

@LemADEC
Copy link

LemADEC commented Dec 23, 2017

Any hope for a fix on this?

@payonel
Copy link
Member

payonel commented Feb 21, 2018

this does repro for me, you provided a very simple repro, thank you
note that openos is running fine, and the components are still registered and behaving well
you can ^c or alt+^c and kill the warpdrive startup program

I've confirmed all the components retain the same address, and no components are removed nor added to the system when the world reloads -- so proxies should be fine

the part of the screen that correctly updates the percentages ... works, that's interesting
but where is the code that should be refreshing the rest of the screen? I next need to figure out what that code is using, and what those calls aren't working

I set up an all-events listener that sends modem messages to another machine, to monitor behavior when things go bad - i suggest using a similar methods to debug remotely :)

@payonel
Copy link
Member

payonel commented Feb 21, 2018

for example, this rc program broadcasts all signals on port 1

local event = require("event")
local comp = require("component")
local s = require("serialization").serialize
local m = comp.modem

local id

function start()
  id = event.register(nil, function(...)
    local p = table.pack(...)
    if p.n > 0 then
      pcall(m.broadcast, 1, s(p))
    end
  end, math.huge, math.huge)
end

function stop()
  if id then
    event.cancel(id)
    id = nil
  end
end

function status()
  print("broadcast service is " .. (id and "running" or "stopped"))
end

Then i could use event.push to "log" data about different parts of the confused application to figure out what is working or not

@payonel
Copy link
Member

payonel commented Feb 21, 2018

so while debugging ... the reactor core exploding and broke through the bedrock (i'm in a flatworld) ... and i fell to the nether? ... and lost the machines i was testing...

anyways, that's not helping :)

@payonel
Copy link
Member

payonel commented Feb 21, 2018

TLDR: make selectPage a local function:
here: https://github.com/LemADEC/WarpDrive/blob/MC1.7/src/main/resources/assets/warpdrive/lua.OpenComputers/common/usr/lib/warpdriveCommons.lua#L1039
making: local function selectPage, this fixes it

Side note: there is nothing used with the local_env you are returning. Looks like you saw a thing I was doing in the 1.6.x era trying to reduce memory cost, but it turned out to be a bad idea and in 1.7.x things are much better and there is nothing used with the 2nd return of a library. Please feel free to remove your entire local_env table

NOW FOR THE BUG
ok, i've figured this out, and you're not going to believe it, but I believe you've found a bug in our lua state saving and this is very easy to repro

local refresh is not the same upvalue post restore in global functions as local functions

no warpdrive mod needed, just write a little script like this:

local value = 0
function g_fp()
  value = value + 1
end

while true do
  require("event").pull()
  local before_call = value
  g_fp()
  if before_call < value then
    print("value increased", value)
  end
end

That's it
run it, hit keys, see the value increase
exit world
reload world
hit keys, nothing

Now, openos and all is still running just fun. Do a hard interrupt (alt+control+c) to abort the program, and you're right back to shell, everything is fine

The issue appears to be that local values captured (upvalue) in global functions (because g_fp is a global function) do not restore the same reference, the local scope gets a new object, and the global function has its own. It's like its own global upvalue

What is ALSO interesting to me, is that you can write code to detect the desync of the upvalues:

local value = 0
local sync = true

function g_fp()
  value = value + 1
  if not sync then
    print("out of sync")
  end
end

function g2(given_value)
  if given_value ~= value then
    sync = false
  end
end

while true do
  require("event").pull()
  local before_call = value
  g_fp()
  if before_call < value then
    print("value increased", value)
  else
    g2(value)
  end
end

This shows us that the 2 global functions are sharing the SAME upvalue after the restore of the lua state, while the local functions are using a different one.

So. .... are we going to fix this? Probably not at this time. I'll discuss this finding with others, but wow -- what a weird one. I'll leave this ticket open for now

By the way, this isn't an issue with all local values captured in global functions
In my same examples, if I also make a local table and capture it in the functions (globals and locals), suddenly the bug is not hit -- it seems more is packed and correctly restored when there is a table to share

Testing the warpdriveCommons.lua code, when I added a local table and tested its address in the global function and outside -- it was shared, while other local variables were not (specifically, local refresh). So "just add a table" doesn't always fix things, but there are more layers of locals going on in that code path and I didn't take the time to build a full mental model of the methods and tables and cached library issues that could have something to do with this. Nonetheless, it appears local tables are immune for this duplication restored issue.

@payonel
Copy link
Member

payonel commented Feb 21, 2018

This won't be fixed in OC (we don't handle lua state restore in our OC code)
It would need to be fixed in eris: fnuecke/eris#29

@payonel payonel closed this as completed Feb 21, 2018
@LemADEC
Copy link

LemADEC commented Feb 22, 2018

Nice catch!

Sounds like changing the selectPage() method to local will change the refresh variable to only local usage, not global, hence working around the issue at hand.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants