Skip to content

Commit

Permalink
* fix #1919 (LuaUnitScript desync, sleeping threads were being woken …
Browse files Browse the repository at this point in the history
…up in non-deterministic order)
  • Loading branch information
kloot committed Jul 22, 2010
1 parent 7c365ed commit 939ead8
Showing 1 changed file with 11 additions and 4 deletions.
15 changes: 11 additions & 4 deletions cont/base/springcontent/LuaGadgets/Gadgets/unit_script.lua
Expand Up @@ -189,6 +189,8 @@ Format: {
--]]
local sleepers = {}
local numStartedThreads = 0
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Expand Down Expand Up @@ -244,11 +246,11 @@ end
-- Helper for MoveFinished and TurnFinished
local function AnimFinished(waitingForAnim, piece, axis)
local index = piece * 3 + axis
local threads = waitingForAnim[index]
if threads then
local wthreads = waitingForAnim[index]
if wthreads then
waitingForAnim[index] = {}
for i=1,#threads do
WakeUp(threads[i])
for i=1,#wthreads do
WakeUp(wthreads[i])
end
end
end
Expand Down Expand Up @@ -333,12 +335,16 @@ end
function Spring.UnitScript.StartThread(fun, ...)
local co = co_create(fun)
local thread = {
threadID = numStartedThreads,
thread = co,
-- signal_mask is inherited from current thread, if any
signal_mask = (co_running() and activeUnit.threads[co_running()].signal_mask or 0),
unitID = activeUnit.unitID,
}
activeUnit.threads[co] = thread
numStartedThreads = numStartedThreads + 1
-- COB doesn't start thread immediately: it only sets up stack and
-- pushes parameters on it for first time the thread is scheduled.
-- Here it is easier however to start thread immediately, so we don't need
Expand Down Expand Up @@ -735,6 +741,7 @@ function gadget:GameFrame(n)
sleepers[n] = nil
-- Wake up the lazy bastards.
for i=1,#zzz do
local threadID = zzz[i].threadID
local unitID = zzz[i].unitID
activeUnit = units[unitID]
sp_CallAsUnit(unitID, WakeUp, zzz[i])
Expand Down

9 comments on commit 939ead8

@tvo
Copy link
Member

@tvo tvo commented on 939ead8 Jul 22, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I see this commit has no effect?

The third hunk adds one key value pair to the table of a thread, and this value is retrieved in the last hunk, but the value is not used anywhere?

@jk3064
Copy link
Contributor

@jk3064 jk3064 commented on 939ead8 Jul 22, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem lies in code like this:
local thread = threads[co_running() or error("not in a thread", 2)]

You aren't allowed to use table/userdata objects as table-keys in synced code, because they return their address pointer and the table is then sort by those.

@tvo
Copy link
Member

@tvo tvo commented on 939ead8 Jul 23, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are allowed to use them that way if you don't iterate over the table ...

@tvo
Copy link
Member

@tvo tvo commented on 939ead8 Jul 23, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kloot: did you check the script of the involved unit? I think different thread IDs could happen also if the script bases some behaviour on the (synced) RNG. E.g. there are often pieces of script code like `Sleep(100 + random number)'

@jk3064
Copy link
Contributor

@jk3064 jk3064 commented on 939ead8 Jul 23, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

| You are allowed to use them that way if you don't iterate over the table ...

But you do, here:
local function Destroy()
if activeUnit then
for _,thread in pairs(activeUnit.threads) do
if thread.container then
Remove(thread.container, thread)
end
end
units[activeUnit.unitID] = nil
activeUnit = nil
end
end

that wouldn't be a problem as long as you don't iterate over thread.container, but you do because:
function Spring.UnitScript.Sleep(milliseconds)
...
local zzz = sleepers[n]
...
thread.container = zzz
...
end

...

function gadget:GameFrame(n)
    local zzz = sleepers[n]
    if zzz then
        sleepers[n] = nil
        -- Wake up the lazy bastards.
        for i=1,#zzz do
            ...
        end
    end
end

@tvo
Copy link
Member

@tvo tvo commented on 939ead8 Jul 23, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh good catch, thanks.

I wrote that Destroy code under the assumption that Remove didn't change the order of the other elements in the thread.container array, but then apparently I later violated that assumption in a seemingly harmless optimization of the Remove function..

I suppose the two easiest fixes are

  • to make all Sleep (and other) calls O(n) (with n number of threads of that unit), by converting unit.threads to an array and always doing a linear search for the correct thread whenever a table lookup based on co_running() is used now (or maintain an array for iterating & a table mapping thread to array index?)
  • to make Remove maintain the relative ordering of all elements that are not removed

@jk3064
Copy link
Contributor

@jk3064 jk3064 commented on 939ead8 Jul 24, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sleepers table is an array, so 2nd option would just mean to switch the Remove function with table.remove.
But table.remove is pretty slow (same as std::vector::erase), still because it is called rarely it may be the better option.

Or you fill a table of all threads that need to be removed in Destroy() and then sort it by their _threadID_s before running them through Remove, but I don't know if that's really faster (depends on the default size of the sleepers array).

@tvo
Copy link
Member

@tvo tvo commented on 939ead8 Jul 24, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It applies not only to Destroy but also to Signal, so I think sorting by threadID could get too slow. Each sleeper[n] array could get pretty big, because it stores all threads of all units that need to be woken up in frame n. Another option is not shifting anything in the arrays and using some kind of marker object to indicate that that slot in the array is empty.

Wrt performance, the .threads[co_running()] lookup happens in every call to WaitFor* which actually waits and every call to Sleep, StartThread, SetSignalMask and Killed. So O(n) lookup in all those might be a bit too expensive. On the other hand average number of threads per unit is probably quite low. (Maybe 2-5 for the most common units?)

@tvo
Copy link
Member

@tvo tvo commented on 939ead8 Jul 24, 2010

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

trepan made a patch to Lua that makes hash part of tables with tables/threads etc. as keys completely deterministic, that could get rid of all bugs like this. Should be in the Lua mailing list somewhere..

Please sign in to comment.