Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* fix #1919 (LuaUnitScript desync, sleeping threads were being woken …
…up in non-deterministic order)
- Loading branch information
kloot
committed
Jul 22, 2010
1 parent
7c365ed
commit 939ead8
Showing
1 changed file
with
11 additions
and
4 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
939ead8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I see this commit has no effect?
The third hunk adds one key value pair to the table of a thread, and this value is retrieved in the last hunk, but the value is not used anywhere?
939ead8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the problem lies in code like this:
local thread = threads[co_running() or error("not in a thread", 2)]
You aren't allowed to use table/userdata objects as table-keys in synced code, because they return their address pointer and the table is then sort by those.
939ead8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are allowed to use them that way if you don't iterate over the table ...
939ead8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kloot: did you check the script of the involved unit? I think different thread IDs could happen also if the script bases some behaviour on the (synced) RNG. E.g. there are often pieces of script code like `Sleep(100 + random number)'
939ead8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| You are allowed to use them that way if you don't iterate over the table ...
But you do, here:
local function Destroy()
if activeUnit then
for _,thread in pairs(activeUnit.threads) do
if thread.container then
Remove(thread.container, thread)
end
end
units[activeUnit.unitID] = nil
activeUnit = nil
end
end
that wouldn't be a problem as long as you don't iterate over thread.container, but you do because:
function Spring.UnitScript.Sleep(milliseconds)
...
local zzz = sleepers[n]
...
thread.container = zzz
...
end
939ead8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh good catch, thanks.
I wrote that Destroy code under the assumption that Remove didn't change the order of the other elements in the thread.container array, but then apparently I later violated that assumption in a seemingly harmless optimization of the Remove function..
I suppose the two easiest fixes are
939ead8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sleepers table is an array, so 2nd option would just mean to switch the Remove function with table.remove.
But table.remove is pretty slow (same as std::vector::erase), still because it is called rarely it may be the better option.
Or you fill a table of all threads that need to be removed in Destroy() and then sort it by their _threadID_s before running them through Remove, but I don't know if that's really faster (depends on the default size of the sleepers array).
939ead8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It applies not only to Destroy but also to Signal, so I think sorting by threadID could get too slow. Each sleeper[n] array could get pretty big, because it stores all threads of all units that need to be woken up in frame n. Another option is not shifting anything in the arrays and using some kind of marker object to indicate that that slot in the array is empty.
Wrt performance, the .threads[co_running()] lookup happens in every call to WaitFor* which actually waits and every call to Sleep, StartThread, SetSignalMask and Killed. So O(n) lookup in all those might be a bit too expensive. On the other hand average number of threads per unit is probably quite low. (Maybe 2-5 for the most common units?)
939ead8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
trepan made a patch to Lua that makes hash part of tables with tables/threads etc. as keys completely deterministic, that could get rid of all bugs like this. Should be in the Lua mailing list somewhere..