New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make game deterministic, so loaded saves always behave the same #1248
Comments
That is a sequence of good ideas; and a number of the prerequisites are also needed to get networking going. Would you be interested in refactoring input into a command pattern so that it could be recorded and replayed as well as outlining the worst of the determinism issues. At present save games do not lead to deterministic replays even with no further input - I haven't spent much time looking into why. |
@TheCycoONE I'm getting over a head cold and up against some deadlines for work. Maybe we can Skype/Hangout/Zoom to go over the tasks for this and find a good fit. I'm fine trying to serialize the input and/or help make it deterministic. I have OSX/Windows/Linux environments available but have not bootstrapped a CorsixTH development environment yet (docs seem good, just didn't have the time). My preferred method would be if we could get machine images I can boot on a spare hyper visor I have (gobs of ram and I think 32C/64T) so that I can run the tests in parallel and reset the environments easily. In an ideal world anyone could grab those and run them in VirtualBox or VMWare Player at home. If we setup a Pritunl VPN we could have the VMs on the same "LAN" which will come in handy for working on the network code. I have used GDB on Solaris in 2012 but I'm not particularly married to a C/Lua debugger. If it's easier to debug on Windows with Visual Studio I'm down with that. I'd like to use the environment "most developers" are using. If it's worth the time, I can help setup the VPN and provide remote access to pristine VMs and we can snapshot them (on actual hardware). If I can get it to PXE boot I can give some core members VPN access but the firewall will be clamped down for outgoing connections. As long as it's under $20/mo in power I don't really mind the noise of running a server under my bed. Random number generator seed? Where should I start reading? Append-only log Data/Query ergonomics? Unless we have a ton of custom lua code we'll need to interact with the data, I'd go the PostgreSQL route (especially with PL/Lua, that'll let you move some CorsixTH code into queries and triggers to normalize the stream). Networking FWIW, NATS would actually be a good foundation for the networking code (simple text based protocol, written in Go, tiny, single binary). It could work on a LAN or over the internet. You could use a STUN server to poke through NAT). I'd argue it makes more sense to bundle a server binary and provide public servers and allow people to run a server on their LAN if they choose. Troubleshooting NAT issues can be a painIf there's something purpose built that might be better. If you wanted to scale a production server, I'd probably just use NGINX (it can do UDP and TCP streams now) and the Lua Streams module. You can easily use C-structs or Lua Tables for state and it'll be really easy to deploy (not as easy as a single go binary, but, it doesn't introduce a new language and easily lets you leverage lua). If not there's a bunch of nice Lua bindings over an event loop suitable for writing a network server. Sorry, again, head cold, so there's a lot to process here, hopefully it makes sense. |
I have always assumed that was down to amount of randomness there is in the code that would not always be repeated when loading a save and could therefore send staff or patients into a different direction. I could be wrong, this is just what I have assumed when trying to replicate some of the issues over the years |
If we provide the same seed to a non cryptographic random number generator: If we call it in the same sequence we must get the same numbers. Assuming it's not timing dependent and we're managing input (repeating a sequence detached from actual user input) shouldn't we be able to achieve determism? From an outsider's perspective being able to achieve (determinism) could possibly resolve a number of bugs in itself by eliminating side effects. |
The idea was that the game would be deterministic. The random number generator is MT, and the current seed is suppose to be saved in the save game and reloaded (I haven't personally tested that). That said, the game currently makes heavy use of floating point; some actions are based on asynchronous events (music track / sound effect ending), and some things are triggered by frames instead of in game ticks. I believe both of those things currently impact the same random number generator, if not other aspects of the game state. There is also the possibility that lua's pair ordering might impact the game, since the order of objects in a lua table is not guaranteed. It's certainly something that needs to be addressed with more rigor than has been done until this point. |
@TheCycoONE I just pulled out an example I could find of where the lua pair ordering affects random state. CorsixTH/CorsixTH/Lua/research_department.lua Lines 377 to 387 in e14d4ff
Between a save and reload the research_policy can result in reordering when processing over a pairs loop and math.n_random calls math.random twice. You'd want something like this to help make it deterministic. CorsixTH/CorsixTH/Lua/research_department.lua Lines 163 to 165 in e14d4ff
There is probably some effort in ensuring the deterministic nature |
Found probably one of the main culprits of ruining the deterministic nature also. CorsixTH/CorsixTH/Lua/game_ui.lua Lines 198 to 201 in e14d4ff
Every GameUI:draw call, two calls to math.random - even whilst paused and no active quake. |
That one is my fault, I think we need two random number generators - one tied to the game, where we save the seed, and one for visuals or other non-behavior cases. Also those calls should be gated :) |
I put some logic in to check shake_screen_intensity, with a hospital it did have a research dept actually researching so that surpised me a little - but only specialisation/improvements were running - so a bit of luck there. Ran for 3 months logging every math.random call. The level was completed so the offers were coming every 3 months, when the current 3 month period ended, and game was paused with offer letter I took a save. Then reloaded that twice in 2 different sessions and just resumed it until the next 3 months. all 40k plus calls matched - ie returned value, source file called from, line of the calling function and function name if possible. Obviously I didn't interact with the game (music was disabled), but announcements/sounds were set to play and obviously no quake. So yes isolating those math randoms for ui/visuals/music randomisation etc, should assist in tracking issues between 2 autosaves as long as the game has run with no user input for the period between 2 autosaves and an issue arose. |
I made a simple change to test the separation out int luaopen_random(lua_State *L)
{
lua_getglobal(L, "math");
lua_pushliteral(L, "lua_random");
lua_pushliteral(L, "random");
lua_gettable(L, -3);
lua_settable(L, -3);
lua_pushliteral(L, "random");
lua_pushcfunction(L, l_random);
lua_settable(L, -3);
lua_pushliteral(L, "lua_randomseed");
lua_pushliteral(L, "randomseed");
lua_gettable(L, -3);
lua_settable(L, -3);
lua_pushliteral(L, "randomseed");
lua_pushcfunction(L, l_randomseed);
lua_settable(L, -3);
lua_pushliteral(L, "randomdump");
lua_pushcfunction(L, l_randomdump);
lua_settable(L, -3);
return 0;
} Then just changed app.lua to seed it with math.lua_randomseed and changed GameUI:draw to use math.lua_random instead and I got some repeatability occuring. I also picked up on some more diversion. There is this in World:onTick CorsixTH/CorsixTH/Lua/world.lua Lines 997 to 1003 in 9de46b7
Which calculates the after 'ticks' date, and if the world we be in a future day (month and year processing also) does related processing. Hospital:onEndDay at least has some calls to random, one not really deterministic, (ui message) the other might be a little more (boiler breakdown) and subsequent function calls refer to random. This means the speed of the game changes the outcome of the random processing. My earlier comment, the game was let run with no inteference at "And then some more", but a period of processing at "And then some more" and "Speed up" at the end of a day will result in changes to the order of the random calculations. Likewise the other one is onEndMonth calculating the future spawn dates. And individual spawn hours are affected as the date is advanced with each onEndDay call. CorsixTH/CorsixTH/Lua/world.lua Lines 1022 to 1024 in 9de46b7
At the faster speeds, the 'hours' on occasion can skip a spawn as the date will be advanced 3 or 4 hours, if onEndDay is say run at the 49th hour (actual end of day), at speed "And then some more", the current hour leading into the above loop will be 2, and if onEndDay calcuates one at 1st (currently there is a bug around the spawn hour too, 0 to 49 are valid spawn hours, but Corsix currently uses a 1 to 50 range) the spawn will not occur. So running at the lower speeds is really the only way to ensure repeatability with the current build. I'm guessing it was done to simplify some things, but means all onEndDay/Month/Year processsing occurs before all the ticks for those entities that day have completed. |
And there is some other code dependent on the hours_per_tick in World:tickEarthquake along with random calls, and since a high game speed calls random in a different order, it alters the outcome. Outcomes like machine damage, might be added before the ticks of other entities would process when run at the lower speeds. |
Hello!
Long time fan looking to see if I can help squash some of the remaining bugs.
Atlassian support uses an internal tool called Hercules to scan log files attached to issues to provide end users with solutions for solved issues and/or help track duplicate issues.
I thought maybe we could get some code quality metrics and track the most problematic code structures by checking the commit history and bug traces by creating a harness that can run save files at a given revision in a given environment (this doesn't seem like it'd be difficult given the availability of binary releases). If we can serialize a "logical input stream" (stream of game input, state mutations normalized between platforms) and use it with a
git bisect
type tool for isolating the bad state changes and/or logical input that triggers a bug. If you can step through a replay frame by frame and mark when the bug occurs you should be able to do a diff of the in game state or step through with a debugger to find the error.Goals:
Inspiration:
There's a conference talk (can't find link) on YouTube outlining how they cleaned up Witcher 3's undrawable geometry using some cobbled together internal tooling to ensure correctness. It reminds us that sometimes it's worth it to spend time on tooling when a task is large enough, especially if that time/effort is expertly applied.
Can I help?
I've used Lua/LuaJIT professionally and tinkered on the ESP8266 but I'm no expert, I simply memorized the differences between Lua and JavaScript and learned about FFI and LuaJIT's performance characteristics versus V8. I haven't written over 1,000 lines of Lua in a single project, however, I'm pretty sure I can help setup/spec an end to end solution for all of the goals above MINUS interfacing with the game/tooling, which will be best done (or specced) by someone familiar with the project.
Is this a "bad idea", unnecessary or otherwise impossible?
Please feel free to "nope" this without explanation. I have no idea what tooling you guys have used to get this far, but, kudos to you for all of your hard work!
Please no dying in the corridors.
The text was updated successfully, but these errors were encountered: