-
Notifications
You must be signed in to change notification settings - Fork 759
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The current RNG implementation generates a negative number from a specific seed #2206
Comments
|
|
Had a look at the original executable this arvo and the reversed code from the initial commit of this repository almost matches the base game implementation, the only difference is the way certain types are interpreted. It looks like Diablo was originally compiled using Borland C++ but at some point they switched to their own rand() implementation. Borland C++ provides two random number functions based off an LCG engine. lrand() masks off the most significant bit, returning a result in the interval [0, 2^31). Because this set of LCG factors produces a fairly cyclic bit pattern in the low bits, rand() discards the low half of the result and masks off the sign bit (by performing The RNG used by Diablo attempts to provide both functions but seems to be incorrectly implemented. As both rand() and lrand() mask off the MSB, the Diablo implementation attempts to do the same thing by interpreting the state as a signed integer and negating it. This fails in the case noted in this bug report. The logic to determine when to discard the low bits is also ill-formed. An upper bound of 2^16 - 2 is used (which is already an off by one error) but both Borland rand() and the Diablo version return a max value of 2^15 - 1, just over half of this limit. Using mod instead of bit masking also allows the sign bit propagation to continue which introduces a tiny bit of bias. Each value in the interval [1, 2^15 - 1] has a 2^17 in 2^32 chance of being returned, 0 is slightly less likely at 2^17 - 1 in 2^32, and -(2^15) has a 1 in 2^32 chance. If it's provable that a limit lower than 2^15 is always used then we could fix this without breaking compatibility. Regarding the LCG implementation, I suspect whatever tool was used to generate the code in the initial commit incorrectly interpreted one of the operations in the core _rand() function. The engine state is an unsigned value which uses the three argument form of the x86 The code in its current form is inconsistent about whether seeds are signed or unsigned, with some callers using int, some using uint, and some using int16 values. Signed to unsigned conversion is well defined, but unsigned to signed is implementation defined, so when the seed is passed to these callers or set on the structs it's not guaranteed the values will be interpreted correctly. I hope to open a PR sometime in the next few days updating this to avoid implementation defined behaviour. |
Very impressive investigative work @ephphatha . I'm always amazed at these ultra low-level details that I'd absolutely never dive into in a higher level language 🤣 It is kind of a shame to me that we have to dedicate so much effort for such a "common" thing like "generating a random number". I don't expect DevilutionX to break compatibility here at any point, but I'm certainly looking forward to do doing so in a mod and getting rid of this insanity altogether in favor of a more standard approach using known libraries. I also assume it would be possible to "fix" this by forcing the function to re-execute "until a non-negative value is returned". Would be slightly less efficient per call (due to the added |
Found a nice example of how this bug can give unexpected behaviour. The following code is slightly modified from quests.cpp:InitQuests() to show the steps involved. Using a modified save where glSeedTbl[15] is one of the values leading up to the troublesome state you can reliably trigger out of bounds reads and writes. On my machine (running a debug x64 build with msvc) this does not crash the game but it does trample some region of memory and results in all quests in QuestGroup1 remaining active. if (!gbIsMultiplayer && sgOptions.Gameplay.bRandomizeQuests) {
vanilla::SetRndSeed(glSeedTbl[15]); // glSeedTbl of 988045466 results in int_min after two rounds
if (vanilla::GenerateRnd(2) != 0)
quests[Q_PWATER]._qactive = QUEST_NOTAVAIL;
else
quests[Q_SKELKING]._qactive = QUEST_NOTAVAIL;
int32_t questGroupIndex = vanilla::GenerateRnd(sizeof(QuestGroup1) / sizeof(*QuestGroup1)); // questGroupIndex is -2, expected [0, 2]
int questId = QuestGroup1[questGroupIndex]; // read OOB, questId is -65536 in this test
quests[questId]._qactive = QUEST_NOTAVAIL; // write OOB. All quests in group 1 are still active, expected Q_GARBUD or Q_BUTCHER to be deactivated with this seed.
quests[QuestGroup2[vanilla::GenerateRnd(sizeof(QuestGroup2) / sizeof(int))]]._qactive = QUEST_NOTAVAIL;
quests[QuestGroup3[vanilla::GenerateRnd(sizeof(QuestGroup3) / sizeof(int))]]._qactive = QUEST_NOTAVAIL;
quests[QuestGroup4[vanilla::GenerateRnd(sizeof(QuestGroup4) / sizeof(int))]]._qactive = QUEST_NOTAVAIL;
} A safe fix (that modifies behaviour) would be to transform int_min to 0 in the RandomNumberDistribution that performs int32_t questGroupIndex = vanilla::GenerateRnd(sizeof(QuestGroup2) / sizeof(int));
if (questGroupIndex >= 0)
quests[QuestGroup2[questGroupIndex]]._qactive = QUEST_NOTAVAIL;
// else leave all quests active |
One thing I should've realised earlier is INT_MIN is a power of 2, so calling This makes code like the barrel spawning function safe in all cases because it uses devilutionX/Source/objects.cpp Line 564 in 3f69faa
The first example in this bug report is safe as well 😅, the second could fail but not the specific situation I described. |
I suggest fixing this in a non-atomic fashion, since I suspect that it would take a lot of time and effort. First, we could do a project-wide rename GenerateRnd to GenerateRndUnsafe, and implement GenerateRnd to essentially return abs(GenerateRndUnsafe()). This would effectively make it deprecated and place a "dunce cap" on everything that uses it, telling us where attention is needed. Next, we could look at every instance closely and see what can be done about it:
If it produces weird side-effects that we can't get rid of without breaking saved games, we'd have to look at it on a case-by case basis. We could try to cleanly mimic the weird side-effects s.t. the behavior follows clearly from the code and/or try to blacklist the problematic seeds. I have no idea to what extent this is possible. |
I like the idea of this |
Agreed, that's part of the reason I proposed instanced generators in #2261 as it's relatively easy to track calls to GenerateRnd() between calls to SetRndSeed() and then isolate those sections of code.
In that PR I've marked unsafe functions as deprecated with references to safe alternatives (using doxygen comments and namespaces). While I haven't provided a safe version using the original engine I'd avoid using a further call to abs() to handle the negative case. It would be better to clamp to 0 for the negative result as it reduces bias slightly. @qndel identified a lot of areas that don't rely on repeatable sequences in #2084, these can be freely replaced with a generator using standard library concepts from
As you say handling this appropriately depends on the use. Using the quest pool example from above, I've implemented a Cases like when Blacklisting seeds is something I would prefer to avoid at all costs, given the way level generation works the only difference I expect from devilutionX compared to vanilla is a level that always crashes vanilla to be playable in devilutionX. |
Just don't forget that the active quests affect the level layout :) |
Important information
Bug present in all DevilutionX versions since the initial commit on any compiler with similar implementation defined behaviour as msvc
Describe
Calling
GenerateRnd(int)
with a seed of 1457187811 (either supplied as the seed from a call toSetRndSeed(int)
immediately prior or reached naturally from calls toAdvanceRndSeed()
) results in the next call to eitherAdvanceRndSeed()
orGenerateRndSeed(x)
(where x is any value in the range [2, INT_MAX]) generating a negative number.In instances like this where the number is used in a switch statement none of the cases will be taken, but the program will continue as normal.
devilutionX/Source/drlg_l1.cpp
Line 1602 in 1421808
There are instances where a random number is used as an array index where the possible values are still within bounds. For example
PlaySfx
callsRndSfx
to get a random sound from a certain category. The lowest value that could be returned is 8 (corresponding to the effect PS_LGHIT1) when called with PS_SWING (expected values are either 9 or 10, PS_SWING or PS_SWING2)devilutionX/Source/effects.cpp
Line 1256 in 1421808
Other instances will lead to reading a negative array index. The code which looks for a free item space is one example, this code can generate a point with an x or y position of -32 (16 + -48) which is immediately used as an index into
dMonster
devilutionX/Source/items.cpp
Line 457 in 1421808
devilutionX/Source/items.cpp
Line 437 in 1421808
To Reproduce
Edit a save or use breakpoints to seed the RNG before one of these call sites. If there's a save with a known seed leading to a crash when generating a dungeon that would be great, I haven't tried to craft one yet.
Expected behavior
I would expect non-critical calls to the RNG to generate within the expected range, so situations like
PlaySfx
should always play one of the expected sounds.Critical calls to the RNG should mimic the same behaviour if it didn't lead to a crash. Using the first example to keep compatibility with old saves/original clients bypassing the defined cases in the switch statement is desirable.
I am unsure what the best approach is for critical calls that lead to a crash/reading out of bounds. If a seed leading to this behaviour always crashes the game when trying to enter the next floor of a dungeon, fixing it would be desirable so that the seed is actually playable. If it's possible to recover and a second attempt generates a different dungeon then maybe not?
Additional context
I have not done an exhaustive search, the lines I linked were the first examples I found illustrating the different ways I could see this case causing odd behaviour. There may be additional ways for the game to crash or give unexpected results.
I also have not tested this on the original game yet, if I get the time to craft a save that reliably crashes I'll update this issue.
The text was updated successfully, but these errors were encountered: