Crash in Alpha 4 due to Assert #409

Closed
gertjanschouten opened this issue Aug 22, 2021 · 14 comments

Comments

@gertjanschouten

Hello,

After beating the boss in Alpha 4, the game crashes within seconds. I tried walking towards the exit and saving before the crash, but that doesn't buy any extra time: the saved game crashes immediately upon loading. That leads me to believe the cause is something going on in the level itself, like those moving platforms. Perhaps something goes wrong when they reach a certain state? It's 100% reproducible. Attached are the savegame and the error message.

The version is 1.5.1rc2.

Error.zip

Keep up the good work, dhewm3 is awesome!

@DanielGibson
Member

What operating system are you using?
Can you check whether it works better with the latest dhewm3?
A build of a recent-ish state for Windows: dhewm3-1.52pre-win32-more-quicksaves.zip
(If you're not using Windows, please try building the latest code from git yourself.)

@DanielGibson
Member

DanielGibson commented Aug 22, 2021

I wonder if this is the same bug as (or related to) #280

@DanielGibson
Member

OK, I think I can reproduce the assertion with the savegame you provided.

@gertjanschouten
Author

Ah, good :-) As you can see from the error message, I use Linux. To be more precise: Ubuntu 21.04, kernel 5.11.0-31-generic. If you need me to try anything, please let me know.

@DanielGibson
Member

DanielGibson commented Feb 6, 2022

This bug is a pain in the ass, because for some reason the debugger quits when inspecting the entity object in the backtrace, so I'd have to debug the debugger first :-/

(Not sure whether gdb or Eclipse CDT's gdb integration is to blame, but as far as I can tell gdb doesn't crash; it exits more or less regularly with exit code 0.)

This already stopped my progress last August. Now I tried again with a new Eclipse version (back in August I also tried several gdb versions) and ran into the same problem. I hope I can get myself to look deeper into these issues soon (first the one in Eclipse/gdb, then the one in dhewm3), as this is a bug I'd like to fix before releasing dhewm3 1.5.2.

@DanielGibson
Member

OTOH, some information can be gathered from what dhewm3 prints and the backtrace:

WARNING: script/map_alphalabs4.script(722): Thread 'map_alphalabs4::RideOfDeathPath': teleported 'ride_of_death2_parent' which has the pushing mover 'ride_of_death2' bound to it

idCollisionModelManagerLocal::Translation: huge translation
(... several more of these ...)
idCollisionModelManagerLocal::Translation: huge translation

ASSERTION FAILED!
dhewm3/neo/game/physics/Clip.cpp(968): '0'

Thread 1 "dhewm3" received signal SIGTRAP, Trace/breakpoint trap.
raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
50	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007fffd35c900e in AssertFailed (file=0x7fffd361d210 "dhewm3/neo/game/physics/Clip.cpp", line=968, 
    expression=0x7fffd361d45c "0") at dhewm3/neo/idlib/Lib.cpp:521
#2  0x00007fffd34d06da in TestHugeTranslation (results=..., mdl=0x5555667a9f90, start=..., end=..., trmAxis=...)
    at dhewm3/neo/game/physics/Clip.cpp:968
#3  0x00007fffd34cd097 in idClip::Translation (this=0x7fffd3a177f0 <gameLocal+3614128>, results=..., start=..., end=..., mdl=0x5555667a9f90, 
    trmAxis=..., contentMask=1057, passEntity=0x55555ebaccac) at dhewm3/neo/game/physics/Clip.cpp:1062
#4  0x00007fffd34f8d90 in idPhysics_AF::ClipTranslation (this=0x55555ebad4c4, results=..., translation=..., model=0x0)
    at dhewm3/neo/game/physics/Physics_AF.cpp:7696
#5  0x00007fffd352654c in idPush::ClipEntityTranslation (this=0x7fffd3a17988 <gameLocal+3614536>, trace=..., ent=0x55555ebaccac, clipModel=0x0, 
    skip=0x5555663d8850, translation=...) at dhewm3/neo/game/physics/Push.cpp:695
#6  0x00007fffd3527170 in idPush::TryTranslatePushEntity (this=0x7fffd3a17988 <gameLocal+3614536>, results=..., check=0x55555ebaccac, 
    clipModel=0x5555663d8850, flags=0, newOrigin=..., move=...) at dhewm3/neo/game/physics/Push.cpp:924
#7  0x00007fffd3527aff in idPush::ClipTranslationalPush (this=0x7fffd3a17988 <gameLocal+3614536>, results=..., pusher=0x55555eacaa8c, flags=0, 
    newOrigin=..., translation=...) at dhewm3/neo/game/physics/Push.cpp:1138
#8  0x00007fffd3528c5a in idPush::ClipPush (this=0x7fffd3a17988 <gameLocal+3614536>, results=..., pusher=0x55555eacaa8c, flags=0, oldOrigin=..., 
    oldAxis=..., newOrigin=..., newAxis=...) at dhewm3/neo/game/physics/Push.cpp:1410
#9  0x00007fffd3508201 in idPhysics_Parametric::Evaluate (this=0x55555eacadec, timeStepMSec=16, endTimeMSec=2205936)
    at dhewm3/neo/game/physics/Physics_Parametric.cpp:629
#10 0x00007fffd3344251 in idEntity::RunPhysics (this=0x55555eacb3cc) at dhewm3/neo/game/Entity.cpp:2579
#11 0x00007fffd333fa44 in idEntity::Think (this=0x55555eacb3cc) at dhewm3/neo/game/Entity.cpp:838
#12 0x00007fffd336f5f4 in idGameLocal::RunFrame (this=0x7fffd36a5240 <gameLocal>, clientCmds=0x7fffffffcf00)
    at dhewm3/neo/game/Game_local.cpp:2340
#13 0x00005555556c7a76 in idSessionLocal::RunGameTic (this=0x555555d79b00 <sessLocal>)
    at dhewm3/neo/framework/Session.cpp:2887
#14 0x00005555556c770e in idSessionLocal::Frame (this=0x555555d79b00 <sessLocal>) at dhewm3/neo/framework/Session.cpp:2833
#15 0x000055555566f084 in idCommonLocal::Frame (this=0x555555ce9020 <commonLocal>) at dhewm3/neo/framework/Common.cpp:2439
#16 0x00005555557ec9e2 in main (argc=5, argv=0x7fffffffdf68) at dhewm3/neo/sys/linux/main.cpp:452

Apparently 'ride_of_death2_parent' (which seems to be some kind of physics object) is teleported, and 'ride_of_death2' (which also seems to be some kind of physics object) is "bound" to it (probably connected with a joint or something?).
As far as I understand, 'ride_of_death2' is not explicitly teleported, but is basically dragged along by the physics code, because it's bound to the teleported object - and the physics code seems to consider a movement (translation) by such a huge distance wrong (because by just pushing/shooting an object around it won't move that far from one frame or physics simulation step to the next, I assume).

However, this is just a theory based on the backtrace and the warning logged earlier. I couldn't verify whether the assertion is indeed caused by these two entities (or physics objects, or whatever they are), because the debugger quits first. Maybe I should try again with "pure" gdb, but I think getting to the right place/objects was cumbersome last time, and it is (theoretically) a lot easier by clicking around in Eclipse's debugger frontend.

@DanielGibson
Member

DanielGibson commented Feb 7, 2022

Turns out the warning and the assertion are (most probably) not related: when moving the assertion in TestHugeTranslation() a few lines lower (and printing additional info), it says:

huge translation for clip model 0 on entity 1424 'monster_zsec_shotgun_12'
  from (-1133.40 -821.75 -278.39) to (2580.56 3272.25 -278.39)

right before the assertion. So it's some zombie security guard with a shotgun (and not the "ride_of_death" moving platform) that for some reason is yeeted far into the void. That's the other thing: (-1133.40 -821.75 -278.39) is very close to that monster's initial position, but (2580.56 3272.25 -278.39) is somewhere in the void (outside the actual level). No idea (yet) why/how that happens.
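
For a sense of scale, the two positions in that log line are roughly 5528 units apart. The following is a minimal sketch of such a distance check, not the engine's code: `Vec3`, `TranslationLength`, `IsHugeTranslation` and the 4096-unit limit `kMaxTraceDist` are all assumptions made for illustration, and the real limit used by TestHugeTranslation() may differ.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for the engine's 3D vector type.
struct Vec3 {
    float x, y, z;
};

// Assumed upper limit for a single translation; the engine's value may differ.
constexpr float kMaxTraceDist = 4096.0f;

// Euclidean distance between the start and end point of a translation.
inline float TranslationLength(const Vec3& start, const Vec3& end) {
    const float dx = end.x - start.x;
    const float dy = end.y - start.y;
    const float dz = end.z - start.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// True if the move from start to end is longer than maxDist.
inline bool IsHugeTranslation(const Vec3& start, const Vec3& end,
                              float maxDist = kMaxTraceDist) {
    assert(maxDist > 0.0f);
    return TranslationLength(start, end) > maxDist;
}
```

Called with the logged start and end positions, `IsHugeTranslation` returns true under the assumed limit, while a monster that merely gets shoved around between two frames would stay far below it.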

@gertjanschouten
Copy link
Author

Yeah, it's a complex map, with all of those moving platforms. Good to see you're making some progress, thanks for all the efforts!

@DanielGibson
Member

Bad news: Still no progress on this bug

Good news: I investigated the Eclipse debugger issue I stumbled upon while debugging this. After debugging through Eclipse CDT, gdb and glibc, it turned out to be a kernel bug, which I reported: https://bugzilla.kernel.org/show_bug.cgi?id=215611

@gertjanschouten
Author

Wow, great effort! Amazing that a dhewm3 bug report leads to a kernel fix by the great Mr. Torvalds :-)

@paulwratt

Great find, hopefully that's the end of the GDB issues.

@DanielGibson
Member

Finally investigated this bug a bit further.

The huge translation of "monster_zsec_shotgun_12" is indeed caused by the huge translation of "ride_of_death2" (which is caused by "ride_of_death2_parent" being teleported by script/map_alphalabs4.script:722 with eRide.restorePosition(); and "ride_of_death2" being dragged along).
Because "ride_of_death2" is a "pusher", idPush::ClipTranslationalPush() is called, which collects the entities affected by this push with listedEntities = gameLocal.clip.EntitiesTouchingBounds( pushBounds, -1, entityList, MAX_GENTITIES );. Apparently this innocent zombie security guard with a shotgun qualifies as pushable and is standing in the way, so it gets yeeted into outer space.
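
The collection step can be illustrated with a simplified, self-contained sketch. This is not the engine's implementation: `Vec3`, `Bounds`, `Entity`, `SweptBounds` and `CollectPushed` are hypothetical stand-ins, and the assumption here is that the push bounds cover the pusher's whole path, so a very long move produces a very large box.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the engine's vector, bounds and entity types.
struct Vec3 { float x, y, z; };

struct Bounds {
    Vec3 min, max;

    // True if the two axis-aligned boxes overlap or touch.
    bool Touches(const Bounds& o) const {
        return min.x <= o.max.x && max.x >= o.min.x &&
               min.y <= o.max.y && max.y >= o.min.y &&
               min.z <= o.max.z && max.z >= o.min.z;
    }
};

struct Entity {
    std::string name;
    Bounds bounds;
    bool pushable;
};

// Box covering the pusher at its old position, its new position,
// and everything in between.
inline Bounds SweptBounds(const Bounds& b, const Vec3& move) {
    assert(b.min.x <= b.max.x && b.min.y <= b.max.y && b.min.z <= b.max.z);
    Bounds s = b;
    if (move.x < 0.0f) s.min.x += move.x; else s.max.x += move.x;
    if (move.y < 0.0f) s.min.y += move.y; else s.max.y += move.y;
    if (move.z < 0.0f) s.min.z += move.z; else s.max.z += move.z;
    return s;
}

// Names of all pushable entities touching the pusher's swept bounds.
inline std::vector<std::string> CollectPushed(const Bounds& pusher, const Vec3& move,
                                              const std::vector<Entity>& world) {
    const Bounds swept = SweptBounds(pusher, move);
    std::vector<std::string> hit;
    for (const Entity& e : world) {
        if (e.pushable && e.bounds.Touches(swept)) {
            hit.push_back(e.name);
        }
    }
    return hit;
}
```

With a normal per-frame move only entities right next to the platform are collected. With a teleport-sized move, anything pushable along the way ends up in the list.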

WARNING: script/map_alphalabs4.script(722): Thread 'map_alphalabs4::RideOfDeathPath': teleported 'ride_of_death2_parent' which has the pushing mover 'ride_of_death2' bound to it

suggests that one should not teleport parents of pushing movers around, so this should be a bug in the map (or its script), not in the C++ code.
No idea why the map shipped like this: even if the assertion rarely triggers (maybe that monster is usually dead when those ride of death parts are moved around?), the WARNING should always have been there.

I'm also not sure what to do about this:

  • Disable the assertion?
  • Disable the assertion only for this case?
  • Try to fix idEntity::Event_RestorePosition() to always teleport the connected things as well? (This would be scarily invasive.)
  • Try to detect and hotpatch the script to make the "children" of "ride_of_death*_parent" non-pushing while teleported? (Is this even possible?)
  • ...

@DanielGibson
Member

> No idea why the map shipped like this: even if the assertion rarely triggers (maybe that monster is usually dead when those ride of death parts are moved around?), the WARNING should always have been there.

Looks like that monster is even dead (in the savegame), but not gibbed, so that ride_of_death thing moves through the cadaver (and apparently gibs it).

Not that it makes a difference, really..

I think I'll just disable the assertion for this case - and probably this whole issue isn't that release-critical anyway, because it only happens if assertions are enabled, i.e. in debug builds.

If it ever turns out that there are more levels with the same issue (pushing movers bound to another entity being dragged along when that entity is teleported, and crashing into things on the way), I might look at this again (and maybe try to teleport the connected movers as well). For now I hope that this isn't a common mistake.

@gertjanschouten
Author

Thanks for looking into it!

rorgoroth pushed a commit to rorgoroth/dhewm3 that referenced this issue Apr 8, 2023
In the savegame from that bugreport, "monster_zsec_shotgun_12" was
lying dead pretty much at its spawn point.
script/map_alphalabs4.script moves those "ride_of_death" platforms
around, and at the end of a cycle teleports "ride_of_death*_parent" back
to its starting position - and the "ride_of_death*" bound to it, which
is a pushing mover, just gets dragged along by the physics code and thus
can collide with that zombie cadaver, which then tries to push it along,
causing that assertion in TestHugeTranslation().
This is a map bug - Doom3 even prints a warning:
 WARNING: script/map_alphalabs4.script(722): Thread
  'map_alphalabs4::RideOfDeathPath': teleported 'ride_of_death2_parent'
  which has the pushing mover 'ride_of_death2' bound to it
So I just disable that assertion for this specific case..
Also moved the assertion behind the corresponding warning, so that gets
printed before the assertion kills the game..
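
The two changes described above (exempting one known case from the assertion, and printing the warning before asserting) might look roughly like the following sketch. It is not the code from the commit: `IsKnownMapBug`, `HandleHugeTranslation` and the map identifier `"alphalabs4"` are hypothetical, and how the real fix recognizes this case is not stated in this thread.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical identifiers for the one case known to be a map bug.
static const char* const kKnownMap = "alphalabs4";
static const char* const kKnownEntity = "monster_zsec_shotgun_12";

// True if this map/entity pair is the already understood case.
inline bool IsKnownMapBug(const char* mapName, const char* entityName) {
    return std::strcmp(mapName, kKnownMap) == 0 &&
           std::strcmp(entityName, kKnownEntity) == 0;
}

// Returns true if the translation was rejected as too long.
inline bool HandleHugeTranslation(const char* mapName, const char* entityName,
                                  bool isHuge) {
    if (!isHuge) {
        return false;
    }
    // Print the diagnostic first, so it is visible even if the
    // assertion below stops the game.
    std::fprintf(stderr, "huge translation on entity '%s'\n", entityName);
    // Assert only for cases that are not already understood.
    // In builds with NDEBUG defined this line compiles to nothing.
    assert(IsKnownMapBug(mapName, entityName) && "unexpected huge translation");
    return true;
}
```

Keeping the assertion for every other entity means a new, unexplained huge translation would still be caught in debug builds.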

Also a small change to CMakeLists.txt that should make enabling
LINUX_RELEASE_BINS after CMake has already been run without it work