New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use recastnavigation for pathfinding (#2229) #1633

Merged
merged 97 commits into from Oct 30, 2018

Conversation

Projects
None yet
@elsid
Contributor

elsid commented Mar 15, 2018

Solves #2229.

Should replace #1542.

To build use recastnavigation from repository.

Tasks and decisions to be done:

  • When build NavMesh? On cell preload and cell load in background.
  • Load Navigator configuration from settings.cfg.
  • Use tiles to build navmesh `. Allows actors to start find path faster. Possibly to do in parallel. Allows caching.
  • Visualize NavMesh. togglenavmesh command.
  • Visualize actors paths. toggleactorspaths command.
  • Split RecastMesh into tiles. Reduces time to build single navmesh tile.
  • Limit number of tiles to add to NavMesh. Important when number of loaded cells more than 9. There is a physical limit in recastnavigation how much polygons and tiles could be in navmesh. Polygon identifier contains tile number and polygon number inside tile 22 bits total. Now 10 bits used for tile and 12 for polygon. Could be increased if recastnavigation would use 64 bits identifiers (need a lot of changes).
  • Support changing collision shapes. Navmesh updates when object changed or transformed.
    • Animated
    • Transformed (doors)
  • Support avoided areas (task #1724)
  • Support swim (water). Navmesh surface with specific water flag. Possible to forbid actors to swim:
    • Exterior cells
    • Interior cells
  • Support doors. Allow actors to find path through closed doors:
    • without teleports
    • with teleports (who can use it?)
  • Use different navmesh with half extents for different actors:
    • Exterior cells (takes too much time to build navmesh, need optimization)
    • Interior cells
  • Different settings for interior and exterior cells. There is a restrictions for that. Tile size depends on cell size. Navmesh can't contain tiles with different sizes, so different navmeshes are required to use different cell size for interior and exterior. Probably will not be possible to make off mesh connections for teleports (needs research).
  • Documentation for settings
    • Comment for each option in settings-default.cfg file
    • docs/source/reference/modding/settings/navigator.rst document
  • Cache NavMesh tiles data
    • Select key generation approach:
      • Use btCollisionShape user pointer to store original memory address or counter value. Small size. Possible collisions. Surrogate key.
      • Use RecastMesh binary data. Requires ordered build of RecastMesh. Large size. Persistent key.
      • ???
    • Select limit cache size approach:
      • Limit by total NavMeshData size. Allows to limit by used memory. Too large objects possible could not fit cache size in some specific cases.
      • Limit by number of items. May lead to large variance of used memory.
      • ???
    • Select cache item replacement method:
      • Replace least recently used item. Requires external lifetime handling of a NavMeshData value.
      • Replace currently unused least recently used item. Allows internal lifetime handling of a NavMeshData value. Requires to hold used item externally.
      • ???
  • Find optimal parameters values
  • Unit tests
  • Consider actors as obstacles (implemented with some issues here, disabled now), (see #1633 (comment))
    • Find out how to communicate openmw physics and dtCrowd physics.
    • Use dtCrowd for multiple navmeshes.
    • Handle agents number overflow.
  • Support jump
  • Build NavMesh tiles data on cell preload
@akortunov

This comment has been minimized.

Contributor

akortunov commented Mar 16, 2018

I am not sure if popular Linux distros contain recastnavigation package. If not, OpenMW will be harder to maintain with new dependency, especially we will use custom patches.

UPDATE:
builds fine with latest commit, helps with bug #1997 and bug #3524.

@elsid

This comment has been minimized.

Contributor

elsid commented Mar 16, 2018

I agree, but there are some solutions. We can include recastnavigation as submodule. Custom patches also bad idea, and I've made pull requests for those changes to eliminate this. Of course, it is always possible to go back to master version. For now it's just more convenient to make it this way.

@JDGBOLT

This comment has been minimized.

JDGBOLT commented Mar 16, 2018

Well, was doing some testing of this pull request within my game and it's pretty dang impressive. It definitely will need work on some sort of caching mechanism or whatever, as those load times when changing cells are pretty distracting. Perhaps might be better to generate them once when loading the cell, with it having basically a cache stored for each cell which keeps track of the order of mods affecting that particular cell. Also just experienced a fatal error seg fault within it. Kernel log says within OpenAL but it was during one of the parts where it was generating, as the sound just all of a sudden paused. Maybe OpenAL was expecting output from the game but as it was completely paused while doing the generation it didn't get the output it needed or something. Will try to recompile with debugging symbols and see if it does it again, running it within GDB. But I have to admit this seems much more impressive than the #1542 pull request, as that one would constantly be regenerating the paths as you were moving and was causing a lot of stuttering as you get further from the enemies. Was playing around outdoors with a kwama forager and it was really cool seeing it not being stupid, taking actual intelligent paths and seeming to always get to the player. As I said, the only diminishing part of the experience is the long load times when changing cells, will have to do more testing while indoors but I'm definitely liking where this is going so far. Was pretty magical to see Morrowind AI not being dumb and actually navigating around obstacles. Will do further testing but good work so far!

@JDGBOLT

This comment has been minimized.

JDGBOLT commented Mar 16, 2018

All right, it's been consistently crashing and segfaulting, in different areas, another time in OpenAL, one time in OSG as it is trying to run a single pass render. Though it seems the commonality is related to starting a thread, I'm going to try rebuilding without this patch just to make sure that it's not something on my end, but it definitely seems to be maybe related to some thread altering memory it shouldn't be, or something like that.

@ghost

This comment has been minimized.

ghost commented Mar 16, 2018

Drools

That's so cool. Sounds like all of our woes with AI and pathfinding will be fixed for good.

As you noted, the main question is when to build the navmesh. When I thought about recast a while back, I thought that obviously we'll save navmeshes with the editor and that way recast can only be a post 1.0 feature because we need to break the ESM file format to include navmeshes. But now that I think about it again, prebuilding the navmesh into the files is probably a bad idea. Different mods can add different objects into a cell, and you can't build the final navmesh until you know all of the mods. Plus, the system will have to somehow cope with objects that are placed or disabled at runtime (e.g. the faction strongholds) and rebuild the cell's navmesh then. So in my mind the navmesh should probably be built at runtime. Obviously the 2-3 seconds extra loading time on each cell change are too much though. Can the navmeshes be built in a background thread and then the AI could fall back to the 'dumb' (current) pathfinding implementation as long as the navmesh is not built yet (or use the previous navmesh in the case where we already have one, but need to update it to account for some disabled/enabled objects)? Is recast modular/threadsafe enough to accomplish that? If not, could we run two separate instances of recast and swap them every time the navmesh has finished building?

@JDGBOLT

This comment has been minimized.

JDGBOLT commented Mar 16, 2018

This is especially impressive when I went in and basically deleted all the nav meshes in Morrowind, the AI is surprisingly capable of navigating around the interiors, other than the annoyance of the game crashing like every couple minutes with segfaults. The two main areas I was seeing crashes was with libopenal and creating a thread, and osg when creating a thread to try to render. There are some corner cases where the ai just falls over, such as anything needing jumping to reach, anything over water, the AI hates trying to go into water, and also sometimes it tries to choose a closer path to the player through a wall and basically just runs into the wall. There are also some points where the ai will stop at the wrong place when trying to do spells or whatever and will basically have the spells ram into objects and not reach the player.

But otherwise the AI will do what it's supposed to, trying to navigate to the player in order to attack. Also by removing the nav meshes the wandering/patrolling is kind of broken, since they no longer just go along the path grids they will still try strolling around a bit, but like sometimes the guards can go outside the walls in balmora or whatever. But again, this is with every navmesh in Morrowind deleted using openmw-cs, and when starting combat they will try to navigate around remarkably well considering there is no hand placed stuff, just whatever recast pregenerates in the like 5 seconds or whatever when entering a new cell.

Honestly I would say pregeneration should be when starting up the game for the first time with a new load order, as the generation times are pretty dang painful when running around, especially outdoors, the game will constantly be pausing and stuttering when running around, and when it's doing the pregeneration the game basically just stops 100%, no sound or anything. Also I have encountered when the ai is trying to do a path sometimes the game will also just completely pause while it's doing it's calculations. Interiors might potentially be fine with doing the pregeneration when first entering the cell, you kind of expect that it might take a moment to load the cell. But yeah, there really needs to be a caching system in place, hopefully that just takes into account the plugins that affect that cell, and in what order, that way only massive changes to the cell are necessary for the generation. And while very impressive without the help of the nav meshes, the nav meshes still are a bit better in regards to getting around, as sometimes the characters will be skirting a wall or whatever when navigating, rather than going in a straight line.

But otherwise, great work as far from a player experience, only the pauses to the game and the crashing, and a few edge cases that need to be fixed up tarnish the experience.

@JDGBOLT

This comment has been minimized.

JDGBOLT commented Mar 17, 2018

Just for an example of the crash that I experience with this patch applied: https://pastebin.com/5d5CN0sf , the other I notice being somewhere in OSG, but it also was crashing with the start_thread() of libpthread, so I'm guessing there is some sort of thread contention or something.

@kcat

This comment has been minimized.

Contributor

kcat commented Mar 17, 2018

Smells to me like memory corruption (buffer overrun, or use-after-free or something). I'd suggest compiling with Clang's memory or address sanitizer, or using Valgrind, to find where invalid writes happen.

@JDGBOLT

This comment has been minimized.

JDGBOLT commented Mar 17, 2018

All right, after running OpenMW through valgrind, and suffering through 2fps performance running around with NPC's until it crashed, I think this should hopefully help with tracking down the crash: https://pastebin.com/sA2QxNHi

Edit 1: Looks like valgrind is specifically complaining about this line: https://github.com/OpenMW/openmw/pull/1633/files#diff-2318e0517898da5f974fd390a55511c9R1216

Probably moving/accessing some memory it shouldn't be or something like that.

Edit 2: One other thing I have encountered is where the game will just freeze and have 100% cpu utilization for a thread and will call that function listed above lots of times and begin just gobbling lots of memory. As far as what is calling it, not sure, as I just put in a cout for that part of the function and it was doing it constantly.

Edit 3: Just thought I would update and put in the memmove command which caused the game to segfault, which was memmove(0x563294c0e3b0 + 2, 0x563294c0e3b0 + 1, 18446744073709551615 * 4), with the values being:
path = 0x563294c0e3b0
req = 2
orig = 1
size = 18446744073709551615
Not 100% sure if it is because req = 2 or size = some huge number, will try plugging those in and seeing what causes the crash.

Edit 4: Well, plugging in the numbers into the function, it was the size being a huge number that caused the crash, and I would guess it is related to the line https://github.com/OpenMW/openmw/pull/1633/files#diff-2318e0517898da5f974fd390a55511c9R1214 .

Edit 5: A bit more debugging done, looks like maxPath becomes 1 and the other value to 2, which when it reaches that point it basically does 1-2 and creates the maximum negative number as it is unsigned. An example of the calculation working is req(1) + size(0) > maxPath(2) , but then right before the crash req(2) + size(0) > maxPath(1), size = -1 and memmove(0x55f9bc2342c0 + 2, 0x55f9bc2342c0 + 1, 18446744073709551615 * 4) .

@greye

This comment has been minimized.

Contributor

greye commented Mar 17, 2018

@scrawl
Recast is able to handle dynamically added static objects (huh) in the world using tiling and cutting. I.e. when objects being added it is possible to just cut wholes in adjacent navmeshes, when object being removed it is possible to recreate just an affected tiles in realtime. Therefore it is quite appealing to prebuild navmeshes for at least main files.

@elsid

This comment has been minimized.

Contributor

elsid commented Mar 17, 2018

@JDGBOLT, thanks for feedback and investigation about memory corruption. Actually, code of makeSmoothPath and underneath is taken from recastnavigation demo. I will try to fix it.

@JDGBOLT

This comment has been minimized.

JDGBOLT commented Mar 17, 2018

@elsid Just to let you know did a bit of testing in the spots where I could get it to crash every single time, and running around a bunch and everything, and not a single crash, so great job on that. One thing I notice is that the pathfinder seems to have a lot of problems when the player is on a different vertical position than the creature/npc, and also doors can confuse the heck out of the pathfinding. When the npc/creature is following closely behind the player usually they can keep up with the player in navigating around, but say if you move too fast or whatever, they can do things like being above/beneath the player or in another corridor then say they will let the player go this time or whatever (basically enter the flee state) and stop combat or whatever.

One other thing I notice is that the aiwander and other ai subsets other than aifollow seem to be broken with this, as they don't seem to use the navmesh, probably just not wired in to use it. Other than that, water can also confuse it as it seems to be very hydrophobic. There also seems to be some issues with pathfinding around in areas with lots of vertical geometry, it's fine say outdoors going up a stair or whatever, but indoors with lots of stairs and vertical components it confuses them horribly. Most likely the recastnavigation demo code wasn't designed for massively overlapping environments, so might need to do our own customizations to that code.

@ghost

This comment has been minimized.

ghost commented Mar 17, 2018

Most likely the recastnavigation demo code wasn't designed for massively overlapping environments, so might need to do our own customizations to that code.

It could be the scale of the Morrowind world is confusing recast, if it's used to a scale of 1 unit = meter. Morrowind is different by around a factor of 70. Adjusting the parameters that elsid linked in the first post will probably help with that.

Recast is able to handle dynamically added static objects (huh) in the world using tiling and cutting. I.e. when objects being added it is possible to just cut wholes in adjacent navmeshes, when object being removed it is possible to recreate just an affected tiles in realtime.

Nice. That should simplify things a lot for the scripted objects.

Therefore it is quite appealing to prebuild navmeshes for at least main files.

Ok. But we can't change the ESM format yet, nor distribute morrowind files, and we can't expect the user to run something that takes hours to complete before they can actually play (just a wild guess, 2000 cells x 3 seconds = ~2 hours).

If it really just takes a few seconds per cell, we could build the navmeshes in the preloading thread along with the other preloading procedures (textures, meshes, etc.) here. In that case we probably want to increase the default of preload num threads because building the navmeshes would be CPU intensive and benefit from an extra thread, unlike the asset loading which is often bottlenecked on I/O. Or even run a separate thread just for building navmeshes. For the best performance the asset loading and navmesh threads need to add objects in the same order so that the navmesh thread won't stumble on any non-loaded objects. As long as navmeshes don't take significantly longer than loading meshes/textures does, the preloader can just do both at the same time and there should be a very low risk of stutters on cell transitions. Care to give it a try @elsid? I'll try to give some hints:

  • Most of OpenMW is not thread safe, but components/resource mostly is.
  • Currently the preloader just gets a list of models to load. We'll also need to know their position, rotation and scale to be able to load the navmesh. This probably means the code path for PreLoaded CellStores needs to be retired or adjusted because in PreLoaded state the CellStore only knows its list of objects. We might also want to know the MW-type of each object because some object types never use collisions even if their model includes it.
  • If we want to keep the prebuilt navmeshes in a cache so they don't have to be loaded more than once per session, we could make a new ResourceManager. That will also be a convenient place to actually store the prebuilt meshes for the main thread to retrieve them. I don't recommend storing a cache on disk though, as we'll have troubles making sure that it's still valid.
@JDGBOLT

This comment has been minimized.

JDGBOLT commented Mar 17, 2018

One thing I also notice when running around in the exteriors is that it seems the bulk of the time when loading a new cell is actually the removing of the old mesh or something, it scrolls quite slowly through the "NavigatorImpl::removeObject object=0x55f55640b2d0" lines but the add object component is practically instantaneous. So it might be freeing of the old assets or something from the navmesh is what is taking the most time?

Edit: Well, as far as the addobject/removeobject messages go, there is still the second or two pause where it does absolutely nothing that I am guessing is the generating of the navmesh. And honestly I think we should be caching the navmeshes in local file storage so even after the session it doesn't need to regenerate it. As even in a background thread it would be doing a lot of work that it shouldn't need to be doing. And if possible maybe just have a checksum or something of the entire load order that is stored along with it and it can be regenerated if the checksum is wrong. Even could make it so that only the plugins that have records in that particular cell being stored so we don't invalidate the cache more than we have to.

Edit 2: One thing I also notice is that even though the pathfinding process itself usually doesn't affect the game at all, there are some times where all of a sudden the game will just completely pause for a second or two as it is doing the path calculation. Can notice it when say I have enemies chasing after me and I enter another cell, as soon as I get out again and it starts the combat that initial pathfind to me can freeze the game for a bit.

Edit 3: I think one thing I also notice is that when it adds objects to the calculation for the navmesh, I'm not 100% sure if it's culling objects that have no collision mesh, which I think would be a great optimization. Playing around with this with a grass mod, and I think a lot of the objects it's finding when generating the mesh is these grass objects, and probably one thing that makes it so that pathfinding takes so long as it's trying to find paths around these meshes that have no collision. I could be wrong of course, just laying out what I am experiencing.

Edit 4: All right, got the obj exporting working for OpenMW so I could see examples of cells where I know there is a problem with an npc pathing, I think really to get this to work better just need to fine tune the parameters given to recast and also support some of the functions already in the library, can see support for doors or swimming or whatever, even jumping, which I admit would be pretty cool if npc's could try jumping to get you.

Edit 5: Also thought I would mention something that someone on the morrowind modding discord mentioned, and that the obj file output via the writeobj function is mirrored, which maybe causing some problems?

@psi29a

This comment has been minimized.

Member

psi29a commented Mar 17, 2018

I googled a bit and couldn't come up with any pre-existing debian (or rpm) packages for Detour (recastnavigation). Anyone else know of any or is this something I'll have to package and get downstream to Debian and Ubuntu?

@greye

This comment has been minimized.

Contributor

greye commented Mar 17, 2018

@psi29a, afaik there is none, sorry.

@elsid

This comment has been minimized.

Contributor

elsid commented Mar 17, 2018

One thing I notice is that the pathfinder seems to have a lot of problems when the player is on a different vertical position than the creature/npc, and also doors can confuse the heck out of the pathfinding.

It would be great to have save files to reproduce all of such cases.

It could be the scale of the Morrowind world is confusing recast, if it's used to a scale of 1 unit = meter. Morrowind is different by around a factor of 70. Adjusting the parameters that elsid linked in the first post will probably help with that.

I already use scale.

If it really just takes a few seconds per cell, we could build the navmeshes in the preloading thread along with the other preloading procedures (textures, meshes, etc.) here. In that case we probably want to increase the default of preload num threads because building the navmeshes would be CPU intensive and benefit from an extra thread, unlike the asset loading which is often bottlenecked on I/O. Or even run a separate thread just for building navmeshes. For the best performance the asset loading and navmesh threads need to add objects in the same order so that the navmesh thread won't stumble on any non-loaded objects. As long as navmeshes don't take significantly longer than loading meshes/textures does, the preloader can just do both at the same time and there should be a very low risk of stutters on cell transitions. Care to give it a try @elsid?

Thanks for the hints. I will try to build navmeshes in cell preloader.

Update: Can't produce game situation when this path executes. Any ideas?

@kcat

This comment has been minimized.

Contributor

kcat commented Mar 18, 2018

Therefore it is quite appealing to prebuild navmeshes for at least main files.

The main problem is that it's too easy to require rebuilding anyway. A mod can add to or remove from a cell that invalidates the navmesh. Worse, two mods modify a cell's terrain but only the early one contains a navmesh edit, which makes it use the navmesh of one mod with a terrain of another mod.

@JDGBOLT

This comment has been minimized.

JDGBOLT commented Mar 18, 2018

@elsid Hey, sorry that this took so long to get back to you, had some things I had to take care of today so was away from the computer for a chunk of time. Here are some test saves that I put together of situations that I have found will confuse the AI. It should only be dependant on the Morrowind.esm/expansions so it shouldn't have anything conflicting with it. This was just from a few minutes of work, though also say the vivec sewers will confuse the rats as they will refuse to swim and the like.
recast-test1.zip

In there are 3 files, the enemies running into a wall is self explanatory, they can't seem to be able to path to the player and are just running through the wall, I think it's related to the door that is in the way, as with the obj file generated the door is a solid wall so they are reverting to just running in a straight line, though have also noticed it with some other circumstances though don't remember the exact situations off the top of my head. The other 2 saves are related to the npc being unable to path to the player at all. In the lighthouse one if you select the npc and say startcombat player, she will initially start the combat but then will say something along the lines of "Help!" or whatever, the fleeing part. I think this one is related to perhaps the settings for the navmesh generation, as I notice in the obj file that it's unable to create a path for the stairs in the default settings. The final one is an npc is trying to follow the player but is unable to path to them, if you do twf you should be able to see them above you running around in circles. These are the 3 that immediately come to mind and are quick to setup a save of the situation. Hopefully this helps with the testing, will be doing some testing myself of getting the settings used for the navmesh generation and be able to view it in the demo program which I just got working.

@psi29a

This comment has been minimized.

Member

psi29a commented Mar 18, 2018

Worse than I thought, it uses premake5, which apart from also not being in Debian (or any other distro), it is also considered 'alpha' code. Crap. :( So that's two new packages...
What do you guys think if I switched the repath project over to cmake and send a PR?

@JDGBOLT

This comment has been minimized.

JDGBOLT commented Mar 18, 2018

Just so you know @elsid already converted the project to cmake and sent in a pull request, you can find the commit listed in the first post at elsid/recastnavigation@3686e06 .

@psi29a

This comment has been minimized.

Member

psi29a commented Mar 18, 2018

I must have skipped that, the commit itself leads me to: "Unconditionally initialize rcCompactHeightfield borderSize" which had nothing to do with cmake. Found the PR: recastnavigation/recastnavigation#310 That'll make my job easier! Thanks. :)

For the time being, I'll go about creating a WIP Debian package and throw it up on the PPA to save everything time if they are using Ubuntu/Debian.

@JDGBOLT

This comment has been minimized.

JDGBOLT commented Mar 18, 2018

It looks like the project itself recommends just integrating the recast library itself as part of the source code of the project, a submodule could be possible to have it a part of the main OpenMW source. It looks like they themselves maybe aren't really trying to push for integration of it into distros or whatever. It may be easier to do it that way rather than working it into various distros, will make compiling from source much easier for other people as well as it would be checked out with the main git source. That way we can integrate only the parts that we need and don't need any of the rest. Since honestly this seems like one of those things where no one else is using it, and it would be a big effort to pull it into the various distros and the like. It could even make some distros drop OpenMW, as they may not wish to pull in yet another package. Just a thought. In addition, it looks like upstream isn't exactly the most active, so if we need to make our own customizations of the library, it might indeed be best to pull it in as a library in extern or whatever.

@psi29a

This comment has been minimized.

Member

psi29a commented Mar 18, 2018

@JDGBOLT You might be right... but packaging and being angry at Debian/Ubuntu is kinda my thing. :P

@JDGBOLT

This comment has been minimized.

JDGBOLT commented Mar 18, 2018

Hey, no complaints from me here, I made my own pkgbuild to build recast and install it on my system. Just saying that as far as what might work best for the project, especially if we need to apply bugfixes to the library or whatever, having it included with OpenMW might be the easiest option.

@psi29a

This comment has been minimized.

Member

psi29a commented Mar 18, 2018

@elsid why did you switch the #include "" to <>? This seems like an unnecessary update, touches waaaaaayyyy too many files for the scope of adding cmake and confuses the preprocessor (wasting time).

https://github.com/recastnavigation/recastnavigation/pull/310/files#diff-969a1c0b241a06848bf04e8a1d4357d9R22

In practice, the difference is in the location where the preprocessor searches for the included file.

For #include the preprocessor searches in an implementation dependent manner, normally in search directories pre-designated by the compiler/IDE. This method is normally used to include standard library header files.

For #include "filename" the preprocessor searches first in the same directory as the file containing the directive, and then follows the search path used for the #include form. This method is normally used to include programmer-defined header files.

If I was you, I would revert that. Your chance of having your PR accepted will be increased.

@elsid

This comment has been minimized.

Contributor

elsid commented Mar 18, 2018

@psi29a the problem is when you build project itself it’s ok, but when you are use include_directories macro to add it to another project and include its headers, it doesn’t work.

@ghost

This comment has been minimized.

ghost commented Mar 18, 2018

It's pretty normal for experimental libraries to be hesistant to being packaged, because if they can't guarantee API/ABI stability, packages are less useful. Plus, the fact that many distributions are only shipping "stable" (outdated) versions in their repo can be very annoying for something that's still heavily in development. As the libraries mature over time this will become less of a concern. IIRC Bullet used to recommend (or still do?) bundling your own copy, but now they're packaged pretty much everywhere.

For the moment, adding recast/detour to the openmw tree is the way to go. Not sure if it should be a submodule though, as submodules can be error prone to use and they don't even get pulled by a git pull by default. We used submodules for "openengine" a long time ago and eventually booted them because of the irritation they caused. Git's handling of submodules has improved a bit since then, though.

Can't produce game situation when this path executes. Any ideas?

Ah ok, just ignore that code path for now and use the opposite one.

It's probably not used because the preloader evaluates door destinations which always causes the remote cell's object list to be loaded. There are other things even that can cause the object list to be loaded even if you've never set foot in the cell, like running scripts; and once it's loaded, it's never unloaded until you start a new game, so the "object list is already loaded" path is the more prevalent one in any case.

@JDGBOLT

This comment has been minimized.

JDGBOLT commented Mar 18, 2018

I have to admit one thing that I notice is that the game does pause a lot even when not passing by cell borders and loading new cells. It seems like if I am understanding recast/detour correctly it generates a new mesh for each of the various parameters such as radius and height. One problem I have been seeing is if say you have lots of creatures or npcs or whatever following you, it will be generating meshes for each type of creature or whatever, which has the issue of being a lot of work for the pathfinder. I think you do do caching of them which is good, but still it makes the loading quite painful when running around and such. Honestly I think it might be better to generate one mesh with some set radius/height/whatever, and only try to generate the meshes customized for creatures/npcs/whatever when they get stuck and say do the animation where they back up and try to sidestep or whatever. So we would find some value that would work for most circumstances and be able to store that on disk or whatever, and only generate customized ones for when the situation needs it, to avoid needlessly generating lots of different meshes for each type of creature or whatever.

Indoors it's usually not too much of a problem, it finishes in a fraction of a second, but outdoors where it can take on the order of 3+ seconds to generate each map, it would be a lot of work that probably isn't needed. One thing I also notice is that when running with exterior cell distance of greater than one, the mesh generation is almost excruciating to play, as it's trying to generate maps constantly for this massive play area. Honestly it might be better to only have mesh generation in the cell the player is currently in, and only rely on stored maps for further out. Since otherwise it's just constantly generating new maps and in general being a laggy mess where it will just pause for 1+ seconds even when just slowly running around. I did notice that there was a crowd functionality within recast which had a maximum radius or whatever, so it might be good to look into that and see how they are doing things, as the mesh generation is quite an intensive process, and should be used as sparingly as possible. Just some thoughts of what I have encountered while playing around.

@elsid

This comment has been minimized.

Contributor

elsid commented Mar 18, 2018

I've tried to make NavMesh in cell preloader and failed for not. The problem is I can't get rotation of object. Here is an example how I do this. getBaseNode() returns nullptr there. Probably I can get it another way.

Just to move forward I've created separated background thread, which one generates NavMeshes for each cell load. No more freezes, but it is possible for actors to find path when there is no NavMesh, then just original mechanism works.

One problem I have been seeing is if say you have lots of creatures or npcs or whatever following you, it will be generating meshes for each type of creature or whatever, which has the issue of being a lot of work for the pathfinder. I think you do do caching of them which is good, but still it makes the loading quite painful when running around and such. Honestly I think it might be better to generate one mesh with some set radius/height/whatever, and only try to generate the meshes customized for creatures/npcs/whatever when they get stuck and say do the animation where they back up and try to sidestep or whatever.

I will try to do some research, what happens when mesh built for different agent size. It is a problem, I've seen up to 10 different agent sizes for one scene. It could take too long.

@AnyOldName3

This comment has been minimized.

Member

AnyOldName3 commented Nov 1, 2018

I'll try and get you another for a Debug build.

The past few days have been really effective for finding Windows-specific issues. I can't build in debug mode because a function declaration seems to have gone missing. I'll get this sorted, but @elsid if you can manage to work with the RelWIthDebInfo stack trace screenshots in the meantime, that's your best bet, as it may be some time.

@elsid

This comment has been minimized.

Contributor

elsid commented Nov 1, 2018

Ok, I will try to find the problem somehow different. One thing I don’t understand. Tests execute same code and doesn’t crash.

@AnyOldName3

This comment has been minimized.

Member

AnyOldName3 commented Nov 1, 2018

Do we actually run the tests on AppVeyor or do we just assume they'll do the same thing as on Travis and ignore them? The AppVeyor builds don't work on my computer, so it's possible.

@psi29a

This comment has been minimized.

Member

psi29a commented Nov 1, 2018

That is something we should fix, the tests should work on all platforms. It is possible that it just hasn't be implemented.

@psi29a

This comment has been minimized.

Member

psi29a commented Nov 1, 2018

Things that need addressing:

  1. OpenMW needs to not crash on Windows
  2. 'New Game' needs to work, as in... the guard needs to come down and un-glue you from the boat's hold

Hopefully these can be resolved and we can move on with 0.46, otherwise we can always revert.

@elsid

This comment has been minimized.

Contributor

elsid commented Nov 1, 2018

One thing I've figured out. New game was broken before this merge. I'm trying to find a commit where.

@JDGBOLT

This comment has been minimized.

JDGBOLT commented Nov 1, 2018

I just found what broke, it's here: 49d8124#diff-f13496d9875b2507b0140ff8db91a2f0R25

Wrong datatype, since it's bool it's 1*1 for the max distance. Changing to float or int or whatever causes the new game sequence to successfully work.

@elsid

This comment has been minimized.

Contributor

elsid commented Nov 1, 2018

Changing type works, but there is another issue. Scripted actors couldn't reach target, so they are making circles around.

@AnyOldName3

This comment has been minimized.

Member

AnyOldName3 commented Nov 2, 2018

I'll try and get you another for a Debug build.

The past few days have been really effective for finding Windows-specific issues. I can't build in debug mode because a function declaration seems to have gone missing. I'll get this sorted, but @elsid if you can manage to work with the RelWIthDebInfo stack trace screenshots in the meantime, that's your best bet, as it may be some time.

It looks like there were two issues with Debug builds on Windows.

One was a regression from a year ago when d19839a bumped up the minimum CMake version, thereby changing a bunch of policies from OLD to NEW. In the future, this can be avoided by looking at the warnings CMake outputs before bumping the CMake version. I didn't know this was a thing, so may well have signed off on the change. Either way, it shows how infrequently I clear my CMake cache.

The other was that https://github.com/OpenMW/openmw/blob/master/components/debug/debugging.hpp#L55 uses OutputDebugString, but we hadn't actually included the Windows header which contains it. I guess maybe the headers included in that file might have indirectly included it when testing was done, but some other change removed it.

Expect a PR soon.

@AnyOldName3

This comment has been minimized.

Member

AnyOldName3 commented Nov 2, 2018

#2010 should fix debug build on Windows. I've not merged it because I'm not 100% sure that including the whole of Windows.h is the best approach, even though that seems to be what Microsoft's documentation recommends.

@AnyOldName3

This comment has been minimized.

Member

AnyOldName3 commented Nov 3, 2018

Although I think we've got everything fixed that we'd found so far, someone just reported a bump in memory usage from 1GB to 13GB+ when using recent nightlies to explore emba-5, which I assume is because of this PR. Anything we can do about that, do we back off our RAM usage if we're eating more than a computer has, and is emba-5 doing anything weird that wasn't taken into account when this PR was written that we can work around?

@akortunov

This comment has been minimized.

Contributor

akortunov commented Nov 3, 2018

Suggestions how to improve the "toggleactorspaths" console command:

  1. Do not render paths for dead actors.
  2. Do not render path, if an actor (a wandering one, for example) reached the destination, but with destination tolerance (when pathTo() returns true, I suppose).
  3. Add a shortcut ("tap", for example).
@psi29a

This comment has been minimized.

Member

psi29a commented Nov 3, 2018

I ran around Morrowind (with Expansions) for about 10 minutes:

bcurtis@Wintermute:/Workspace/Private/OpenMW/build$ ps aux | grep openmw
bcurtis 24101 232 2.4 2607352 805432 pts/12 Rl+ 09:32 7:11 ./openmw
bcurtis 24506 0.0 0.0 14756 1060 pts/17 S+ 09:35 0:00 grep --color=auto openmw
bcurtis@Wintermute:
/Workspace/Private/OpenMW/build$ ps aux | grep openmw
bcurtis 24101 241 3.9 3165424 1285744 pts/12 Rl+ 09:32 15:34 ./openmw
bcurtis 24903 0.0 0.0 14756 1012 pts/17 S+ 09:39 0:00 grep --color=auto openmw
bcurtis@Wintermute:~/Workspace/Private/OpenMW/build$ ps aux | grep openmw
bcurtis 24101 242 5.7 3707644 1876676 pts/12 Rl+ 09:32 22:29 ./openmw
bcurtis 25430 0.0 0.0 14756 1012 pts/17 S+ 09:41 0:00 grep --color=auto openmw

@psi29a

This comment has been minimized.

Member

psi29a commented Nov 3, 2018

Gluka said Yesterday at 10:47 PM

yeah, i've noticed that it can struggle pretty hard to draw the navmesh for emba5 in a reasonable amount of time but the geometry can get kind of crazy there.
also had an out of memory issue running around the tunnels in emba5 for some reason. i think it could be related to recastnav, but i need to test it some more.

@psi29a

This comment has been minimized.

Member

psi29a commented Nov 3, 2018

#2014 was added thanks to @akortunov

@Capostrophic

This comment has been minimized.

Contributor

Capostrophic commented Nov 3, 2018

Is there anything possible to do with the fact that actors may shortcut over deep holes to reach the player and subsequently fall through them?

@elsid

This comment has been minimized.

Contributor

elsid commented Nov 3, 2018

@AnyOldName3 Found problem with memory consumption. Navmesh cache tracks only navmesh data size to limit total size. Actually a cache key is usually bigger than data, because it contains all geometry, and navmesh data contains only usable for actors to move. For morrowind the ratio is 1 / 7 (data size / key size), but for emba-5 it is 1 / 48. With cache limit 256 MiB actual memory consumption should be 12544 MiB. The fix is to calculate cache item size as data size + key size. I will make a pr.

Another problem I've found is updating every tick animated objects with collision object in emba-5. It blocks navmesh updater worker to do anything except reacting to this updates. I see two possible fixes. First is to change job priority for those updates. Second is to skip updates when job queue is not empty. I will make a pr, but need some time to try and to test solutions.

@Capostrophic Is is about https://gitlab.com/OpenMW/openmw/issues/1997 ? It seems the problem is with this fallback. Need to do more research to how to fix this without breaking anything else.

@psi29a

This comment has been minimized.

Member

psi29a commented Nov 3, 2018

There is a much higher memory allocation with navmeshes, 3x so many. Using valgrind and loading my savegame just outside of excise office for both 0.45 and master:

0.45:

==27413== HEAP SUMMARY:
==27413== in use at exit: 295,334 bytes in 2,888 blocks
==27413== total heap usage: 5,301,971 allocs, 5,299,083 frees, 921,940,302 bytes allocated
==27413==
==27413== LEAK SUMMARY:
==27413== definitely lost: 8,785 bytes in 15 blocks
==27413== indirectly lost: 13,898 bytes in 77 blocks
==27413== possibly lost: 1,352 bytes in 18 blocks
==27413== still reachable: 271,299 bytes in 2,778 blocks
==27413== of which reachable via heuristic:
==27413== newarray : 1,536 bytes in 16 blocks

master:

==5409== HEAP SUMMARY:
==5409== in use at exit: 293,294 bytes in 2,854 blocks
==5409== total heap usage: 6,737,240 allocs, 6,734,386 frees, 3,241,209,132 bytes allocated
==5409==
==5409== LEAK SUMMARY:
==5409== definitely lost: 8,905 bytes in 15 blocks
==5409== indirectly lost: 13,778 bytes in 77 blocks
==5409== possibly lost: 1,352 bytes in 18 blocks
==5409== still reachable: 269,259 bytes in 2,744 blocks
==5409== of which reachable via heuristic:
==5409== newarray : 1,536 bytes in 16 blocks

What is interesting is that the same number of blocks lost, so the memory leaks have always been there... they seem to be exasperated now by master.

@psi29a

This comment has been minimized.

Member

psi29a commented Nov 3, 2018

I did a test with vanilla Morrowind using latest master. I started fresh off the boat, then flew my butt to solstheim and picked a fight with two Rieklings and a Riekling Raider then went all god mode on them and left OpenMW to run for a few hours.

The end result is that OpenMW memory consumption never budged even while the little bastards were running around and attacking me. So I don't see any memory leak leak.

Loaded fresh off the boat:
bcurtis 14970 200 1.2 2241612 416516 pts/12 Rl+ 18:18 1:20 ./openmw
bcurtis 14970 201 1.2 2241612 416516 pts/12 Rl+ 18:18 1:26 ./openmw
bcurtis 14970 204 1.2 2241740 416516 pts/12 Sl+ 18:18 1:38 ./openmw
Flew to solstheim and picked a fight:
bcurtis 14970 219 2.4 2645044 807920 pts/12 Rl+ 18:18 71:15 ./openmw
bcurtis 14970 218 2.4 2645044 807920 pts/12 Rl+ 18:18 72:29 ./openmw
bcurtis 14970 212 2.4 2645044 807920 pts/12 Rl+ 18:18 128:28 ./openmw
bcurtis 14970 211 2.4 2645044 807920 pts/12 Rl+ 18:18 140:00 ./openmw
Did a quick save and quick load:
bcurtis 14970 211 2.4 2634196 812324 pts/12 Rl+ 18:18 142:41 ./openmw
bcurtis 14970 207 2.4 2635732 812324 pts/12 Rl+ 18:18 179:20 ./openmw
bcurtis 14970 205 2.4 2635732 812324 pts/12 Rl+ 18:18 232:43 ./openmw
bcurtis 14970 205 2.4 2635732 812324 pts/12 Rl+ 18:18 237:38 ./openmw

@elsid elsid referenced this pull request Nov 4, 2018

Merged

Recast global allocator #2021

@jvoisin

This comment has been minimized.

Contributor

jvoisin commented Nov 4, 2018

It seems that using recastnavigation will bump the memory requirement of openmw :/

@psi29a

This comment has been minimized.

Member

psi29a commented Nov 4, 2018

@jvoisin have a look at #2021 Looks like the memory requirement for recastnav is being dialed back to something less grotesque. ;)

@jvoisin

This comment has been minimized.

Contributor

jvoisin commented Nov 4, 2018

It looks like it still doubles the amount of used memory, doesn't it?

@psi29a

This comment has been minimized.

Member

psi29a commented Nov 4, 2018

I'm sure there is more work to be done.

@elsid

This comment has been minimized.

Contributor

elsid commented Nov 4, 2018

It doubles amount of allocated memory, but not used memory. Basically amount of used memory depends on max nav mesh tiles cache size setting. Used memory is about 3% more with disabled cache (max nav mesh tiles cache size = 0). Default cache size adds about 500 MiB. Default settings value is 256, but cache key is stored twice in cache. This should be fixed to avoid any confusion. I will make a pr. How much memory we consider affordable for navmesh cache size?

Memory consumption with disabled cache size:
disabled_cache

Memory consumption with default cache size:
default_cache_size

@elsid elsid referenced this pull request Nov 4, 2018

Merged

Navmesh cache limit #2023

@akortunov

This comment has been minimized.

Contributor

akortunov commented Nov 5, 2018

Clang analyzer complains about some issues in the RecastNavigation itself:

In file included from /home/travis/build/OpenMW/openmw/extern/recastnavigation/Detour/Source/DetourNavMeshQuery.cpp:24:
/home/travis/build/OpenMW/openmw/extern/recastnavigation/Detour/Include/DetourCommon.h:142:17: warning: The right operand of '-' is a garbage value
        dest[0] = v1[0]-v2[0];
                       ^~~~~~
/home/travis/build/OpenMW/openmw/extern/recastnavigation/Detour/Source/DetourNavMeshQuery.cpp:511:21: warning: Called C++ object pointer is null
        if (dtStatusFailed(m_nav->getTileAndPolyByRef(ref, &tile, &poly)))
                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/travis/build/OpenMW/openmw/extern/recastnavigation/Detour/Source/DetourNavMeshQuery.cpp:556:37: warning: Division by zero
                const float* vb = &verts[((imin+1)%nv)*3];
                                          ~~~~~~~~^~~
/home/travis/build/OpenMW/openmw/extern/recastnavigation/Recast/Source/RecastMeshDetail.cpp:586:2: warning: Function call argument is an uninitialized value
        tris.push(hull[start]);
        ^~~~~~~~~~~~~~~~~~~~~~
/home/travis/build/OpenMW/openmw/extern/recastnavigation/Recast/Source/RecastMeshDetail.cpp:587:2: warning: Function call argument is an uninitialized value
        tris.push(hull[left]);
        ^~~~~~~~~~~~~~~~~~~~~

/home/travis/build/OpenMW/openmw/extern/recastnavigation/DetourTileCache/Source/DetourTileCacheBuilder.cpp:485:7: warning: Value stored to 'ndir' during its initialization is never read
                int ndir = dir;
                    ^~~~   ~~~
/home/travis/build/OpenMW/openmw/extern/recastnavigation/DetourTileCache/Source/DetourTileCacheBuilder.cpp:1280:22: warning: The left operand of '&' is a garbage value
        *dst++ = indices[2] & 0x7fff;
                 ~~~~~~~~~~ ^

Probably we should fix these issues and send a PR to the RecastNavigation repo.

@psi29a psi29a referenced this pull request Nov 5, 2018

Open

Clang 3.6 warnings #362

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment