Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: allow prioritizing collider generation over visible meshes #87

Open
Chlorobyte-but-real opened this issue Nov 15, 2019 · 22 comments
Labels
enhancement Needs research Not trivial / needs prototyping / translating paper into code

Comments

@Chlorobyte-but-real
Copy link

I've been messing with the terrain for a while, speeding through it with a little controllable capsule. It seems that there is an issue where if the capsule is on terrain that is modified, there is a chance that it might end up inside of it.
So I have a mechanism set in place that would pause physics until the high-resolution collision is generated (together with collision LOD level set). The feature I'm requesting here is basically where the system would prioritize generating these colliders for example where this capsule is, so it takes less time for the game to unpause, or maybe I could even set up something to make sure it basically never happens. For this, I don't even need the visible mesh to exist yet, the goal is to make sure collision at a certain chunk is a thing. (That last statement could be toggleable, so visible meshes could be forced together with colliders as well?)
Anyway, this should be applied to the viewer object and potentially wherever else the script wants to force collision to be generated more quickly.

@TokisanGames
Copy link
Contributor

The terrain has collision as soon as its meshed. Collision is built along the current LOD mesh surface, not along what the mesh will eventually look like.

It's likely that you're falling through the terrain when you are standing on one LOD and it switches to another with a different enough shape that you're on the other side of the isosurface. Thus you're on the other side of the collision shape as well.

I think prioritizing building LOD 0 around the character is a better request. I think some of that is done, but maybe not enough.

@Chlorobyte-but-real
Copy link
Author

I think prioritizing building LOD 0 around the character is a better request. I think some of that is done, but maybe not enough.

Yeah, something like that. Is it not possible to generate the collider before the mesh, or does the mesh have no additional data?

@TokisanGames
Copy link
Contributor

The engine generates the collision mesh from the vertices of the display mesh. They are essentially equivalent.

I believe you are asking for the fine detailed LOD 0 collison mesh generated independent of the higher LODs display mesh (lower detail). So it will calculate the mesh data twice instead of reuse what has already been calculated.

That also means the collision is a different shape than what is shown in screen.

If the mesh data is known so it could have a collision mesh generated, then it could also be displayed.

Finally neither the engine nor Zylann's module is designed that way and it doesn't sound more efficient than what is currently done.

@Chlorobyte-but-real
Copy link
Author

Right, makes sense to me. Then, is it possible to force collision to be at a lower level near the character, e.g. LOD 1, so only the mesh exists in LOD 0? That would make generating terrain slightly faster and LOD 1->0 conversion, which would be pretty common, wouldn't cause gameplay issues.
This would not only fix a few issues, but would also make generating the block result in a larger area, making it more difficult to repeatedly trigger the force generation.

Side note: how much more RAM expensive is it to have both a mesh and a collider rather than just a mesh?

@TokisanGames
Copy link
Contributor

Side note: how much more RAM expensive is it to have both a mesh and a collider rather than just a mesh?

Marginal. You can look yourself by using the monitor inside Godot and enabling it disabling collision.

Only LOD0 should exist around the character at all times. The collision LOD should be identical to the mesh at all times. If you can run quickly outside of the LOD0 sphere and have other terrain sections be updated before the one your character is on, that is a scheduling issue that could probably be worked on. Perhaps it could track the velocity of the viewer path and prebuild in that direction.

I don't see how forcing LOD1 around the character or forcing collision and the display mesh to be different LODs is more efficient or provides a better player experience. Both require dual generation of the section in LOD1 and 0 when only LOD0 should be generated, once.

@Chlorobyte-but-real
Copy link
Author

Chlorobyte-but-real commented Nov 20, 2019

Both require dual generation of the section in LOD1 and 0 when only LOD0 should be generated, once.

Perhaps when LOD 1 collision is generated, it stores it in a way that makes it easy to transfer into LOD 0 with the quality of LOD 1. I'm not exactly sure about the specifics, but I'm aware of the blocks having a different size.
As for advantages, for one, this prevents issues just in case despite all the possible improvements, the player still manages to escape LOD 0.
And collision is expensive. Making it lower detailed would improve performance, especially with how LOD 0 just seems to add a few extra details. Unless the physics engine is already good at handling lots of tiny triangles.

@Zylann
Copy link
Owner

Zylann commented Jan 11, 2020

More control over which block LODs produce collision meshes is expected. However prioritization isn't really a thing right now, because there is nothing to prioritize them over from (i.e visual mesh and collision mesh are the same mesh, and if you can see a high-res mesh, the low-res mesh is always available already).
If a player manages to escape LOD0 then it has to teleport back or freeze, I don't see any other way. In a streaming world where chunks are loaded asynchronously, you can only tune allowed speeds, improve performance and keep timings in check (or, make your world smaller and fixed-size so it can be computed in advance if it fits all in memory).

@TokisanGames
Copy link
Contributor

Everytime I load up my noise_smooth_lod demo I see several LODs load around the player, before LOD0. Currently it takes up to 5s for several LODs to be built before LOD0 is finally finished. That looks like room for improvement. We only need LOD0 around the player.

What it looks like is the entire landscape is built at LOD5 (say). Then a smaller sphere is rebuilt at LOD4, then a smaller at LOD3, and so on until LOD0 is finally built around the player.

IMO it should be reversed, and the extra, not displayed LODs not generated:

LOD0 sphere built out first. Then outside that sphere is LOD1 only, then LOD2.... If the player moves, new LOD0 blocks are built and old LOD0 blocks are replaced with LOD1, and other rippling effects. No building multiple LODs for the same location. No building other LODs or locations until LOD0 is built around the player.

Also you could:

  • prebuild non displayed LODs, but this should be a low priority background task
  • cache LOD noise data and query that for lower res LODs rather than generate new noise data.

@Zylann
Copy link
Owner

Zylann commented Jan 12, 2020

In the first implementation I attemtped to load all lods preemptively in concentric areas, much like VoxelTerrain, just with many areas instead of one. The result was very slow due to having to handle much more blocks at once (some of which you'd never see), and produced flickering holes in the world due to "taking shortcuts" and things running pretty much in parallel without constraints.

The second attempt is the octree we currently have, which works in a very similar way to what Google maps does (not really a coincidence :p). Following the constraints of octrees, it cannot subdivide if the future children are not ready, and cannot unsubdivide if the future parent is not ready (if it did, you'd get a hole). This provides the property that if a block is loaded, then blocks of lower LOD will be guaranteed to be loaded. This property is used in editing, because we know if a LOD0 block changes, it can ripple down to all lower lods without having to defer anything.
This also guarantees that at any point, due to the chosen split policy, blocks will only be bordered by neighbors of LOD-1, LOD+1, or equal LOD, but never beyond. That rule cannot be broken.

The downside is what you described: you can't tell the loader to load LOD0 in advance if LOD4 is not even loaded in the first place, because that's how octree traversal works. The look-ahead is only one level. The initial phase then looks slower than it could be. It could indeed be optimized by traversing it unrestricted, and then reversing lods if you don't move (like a preload phase). But the case where you move as blocks are loading must be considered. I may take an improvement pass at this system at some point, to see how it can be done in practice, and what the side effects are.

Note: this issue is then about two different topics so far:

  • Deciding which range of LOD is used for collision
  • Researching a way for the octree traversal to prioritize high LODs (or at least have better look-ahead).

@Chlorobyte-but-real
Copy link
Author

Chlorobyte-but-real commented Jan 12, 2020

Back to this, I guess gmail stopped providing me notifications. Thanks, gmail! The discussion about how higher LOD levels have to be loaded before LOD0 is interesting. What would happen if you queued the mesh generation and just traversed to LOD0 quickly, then put LOD0 generation in front?

Edit 1: I already do the freezing the player thing. It's just pretty annoying to get it in gameplay.

Edit 2: Yeah, I'm editing this a lot. I'm trying to get back to this conversation after the time that passed. Anyway, having LOD1 for collision and LOD0 for the visible mesh would probably help a lot. It would mean that we have to worry less about weaker devices. And besides, LOD1 would be easier to calculate collision with to begin with.

@Zylann
Copy link
Owner

Zylann commented Jan 12, 2020

What would happen if you queued the mesh generation and just traversed to LOD0 quickly, then put LOD0 generation in front?

I'm not sure to understand.

Edit 1: I already do the freezing the player thing. It's just pretty annoying to get it in gameplay.

This should be a last resort situation. If you get it often you need to improve the other things (generator speed / player speed limitation). Computer has limited resources. Even if LOD0 were prioritized (or if there was no other lods) it won't get rid of this. You could even be playing this over network, so it has to be expected.

Also note that if LOD priority is reversed making LOD0 be calculated first, but only LOD1 is used for collision, it means collision will actually arrive later than LOD0 (quite a mess to handle if you still want it before somehow). Choosing LOD1 over LOD0 for collision would only bring benefit with the current octree.

@Zylann
Copy link
Owner

Zylann commented Jan 13, 2020

I started prototyping an alternate octree implementation, along with a fake reproduction of the terrain loading behavior.
As I predicted, the initial loading does properly populate blocks starting from LOD0 and goes outwards with lower LODs... as long as you don't move.
This approach causes problems down the line.

For example, if you wait long enough for the world to load around you, then move fast, we could think the loader should still attempt to prioritize LOD0 as fast as it can around you. The result is, when looking at LOD grids, you would see a "trail" of LOD0 blocks following you, in the middle of a background composed mostly of LOD3/4 blocks.

image

It means those LOD0 blocks cannot be made visible straight away, because it would break the neighboring rule: seams between LOD0 and LOD3 are not supported. The lower LODs around must split first into higher LODs, until they can safely border LOD0... which brings us back to the original design: you need to load LOD3, then LOD2, then LOD1 before you can show LOD0, not the other way around. Physics doesn't support any seams so it could turn on straight away, but there isn't much to do about visuals. Worse, they would "feel" delayed due to LOD0 being prioritized for something invisible / yet to become visible (if it ever does). If the goal was only to have a small island of early LOD0 blocks around the player just for physics, it would not even need to change the current octree and be done separately (only viable if only this LOD has physics).

I haven't fully explored the possibilities, but so far it's not really working out.

@TokisanGames
Copy link
Contributor

:/ Well hopefully FastNoiseSIMD and Godot 4 visual server rewrite will improve things.

@Chlorobyte-but-real
Copy link
Author

Chlorobyte-but-real commented Jan 13, 2020

It means those LOD0 blocks cannot be made visible straight away, because it would break the neighboring rule: seams between LOD0 and LOD3 are not supported. The lower LODs around must split first into higher LODs, until they can safely border LOD0... which brings us back to the original design: you need to load LOD3, then LOD2, then LOD1 before you can show LOD0, not the other way around.

Interesting. Now, as for visuals, I wouldn't say they matter that much since the player is travelling so fast. Is it possible to then do that forced LOD3 - 0 generation but in reverse order? So, in terms of generating the chunks, only a single chunk of loading LOD0-3 would be prioritized. To make it clearer, 0-1-2-3, 0-1-2-3, 0-1-2-3. This prevents the player from getting really deep, reaching even higher LOD levels. Perhaps something could be done so that you can keep that speed longer before you eventually catch up, while still preventing getting really deep... I might have to experiment with a similar system later.

Also note that if LOD priority is reversed making LOD0 be calculated first, but only LOD1 is used for collision, it means collision will actually arrive later than LOD0 (quite a mess to handle if you still want it before somehow). Choosing LOD1 over LOD0 for collision would only bring benefit with the current octree.

Adding LOD1 collision to the mix, perhaps since LOD1 is used as collision, it technically doesn't matter if the player is on LOD0 or LOD1. So we'd only need to force loading LOD1, which is then followed by LOD0 purely for visuals. This would increase the amount of speed you would need to break the system. At that point, since the player is so fast, it probably wouldn't be much of a surprise for the game to have to pause in order to generate the terrain - getting there so quickly would probably be worth a whole loading screen.

So, combining these two ideas: instead of generating LOD3-2-1-0, generate it in order 1-2-3-[0]. 1-2-3 is for being able to go deep, which then is followed by 0's visuals which don't affect gameplay, but only if the player didn't attempt to move out of LOD1 again.

@Zylann
Copy link
Owner

Zylann commented Jan 13, 2020

Is it possible to then do that forced LOD3 - 0 generation but in reverse order?

Again, this assumes you start from nothing. If not, it would slow down visuals even more. You would not see anything anymore if a larger LOD has been generated beforehand in such area, since visuals work by splitting to respect the neighboring rule. Letting the octree split to its target shape so we know in advance what to actually load sounds interesting, and doing that in initialization phase would be a good way to speed up spawning, but I could not find a proper approach for the rest yet.

So far the only thing that sounds feasible is to have two sources of block schedulings: one driven by the octree as usual, and another driven by an area around the player at LOD0 or 1, for which the purpose is to provide early collision. And yet, if you still move fast enough to go past LOD0, it would have very little use and slow down the lower lods when flying far above the surface for example.

Things might sound simple but changing the way the current octree works ripples through a lot of edge cases and emergent behaviors. IMO the real bottlenecks currently are discarding colliders for lods above 0, 3D noise generators (GDScript ones being the worst), and the upload of meshes which are embarrassingly done on the main thread.

@Chlorobyte-but-real
Copy link
Author

I'm pretty much brainstorming ideas here. Actual implementation is different, of course.

So I'm going to assume that it attempts to connect the LODx chunk with the LODx+1 chunk by modifying the vertices at the edge. In that case, could a system be designed that it doesn't only handle LODx+1, but LODx+y? To prevent it from looking too ugly, the vertices would slowly pass from x to x+y, let's say one row at a time.

If this is done, a better implementation of prioritizing LOD0 while making it look as good as possible might work. Once again, I'm bringing up how it wouldn't be 0-0-0-0-1-1-1-1-2-2-2-2-3-3-3-3 in the scheduler, but rather 0-1-2-3-0-1-2-3-0-1-2-3-0-1-2-3. I marked the latest added "chunk" in bold. So it would generate LOD 0 first, going through and eventually finishing LOD 3, then working on the next forced LOD 0 chunk afterwards. If we can connect theoretically any low LOD with any high one, if the player is moving fast enough, it could be modified to become, for example, 0-1-3-0-1-3-0-1-3-0-1-3 instead, to skip generating an unnecessary LOD level, and only generate LOD 2 when it seems alright.
So the numbers feel like they are even more powerful, let's introduce LOD 4. The system, instead of 0-1-2-3-4-0-1-2-3-4, would instead go 0-2-4-0-2-4. Or, my case, all the way to LOD 6. 0-1-2-3-4-5-6-0-1-2-3-4-5-6 becomes 0-2-4-6-0-2-4-6, or probably 0-3-6-0-3-6.
But oops, I'm forgetting about the collider LOD system. So, within this system, the collider LOD (in this case LOD1) would work as "LOD0". To it (using my case, going to "LOD5"), it looks like it goes 0-1-2-3-4-5-0-1-2-3-4-5, and it optimizes it to 0-2-5-0-2-5. Using real LOD levels, this would become 1-3-6-1-3-6-0-0, as the visuals go last.

Alright, but what happens to the remaining LOD levels? Well, they also get added to the back, with only a slightly lower priority than the one that's generating collision and a "neat enough map". Let's say it becomes 1-3-6-1-3-6-2-4-5-2-4-5-0-0. Yeah, something like that should work.

Just for fun, let's describe this situation. Now all the 1-3-6s necessary for the "collision and a neat enough map" are complete. The game is generating the rest of the LOD levels. But oh no, we cut it in the middle. The queue is currently 5-2-4-5-0-0. Since it currently isn't generating a necessary LOD, it proceeds to add the levels like this: 1-3-6-5-2-4-5-2-4-5-0-0-0, and the game hopefully doesn't have to freeze until it gets generated.

Well, that ended up being REALLY long. But you get the idea.

@Zylann Zylann added enhancement Needs research Not trivial / needs prototyping / translating paper into code labels Jan 14, 2020
@Zylann
Copy link
Owner

Zylann commented Jan 14, 2020

Could a system be designed that it doesn't only handle LODx+1, but LODx+y?

That surely lifts the constraint, but VoxelMesherTransvoxel doesn't go further away. Complexity for transition mesh tables would become atrocious. VoxelMesherDMC kinda goes a bit further but is much slower, looks worse and is not even guaranteed to hide all cracks. Also you cannot just place a LOD0 mesh if a LOD3 mesh is present in the same area, they will overlap and z-fight. It has to be split first to make room for it.
I don't understand the rest, sorry...

I got new ideas since, still without really changing octree evaluation order:

Intermediary blocks could be skipped

Starting from a given octree shape, if we want to move somewhere else, we may have to transition it to a different ideal shape. Reaching such a shape requires to join or split one or more levels of nodes, with the usual split policy ensuring the bordering rule (i.e "distance"). This is currently done step by step, but some intermediary steps are not really required. Indeed, if we look at the visible LOD maps after loading is complete, we can see concentric rings of blocks converging to the viewer. All blocks located in the ring's "holes" are not needed in the final result, because that space is occuppied by the next LOD. Yet, they are loaded anyways, and they will eventually be used when we start moving. Most obvious one is LODmax. We will never see LODmax if we are within the region of the octree, so it's pointess to calculate it.

To ignore them, we may first evaluate the octree to split every node we need to split, without affecting blocks yet. Now we know what to load down to LOD0, we do a second pass in which we will trigger the loading of all nodes that aren't split. This way, only the absolutely required ones will be scheduled for loading, skipping the others. They will still be sorted based on distance and LOD.

I don't know if this solution is good yet, because it has two caveats:

  • It may cause much more blocks to be scheduled if we move fast, since we would ask for all of them from top to bottom, rather than the "top level only". i.e the "ideal octree shape" is changing faster than the efforts we make to reach it. This might be tunable by limiting how deep the look-ahead is done, with some formula based on speed and block latency.
  • Because we skip intermediaries, loaded blocks will be unable to become visible until the whole "ideal octree shape" has been fulfilled, as only the full result will satisfy the bordering rule. Which means it may load faster, but visuals will "pop in" abrubtly the faster we move. This sounds like a show-stopper so again it needs prototyping to know if it's actually viable.

Joining a LOD does not need to invoke the generator again

If we have 8 blocks of LODn that need to be joined into a lower LOD, we can simply produce LODn+1 by downscaling the 8 blocks using nearest-neighbor samplingm, using the data we already have. It works because that's what the generator would do anyways, and is even assumed by Transvoxel.
Another advantage is that we don't need to have LODn+1 if we edit LODn, because LODn+1 is derivable from it. One caveats of this approach is that I'm not sure if it would still happen in the loader thread (which currently doesn't have access to the map) or on the main thread, assuming the operation is fast enough.


Those ideas remain centered on the LOD octree system, but I still believe streaming speed isn't only related to it. It's a compromise with pros and cons, designed to work relatively well in both static and dynamic scenarios. As I said earlier, there are other obvious sources of lag which need to be addressed (and are easier to fix).

I'm writing it down here for later reference, because I'm slowing down my activity. I've been very active for weeks mostly due to Transvoxel, and this is my free time which I'd like to spend on other things :p

@Chlorobyte-but-real
Copy link
Author

Chlorobyte-but-real commented Jan 14, 2020

You cannot just place a LOD0 mesh if a LOD3 mesh is present in the same area, they will overlap and z-fight. It has to be split first to make room for it.

Indeed, it has to. I guess it's easier to just deactivate 8 corners than have lots of tiny ones, right? Or, actually, could a shader be used to simply disable parts of the world, rather than deactivating them through code? Though this would likely only work for LOD levels that are higher than the max collision LOD level.

Indeed, if we look at the LOD maps after loading is complete, we can see concentric rings of blocks converging to the viewer. All blocks located in the ring's "holes" are not needed in the final result, because that space is occuppied by the next LOD. Yet, they are loaded anyways. Most obvious one is LODmax. We will never see LODmax if we are within the region of the octree, so it's pointess to calculate it.

So I'm going to assume this is to do with how you see the high LOD before the low one loads below you. Indeed, it's pointless to calculate this. In this case, it is an advantage to reverse the LOD loading order, since the terrain under the player will appear first, then the rings around it afterwards.
But then again, we run into the issue of LOD0 touching LOD2. Now, since presumably LODs are generated around the camera (or, in my case, an object placed in front of the player's movement), perhaps it's possible to just modify the edges of LOD0 temporarily so it hides gaps, until LOD1 generates and then they kind of combine together. This would likely be a lot faster than trying to calculate LODx to LODx+y rather than using a more efficient LODx to LODx+1 algorithm.

Another advantage is that we don't need to have LODn+1 if we edit LODn, because LODn+1 is derivable from it. One caveats of this approach is that I'm not sure if it would still happen in the loader thread (which currently doesn't have access to the map) or on the main thread, assuming the operation is fast enough.

Could it somehow be calculated using the vertices then? But making the map threadsafe might be the option to go for here instead.

Those ideas remain centered on the LOD octree system, but I still believe streaming speed isn't only related to it. As I said earlier, there are other obvious sources of lag which need to be addressed.

Of course. Every bit helps, though.

@Zylann
Copy link
Owner

Zylann commented Jan 26, 2020

I just realized there is already a setting to turn off collision generation beyond a given LOD, which speeds up meshing. It's called collision_lod_count. If -1, collision meshes will generate for all blocks. If you set it to 1, only LOD0 will have collision. If you set it to 2, only LOD0 and LOD1 will have collision, etc.

I added a note in #107 regarding the neighboring rule. It might be doable to produce higher LOD first regardless as an option, at the expense of producing z-fighting/overlapping meshes.

@Chlorobyte-but-real
Copy link
Author

Chlorobyte-but-real commented Jan 26, 2020

I just realized there is already a setting to turn off collision generation beyond a given LOD, which speeds up meshing. It's called collision_lod_count. If -1, collision meshes will generate for all blocks. If you set it to 1, only LOD0 will have collision. If you set it to 2, only LOD0 and LOD1 will have collision, etc.

Yeah, I have this active. The point is to also add a minimum bound, so LOD0 collision won't get generated.

Anyway, had this thought a while ago. Thinking of neighboring, what if we had something like this: try to generate LOD0 for a given area, which then checks the neighboring chunks and if they are a LOD level higher than 1, force generate those first? If we're using chunks the same size as LOD0, this would only force LOD1 to generate, as that LOD1 block is fully surrounded by LOD2. To demonstrate what I mean: (X is the currently force generated chunk)
22222222 22222222 22222222 22222222
22222222 22222222 22222222 22222222
22111122 22111122 22111112 22111112
22100122 22100X22 22100X12 22100012
22100122 22100122 22100112 22100112
22111122 22111122 22111122 22111122
22222222 22222222 22222222 22222222
22222222 22222222 22222222 22222222

After the force LODs are dealt with, the typical generation can continue.

(I love editing just because I don't put 2 newlines after a quote...)

@Anyeos
Copy link

Anyeos commented Mar 27, 2020

I think it is a mess trying to solve a LOD problem in realtime. There are not enough computational power to generate a lot of LODs layers.
I suggest implement another LOD solution using the full resulting mesh and doing unsubdivision based on distance from viewer. More distance, more subdivision. And doing that directly over the resulting mesh (not collision) as a post processing function. Maybe it dont need to be implemented here but on the mesh object of Godot.
Anyway I really dont know if that will be faster. I actually skip the use of LOD and my terrain generation is faster than the LOD version of the same terrain.

@Zylann
Copy link
Owner

Zylann commented Mar 27, 2020

I think it is a mess trying to solve a LOD problem in realtime. There are not enough computational power to generate a lot of LODs layers.

It's complicated but it's manageable. The current system has still ways to improve. Other voxel engines do it just fine with a similar approach (UE4 Voxel Plugin)

I suggest implement another LOD solution using the full resulting mesh and doing unsubdivision based on distance from viewer.

But this is not even possible. And if it was, you'd start killing performance of the mesher, which would have to first generate a high-density mesh up to kilometers ahead, only to be decimated later, which is a waste. Right now it doesn't need to generate high-density immediately, and it doesn't need to decimate either.
Another reason it can't be done like this is because you'd need access to high-resolution voxels at any distance, which is the actual bottleneck at the moment.
Either you need to get rid of LOD completely and see only a few hundred meters ahead, or make your terrain fixed-size (and not too big).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Needs research Not trivial / needs prototyping / translating paper into code
Projects
None yet
Development

No branches or pull requests

4 participants