Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Space partitioning optimization #3

Open
aevyrie opened this issue Jan 31, 2021 · 5 comments
Open

Space partitioning optimization #3

aevyrie opened this issue Jan 31, 2021 · 5 comments

Comments

@aevyrie
Copy link
Owner

aevyrie commented Jan 31, 2021

Further accelerate ray casting performance with some BVH/space partitioning

Thoughts on how to accomplish

Using bounding volumes, place entities into an octree. Maybe instead of a bounding sphere, find the oriented bounding box (OBB). This seems much better suited because we can simply apply the entity's GlobalTransform to the bounding box, and it will be exactly correct. We only need to recompute the OBB if the mesh changes. Checking if an OBB fits into an octree cell is trivial, we can just test each of the 8 vertices of the OBB to verify that they are within the 6 planes (faces) that bound a cell.

Placing entities in an octree

  1. Starting at the top level, recurse through the octree until the entity's bounding volume no longer fits into a node
  2. Place the entity into the last node it fit into
  3. Repeat for all entities (startup) and all changed entities (subsequent frames)

Ray casting into the octree

When casting a ray, now there will be two options - get all intersections, or just the topmost. If we are only finding the topmost intersection, we can speed up a ray cast by:

  1. Start in the topmost node that the ray's origin coordinate located in. (level 0)
  2. Recurse into the octree using the given coordinate, checking which child node it's located in, until the leaf node is reached. Record any entities that are found along the way.
  3. Check for intersection with all of the entities, first checking their boundary volumes before checking the meshes.
  4. Sort intersections and return the topmost intersection if one was found - all done!
  5. Go up a level at a time from the leaf node, until you find a node that has another (untested) child(ren)
  6. Test if the ray will intersect any of these child cells (up to 8-1 = 7) in this node. If there were no intersections, go to step 5.
  7. Go into the nearest intersected cell and repeat from step 2 using the intersection point plus some tiny epsilon in the ray direction, so we don't test cells we've already looked in.
@DGriffin91
Copy link

DGriffin91 commented Sep 11, 2022

@aevyrie I've been doing some raycasting on the CPU in bevy with the BVH crate: https://crates.io/crates/bvh, and it seems quite fast. I initially was only building a bvh for each entity mesh in local space, then iterating over each entity and checking them all. This was already very fast, but I recently transitioned to building a BVH in world space that contains the bounding boxes (generated from their AABB, but transformed into world space) for each entity and traversing that first, then traversing the individual mesh BVH (BLAS). Even in scenes with very few entities (under 10) this "scene BVH" (TLAS) is a significant improvement 20-50%, and scenes with around 100 entities are now around 10x faster.

I'm not sure anymore what the improvement over no BVH at all is, but currently on single thread of an intel 6700k in small scenes with around 100 entities, and 1500 triangles, I can cast around 10,000 rays per frame at 100fps, and in larger scenes with 200 objects and 5 million triangles I can cast about 1600 rays per frame.

For the game jam, I had to make a fork of bevy_editor_pls without picking, because it was making the editor unusable, which is what made me think of this.

Curious if you've looked into partitioning optimization any more, and what your thoughts are on using a BVH?

@aevyrie
Copy link
Owner Author

aevyrie commented Sep 13, 2022

I haven't worked on this, no. At this point, I assumed most people with the need for perf would graduate to using rapier anyway. It might be worth implementing though, for those who don't want to pull in all those deps.

Really appreciate this feedback, this is some great information. Thank you!

  1. How long does it take to build the mesh BVH? I could see this impacting startup times negatively, and this was one of the reasons I haven't tackled this. This could be done for tris in mesh space at compile time as a preprocessing step, potentially.
  2. How do you handle moving entities with a BVH, do you need to regularly rebuild the tree?
  3. Are you using an octree grid, or just grouping nearby objects?

I'd love to implement this if there is such a huge speedup on the table, especially seeing as bevy_editor_pls uses it.

@DGriffin91
Copy link

DGriffin91 commented Sep 13, 2022

In my implementation there are 2 separate BVH structures. One for meshes, and one for entity AABBs. Each individual mesh has a BVH built in local space at startup (BLAS). Then every frame a scene BVH (TLAS) is built, this only contains the world space bounding boxes of each entity made from their GlobalTransform. In the future, this step could take advantage of updating the BVH (also called optimization), updating the BVH can be much faster than building from scratch, given that the number of entities that need to be updated are a small proportion of the whole scene. The method I'm using for making the scene BVH AABBs could also probably be faster.

In one scene with 236 mesh entities and 5 million triangles it takes 9.5s to build the initial mesh BVHs, and 0.5ms each frame to rebuild the scene BVH.

In a smaller scene with 100 mesh entities and 20k triangles it takes 50ms to build the initial mesh BVHs, and 0.25ms each frame to rebuild the scene BVH.

Also note that in the scene with 5 million triangles every mesh/triangle is unique, so the BVH has to be built for everything, whereas a real game at this scale would likely have some form of instancing.

I could also potentially see building the initial mesh BVHs incrementally or in a separate thread. And picking just doesn't start working for a few seconds.

For clarity sake:

  1. 9.5 seconds for a large scene, 50ms for a small one.
  2. The scene BVH is currently rebuilt every frame (0.25-0.5ms)
  3. This grouping is just a second scene BVH rather than using some other method.

@aevyrie
Copy link
Owner Author

aevyrie commented Sep 14, 2022

Thanks for the details.

the BVH has to be built for everything, whereas a real game at this scale would likely have some form of instancing.

If the BVH is associated with the mesh Asset instead of every entity with a Handle<Mesh>, you should be able to have a single BVH per unique mesh, even if instanced rendering isn't implemented. This should significantly reduce memory use and reduce cache misses. This might have a significant impact on certain scenes, as you suggest.

I like the structure you laid out. I think I'll probably do the same. To recap:

scene
|-- BVH of all AABBs in the scene [Resource] [Every Frame]
    |-- AABB of a mesh [Entity] [BVH leaf]
        |-- BVH of mesh triangles in local space [Asset] [Precomputed]
            |-- Triangle [Indices] [BVH leaf]
            |-- Triangle [Indices] [BVH leaf]
    |-- AABB of a mesh [Entity]
        |-- BVH of mesh triangles in local space [Asset] [Precomputed]
            |-- ...
    |-- ...

One nice side effect of making the mesh BVH an asset, is that you can easily serialize the BVH, and load it the next time instead of calculating it from scratch every time. Maybe include a checksum so you can update it if the mesh changes.

@DGriffin91
Copy link

I think this is a good description. I like the idea of the BVH being an asset as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants