Support GPU ray cast #3

Closed
erwincoumans opened this issue Jun 15, 2013 · 6 comments

@erwincoumans
Member

GPU accelerated ray casting should be added.

It can be used for picking, line-of-sight queries and graphics related queries (shadow generation, ray tracer etc). There is a basic demo as a starting point. It supports GPU ray cast against spheres. It needs to be extended to support all collision shapes.

@erwincoumans
Member Author

Latest trunk has GPU raycast for sphere and convex shapes. It can be used for picking and for basic raytracing. The App_BasicGpuDemo uses it for picking, and the App_Bullet3* demo includes a basic raytraced test.

@rtrius
Contributor

rtrius commented Feb 21, 2014

I'm starting to make some effort on accelerating the GPU raycaster in this branch
(work in progress, not ready for review):
https://github.com/rtrius/bullet3/tree/plbvh_raycast

Presently, the branch implements a Linear BVH that runs on the GPU.
Only the PairBench demo, without large AABBs, is currently supported.

Using the LBVH for the broadphase is ~1-2 ms slower than the 1-axis GPU SAP broadphase. However, the BVH construction algorithm is of extremely low quality and it has yet to be optimized, so there should be a large amount of room for improvement.

@erwincoumans
Member Author

That is very exciting! So you are building the tree on the GPU, and doing the traversal there as well?
Please keep me posted if you have something to merge, then I can review and pull/merge!

@rtrius
Contributor

rtrius commented Mar 14, 2014

The plbvh_raycast branch is now ready for evaluation.

The latest commit includes slides in the /docs folder that explain the LBVH construction. Additionally, both construction and traversal should be functional and reasonably optimized.
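
For readers unfamiliar with LBVH construction, the Morton-code stage can be sketched on the CPU as below. This is a minimal Python illustration and not the code in the branch: the 10-bits-per-axis quantization, the function names (`expand_bits`, `morton3d`, `common_prefix_length`), and the assumption that AABB centers are already normalized to [0, 1] are all choices made for this example.

```python
def expand_bits(v: int) -> int:
    """Spread the low 10 bits of v so that two zero bits separate each bit."""
    v &= 0x3FF
    v = (v | (v << 16)) & 0x030000FF
    v = (v | (v << 8)) & 0x0300F00F
    v = (v | (v << 4)) & 0x030C30C3
    v = (v | (v << 2)) & 0x09249249
    return v


def morton3d(x: float, y: float, z: float) -> int:
    """30-bit Morton code for a point with coordinates normalized to [0, 1]."""
    def quantize(f: float) -> int:
        return min(max(int(f * 1024.0), 0), 1023)

    return (
        (expand_bits(quantize(x)) << 2)
        | (expand_bits(quantize(y)) << 1)
        | expand_bits(quantize(z))
    )


def common_prefix_length(a: int, b: int) -> int:
    """Number of leading bits (out of 30) shared by two Morton codes."""
    return 30 - (a ^ b).bit_length()


# Hypothetical AABB centers, already normalized to the scene bounds.
centers = [(0.9, 0.9, 0.9), (0.1, 0.1, 0.1), (0.12, 0.1, 0.1), (0.5, 0.5, 0.5)]
codes = [morton3d(*c) for c in centers]

# Sorting leaves by Morton code places spatially close boxes next to each other.
order = sorted(range(len(centers)), key=lambda i: codes[i])
print(order)
```

A binary radix tree can then be built from the sorted codes by comparing the common prefix length of adjacent leaves: a long shared prefix means the two leaves belong deep in the same subtree.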

The raycaster is also accelerated using the BVH. With the acceleration, most of the time in the raycasting demo is spent in ray generation and CPU-GPU data transfer. However, ray-triangle mesh and ray-compound tests are still not implemented.

The current method for raycasting is:
- Update the BVH using AABBs from any broadphase
- BVH traversal generates {ray_index, rigid_index} pairs
- Sort the {ray_index, rigid_index} pairs by ray_index
- Launch the ray-rigid narrowphase/intersection kernel, 1 thread per ray
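
The steps above can be sketched on the CPU as follows. This is a simplified Python illustration under stated assumptions, not the OpenCL code in the branch: the candidate pairs are passed in by hand instead of coming from a BVH traversal, only spheres are handled, ray directions are assumed to be unit length, and the names `ray_sphere_t` and `cast_rays` are hypothetical.

```python
import math


def ray_sphere_t(origin, direction, center, radius):
    """Smallest t >= 0 with origin + t * direction on the sphere, or None.

    Assumes direction is unit length.
    """
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * o for d, o in zip(direction, oc))
    c = sum(o * o for o in oc) - radius * radius
    disc = b * b - c
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    t = -b - root
    if t < 0.0:
        t = -b + root  # ray starts inside the sphere
    return t if t >= 0.0 else None


def cast_rays(rays, spheres, pairs):
    """pairs: unsorted (ray_index, rigid_index) candidates from the BVH stage."""
    pairs = sorted(pairs)  # sort by ray_index (ties broken by rigid_index)

    # Detect the contiguous range of sorted pairs that belongs to each ray.
    first = [0] * len(rays)
    count = [0] * len(rays)
    for i, (ray_index, _) in enumerate(pairs):
        if count[ray_index] == 0:
            first[ray_index] = i
        count[ray_index] += 1

    # Narrowphase: on the GPU each iteration of this loop would be one thread.
    hits = []
    for ray_index, (origin, direction) in enumerate(rays):
        best = (math.inf, -1)  # (hit distance, rigid index); -1 means no hit
        for i in range(first[ray_index], first[ray_index] + count[ray_index]):
            rigid = pairs[i][1]
            center, radius = spheres[rigid]
            t = ray_sphere_t(origin, direction, center, radius)
            if t is not None and t < best[0]:
                best = (t, rigid)
        hits.append(best)
    return hits


rays = [((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)), ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0))]
spheres = [((5.0, 0.0, 0.0), 1.0), ((10.0, 0.0, 0.0), 1.0), ((0.0, 5.0, 0.0), 2.0)]
pairs = [(1, 2), (0, 1), (0, 0)]
hits = cast_rays(rays, spheres, pairs)
print(hits)
```

Sorting by ray index is what lets each ray read its candidates as one contiguous slice, so one thread per ray only needs a start offset and a count.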

In my tests, the Linear BVH performs better than the 1-axis SAP for dense scenes, but worse for sparse scenes. Both are slower than the grid broadphase, because the objects are of uniform size, which favors the grid. With 27000 AABBs, the break-even point where the PLBVH starts to outperform the 1-axis SAP in the PairBench is at about scale 76 (~5700 overlapping pairs).

Some timings for the PairBench with 30x30x30 == 27000 objects, Radeon HD 5850:

At scale: 20 (~226000 overlapping pairs)
    GPU Grid: 9.9 ms
    GPU 1-SAP LDS: 13 ms
    PLBVH: 10.1 ms (3 ms construction, 7 ms traversal)

At scale: 300 (~100 overlapping pairs)
    GPU Grid: 1.9 ms
    GPU 1-SAP LDS: 2.5 ms
    PLBVH: 4 ms (2.8 ms construction, 1.2 ms traversal)

Please let me know if there are any issues or places for improvement.

@erwincoumans
Member Author

Thanks a lot, that sounds good!
You provided the timings for finding the overlapping pairs, right?
Do you have any timings for the actual ray test? With the 1-SAP and Grid broadphases there is a noticeable slow-down when ray testing against many objects, because the ray test is brute force.
How do the timings of the ray test using the BVH compare?
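
To illustrate the difference: a brute-force ray test intersects every ray with every object, whereas a BVH traversal can reject whole groups of objects with a cheap ray-AABB test before any exact intersection is computed. A common way to do that test is the slab method, sketched below in Python. The function name and signature are hypothetical, and this is not taken from the Bullet sources.

```python
import math


def ray_hits_aabb(origin, direction, aabb_min, aabb_max, t_max=math.inf):
    """Slab test: does the ray segment [0, t_max] overlap the box?"""
    t_near, t_far = 0.0, t_max
    for o, d, lo, hi in zip(origin, direction, aabb_min, aabb_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0 = (lo - o) / d
        t1 = (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False  # the per-axis intervals no longer overlap
    return True


origin, direction = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
boxes = [
    ((1.0, -1.0, -1.0), (2.0, 1.0, 1.0)),    # straight ahead
    ((1.0, 2.0, 2.0), (2.0, 3.0, 3.0)),      # off to the side
    ((-2.0, -1.0, -1.0), (-1.0, 1.0, 1.0)),  # behind the ray origin
]
candidates = [i for i, (lo, hi) in enumerate(boxes)
              if ray_hits_aabb(origin, direction, lo, hi)]
print(candidates)
```

Only the boxes that pass this test need the more expensive ray-shape intersection.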

@rtrius
Contributor

rtrius commented Mar 15, 2014

Here are some timings for the RaycastShadowDemo with the simulation paused,
and the same GPU as above. (Edit: texture size is 512 x 512 == 262 144 rays.)

  • ms/frame includes ray generation, CPU-GPU transfer, and writing the result to OpenGL.
  • 'primary rays' and 'shadow rays' are the times spent executing the BVH/raycast kernels.

Raycast Scene 1

Timings for Scene 1 (5 x 50 x 5 = 1250 dynamic rigids):

  • Brute force: 716 ms/frame (294 ms primary rays, 313 ms shadow rays)
  • BVH: 120 ms/frame (25 ms primary rays, 26 ms shadow rays)

Timings with the same camera position as Scene 1, but fewer rigid bodies
(2 x 50 x 2 = 200 dynamic rigids):

  • Brute force: 211 ms/frame (48 ms primary rays, 50 ms shadow rays)
  • BVH: 106 ms/frame (14 ms primary rays, 13 ms shadow rays)

Raycast Scene 2

Timings for Scene 2 (20 x 20 x 20 = 8000 dynamic rigids):

  • Brute force: 4100 ms/frame (2100 ms primary rays, 1900 ms shadow rays)
  • BVH: 236 ms/frame (85 ms primary rays, 34 ms shadow rays)
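
As a quick sanity check on the numbers above, the speedups and the time spent outside the raycast kernels can be derived directly from the reported figures. The dictionary layout and the helper name `breakdown` below are made up for this calculation; only the millisecond values come from the measurements.

```python
# (ms/frame, ms primary rays, ms shadow rays), copied from the timings above.
timings_ms = {
    "Scene 1, 1250 rigids": {"brute": (716, 294, 313), "bvh": (120, 25, 26)},
    "Scene 1 camera, 200 rigids": {"brute": (211, 48, 50), "bvh": (106, 14, 13)},
    "Scene 2, 8000 rigids": {"brute": (4100, 2100, 1900), "bvh": (236, 85, 34)},
}


def breakdown(frame, primary, shadow):
    """Return (kernel time, everything else) for one measurement."""
    kernels = primary + shadow
    return kernels, frame - kernels


report = {}
for scene, t in timings_ms.items():
    brute_kernels, _ = breakdown(*t["brute"])
    bvh_kernels, bvh_rest = breakdown(*t["bvh"])
    report[scene] = {
        "frame_speedup": t["brute"][0] / t["bvh"][0],
        "kernel_speedup": brute_kernels / bvh_kernels,
        "bvh_non_kernel_ms": bvh_rest,
    }
    print(f"{scene}: frame x{report[scene]['frame_speedup']:.1f}, "
          f"kernels x{report[scene]['kernel_speedup']:.1f}, "
          f"{bvh_rest} ms/frame outside kernels with BVH")
```

With these inputs the frame-level speedup is roughly 6x for Scene 1 and about 17x for Scene 2, while the BVH runs still spend roughly 70 to 120 ms per frame outside the kernels. That is consistent with ray generation and CPU-GPU transfer dominating once the BVH is in use.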

Actual profiling output for 'Raycast Scene 2'

(The time for 'Root' over multiple frames varies from 221 ms to 244 ms):
Profiling: Root (total running time: 235.173 ms) ---
0 -- glFinish (0.00 %) :: 0.000 ms / frame (1 calls)
1 -- window->endRendering (0.22 %) :: 0.521 ms / frame (1 calls)
2 -- gui->draw (1.37 %) :: 3.215 ms / frame (1 calls)
3 -- renderScene (98.30 %) :: 231.175 ms / frame (1 calls)
4 -- texture (0.00 %) :: 0.000 ms / frame (0 calls)
Unaccounted: (0.111 %) :: 0.262 ms
...----------------------------------
...Profiling: renderScene (total running time: 231.175 ms) ---
...0 -- raytrace (99.98 %) :: 231.140 ms / frame (1 calls)
...Unaccounted: (0.015 %) :: 0.035 ms
......----------------------------------
......Profiling: raytrace (total running time: 231.140 ms) ---
......0 -- drawTexturedRect (0.29 %) :: 0.672 ms / frame (1 calls)
......1 -- glGetError (0.00 %) :: 0.001 ms / frame (2 calls)
......2 -- glTexImage2D (0.32 %) :: 0.739 ms / frame (1 calls)
......3 -- get error (0.00 %) :: 0.006 ms / frame (1 calls)
......4 -- write texels (7.16 %) :: 16.544 ms / frame (1 calls)
......5 -- cast shadow rays (19.94 %) :: 46.086 ms / frame (1 calls)
......6 -- init shadow rays (6.98 %) :: 16.132 ms / frame (1 calls)
......7 -- shadowHits.resize (5.88 %) :: 13.596 ms / frame (1 calls)
......8 -- shadowRays.resize (3.99 %) :: 9.223 ms / frame (1 calls)
......9 -- cast primary rays (39.27 %) :: 90.778 ms / frame (1 calls)
......10 -- init hits (1.56 %) :: 3.612 ms / frame (1 calls)
......11 -- hits.resize (3.72 %) :: 8.602 ms / frame (1 calls)
......12 -- Generate primary rays (7.90 %) :: 18.258 ms / frame (1 calls)
......13 -- readbackAllBodiesToCpu (0.47 %) :: 1.079 ms / frame (1 calls)
......14 -- update camera (0.00 %) :: 0.005 ms / frame (1 calls)
......Unaccounted: (2.512 %) :: 5.807 ms
.........----------------------------------
.........Profiling: cast shadow rays (total running time: 46.086 ms) ---
.........0 -- castRaysGPU (99.99 %) :: 46.082 ms / frame (1 calls)
.........Unaccounted: (0.009 %) :: 0.004 ms
............----------------------------------
............Profiling: castRaysGPU (total running time: 46.082 ms) ---
............0 -- raycast copyToHost (11.08 %) :: 5.108 ms / frame (1 calls)
............1 -- ray-rigid intersection (6.17 %) :: 2.845 ms / frame (1 calls)
............2 -- detect ray-rigid pair index ranges (9.96 %) :: 4.588 ms / frame (1 calls)
............3 -- sort ray-rigid pairs (0.26 %) :: 0.121 ms / frame (1 calls)
............4 -- PLBVH testRaysAgainstBvhAabbs() (46.53 %) :: 21.443 ms / frame (1 calls)
............5 -- b3ParallelLinearBvh::build() (6.45 %) :: 2.972 ms / frame (1 calls)
............6 -- raycast copyFromHost (19.11 %) :: 8.807 ms / frame (1 calls)
............Unaccounted: (0.430 %) :: 0.198 ms
...............----------------------------------
...............Profiling: detect ray-rigid pair index ranges (total running time: 4.588 ms) ---
...............0 -- reset ray-rigid pair index ranges (68.55 %) :: 3.145 ms / frame (1 calls)
...............Unaccounted: (31.452 %) :: 1.443 ms
...............----------------------------------
...............Profiling: PLBVH testRaysAgainstBvhAabbs() (total running time: 21.443 ms) ---
...............0 -- PLBVH ray test large AABB (19.36 %) :: 4.151 ms / frame (1 calls)
...............1 -- PLBVH ray test small AABB (78.53 %) :: 16.839 ms / frame (1 calls)
...............Unaccounted: (2.113 %) :: 0.453 ms
...............----------------------------------
...............Profiling: b3ParallelLinearBvh::build() (total running time: 2.972 ms) ---
...............0 -- m_findLeafIndexRangesKernel (4.41 %) :: 0.131 ms / frame (1 calls)
...............1 -- b3GpuParallelLinearBvh::constructBinaryRadixTree() (37.18 %) :: 1.105 ms / frame (1 calls)
...............2 -- Sort leaves by morton codes (23.82 %) :: 0.708 ms / frame (1 calls)
...............3 -- Assign morton codes (4.07 %) :: 0.121 ms / frame (1 calls)
...............4 -- Find AABB of merged nodes (9.35 %) :: 0.278 ms / frame (1 calls)
...............5 -- Separate large and small AABBs (20.83 %) :: 0.619 ms / frame (1 calls)
...............Unaccounted: (0.336 %) :: 0.010 ms
..................----------------------------------
..................Profiling: b3GpuParallelLinearBvh::constructBinaryRadixTree() (total running time: 1.105 ms) ---
..................0 -- m_buildBinaryRadixTreeAabbsRecursiveKernel (46.15 %) :: 0.510 ms / frame (1 calls)
..................1 -- m_findDistanceFromRootKernel (12.94 %) :: 0.143 ms / frame (1 calls)
..................2 -- m_buildBinaryRadixTreeInternalNodesKernel (19.00 %) :: 0.210 ms / frame (1 calls)
..................3 -- m_buildBinaryRadixTreeLeafNodesKernel (10.68 %) :: 0.118 ms / frame (1 calls)
..................4 -- m_computeAdjacentPairCommonPrefixKernel (11.04 %) :: 0.122 ms / frame (1 calls)
..................Unaccounted: (0.181 %) :: 0.002 ms
.....................----------------------------------
.....................Profiling: m_buildBinaryRadixTreeAabbsRecursiveKernel (total running time: 0.510 ms) ---
.....................0 -- copy maxDistanceFromRoot to CPU (30.39 %) :: 0.155 ms / frame (1 calls)
.....................Unaccounted: (69.608 %) :: 0.355 ms
.........----------------------------------
.........Profiling: cast primary rays (total running time: 90.778 ms) ---
.........0 -- castRaysGPU (100.00 %) :: 90.775 ms / frame (1 calls)
.........Unaccounted: (0.003 %) :: 0.003 ms
............----------------------------------
............Profiling: castRaysGPU (total running time: 90.775 ms) ---
............0 -- raycast copyToHost (5.46 %) :: 4.957 ms / frame (1 calls)
............1 -- ray-rigid intersection (7.01 %) :: 6.367 ms / frame (1 calls)
............2 -- detect ray-rigid pair index ranges (17.13 %) :: 15.552 ms / frame (1 calls)
............3 -- sort ray-rigid pairs (0.13 %) :: 0.122 ms / frame (1 calls)
............4 -- PLBVH testRaysAgainstBvhAabbs() (57.84 %) :: 52.507 ms / frame (1 calls)
............5 -- b3ParallelLinearBvh::build() (3.37 %) :: 3.059 ms / frame (1 calls)
............6 -- raycast copyFromHost (8.83 %) :: 8.014 ms / frame (1 calls)
............Unaccounted: (0.217 %) :: 0.197 ms
...............----------------------------------
...............Profiling: detect ray-rigid pair index ranges (total running time: 15.552 ms) ---
...............0 -- reset ray-rigid pair index ranges (49.18 %) :: 7.649 ms / frame (1 calls)
...............Unaccounted: (50.817 %) :: 7.903 ms
...............----------------------------------
...............Profiling: PLBVH testRaysAgainstBvhAabbs() (total running time: 52.507 ms) ---
...............0 -- PLBVH ray test large AABB (9.62 %) :: 5.053 ms / frame (1 calls)
...............1 -- PLBVH ray test small AABB (89.46 %) :: 46.975 ms / frame (1 calls)
...............Unaccounted: (0.912 %) :: 0.479 ms
...............----------------------------------
...............Profiling: b3ParallelLinearBvh::build() (total running time: 3.059 ms) ---
...............0 -- m_findLeafIndexRangesKernel (4.09 %) :: 0.125 ms / frame (1 calls)
...............1 -- b3GpuParallelLinearBvh::constructBinaryRadixTree() (36.81 %) :: 1.126 ms / frame (1 calls)
...............2 -- Sort leaves by morton codes (24.75 %) :: 0.757 ms / frame (1 calls)
...............3 -- Assign morton codes (3.89 %) :: 0.119 ms / frame (1 calls)
...............4 -- Find AABB of merged nodes (9.02 %) :: 0.276 ms / frame (1 calls)
...............5 -- Separate large and small AABBs (21.22 %) :: 0.649 ms / frame (1 calls)
...............Unaccounted: (0.229 %) :: 0.007 ms
..................----------------------------------
..................Profiling: b3GpuParallelLinearBvh::constructBinaryRadixTree() (total running time: 1.126 ms) ---
..................0 -- m_buildBinaryRadixTreeAabbsRecursiveKernel (46.80 %) :: 0.527 ms / frame (1 calls)
..................1 -- m_findDistanceFromRootKernel (12.70 %) :: 0.143 ms / frame (1 calls)
..................2 -- m_buildBinaryRadixTreeInternalNodesKernel (18.56 %) :: 0.209 ms / frame (1 calls)
..................3 -- m_buildBinaryRadixTreeLeafNodesKernel (10.66 %) :: 0.120 ms / frame (1 calls)
..................4 -- m_computeAdjacentPairCommonPrefixKernel (10.83 %) :: 0.122 ms / frame (1 calls)
..................Unaccounted: (0.444 %) :: 0.005 ms
.....................----------------------------------
.....................Profiling: m_buildBinaryRadixTreeAabbsRecursiveKernel (total running time: 0.527 ms) ---
.....................0 -- copy maxDistanceFromRoot to CPU (31.88 %) :: 0.168 ms / frame (1 calls)
.....................Unaccounted: (68.121 %) :: 0.359 ms
