A comprehensive CUDA-based benchmark comparing multiple Bounding Volume Hierarchy (BVH) construction algorithms with integrated GPU ray tracing for visualization and quality evaluation.
This project implements and benchmarks three GPU-accelerated BVH construction algorithms:
- LBVH (Linear BVH)
- LBVH+ (Enhanced LBVH)
- PLOC (Parallel Locally-Ordered Clustering)
All implementations are optimized for NVIDIA GPUs using CUDA and provide detailed performance metrics including build times, SAH cost analysis, and traversal statistics.
- Multiple Construction Algorithms: Compare LBVH, LBVH+, and PLOC with configurable parameters
- CUB Radix Sort: High-performance radix sort for Morton code ordering (LBVH)
- Treelet Optimization: Surface area heuristic-based optimization pass (LBVH+)
- Configurable PLOC Radius: Test different search radii (e.g., 10, 25, 100) for PLOC
- OBJ File Support: Load and process triangle meshes from OBJ files (triangles, quads, n-gons)
- Synthetic Mesh Generation: Create random triangle meshes for scalability testing
- BVH Export: Export constructed BVH to binary or OBJ format for external visualization
- CSV Export: Export detailed statistics for analysis
- Detailed Timing Breakdown: Per-stage timing for each construction phase
- SAH Cost Evaluation: Surface Area Heuristic cost calculation
- Tree Statistics: Node count, leaf count, max depth, average leaf depth
- Throughput Metrics: Million triangles per second
- Four Shading Modes:
- Normal: Surface normals mapped to RGB colors
- Depth: Grayscale depth buffer visualization
- Diffuse: Simple directional lighting with ambient occlusion
- Heatmap: BVH traversal cost visualization (nodes visited per ray)
- Flexible Camera Control: Automatic scene framing or manual camera positioning
- Traversal Statistics: Average/max nodes visited, AABB tests, triangle intersections
- PPM Image Output: Standard image format for easy viewing
- JSON Configuration: Define test suites with multiple models and algorithms
- Statistical Aggregation: Mean, standard deviation, min, max across iterations
- Warmup Iterations: Optional warmup pass before timing measurements
- Progress Tracking: Monitor batch execution progress
- Integrated Rendering: Optionally render images during batch tests
BVH/
├── include/ # Header files
│ ├── bvh_builder.h # Abstract BVH builder interface
│ ├── bvh_node.h # Unified BVH node structure
│ ├── common.h # CUDA utilities and error checking
│ ├── mesh.h # Structure-of-Arrays triangle mesh
│ ├── evaluator.h # SAH calculation and quality metrics
│ ├── render_engine.h # GPU ray tracing engine interface
│ ├── batch_config.h # Batch configuration structures
│ └── batch_runner.h # Batch test runner
├── src/
│ ├── cuda/ # CUDA implementations
│ │ ├── lbvh_builder.cu # LBVH with CUB radix sort
│ │ ├── lbvh_builder.cuh
│ │ ├── lbvh_plus_builder.cu # LBVH+ with Thrust and optimization
│ │ ├── lbvh_plus_builder.cuh
│ │ ├── ploc_builder.cu # PLOC algorithm
│ │ └── ploc_builder.cuh
│ ├── main.cpp # Main benchmark driver
│ ├── loader.cpp # OBJ loading and mesh generation
│ ├── evaluator.cpp # BVH quality evaluation
│ ├── batch_config.cpp # JSON configuration parsing
│ ├── batch_runner.cpp # Batch testing orchestration
│ └── render_engine.cu # GPU ray tracing implementation
├── models/ # OBJ test models
│ ├── buddha.obj
│ ├── conference.obj
│ ├── dragon.obj
│ ├── gallery.obj
│ ├── hairball.obj
│ ├── monkey.obj
│ ├── powerplant.obj
│ ├── sibnek.obj
│ └── sponza.obj
├── docs/ # Documentation
│ └── render_engine.md
├── batch_config.json # Example batch configuration
├── CMakeLists.txt # Build configuration
└── README.md # This file
- CUDA Toolkit 11.0 or later
- CMake 3.18 or later
- C++ Compiler with C++17 support
- NVIDIA GPU with compute capability 6.0 or higher
- CUB (included with CUDA Toolkit)
- Thrust (included with CUDA Toolkit)
git clone <repository-url>
cd BVHmkdir build
cd buildcmake ..Note: CMake will automatically detect your GPU architecture. To specify manually:
cmake -DCMAKE_CUDA_ARCHITECTURES=86 .. # For RTX 30-series
cmake -DCMAKE_CUDA_ARCHITECTURES=89 .. # For RTX 40-seriescmake --build . --config ReleaseOr on Linux/macOS:
make -j$(nproc)./bvh_benchmark [options] -i, --input <file> Load OBJ file
-n, --triangles <count> Generate N random triangles (default: 1000000)
-a, --algorithm <name> Run specific algorithm: lbvh, lbvh+, ploc, all
-r, --radius <value> Set search radius for PLOC (default: 25)
-o, --output <file> Export BVH to file (OBJ or binary)
-c, --colab-export Export as binary format
-l, --leaves-only Export only leaf bounding boxes
--csv-export <file> Export statistics to CSV file
--render <prefix> Render each BVH to <prefix>_<algo>.ppm
--render-size <WxH> Image resolution (default: 1024x768)
--shading <mode> Shading mode: normal|depth|diffuse|heatmap (default: normal)
--camera <ex,ey,ez,lx,ly,lz> Camera position and look-at point
--camera-up <ux,uy,uz> Camera up vector (default: 0,1,0)
--fov <degrees> Vertical field of view (default: 60)
--batch <config.json> Run batch tests from JSON configuration
--batch-quiet Suppress output except errors
-h, --help Show help message
Run all algorithms on 10 million random triangles:
./bvh_benchmark -n 10000000Load OBJ file and test specific algorithm:
./bvh_benchmark -i models/dragon.obj -a lbvh+Generate BVH and export for visualization:
./bvh_benchmark -n 1000000 -o bvh.bin -cCompare PLOC with custom radius:
./bvh_benchmark -n 5000000 -a ploc -r 30Render with heatmap shading to visualize BVH quality:
./bvh_benchmark -i models/sponza.obj -a all --render output --shading heatmapTest with custom camera and export statistics:
./bvh_benchmark -i models/buddha.obj -a lbvh+ \
--render buddha --camera 0,2,5,0,1,0 --fov 45 \
--csv-export results.csvRun comprehensive batch tests:
./bvh_benchmark --batch batch_config.jsonRun batch tests quietly (errors only):
./bvh_benchmark --batch batch_config.json --batch-quietExample batch_config.json:
{
"iterations": 5,
"warmup": true,
"quiet": false,
"output": "benchmark_results.csv",
"algorithms": ["lbvh", "lbvh+", "ploc"],
"ploc_radius": [10, 25, 100],
"models": [
{
"type": "obj",
"path": "models/dragon.obj",
"name": "Dragon"
},
{
"type": "random",
"triangles": 1000000,
"name": "Random 1M"
}
],
"render": {
"enabled": true,
"prefix": "batch_render",
"size": "1920x1080",
"shading": "heatmap"
}
}The benchmark provides detailed output for each algorithm:
Building BVH with LBVH...
Build time: 12.34 ms
Morton codes: 1.23 ms
Sorting: 4.56 ms
Hierarchy: 3.45 ms
AABB: 2.10 ms
Tree Statistics:
Total nodes: 1999998
Leaf nodes: 1000000
Internal nodes: 999998
Max depth: 42
Avg leaf depth: 28.3
SAH Cost: 2.45e+06
Throughput: 81.04 Mtris/s
Rendered images are saved as PPM files:
output_lbvh.ppm- LBVH renderoutput_lbvh+.ppm- LBVH+ renderoutput_ploc.ppm- PLOC render
Heatmap mode shows BVH traversal efficiency:
- Blue/Cool colors: Efficient (fewer nodes traversed)
- Red/Hot colors: Inefficient (many nodes traversed)
Statistics are exported in CSV format with columns:
- Model name
- Triangle count
- Algorithm name
- Build time (ms)
- SAH cost
- Total nodes
- Leaf nodes
- Max depth
- Average leaf depth
- Throughput (Mtris/s)
The project includes several test models in the models/ directory:
- buddha.obj - Stanford Buddha
- conference.obj - Conference room scene
- dragon.obj - Stanford Dragon
- gallery.obj - Art gallery interior
- hairball.obj - Complex hairball geometry
- monkey.obj - Suzanne monkey head
- powerplant.obj - Industrial powerplant scene
- sibnek.obj - Sibenik Cathedral
- sponza.obj - Crytek Sponza Atrium
Additional models can be downloaded from:
McGuire Computer Graphics Archive
This archive contains a wide variety of high-quality 3D models suitable for BVH testing, including San Miguel, Bistro, and many more.
Place downloaded OBJ files in the models/ directory.
| Algorithm | Sorting Library | Optimization | Typical Speed | Quality (SAH) |
|---|---|---|---|---|
| LBVH | CUB | None | Fastest | Good |
| LBVH+ | Thrust | Treelet | Fast | Better |
| PLOC | Thrust | Clustering | Moderate | Best |
LBVH is the fastest construction method, suitable for dynamic scenes requiring frequent BVH rebuilds.
LBVH+ adds a post-processing optimization pass that restructures small subtrees (treelets) using SAH, improving quality with minimal performance overhead.
PLOC produces the highest quality BVHs by clustering nearby primitives during construction. The search radius parameter controls the trade-off between build time and quality.
- Use LBVH for dynamic scenes where BVH needs frequent rebuilding
- Use LBVH+ for balanced performance when you need better quality without much slowdown
- Use PLOC for static scenes where build time is less critical than render performance
- Test different PLOC radii (e.g., 10, 25, 100) to find optimal trade-offs
- Use batch mode for comprehensive benchmarking across multiple models
- Use heatmap rendering to visualize BVH quality and identify problem areas
This project is for academic and research purposes.
This project implements algorithms from the following papers:
-
Tero Karras. "Maximizing parallelism in the construction of BVHs, octrees, and k-d trees." Proceedings of the Fourth ACM SIGGRAPH/Eurographics Conference on High-Performance Graphics, 2012, pp. 33-37.
https://doi.org/10.2312/EGGH/HPG12/033-037 -
Tero Karras and Timo Aila. "Fast parallel construction of high-quality bounding volume hierarchies." Proceedings of the 5th High-Performance Graphics Conference, 2013, pp. 89-99.
https://doi.org/10.1145/2492045.2492055 -
Daniel Meister and Jiří Bittner. "Parallel locally-ordered clustering for bounding volume hierarchy construction." IEEE Transactions on Visualization and Computer Graphics, vol. 24, no. 3, 2017, pp. 1345-1353.
https://doi.org/10.1109/TVCG.2017.2669983