New features:
- Add coefficient of variance to bandwidth output statistics
- Add huge page support for host memory (disabled on Windows)
- Add option to sample pairs in device-to-device tests
- Add troubleshooting guide
- Unify multinode and single-node execution paths
Improvements:
- Improve CUDA architecture detection without requiring GPU access
- Deprecate Volta (sm_70/sm_72) support for CUDA toolkit >=13.0
Bug fixes:
- Fix JSON output aggregation
Platform:
- Skip Boost static libs on Azure Linux