v1.1.0
Changelog:
- Simplified Haloloop
- Added P2P support on AMD GPUs
- Added marker API for profiling on NVIDIA and AMD GPUs
- Fixed blocksize in runfunctor
- Updated CMakeLists.txt
- Multi-RHS dslash improvements
- Fixed clang warnings
- Various bug fixes