This release contains new features, several notable changes, and some bug fixes.
Please download the RAJA-v0.10.0.tar.gz file below. The other archives will not work because of the way RAJA uses git submodules.
Notable changes include:
- Added CUDA block direct execution policies, which can be used to map loop iterations directly to CUDA thread blocks. These are analogous to the pre-existing thread direct policies. The new block direct policies can provide better performance for kernels than the block loop policies when load balancing may be an issue. Please see the RAJA User Guide for a description of all available RAJA execution policies.
- Added a plugin registry feature that allows plugins to be linked into RAJA and to act before and after kernel launches. One benefit of this is that RAJA no longer has an explicit CHAI dependency if RAJA is used with CHAI. Future benefits will include integration with other tools, for example for performance analysis.
- Added a shift method to RAJA::View, which allows one to create a new view object from an existing one that is shifted in index space from the original. Please see the RAJA User Guide for details.
- Added support for RAJA::TypedView and RAJA::TypedOffsetLayout, so that the index type can be specified as a template parameter.
- Added helper functions to convert a RAJA::Layout object to a RAJA::OffsetLayout object and RAJA::TypedLayout to RAJA::TypedOffsetLayout. Please see the RAJA User Guide for details.
- Added a bounds checking option to RAJA Layout types as a debugging feature. This is a compile-time option that will report user errors when given View or Layout indices are out-of-bounds. See the View/Layout section in the RAJA User Guide for instructions on enabling this and how this feature works.
- We've added a RAJA Template Project on GitHub, which shows how to use RAJA in an application, either as a Git submodule or as an externally installed library that you link your application against. It is available here: https://github.com/LLNL/RAJA-project-template. It is also linked from the main RAJA project page on GitHub.
- Various user documentation improvements.
- The type alias RAJA::IndexSet that was marked deprecated previously has been removed. Now, all index set usage must use the type RAJA::TypedIndexSet and specify all segment types (as template parameters) that the index set may potentially hold.
- Fixed an issue in the OpenMP target offload back-end that previously caused some RAJA Performance Suite kernels to seg fault when built with the XL compiler.
- Removed an internal RAJA class constructor to prevent users from doing things in their code that are technically not supported in RAJA, potentially incorrect, and very difficult to track down, such as inserting RAJA::statement::CudaSyncThreads() in arbitrary places inside a lambda expression.
- RAJA now enforces a minimum CUDA compute capability of sm_35. Users can set the CMake variable 'CUDA_ARCH' to specify the compute capability. If it is not specified, the value sm_35 will be used and an informational message will be emitted indicating this. If a user attempts to set a value lower than sm_35, CMake will error out and emit a message explaining why.
- Transitioned to using camp as a submodule after its open source release (https://github.com/llnl/camp).
- Made minimum required CMake version 3.9.
- Updated the BLT build system submodule to a newer version (SHA-1 hash: 96419df).
- Cleaned up compiler warnings in OpenMP target back-end implementation.
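
To illustrate the idea behind the view shift and bounds checking items above, here is a minimal, self-contained C++ sketch. It does not use RAJA and is not the RAJA API: the class `ShiftedView1D`, the macro `SKETCH_BOUNDS_CHECK`, and the function `sum_through_shifted_view` are hypothetical names chosen for illustration, and the sign convention of `shift` is an assumption. Please see the RAJA User Guide for the actual interfaces and semantics.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Hypothetical compile-time switch for this sketch only (not a RAJA option).
#ifndef SKETCH_BOUNDS_CHECK
#define SKETCH_BOUNDS_CHECK 1
#endif

// Hypothetical 1D view, NOT RAJA::View. It maps the logical index range
// [lo, lo + size) onto a zero-based array that it does not own.
class ShiftedView1D {
public:
  ShiftedView1D(double* data, long lo, long size)
      : data_(data), lo_(lo), size_(size) {}

  // Return a new view over the same data whose index space is moved by
  // `offset`. The original view is left unchanged.
  ShiftedView1D shift(long offset) const {
    return ShiftedView1D(data_, lo_ + offset, size_);
  }

  // Access element at logical index i.
  double& operator()(long i) const {
#if SKETCH_BOUNDS_CHECK
    // Debug-style check: report out-of-bounds indices instead of
    // silently reading or writing outside the array.
    if (i < lo_ || i >= lo_ + size_) {
      throw std::out_of_range("ShiftedView1D: index out of bounds");
    }
#endif
    return data_[i - lo_];
  }

private:
  double* data_;
  long lo_;
  long size_;
};

// Usage: write to a 4-element array through a view shifted to start at 10.
double sum_through_shifted_view() {
  std::vector<double> data(4, 0.0);
  ShiftedView1D view(data.data(), 0, 4);   // valid indices: 0..3
  ShiftedView1D shifted = view.shift(10);  // valid indices: 10..13
  for (long i = 10; i < 14; ++i) {
    shifted(i) = static_cast<double>(i);
  }
  return data[0] + data[1] + data[2] + data[3];
}
```

In this sketch the shifted view aliases the same storage as the original, so writes through either view are visible through the other. The bounds check is compiled in or out by the macro, which mirrors the general idea of a compile-time debugging option without reproducing how RAJA implements it.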