-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Attempt to simplify RAJA loops #2430
Attempt to simplify RAJA loops #2430
Conversation
Unused options, cases. Two alternative phi solvers, so mark the unused section as ConditionallyUsed.
Defined a RajaWrapper, which accepts a lambda function via an `operator<<`. The logic for handling indices is moved inside this wrapper. For now working in blob2d-outerloop; will move elsewhere if successful.
Compiles, but encounters illegal memory access in CUDA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
Turns out illegal memory errors not to do with lambda function funkiness, but just capturing class member variables. Forgot to test that it worked before changing code...
examples/blob2d-outerloop/blob2d.cxx
Outdated
f(_ob_i_ind_raw[id]); | ||
}); | ||
} | ||
// Note: This is private, but keyword not used due to CUDA limitation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've had a (very) quick attempt at reproducing this in plain CUDA and haven't managed it. I'd quite like to get to the bottom of this before it gets into next
-- is it a particular version of CUDA, RAJA, or some interaction with BOUT++?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is odd. The error is The enclosing parent function ("rhs") for an extended __device__ lambda cannot have private or protected access within its class
. It's something to do with "extended" (device) lambdas in CUDA:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/#extended-lambda
There's a comment here which linked to the docs NVIDIA/thrust#726 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that the class can have private members, it's just that the function where the lambda function is defined can't be private or protected.
BOUT_FOR_RAJA can be used with or without RAJA; if RAJA is not available (or DISABLE_RAJA is true) then it falls back to using BOUT_FOR. RajaForAll struct which wraps RAJA::forall only available if RAJA is available. Could implement a fall-back which uses BOUT_FOR if it turns out to be useful.
Now use the BOUT_FOR_RAJA macro. Needed to add an explicit cast from `SpecificInd` to `int`, so that the index could always be treated as an int for conversion to 2D index. Tested on laptop (no RAJA), not yet on Cori (with RAJA, CUDA)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
#if BOUT_HAS_RAJA && RUN_WITH_RAJA | ||
#include "RAJA/RAJA.hpp" // using RAJA lib | ||
#endif | ||
#define DISABLE_RAJA 0 // Disable RAJA here for testing? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: macro DISABLE_RAJA
used to declare a constant; consider using a constexpr
constant [cppcoreguidelines-macro-usage]
#define DISABLE_RAJA 0 // Disable RAJA here for testing?
^
@@ -11,7 +11,7 @@ | |||
* Based on model code, Yining Qin update GPU RAJA code since 1117-2020 | |||
*******************************************************************************/ | |||
|
|||
#define RUN_WITH_RAJA true // Use RAJA loops? | |||
#define DISABLE_RAJA 0 // Turn off RAJA in this file? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: macro DISABLE_RAJA
used to declare a constant; consider using a constexpr
constant [cppcoreguidelines-macro-usage]
#define DISABLE_RAJA 0 // Turn off RAJA in this file?
^
#endif | ||
|
||
#define RUN_WITH_RAJA 0 | ||
#define DISABLE_RAJA 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: macro DISABLE_RAJA
used to declare a constant; consider using a constexpr
constant [cppcoreguidelines-macro-usage]
#define DISABLE_RAJA 0
^
/// If no RAJA, BOUT_FOR_RAJA reverts to BOUT_FOR | ||
/// Note: Redundant ';' after closing brace should be ignored by compiler | ||
#define BOUT_FOR_RAJA(index, region) \ | ||
BOUT_FOR(index, region) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: function-like macro BOUT_FOR_RAJA
used; consider a constexpr
template function [cppcoreguidelines-macro-usage]
#define BOUT_FOR_RAJA(index, region) \
^
Compiles, runs on Cori with RAJA, CUDA
When using RAJA with CUDA, the function containing the RAJA lambda function must be public, but the class can contain private members.
If class member variables are used inside a BOUT_FOR_RAJA loop, this can be used to capture them as local variables. This avoids needing to copy the member variables into local variables first.
Some unused options, incorrect capitalisation (All -> all), and laplace solver was set to hypre3d which is not always available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
examples/blob2d-outerloop/blob2d.cxx
Outdated
#else | ||
BOUT_FOR(i, region) { | ||
#endif | ||
auto _L_par = L_par; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: Value stored to _L_par
during its initialization is never read [clang-analyzer-deadcode.DeadStores]
auto _L_par = L_par;
^
examples/blob2d-outerloop/blob2d.cxx:176:12: note: Value stored to '_L_par' during its initialization is never read
@@ -40,6 +40,18 @@ | |||
#define _fe_9(_call, x, ...) _call(x); BOUT_EXPAND(_fe_8(_call, __VA_ARGS__)) | |||
#define _fe_10(_call, x, ...) _call(x); BOUT_EXPAND(_fe_9(_call, __VA_ARGS__)) | |||
|
|||
/// _ae_x set of macros expand a number of arguments with ',' between them | |||
#define _ae_1(_call, x) _call(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: function-like macro _ae_1
used; consider a constexpr
template function [cppcoreguidelines-macro-usage]
#define _ae_1(_call, x) _call(x)
^
@@ -40,6 +40,18 @@ | |||
#define _fe_9(_call, x, ...) _call(x); BOUT_EXPAND(_fe_8(_call, __VA_ARGS__)) | |||
#define _fe_10(_call, x, ...) _call(x); BOUT_EXPAND(_fe_9(_call, __VA_ARGS__)) | |||
|
|||
/// _ae_x set of macros expand a number of arguments with ',' between them | |||
#define _ae_1(_call, x) _call(x) | |||
#define _ae_2(_call, x, ...) _call(x), BOUT_EXPAND(_ae_1(_call, __VA_ARGS__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: variadic macro _ae_2
used; consider using a constexpr
variadic template function [cppcoreguidelines-macro-usage]
#define _ae_2(_call, x, ...) _call(x), BOUT_EXPAND(_ae_1(_call, __VA_ARGS__))
^
/// _ae_x set of macros expand a number of arguments with ',' between them | ||
#define _ae_1(_call, x) _call(x) | ||
#define _ae_2(_call, x, ...) _call(x), BOUT_EXPAND(_ae_1(_call, __VA_ARGS__)) | ||
#define _ae_3(_call, x, ...) _call(x), BOUT_EXPAND(_ae_2(_call, __VA_ARGS__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: variadic macro _ae_3
used; consider using a constexpr
variadic template function [cppcoreguidelines-macro-usage]
#define _ae_3(_call, x, ...) _call(x), BOUT_EXPAND(_ae_2(_call, __VA_ARGS__))
^
#define _ae_1(_call, x) _call(x) | ||
#define _ae_2(_call, x, ...) _call(x), BOUT_EXPAND(_ae_1(_call, __VA_ARGS__)) | ||
#define _ae_3(_call, x, ...) _call(x), BOUT_EXPAND(_ae_2(_call, __VA_ARGS__)) | ||
#define _ae_4(_call, x, ...) _call(x), BOUT_EXPAND(_ae_3(_call, __VA_ARGS__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: variadic macro _ae_4
used; consider using a constexpr
variadic template function [cppcoreguidelines-macro-usage]
#define _ae_4(_call, x, ...) _call(x), BOUT_EXPAND(_ae_3(_call, __VA_ARGS__))
^
#define _ae_5(_call, x, ...) _call(x), BOUT_EXPAND(_ae_4(_call, __VA_ARGS__)) | ||
#define _ae_6(_call, x, ...) _call(x), BOUT_EXPAND(_ae_5(_call, __VA_ARGS__)) | ||
#define _ae_7(_call, x, ...) _call(x), BOUT_EXPAND(_ae_6(_call, __VA_ARGS__)) | ||
#define _ae_8(_call, x, ...) _call(x), BOUT_EXPAND(_ae_7(_call, __VA_ARGS__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: variadic macro _ae_8
used; consider using a constexpr
variadic template function [cppcoreguidelines-macro-usage]
#define _ae_8(_call, x, ...) _call(x), BOUT_EXPAND(_ae_7(_call, __VA_ARGS__))
^
#define _ae_6(_call, x, ...) _call(x), BOUT_EXPAND(_ae_5(_call, __VA_ARGS__)) | ||
#define _ae_7(_call, x, ...) _call(x), BOUT_EXPAND(_ae_6(_call, __VA_ARGS__)) | ||
#define _ae_8(_call, x, ...) _call(x), BOUT_EXPAND(_ae_7(_call, __VA_ARGS__)) | ||
#define _ae_9(_call, x, ...) _call(x), BOUT_EXPAND(_ae_8(_call, __VA_ARGS__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: variadic macro _ae_9
used; consider using a constexpr
variadic template function [cppcoreguidelines-macro-usage]
#define _ae_9(_call, x, ...) _call(x), BOUT_EXPAND(_ae_8(_call, __VA_ARGS__))
^
#define _ae_7(_call, x, ...) _call(x), BOUT_EXPAND(_ae_6(_call, __VA_ARGS__)) | ||
#define _ae_8(_call, x, ...) _call(x), BOUT_EXPAND(_ae_7(_call, __VA_ARGS__)) | ||
#define _ae_9(_call, x, ...) _call(x), BOUT_EXPAND(_ae_8(_call, __VA_ARGS__)) | ||
#define _ae_10(_call, x, ...) _call(x), BOUT_EXPAND(_ae_9(_call, __VA_ARGS__)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: variadic macro _ae_10
used; consider using a constexpr
variadic template function [cppcoreguidelines-macro-usage]
#define _ae_10(_call, x, ...) _call(x), BOUT_EXPAND(_ae_9(_call, __VA_ARGS__))
^
/// expands to | ||
/// | ||
/// test(a), test(b), test(c) | ||
#define MACRO_FOR_EACH_ARG(arg, ...) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: variadic macro MACRO_FOR_EACH_ARG
used; consider using a constexpr
variadic template function [cppcoreguidelines-macro-usage]
#define MACRO_FOR_EACH_ARG(arg, ...) \
^
/// Note: Redundant ';' after closing brace should be ignored by compiler | ||
/// Ignores any additional arguments | ||
#define BOUT_FOR_RAJA(index, region, ...) \ | ||
BOUT_FOR(index, region) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: variadic macro BOUT_FOR_RAJA
used; consider using a constexpr
variadic template function [cppcoreguidelines-macro-usage]
#define BOUT_FOR_RAJA(index, region, ...) \
^
Previously copied into local scope, now in lambda capture list [skip ci]
Explaining different ways to capture class member variables [skip ci]
Wraps up RAJA index setup, and eliminates #ifdefs from user code. Including:
This enables the physics model/user to write:
If RAJA is disabled,
BOUT_FOR_RAJA
is justBOUT_FOR
, and hopefully the redundant semicolon is ignored by the compiler.If RAJA is enabled, the above expands to
The
RajaForAll
constructor does all the work to create anArray<int>
of indices.RajaForAll
defines a templated<<
operator:which enables the lambda function to be passed in from the right (avoiding a closing bracket).
Currently implemented in
examples/blob2d-outerloop
,hasegawa-wakatani-3d
andelm-pb-outerloop
.