Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Attempt to simplify RAJA loops #2430

Merged
merged 13 commits into from
Sep 21, 2021

Conversation

bendudson
Copy link
Contributor

@bendudson bendudson commented Sep 17, 2021

Wraps up RAJA index setup, and eliminates #ifdefs from user code. Including:

#include "bout/rajalib.hxx"

This enables the physics model/user to write:

BOUT_FOR_RAJA(i, region) {
   /* ... */
};   /// <- Note extra semicolon

If RAJA is disabled, BOUT_FOR_RAJA is just BOUT_FOR, and hopefully the redundant semicolon is ignored by the compiler.
If RAJA is enabled, the above expands to

RajaForAll(region) << [=] RAJA_DEVICE(int index) {
  /* ... */
};

The RajaForAll constructor does all the work to create an Array<int> of indices. RajaForAll defines a templated << operator:

struct RajaForAll {
  ...
  template<typename F>
  const RajaForAll& operator<<(F f) const {

which enables the lambda function to be passed in from the right (avoiding a closing bracket).

Currently implemented in examples/blob2d-outerloop, hasegawa-wakatani-3d and elm-pb-outerloop.

Unused options, cases.
Two alternative phi solvers, so mark the unused section as
ConditionallyUsed.
Defined a RajaWrapper, which accepts a lambda function
via an `operator<<`. The logic for handling indices is
moved inside this wrapper.

For now working in blob2d-outerloop; will move elsewhere
if successful.
Compiles, but encounters illegal memory access in CUDA
@bendudson bendudson added the work in progress Not ready for merging label Sep 17, 2021
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

examples/blob2d-outerloop/blob2d.cxx Outdated Show resolved Hide resolved
examples/blob2d-outerloop/blob2d.cxx Outdated Show resolved Hide resolved
Turns out illegal memory errors not to do with lambda function
funkiness, but just capturing class member variables.
Forgot to test that it worked before changing code...
@bendudson bendudson removed the work in progress Not ready for merging label Sep 17, 2021
f(_ob_i_ind_raw[id]);
});
}
// Note: This is private, but keyword not used due to CUDA limitation
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a (very) quick attempt at reproducing this in plain CUDA and haven't managed it. I'd quite like to get to the bottom of this before it gets into next -- is it a particular version of CUDA, RAJA, or some interaction with BOUT++?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is odd. The error is The enclosing parent function ("rhs") for an extended __device__ lambda cannot have private or protected access within its class. It's something to do with "extended" (device) lambdas in CUDA:
https://docs.nvidia.com/cuda/cuda-c-programming-guide/#extended-lambda
There's a comment here which linked to the docs NVIDIA/thrust#726 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the class can have private members, it's just that the function where the lambda function is defined can't be private or protected.

BOUT_FOR_RAJA can be used with or without RAJA; if RAJA is not
available (or DISABLE_RAJA is true) then it falls back to using
BOUT_FOR.

RajaForAll struct which wraps RAJA::forall only available if RAJA
is available. Could implement a fall-back which uses BOUT_FOR
if it turns out to be useful.
Now use the BOUT_FOR_RAJA macro. Needed to add an explicit
cast from `SpecificInd` to `int`, so that the index
could always be treated as an int for conversion to 2D index.

Tested on laptop (no RAJA), not yet on Cori (with RAJA, CUDA)
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

#if BOUT_HAS_RAJA && RUN_WITH_RAJA
#include "RAJA/RAJA.hpp" // using RAJA lib
#endif
#define DISABLE_RAJA 0 // Disable RAJA here for testing?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: macro DISABLE_RAJA used to declare a constant; consider using a constexpr constant [cppcoreguidelines-macro-usage]

#define DISABLE_RAJA 0  // Disable RAJA here for testing?
        ^

@@ -11,7 +11,7 @@
* Based on model code, Yining Qin update GPU RAJA code since 1117-2020
*******************************************************************************/

#define RUN_WITH_RAJA true // Use RAJA loops?
#define DISABLE_RAJA 0 // Turn off RAJA in this file?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: macro DISABLE_RAJA used to declare a constant; consider using a constexpr constant [cppcoreguidelines-macro-usage]

#define DISABLE_RAJA 0        // Turn off RAJA in this file?
        ^

#endif

#define RUN_WITH_RAJA 0
#define DISABLE_RAJA 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: macro DISABLE_RAJA used to declare a constant; consider using a constexpr constant [cppcoreguidelines-macro-usage]

#define DISABLE_RAJA 0
        ^

/// If no RAJA, BOUT_FOR_RAJA reverts to BOUT_FOR
/// Note: Redundant ';' after closing brace should be ignored by compiler
#define BOUT_FOR_RAJA(index, region) \
BOUT_FOR(index, region)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: function-like macro BOUT_FOR_RAJA used; consider a constexpr template function [cppcoreguidelines-macro-usage]

#define BOUT_FOR_RAJA(index, region) \
        ^

Compiles, runs on Cori with RAJA, CUDA
When using RAJA with CUDA, the function containing the RAJA lambda
function must be public, but the class can contain private members.
If class member variables are used inside a BOUT_FOR_RAJA loop,
this can be used to capture them as local variables. This avoids
needing to copy the member variables into local variables first.
Some unused options, incorrect capitalisation (All -> all),
and laplace solver was set to hypre3d which is not always
available.
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

#else
BOUT_FOR(i, region) {
#endif
auto _L_par = L_par;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: Value stored to _L_par during its initialization is never read [clang-analyzer-deadcode.DeadStores]

auto _L_par = L_par;
           ^
examples/blob2d-outerloop/blob2d.cxx:176:12: note: Value stored to '_L_par' during its initialization is never read

@@ -40,6 +40,18 @@
#define _fe_9(_call, x, ...) _call(x); BOUT_EXPAND(_fe_8(_call, __VA_ARGS__))
#define _fe_10(_call, x, ...) _call(x); BOUT_EXPAND(_fe_9(_call, __VA_ARGS__))

/// _ae_x set of macros expand a number of arguments with ',' between them
#define _ae_1(_call, x) _call(x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: function-like macro _ae_1 used; consider a constexpr template function [cppcoreguidelines-macro-usage]

#define _ae_1(_call, x) _call(x)
        ^

@@ -40,6 +40,18 @@
#define _fe_9(_call, x, ...) _call(x); BOUT_EXPAND(_fe_8(_call, __VA_ARGS__))
#define _fe_10(_call, x, ...) _call(x); BOUT_EXPAND(_fe_9(_call, __VA_ARGS__))

/// _ae_x set of macros expand a number of arguments with ',' between them
#define _ae_1(_call, x) _call(x)
#define _ae_2(_call, x, ...) _call(x), BOUT_EXPAND(_ae_1(_call, __VA_ARGS__))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: variadic macro _ae_2 used; consider using a constexpr variadic template function [cppcoreguidelines-macro-usage]

#define _ae_2(_call, x, ...) _call(x), BOUT_EXPAND(_ae_1(_call, __VA_ARGS__))
        ^

/// _ae_x set of macros expand a number of arguments with ',' between them
#define _ae_1(_call, x) _call(x)
#define _ae_2(_call, x, ...) _call(x), BOUT_EXPAND(_ae_1(_call, __VA_ARGS__))
#define _ae_3(_call, x, ...) _call(x), BOUT_EXPAND(_ae_2(_call, __VA_ARGS__))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: variadic macro _ae_3 used; consider using a constexpr variadic template function [cppcoreguidelines-macro-usage]

#define _ae_3(_call, x, ...) _call(x), BOUT_EXPAND(_ae_2(_call, __VA_ARGS__))
        ^

#define _ae_1(_call, x) _call(x)
#define _ae_2(_call, x, ...) _call(x), BOUT_EXPAND(_ae_1(_call, __VA_ARGS__))
#define _ae_3(_call, x, ...) _call(x), BOUT_EXPAND(_ae_2(_call, __VA_ARGS__))
#define _ae_4(_call, x, ...) _call(x), BOUT_EXPAND(_ae_3(_call, __VA_ARGS__))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: variadic macro _ae_4 used; consider using a constexpr variadic template function [cppcoreguidelines-macro-usage]

#define _ae_4(_call, x, ...) _call(x), BOUT_EXPAND(_ae_3(_call, __VA_ARGS__))
        ^

#define _ae_5(_call, x, ...) _call(x), BOUT_EXPAND(_ae_4(_call, __VA_ARGS__))
#define _ae_6(_call, x, ...) _call(x), BOUT_EXPAND(_ae_5(_call, __VA_ARGS__))
#define _ae_7(_call, x, ...) _call(x), BOUT_EXPAND(_ae_6(_call, __VA_ARGS__))
#define _ae_8(_call, x, ...) _call(x), BOUT_EXPAND(_ae_7(_call, __VA_ARGS__))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: variadic macro _ae_8 used; consider using a constexpr variadic template function [cppcoreguidelines-macro-usage]

#define _ae_8(_call, x, ...) _call(x), BOUT_EXPAND(_ae_7(_call, __VA_ARGS__))
        ^

#define _ae_6(_call, x, ...) _call(x), BOUT_EXPAND(_ae_5(_call, __VA_ARGS__))
#define _ae_7(_call, x, ...) _call(x), BOUT_EXPAND(_ae_6(_call, __VA_ARGS__))
#define _ae_8(_call, x, ...) _call(x), BOUT_EXPAND(_ae_7(_call, __VA_ARGS__))
#define _ae_9(_call, x, ...) _call(x), BOUT_EXPAND(_ae_8(_call, __VA_ARGS__))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: variadic macro _ae_9 used; consider using a constexpr variadic template function [cppcoreguidelines-macro-usage]

#define _ae_9(_call, x, ...) _call(x), BOUT_EXPAND(_ae_8(_call, __VA_ARGS__))
        ^

#define _ae_7(_call, x, ...) _call(x), BOUT_EXPAND(_ae_6(_call, __VA_ARGS__))
#define _ae_8(_call, x, ...) _call(x), BOUT_EXPAND(_ae_7(_call, __VA_ARGS__))
#define _ae_9(_call, x, ...) _call(x), BOUT_EXPAND(_ae_8(_call, __VA_ARGS__))
#define _ae_10(_call, x, ...) _call(x), BOUT_EXPAND(_ae_9(_call, __VA_ARGS__))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: variadic macro _ae_10 used; consider using a constexpr variadic template function [cppcoreguidelines-macro-usage]

#define _ae_10(_call, x, ...) _call(x), BOUT_EXPAND(_ae_9(_call, __VA_ARGS__))
        ^

/// expands to
///
/// test(a), test(b), test(c)
#define MACRO_FOR_EACH_ARG(arg, ...) \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: variadic macro MACRO_FOR_EACH_ARG used; consider using a constexpr variadic template function [cppcoreguidelines-macro-usage]

#define MACRO_FOR_EACH_ARG(arg, ...)                                      \
        ^

/// Note: Redundant ';' after closing brace should be ignored by compiler
/// Ignores any additional arguments
#define BOUT_FOR_RAJA(index, region, ...) \
BOUT_FOR(index, region)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: variadic macro BOUT_FOR_RAJA used; consider using a constexpr variadic template function [cppcoreguidelines-macro-usage]

#define BOUT_FOR_RAJA(index, region, ...) \
        ^

Previously copied into local scope, now in lambda capture list

[skip ci]
Explaining different ways to capture class member variables

[skip ci]
@bendudson bendudson merged commit cb9b3d9 into next-hypre-outerloop-cuda-merged Sep 21, 2021
@bendudson bendudson deleted the merged-bout-for-raja branch September 21, 2021 01:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants