-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change RAJA/Umpire/CUDA defines #2422
Change RAJA/Umpire/CUDA defines #2422
Conversation
Change from defined / not-defined, to defined as 1 or 0 (CMake #cmakedefine01). Added constexpr variables to cast the value to bool. Not tested with configure/make.
BOUT_HAS_RAJA in particular now always defined. Should be 1 or 0 so can use `#if`
RAJA kernels can't be used in classes with private or protected members. This seems to be a restriction coming from CUDA. Note added to the manual.
All variables are captured by value in RAJA loops, and are const. The idea is that pointers are not modified, only the data those pointers point to. ddt(FieldAccessor) therefore needs to take the FieldAccessor as const, but return a non-const BoutRealArray reference. Somewhat gross const_cast used here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests are failing for a couple of reasons:
configure.ac
/autoconf_build_defines.hxx.in
also need changing to always define these macrosarray.hxx
still has some#ifdef
s instead of#if
You could (temporarily) add these new macros to bin/bout-v5-macro-upgrader.py
and use that to fix pretty much all the locations at once
Not supporting RAJA/CUDA/Umpire/Caliper through configure, so just define these as always off.
For RAJA, CUDA, UMPIRE and CALIPER, these are always defined now. Still getting a strange error "Formal parameter space overflowed" when compiling elm-pb-outerloop, which is not present in Eric's branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
@@ -11,7 +11,7 @@ | |||
* Based on model code, Yining Qin update GPU RAJA code since 1117-2020 | |||
*******************************************************************************/ | |||
|
|||
#define RUN_WITH_RAJA false // Use RAJA loops? | |||
#define RUN_WITH_RAJA true // Use RAJA loops? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: macro RUN_WITH_RAJA
used to declare a constant; consider using a constexpr
constant [cppcoreguidelines-macro-usage]
#define RUN_WITH_RAJA true // Use RAJA loops?
^
Unsure if previous syntax is working correctly
Adding these components is needed for Div_par, but makes FieldAccessor too large. This results in compilation errors when compiling elm-pb-outerloop on Cori "Formal parameter space overflowed". A way to reduce the size of FieldAccessor objects seems to be needed.
Can use bout_add_example, but also needs to specify that the file is CUDA language, so that it is passed to nvcc.
- Class member variables can't be accessed on the device, because `this` is captured by value, and is not a valid pointer on the GPU device. -> These must be copied to local variables - The loop indices must also be copied to an array which can be accessed on the GPU device. This is done here by creating an Array, where the backing data is allocated using Umpire.
not used; came from typo in ddt(Jpar) equation.
Can rely on BOUT++ configuration without having to set flags.
Capturing `this` pointer leads to surprising results and hard-to-debug memory errors on GPU machines without ATS.
Temporarily removing Div_par because adding the B field components to FieldAccessor makes them too large for CUDA kernels.
Work on elm-pb-outerloop for RAJA
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
// local scope | ||
// | ||
|
||
const auto _delta_i = delta_i; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: Value stored to _delta_i
during its initialization is never read [clang-analyzer-deadcode.DeadStores]
const auto _delta_i = delta_i;
^
examples/elm-pb-outerloop/elm_pb_outerloop.cxx:1596:16: note: Value stored to '_delta_i' during its initialization is never read
|
||
const auto _delta_i = delta_i; | ||
const auto _hyperresist = hyperresist; | ||
const auto _relax_j_tconst = relax_j_tconst; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: Value stored to _relax_j_tconst
during its initialization is never read [clang-analyzer-deadcode.DeadStores]
const auto _relax_j_tconst = relax_j_tconst;
^
examples/elm-pb-outerloop/elm_pb_outerloop.cxx:1598:16: note: Value stored to '_relax_j_tconst' during its initialization is never read
const auto _delta_i = delta_i; | ||
const auto _hyperresist = hyperresist; | ||
const auto _relax_j_tconst = relax_j_tconst; | ||
const auto _dnorm = dnorm; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: Value stored to _dnorm
during its initialization is never read [clang-analyzer-deadcode.DeadStores]
const auto _dnorm = dnorm;
^
examples/elm-pb-outerloop/elm_pb_outerloop.cxx:1599:16: note: Value stored to '_dnorm' during its initialization is never read
const auto _hyperresist = hyperresist; | ||
const auto _relax_j_tconst = relax_j_tconst; | ||
const auto _dnorm = dnorm; | ||
const auto _ehyperviscos = ehyperviscos; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: Value stored to _ehyperviscos
during its initialization is never read [clang-analyzer-deadcode.DeadStores]
const auto _ehyperviscos = ehyperviscos;
^
examples/elm-pb-outerloop/elm_pb_outerloop.cxx:1600:16: note: Value stored to '_ehyperviscos' during its initialization is never read
BOUT_HOST_DEVICE inline BoutRealArray& ddt(const FieldAccessor<location, FieldType> &fa) { | ||
// Note: FieldAccessor captured by value is const in RAJA kernel. | ||
// Need to cast to non-const so that ddt() data can be assigned to | ||
return const_cast<BoutRealArray&>(fa.ddt); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: do not use const_cast [cppcoreguidelines-pro-type-const-cast]
return const_cast<BoutRealArray&>(fa.ddt);
^
Changes
BOUT_HAS_RAJA
,BOUT_HAS_UMPIRE
andBOUT_USE_CUDA
to always be defined, but either 1 or 0. The effect is that#if
should be used rather than#ifdef
. This makes these switches the same as other similar switches in BOUT++.