Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change RAJA/Umpire/CUDA defines #2422

Merged
merged 16 commits into from
Sep 9, 2021

Conversation

bendudson
Copy link
Contributor

Changes BOUT_HAS_RAJA, BOUT_HAS_UMPIRE and BOUT_USE_CUDA to always be defined, but either 1 or 0. The effect is that #if should be used rather than #ifdef. This makes these switches the same as other similar switches in BOUT++.

Change from defined / not-defined, to defined as 1 or 0
(CMake #cmakedefine01). Added constexpr variables to
cast the value to bool.

Not tested with configure/make.
BOUT_HAS_RAJA in particular now always defined.
Should be 1 or 0 so can use `#if`
RAJA kernels can't be used in classes with private or protected
members. This seems to be a restriction coming from CUDA.

Note added to the manual.
All variables are captured by value in RAJA loops, and are const.
The idea is that pointers are not modified, only the data those
pointers point to.

ddt(FieldAccessor) therefore needs to take the FieldAccessor as const,
but return a non-const BoutRealArray reference. Somewhat gross
const_cast used here.
Copy link
Member

@ZedThree ZedThree left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests are failing for a couple of reasons:

  • configure.ac/autoconf_build_defines.hxx.in also need changing to always define these macros
  • array.hxx still has some #ifdefs instead of #if

You could (temporarily) add these new macros to bin/bout-v5-macro-upgrader.py and use that to fix pretty much all the locations at once

Not supporting RAJA/CUDA/Umpire/Caliper through configure,
so just define these as always off.
For RAJA, CUDA, UMPIRE and CALIPER, these are always defined now.

Still getting a strange error "Formal parameter space overflowed"
when compiling elm-pb-outerloop, which is not present in Eric's branch.
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

@@ -11,7 +11,7 @@
* Based on model code, Yining Qin update GPU RAJA code since 1117-2020
*******************************************************************************/

#define RUN_WITH_RAJA false // Use RAJA loops?
#define RUN_WITH_RAJA true // Use RAJA loops?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: macro RUN_WITH_RAJA used to declare a constant; consider using a constexpr constant [cppcoreguidelines-macro-usage]

#define RUN_WITH_RAJA true   // Use RAJA loops?
        ^

bendudson and others added 10 commits September 8, 2021 11:35
Unsure if previous syntax is working correctly
Adding these components is needed for Div_par, but makes
FieldAccessor too large. This results in compilation errors
when compiling elm-pb-outerloop on Cori
"Formal parameter space overflowed".

A way to reduce the size of FieldAccessor objects seems to be
needed.
Can use bout_add_example, but also needs to specify
that the file is CUDA language, so that it is passed to nvcc.
- Class member variables can't be accessed on the device, because `this`
  is captured by value, and is not a valid pointer on the GPU device.
  -> These must be copied to local variables

- The loop indices must also be copied to an array which can be accessed
  on the GPU device. This is done here by creating an Array, where the
  backing data is allocated using Umpire.
not used; came from typo in ddt(Jpar) equation.
Can rely on BOUT++ configuration without having to set flags.
Capturing `this` pointer leads to surprising results
and hard-to-debug memory errors on GPU machines without ATS.
Temporarily removing Div_par because adding the B field
components to FieldAccessor makes them too large for CUDA kernels.
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

// local scope
//

const auto _delta_i = delta_i;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: Value stored to _delta_i during its initialization is never read [clang-analyzer-deadcode.DeadStores]

const auto _delta_i = delta_i;
               ^
examples/elm-pb-outerloop/elm_pb_outerloop.cxx:1596:16: note: Value stored to '_delta_i' during its initialization is never read


const auto _delta_i = delta_i;
const auto _hyperresist = hyperresist;
const auto _relax_j_tconst = relax_j_tconst;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: Value stored to _relax_j_tconst during its initialization is never read [clang-analyzer-deadcode.DeadStores]

const auto _relax_j_tconst = relax_j_tconst;
               ^
examples/elm-pb-outerloop/elm_pb_outerloop.cxx:1598:16: note: Value stored to '_relax_j_tconst' during its initialization is never read

const auto _delta_i = delta_i;
const auto _hyperresist = hyperresist;
const auto _relax_j_tconst = relax_j_tconst;
const auto _dnorm = dnorm;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: Value stored to _dnorm during its initialization is never read [clang-analyzer-deadcode.DeadStores]

const auto _dnorm = dnorm;
               ^
examples/elm-pb-outerloop/elm_pb_outerloop.cxx:1599:16: note: Value stored to '_dnorm' during its initialization is never read

const auto _hyperresist = hyperresist;
const auto _relax_j_tconst = relax_j_tconst;
const auto _dnorm = dnorm;
const auto _ehyperviscos = ehyperviscos;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: Value stored to _ehyperviscos during its initialization is never read [clang-analyzer-deadcode.DeadStores]

const auto _ehyperviscos = ehyperviscos;
               ^
examples/elm-pb-outerloop/elm_pb_outerloop.cxx:1600:16: note: Value stored to '_ehyperviscos' during its initialization is never read

BOUT_HOST_DEVICE inline BoutRealArray& ddt(const FieldAccessor<location, FieldType> &fa) {
// Note: FieldAccessor captured by value is const in RAJA kernel.
// Need to cast to non-const so that ddt() data can be assigned to
return const_cast<BoutRealArray&>(fa.ddt);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

warning: do not use const_cast [cppcoreguidelines-pro-type-const-cast]

return const_cast<BoutRealArray&>(fa.ddt);
       ^

@bendudson bendudson merged commit 7968eab into next-hypre-outerloop-cuda-merged Sep 9, 2021
@bendudson bendudson deleted the merged-cmake-defines branch September 9, 2021 00:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants