-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merge Array with/without Umpire #2427
Conversation
When Umpire and CUDA are enabled, needs to be able to run on GPU device (at least limited functionality). Would like to keep memory pool feature, to minimise allocation overhead.
Some compilers (GCC 11.1.0) don't seem to need it; some do (Github actions environment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
include/bout/array.hxx
Outdated
iterator<T> begin() const { return data; } | ||
iterator<T> end() const { return data + len; } | ||
int size() const { return len; } | ||
void operator=(ArrayData<T>& in) { std::copy(std::begin(in), std::end(in), begin()); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: operator=() should return ArrayData&
[misc-unconventional-assign-operator]
include/bout/array.hxx
Outdated
iterator<T> begin() const { return data; } | ||
iterator<T> end() const { return data + len; } | ||
int size() const { return len; } | ||
void operator=(ArrayData<T>& in) { std::copy(std::begin(in), std::end(in), begin()); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: operator=() should take ArrayData const&
, ArrayData&&
or ArrayData
[misc-unconventional-assign-operator]
include/bout/array.hxx
Outdated
data = new T[len]; | ||
counter = new int; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: assigning newly created gsl::owner<>
to non-owner int *
[cppcoreguidelines-owning-memory]
counter = new int;
^
include/bout/array.hxx
Outdated
#endif | ||
} | ||
|
||
BOUT_HOST_DEVICE ArrayData(const ArrayData& other) { | ||
/// Move. No need to update counter | ||
BOUT_HOST_DEVICE SharedArrayData(SharedArrayData&& other) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: move constructors should be marked noexcept [performance-noexcept-move-constructor]
BOUT_HOST_DEVICE SharedArrayData(SharedArrayData&& other) {
^
noexcept
include/bout/array.hxx
Outdated
|
||
BOUT_HOST_DEVICE inline ArrayData<T>& operator=(const ArrayData<T>& in) { | ||
/// Assign, deep copying the underlying data | ||
BOUT_HOST_DEVICE inline SharedArrayData<T>& operator=(const SharedArrayData<T>& in) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: operator=() does not handle self-assignment properly [bugprone-unhandled-self-assignment]
BOUT_HOST_DEVICE inline SharedArrayData<T>& operator=(const SharedArrayData<T>& in) {
^
include/bout/array.hxx
Outdated
/// Return the number of references to this data | ||
/// Name chosen to emulate std::shared_ptr<ArrayData> | ||
int use_count() { | ||
if (!counter) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: implicit conversion int *
-> bool [readability-implicit-bool-conversion]
if (!counter) {
~^
== nullptr
include/bout/array.hxx
Outdated
#else | ||
template <typename T, typename Backing = ArrayData<T>, typename CreationPolicy = CreateSharedPointer> | ||
#endif | ||
class Array : private CreationPolicy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: class Array
defines a non-default destructor, a copy constructor, a copy assignment operator and a move constructor but does not define a move assignment operator [cppcoreguidelines-special-member-functions]
class Array : private CreationPolicy {
^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
include/bout/array.hxx
Outdated
iterator<T> begin() const { return data; } | ||
iterator<T> end() const { return data + len; } | ||
int size() const { return len; } | ||
ArrayData<T>& operator=(const ArrayData<T>& in) { std::copy(std::begin(in), std::end(in), begin()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: operator=() does not handle self-assignment properly [bugprone-unhandled-self-assignment]
ArrayData<T>& operator=(const ArrayData<T>& in) { std::copy(std::begin(in), std::end(in), begin());
^
Previously void, now ArrayData<T>& Recommended by Clang Tidy
CMAKE_BUILD_TYPE "DEBUG" is much slower (order or orders of magnitude), so I would prefer that this was not the default out of the box.
Clang-tidy
Ambiguous overload with assignment operator
Compiles with nvcc on Cori, but segfault at runtime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
iterator<T> begin() const { return data; } | ||
iterator<T> end() const { return data + len; } | ||
int size() const { return len; } | ||
ArrayData<T>& operator=(const ArrayData<T>& in) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: operator=() does not handle self-assignment properly [bugprone-unhandled-self-assignment]
ArrayData<T>& operator=(const ArrayData<T>& in) {
^
include/bout/array.hxx
Outdated
|
||
BOUT_HOST_DEVICE inline T& operator[](int ind) { return data[ind]; }; | ||
BOUT_HOST_DEVICE inline const T& operator[](int ind) const { return data[ind]; }; | ||
T& operator[](int ind) { return data[ind]; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: do not use pointer arithmetic [cppcoreguidelines-pro-bounds-pointer-arithmetic]
T& operator[](int ind) { return data[ind]; }
^
By passing the Umpire allocator to `std::allocate_shared`, Umpire should be used for all internal `shared_ptr` storage. This should hopefully enable `std::shared_ptr` to be used on both CPU and GPU builds. Previous segfaults were due to conflating the data assignment operator with pointer assignment in the `SharedArrayData` class. This commit deletes that class and always uses `std::shared_ptr`. Includes some documentation, links.
Setting CMAKE_BUILD_TYPE here doesn't seem to affect the default build type.
This is here as a warning to others. Implemented custom allocator which calls Umpire 1. There is umpire::TypedAllocator which does this already 2. It doesn't work; using an Array in a RAJA loop fails. I suspect the copy constructor but am not certain.
Umpire already includes a C++ Allocator which can be passed to STL containers. Unfortunately there seem to be many gotchas (see https://umpire.readthedocs.io/en/develop/sphinx/tutorial/typed_allocators.html) Gives same error (unspecified launch failure) as previous allocator. [skip ci]
Ensuring that only a raw pointer which has been allocated with Umpire is passed into the RAJA lambda function.
These Arrays can't be passed into RAJA loops. Instead the raw data pointer must be extracted before the loop. This enables memory pools, shared pointers etc. to be used in CPU-only code.
Possible future improvements, for later/when have time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-tidy made some suggestions
BOUT_HOST_DEVICE inline T& operator[](int ind) { return data[ind]; }; | ||
BOUT_HOST_DEVICE inline const T& operator[](int ind) const { return data[ind]; }; | ||
|
||
inline T& operator[](int ind) { return data[ind]; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: do not use pointer arithmetic [cppcoreguidelines-pro-bounds-pointer-arithmetic]
inline T& operator[](int ind) { return data[ind]; }
^
BOUT_HOST_DEVICE inline const T& operator[](int ind) const { return data[ind]; }; | ||
|
||
inline T& operator[](int ind) { return data[ind]; } | ||
inline const T& operator[](int ind) const { return data[ind]; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warning: do not use pointer arithmetic [cppcoreguidelines-pro-bounds-pointer-arithmetic]
inline const T& operator[](int ind) const { return data[ind]; }
^
Now tentatively declaring victory: This version of
|
When Umpire and CUDA are enabled, needs to be able to run on GPU device (at least limited functionality).
Would like to keep memory pool feature, to minimise allocation overhead.
Working on possible performance issues.
elm-pb
Comparing
next
,next-hypre-outerloop-cuda-merged
and this branchmerged-array
, with default input from this branch,nout=1
andtimestep=0.2
next
2.000e-01 19 5.44e+02 99.6 0.3 0.0 0.0 0.1
Run time : 9 m 37 s
next-hypre-outerloop-cuda-merged
2.000e-01 19 3.41e+02 99.4 0.3 0.0 0.0 0.2
Run time : 6 m 1 s
merged-array
2.000e-01 19 5.51e+02 99.6 0.3 0.0 0.0 0.1
Run time : 9 m 37 s
With CMAKE_BUILD_TYPE=RELEASE:
2.000e-01 19 1.07e+01 96.2 1.7 0.1 0.4 1.6
Run time : 11 s
[elm-pb-outerloop gives
2.000e-01 19 7.14e+00 94.4 2.4 0.1 0.7 2.5
Run time : 8 ]
Note: elm-pb changed significantly between
master
andnext
, so can't compare directly. Instead useblob2d
.blob2d
master
2.000e+01 52 1.64e+00 61.6 17.4 0.1 1.2 19.7
Run time : 2 s
merged-array
2.000e+01 52 1.60e+02 97.1 2.1 0.0 0.0 0.8
Run time : 2 m 43 s
(100x slower??)
Both branches using check level 2 and
-g
flag.master
compiling with-O2
.With -DCMAKE_BUILD_TYPE=RELEASE (adds -O3)
2.000e+01 52 8.69e+00 91.4 4.2 0.0 0.3 4.0
Run time : 9 s
** Make this the CMAKE_BUILD_TYPE default? **
Same result when
array.hxx
is copied frommaster
into themerged-array
branch.=> Slowdown not related to changes to
Array
.** Why slowdown in
next
? **