-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement field::for_each capabilities #139
Conversation
Please review @odlomax @ytremolet |
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## develop #139 +/- ##
===========================================
+ Coverage 78.18% 78.22% +0.04%
===========================================
Files 817 802 -15
Lines 59325 59645 +320
===========================================
+ Hits 46385 46660 +275
- Misses 12940 12985 +45
☔ View full report in Codecov by Sentry. |
3260b3b
to
48e31b4
Compare
src/atlas/field/detail/for_each.h
Outdated
constexpr auto dims = std::make_index_sequence<Rank>(); | ||
if constexpr (valid_value_function<Function,Value>()) | ||
{ | ||
// Surely this can be improved to be more generic w.r.t. num_fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surely we can!
template <int FieldIdx, int NumFields, typename Value, int Rank, typename... Field>
auto make_view_tuple(std::tuple<Field...>&& fields) {
if constexpr (FieldIdx < NumFields) {
return std::tuple_cat(std::make_tuple(
array::make_view<Value, Rank>(std::get<FieldIdx>(fields))),
make_view_tuple<FieldIdx + 1, NumFields, Value, Rank>(std::move(fields)));
} else {
return std::make_tuple();
}
}
template <typename Value, int Rank, typename Config, typename Mask, typename... Field, typename Function>
void for_each_value_masked_rank(const Config& config, const Mask& mask, std::tuple<Field...>&& fields, const Function& function ) {
constexpr auto num_fields = std::tuple_size_v<std::tuple<Field...>>;
constexpr auto dims = std::make_index_sequence<Rank>();
if constexpr (valid_value_function<Function,Value>())
{
// Surely this can be improved to be more generic w.r.t. num_fields
// We have the technology!
if constexpr (num_fields <= 4) {
for_each_value_masked_view(dims, config, mask, function, make_view_tuple<0, num_fields, Value, Rank>(std::move(fields)));
return;
}
ATLAS_THROW_EXCEPTION("Unsupported number of arguments in function");
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here I thought we'd seen the last of tuple_cats 😉
Thanks for your suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's my new favourite thing!
I don't think we need to pass NumFields
into the make_view_tuple
function, we can just use the size of the fields tuple.
Also, if we want to be really clever, I think we can pass in Function
as a template parameter and use your template kung fu (function_traits
) to get the Value
type from each of the functor args.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One problem with the function_traits is that we cannot use auto
arguments in a "Function" lambda. This is related to runtime dispatch of fields based on ranks and data types, and it cannot infer this from auto
.
So one has to be explicit w.r.t. data types and ranks in a view, and has to match what is encoded in the Field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to wrap my head around this. You'd basically end up with thousands of compiled loop functions if you tried to dispatch every template combination. I interpreted your non-generic lambdas as trying to sneak in the template arguments through the back door ;-)
|
||
static constexpr std::size_t arity = sizeof...(Arguments); | ||
}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took me a while to get my head around this. I am in awe.
I am getting build failures when building against oops (JCSDA) with this: In file included from /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/array/ArrayView.h:18:0, |
@DJDavies2 Thanks for verifying compilation. You have observed this in multiple PRs now, so could you see if it is triggered in develop branch as well? If so create a GitHub issue specifically for the develop branch. Please then mention the compiler, version, and compile line including include directories and flags. I have added to develop branch the public propagation of the -std=c++17 flag to downstream projects, as I believe that's what is missing. A reintegration of this branch on develop should then hopefully fix this. |
48e31b4
to
9e19a7a
Compare
@ytremolet when you have time could you comment? Otherwise I'll just merge this in soon. |
I have tested against head of develop and it seems to work now, there are been changes updating to std=c++17 in some downstream repositories so that has probably fixed this issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I can review all the advanced C++. As others I am in awe...
So I mostly looked at the test and how this would be used. It looks great, I think we are going to get good use of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should a test with a non-default execution policy be added? I'd like to see that for the for_each_column
, for example the tests starting at line 425 and 483.
I have added performance tests with OMP_NUM_THREADS=4 with following output:
The difference between normal for-loops and the field::for_each is pretty dramatic (field::for_each is factor 20-24 slower)! This should foremost not stop using this capability. |
The new test fails when I run it: Running case 2: test field::for_each_value_masked; horizontal_dimension {0,2} mask_rank1 ... and similar for rank2. |
…each * origin/develop: Workaround compiler bug in nvhpc-22.11-release build Update GHA nvidia-21.9 to nvidia-22.11 Avoid and suppress some compiler warnings with nvhpc Fix bug where DelaunayMeshGenerator with 1 partition was not setting the grid in the mesh (#143) Use Eigen 3.4 Disable floating point signals for tests on nvidia Add nvidia compiler specific HPC build config
@DJDavies2 Is this with gnu 7.3? Which build-type? (You could print output of This failure should not have anything to do with the new test. Not sure how this occurs. It seems to pass on so many other compilers and build-types, including gnu 7.5. @odlomax could you perhaps try to investigate or debug this on this particular configuration? |
Yes, this is gnu 7.3. |
I'm just having a look. Will get back to you later today. |
I might have found a bug. If I can confirm it, shall I PR a solution into this branch? |
I think I've fixed it #145 tl;dr: GCC 7.3 is old and rubbish. |
With new changes from d97f973 the field::for_each_value timings did not seem to improve:
|
Strange. I got the following when I ran the test in release:
Admittedly, there's a fair bit of variation between runs. |
Apologies @odlomax I indeed do recover the raw for loop performance when I have optimisations on. I had hacked my compile flags in my build and forgot to reset them. |
@DJDavies2 Please let me know if you still have issues with updated branch, and thanks once more for spotting this! |
The test works now, thanks. |
Great success! |
Finally merged. Thanks all your help! |
* develop: (56 commits) Implement field::for_each capabilities (ecmwf#139) Avoid harmless FE_INVALID with nvhpc build Avoid harmless FE_INVALID with nvhpc build Remove ATLAS_FPE=0 environment variable as not needed anymore now Avoid harmless FE_DIVBYZERO with nvhpc build Optimize modules and dependencies for caching Add HPC build options matrix Workaround compiler bug in nvhpc-22.11-release build Update GHA nvidia-21.9 to nvidia-22.11 Avoid and suppress some compiler warnings with nvhpc Fix bug where DelaunayMeshGenerator with 1 partition was not setting the grid in the mesh (ecmwf#143) Use Eigen 3.4 Disable floating point signals for tests on nvidia Add nvidia compiler specific HPC build config Add function to build mesh from imported connectivity data (ecmwf#135) Disable GHA "linux gnu-12" OpenMP for CXX as "omp.h" header is not found :( Add gnu-12 ci to github actions (github-hosted runners) Add gnu-7 ci to github actions with github-hosted runners (ecmwf#136) Setup horizontal_dimensions() for BlockStructuredColumns fields Add function Field::horizontal_dimension() -> std::vector<idx_t> ...
Develop field::for_each functionality. For instance:
We can also loop over
Field::horizontal_dimension()
only, hence extracting column-views.Note that
Field::horizontal_dimension()
returns astd::vector<idx_t>
to cater for multiple horizontal dimensions in a field which don't need to be following eachother (e.g. BlockStructuredColumns fields of dimensions{nblock,nlev,nproma}
wherenblock
andnproma
are the horizontal dimensions).In case there are extra dimensions (e.g. a "variable" dimension), then the column views could be multidimensional:
There are further functions that accept a masking function or masking field (such as a ghost field).
The ghost field is typically a one-dimensional field corresponding to the horizontal_dimension.
If there are multiple horizontal_dimensions, then either the ghost field can be multi-dimensional, or
a one-dimensional view of the horizontal_dimensions with indexing like (ni*j+i).
If a masking function is supplied instead of a ghost field, and only the first field dimension is to be masked, then define it as: