Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement field::for_each capabilities #139

Merged
merged 11 commits into from
Jun 6, 2023
Merged

Conversation

wdeconinck
Copy link
Member

@wdeconinck wdeconinck commented May 16, 2023

Develop field::for_each functionality. For instance:

Field f1, f2, f3;
// ... assign f1, f2 ...
field::for_each_value(f1, f2, f3, [](const double& x1, const double& x2, double& x3) {
    x3 = x1 + x2;
});

We can also loop over Field::horizontal_dimension() only, hence extracting column-views.
Note that Field::horizontal_dimension() returns a std::vector<idx_t> to cater for multiple horizontal dimensions in a field which don't need to be following eachother (e.g. BlockStructuredColumns fields of dimensions {nblock,nlev,nproma} where nblock and nproma are the horizontal dimensions).

Field f1, f2, f3;
// ... assign f1, f2 ...
field::for_each_column(f1, f2, f3, [](array::View<const double,1>&& x1,
                                      array::View<const double,1>&& x2,
                                      array::View<double,1>&& x3
    const idx_t nlev = x1.size();
    for (idx_t jlev=0; jlev<nlev; ++jlev) {
        x3(jlev) = x1(jlev) + x2(jlev);
    }
});

In case there are extra dimensions (e.g. a "variable" dimension), then the column views could be multidimensional:

Field f1, f2, f3;
// ... assign f1, f2 ...
field::for_each_column(f1, f2, f3, [](array::View<const double,2>&& x1,
                                      array::View<const double,2>&& x2,
                                      array::View<double,2>&& x3) {
    const idx_t nlev = x1.shape(0);
    const idx_t nvar = x1.shape(1);
    for (idx_t jlev=0; jlev<nlev; ++jlev) {
        for (idx_t jvar=0; jvar<nvar; ++jvar) {
            x3(jlev,jvar) = x1(jlev,jar) + x2(jlev,jvar);
        }
    }
});

There are further functions that accept a masking function or masking field (such as a ghost field).
The ghost field is typically a one-dimensional field corresponding to the horizontal_dimension.
If there are multiple horizontal_dimensions, then either the ghost field can be multi-dimensional, or
a one-dimensional view of the horizontal_dimensions with indexing like (ni*j+i).

Field f1, f2, f3; 
Field ghost;
// ... assign f1, f2, ghost ...
field::for_each_value_masked(ghost, f1, f2, f3, [](const double& x1, const double& x2, double& x3) {
    x3 = x1 + x2;
});
Field f1, f2, f3;
Field ghost;
// ... assign f1, f2, ghost ...
field::for_each_column_masked(ghost, f1, f2, f3, [](array::View<const double,1>&& x1,
                                                    array::View<const double,1>&& x2,
                                                    array::View<double,1>&& x3
    const idx_t nlev = x1.size();
    for (idx_t jlev=0; jlev<nlev; ++jlev) {
        x3(jlev) = x1(jlev) + x2(jlev);
    }
});

If a masking function is supplied instead of a ghost field, and only the first field dimension is to be masked, then define it as:

auto mask = [](idx_t i, auto&&... args) {
    // mask depending on i only.
};

@wdeconinck wdeconinck self-assigned this May 16, 2023
@wdeconinck
Copy link
Member Author

Please review @odlomax @ytremolet

@codecov-commenter
Copy link

codecov-commenter commented May 16, 2023

Codecov Report

Patch coverage: 88.80% and project coverage change: +0.04 🎉

Comparison is base (9a2def8) 78.18% compared to head (a61d667) 78.22%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #139      +/-   ##
===========================================
+ Coverage    78.18%   78.22%   +0.04%     
===========================================
  Files          817      802      -15     
  Lines        59325    59645     +320     
===========================================
+ Hits         46385    46660     +275     
- Misses       12940    12985      +45     
Impacted Files Coverage Δ
src/atlas/field/detail/for_each.h 53.08% <53.08%> (ø)
src/atlas/field/for_each.h 79.51% <79.51%> (ø)
src/tests/field/test_field_foreach.cc 99.69% <99.69%> (ø)
src/atlas/array/helpers/ArrayForEach.h 96.29% <100.00%> (+3.11%) ⬆️

... and 155 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

constexpr auto dims = std::make_index_sequence<Rank>();
if constexpr (valid_value_function<Function,Value>())
{
// Surely this can be improved to be more generic w.r.t. num_fields
Copy link
Contributor

@odlomax odlomax May 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surely we can!

template <int FieldIdx, int NumFields, typename Value, int Rank, typename... Field>
auto make_view_tuple(std::tuple<Field...>&& fields) {

    if constexpr (FieldIdx < NumFields) {
        return std::tuple_cat(std::make_tuple(
            array::make_view<Value, Rank>(std::get<FieldIdx>(fields))),
            make_view_tuple<FieldIdx + 1, NumFields, Value, Rank>(std::move(fields)));
    } else {
        return std::make_tuple();
    }
}

template <typename Value, int Rank, typename Config,  typename Mask, typename... Field, typename Function>
void for_each_value_masked_rank(const Config& config, const Mask& mask, std::tuple<Field...>&& fields, const Function& function ) {
    constexpr auto num_fields = std::tuple_size_v<std::tuple<Field...>>;
    constexpr auto dims = std::make_index_sequence<Rank>();
    if constexpr (valid_value_function<Function,Value>())
    {
        // Surely this can be improved to be more generic w.r.t. num_fields
        // We have the technology!
        if constexpr (num_fields <= 4) {
            for_each_value_masked_view(dims, config, mask, function, make_view_tuple<0, num_fields, Value, Rank>(std::move(fields)));
            return;
        }
        ATLAS_THROW_EXCEPTION("Unsupported number of arguments in function");
    }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And here I thought we'd seen the last of tuple_cats 😉
Thanks for your suggestion!

Copy link
Contributor

@odlomax odlomax May 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's my new favourite thing!

I don't think we need to pass NumFields into the make_view_tuple function, we can just use the size of the fields tuple.

Also, if we want to be really clever, I think we can pass in Function as a template parameter and use your template kung fu (function_traits) to get the Value type from each of the functor args.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One problem with the function_traits is that we cannot use auto arguments in a "Function" lambda. This is related to runtime dispatch of fields based on ranks and data types, and it cannot infer this from auto.
So one has to be explicit w.r.t. data types and ranks in a view, and has to match what is encoded in the Field

Copy link
Contributor

@odlomax odlomax May 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to wrap my head around this. You'd basically end up with thousands of compiled loop functions if you tried to dispatch every template combination. I interpreted your non-generic lambdas as trying to sneak in the template arguments through the back door ;-)


static constexpr std::size_t arity = sizeof...(Arguments);
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took me a while to get my head around this. I am in awe.

@DJDavies2
Copy link
Contributor

I am getting build failures when building against oops (JCSDA) with this:

In file included from /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/array/ArrayView.h:18:0,
from /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/functionspace/CubedSphereColumns.h:13,
from /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/functionspace.h:16,
from /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/oops/src/oops/base/GeometryData.h:15,
from /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/oops/src/oops/base/Geometry.h:23,
from /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/oops/src/oops/assimilation/ControlVariable.h:20,
from /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/oops/src/oops/assimilation/ControlIncrement.h:21,
from /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/oops/src/oops/assimilation/MinimizerUtils.h:16,
from /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/oops/src/oops/assimilation/MinimizerUtils.cc:8:
/home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/array/native/NativeArrayView.h:138:66: error: 'is_convertible_v' is not a member of 'std'
template <typename ValueTp, typename = std::enable_if_t<std::is_convertible_v<ValueTp*,value_type*>>>
^~~~~~~~~~~~~~~~
/home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/array/native/NativeArrayView.h:138:66: note: suggested alternative: 'is_convertible'
template <typename ValueTp, typename = std::enable_if_t<std::is_convertible_v<ValueTp*,value_type*>>>
^~~~~~~~~~~~~~~~
is_convertible
/home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/array/native/NativeArrayView.h:138:66: error: 'is_convertible_v' is not a member of 'std'
/home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/array/native/NativeArrayView.h:138:66: note: suggested alternative: 'is_convertible'
template <typename ValueTp, typename = std::enable_if_t<std::is_convertible_v<ValueTp*,value_type*>>>
^~~~~~~~~~~~~~~~
is_convertible
/home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/array/native/NativeArrayView.h:138:102: error: template argument 1 is invalid
template <typename ValueTp, typename = std::enable_if_t<std::is_convertible_v<ValueTp*,value_type*>>>
^
/home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/array/native/NativeArrayView.h:138:105: error: expected unqualified-id before '>' token
template <typename ValueTp, typename = std::enable_if_t<std::is_convertible_v<ValueTp*,value_type*>>>
^
gmake[2]: *** [oops/src/CMakeFiles/oops.dir/oops/generic/UnstructuredInterpolator.cc.o] Error 1
gmake[2]: *** [oops/src/CMakeFiles/oops.dir/oops/assimilation/MinimizerUtils.cc.o] Error 1

@wdeconinck
Copy link
Member Author

@DJDavies2 Thanks for verifying compilation. You have observed this in multiple PRs now, so could you see if it is triggered in develop branch as well? If so create a GitHub issue specifically for the develop branch. Please then mention the compiler, version, and compile line including include directories and flags.

I have added to develop branch the public propagation of the -std=c++17 flag to downstream projects, as I believe that's what is missing. A reintegration of this branch on develop should then hopefully fix this.

@wdeconinck
Copy link
Member Author

@ytremolet when you have time could you comment? Otherwise I'll just merge this in soon.

@DJDavies2
Copy link
Contributor

@DJDavies2 Thanks for verifying compilation. You have observed this in multiple PRs now, so could you see if it is triggered in develop branch as well? If so create a GitHub issue specifically for the develop branch. Please then mention the compiler, version, and compile line including include directories and flags.

I have added to develop branch the public propagation of the -std=c++17 flag to downstream projects, as I believe that's what is missing. A reintegration of this branch on develop should then hopefully fix this.

I have tested against head of develop and it seems to work now, there are been changes updating to std=c++17 in some downstream repositories so that has probably fixed this issue.

Copy link
Contributor

@ytremolet ytremolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I can review all the advanced C++. As others I am in awe...
So I mostly looked at the test and how this would be used. It looks great, I think we are going to get good use of this.

Copy link
Contributor

@ytremolet ytremolet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should a test with a non-default execution policy be added? I'd like to see that for the for_each_column, for example the tests starting at line 425 and 483.

@wdeconinck
Copy link
Member Author

wdeconinck commented May 31, 2023

I have added performance tests with OMP_NUM_THREADS=4 with following output:

timing with for loops seq  = 0.00333232  (triple nested for-loop, sequential)
timing with execution::seq = 0.0790008   (field::for_each_value, sequential)
timing with for loops par  = 0.00102567  (triple nested for-loop with OpenMP for for outermost loop)
timing with execution::par = 0.0215601   (field::for_each_value, parallel)

The difference between normal for-loops and the field::for_each is pretty dramatic (field::for_each is factor 20-24 slower)!
I don't know why with the ArrayForEach tests we had no difference in result, but we do have a lot of difference here.

This should foremost not stop using this capability.
The benefit is that optimisations can and WILL be applied in following developments.

@DJDavies2
Copy link
Contributor

The new test fails when I run it:

Running case 2: test field::for_each_value_masked; horizontal_dimension {0,2} mask_rank1 ...
[0] �[31mTest "test field::for_each_value_masked; horizontal_dimension {0,2} mask_rank1" failed with unhandled eckit::Exception: Assertion failed: (std::is_invocable_v<Mask,MaskArgs...>) in apply, line 265 of /home/d03/frwd/cylc-run/atlas-139/share/mo-bundle/atlas/src/atlas/array/helpers/ArrayForEach.h @ �[0m
[0] Stack trace: backtrace [1] stack has 38 addresses
[0] (/home/d03/frwd/cylc-run/atlas-139/share/build-mo-cray_gnu_debug/lib/libeckit.so+eckit::BackTrace::dumpabi:cxx11)0x91
[0] (/home/d03/frwd/cylc-run/atlas-139/share/build-mo-cray_gnu_debug/lib/libeckit.so+eckit::Exception::Exception())0xb6
[0] (/home/d03/frwd/cylc-run/atlas-139/share/build-mo-cray_gnu_debug/lib/libeckit.so+eckit::AssertionFailed::AssertionFailed(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, eckit::CodeLocation const&))0x41
[0] (/home/d03/frwd/cylc-run/atlas-139/share/build-mo-cray_gnu_debug/lib/libeckit.so+eckit::handle_assert(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, eckit::CodeLocation const&))0x2a5
[0] (/home/d03/frwd/cylc-run/atlas-139/share/build-mo-cray_gnu_debug/lib/libatlas.so.0.33+atlas::throw_AssertionFailed(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, eckit::CodeLocation const&))0x53
[0] (/home/d03/frwd/cylc-run/atlas-139/share/build-mo-cray_gnu_debug/atlas/src/tests/field/atlas_test_field_foreach+atlas::detail::Assert(bool, char const*, char const*, int, char const*))0xa1

and similar for rank2.

…each

* origin/develop:
  Workaround compiler bug in nvhpc-22.11-release build
  Update GHA nvidia-21.9 to nvidia-22.11
  Avoid and suppress some compiler warnings with nvhpc
  Fix bug where DelaunayMeshGenerator with 1 partition was not setting the grid in the mesh (#143)
  Use Eigen 3.4
  Disable floating point signals for tests on nvidia
  Add nvidia compiler specific HPC build config
@wdeconinck
Copy link
Member Author

@DJDavies2 Is this with gnu 7.3? Which build-type? (You could print output of atlas --info)

This failure should not have anything to do with the new test. Not sure how this occurs. It seems to pass on so many other compilers and build-types, including gnu 7.5.

@odlomax could you perhaps try to investigate or debug this on this particular configuration?

@DJDavies2
Copy link
Contributor

Yes, this is gnu 7.3.

@odlomax
Copy link
Contributor

odlomax commented Jun 2, 2023

@DJDavies2 Is this with gnu 7.3? Which build-type? (You could print output of atlas --info)

This failure should not have anything to do with the new test. Not sure how this occurs. It seems to pass on so many other compilers and build-types, including gnu 7.5.

@odlomax could you perhaps try to investigate or debug this on this particular configuration?

I'm just having a look. Will get back to you later today.

@odlomax
Copy link
Contributor

odlomax commented Jun 2, 2023

@DJDavies2 Is this with gnu 7.3? Which build-type? (You could print output of atlas --info)
This failure should not have anything to do with the new test. Not sure how this occurs. It seems to pass on so many other compilers and build-types, including gnu 7.5.
@odlomax could you perhaps try to investigate or debug this on this particular configuration?

I'm just having a look. Will get back to you later today.

I might have found a bug. If I can confirm it, shall I PR a solution into this branch?

@odlomax
Copy link
Contributor

odlomax commented Jun 5, 2023

I think I've fixed it #145

tl;dr: GCC 7.3 is old and rubbish.

@wdeconinck
Copy link
Member Author

With new changes from d97f973 the field::for_each_value timings did not seem to improve:

timing with for loops seq  = 0.0035155
timing with for loops par  = 0.00106079
timing with execution::seq = 0.092391
timing with execution::par = 0.0240693

@odlomax
Copy link
Contributor

odlomax commented Jun 5, 2023

With new changes from d97f973 the field::for_each_value timings did not seem to improve:

timing with for loops seq  = 0.0035155
timing with for loops par  = 0.00106079
timing with execution::seq = 0.092391
timing with execution::par = 0.0240693

Strange. I got the following when I ran the test in release:

37: timing with for loops seq  = 0.000203149
37: timing with for loops par  = 5.73367e-05
37: timing with execution::seq = 0.000196719
37: timing with execution::par = 5.94674e-05

Admittedly, there's a fair bit of variation between runs.

@wdeconinck
Copy link
Member Author

Apologies @odlomax I indeed do recover the raw for loop performance when I have optimisations on. I had hacked my compile flags in my build and forgot to reset them.

@wdeconinck
Copy link
Member Author

@DJDavies2 Please let me know if you still have issues with updated branch, and thanks once more for spotting this!

@DJDavies2
Copy link
Contributor

The test works now, thanks.

@odlomax
Copy link
Contributor

odlomax commented Jun 5, 2023

Great success!

@wdeconinck wdeconinck merged commit d0babe3 into develop Jun 6, 2023
65 of 66 checks passed
@wdeconinck
Copy link
Member Author

Finally merged. Thanks all your help!

wdeconinck added a commit to JCSDA-internal/atlas that referenced this pull request Jun 6, 2023
* develop: (56 commits)
  Implement field::for_each capabilities (ecmwf#139)
  Avoid harmless FE_INVALID with nvhpc build
  Avoid harmless FE_INVALID with nvhpc build
  Remove ATLAS_FPE=0 environment variable as not needed anymore now
  Avoid harmless FE_DIVBYZERO with nvhpc build
  Optimize modules and dependencies for caching
  Add HPC build options matrix
  Workaround compiler bug in nvhpc-22.11-release build
  Update GHA nvidia-21.9 to nvidia-22.11
  Avoid and suppress some compiler warnings with nvhpc
  Fix bug where DelaunayMeshGenerator with 1 partition was not setting the grid in the mesh (ecmwf#143)
  Use Eigen 3.4
  Disable floating point signals for tests on nvidia
  Add nvidia compiler specific HPC build config
  Add function to build mesh from imported connectivity data (ecmwf#135)
  Disable GHA "linux gnu-12" OpenMP for CXX as "omp.h" header is not found :(
  Add gnu-12 ci to github actions (github-hosted runners)
  Add gnu-7 ci to github actions with github-hosted runners (ecmwf#136)
  Setup horizontal_dimensions() for BlockStructuredColumns fields
  Add function Field::horizontal_dimension() -> std::vector<idx_t>
  ...
@wdeconinck wdeconinck deleted the feature/field_for_each branch September 19, 2023 12:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants