
Added OpenMP to the cubed-sphere interpolator and the matching cubed sphere partitioner. #293

Merged: wdeconinck merged 16 commits into ecmwf:develop from JCSDA-internal:feature/cubed_sphere_interp_omp on Jun 25, 2025
Conversation

@odlomax (Contributor) commented Jun 6, 2025

This adds some OpenMP that was skipped in the initial implementation. It's similar in intent to #292.

I'd like to eventually retire this interpolator in favour of finite-element interpolation, but we're not quite ready yet.

I've also removed a nasty metadata code smell which was needed by the old cubed-sphere wind interpolation scheme. I've accordingly removed the test that used the metadata.

@wdeconinck (Member) left a comment

Thanks @odlomax this is a good idea. I have a few suggestions below.

Also, I'm wondering why the complexity of the variant and visitor is needed. You can insert directly into an oversized triplet vector, as done in #292. The allocated memory is the same anyway, because the variant takes the size of its largest type. You then also avoid the extra visitor and the copy into a new vector of triplets. The eckit::SparseMatrix constructor skips over empty triplets during construction.

// Loop over grid and set partioning[].
auto lonlatIt = grid.lonlat().begin();
- for (gidx_t i = 0; i < grid.size(); ++i) {
+ atlas_omp_parallel_for(gidx_t i = 0; i < grid.size(); ++i) {
@wdeconinck (Member):

This will incur a race condition on the lonlatIt iterator.
Probably what you want is this (not tested):

const size_t num_threads = atlas_omp_get_max_threads();
const size_t size        = grid.size();
atlas_omp_parallel {
    // This is now executed concurrently in each available OpenMP threads
    // All thread-private data here:
    const size_t thread_num   = atlas_omp_get_thread_num();
    const size_t thread_begin = (thread_num * size) / num_threads;
    const size_t thread_end   = ((thread_num + 1) * size) / num_threads;
    auto lonlatIt = grid.lonlat().begin() + thread_begin; // Move iterator to thread_begin
    for (size_t i = thread_begin; i<thread_end; ++i, ++lonlatIt) { // regular for, already within an OpenMP thread
        const auto& lonlat = *lonlatIt;
        partitioning[i] = finder.getCell(lonlat, listSize, edgeEpsilon, epsilon).isect ? mpi_rank : -1;
    }
}

@odlomax (Contributor, author):

Whoops!

@odlomax (Contributor, author):

The lon-lat iterator is random access, so const auto& lonLat = *(lonlatIt + i) should work, right? Feels a bit tidier if I can avoid direct omp library calls.

@wdeconinck (Member):

This won't work. A small test program shows this, with comments inside.

#include "atlas/library.h"
#include "atlas/runtime/Log.h"
#include "atlas/grid.h"
using namespace atlas;

int main(int argc, char* argv[]) {
    atlas::initialize(argc,argv);

    Grid grid{"CS-LFR-24"};
    // Following bounds can be chunked per thread.
    int ibegin = 10;
    int iend   = 11;   
    int isize  = iend - ibegin;
    const auto begin = grid.xy().begin() + ibegin;

// ------------------------------------------------------

// 1) Dereference temporary iterator: Works... **NOT**
//      AND possibly very expensive to do the random access
//      (finding tile, finding row, finding columns, ...)
//      versus "+1" which increments the already contained state.
    for (int i=0; i<isize; ++i) {
        const PointXY& p = *(begin+i); // creates temporary iterator and goes out of scope
        // p is now dangling.
        ATLAS_DEBUG_VAR(p);
    }
// OUTPUT: DEBUG( p : {0,0} )

// ------------------------------------------------------

// 2) Copy iterator: **WORKS** but possibly very expensive (see (1))
//          PLUS ADDITIONAL copy construction of iterator in inner loop.
    for (int i=0; i<isize; ++i) {
        auto it = begin + i;
        const PointXY& p = *it;
        ATLAS_DEBUG_VAR(p);
    }
// OUTPUT: DEBUG( p : {39.375,-43.125} )

// ------------------------------------------------------

// 3) Standard iterator moved into start position: **WORKS**
//         The iterator is only constructed once per OpenMP thread,
//         when we apply the manual chunking (not one loop iteration per thread!)
    auto it = begin;
    for (int i=0; i<isize; ++i, ++it) {
        const PointXY& p = *(it);
        ATLAS_DEBUG_VAR(p);
    }
// OUTPUT: DEBUG( p : {39.375,-43.125} )

// ------------------------------------------------------
    atlas::finalize();
    return 0;
}

@odlomax (Contributor, author):

Yikes. Consider me educated.

@odlomax (Contributor, author):

More worrying, if we're having this debate, then the affected code is clearly untested...
I'd better sort that out!

@wdeconinck (Member):
I was thinking that too 👍

std::to_string(i) + ".");
}

tileIndex.push_back(tijView(cell.idx, 0));
@wdeconinck (Member):
Is tileIndex not used anywhere else? Dead code?

@odlomax (Contributor, author):

It's used in our model interface, but I plan on removing that code before we advance our current Atlas tag.

@odlomax (Contributor, author) commented Jun 6, 2025

> Thanks @odlomax this is a good idea. I have a few suggestions below.
>
> Also I'm wondering why the complexity of the variant and visitor. You can directly insert into an oversized triplet vector, like done in #292. The allocated memory is anyway the same because the variant takes the size of the largest type. Then you also avoid the extra visitor and copy into a new vector of triplets. The eckit::SparseMatrix constructor skips over empty triplets during construction.

Ah, I forgot about the triplet skipping! I'll do that then.

@odlomax (Contributor, author) commented Jun 9, 2025

> Thanks @odlomax this is a good idea. I have a few suggestions below.
>
> Also I'm wondering why the complexity of the variant and visitor. You can directly insert into an oversized triplet vector, like done in #292. The allocated memory is anyway the same because the variant takes the size of the largest type. Then you also avoid the extra visitor and copy into a new vector of triplets. The eckit::SparseMatrix constructor skips over empty triplets during construction.

I've simplified the Triplets vector now. I'm getting sloppy.

@wdeconinck (Member) left a comment

  • A test is needed to illustrate the problem with the lonlat iterator in the MatchingMeshPartitionerCubedSphere.
  • Once it fails, fix it by updating the lonlat iterator usage in the MatchingMeshPartitionerCubedSphere.

EXPECT_APPROX_EQ(lonLat.lat(), refLonLat.lat(), 1e-14);

// Only now, *(begin + i), do you have my permission to die.
}
@odlomax (Contributor, author) commented Jun 23, 2025:

@wdeconinck

So, funny thing, const lvalue refs to temporary object members can extend the lifetime of the host object...

https://quuxplusone.github.io/blog/2020/11/16/lifetime-extension-tidbit/

@odlomax (Contributor, author):

Not that this is okay! But it's a little tricky to show that technically valid C++ is broken.

@wdeconinck (Member):

This seems like the worst code practice and should not be relied on for sure :)

@odlomax (Contributor, author):

Indeed! It's a code stink!

I'll remove the test on the next push.

@odlomax (Contributor, author) commented Jun 24, 2025

I've removed the redundant headers and the spooky C++ test.

I went with the const auto lonLat = *(lonlatIt + i) assignment in the end. The presence of the + operator on the iterator expresses the intent that it should be random access. I think std::random_access_iterator_tag is actually set somewhere underneath the layers.

I'm happy to go with the other solution, but it seems like it just moves the complexity from an odd dereferencing pattern to more explicit thread management.

@wdeconinck (Member):

> I've removed the redundant headers and the spooky C++ test.
>
> I went with the const auto lonLat = *(lonlatIt + i) assignment in the end. The presence of the + operator on the iterator expresses intent that it should be random access. I think std::random_access_iterator_tag is actually set somewhere underneath the layers.
>
> I'm happy to go with the other solution, but it seems like it just moves the complexity from an odd dereferencing pattern to more explicit thread management.

The only issue is that this could be much more expensive than you think. Perhaps it's worth benchmarking the difference between:

  • not using any OpenMP, with the old formulation
  • using the *(lonlatIt + i) dereference
  • using explicit chunking

@odlomax (Contributor, author) commented Jun 24, 2025

>> I've removed the redundant headers and the spooky C++ test.
>> I went with the const auto lonLat = *(lonlatIt + i) assignment in the end. The presence of the + operator on the iterator expresses intent that it should be random access. I think std::random_access_iterator_tag is actually set somewhere underneath the layers.
>> I'm happy to go with the other solution, but it seems like it just moves the complexity from an odd dereferencing pattern to more explicit thread management.
>
> The only issue is that this could be much more expensive than you think. Perhaps it's worth benchmarking the difference between:
>
>   • not using any OpenMP, with the old formulation
>   • using the *(lonlatIt + i) dereference
>   • using explicit chunking

Oh.

I've just found the implementation of the += operator for the iterator in the CubedSphereGrid implementation...

🤦

@odlomax (Contributor, author) commented Jun 24, 2025

I present option 4, because why not? 🙃

@wdeconinck (Member):

> I present option 4, because why not? 🙃

I'm still not vouching for this solution, but if you're happy with the performance, and it's an improvement over the non-threaded version then all good for me.

@odlomax (Contributor, author) commented Jun 24, 2025

>> I present option 4, because why not? 🙃
>
> I'm still not vouching for this solution, but if you're happy with the performance, and it's an improvement over the non-threaded version then all good for me.

There's time for it to go wrong yet! nvhpc hates lambdas with OpenMP!

@wdeconinck (Member):

So my understanding with this proposal (4), is that essentially this iterates sequentially, stores it in a vector of points as unstructured grid, and then uses OpenMP to compute the partitioning, using faster random access in the vector.

@odlomax (Contributor, author) commented Jun 24, 2025

> So my understanding with this proposal (4), is that essentially this iterates sequentially, stores it in a vector of points as unstructured grid, and then uses OpenMP to compute the partitioning, using faster random access in the vector.

Yeah, that's the gist of it!

I don't think we ever actually use this partitioner on other cubed spheres. I didn't realise (but should have expected) that the cubed sphere iterator doesn't do random access properly.

It's all such a hacky mess, but I think I've got a credible plan to safely dispose of it...

@wdeconinck wdeconinck merged commit e4daac1 into ecmwf:develop Jun 25, 2025
171 of 187 checks passed
wdeconinck added a commit that referenced this pull request Jun 25, 2025
* release/0.43.0: (61 commits)
  Update Changelog
  Version 0.43.0
  Change defaults of structured interpolation methods, originally modified with [ed0996d - Deprecate factory builders for structured interpolation methods]
  Added OpenMP to the cubed-sphere interpolator and the matching cubed sphere partitioner. (#293)
  Update ci-hpc-config.yml: turn off floating-point trapping due to openmpi problem
  Update ci-hpc-config.yml
  Fix warning on using uninitialized variable
  Fix warnings and memory leak in test
  Add tests that verifies running code referencing device_data on CPU
  Fix stridesf when host_data == device_data, e.g. when running on CPU only
  Only assert an arrays device is allocated when we have devices during make_device_view
  Add atlas_acc_pragma
  Disable ECKIT_GEO on for ci-hpc for now
  pluto: Fortran API for 'integer pluto%devices()'
  Turn off ATLAS_DEPRECATION_WARNINGS for the moment, to be enabled after upcoming release
  Deprecate factory builders for structured interpolation methods
  Add Factory deprecation mechanism
  Apply interolation::NonLinear to arrays using another field's metadata
  PointCloud: delay setup of halo_exchange and gather
  Remove scheduling keywords from omp pragma
  ...
