Give FlashFluxRegisters ways to accumulate data in registers #3597

kweide · 2023-10-18T05:41:47Z

I introduce variant methods that allow adding to previously stored data in Flash(-X) flux registers, rather than just copying over the older data.

For Flash-X'ers: Something like this is needed to get flux correction working in the nontelescoping variant of the current Spark Hydro implementation.

Summary

Additional background

Checklist

The proposed changes:

- [ ] fix a bug or incorrect behavior in AMReX
- [x] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX users
- [ ] include documentation in the code and/or rst files, if appropriate


WeiqunZhang · 2023-10-18T17:49:10Z

@kweide It might be worth adding a function template and two helper functors to reduce code duplication. Something like

template <typename T>
struct AddOp {
    constexpr void operator() (T& destination, T source) const { destination += source; }
};

template <typename T>
struct AssignOp {
    constexpr void operator() (T& destination, T source) const { destination = source; }
};

class FlashFluxRegister
{
public: // for cuda
template <typename F>
void StoreOrAdd (F&& f, int fine_global_index, ...);
};

Then store and add can simply call StoreOrAdd(AssignOp<Real>(), fine_global_index, ....) and StoreOrAdd(AddOp<Real>(), ...).
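The suggestion above can be sketched in a self-contained form. This is only an illustration: `ToyRegister`, `store_or_add`, and the flat `std::vector` buffer are hypothetical stand-ins, not the actual FlashFluxRegister interface, and the functor parameters use the `lhs`/`rhs` names discussed later in the thread.

```cpp
#include <cstddef>
#include <vector>

// Sketch only: the functors mirror the suggestion above; nothing here is AMReX code.
template <typename T>
struct AddOp {
    constexpr void operator() (T& lhs, T rhs) const { lhs += rhs; }
};

template <typename T>
struct AssignOp {
    constexpr void operator() (T& lhs, T rhs) const { lhs = rhs; }
};

// Hypothetical stand-in for a flux register: one flat buffer of values.
class ToyRegister
{
public:
    explicit ToyRegister (std::size_t n) : m_data(n, 0.0) {}

    // Both public entry points forward to the same loop; only the functor differs.
    void store (std::vector<double> const& src) { store_or_add(AssignOp<double>(), src); }
    void add   (std::vector<double> const& src) { store_or_add(AddOp<double>(), src); }

    double operator[] (std::size_t i) const { return m_data[i]; }

private:
    // The single copy of the loop that would otherwise be duplicated.
    template <typename F>
    void store_or_add (F const& op, std::vector<double> const& src)
    {
        for (std::size_t i = 0; i < m_data.size() && i < src.size(); ++i) {
            op(m_data[i], src[i]);
        }
    }

    std::vector<double> m_data;
};
```

In this toy version the template can stay private because no GPU kernel is launched from it; the thread explains why the real function has to be public when compiling with CUDA.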

kweide · 2023-10-18T18:16:16Z

It might be worth adding a function template and two helper functors

I am willing to try this. My templating foo is minimal, I'd basically be following your pattern without full understanding...
Is there any chance this approach could make execution slower?

struct AssignOp {
    constexpr void operator() (T& destination, T source) const { destination = source; }
};

Should the constness of the rhs be expressed in some way, or is that already implied somehow?

WeiqunZhang · 2023-10-18T18:26:35Z

The functors will be inlined by the compiler. So I don't think there will be any performance penalty. As for constness, since it's copy by value with T source, it's not necessary to make it T const source. Even if the function modifies source, it will not affect the caller in any way because it's just a copy.

Yes lhs and rhs might be better names than destination and source.
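The point about pass-by-value can be checked with a small sketch. `ScaleThenAdd` and `demo_by_value` are hypothetical names made up for this illustration; the functor deliberately modifies its by-value argument to show that the caller's variable is left alone.

```cpp
// Hypothetical functor that modifies its by-value parameter on purpose.
struct ScaleThenAdd {
    void operator() (double& lhs, double rhs) const {
        rhs *= 2.0;   // changes only the local copy of the argument
        lhs += rhs;   // the by-reference lhs is what the caller sees change
    }
};

// Applies the functor to a local lhs of 1.0 and returns the result.
double demo_by_value (double& caller_rhs)
{
    double lhs = 1.0;
    ScaleThenAdd{}(lhs, caller_rhs);
    return lhs;
}
```

Calling `demo_by_value` on a variable holding 3.0 returns 7.0 and leaves that variable at 3.0, which is why `T const source` would add nothing from the caller's point of view.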

kweide · 2023-10-19T19:32:10Z

Does the line

public: // for cuda

also have to go into AMReX_FlashFluxRegister.cpp or just AMReX_FlashFluxRegister.H ?

kweide · 2023-10-19T19:41:18Z

So with this choice of naming:

class FlashFluxRegister
{
public: // for cuda
template <typename F>
void StoreOrAdd (F&& f, int fine_global_index, ...);
...

what do the lines in the implementations that actually do the cell-by-cell copying (or incrementing) turn into?
I assume that, e.g.,

#if (AMREX_SPACEDIM == 2)
                AMREX_HOST_DEVICE_PARALLEL_FOR_4D (b, ncomp, i, j, k, n,
                {
                    amrex::ignore_unused(k);
                    dest(i,j,0,n) += (src(2*i,2*j  ,0,n) +
                                      src(2*i,2*j+1,0,n)) * (Real(0.5)*sf);
                });
#endif

would turn into

#if (AMREX_SPACEDIM == 2)
                AMREX_HOST_DEVICE_PARALLEL_FOR_4D (b, ncomp, i, j, k, n,
                {
                    amrex::ignore_unused(k);
                    f(dest(i,j,0,n), (src(2*i,2*j  ,0,n) +
                                      src(2*i,2*j+1,0,n)) * (Real(0.5)*sf));
                });
#endif

Is that correct?
I'd likely use a different name for the first formal parameter then than just f...
Is there a syntactically more pleasing way that makes the assignment sort-of stand out as such?
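As a rough, CPU-only sketch of the transformation asked about here, the kernel body can take the operation as a parameter with a more descriptive name such as `op`. Everything below is hypothetical: `Grid2D` stands in for the array views used in the real code, plain loops replace `AMREX_HOST_DEVICE_PARALLEL_FOR_4D`, and the component index `n` is dropped.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 2D array with (i,j) indexing; not an AMReX type.
struct Grid2D {
    int nx, ny;
    std::vector<double> v;
    Grid2D (int nx_, int ny_)
        : nx(nx_), ny(ny_), v(static_cast<std::size_t>(nx_) * ny_, 0.0) {}
    double& operator() (int i, int j)       { return v[static_cast<std::size_t>(j) * nx + i]; }
    double  operator() (int i, int j) const { return v[static_cast<std::size_t>(j) * nx + i]; }
};

struct AssignOp { void operator() (double& lhs, double rhs) const { lhs = rhs; } };
struct AddOp    { void operator() (double& lhs, double rhs) const { lhs += rhs; } };

// Averages two fine values in j onto each coarse entry, as in the 2D snippet above.
// Assumes src covers indices up to (2*dest.nx - 2, 2*dest.ny - 1).
template <typename Op>
void coarsen_pairs (Op const& op, Grid2D& dest, Grid2D const& src, double sf)
{
    for (int j = 0; j < dest.ny; ++j) {
        for (int i = 0; i < dest.nx; ++i) {
            op(dest(i,j), (src(2*i,2*j) + src(2*i,2*j+1)) * (0.5*sf));
        }
    }
}
```

Passing `AssignOp{}` overwrites `dest`, while passing `AddOp{}` accumulates into it; the right-hand-side expression is written once in either case.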

WeiqunZhang · 2023-10-19T20:18:41Z

Does the line

public: // for cuda

also have to go into AMReX_FlashFluxRegister.cpp or just AMReX_FlashFluxRegister.H ?

You just need to make sure the function declaration inside the class is in the public part (i.e., making it a public function). This is a CUDA limitation, thus the comment in the example. Otherwise, we would make it a private function. That public: is irrelevant for the .cpp file. Also, since this is a member function template, it should be in the .H file. Something like

// AMReX_FlashFluxRegister.H

class A
{
public:
    template <typename F>
    void f (F && f) const;
};

template <typename F>
void A::f (F && f) const
{
}

WeiqunZhang · 2023-10-19T20:35:36Z

Yes feel free to change f to a more meaningful name.

If you want the code to be more explicit, you should be able to do something like this. (Not tested).

class FlashFluxRegister
{
public:

    enum struct OpType { Store, Add };

    template <OpType op>
    void StoreOrAdd (....);
};

template <FlashFluxRegister::OpType op>
void FlashFluxRegister::StoreOrAdd (...)
{
    AMREX_HOST_DEVICE_PARALLEL_FOR_4D (.....
    {
        auto tmp = src(....) + ....;
        if constexpr (op == OpType::Store) {
            dest(i,j,0,n) = tmp;
        } else {
            dest(i,j,0,n) += tmp;
        }
    });
}

WeiqunZhang · 2023-10-19T20:47:59Z

Then Add can be implemented as StoreOrAdd<OpType::Add>(fine_global_index, ....).
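A compilable toy version of the enum-based variant might look like the following. It assumes C++17 for `if constexpr`; `ToyRegister`, `store_or_add`, and the factor of 0.5 are hypothetical placeholders for the real class, function, and averaging expression.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the register class; not AMReX code.
class ToyRegister
{
public:
    enum struct OpType { Store, Add };

    explicit ToyRegister (std::size_t n) : m_data(n, 0.0) {}

    // Kept public, following the advice in the thread about CUDA.
    template <OpType op>
    void store_or_add (std::vector<double> const& src);

    void store (std::vector<double> const& src) { store_or_add<OpType::Store>(src); }
    void add   (std::vector<double> const& src) { store_or_add<OpType::Add>(src); }

    double operator[] (std::size_t i) const { return m_data[i]; }

private:
    std::vector<double> m_data;
};

template <ToyRegister::OpType op>
void ToyRegister::store_or_add (std::vector<double> const& src)
{
    for (std::size_t i = 0; i < m_data.size() && i < src.size(); ++i) {
        double const tmp = 0.5 * src[i];   // placeholder for the averaged flux
        // The branch is resolved at compile time, so = and += stay visible in the source.
        if constexpr (op == OpType::Store) {
            m_data[i] = tmp;
        } else {
            m_data[i] += tmp;
        }
    }
}
```

Compared with the functor version, this keeps the assignment operators explicit at the cost of an `if constexpr` block at each update site.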

WeiqunZhang · 2023-10-25T16:38:33Z

@kweide If you need help, I am happy to do it and then you can check if it works.

kweide · 2023-10-25T17:36:56Z

@WeiqunZhang Thank you for the offer. I would appreciate you going ahead with it.

I expect that this will not change the methods that are visible to Fortran code at all, they will have the new additional addit argument as already added in this PR and the same subroutine interfaces.
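One way a runtime flag could be mapped onto the compile-time variants without changing the externally visible interface is sketched below. This is an assumption-laden illustration: `toy_register_store` and its signature are invented for the example, and treating `addit` as an integer flag where nonzero means "add" is a guess about the argument mentioned above, not the actual wrapper in the PR.

```cpp
#include <cstddef>

enum struct OpType { Store, Add };

// Shared loop, specialized at compile time (requires C++17 for if constexpr).
template <OpType op>
void store_or_add (double* dest, double const* src, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        if constexpr (op == OpType::Store) {
            dest[i] = src[i];
        } else {
            dest[i] += src[i];
        }
    }
}

// Hypothetical C-linkage entry point of the kind a Fortran interface might bind to.
// The runtime flag is tested once here, outside the loop.
extern "C" void toy_register_store (double* dest, double const* src, int n, int addit)
{
    if (addit != 0) {
        store_or_add<OpType::Add>(dest, src, static_cast<std::size_t>(n));
    } else {
        store_or_add<OpType::Store>(dest, src, static_cast<std::size_t>(n));
    }
}
```

With this arrangement the caller-facing signature stays the same whichever internal restructuring is chosen.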

I was about to ask whether you would be willing to merge the PR as it is now, without functors and with the existing code duplication, perhaps as an intermediate step. I did not get around to working on this, and probably would not before the end of the month; on the other hand, I would really like this feature to be included in the upcoming (23.11) AMReX tag.

As far as wanting "the code to be more explicit" (by having = and += stand out explicitly), I don't care all that much about it, especially if it requires significantly more lines of code; as long as the "assignment function" (probably the wrong term!) has a somewhat meaningful name other than f() that should be fine.

WeiqunZhang · 2023-10-25T18:31:40Z

@kweide Could you take a look and give it a try? If it works, we will get this merged today.

kweide · 2023-10-26T00:03:48Z

@kweide Could you take a look and give it a try? If it works, we will get this merged today.

I will do some testing tomorrow.

Just wondering: How many of the automatic tests are actually testing compilation of these source files?
I assume there are no CI tests of the FlashFluxRegister stuff, is that so?

WeiqunZhang · 2023-10-26T00:27:01Z

I just counted. There are five GitHub CI tests with the Fortran interface on, so they will probably catch most of the compilation errors in the FlashFluxRegister stuff. We also have three nightly regression tests for the Fortran interface (2D amr, 3D amr and linear solver), in which the binary plotfile results are compared with benchmarks. (But none of them uses FlashFluxRegister because we don't have a test using it.)

kweide · 2023-10-30T14:31:48Z

@kweide Could you take a look and give it a try? If it works, we will get this merged today.

I will do some testing tomorrow.

That is taking a bit longer than expected, still working on it today.

kweide · 2023-10-31T16:53:06Z

I have convinced myself now that this feature, with @WeiqunZhang's restructuring, works properly and should be merged.

I got essentially the same results with Flash-X Sedov tests using this code (and relying on the accumulation feature) as with a comparison run that used Paramesh4 as the underlying Grid implementation.

WeiqunZhang · 2023-10-31T19:07:02Z

I just ran some regressions. All passed.

…odes#3597)

I introduce variant methods that allow adding to previously stored data in Flash(-X) flux registers, rather than just copying over the older data.

For Flash-X'ers: Something like this is needed to get flux correction working in the nontelescoping variant of the current Spark Hydro implementation.

## Summary

## Additional background

## Checklist

The proposed changes:
- [ ] fix a bug or incorrect behavior in AMReX
- [x] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX users
- [ ] include documentation in the code and/or rst files, if appropriate

---------

Co-authored-by: Weiqun Zhang <weiqunzhang@lbl.gov>

Reduce code duplication

3f9081c

For CUDA

78b712a

WeiqunZhang approved these changes Oct 31, 2023

WeiqunZhang merged commit 601cc4e into AMReX-Codes:development Oct 31, 2023
69 checks passed
