Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Give FlashFluxRegisters ways to accumulate data in registers #3597

Merged
merged 3 commits into from
Oct 31, 2023

Conversation

kweide
Copy link
Contributor

@kweide kweide commented Oct 18, 2023

I introduce variant methods that allow adding to previously stored data in Flash(-X) flux registers, rather than just copying over the older data.

For Flash-X'ers: Something like this is needed to get flux correction working in the nontelescoping variant of the current Spark Hydro implementation.

Summary

Additional background

Checklist

The proposed changes:

  • fix a bug or incorrect behavior in AMReX
  • add new capabilities to AMReX
  • changes answers in the test suite to more than roundoff level
  • are likely to significantly affect the results of downstream AMReX users
  • include documentation in the code and/or rst files, if appropriate

I introduce variant methods that allow adding to previously
stored data in Flash(-X) flux registers, rather than just
copying over the older data.

For Flash-X'ers: Something like this is needed to get flux
correction working in the nontelescoping variant of the current
Spark Hydro implementation.
@WeiqunZhang
Copy link
Member

@kweide It might be worth adding a function template and two help functors to reduce code duplication. Something like

template <typename T>
struct AddOp {
    constexpr void operator() (T& destination, T source) const { destination += source; }
};

tempalte <typename T>
struct AssignOp {
    constexpr void operator() (T& destination, T source) const { destionation = source; } 
};

class FlashFluxRegister
{
public: // for cuda
template <typename F>
void StoreOrAdd (F&& f, int fine_global_index, ...);
};

Then store and add can simply call StoreOrAdd(AssignOp<Real>(), fine_global_index, ....) and StoreOrAdd(AddOp<Real>(), ...).

@kweide
Copy link
Contributor Author

kweide commented Oct 18, 2023

It might be worth adding a function template and two help functors

I am willing to try this. My templating foo is minimal, I'd basically be following your pattern without full understanding...
Is there any change this approach could make execution slower?

struct AssignOp {
    constexpr void operator() (T& destination, T source) const { destionation = source; } 
};

Should the constness of the rhs be expressed in some way, or is that already implied somehow?

@WeiqunZhang
Copy link
Member

The functors will be inlined by the compiler. So I don't think there will be any performance penalty. As for constness, since it's copy by value with T source, it's not necessary to make it T const source. Even if the function modifies source, it will not affect the caller in any way because it's just a copy.

Yes lhs and rhs might be better names than destination and source.

@kweide
Copy link
Contributor Author

kweide commented Oct 19, 2023

Does the line

public: // for cuda

also have to go into AMReX_FlashFluxRegister.cpp or just AMReX_FlashFluxRegister.H ?

@kweide
Copy link
Contributor Author

kweide commented Oct 19, 2023

So with this choice of naming:

class FlashFluxRegister
{
public: // for cuda
template <typename F>
void StoreOrAdd (F&& f, int fine_global_index, ...);
...

what do the lines in the implementtions that actually do the cell-by-cell copying (or incrementing) turn into?
I assume that, e.g.,

#if (AMREX_SPACEDIM == 2)
                AMREX_HOST_DEVICE_PARALLEL_FOR_4D (b, ncomp, i, j, k, n,
                {
                    amrex::ignore_unused(k);
                    dest(i,j,0,n) += (src(2*i,2*j  ,0,n) +
                                      src(2*i,2*j+1,0,n)) * (Real(0.5)*sf);
                });
#endif

would turn into

#if (AMREX_SPACEDIM == 2)
                AMREX_HOST_DEVICE_PARALLEL_FOR_4D (b, ncomp, i, j, k, n,
                {
                    amrex::ignore_unused(k);
                    f(dest(i,j,0,n), (src(2*i,2*j  ,0,n) +
                                      src(2*i,2*j+1,0,n)) * (Real(0.5)*sf));
                });
#endif

Is that correct?
I'd likely use a different name for the first formal parameter then than just f...
Is there a syntactically more pleasing way that makes the assignment sort-of stand out as such?

@WeiqunZhang
Copy link
Member

Does the line

public: // for cuda

also have to go into AMReX_FlashFluxRegister.cpp or just AMReX_FlashFluxRegister.H ?

You just need to make sure the function declaration inside the class is in the public part (i.e., making it a public function). This is a cuda limitation, thus the comment in the example. Otherwise, we would make it a private function. That public: is irrelevant for cpp file. Also since this is a member function template, it shoudl be in .H file. Something like

// AMReX_FlashFluxRegister.H

class A
{
public:
    template <typename F>
    void f (F && f) const;
};

template <typename F>
void A::f (F && f) const
{
}

@WeiqunZhang
Copy link
Member

WeiqunZhang commented Oct 19, 2023

Yes feel free to change f to a more meaningful name.

If you want the code to be more explicit, you should be able to do something like this. (Not tested).

class FlashFluxRegister
{
public:

    enum struct OpType { Store, Add };

    template <OpType op>
    void StoreOrAdd (....);
};

template <FlashFluxRegister::OpType op>
FlashFluxRegister::StoreOrAdd (...)
{
    AMREX_HOST_DEVICE_PARALLEL_FOR_4D (.....
    {
        auto tmp = src(....) + ....;
        if constexpr (op == OpType::Store) {
            dest(i,j,0,n) = tmp;
        } else {
            dest(i,j,0,n) += tmp;
        }
    });
}

@WeiqunZhang
Copy link
Member

Then Add can be implemented as StoreOrAdd<OpType::Add>(fine_global_index, ....).

@WeiqunZhang
Copy link
Member

@kweide If you need help, I am happy to do it and then you can check if it works.

@kweide
Copy link
Contributor Author

kweide commented Oct 25, 2023

@WeiqunZhang Thank you for the offer. I would appreciate you going ahead with it.

I expect that this will not change the methods that are visible to Fortran code at all, they will have the new additional addit argument as already added in this PR and the same subroutine interfaces.

I was about to ask whether you would be willing to merge the PR as it is now, without functors and with the existing code duplication, perhaps as an intermediate step. I did not get around to work on this, and probably would not before the end of the month; on the other hand, I would really like this feature to be included in the upcoming (23.11) AMReX tag.

As far as wanting "the code to be more explicit" (by having = and += stand out explicitly), I don't care all that much about it, especially if it requires significantly more lines of code; as long as the "assignment function" (probably the wrong term!) has a somewhat meaningful name other than f() that should be fine.

@WeiqunZhang
Copy link
Member

@kweide Could you take a look and give it a try? If it works, we will get this merged today.

@kweide
Copy link
Contributor Author

kweide commented Oct 26, 2023

@kweide Could you take a look and give it a try? If it works, we will get this merged today.

I will do some testing tomorrow.

Just wondering: How many of the automatic tests are actually testing compilation of these source files?
I assume there are no CI tests of the FlashFluxRegister stuff, is that so?

@WeiqunZhang
Copy link
Member

I just counted. There are five GitHub CI tests with the fortran interface on. So it will probably catch most of the compilation errors in the FlahFluxRegister stuff. We also have three nightly regression tests for the fortran interface (2D amr, 3D amr and linear solver), in which the bindary plotfile results are compared with benchmarks. (But none of them uses FlashFluxRegister because we don't have a test using it.)

@kweide
Copy link
Contributor Author

kweide commented Oct 30, 2023

@kweide Could you take a look and give it a try? If it works, we will get this merged today.

I will do some testing tomorrow.

That is taking a bit longer than expected, still working on it today.

@kweide
Copy link
Contributor Author

kweide commented Oct 31, 2023

I have convinced myself now that this feature, with @WeiqunZhang,'s restructuring, works properly, and should be merged.

I got essentially the same results with Flash-X Sedov tests using this code (and relying on the accumulation feature) as with a comparison run that used Paramesh4 as the underlying Grid implementation.

@WeiqunZhang
Copy link
Member

I just ran some regressions. All passed.

@WeiqunZhang WeiqunZhang merged commit 601cc4e into AMReX-Codes:development Oct 31, 2023
69 checks passed
guj pushed a commit to guj/amrex that referenced this pull request Dec 13, 2023
…odes#3597)

I introduce variant methods that allow adding to previously stored data
in Flash(-X) flux registers, rather than just copying over the older
data.

For Flash-X'ers: Something like this is needed to get flux correction
working in the nontelescoping variant of the current Spark Hydro
implementation.

## Summary

## Additional background

## Checklist

The proposed changes:
- [ ] fix a bug or incorrect behavior in AMReX
- [x] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX
users
- [ ] include documentation in the code and/or rst files, if appropriate

---------

Co-authored-by: Weiqun Zhang <weiqunzhang@lbl.gov>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants