Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add comments to FluxRegister::OverwriteFlux #3216

Merged

Conversation

WeiqunZhang
Copy link
Member

No description provided.

@WeiqunZhang WeiqunZhang enabled auto-merge (squash) March 23, 2023 16:25
@WeiqunZhang WeiqunZhang merged commit 7aad6ca into AMReX-Codes:development Mar 23, 2023
WeiqunZhang added a commit to WeiqunZhang/amrex that referenced this pull request Mar 28, 2023
* SENSIE CI: run on ubuntu-20.04. 18.04 is deprecated. (AMReX-Codes#3131)

* PODVector: Add some stream synchronization (AMReX-Codes#3130)

The correctness of some functions depends on the asynchronous behavior
of memcpy (especially when pageable host memory is involved). It seems
that that is not well defined in the SYCL standard. So for safety,
stream synchronization is added to a few places.

* Fix AMReX-Codes#3105: missing include (AMReX-Codes#3134)

* Initialize TinyProfiler earlier only for memory (AMReX-Codes#3137)

Initialize TinyProfiler earlier only for memory and finalize it later
only for memory as well.

* Functional: Less -> Minimum, Greater -> Maximum (AMReX-Codes#3138)

The functional classes for returning the lesser and greater of two
arguments have been misnamed, because std::less and std::greater are for
comparison and they return bool. To avoid confusion, they have been
renamed.

* Test: Reinit AMReX (AMReX-Codes#3141)

## Summary

Run a test that inits and finalizes AMReX multiple times.

```console
cmake -S . -B build -DAMReX_GPU_BACKEND=CUDA -DAMReX_ENABLE_TESTS=ON -DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake --build build -j 8
ctest --test-dir build --verbose -R Reinit
```
```console
build/Tests/Reinit
cuda-gdb -ex r -ex bt --args ./Test_Reinit inputs
```

## Additional background

This is often done in pytest tests, where the same process is reused for
multiple tests.
Needed for pyAMReX, ImpactX, WarpX et al.

This also demonstrates a bug for GPU testing.

* Add a compute face-centered velocity gradient in MLEBTensor (AMReX-Codes#3133)

* Fix bug in Device::Finalize: Forgot to clear streams (AMReX-Codes#3143)

## Summary

We forgot to clear the non-owning gpu_stream vector. They still held the
old streams after we destroyed the streams in Device::Finalize. This bug
affects codes that call amrex::Initialize and amrex::Finalize
repeatedly.

Note that we have the non-owning gpu stream vector for OMP CPU thread
safety.

The bug was introduced in AMReX-Codes#1064.

## Additional background

AMReX-Codes#3141 

## Checklist

The proposed changes:
- [x] fix a bug or incorrect behavior in AMReX
- [ ] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX
users
- [ ] include documentation in the code and/or rst files, if appropriate

* Fix nl test (AMReX-Codes#3142)

This fixes a bug introduced in PR AMReX-Codes#3060 related to building neighbor
lists between particles of different types. I also made it so that CI
tests would have caught this bug.

Closes AMReX-Codes#3127

* Fix FabArray::shift (AMReX-Codes#3147)

## Summary

The cached metadata used by fused GPU kernel launches needs to be
updated in FabArray::shift because the BoxArray has changed.

## Additional background

Close AMReX-Codes#3145

* AmrLevel::FillPatch: Abort if grids are not properly nested for GPU or EB (AMReX-Codes#3098)

* FillPatcher: Update interpolation in time (AMReX-Codes#3106)

The previous implementation was susceptible to roundoff errors. It was
observed in 7 levels PeleC runs that the code could fail because the
times in StateData were out of sync. The updated logic is closer to that
in the FillPatch function and the StateData class.

* Runtime parameters for Signal handling controls (AMReX-Codes#3125)

Add amrex.handle_sigsegv, handle_sigint, handle_sigabrt, and
handle_sigfpe for more fine controls. Disable sigsegv handling for
certain Intel GPUs because it's incompatible with the managed memory.

* Adjust subgroup size for Intel GPUs (AMReX-Codes#3139)

We can now use 32 as the subgroup size for Intel GPUs. This improves
performance in most situations.

* SYCL: Fix the order of destruction (AMReX-Codes#3146)

Although we have not observed any issues, we should destroy the SYCL
context, device, and queues in the reverse order of their construction.

* GNU Make: Add support for Intel LLVM compiler in CPU builds (AMReX-Codes#3126)

icpc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is
deprecated and will be removed from product release in the second half
of 2023.

* Add MPI support to GpuComplex (AMReX-Codes#3148)

This creates a specialization of Mpi_typemap for GpuComplex<T>.

Needed by ECP-WarpX/WarpX#3618

## Checklist

The proposed changes:
- [ ] fix a bug or incorrect behavior in AMReX
- [x] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX
users
- [ ] include documentation in the code and/or rst files, if appropriate

* DPCPP -> SYCL (AMReX-Codes#3140)

The name dpcpp has been deprecated by Intel. This PR replaces DPCPP with
SYCL. For backward compatibility, we have kept the AMREX_USE_DPCPP
macro. However, build options like AMReX_DPCPP_AOT for cmake and
DPCPP_AOT for GNU Make have been replaced by AMReX_SYCL_AOT and
SYCL_AOT.

* Fix bug in async IO for particles (AMReX-Codes#3150)

Without this fix the following will fail:

> amrex/Tests/Particles/NeighborParticles(development)$ make -j8
TINY_PROFILE=TRUE BL_NO_FORT=TRUE USE_MPI=FALSE DEBUG=TRUE
> $ ./main3d.gnu.DEBUG.TPROF.CUDA.ex inputs amrex.async_out=1
amrex.the_arena_is_managed=0
> Initializing CUDA...
> CUDA initialized with 1 device.
> AMReX (23.02-21-g858e72dcc651) initialized
> 0::Assertion `index >= 0 && index < NReal +
static_cast<int>(m_runtime_rdata.size())' failed, file
"../../..//Src/Particle/AMReX_StructOfArrays.H", line 43 !!!
> SIGABRT

The proposed changes:
- [x] fix a bug or incorrect behavior in AMReX
- [ ] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX
users
- [ ] include documentation in the code and/or rst files, if appropriate

* Fix case where a norm is 1e-15 but not technically zero (AMReX-Codes#3149)

* SUNDIALS: Replace SUNDIALS math macros with std math functions (AMReX-Codes#3151)

This fixes xsdk-project/xsdk-issues#211.

* Add static_assert to Mpi_typemap<GpuComplex> (AMReX-Codes#3154)

* CI: Add sundials (AMReX-Codes#3153)

Also fixes warnings.

* SYCL: Fix AOT (AMReX-Codes#3155)

This reverts part of AMReX-Codes#3123 because the AOT argument is a compile option.
Also we must provide AMReX_INTEL_ARCH when SYCL AOT is on.

* GNU Make: make it an error if intel gpu arch for aot is unknown (AMReX-Codes#3157)

This is a follow-up on AMReX-Codes#3155 to make it consistent with CMake, and also
because `-device *` does not actually work on the testbed.

* SYCL AOT: Fix warning (AMReX-Codes#3159)

Finally set the proper flags without generating warnings about unused
compiler arguments.

* SYCL Device Info (AMReX-Codes#3161)

Print out more SYCL device information if available. Currently, free
memory and max memory band width are not available yet.

* gitignore: Editors & IDEs (AMReX-Codes#3163)

## Summary

Add a few typical editor and IDE project directories to `.gitignore`.

## Additional background

## Checklist

The proposed changes:
- [ ] fix a bug or incorrect behavior in AMReX
- [ ] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX
users
- [ ] include documentation in the code and/or rst files, if appropriate

* amrex::Math: No longer need to use sycl math functions (AMReX-Codes#3164)

We introduced `amrex::Math::floor`, etc. because their std versions were
not supported in device code by the Intel compiler at that time.
Otherwise, the user would have to call either sycl::floor or std::floor
with the help of macros. This is no longer the case. Now we can switch
to use the std versions. The amrex::Math versions are kept for backward
compatibility. The particle code is not updated because it's undergoing
some large changes.

* SYCL: Optimization of small reduction (AMReX-Codes#3167)

For small reductions, it's faster to do the second pass on CPU. The
change is for SYCL only.

* New functions for getting data in cell or line. (AMReX-Codes#3166)

Add get_cell_data and get_line_data to get the data in a cell or a line,
respectively. Note that we already had get_slice_data.

## Checklist

The proposed changes:
- [ ] fix a bug or incorrect behavior in AMReX
- [x] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX
users
- [ ] include documentation in the code and/or rst files, if appropriate

* Add Make.local-pre to gitignore (AMReX-Codes#3171)

Given that `Make.local` files are in the `.gitignore`, this should
probably be there too.

* SYCL: Group and subgroup size (AMReX-Codes#3172)

* Use [[sycl::reqd_work_group_size]] when we know the group size at
compile time. I have not seen perfromance difference, but it should not
hurt.

* Replace [[intel::reqd_sub_group_size]] with
[[sycl::reqd_sub_group_size]] since the Intel extension is in the
standard now.

* TableData: Relax the static assertion (AMReX-Codes#3174)

Relax the requirement of T to trivially copyable (so that htod memcpy
will work) and trivially destructible. After this change,
TableData<GpuComplex<Real>> should work.

* CMake: Parallel HDF5 Detection (AMReX-Codes#3170)

## Summary

It turns out the words in the settings summary of HDF5 came from the
words used when HDF5 was configured with CMake. The issue is `h5pcc
-showconfig` may show `Parallel HDF5: ON`, but the current versions of
CMake uses `Parallel HDF5: yes` to detect Parallel HDF5. This commit
adds a more reliable way of detecting Parallel HDF5.

## Additional background

- spack/spack#35546
- HDFGroup/hdf5#2488
- https://gitlab.kitware.com/cmake/cmake/-/merge_requests/8234

## Checklist

The proposed changes:
- [x] fix a bug or incorrect behavior in AMReX
- [ ] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX
users
- [ ] include documentation in the code and/or rst files, if appropriate

* Hypre setup and solve profile (AMReX-Codes#3168)

I've wrapped the Hypre setup and solve functions so they can be more
easily profiled.

Co-authored-by: Paul Mullowney <Paul.Mullowney@nrel.gov>

* GNU Make: Allow choice of Intel compiler variant (AMReX-Codes#3175)

## Summary
This allows the user to specify `COMP = intel-classic` or `COMP =
intel-llvm` if they wish to use a specific one.

## Additional background
Note that the existing behaviour is maintained if `COMP = intel` (i.e.
intel-llvm is chosen if the `icpx` command exists and intel-classic is
chosen otherwise following AMReX-Codes#3126). However, the `AMREX_FCOMP`,
`AMREX_CCOMP` and `lowercase_comp` makefile variables are changed to one
of `intel-llvm` or `intel-classic` depending on which is used. This
further allows the user to use information about the used variant in
their `Make.local`. For example they may wish to set something like
```
CXX = mpiicpc -cxx=icpx
```
in the case `AMREX_CCOMP = intel-llvm`, `USE_MPI = TRUE` and they are
using Intel MPI.

## Checklist

The proposed changes:
- [ ] fix a bug or incorrect behavior in AMReX
- [x] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX
users
- [x] include documentation in the code and/or rst files, if appropriate

* GNU Make: Allow customizing tmp_build_dir (AMReX-Codes#3177)

## Summary
This allows the user to change the default temporary build directory to
something other than `tmp_build_dir`.

## Additional background
This would allow for users to build multiple AMReX applications using
the same build directory and thus reduce compile times without having to
use `ccache`. Of course, there might be a small risk in terms of users
potentially accidentally mixing build configurations hence I would
suggest not advertising this feature or at least advising caution.

* fix recently introduced tmp_build_dir build bug (AMReX-Codes#3179)

the make variable was renamed from TmpBuildDir to TMP_BUILD_DIR breaking
application codes. This adds an alias to the old make variable name so
code compile again.

addresses bug in AMReX-Codes#3177 
closes AMReX-Codes#3178

* SYCL: Fix AMReX-Codes#3172 (AMReX-Codes#3181)

This may not be our bug. But using sycl::reqd_work_group_size does not
work for the 2d range cases. So we have to remove that.

* Update CHANGES for 23.03 (AMReX-Codes#3183)

* Fix race condition with TinyProfiler for memory (AMReX-Codes#3169)

Fix AMReX-Codes#3156 by giving every Arena its own map for profiling.

* SYCL: Tweak AMReX-Codes#3164 and switch to multi_ptr for sincos (AMReX-Codes#3185)

* Use std::isnan and std::isinf instead of ::isnan and ::isinf for HIP and CUDA.

* Add sycl::isinf and sycl::isfinite back to amrex::Math because Intel compiler's aggressive fp mode makes std::isinf and std::isfinite always return false in release build.

* Use sycl::isinf and sycl::isnan in amrex::isnan and amrex::isinf when SYCL is on for the same reason as above.

* Fix call to sycl::sincos.  According to the SYCL standard the pointer should be a multi_ptr instead of a raw pointer.

* ParticleContainer: Resize runtime components (AMReX-Codes#3186)

Add two new functions ResizeRuntimeRealComp and ResizeRuntimeIntComp to
ParticelContainer.

* Clang-Tidy (AMReX-Codes#3176)

Add a configuration file .clang-tidy for clang-tidy. Add support to it
in CMake. It can be enabled by `-DAMReX_CLANG_TIDY=ON`.

Fix many warnings raised by clang-tidy. Most of them are fixed by
`clang-tidy --fix-errors`.

To bypass clang-tidy, some unfixable codes have been moved to
`*.nolint.H`.

A number of issues have been found and fixed.
    
* There was use-after-move in testing OpenMP thread safety for
communication.

* The MultiFab tags were not handled correctly during vector resize.

* Parentheses were missing in the definition of AMREX_REAL_LOWEST.

* ReduceToPlane (AMReX-Codes#3187)

* Add a function template for reducing MultiFab data to a plane.

* Add KeyValuePair as an alias to ValLocPair.

* Add more versions of MPI reduce functions for KeyValuePair/ValLocPair.

* Fix an issue in AMReX-Codes#3176 (AMReX-Codes#3188)

We need to add another version of SetTag now, otherwise it breaks WarpX.

* Clang-Tidy II (AMReX-Codes#3190)

This is the second batch of changes fixing clang-tidy warnings.

Add CMake option AMReX_CLANG_TIDY_WERROR to update warnings to errors.

Use clang-tidy in CI: GNU@8.4 C++17 Release [lib].

* Update SUNDIALS NVector constructors (AMReX-Codes#3165)

## Summary

Update MultiFab NVector wrappers to accept a sundials context object.

## Additional background

Profiling (and in the future error handling) in sundials requires
objects to be create with a non-NULL context. This PR updates the
MultiFab NVector wrappers with an optional input for the sundials
context that defaults to the context created at initialization for the
corresponding OpenMP thread.

* Sorting for faster current deposition (AMReX-Codes#3198)

## Summary

For current deposition in codes such as HiPACE++ and WarpX, it is mush
better to sort by `x -> y -> z -> ppc` instead of `ppc -> x -> y -> z`.

## Additional background

To use this, pass in {0,0,0} (sort by cell), {-1,-1,-1} (sort by node)
or {-2, -2, -2} (sort by cell and node) as a bin_size into
SortParticlesByBin.

## Checklist

The proposed changes:
- [ ] fix a bug or incorrect behavior in AMReX
- [x] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX
users
- [ ] include documentation in the code and/or rst files, if appropriate

* Add assert that we are not inside threaded region when InitRandom is called. (AMReX-Codes#3204)

* Clang-Tidy III (AMReX-Codes#3197)

This is the third batch of changes fixing clang-tidy warnings. Most of
the warnings are in EB code.

Use Clang 14 in one of the Clang CI tests that used to use Clang 7.

* Clang-Tidy IV (AMReX-Codes#3200)

This is the fourth batch of changes fixing clang-tidy warnings. Most of
them are in particle code.

* Fix AMReX-Codes#3200 (AMReX-Codes#3206)

It was a mistake to delete the move assignment operator of
ParticleIDWrapper and ParticleCPUWrapper.

* Fix issues in ParticleContainerBase (AMReX-Codes#3208)

* The move constructor and assignment operator were not safe, because
the m_gdb member could be the address of m_gdb_object. After the move
that can no longer be true. This is fixed by making m_gdb_object a
unique_ptr. Moving a unique_ptr does not change its pointer inside.

* Fix SetParticleBoxArray, SetParticleDistributionMapping and
SetParticleGeometry. The issue was these functions took a reference,
which could be aliasing what was inside the m_gdb_object. ParGDB has a
move assignment operator that could render the reference pointing to an
invalid object. So now these functions take by value. This was not an
issue before AMReX-Codes#3200, because the move assignment of ParGDB did copy
instead of move.

* Add Jean's name to the development team (AMReX-Codes#3209)

* fix typo in debugging docs (AMReX-Codes#3210)

## Summary

## Additional background

## Checklist

The proposed changes:
- [ ] fix a bug or incorrect behavior in AMReX
- [ ] add new capabilities to AMReX
- [ ] changes answers in the test suite to more than roundoff level
- [ ] are likely to significantly affect the results of downstream AMReX
users
- [ ] include documentation in the code and/or rst files, if appropriate

* Clang-Tidy V (AMReX-Codes#3201)

This is the fifth batch of changes fixing clang-tidy warnings. The
changes are most related to single precision, 1D, 2D, and OMP.

Enable clang-tidy in GCC and Clang CIs.

Switch to Ubuntu 22.04 and GCC 12 in one test.

* Clang-Tidy VI (AMReX-Codes#3202)

* Update GNU Make. One can use `USE_CLANG_TIDY=TRUE` to enable
clang-tidy check. For example, to enable clang-tidy check, specify the
clang-tidy command (if it's the default clang-tidy), and treat warnings
as errors, one can run

make USE_CLANG_TIDY=TRUE CLANG_TIDY=clang-tidy-12
CLANG_TIDY_WARN_ERROR=TRUE

By default amrex/.clang-tidy is used as the config file. One can change
it with CLANG_TIDY_CONFIG_FILE=my_clang-tidy-conf. If you want to let
clang-tidy fix errors for you, you can run

      make USE_CLANG_TIDY=TRUE CLANG_TIDY="clang-tidy --fix-errors"

To avoid race conditions, the make job should not use `-j` (unless it's
`-j1`) when fixing errors.

* Add clang-tidy to GNU Make CIs.

* Fix warnings in Fortran interface, particle and plotfile tools.

* Clang-Tidy VII (AMReX-Codes#3203)

Fix warnings in the interfaces for Hypre, PETSc and SUNDIALS.

* Warning about CPU core oversubscription (AMReX-Codes#3211)

If the user forgets to set OMP_NUM_THREADS, OMP might be oversubscribing
CPU cores. If that's the case (to the best of our knowledge) and
verbosity is on, we print out a warning.

We use `std::thread::hardware_concurrency` to detect the number of cores
available. Note that this is only a hint according to the C++ standard.
To detect the number of MPI ranks per node, we use OMP_COMM_TYPE_NODE
for Open MPI and MPI_COMM_TYPE_SHARED for others. Since
MPI_COMM_TYPE_SHARED might group processes across nodes, the return
value of `ParallelDescriptor::NProcsPerNode` might also be wrong.
Nevertheless, both `std::thread::hardware_concurrency` and
`ParallelDescriptor::NProcsPerNode` seem to work well.

* SYCL sub-group size (AMReX-Codes#3180)

Add option to specify SYCL sub-group size. The default is 32. It can be
changed to 16 with `-DAMReX_SYCL_SUB_GROUP_SIZE=16` in CMake and
`SYCL_SUB_GROUP_SIZE=16` in GNU Make.

* Add build option for Intel SYCL AOT GRF mode (AMReX-Codes#3173)

By default, the Intel SYCL compiler uses up to 128 registers. In the
large register file mode, it's up to 256 registers. There is also the
auto-large mode.

For CMake, one can use `-DAMReX_SYCL_AOT_GRF_MODE=[Large|AutoLarge]` to
choose the large register file or auto-large mode. AOT must also be
enabled with `-DAMReX_SYCL_AOT=TRUE`.

For GNU Make, one can use `SYCL_AOT_GRF_MODE=[Large|AutoLarge]`, and it
depends on AOT being enabled by `SYCL_AOT=TRUE`.

* Add comments to FluxRegister::OverwriteFlux (AMReX-Codes#3216)

* Use ccache in Github Actions

---------

Co-authored-by: AlexanderSinn <64009254+AlexanderSinn@users.noreply.github.com>
Co-authored-by: Axel Huebl <axel.huebl@plasma.ninja>
Co-authored-by: Lucas Esclapez <13371051+esclapez@users.noreply.github.com>
Co-authored-by: Andrew Myers <atmyers2@gmail.com>
Co-authored-by: asalmgren <asalmgren@lbl.gov>
Co-authored-by: Miren Radia <32646026+mirenradia@users.noreply.github.com>
Co-authored-by: PaulMullowney <60452402+PaulMullowney@users.noreply.github.com>
Co-authored-by: Paul Mullowney <Paul.Mullowney@nrel.gov>
Co-authored-by: Michael Zingale <michael.zingale@stonybrook.edu>
Co-authored-by: Nuno Miguel Nobre <nuno.nobre@stfc.ac.uk>
Co-authored-by: David Gardner <gardner48@llnl.gov>
guj pushed a commit to guj/amrex that referenced this pull request Jul 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants