Export GpuDevice Globals #2918

ax3l · 2022-08-15T17:13:27Z

Summary

Implement symbol export via AMREX_EXPORT for the global variables in Src/Base/AMReX_GpuDevice.H.
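
For readers unfamiliar with symbol export, the pattern can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not AMReX's actual header: the names `MYLIB_EXPORT`, `MYLIB_IS_DLL`, `MYLIB_IS_DLL_BUILDING`, and `mylib::Device` are hypothetical. Only `AMREX_EXPORT` and `Src/Base/AMReX_GpuDevice.H` come from this PR.

```cpp
#include <cassert>

// Hypothetical export macro following the usual Windows DLL pattern.
// All MYLIB_* names are assumptions made for this sketch.
#if defined(_WIN32) && defined(MYLIB_IS_DLL)
#  if defined(MYLIB_IS_DLL_BUILDING)
#    define MYLIB_EXPORT __declspec(dllexport)  // compiling the library itself
#  else
#    define MYLIB_EXPORT __declspec(dllimport)  // compiling code that links to it
#  endif
#else
#  define MYLIB_EXPORT                          // static or non-Windows builds
#endif

namespace mylib {

// Hypothetical stand-in for a class that keeps global state in static members.
class Device
{
public:
    static int deviceId () noexcept { return device_id; }
    static void setDeviceId (int id) noexcept { device_id = id; }

private:
    // The export attribute makes this data symbol visible across a
    // Windows DLL boundary. Elsewhere the macro expands to nothing.
    static MYLIB_EXPORT int device_id;
};

// Definition of the static member (normally placed in a library .cpp file).
int Device::device_id = 0;

} // namespace mylib
```

Inline accessors such as `deviceId()` are compiled into the calling binary, so on Windows the data member they touch must be exported for a shared-library build to link.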

Additional background

Follow-up to #1847

Fix #2917

Checklist

The proposed changes:

- fix a bug or incorrect behavior in AMReX
- add new capabilities to AMReX
- changes answers in the test suite to more than roundoff level
- are likely to significantly affect the results of downstream AMReX users
- include documentation in the code and/or rst files, if appropriate


@afanfa

commit 10e99fb Merge: d03045d f1e1d6f Author: Andrew Myers <atmyers2@gmail.com> Date: Wed Nov 2 14:06:00 2022 -0700 Merge branch 'particle_soa_refactor' of github.com:Thierry992/amrex into HEAD commit d03045d Author: Andrew Myers <atmyers2@gmail.com> Date: Wed Nov 2 14:04:23 2022 -0700 fix buffer pack / unpack commit d771fc8 Author: Andrew Myers <atmyers2@gmail.com> Date: Wed Nov 2 14:04:08 2022 -0700 revert to one int for each id for now commit f1e1d6f Merge: 4dbfbac c4a4811 Author: Axel Huebl <axel.huebl@plasma.ninja> Date: Tue Nov 1 15:18:54 2022 -0500 Merge remote-tracking branch 'mainline/development' into particle_soa_refactor commit c4a4811 Author: Axel Huebl <axel.huebl@plasma.ninja> Date: Tue Nov 1 14:08:38 2022 -0500 C++17 Transition (AMReX-Codes#2992) ## Summary Update AMReX to require C++17 or newer. - [x] docs - [x] CMake - [x] GNUmake - [x] CI ## Additional background Requires a mature [C++17](https://en.wikipedia.org/wiki/C%2B%2B17) compiler, e.g., GCC 8, Clang 7, NVCC 11.0, MSVC 19.15 or newer. Already used since 1+ year in production by downstream codes such as Castro and WarpX. Needed for modernization and new features such as AMReX-Codes#2878 Co-authored-by: Weiqun Zhang <weiqunzhang@lbl.gov> commit d2b8293 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Tue Nov 1 09:01:54 2022 -0700 Update CHANGES for 22.11 (AMReX-Codes#3006) commit 5ec270b Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Tue Nov 1 08:59:44 2022 -0700 Fix compilation for PETSc (AMReX-Codes#3005) We cannot include PETSc headers too early because it might redefine MPI routines as macros (https://github.com/petsc/petsc/blob/main/include/petsclog.h#L441). They break MPI calls like below, MPI_Allreduce(&tmp, &vi, 1, ParallelDescriptor::Mpi_typemap<T>::type(), ParallelDescriptor::Mpi_op<T,amrex::Greater<T>>(), comm); because of the `,` in `<T,amrex::Greater<T>>`. 
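
The macro problem described in the PETSc commit above can be reproduced without MPI or PETSc. The sketch below is only an illustration: `REDUCE3`, `reduce3_impl`, `make_op`, and `Greater` are hypothetical stand-ins, not PETSc or AMReX names. The preprocessor splits macro arguments at every comma that is not inside parentheses, and angle brackets do not count, so a template argument list with a comma is seen as two arguments. The commit itself fixed the issue by not including the PETSc headers too early. The two workarounds shown here are just the generic ways to protect such an argument.

```cpp
#include <cassert>

// Hypothetical comparison functor.
template <typename T>
struct Greater {
    bool operator() (T a, T b) const { return a > b; }
};

// Hypothetical factory whose call needs a comma inside angle brackets.
template <typename T, typename Op>
Op make_op () { return Op{}; }

// Returns a if op(a,b) holds, otherwise b.
template <typename T, typename Op>
T reduce3_impl (T a, T b, Op op) { return op(a, b) ? a : b; }

// Hypothetical function-like macro taking exactly three arguments.
#define REDUCE3(a, b, op) reduce3_impl((a), (b), (op))

inline int max_via_macro (int x, int y)
{
    // Does not compile: the preprocessor sees four arguments, because the
    // comma in <int,Greater<int>> is not protected by parentheses.
    // int r0 = REDUCE3(x, y, make_op<int,Greater<int>>());

    // Workaround 1: wrap the argument in an extra pair of parentheses.
    int r1 = REDUCE3(x, y, (make_op<int,Greater<int>>()));

    // Workaround 2: bind the argument to a name before the macro call.
    auto op = make_op<int,Greater<int>>();
    int r2 = REDUCE3(x, y, op);

    assert(r1 == r2);
    return r1;
}
```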
commit 735c351 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Sat Oct 29 10:57:23 2022 -0700 MPI Reduce for ValLocPair (AMReX-Codes#3003) Add ParallelReduce::Min, ParallelReduce::Max, ParallelAllReduce::Min, and ParallelAllReduce::Max for ValLocPair<TV,TI>, where TV and TI are types that have corresponding MPI types (e.g., int, Real, IntVect, Box, etc.). commit 3ec0768 Author: Axel Huebl <axel.huebl@plasma.ninja> Date: Wed Oct 26 16:49:40 2022 -0700 `FabArray::isDefined` (AMReX-Codes#2997) ## Summary Add a new query to `define_function_called`. ## Additional background This is a cheaper check than `ok()` for finding out if a MultiFab has been allocated or not yet, assuming that the calling code follows the convention that `define()` is called collectively. Update: It turns out you can also call `empty` inherited from `FabArrayBase`. The new API is quite explicit, which is ok, too. Co-authored-by: Weiqun Zhang <WeiqunZhang@lbl.gov> commit 7f3c908 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Wed Oct 26 16:40:16 2022 -0700 Make The_Device_Arena non-managed (AMReX-Codes#2998) The_Device_Arena used to be a separate Arena. We changed it to be an alias of The_Arena to avoid memory fragmentation. However, the issue is we don't have an Arena that can allocate non-managed memory unless The_Arena is not managed. Because of performance concerns, we sometimes want to allocate non-managed memory. Therefore, we make The_Device_Arena an alias if and only if The_Arena is not managed. commit ab8c892 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Wed Oct 26 15:59:39 2022 -0700 Add alias template Gpu::NonManagedDeviceVector (AMReX-Codes#2999) commit b3e0a62 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Wed Oct 26 15:02:13 2022 -0700 Pre- and Post-interpolation hook interface (AMReX-Codes#2991) Support both Fab and MultiFab versions of pre- and post-interpolation hooks. 
Because the pre-interp hook might modify the data, we need to make a copy to avoid modifying cached coarse data. Close AMReX-Codes#2989. commit 3082028 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Wed Oct 19 19:24:10 2022 -0700 Update GitHub Actions (AMReX-Codes#2996) https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/ ## Summary ## Additional background ## Checklist The proposed changes: - [ ] fix a bug or incorrect behavior in AMReX - [ ] add new capabilities to AMReX - [ ] changes answers in the test suite to more than roundoff level - [ ] are likely to significantly affect the results of downstream AMReX users - [ ] include documentation in the code and/or rst files, if appropriate commit 0b88bfd Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Wed Oct 19 13:39:18 2022 -0700 Add user defined BC types (AMReX-Codes#2995) Add BCType::user_1, BCType::user_2 and BCType::user_3. Previously the only "user" type is ext_dir (external Dirichlet). The BC types are passed from the user's code to FillPatch, which in turn passes them back to the user provided BC filling function. These new types will make it easy for the user to determine the user defined BC types in their BC filling functions. commit 9502b99 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Tue Oct 18 10:20:06 2022 -0700 Add BCRec::set for convenience (AMReX-Codes#2993) commit 4dbfbac Author: Thierry Antoun <thierry.antoun@ensta-paris.fr> Date: Mon Oct 17 15:05:54 2022 -0700 Adding AMReX_RESTRICT for GPU Test commit 7051a6c Author: Thierry Antoun <thierry.antoun@ensta-paris.fr> Date: Mon Oct 17 15:03:19 2022 -0700 Modyfing RedistributeMPI to make it work with 2 ranks commit 56b6402 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Sat Oct 15 14:59:38 2022 -0700 ParallelFor with compile time optimization of kernels with run time parameters (AMReX-Codes#2954) Branches inside ParallelFor can be very expensive. 
If a branch uses a lot of resources (e.g., registers), it can significantly affect performance even if the branch is never executed at run time, because it affects GPU occupancy. For CPUs, it can affect vectorization of the kernel. The new ParallelFor functions use a C++17 fold expression to generate kernel launches for all run time variants. Only one will be executed; which one is chosen depends on the run time parameters. The kernel function can use constexpr if to discard unused code blocks for better run time performance. Here are two examples of how to use them.

    int runtime_option = ...;
    enum All_options : int { A0, A1, A2, A3 };
    // Four ParallelFors will be generated.
    ParallelFor(TypeList<CompileTimeOptions<A0,A1,A2,A3>>{}, {runtime_option},
                box, [=] AMREX_GPU_DEVICE (int i, int j, int k, auto control)
    {
        ...
        if constexpr (control.value == A0) {
            ...
        } else if constexpr (control.value == A1) {
            ...
        } else if constexpr (control.value == A2) {
            ...
        } else {
            ...
        }
        ...
    });

and

    int A_runtime_option = ...;
    int B_runtime_option = ...;
    enum A_options : int { A0, A1, A2, A3 };
    enum B_options : int { B0, B1 };
    // 4*2=8 ParallelFors will be generated.
    ParallelFor(TypeList<CompileTimeOptions<A0,A1,A2,A3>,
                         CompileTimeOptions<B0,B1>>{},
                {A_runtime_option, B_runtime_option},
                N, [=] AMREX_GPU_DEVICE (int i, auto A_control, auto B_control)
    {
        ...
        if constexpr (A_control.value == A0) {
            ...
        } else if constexpr (A_control.value == A1) {
            ...
        } else if constexpr (A_control.value == A2) {
            ...
        } else {
            ...
        }
        if constexpr (A_control.value != A3 && B_control.value == B1) {
            ...
        }
        ...
    });

Note that, due to a limitation of CUDA's extended device lambda, the constexpr if block cannot be the one that captures a variable first. If nvcc complains about it, you will have to manually capture it outside constexpr if. The data type for the parameters is int. Thanks to Maikel Nadolski and Alex Sinn for showing us the meta-programming techniques used here.
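
The dispatch idea in the commit message above can be sketched in plain C++17 without AMReX or a GPU. This is not the AMReX implementation: `Options`, `dispatch`, and `run` are hypothetical names chosen for the sketch, and an ordinary function call stands in for the kernel launch. It shows a fold expression over `||` that instantiates the callable once per compile time option and invokes only the instantiation that matches the run time value.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a list of compile time options.
template <int... Opts>
struct Options {};

// Invokes f with the std::integral_constant whose value equals
// runtime_option. Returns false if no option matches.
template <typename F, int... Opts>
bool dispatch (Options<Opts...>, int runtime_option, F&& f)
{
    // Fold over ||: one term per option. Evaluation stops at the first
    // term that yields true, so f is invoked at most once.
    return ( (runtime_option == Opts
                  ? (f(std::integral_constant<int,Opts>{}), true)
                  : false) || ... );
}

enum All_options : int { A0, A1, A2 };

inline int run (int runtime_option)
{
    int result = -1; // stays -1 if no option matches
    dispatch(Options<A0,A1,A2>{}, runtime_option, [&] (auto control)
    {
        // Each instantiation keeps only the branch for its own option.
        if constexpr (decltype(control)::value == A0) {
            result = 10;
        } else if constexpr (decltype(control)::value == A1) {
            result = 20;
        } else {
            result = 30;
        }
    });
    return result;
}
```

Passing the option as a type lets `if constexpr` discard the untaken branches in each instantiation, which is what keeps unused code out of each generated variant.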
commit bcbf17f Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Fri Oct 14 19:48:14 2022 -0700 2D RZ solver for WarpX: Arbitrary coefficient (AMReX-Codes#2986) The assumption in the 2D RZ solver for WarpX used to be there was no sigma_r (i.e., sigma_r == 1). In this PR, we allow arbitrary sigma_r coefficient. commit 9a3cd5d Author: Axel Huebl <axel.huebl@plasma.ninja> Date: Fri Oct 14 17:27:41 2022 -0700 CMake Docs: Fix User-Guidance (Link) (AMReX-Codes#2990) Update the user-guidance on CMake dependency linking to CMake 3.0+ (anno. 2014+). Seen in AMReX-Codes#2978 commit 1ad4144 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Fri Oct 14 10:36:17 2022 -0700 Runge-Kutta support for AMR (AMReX-Codes#2974) This adds RK2, RK3 and RK4 in a new namespace RungeKutta. Together with the enhanced FillPatcher class, these functions can be used for RK time stepping in AMR simulations. A new function AmrLevel::RK is added for AmrLevel based codes. See CNS::advance in Tests/GPU/CNS/CNS_advance.cpp for an example of using the new AmrLevel::RK function. The main motivation for this PR is that ghost cell filling for high order (> 2) RK methods at coarse/fine boundary is non-trivial when there is subcycling. Co-authored-by: Jean M. Sexton <jmsexton@lbl.gov> commit c841ae8 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Fri Oct 14 10:03:34 2022 -0700 Fourth-order interpolation from fine to coarse level (AMReX-Codes#2987) For fourth-order finite-difference methods with data at cell centers, we cannot use the usual averageDown function to overwrite coarse level data with fine data. We actually need to do interpolation. commit 975b830 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Fri Oct 14 09:53:22 2022 -0700 Fix EB data inconsistency when fixing small cells and multiple cuts (AMReX-Codes#2943) ## Summary For consistency, we need to call the function that zeros out the level set even if that box does not have any small cells or multiple cuts. 
This is because a node could exist in multiple boxes. Furthermore, a covered cell or covered face may have a node with a level set < 0. ## Additional background This is usually not an issue. However, in WarpX, we use the level set to decide whether a node is an unknown in the linear system. The inconsistency makes the solver fail in some cases. commit 9c2264b Author: Axel Huebl <axel.huebl@plasma.ninja> Date: Fri Oct 14 07:41:06 2022 -0700 `MFIter::Finalize`: Free `m_fa` (AMReX-Codes#2988) This `free` should potentially not be delayed until the destructor is called. Follow-up to AMReX-Codes#2985 AMReX-Codes#2983 commit f84c7a8 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Wed Oct 12 10:44:11 2022 -0700 Fix MLMG::getGradSolution & getFluxes for inhomogeneous Neumann and Robin BC (AMReX-Codes#2984) Because of the way how inhomogeneous and Robin BC are handled, we must add the inhomogeneous fluxes back, otherwise they would be zero at those boundaries. commit ed1ecd6 Author: Axel Huebl <axel.huebl@plasma.ninja> Date: Wed Oct 12 08:46:34 2022 -0700 MFIter: Make Finalize Public (AMReX-Codes#2985) Follow-up to AMReX-Codes#2983 commit 5acfe07 Author: Axel Huebl <axel.huebl@plasma.ninja> Date: Tue Oct 11 14:51:48 2022 -0700 MFIter::Finalize (AMReX-Codes#2983) Add a Finalize function to MFIter. The idea about this is, that we can call this already before destruction in Python, where `for` loops do not create scope. This function must be robust enough to be called again in the constructor (or we need to add an extra bool to guard that it is not called again in the destructor). Co-authored-by: Weiqun Zhang <WeiqunZhang@lbl.gov> commit 53e34d1 Author: Andy Nonaka <AJNonaka@lbl.gov> Date: Tue Oct 11 12:00:34 2022 -0700 fix docs; Robin BC's for MLMG (AMReX-Codes#2982) Update the MLMG Robin BC description in the docs. 
commit 0019b3a Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Tue Oct 11 11:00:13 2022 -0700 MLLinOp::postSolve (AMReX-Codes#2981) Add a virtual function MLLinOp::postSolve. This allows WarpX to set EB covered nodes to prescribed values in the solver's output for visualization purpose. commit 2d87a4c Author: Brandon Runnels <brunnels@uccs.edu> Date: Mon Oct 10 09:49:29 2022 -0600 add templating for the cell bilinear interpolators (AMReX-Codes#2979) This templates the `mf_cell_bilin_interp` functions so that the interpolators can be used with `BaseFab`s of arbitrary type. commit e4ab048 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Wed Oct 5 12:03:41 2022 -0700 FillPatcher class (AMReX-Codes#2972) This adds a class FillPatcher for filling fine level data. It's not as general as the various FillPatch functions (e.g., FillPatchTwoLevels). However, it can reduce the amount of communication data. Suppose we use RK2 with subcycling and the refinement ratio is 2. For each step on level 0, there are two steps on level 1. With RK2, each fine step needs to call FillPatch twice. So the total number of FillPatch calls is 4 in the two fine steps. Using the free function, one ParallelCopy per FillPatch call is needed for copying coarse data for spatial interpolation. With the FillPatcher class, two ParallelCopy calls will be done to copy old and new coarse data. Then these data will be used in the four FillPatcher::fill calls. This new approach saves two ParallelCopy calls per coarse step for a two levels run. It could save more if the time stepping requires more substeps or the refinement ratio is higher. Note that many of our AMReX codes use a time stepping algorithm that needs only one FillPatch call per step. For those codes, this new approach will not save any communication for a refinement ratio of 2. However, it will save communication when the refinement ratio is 4. 
commit 1bc4e4e Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Mon Oct 3 16:50:45 2022 -0700 Remove sycl namespace alias (AMReX-Codes#2971) This causes a conflict with new compilers. commit de7b7f4 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Mon Oct 3 14:06:58 2022 -0700 Fix Tensor Solver BC (AMReX-Codes#2930) This fixes some bugs in the physical domain BC of tensor linear solver. At the corner of two no-slip walls (e.g., (0,0)), we have u(-1,0) = -u(0,0) and u(0,-1) = -u(0,0). It's incorrect to fill the corner ghost cell with u(-1,-1) = u(-1,0) + u(0,-1) - u(0,0), because it will result in u(-1,-1) = -3 * u(0,0). In the old approach, to avoid branches in computing transverse derivatives on cell faces, we fill the ghost cells first. For example, to compute du/dy at the lo-x boundary, we use the data in i = -1 and 0, just like we compute du/dy(i) using u(i-1) and u(i) for interior faces. The problem is the normal velocity in the ghost cells outside a wall is filled with extrapolation of the Dirichlet value (which is zero) and more than 1 interior cells. Because of the high-order extrapolation, u(-1) != -u(0). This is the desired approach for computing du/dx on the wall. However, this produces incorrect results in dudy. In the new approach, we explicitly handle the boundaries in the derivative stencil. For example, to compute transverse derivatives on an inflow face, we use the boundary values directly. Co-authored-by: cgilet <cgilet@gmail.com> commit 13aa4df Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Fri Sep 30 17:48:22 2022 -0700 Disable host device for macros for SYCL/DPC++ (AMReX-Codes#2969) The host part of the AMREX_HOST_DEVICE_FOR_* macros is disabled for SYCL/DPC++. It's really slow for compilation. 
commit 62379fb Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Fri Sep 30 15:37:35 2022 -0700 Update CHANGES for 22.10 (AMReX-Codes#2968) commit d65e09e Author: Roberto Porcu <53792251+rporcu@users.noreply.github.com> Date: Thu Sep 29 15:46:19 2022 -0400 Solve an issue with particles async IO when having runtime added variables (AMReX-Codes#2966) commit cd07b0d Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Wed Sep 28 09:20:42 2022 -0700 Fix int overflow in amrex::bisect (AMReX-Codes#2964) Change from (lo+hi)/2 to lo+(hi-lo)/2. Although it's very unlikely, it's possible (lo+hi), where both lo and hi are integers, could overflow. commit e55d6b4 Author: Junghyeon Park <j824h@outlook.com> Date: Thu Sep 29 01:20:15 2022 +0900 Update the SWFFT project site (AMReX-Codes#2965) commit b84d7c0 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Mon Sep 26 16:05:10 2022 -0700 Fix MLEBNodeFDLaplacian bottom solver (AMReX-Codes#2963) MLEBNodeFDLaplacian is never singular because it has Dirichlet boundary on the EB surface. We did set the singular flag to false, but forgot about the bottom solver used a different function to query. This fixes it by overriding the isBottomSingular function. commit 5e84f43 Author: asalmgren <asalmgren@lbl.gov> Date: Sun Sep 25 09:38:51 2022 -0700 make tagging routines EB_aware (AMReX-Codes#2962) commit 8b367b0 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Sun Sep 25 09:22:13 2022 -0700 Volume weighted sum (AMReX-Codes#2961) Add a new function doing volume weighted sum across AMR levels. This may not be exactly what amrex application codes want. But it should work for many cases. commit 2a3cc05 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Fri Sep 23 12:24:05 2022 -0700 CellData: data in a single cell (AMReX-Codes#2959) This adds struct CellData that allows for accessing data in a single cell in Array4. This is convenient sometimes because one can omit the i, j and k indices. 
It might also be faster sometimes because it can skip the repeated index calculation involving i,j,k. commit 27ef106 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Fri Sep 23 12:23:34 2022 -0700 Quartic interpolation for cell centered data (AMReX-Codes#2960) New Interpolator for interpolation of cell centered data using a fourth-degreee polynomial. Note that the interpolation is not conservative and does not do any slope limiting. commit c4b7982 Author: Luca Fedeli <luca.fedeli@cea.fr> Date: Fri Sep 23 21:17:12 2022 +0200 Add GPU-compatible upper bound and lower bound algorithms to AMReX_Algorithm (AMReX-Codes#2958) commit 3e5cc77 Author: Don E. Willcox <dwillcox@users.noreply.github.com> Date: Tue Sep 20 17:59:48 2022 -0700 add option for makebuildsources to specify the style arguments for 'git describe'. (AMReX-Codes#2957) commit a6e0c11 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Tue Sep 20 10:01:21 2022 -0700 Add more warnings (AMReX-Codes#2956) * Add -Wnon-virtual-dtor -Wlogical-op -Wmisleading-indentation -Wduplicated-cond -Wduplicated-branches to gcc. * Add -Wnon-virtual-dtor to clang. * Add more warnings to CI. * Fix some non-virtual dtors and some other warnings. commit 826cd37 Author: Phil Miller <phil.miller@intensecomputing.com> Date: Thu Sep 15 17:26:00 2022 -0700 Add roundoff_lo corresponding to roundoff_hi for domains that don't start at 0 (AMReX-Codes#2950) * Lay groundwork for roundoff_lo * Add dummy implementation of roundoff_lo computation * implement bisect_prob_lo * change idx -> dxinv * use rlo instead of plo in locateParticle Co-authored-by: atmyers <atmyers2@gmail.com> commit 6a5a056 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Thu Sep 15 13:23:40 2022 -0700 Add template parameter to ParallelFor and launch specifying block size (AMReX-Codes#2947) By default, amrex::ParallelFor launches AMREX_GPU_MAX_THREADS threads per block. 
We can now explicitly specfiy the block size with `ParallelFor<BLOCK_SIZE>(...)`, where BLOCK_SIZE should be a multiple of the warp size (e.g., 64, 128, etc.). A similar change has also been made to `launch`. The changes are backward compatible. commit 2cdb9df Author: Andrew Myers <atmyers2@gmail.com> Date: Thu Sep 15 10:55:41 2022 -0700 Byte spread fixes (AMReX-Codes#2949) commit 17c94cc Author: Candace Gilet <cgilet@users.noreply.github.com> Date: Wed Sep 14 11:49:35 2022 -0400 Correct MultiFab::norm0 doxygen brief description (AMReX-Codes#2946) commit 0351c99 Author: Axel Huebl <axel.huebl@plasma.ninja> Date: Wed Sep 14 08:48:25 2022 -0700 CMake: HIP_PATH from ROCM_PATH (AMReX-Codes#2948) * On machines like Crusher, `ROCM_PATH` is more likely to be available then a `HIP_PATH` environment variable. This is mainly needed for our hacky ROCTX hints. * ROCTX: New Include Supposedly, there is a new include we shall use: Ref.: ROCm/roctracer#79 * ROCtracer: Include as System library Because of GNU extensions in the roctracer include files for the legacy include. But we should make this `-isystem` anyway to be robust for the future. 
The 5.2 deprecated include file `<roctracer_ext.h>` throws warnings because they rely on GNU extensions: ``` In file included from /opt/rocm/hip/../roctracer/include/ext/prof_protocol.h:27: /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:70:7: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct] struct { ^ /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:70:7: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types] /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:75:7: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types] struct { ^ /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:82:7: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct] struct { ^ /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:86:7: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct] struct { ^ /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:90:7: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct] struct { ^ /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:82:7: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types] struct { ^ /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:86:7: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types] struct { ^ /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:90:7: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types] struct { ^ ``` * GNUmake: Update Includes in `hip.mak` Use public prefix. 
commit 9aa23c2 Author: Cody Balos <balos1@llnl.gov> Date: Mon Sep 12 11:49:37 2022 -0700 Fix minor typo in fcompare docs (AMReX-Codes#2945)

commit bfbd68f Author: Axel Huebl <axel.huebl@plasma.ninja> Date: Mon Sep 12 11:40:55 2022 -0700 Fix: Make Finalize->Initialize->F->I->... Work (AMReX-Codes#2944) Fix assertions in Arena::Initialize. The_BArena never dies (tm) Co-authored-by: Weiqun Zhang <WeiqunZhang@lbl.gov>

commit 6738470 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Wed Sep 7 14:12:34 2022 -0700 Changes for Cray & Clang (AMReX-Codes#2941)

* It seems that the new Cray compilers no longer define `_CRAYC`. However, they do define `__cray__`.
* For Clang based Cray compilers, use -O3 instead of -O2 for optimization.
* Clang's vectorization pragma is very aggressive. For some codes, it makes ParallelFor with many if statements on CPU much slower than without vectorization. Unfortunately, it does not have an ivdep pragma. So we disable AMREX_PRAGMA for clang for safety.
* No longer need to use -Wno-pass-failed for Clang based compilers.

commit 5b0c598 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Wed Sep 7 09:42:57 2022 -0700 Fix a warning in packing communication send buffer (AMReX-Codes#2940) When we communicate double precision data in single precision, there is a conversion from double to float in packing the send buffer. A static cast is added to fix the warning.
commit 3e397bb Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Wed Sep 7 09:13:53 2022 -0700 Link to cublas when using CUDA and Hypre (AMReX-Codes#2933) commit 9525ea8 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Wed Sep 7 09:13:20 2022 -0700 HIP: use coarse grained host memory (AMReX-Codes#2932) commit 7e04016 Author: Marco Garten <mgarten@lbl.gov> Date: Wed Sep 7 08:53:20 2022 -0700 Update Testing Docs (AMReX-Codes#2937) - document `abort_on_unused_inputs` - remove duplicate superfluous argument in regtest call commit 539427a Author: drangara <69211175+drangara@users.noreply.github.com> Date: Tue Sep 6 18:13:42 2022 -0400 EB checkpoint files (AMReX-Codes#2897) * support for loading EB from checkpoint file * add support for writing chkpt file as well Co-authored-by: Weiqun Zhang <WeiqunZhang@lbl.gov> commit 35ed6b4 Author: Axel Huebl <axel.huebl@plasma.ninja> Date: Tue Sep 6 15:07:16 2022 -0700 Fix: Loading Files Again (AMReX-Codes#2936) This enables that `amrex::ParmParse::addfile` can be called multiple times. Before this, we accidentially overwrite the `FILE` static keyword. Follow-up to AMReX-Codes#2842 commit 8f8198c Author: hengjiew <86926839+hengjiew@users.noreply.github.com> Date: Tue Sep 6 13:36:35 2022 -0400 Check if boundary particles container has been created before clearance. (AMReX-Codes#2935) This fixes a segmentation fault when using more GPUs for updating particles than fluid. commit fb0b31e Author: Nuno Miguel Nobre <nuno.nobre@stfc.ac.uk> Date: Sun Sep 4 05:18:49 2022 +0100 SYCL: Replace deprecated atomic types and operations (AMReX-Codes#2921) * SYCL: Replace deprecated atomic types and operations * Change atomic refs to device memory scope When using the relaxed memory order, the memory scope is ignored. Thus, for cosmetic reasons only, we set the memory scope to device, the broadest option when using the global address space. 
Co-authored-by: Weiqun Zhang <WeiqunZhang@lbl.gov> commit cc3cd14 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Thu Sep 1 07:39:25 2022 -0700 Update CHANGES for 22.09 (AMReX-Codes#2934) commit acc223f Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Tue Aug 30 16:04:43 2022 -0700 Add hypre as an option for OpenBCSolver (AMReX-Codes#2931) commit 3d29fd7 Author: hengjiew <86926839+hengjiew@users.noreply.github.com> Date: Wed Aug 24 16:10:22 2022 -0400 Preserve neighbor particles when sorting particles. (AMReX-Codes#2923) commit 8294c3a Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Mon Aug 22 10:46:05 2022 -0700 Scope of NonLocalBC::ParallelCopy (AMReX-Codes#2922) Make NonLocalBC::ParallelCopy accessible in namespace amrex, because it can be useful in situations other than non-local BC. commit 0911fc4 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Sun Aug 21 18:13:07 2022 -0700 Open Boundary Poisson Solver (AMReX-Codes#2912) This adds an open boundary Poisson solver based on the James's algorithm. To use it, the user builds an amrex:OpenBCSolver object, which can be reused until the grids change, and then call OpenBCSolver::solver. Currently, this is for 3D cell-centered data only. The solver works on CPU, Nvidia GPUS, and AMD GPUs. The SYCL version of a couple of kernels for Intel GPUs are to be implemented. commit f270b3d Author: Marc T. Henry de Frahan <marchdf@gmail.com> Date: Thu Aug 18 13:51:56 2022 -0600 Fix OOB access of ref ratio on HDF write header (AMReX-Codes#2919) commit fa8e20f Author: Jean M. Sexton <jmsexton@lbl.gov> Date: Thu Aug 18 08:57:51 2022 -0700 Add Polaris to GNUMake (AMReX-Codes#2908) commit bd5f6a9 Author: Axel Huebl <axel.huebl@plasma.ninja> Date: Mon Aug 15 14:24:21 2022 -0700 Export GpuDevice Globals (AMReX-Codes#2918) * Export GpuDevice Globals Implement symbol export via `AMREX_EXPORT` for the global variables in `Src/Base/AMReX_GpuDevice.H`. 
Follow-up to AMReX-Codes#1847 AMReX-Codes#1847 Fix AMReX-Codes#2917 * Fix: Export `AMReX::m_instance` commit 4f63929 Author: asalmgren <asalmgren@lbl.gov> Date: Sat Aug 13 09:00:02 2022 -0700 enable LinOp to use the right Factory (fixes moving geometry problem) (AMReX-Codes#2916) commit 6593518 Author: Andrew Myers <atmyers2@gmail.com> Date: Thu Aug 11 15:24:16 2022 -0700 Use 1 atomic instead of two per item in DenseBins::build (AMReX-Codes#2911) commit d295f22 Author: Nuno Miguel Nobre <nuno.nobre@stfc.ac.uk> Date: Thu Aug 11 03:40:09 2022 +0100 [SYCL] Remove amrex::oneapi and update deprecated device descriptors (AMReX-Codes#2910) * Remove amrex::oneapi in favour of standard features * Change deprecated device descriptors commit 1bda173 Author: Axel Huebl <axel.huebl@plasma.ninja> Date: Wed Aug 10 15:46:43 2022 -0600 Add: `MultiFab::sum_unique` (AMReX-Codes#2909) This provides a new method to sum values in a `MultiFab`. For non-cell-centered data, `MultiFab::sum` double counts box boundary values that are owned by multiple boxes. This provides a function that does not double count these and provides a quick way to get only the sum of physically unique values. Co-authored-by: Weiqun Zhang <WeiqunZhang@lbl.gov> commit 3f715d2 Author: Candace Gilet <cgilet@users.noreply.github.com> Date: Mon Aug 8 14:40:28 2022 -0400 In MLMG::mgFcycle, assert that for EB the linop is cell-centered. (AMReX-Codes#2905) commit 59b0742 Author: hengjiew <86926839+hengjiew@users.noreply.github.com> Date: Mon Aug 8 14:17:57 2022 -0400 Clear the boundary particle indices' container before updating it. (AMReX-Codes#2907) This avoids potential segmentation faults when one grid's particles all move to other grids. commit 103db6e Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Fri Aug 5 15:25:33 2022 -0700 EB: Add Fine Levels (AMReX-Codes#2881) Add a new function EB2::addFineLevels() that can be used to add more fine levels to the existing EB IndexSpace without changing the coarse levels. 
This is useful for restarting with a larger amr.max_level. The issue is we build EB at the finest level first and then coarsen it to the coarse levels. If the restart run has a different finest level, the EB on the coarse levels could be different without using this new capability. commit 6ebf8ff Author: Jon Rood <jon.rood@nrel.gov> Date: Thu Aug 4 14:32:59 2022 -0600 Add rpath to lib64 for ZFP. (AMReX-Codes#2902) commit ed23627 Author: Yadong_Zeng <30739800+ruohai0925@users.noreply.github.com> Date: Thu Aug 4 16:32:21 2022 -0400 change data types from double to amrex::Real, and thus we can use single precision for the hypre IJ interface (AMReX-Codes#2896) Co-authored-by: yzeng <yzeng@altair.com> commit 9ed4f59 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Wed Aug 3 16:53:20 2022 -0700 Fix a new bug introduced in AMReX-Codes#2858 (AMReX-Codes#2901) We need to take into account that `amrex::Any` stores `MultiFab&` or `MultiFab const&`. commit 6eaab8c Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Wed Aug 3 13:39:44 2022 -0700 MPMD Support (AMReX-Codes#2895) Add support for multiple programs multiple data (MPMD). For now, we assume there are only two programs (i.e., executables) in the MPMD mode. During the initialization, MPI_COMM_WORLD is split into two communicators. The MPMD::Copier class can be used to copy FabArray/MultiFab data between two programs. This new capability can be used by FHDeX to couple FHD with SPPARKS. commit 9469329 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Mon Aug 1 09:43:21 2022 -0700 MLMG interface (AMReX-Codes#2858) These changes are made to support a generic type (i.e., amrex::Any) in MLMG. This is still work in progress. But it should not break any existing codes. 
commit 5a3b303 Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Mon Aug 1 09:34:44 2022 -0700 Update CHANGES for 22.08 (AMReX-Codes#2894) commit 48702b4 Author: hengjiew <86926839+hengjiew@users.noreply.github.com> Date: Thu Jul 28 14:14:19 2022 -0400 Let `selectActualNeighbors` return right after starting if there are (AMReX-Codes#2886) no particles for communication. commit 6a47d89 Author: kngott <kngott@lbl.gov> Date: Wed Jul 27 17:03:04 2022 -0700 Add Comm Sync to Redistribute (AMReX-Codes#2891) commit 51542c8 Author: philip-blakely <46958218+philip-blakely@users.noreply.github.com> Date: Wed Jul 27 17:29:26 2022 +0100 Multi-materials and derived variable output (AMReX-Codes#2888) ## Summary Output small plots if only derived variables are specified. Also, make DeriveFuncFab a std::function<> instead of plain function-pointer. ## Additional background We have been implementing small-plots for outputing variables at gauges (e.g. pressure at specific gauge locations). We may want to output the derived variable pressure only, and not all state-variables. The if-condition was incorrect in this case. Further, multi-material simulations require a material index in order to compute derived variables, in addition to existing parameters. Making DeriveFuncFab a std::function is sufficient for our purposes. commit ce0fb74 Author: Andrew Myers <atmyers2@gmail.com> Date: Tue Jul 26 16:20:38 2022 -0700 Fix host / device sync bug in PODVector (AMReX-Codes#2890) commit 06753e6 Author: Axel Huebl <axel.huebl@plasma.ninja> Date: Tue Jul 26 12:54:35 2022 -0700 `TagBoxArray::collate`: Fujitsu Clang (AMReX-Codes#2889) `mpiFCC -Nclang` only defines `__CLANG_FUJITSU`, not `__FUJITSU` as in the classic compiler mode. commit 7cf77dc Author: Weiqun Zhang <WeiqunZhang@lbl.gov> Date: Tue Jul 26 11:01:21 2022 -0700 MinLoc and MaxLoc Support (AMReX-Codes#2885) Add struct ValLocPair that can be used by ReduceOps/ReduceData and ParReduce to find the location of the min/max value. 
Add warp shuffle down function for more general types. This is needed for MinLoc/MaxLoc with CUDA < 11, because we don't use CUB for earlier versions of CUDA.

The Intel GPU support is not done yet. We need to allocate enough shared local memory when the size of ValLocPair is larger than the size of unsigned long long.

commit 4b7e200
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date: Thu Jul 21 10:25:57 2022 -0700

HIP: Remove the call to hipDeviceSetSharedMemConfig (AMReX-Codes#2884)

AMD devices do not support shared cache banking. Thanks @afanfa for reporting this. (AMReX-Codes#2883)

commit 8e40952
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date: Wed Jul 20 12:10:26 2022 -0700

Add Frontier to GNU Make (AMReX-Codes#2879)

commit b673d81
Author: Max Katz <maxpkatz@gmail.com>
Date: Mon Jul 18 15:14:19 2022 -0400

Add option to derefine to AMRErrorTag (AMReX-Codes#2875)

This allows a refinement field to specify *derefinement* (by setting a zone's tagging value to the clear value).

commit 73dbf2f
Author: hengjiew <86926839+hengjiew@users.noreply.github.com>
Date: Mon Jul 18 12:53:35 2022 -0400

Fix the segmentation fault in selecting actual neighbor particles. (AMReX-Codes#2877)

commit 40b3d21
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date: Wed Jul 13 13:24:15 2022 -0700

Add extra braces in initialization of GpuArray (AMReX-Codes#2876)

It should not be needed since C++14. But some compilers seem to need the double braces.

commit a633d2b
Author: Luca Fedeli <luca.fedeli@cea.fr>
Date: Fri Jul 8 20:34:18 2022 +0200

Workaround to bypass issue observed at very large scale with Fujitsu MPI (AMReX-Codes#2874)

We have observed some MPI issues at very large scale when WarpX is compiled using Fujitsu MPI (i.e., with the Fujitsu compiler). These issues seem to be related to the use of MPI Gatherv with MPI_Datatype. This PR implements a possible workaround, initially proposed by @WeiqunZhang.
The idea is that, when WarpX is compiled with the Fujitsu compiler, simpler integer arrays instead of MPI_Datatype are used in the routine where the issue was observed.

commit 7660c88
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date: Fri Jul 8 08:48:14 2022 -0700

Allow zero components MultiFab and BaseFab (AMReX-Codes#2873)

This is useful for particle I/O that does not have any mesh data. yt needs a header file associated with a MultiFab.

commit c849dd1
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date: Fri Jul 8 08:06:37 2022 -0700

New EB optimization parameter: eb2.num_coarsen_opt (AMReX-Codes#2872)

At the beginning of EB generation, we chop the entire finest domain into boxes and find out the type of the boxes. We then collect the completely covered boxes and cut boxes into two BoxArrays. This process can be costly because of the number of calls to the implicit functions.

In this commit, we have introduced a new ParmParse parameter, eb2.num_coarsen_opt, with a default value of zero. If, for instance, it is set to 3, we start the box type categorization at a resolution that is coarsened by a factor of 2^3. For the provisional cut boxes, we refine them by a factor of 2. Then we chop them into small boxes and categorize the new boxes. This process is performed recursively until we are at the original finest resolution.

Users should be aware that, if eb2.num_coarsen_opt is too big, this could produce erroneous results, because evaluating the implicit function on coarse boxes could miss fine structures in the EB.

Thanks to Robert Marskar for sharing this algorithm.

commit 557aae8
Author: Erik <epalmer@lbl.gov>
Date: Wed Jul 6 08:54:24 2022 -0700

point to new location of AMReX images, AMReX website repo (AMReX-Codes#2867)

commit cbdc658
Author: Axel Huebl <axel.huebl@plasma.ninja>
Date: Tue Jul 5 01:41:03 2022 +0200

SENSEI 4.0: Fix Build for Particles (AMReX-Codes#2869)

## Summary

This part causes a compile error now in WarpX.
cc @burlen @kwryankrattiger

## Additional background

X-ref: Blocks WarpX 22.07 release ECP-WarpX/WarpX#3211

Follow-up to:
- AMReX-Codes#2785
- AMReX-Codes#2834

commit dc8b734
Author: Andrew Myers <atmyers2@gmail.com>
Date: Fri Jul 1 17:19:20 2022 -0700

Cache the neighbor comm tags for the CPU implementation of fillNeighbors. (AMReX-Codes#2862)

* Cache the neighbor comm tags for the CPU implementation of fillNeighbors.
* fix areMasksValid function

commit 2b42fb5
Author: drangara <69211175+drangara@users.noreply.github.com>
Date: Fri Jul 1 18:44:35 2022 -0400

Remove some hard checks in check_mvmc for 3D (AMReX-Codes#2864)

Removing some hard checks in 3D coarsening logic as it appears that those are not necessarily bad states, and a soft failure to coarsen should suffice.

commit 19c7068
Author: Erik <epalmer@lbl.gov>
Date: Fri Jul 1 18:24:24 2022 -0400

Carry over fix for ngbxy.smallEnd typo (AMReX-Codes#2868)

This is a typo that got corrected in other places but didn't get fixed here.

commit d736ef2
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date: Fri Jul 1 11:00:15 2022 -0700

Update CHANGES for 22.07 (AMReX-Codes#2866)

commit be813d0
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date: Fri Jul 1 10:29:13 2022 -0700

Hypre: add version check (AMReX-Codes#2865)

These HYPRE_SetSp* are only available in hypre >= 22500.

commit 8fb23ec
Author: Jon Rood <jon.rood@nrel.gov>
Date: Wed Jun 29 16:52:35 2022 -0600

Refactor Make.nrel to use MPT for MPI with the Intel compiler on Eagle. (AMReX-Codes#2861)

commit 6f9a46c
Author: PaulMullowney <60452402+PaulMullowney@users.noreply.github.com>
Date: Wed Jun 29 11:09:57 2022 -0600

Adding control APIs and namespacing for core algorithm paths like SpGEMM, SpMV, and SpTrans. (AMReX-Codes#2859)

Co-authored-by: Paul Mullowney <Paul.Mullowney@nrel.gov>

commit e4c83cf
Author: Jon Rood <jon.rood@nrel.gov>
Date: Wed Jun 29 11:08:42 2022 -0600

Add lib64 library location for ZFP since it may exist there instead of lib.
(AMReX-Codes#2860)

commit b2b9150
Author: Burlen Loring <bloring@lbl.gov>
Date: Tue Jun 28 13:42:41 2022 -0700

update the SENSEI in situ coupling for SENSEI v4.0.0 (AMReX-Codes#2785)

In this release, an install of VTK is no longer required. To compile AMReX w/ SENSEI use:

```cmake
-DAMReX_SENSEI=ON -DSENSEI_DIR=<path to SENSEI install>/<lib dir>/cmake
```

Note: <lib dir> may be `lib` or `lib64` or something else depending on your OS and is determined by CMake at configure time. See the CMake GNUInstallDirs documentation for more information.

commit 2c5f475
Author: Andrew Myers <atmyers2@gmail.com>
Date: Tue Jun 28 12:51:19 2022 -0700

Write runtime attribs to checkpoints on GPUs (AMReX-Codes#2856)

commit d2cb546
Author: Jon Rood <jon.rood@nrel.gov>
Date: Tue Jun 28 13:27:02 2022 -0600

Fix gnu make on Crusher for mpi_gtl_hsa (AMReX-Codes#2857)

Update environment variable at OLCF for mpi_gtl_hsa.

commit 21fe4b3
Author: Axel Huebl <axel.huebl@plasma.ninja>
Date: Tue Jun 28 19:53:09 2022 +0200

CMake: FindDependency CUDAToolkit (AMReX-Codes#2849)

If we install AMReX with CUDA support using a modern CMake, we need to repopulate targets such as `CUDA::curand` from `find_dependency` for downstream. Downstream users find us via `find_package` and that target link dependency showed up to be unpopulated in MFIX.

commit 027f2ff
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date: Thu Jun 23 16:15:57 2022 -0700

Fix make help (AMReX-Codes#2854)

This reverts the change in AMReX-Codes#2845, which fixed an issue with `make print-%`, but broke `make help`. This is now fixed in a different way. Both `make print-%` and `make help` should work now.

commit 3d3ad21
Author: kngott <kngott@lbl.gov>
Date: Thu Jun 23 13:39:59 2022 -0700

NERSC Programming Environment prototype (AMReX-Codes#2848)

commit 4872676
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date: Thu Jun 23 12:41:20 2022 -0700

GNU Make: No need to query mpif90 if Fortran is not used. (AMReX-Codes#2852)

This minimizes potential issues.

commit fc0d646
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date: Thu Jun 23 12:23:55 2022 -0700

Remove f90doc (AMReX-Codes#2851)

We no longer use it.

commit 5188a6a
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date: Thu Jun 23 11:09:15 2022 -0700

Explicitly invoke python3 (AMReX-Codes#2850)

According to PEP 394, a python distributor may choose to not provide the python command. In fact, that's what recent versions of macOS do.

commit 2d931f6
Author: Andrew Myers <atmyers2@gmail.com>
Date: Wed Jun 22 15:03:50 2022 -0500

Maintain the high end of the 'roundoff domain' in both float and double precision (AMReX-Codes#2839)

* Maintain the high end of the 'roundoff domain' in both float and double precision
* fix shadowing
* fix warning
* fix float conversion warning
* fix logic
* Update Src/Base/AMReX_Geometry.H
* Update Src/Base/AMReX_Geometry.H

ax3l added the bug label Aug 15, 2022

ax3l requested review from atmyers and WeiqunZhang August 15, 2022 17:13

Export GpuDevice Globals

b29de95

Implement symbol export via `AMREX_EXPORT` for the global variables in `Src/Base/AMReX_GpuDevice.H`. Follow-up to AMReX-Codes#1847. Fixes AMReX-Codes#2917.
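
For readers unfamiliar with the pattern, here is a minimal sketch of what exporting a static class global through an export macro can look like. It is not the AMReX source: `MYLIB_EXPORT`, `MYLIB_IS_DLL`, `MYLIB_IS_DLL_BUILDING`, and the `mylib::Device` class are hypothetical stand-ins, and the actual definition of `AMREX_EXPORT` and the set of annotated variables may differ.

```cpp
#include <cassert>

// Hypothetical export macro following the common Windows DLL convention.
// MYLIB_IS_DLL and MYLIB_IS_DLL_BUILDING are assumed to be set by the build system.
#if defined(_WIN32) && defined(MYLIB_IS_DLL)
#  if defined(MYLIB_IS_DLL_BUILDING)
#    define MYLIB_EXPORT __declspec(dllexport)  // compiling the library itself
#  else
#    define MYLIB_EXPORT __declspec(dllimport)  // compiling code that uses the library
#  endif
#else
#  define MYLIB_EXPORT  // static builds and non-Windows targets: expands to nothing
#endif

namespace mylib {

class Device {
public:
    // Inline accessors are compiled into the caller, so the caller's binary
    // references device_id directly and needs the symbol to be visible.
    static int deviceId () noexcept { return device_id; }

    static void setDeviceId (int id) noexcept {
        assert(id >= 0);  // illustrative precondition only
        device_id = id;
    }

private:
    // The annotation sits on the declaration of the static data member.
    static MYLIB_EXPORT int device_id;
};

// The definition would normally live in the library's .cpp file.
int Device::device_id = 0;

} // namespace mylib
```

On platforms where the macro expands to nothing the code behaves like any other static member. The annotation only matters for shared-library builds on Windows, where an unexported global that is referenced from an inline function would typically show up as an unresolved external at link time in downstream code.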

ax3l force-pushed the fix-export-win-globals-gpudevice branch from 4687b0f to b29de95 Compare August 15, 2022 17:31

ax3l added the install label Aug 15, 2022

Fix: Export AMReX::m_instance

3e44b73

WeiqunZhang approved these changes Aug 15, 2022

WeiqunZhang merged commit bd5f6a9 into AMReX-Codes:development Aug 15, 2022

ax3l deleted the fix-export-win-globals-gpudevice branch August 15, 2022 21:55
