Remove sycl namespace alias #2971

Merged
merged 1 commit into from
Oct 3, 2022

Conversation

WeiqunZhang
Member

This causes a conflict with new compilers.
@WeiqunZhang WeiqunZhang merged commit 1bc4e4e into AMReX-Codes:development Oct 3, 2022
@WeiqunZhang WeiqunZhang deleted the namespace_sycl branch October 3, 2022 23:50
atmyers added a commit to Thierry992/amrex that referenced this pull request Nov 2, 2022
commit 10e99fb
Merge: d03045d f1e1d6f
Author: Andrew Myers <atmyers2@gmail.com>
Date:   Wed Nov 2 14:06:00 2022 -0700

    Merge branch 'particle_soa_refactor' of github.com:Thierry992/amrex into HEAD

commit d03045d
Author: Andrew Myers <atmyers2@gmail.com>
Date:   Wed Nov 2 14:04:23 2022 -0700

    fix buffer pack / unpack

commit d771fc8
Author: Andrew Myers <atmyers2@gmail.com>
Date:   Wed Nov 2 14:04:08 2022 -0700

    revert to one int for each id for now

commit f1e1d6f
Merge: 4dbfbac c4a4811
Author: Axel Huebl <axel.huebl@plasma.ninja>
Date:   Tue Nov 1 15:18:54 2022 -0500

    Merge remote-tracking branch 'mainline/development' into particle_soa_refactor

commit c4a4811
Author: Axel Huebl <axel.huebl@plasma.ninja>
Date:   Tue Nov 1 14:08:38 2022 -0500

    C++17 Transition (AMReX-Codes#2992)

    ## Summary

    Update AMReX to require C++17 or newer.

    - [x] docs
    - [x] CMake
    - [x] GNUmake
    - [x] CI

    ## Additional background

    Requires a mature [C++17](https://en.wikipedia.org/wiki/C%2B%2B17)
    compiler, e.g., GCC 8, Clang 7, NVCC 11.0, MSVC 19.15 or newer.

    Already used in production for more than a year by downstream codes such
    as Castro and WarpX. Needed for modernization and new features such as
    AMReX-Codes#2878

    Co-authored-by: Weiqun Zhang <weiqunzhang@lbl.gov>

commit d2b8293
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Tue Nov 1 09:01:54 2022 -0700

    Update CHANGES for 22.11 (AMReX-Codes#3006)

commit 5ec270b
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Tue Nov 1 08:59:44 2022 -0700

    Fix compilation for PETSc (AMReX-Codes#3005)

    We cannot include PETSc headers too early because it might redefine MPI
    routines as macros
    (https://github.com/petsc/petsc/blob/main/include/petsclog.h#L441). They
    break MPI calls like below,

        MPI_Allreduce(&tmp, &vi, 1,
                      ParallelDescriptor::Mpi_typemap<T>::type(),
                      ParallelDescriptor::Mpi_op<T,amrex::Greater<T>>(), comm);

    because of the `,` in `<T,amrex::Greater<T>>`.
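
The failure mode can be reproduced without MPI or PETSc. The following is a minimal sketch under stated assumptions: `reduce`, `reduce_impl`, `pick_op` and `Greater` are hypothetical stand-ins invented for illustration and are not PETSc or AMReX names. It also shows two generic ways to protect the comma; the commit itself avoids the problem differently, by not including the PETSc headers too early.

```cpp
#include <cassert>

// Hypothetical stand-in for a templated helper such as Mpi_op<T,Op>().
template <typename T, typename Op>
int pick_op () { return static_cast<int>(sizeof(T)); }

struct Greater {};

// Hypothetical stand-in for a header that redefines a routine as a
// function-like macro taking exactly two arguments.
int reduce_impl (int count, int op)
{
    assert(count >= 0);
    return count + op;
}
#define reduce(count, op) reduce_impl((count), (op))

// reduce(1, pick_op<double,Greater>());
// would not compile: the preprocessor splits at the comma inside <...>
// and sees three macro arguments instead of two.

int reduce_with_parens ()
{
    // Extra parentheses keep the comma away from the preprocessor.
    return reduce(1, (pick_op<double,Greater>()));
}

int reduce_with_local ()
{
    // Evaluating the templated expression first avoids the problem as well.
    int const op = pick_op<double,Greater>();
    return reduce(1, op);
}
```

Both helpers compile because the macro now sees exactly two arguments.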

commit 735c351
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Sat Oct 29 10:57:23 2022 -0700

    MPI Reduce for ValLocPair (AMReX-Codes#3003)

    Add ParallelReduce::Min, ParallelReduce::Max, ParallelAllReduce::Min,
    and ParallelAllReduce::Max for ValLocPair<TV,TI>, where TV and TI are
    types that have corresponding MPI types (e.g., int, Real, IntVect, Box,
    etc.).

commit 3ec0768
Author: Axel Huebl <axel.huebl@plasma.ninja>
Date:   Wed Oct 26 16:49:40 2022 -0700

    `FabArray::isDefined` (AMReX-Codes#2997)

    ## Summary

    Add a new query to `define_function_called`.

    ## Additional background

    This is a cheaper check than `ok()` for finding out if a MultiFab has
    been allocated or not yet, assuming that the calling code follows the
    convention that `define()` is called collectively.

    Update: It turns out you can also call `empty` inherited from
    `FabArrayBase`. The new API is quite explicit, which is ok, too.

    Co-authored-by: Weiqun Zhang <WeiqunZhang@lbl.gov>

commit 7f3c908
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Wed Oct 26 16:40:16 2022 -0700

    Make The_Device_Arena non-managed (AMReX-Codes#2998)

    The_Device_Arena used to be a separate Arena. We changed it to be an
    alias of The_Arena to avoid memory fragmentation. However, the issue is
    we don't have an Arena that can allocate non-managed memory unless
    The_Arena is not managed. Because of performance concerns, we sometimes
    want to allocate non-managed memory. Therefore, we make The_Device_Arena
    an alias if and only if The_Arena is not managed.

commit ab8c892
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Wed Oct 26 15:59:39 2022 -0700

    Add alias template Gpu::NonManagedDeviceVector (AMReX-Codes#2999)

commit b3e0a62
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Wed Oct 26 15:02:13 2022 -0700

    Pre- and Post-interpolation hook interface (AMReX-Codes#2991)

    Support both Fab and MultiFab versions of pre- and post-interpolation
    hooks.

    Because the pre-interp hook might modify the data, we need to make a copy to
    avoid modifying cached coarse data.

    Close AMReX-Codes#2989.

commit 3082028
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Wed Oct 19 19:24:10 2022 -0700

    Update GitHub Actions (AMReX-Codes#2996)

    https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/

    ## Summary

    ## Additional background

    ## Checklist

    The proposed changes:
    - [ ] fix a bug or incorrect behavior in AMReX
    - [ ] add new capabilities to AMReX
    - [ ] changes answers in the test suite to more than roundoff level
    - [ ] are likely to significantly affect the results of downstream AMReX
    users
    - [ ] include documentation in the code and/or rst files, if appropriate

commit 0b88bfd
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Wed Oct 19 13:39:18 2022 -0700

    Add user defined BC types (AMReX-Codes#2995)

    Add BCType::user_1, BCType::user_2 and BCType::user_3. Previously the
    only "user" type was ext_dir (external Dirichlet). The BC types are
    passed from the user's code to FillPatch, which in turn passes them back
    to the user provided BC filling function. These new types will make it
    easy for the user to determine the user defined BC types in their BC
    filling functions.

commit 9502b99
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Tue Oct 18 10:20:06 2022 -0700

    Add BCRec::set for convenience (AMReX-Codes#2993)

commit 4dbfbac
Author: Thierry Antoun <thierry.antoun@ensta-paris.fr>
Date:   Mon Oct 17 15:05:54 2022 -0700

    Adding AMReX_RESTRICT for GPU Test

commit 7051a6c
Author: Thierry Antoun <thierry.antoun@ensta-paris.fr>
Date:   Mon Oct 17 15:03:19 2022 -0700

    Modifying RedistributeMPI to make it work with 2 ranks

commit 56b6402
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Sat Oct 15 14:59:38 2022 -0700

    ParallelFor with compile time optimization of kernels with run time parameters (AMReX-Codes#2954)

    Branches inside ParallelFor can be very expensive. If a branch uses a
    lot of resources (e.g., registers), it can significantly affect the
    performance even if at run time the branch is never executed because it
    affects the GPU occupancy. For CPUs, it can affect vectorization of the
    kernel.

    The new ParallelFor functions use C++17 fold expression to generate
    kernel launches for all run time variants. Only one will be executed.
    Which one is chosen at run time depends on the run time parameters. The
    kernel function can use constexpr if to discard unused code blocks for
    better run time performance. Here are two examples of how to use them.

        int runtime_option = ...;
        enum All_options : int { A0, A1, A2, A3};
        // Four ParallelFors will be generated.
        ParallelFor(TypeList<CompileTimeOptions<A0,A1,A2,A3>>{},
                    {runtime_option},
                    box, [=] AMREX_GPU_DEVICE (int i, int j, int k, auto control)
        {
            ...
            if constexpr (control.value == A0) {
                ...
            } else if constexpr (control.value == A1) {
                ...
            } else if constexpr (control.value == A2) {
                ...
            } else {
                ...
            }
            ...
        });

    and

        int A_runtime_option = ...;
        int B_runtime_option = ...;
        enum A_options : int { A0, A1, A2, A3};
        enum B_options : int { B0, B1 };
        // 4*2=8 ParallelFors will be generated.
        ParallelFor(TypeList<CompileTimeOptions<A0,A1,A2,A3>,
                             CompileTimeOptions<B0,B1> > {},
                    {A_runtime_option, B_runtime_option},
                    N, [=] AMREX_GPU_DEVICE (int i, auto A_control, auto B_control)
        {
            ...
            if constexpr (A_control.value == A0) {
                ...
            } else if constexpr (A_control.value == A1) {
                ...
            } else if constexpr (A_control.value == A2) {
                ...
            } else {
                ...
            }
            if constexpr (A_control.value != A3 && B_control.value == B1) {
                ...
            }
            ...
        });

    Note that due to a limitation of CUDA's extended device lambda, the
    constexpr if block cannot be the one that captures a variable first. If
    nvcc complains about it, you will have to manually capture it outside
    constexpr if. The data type for the parameters is int.

    Thanks to Maikel Nadolski and Alex Sinn for showing us the
    meta-programming techniques used here.
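
The dispatch idea described above can be sketched in plain C++17 without AMReX or a GPU. This is only an illustration of the fold-expression technique, not the AMReX implementation: `Options`, `dispatch` and `run` are hypothetical names, and the lambda runs on the host.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a list of compile-time options.
template <int... Vs>
struct Options {};

// Compare the run time option against every compile-time value. Only the
// matching one calls f, passing the value as a type so that the callee can
// use `if constexpr`. The fold over || stops at the first match.
template <typename F, int... Vs>
bool dispatch (Options<Vs...>, int runtime_option, F&& f)
{
    return ( (runtime_option == Vs
                  ? (f(std::integral_constant<int,Vs>{}), true)
                  : false) || ... );
}

enum All_options : int { A0, A1, A2, A3 };

// Example "kernel": each instantiation keeps only one branch.
int run (int runtime_option)
{
    int result = -1;
    bool const found = dispatch(Options<A0,A1,A2,A3>{}, runtime_option,
        [&] (auto control)
        {
            if constexpr (control.value == A0) {
                result = 0;
            } else if constexpr (control.value == A1) {
                result = 10;
            } else {
                result = 100 + control.value;
            }
        });
    assert(found || result == -1);
    return result;
}
```

Four instantiations of the lambda are generated, one per option, and a run time value outside the list calls none of them.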

commit bcbf17f
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Fri Oct 14 19:48:14 2022 -0700

    2D RZ solver for WarpX: Arbitrary coefficient (AMReX-Codes#2986)

    The 2D RZ solver for WarpX used to assume that there was no sigma_r
    (i.e., sigma_r == 1). In this PR, we allow an arbitrary sigma_r
    coefficient.

commit 9a3cd5d
Author: Axel Huebl <axel.huebl@plasma.ninja>
Date:   Fri Oct 14 17:27:41 2022 -0700

    CMake Docs: Fix User-Guidance (Link) (AMReX-Codes#2990)

    Update the user-guidance on CMake dependency linking to CMake 3.0+
    (anno. 2014+).

    Seen in AMReX-Codes#2978

commit 1ad4144
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Fri Oct 14 10:36:17 2022 -0700

    Runge-Kutta support for AMR (AMReX-Codes#2974)

    This adds RK2, RK3 and RK4 in a new namespace RungeKutta. Together with
    the enhanced FillPatcher class, these functions can be used for RK time
    stepping in AMR simulations. A new function AmrLevel::RK is added for
    AmrLevel based codes. See CNS::advance in Tests/GPU/CNS/CNS_advance.cpp
    for an example of using the new AmrLevel::RK function.

    The main motivation for this PR is that ghost cell filling for high
    order (> 2) RK methods at coarse/fine boundary is non-trivial when there
    is subcycling.

    Co-authored-by: Jean M. Sexton <jmsexton@lbl.gov>

commit c841ae8
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Fri Oct 14 10:03:34 2022 -0700

    Fourth-order interpolation from fine to coarse level (AMReX-Codes#2987)

    For fourth-order finite-difference methods with data at cell centers, we
    cannot use the usual averageDown function to overwrite coarse level data
    with fine data. We actually need to do interpolation.

commit 975b830
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Fri Oct 14 09:53:22 2022 -0700

    Fix EB data inconsistency when fixing small cells and multiple cuts (AMReX-Codes#2943)

    ## Summary

    For consistency, we need to call the function that zeros out the level
    set even if that box does not have any small cells or multiple cuts.
    This is because a node could exist in multiple boxes. Furthermore, a
    covered cell or covered face may have a node with a level set < 0.

    ## Additional background

    This is usually not an issue. However, in WarpX, we use the level set to
    decide whether a node is an unknown in the linear system. The
    inconsistency makes the solver fail in some cases.

commit 9c2264b
Author: Axel Huebl <axel.huebl@plasma.ninja>
Date:   Fri Oct 14 07:41:06 2022 -0700

    `MFIter::Finalize`: Free `m_fa` (AMReX-Codes#2988)

    This `free` should potentially not be delayed until the destructor is
    called.

    Follow-up to AMReX-Codes#2985 AMReX-Codes#2983

commit f84c7a8
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Wed Oct 12 10:44:11 2022 -0700

    Fix MLMG::getGradSolution & getFluxes for inhomogeneous Neumann and Robin BC (AMReX-Codes#2984)

    Because of the way inhomogeneous Neumann and Robin BCs are handled, we
    must add the inhomogeneous fluxes back; otherwise they would be zero at
    those boundaries.

commit ed1ecd6
Author: Axel Huebl <axel.huebl@plasma.ninja>
Date:   Wed Oct 12 08:46:34 2022 -0700

    MFIter: Make Finalize Public (AMReX-Codes#2985)

    Follow-up to AMReX-Codes#2983

commit 5acfe07
Author: Axel Huebl <axel.huebl@plasma.ninja>
Date:   Tue Oct 11 14:51:48 2022 -0700

    MFIter::Finalize (AMReX-Codes#2983)

    Add a Finalize function to MFIter.

    The idea is that we can call this explicitly before destruction, e.g.,
    from Python, where `for` loops do not create a scope.

    This function must be robust enough to be called again in the
    destructor (or we need to add an extra bool to guard that it is not
    called again in the destructor).
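
The guard mentioned above can be sketched as follows. This is a minimal illustration with a hypothetical class `ScopedIter`, not the actual MFIter code; the counter only stands in for the real cleanup work.

```cpp
#include <cassert>

// Hypothetical iterator-like class; only the guard pattern is the point.
class ScopedIter
{
public:
    explicit ScopedIter (int* release_counter) : m_counter(release_counter)
    {
        assert(m_counter != nullptr);
    }

    // The destructor finalizes too, so an explicit call is optional.
    ~ScopedIter () { Finalize(); }

    ScopedIter (ScopedIter const&) = delete;
    ScopedIter& operator= (ScopedIter const&) = delete;

    // Safe to call any number of times; the cleanup runs only once.
    void Finalize ()
    {
        if (m_finalized) { return; }
        m_finalized = true;
        ++(*m_counter);   // stands in for releasing resources
    }

private:
    int* m_counter;
    bool m_finalized = false;
};

// Calls Finalize explicitly and then lets the destructor run as well.
int count_releases ()
{
    int releases = 0;
    {
        ScopedIter it(&releases);
        it.Finalize();
    }
    return releases;
}
```

With the bool guard, the explicit call followed by the destructor releases the resource exactly once.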

    Co-authored-by: Weiqun Zhang <WeiqunZhang@lbl.gov>

commit 53e34d1
Author: Andy Nonaka <AJNonaka@lbl.gov>
Date:   Tue Oct 11 12:00:34 2022 -0700

    fix docs; Robin BC's for MLMG (AMReX-Codes#2982)

    Update the MLMG Robin BC description in the docs.

commit 0019b3a
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Tue Oct 11 11:00:13 2022 -0700

    MLLinOp::postSolve (AMReX-Codes#2981)

    Add a virtual function MLLinOp::postSolve. This allows WarpX to set EB
    covered nodes to prescribed values in the solver's output for
    visualization purpose.

commit 2d87a4c
Author: Brandon Runnels <brunnels@uccs.edu>
Date:   Mon Oct 10 09:49:29 2022 -0600

    add templating for the cell bilinear interpolators (AMReX-Codes#2979)

    This templates the `mf_cell_bilin_interp` functions so that the
    interpolators can be used with `BaseFab`s of arbitrary type.

commit e4ab048
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Wed Oct 5 12:03:41 2022 -0700

    FillPatcher class (AMReX-Codes#2972)

    This adds a class FillPatcher for filling fine level data. It's not as
    general as the various FillPatch functions (e.g., FillPatchTwoLevels).
    However, it can reduce the amount of communication data. Suppose we use
    RK2 with subcycling and the refinement ratio is 2. For each step on
    level 0, there are two steps on level 1. With RK2, each fine step needs
    to call FillPatch twice. So the total number of FillPatch calls is 4 in
    the two fine steps. Using the free function, one ParallelCopy per
    FillPatch call is needed for copying coarse data for spatial
    interpolation. With the FillPatcher class, two ParallelCopy calls will
    be done to copy old and new coarse data. Then these data will be used in
    the four FillPatcher::fill calls. This new approach saves two
    ParallelCopy calls per coarse step for a two levels run. It could save
    more if the time stepping requires more substeps or the refinement ratio
    is higher. Note that many of our AMReX codes use a time stepping
    algorithm that needs only one FillPatch call per step. For those codes,
    this new approach will not save any communication for a refinement ratio
    of 2. However, it will save communication when the refinement ratio is
    4.
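
The counting argument above can be written out as a small sketch. The function names are hypothetical, and the model only covers a two-level run in which the number of fine steps per coarse step equals the refinement ratio.

```cpp
#include <cassert>

// ParallelCopy calls for coarse data per coarse step when every FillPatch
// call does its own copy.
// ratio: fine steps per coarse step; fills_per_step: FillPatch calls per
// fine step (e.g., 2 for RK2).
int copies_free_function (int ratio, int fills_per_step)
{
    assert(ratio > 0 && fills_per_step > 0);
    return ratio * fills_per_step;
}

// With the class-based approach, old and new coarse data are each copied once.
int copies_fillpatcher ()
{
    return 2;
}

int copies_saved (int ratio, int fills_per_step)
{
    return copies_free_function(ratio, fills_per_step) - copies_fillpatcher();
}
```

For RK2 with ratio 2 this gives 4 - 2 = 2 saved copies; with one FillPatch call per step it gives 0 for ratio 2 and 2 for ratio 4, matching the cases in the text.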

commit 1bc4e4e
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Mon Oct 3 16:50:45 2022 -0700

    Remove sycl namespace alias (AMReX-Codes#2971)

    This causes a conflict with new compilers.

commit de7b7f4
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Mon Oct 3 14:06:58 2022 -0700

    Fix Tensor Solver BC (AMReX-Codes#2930)

    This fixes some bugs in the physical domain BC of tensor linear solver.

    At the corner of two no-slip walls (e.g., (0,0)), we have
    u(-1,0) = -u(0,0) and u(0,-1) = -u(0,0). It's incorrect to fill the
    corner ghost cell with u(-1,-1) = u(-1,0) + u(0,-1) - u(0,0), because it
    will result in u(-1,-1) = -3 * u(0,0).
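
The arithmetic behind the factor of -3 can be checked with a few lines. This is just the substitution from the paragraph above written as code; the function name is hypothetical.

```cpp
#include <cassert>

// Ghost values next to two no-slip walls that meet at the corner (0,0).
double corner_by_extrapolation (double u00)
{
    assert(u00 == u00);           // reject NaN input
    double const u_m1_0 = -u00;   // u(-1,0) = -u(0,0)
    double const u_0_m1 = -u00;   // u(0,-1) = -u(0,0)
    // The corner formula that the text identifies as incorrect here.
    return u_m1_0 + u_0_m1 - u00;
}
```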

    In the old approach, to avoid branches in computing transverse derivatives
    on cell faces, we fill the ghost cells first. For example, to compute du/dy
    at the lo-x boundary, we use the data in i = -1 and 0, just like we compute
    du/dy(i) using u(i-1) and u(i) for interior faces. The problem is that the
    normal velocity in the ghost cells outside a wall is filled with an
    extrapolation of the Dirichlet value (which is zero) and more than one
    interior cell. Because of the high-order extrapolation, u(-1) != -u(0).
    This is the desired approach for computing du/dx on the wall. However, this
    produces incorrect results in du/dy.

    In the new approach, we explicitly handle the boundaries in the derivative
    stencil. For example, to compute transverse derivatives on an inflow face,
    we use the boundary values directly.

    Co-authored-by: cgilet <cgilet@gmail.com>

commit 13aa4df
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Fri Sep 30 17:48:22 2022 -0700

    Disable host device for macros for SYCL/DPC++ (AMReX-Codes#2969)

    The host part of the AMREX_HOST_DEVICE_FOR_* macros is disabled for
    SYCL/DPC++. It's really slow for compilation.

commit 62379fb
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Fri Sep 30 15:37:35 2022 -0700

    Update CHANGES for 22.10 (AMReX-Codes#2968)

commit d65e09e
Author: Roberto Porcu <53792251+rporcu@users.noreply.github.com>
Date:   Thu Sep 29 15:46:19 2022 -0400

    Solve an issue with particles async IO when having runtime added variables (AMReX-Codes#2966)

commit cd07b0d
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Wed Sep 28 09:20:42 2022 -0700

    Fix int overflow in amrex::bisect (AMReX-Codes#2964)

    Change from (lo+hi)/2 to lo+(hi-lo)/2.  Although it's very unlikely, it's
    possible (lo+hi), where both lo and hi are integers, could overflow.
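
A minimal sketch of the overflow-safe midpoint, assuming `0 <= lo <= hi` so that `hi - lo` itself cannot overflow. The function name `midpoint` is hypothetical and is not the signature of amrex::bisect.

```cpp
#include <cassert>
#include <limits>

// Midpoint of [lo, hi] computed without ever forming lo + hi.
int midpoint (int lo, int hi)
{
    assert(0 <= lo && lo <= hi);
    assert(hi <= std::numeric_limits<int>::max());
    return lo + (hi - lo) / 2;
}
```

Near the top of the int range, `(lo+hi)/2` would overflow while this form stays within range.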

commit e55d6b4
Author: Junghyeon Park <j824h@outlook.com>
Date:   Thu Sep 29 01:20:15 2022 +0900

    Update the SWFFT project site (AMReX-Codes#2965)

commit b84d7c0
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Mon Sep 26 16:05:10 2022 -0700

    Fix MLEBNodeFDLaplacian bottom solver (AMReX-Codes#2963)

    MLEBNodeFDLaplacian is never singular because it has a Dirichlet boundary
    on the EB surface.  We did set the singular flag to false, but forgot that
    the bottom solver uses a different function to query it.  This fixes it by
    overriding the isBottomSingular function.

commit 5e84f43
Author: asalmgren <asalmgren@lbl.gov>
Date:   Sun Sep 25 09:38:51 2022 -0700

    make tagging routines EB_aware (AMReX-Codes#2962)

commit 8b367b0
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Sun Sep 25 09:22:13 2022 -0700

    Volume weighted sum (AMReX-Codes#2961)

    Add a new function doing volume weighted sum across AMR levels.  This may
    not be exactly what amrex application codes want.  But it should work for
    many cases.

commit 2a3cc05
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Fri Sep 23 12:24:05 2022 -0700

    CellData: data in a single cell (AMReX-Codes#2959)

    This adds struct CellData that allows for accessing data in a single cell in
    Array4.  This is convenient sometimes because one can omit the i, j and k
    indices.  It might also be faster sometimes because it can skip the repeated
    index calculation involving i,j,k.
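
The idea of computing the cell offset once and then indexing only by component can be sketched as below. `View4`, `CellView` and `cell_at` are hypothetical types written for this illustration; they are not amrex::Array4 or amrex::CellData, whose interfaces may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 4D array view indexed as (i,j,k,n).
struct View4
{
    double* p;
    int nx, ny, nz, ncomp;

    double& operator() (int i, int j, int k, int n) const
    {
        std::ptrdiff_t const offset = i + nx*(j + ny*(k + std::ptrdiff_t(nz)*n));
        return p[offset];
    }
};

// Hypothetical single-cell view: the (i,j,k) part of the offset is paid once.
struct CellView
{
    double* p;               // component 0 of the chosen cell
    std::ptrdiff_t stride;   // distance between consecutive components
    int ncomp;

    double& operator[] (int n) const
    {
        assert(n >= 0 && n < ncomp);
        return p[n*stride];
    }
};

CellView cell_at (View4 const& a, int i, int j, int k)
{
    return CellView{&a(i,j,k,0), std::ptrdiff_t(a.nx)*a.ny*a.nz, a.ncomp};
}

// Checks that both access paths reach the same element of a 2x2x2 array
// with 3 components.
bool same_element ()
{
    std::vector<double> storage(2*2*2*3);
    for (std::size_t m = 0; m < storage.size(); ++m) { storage[m] = double(m); }
    View4 a{storage.data(), 2, 2, 2, 3};
    CellView c = cell_at(a, 1, 0, 1);
    return c[2] == a(1,0,1,2) && &c[0] == &a(1,0,1,0);
}
```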

commit 27ef106
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Fri Sep 23 12:23:34 2022 -0700

    Quartic interpolation for cell centered data (AMReX-Codes#2960)

    New Interpolator for interpolation of cell centered data using a
    fourth-degree polynomial.  Note that the interpolation is not conservative
    and does not do any slope limiting.

commit c4b7982
Author: Luca Fedeli <luca.fedeli@cea.fr>
Date:   Fri Sep 23 21:17:12 2022 +0200

    Add GPU-compatible upper bound and lower bound algorithms to AMReX_Algorithm (AMReX-Codes#2958)

commit 3e5cc77
Author: Don E. Willcox <dwillcox@users.noreply.github.com>
Date:   Tue Sep 20 17:59:48 2022 -0700

    add option for makebuildsources to specify the style arguments for 'git describe'. (AMReX-Codes#2957)

commit a6e0c11
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Tue Sep 20 10:01:21 2022 -0700

    Add more warnings (AMReX-Codes#2956)

    * Add -Wnon-virtual-dtor -Wlogical-op -Wmisleading-indentation
      -Wduplicated-cond -Wduplicated-branches to gcc.

    * Add -Wnon-virtual-dtor to clang.

    * Add more warnings to CI.

    * Fix some non-virtual dtors and some other warnings.

commit 826cd37
Author: Phil Miller <phil.miller@intensecomputing.com>
Date:   Thu Sep 15 17:26:00 2022 -0700

    Add roundoff_lo corresponding to roundoff_hi for domains that don't start at 0 (AMReX-Codes#2950)

    * Lay groundwork for roundoff_lo

    * Add dummy implementation of roundoff_lo computation

    * implement bisect_prob_lo

    * change idx -> dxinv

    * use rlo instead of plo in locateParticle

    Co-authored-by: atmyers <atmyers2@gmail.com>

commit 6a5a056
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Thu Sep 15 13:23:40 2022 -0700

    Add template parameter to ParallelFor and launch specifying block size (AMReX-Codes#2947)

    By default, amrex::ParallelFor launches AMREX_GPU_MAX_THREADS threads per
    block. We can now explicitly specify the block size with
    `ParallelFor<BLOCK_SIZE>(...)`, where BLOCK_SIZE should be a multiple of the
    warp size (e.g., 64, 128, etc.).  A similar change has also been made to
    `launch`.

    The changes are backward compatible.
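
One way such a change can stay backward compatible is a defaulted template parameter, sketched below on the host. The names and the default of 256 are assumptions for illustration; this is not the AMReX launch code.

```cpp
#include <cassert>

// Assumed default; stands in for AMREX_GPU_MAX_THREADS.
constexpr int default_max_threads = 256;

// Number of blocks needed to cover n items with BlockSize threads per block.
// Existing calls without a template argument keep the default block size.
template <int BlockSize = default_max_threads>
int num_blocks (long n)
{
    static_assert(BlockSize > 0, "block size must be positive");
    assert(n >= 0);
    return static_cast<int>((n + BlockSize - 1) / BlockSize);
}
```

Calling `num_blocks(1000)` uses the default, while `num_blocks<128>(1000)` selects a smaller block explicitly.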

commit 2cdb9df
Author: Andrew Myers <atmyers2@gmail.com>
Date:   Thu Sep 15 10:55:41 2022 -0700

    Byte spread fixes (AMReX-Codes#2949)

commit 17c94cc
Author: Candace Gilet <cgilet@users.noreply.github.com>
Date:   Wed Sep 14 11:49:35 2022 -0400

    Correct MultiFab::norm0 doxygen brief description (AMReX-Codes#2946)

commit 0351c99
Author: Axel Huebl <axel.huebl@plasma.ninja>
Date:   Wed Sep 14 08:48:25 2022 -0700

    CMake: HIP_PATH from ROCM_PATH (AMReX-Codes#2948)

    * On machines like Crusher, `ROCM_PATH` is more likely to be available
    than a `HIP_PATH` environment variable.

    This is mainly needed for our hacky ROCTX hints.

    * ROCTX: New Include

    Supposedly, there is a new include we shall use:

    Ref.:
    ROCm/roctracer#79

    * ROCtracer: Include as System library

    This is needed because the roctracer include files behind the legacy
    include use GNU extensions. But we should make this `-isystem` anyway
    to be robust for the future.

    The 5.2 deprecated include file `<roctracer_ext.h>` throws warnings
    because it relies on GNU extensions:
    ```
    In file included from /opt/rocm/hip/../roctracer/include/ext/prof_protocol.h:27:
    /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:70:7: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
          struct {
          ^
    /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:70:7: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
    /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:75:7: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
          struct {
          ^
    /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:82:7: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
          struct {
          ^
    /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:86:7: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
          struct {
          ^
    /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:90:7: warning: anonymous structs are a GNU extension [-Wgnu-anonymous-struct]
          struct {
          ^
    /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:82:7: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
          struct {
          ^
    /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:86:7: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
          struct {
          ^
    /opt/rocm/hip/../roctracer/include/ext/../../../include/roctracer/ext/prof_protocol.h:90:7: warning: anonymous types declared in an anonymous union are an extension [-Wnested-anon-types]
          struct {
          ^
    ```

    * GNUmake: Update Includes in `hip.mak`

    Use public prefix.

commit 9aa23c2
Author: Cody Balos <balos1@llnl.gov>
Date:   Mon Sep 12 11:49:37 2022 -0700

    Fix minor typo in fcompare docs (AMReX-Codes#2945)

commit bfbd68f
Author: Axel Huebl <axel.huebl@plasma.ninja>
Date:   Mon Sep 12 11:40:55 2022 -0700

    Fix: Make Finalize->Initialize->F->I->... Work (AMReX-Codes#2944)

    Fix assertions in Arena::Initialize.  The_BArena never dies (tm)

    Co-authored-by: Weiqun Zhang <WeiqunZhang@lbl.gov>

commit 6738470
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Wed Sep 7 14:12:34 2022 -0700

    Changes for Cray & Clang (AMReX-Codes#2941)

    * It seems that the new Cray compilers no longer define `_CRAYC`.  However, they do define
      `__cray__`.

    * For Clang based Cray compilers, use -O3 instead of -O2 for optimization.

    * Clang's vectorization pragma is very aggressive.  For some codes, it makes ParallelFor
      with many if statements on CPU much slower than without vectorization.  Unfortunately,
      it does not have an ivdep pragma.  So we disable AMREX_PRAGMA for clang for safety.

    * No longer need to use -Wno-pass-failed for Clang based compilers.

commit 5b0c598
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Wed Sep 7 09:42:57 2022 -0700

    Fix a warning in packing communication send buffer (AMReX-Codes#2940)

    When we communicate double precision data in single precision, there is a
    conversion from double to float in packing the send buffer.  A static cast
    is added to fix the warning.

commit 3e397bb
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Wed Sep 7 09:13:53 2022 -0700

    Link to cublas when using CUDA and Hypre (AMReX-Codes#2933)

commit 9525ea8
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Wed Sep 7 09:13:20 2022 -0700

    HIP: use coarse grained host memory (AMReX-Codes#2932)

commit 7e04016
Author: Marco Garten <mgarten@lbl.gov>
Date:   Wed Sep 7 08:53:20 2022 -0700

    Update Testing Docs (AMReX-Codes#2937)

    - document `abort_on_unused_inputs`
    - remove duplicate superfluous argument in regtest call

commit 539427a
Author: drangara <69211175+drangara@users.noreply.github.com>
Date:   Tue Sep 6 18:13:42 2022 -0400

    EB checkpoint files (AMReX-Codes#2897)

    * support for loading EB from checkpoint file

    * add support for writing chkpt file as well

    Co-authored-by: Weiqun Zhang <WeiqunZhang@lbl.gov>

commit 35ed6b4
Author: Axel Huebl <axel.huebl@plasma.ninja>
Date:   Tue Sep 6 15:07:16 2022 -0700

    Fix: Loading Files Again (AMReX-Codes#2936)

    This allows `amrex::ParmParse::addfile` to be called multiple
    times. Before this, we accidentally overwrote the
    `FILE` static keyword.

    Follow-up to AMReX-Codes#2842

commit 8f8198c
Author: hengjiew <86926839+hengjiew@users.noreply.github.com>
Date:   Tue Sep 6 13:36:35 2022 -0400

    Check if boundary particles container has been created before clearance. (AMReX-Codes#2935)

    This fixes a segmentation fault when using more GPUs for updating particles
    than fluid.

commit fb0b31e
Author: Nuno Miguel Nobre <nuno.nobre@stfc.ac.uk>
Date:   Sun Sep 4 05:18:49 2022 +0100

    SYCL: Replace deprecated atomic types and operations (AMReX-Codes#2921)

    * SYCL: Replace deprecated atomic types and operations

    * Change atomic refs to device memory scope

    When using the relaxed memory order, the memory scope is ignored.
    Thus, for cosmetic reasons only, we set the memory scope to device, the broadest option when using the global address space.

    Co-authored-by: Weiqun Zhang <WeiqunZhang@lbl.gov>

commit cc3cd14
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Thu Sep 1 07:39:25 2022 -0700

    Update CHANGES for 22.09 (AMReX-Codes#2934)

commit acc223f
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Tue Aug 30 16:04:43 2022 -0700

    Add hypre as an option for OpenBCSolver (AMReX-Codes#2931)

commit 3d29fd7
Author: hengjiew <86926839+hengjiew@users.noreply.github.com>
Date:   Wed Aug 24 16:10:22 2022 -0400

    Preserve neighbor particles when sorting particles. (AMReX-Codes#2923)

commit 8294c3a
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Mon Aug 22 10:46:05 2022 -0700

    Scope of NonLocalBC::ParallelCopy (AMReX-Codes#2922)

    Make NonLocalBC::ParallelCopy accessible in namespace amrex, because it can
    be useful in situations other than non-local BC.

commit 0911fc4
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Sun Aug 21 18:13:07 2022 -0700

    Open Boundary Poisson Solver (AMReX-Codes#2912)

    This adds an open boundary Poisson solver based on James's algorithm.
    To use it, the user builds an amrex::OpenBCSolver object, which can be reused
    until the grids change, and then calls OpenBCSolver::solve.

    Currently, this is for 3D cell-centered data only. The solver works on CPUs,
    Nvidia GPUs, and AMD GPUs.  The SYCL versions of a couple of kernels for
    Intel GPUs are yet to be implemented.

commit f270b3d
Author: Marc T. Henry de Frahan <marchdf@gmail.com>
Date:   Thu Aug 18 13:51:56 2022 -0600

    Fix OOB access of ref ratio on HDF write header (AMReX-Codes#2919)

commit fa8e20f
Author: Jean M. Sexton <jmsexton@lbl.gov>
Date:   Thu Aug 18 08:57:51 2022 -0700

    Add Polaris to GNUMake (AMReX-Codes#2908)

commit bd5f6a9
Author: Axel Huebl <axel.huebl@plasma.ninja>
Date:   Mon Aug 15 14:24:21 2022 -0700

    Export GpuDevice Globals (AMReX-Codes#2918)

    * Export GpuDevice Globals

    Implement symbol export via `AMREX_EXPORT` for the global variables
    in `Src/Base/AMReX_GpuDevice.H`.

    Follow-up to AMReX-Codes#1847 AMReX-Codes#1847

    Fix AMReX-Codes#2917

    * Fix: Export `AMReX::m_instance`

commit 4f63929
Author: asalmgren <asalmgren@lbl.gov>
Date:   Sat Aug 13 09:00:02 2022 -0700

    enable LinOp to use the right Factory (fixes moving geometry problem) (AMReX-Codes#2916)

commit 6593518
Author: Andrew Myers <atmyers2@gmail.com>
Date:   Thu Aug 11 15:24:16 2022 -0700

    Use 1 atomic instead of two per item in DenseBins::build (AMReX-Codes#2911)

commit d295f22
Author: Nuno Miguel Nobre <nuno.nobre@stfc.ac.uk>
Date:   Thu Aug 11 03:40:09 2022 +0100

    [SYCL] Remove amrex::oneapi and update deprecated device descriptors (AMReX-Codes#2910)

    * Remove amrex::oneapi in favour of standard features

    * Change deprecated device descriptors

commit 1bda173
Author: Axel Huebl <axel.huebl@plasma.ninja>
Date:   Wed Aug 10 15:46:43 2022 -0600

    Add: `MultiFab::sum_unique` (AMReX-Codes#2909)

    This provides a new method to sum values in a `MultiFab`.
    For non-cell-centered data, `MultiFab::sum` double counts box
    boundary values that are owned by multiple boxes. This provides
    a function that does not double count these and provides a
    quick way to get only the sum of physically unique values.
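
The double counting can be seen in a 1D toy model. The sketch below is not how `MultiFab::sum_unique` is implemented; `NodalBox`, `sum_all` and `sum_unique_1d` are hypothetical, and ownership is decided simply by which box comes first.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1D nodal box: nodes lo..hi inclusive, one value per node.
struct NodalBox
{
    int lo, hi;
    std::vector<double> data;   // data[i - lo] is the value at node i
};

// Sums every box independently, so a node shared by two boxes counts twice.
double sum_all (std::vector<NodalBox> const& boxes)
{
    double s = 0.0;
    for (auto const& b : boxes) {
        for (double v : b.data) { s += v; }
    }
    return s;
}

// Counts each node once: a node is skipped if an earlier box already holds it.
double sum_unique_1d (std::vector<NodalBox> const& boxes)
{
    double s = 0.0;
    for (std::size_t n = 0; n < boxes.size(); ++n) {
        assert(boxes[n].data.size() == std::size_t(boxes[n].hi - boxes[n].lo + 1));
        for (int i = boxes[n].lo; i <= boxes[n].hi; ++i) {
            bool owned_earlier = false;
            for (std::size_t m = 0; m < n; ++m) {
                if (i >= boxes[m].lo && i <= boxes[m].hi) { owned_earlier = true; }
            }
            if (!owned_earlier) { s += boxes[n].data[i - boxes[n].lo]; }
        }
    }
    return s;
}

// Two boxes sharing node 4, every nodal value equal to 1.
std::vector<NodalBox> example_boxes ()
{
    return { NodalBox{0, 4, std::vector<double>(5, 1.0)},
             NodalBox{4, 8, std::vector<double>(5, 1.0)} };
}
```

There are nine distinct nodes (0 through 8), so the unique sum is 9 while the per-box sum is 10.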

    Co-authored-by: Weiqun Zhang <WeiqunZhang@lbl.gov>

commit 3f715d2
Author: Candace Gilet <cgilet@users.noreply.github.com>
Date:   Mon Aug 8 14:40:28 2022 -0400

    In MLMG::mgFcycle, assert that for EB the linop is cell-centered. (AMReX-Codes#2905)

commit 59b0742
Author: hengjiew <86926839+hengjiew@users.noreply.github.com>
Date:   Mon Aug 8 14:17:57 2022 -0400

    Clear the boundary particle indices' container before updating it. (AMReX-Codes#2907)

    This avoids potential segmentation faults when one grid's particles all
    move to other grids.

commit 103db6e
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Fri Aug 5 15:25:33 2022 -0700

    EB: Add Fine Levels (AMReX-Codes#2881)

    Add a new function EB2::addFineLevels() that can be used to add more fine
    levels to the existing EB IndexSpace without changing the coarse levels.
    This is useful for restarting with a larger amr.max_level.  The issue is we
    build EB at the finest level first and then coarsen it to the coarse levels.
    If the restart run has a different finest level, the EB on the coarse levels
    could be different without using this new capability.

commit 6ebf8ff
Author: Jon Rood <jon.rood@nrel.gov>
Date:   Thu Aug 4 14:32:59 2022 -0600

    Add rpath to lib64 for ZFP. (AMReX-Codes#2902)

commit ed23627
Author: Yadong_Zeng <30739800+ruohai0925@users.noreply.github.com>
Date:   Thu Aug 4 16:32:21 2022 -0400

    change data types from double to amrex::Real, and thus we can use single precision for the hypre IJ interface (AMReX-Codes#2896)

    Co-authored-by: yzeng <yzeng@altair.com>

commit 9ed4f59
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Wed Aug 3 16:53:20 2022 -0700

    Fix a new bug introduced in AMReX-Codes#2858 (AMReX-Codes#2901)

    We need to take into account that `amrex::Any` stores `MultiFab&` or `MultiFab const&`.

commit 6eaab8c
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Wed Aug 3 13:39:44 2022 -0700

    MPMD Support (AMReX-Codes#2895)

    Add support for multiple programs multiple data (MPMD).  For now, we assume
    there are only two programs (i.e., executables) in the MPMD mode.  During
    the initialization, MPI_COMM_WORLD is split into two communicators.  The
    MPMD::Copier class can be used to copy FabArray/MultiFab data between two
    programs.  This new capability can be used by FHDeX to couple FHD with
    SPPARKS.

commit 9469329
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Mon Aug 1 09:43:21 2022 -0700

    MLMG interface (AMReX-Codes#2858)

    These changes are made to support a generic type (i.e., amrex::Any) in MLMG.
    This is still work in progress.  But it should not break any existing codes.

commit 5a3b303
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Mon Aug 1 09:34:44 2022 -0700

    Update CHANGES for 22.08 (AMReX-Codes#2894)

commit 48702b4
Author: hengjiew <86926839+hengjiew@users.noreply.github.com>
Date:   Thu Jul 28 14:14:19 2022 -0400

    Let `selectActualNeighbors` return right after starting if there are (AMReX-Codes#2886)

    no particles for communication.

commit 6a47d89
Author: kngott <kngott@lbl.gov>
Date:   Wed Jul 27 17:03:04 2022 -0700

    Add Comm Sync to Redistribute (AMReX-Codes#2891)

commit 51542c8
Author: philip-blakely <46958218+philip-blakely@users.noreply.github.com>
Date:   Wed Jul 27 17:29:26 2022 +0100

    Multi-materials and derived variable output (AMReX-Codes#2888)

    ## Summary

    Output small plots if only derived variables are specified.
    Also, make DeriveFuncFab a std::function<> instead of plain function-pointer.

    ## Additional background

    We have been implementing small plots for outputting variables at gauges (e.g. pressure at specific gauge locations). We may want to output only the derived variable pressure, and not all state variables. The if-condition was incorrect in this case.

    Further, multi-material simulations require a material index in order to compute derived variables, in addition to existing parameters. Making DeriveFuncFab a std::function is sufficient for our purposes.
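
A minimal sketch of why `std::function` helps here, under stated assumptions: `DemoDeriveFunc` is a hypothetical, much simplified signature (the real `DeriveFuncFab` takes more arguments), and the gamma values are illustrative only. The point is that a capturing lambda can carry a material index, which a plain function pointer cannot.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical, simplified derive-function type for this sketch.
using DemoDeriveFunc =
    std::function<void(std::vector<double>& derived, const std::vector<double>& state)>;

// Build a derive function for one material. The lambda captures a
// per-material constant, so the extra parameter travels with the callable.
DemoDeriveFunc demo_make_pressure(int material_index) {
    static const double gammas[] = {1.4, 2.0};  // illustrative values only
    assert(material_index >= 0 && material_index < 2);
    const double gm1 = gammas[material_index] - 1.0;
    return [gm1](std::vector<double>& derived, const std::vector<double>& state) {
        derived.resize(state.size());
        for (std::size_t i = 0; i < state.size(); ++i) {
            derived[i] = gm1 * state[i];  // p = (gamma - 1) * energy density
        }
    };
}

// Convenience wrapper: derive the pressure for a single value.
double demo_pressure(int material_index, double energy_density) {
    std::vector<double> out;
    demo_make_pressure(material_index)(out, {energy_density});
    return out[0];
}
```

A non-capturing lambda or a free function still converts to `DemoDeriveFunc`, so existing callers that pass plain functions would keep working in this model.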

commit ce0fb74
Author: Andrew Myers <atmyers2@gmail.com>
Date:   Tue Jul 26 16:20:38 2022 -0700

    Fix host / device sync bug in PODVector (AMReX-Codes#2890)

commit 06753e6
Author: Axel Huebl <axel.huebl@plasma.ninja>
Date:   Tue Jul 26 12:54:35 2022 -0700

    `TagBoxArray::collate`: Fujitsu Clang (AMReX-Codes#2889)

    `mpiFCC -Nclang` only defines `__CLANG_FUJITSU`, not `__FUJITSU` as
    in the classic compiler mode.

commit 7cf77dc
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Tue Jul 26 11:01:21 2022 -0700

    MinLoc and MaxLoc Support (AMReX-Codes#2885)

    Add struct ValLocPair that can be used by ReduceOps/ReduceData and ParReduce
    to find the location of the min/max value.

    Add warp shuffle down function for more general types.  This is needed for
    MinLoc/MaxLoc with CUDA < 11, because we don't use CUB for earlier versions
    of CUDA.

    The Intel GPU support is not done yet.  We need to allocate enough shared
    local memory when the size of ValLocPair is larger than the size of unsigned
    long long.
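
The idea of a value/location pair can be sketched serially as follows. `DemoValLoc` and `demo_min_loc` are hypothetical names for this example, not the AMReX `ValLocPair` API; the sketch only shows that ordering on the value while carrying the index yields the location of the minimum.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a value/location pair.
// Comparison looks at the value only; the index is carried along.
template <typename T, typename L>
struct DemoValLoc {
    T value;
    L index;
    friend bool operator<(const DemoValLoc& a, const DemoValLoc& b) {
        return a.value < b.value;
    }
};

// Serial min reduction that returns both the minimum and where it occurs.
DemoValLoc<double, std::size_t> demo_min_loc(const std::vector<double>& data) {
    assert(!data.empty());
    DemoValLoc<double, std::size_t> best{data[0], 0};
    for (std::size_t i = 1; i < data.size(); ++i) {
        const DemoValLoc<double, std::size_t> cand{data[i], i};
        if (cand < best) best = cand;  // strict '<' keeps the earliest index on ties
    }
    return best;
}

// True when the pair is wider than unsigned long long, the situation in
// which the commit message says extra shared local memory is needed.
bool demo_pair_is_wider() {
    return sizeof(DemoValLoc<double, std::size_t>) > sizeof(unsigned long long);
}
```

A max-location reduction follows the same pattern with the comparison reversed.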

commit 4b7e200
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Thu Jul 21 10:25:57 2022 -0700

    HIP: Remove the call to hipDeviceSetSharedMemConfig (AMReX-Codes#2884)

    AMD devices do not support shared cache banking.

    Thanks @afanfa for reporting this. (AMReX-Codes#2883)

commit 8e40952
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Wed Jul 20 12:10:26 2022 -0700

    Add Frontier to GNU Make (AMReX-Codes#2879)

commit b673d81
Author: Max Katz <maxpkatz@gmail.com>
Date:   Mon Jul 18 15:14:19 2022 -0400

    Add option to derefine to AMRErrorTag (AMReX-Codes#2875)

    This allows a refinement field to specify *derefinement* (by setting a zone's tagging value to the clear value).

commit 73dbf2f
Author: hengjiew <86926839+hengjiew@users.noreply.github.com>
Date:   Mon Jul 18 12:53:35 2022 -0400

    Fix the segmentation fault in selecting actual neighbor particles. (AMReX-Codes#2877)

commit 40b3d21
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Wed Jul 13 13:24:15 2022 -0700

    Add extra braces in initialization of GpuArray (AMReX-Codes#2876)

    It should not be needed since C++14.  But some compilers seem to need the
    double braces.
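
For readers unfamiliar with the double-brace form, here is a sketch using a hypothetical `DemoArray` (an aggregate whose only member is a built-in array, which is an assumption about the general shape of such a wrapper, not a copy of `GpuArray`).

```cpp
// Hypothetical fixed-size array wrapper: an aggregate with a single
// built-in array member.
template <typename T, int N>
struct DemoArray {
    T arr[N];
    constexpr const T& operator[](int i) const { return arr[i]; }
};

int demo_sum_fully_braced() {
    // Outer braces initialize the struct, inner braces initialize `arr`.
    // The brace-elided form `DemoArray<int, 3> a = {1, 2, 3};` relies on
    // brace elision; the commit message reports that some compilers seem
    // to need the fully braced form shown here.
    constexpr DemoArray<int, 3> a = {{1, 2, 3}};
    return a[0] + a[1] + a[2];
}
```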

commit a633d2b
Author: Luca Fedeli <luca.fedeli@cea.fr>
Date:   Fri Jul 8 20:34:18 2022 +0200

    Workaround to bypass issue observed at very large scale with Fujitsu MPI (AMReX-Codes#2874)

    We have observed some MPI issues at very large scale when WarpX is compiled using Fujitsu MPI (i.e., with the Fujitsu compiler). These issues seem to be related to the use of MPI Gatherv with MPI_Datatype. This PR implements a possible workaround, initially proposed by @WeiqunZhang. The idea is that, when WarpX is compiled with the Fujitsu compiler, simpler integer arrays are used instead of MPI_Datatype in the routine where the issue was observed.

commit 7660c88
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Fri Jul 8 08:48:14 2022 -0700

    Allow zero components MultiFab and BaseFab (AMReX-Codes#2873)

    This is useful for particle I/O that does not have any mesh data.  yt needs
    a header file associated with a MultiFab.

commit c849dd1
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Fri Jul 8 08:06:37 2022 -0700

    New EB optimization parameter: eb2.num_coarsen_opt (AMReX-Codes#2872)

    At the beginning of EB generation, we chop the entire finest domain into
    boxes and find out the type of the boxes.  We then collect the completely
    covered boxes and cut boxes into two BoxArrays.  This process can be costly
    because of the number of calls to the implicit functions.  In this commit,
    we have introduced a new ParmParse parameter, eb2.num_coarsen_opt with a
    default value of zero.  If for instance it is set to 3, we start the box
    type categorization at a resolution that is coarsened by a factor of 2^3.
    For the provisional cut boxes, we refine them by a factor of 2.  Then we chop
    them into small boxes and categorize the new boxes.  This process is
    performed recursively until we are at the original finest resolution.

    Users should be aware that, if eb2.num_coarsen_opt is too big, this could
    produce erroneous results because evaluating the implicit function on
    coarse boxes could miss fine structures in the EB.

    Thanks to Robert Marskar for sharing this algorithm.
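
The coarse-to-fine categorization can be sketched in 1D as below. This is an illustration under several assumptions, not AMReX's implementation: all `Demo*`/`demo_*` names are hypothetical, a box is classified by sampling the implicit function at its nodes, positive values are taken to mean "inside the body" (covered), and the finest mesh spacing is 1.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>
#include <utility>
#include <vector>

enum class DemoType { Regular, Covered, Cut };
struct DemoBox { int lo, hi; };  // cells [lo, hi) at some resolution
struct DemoResult { std::vector<DemoBox> covered, cut; long nevals = 0; };
using DemoIF = std::function<double(double)>;  // implicit function of position

// Split a box into pieces of at most max_size cells.
std::vector<DemoBox> demo_chop(DemoBox b, int max_size) {
    std::vector<DemoBox> out;
    for (int lo = b.lo; lo < b.hi; lo += max_size)
        out.push_back({lo, std::min(lo + max_size, b.hi)});
    return out;
}

// Classify a box by sampling the implicit function at its nodes.
DemoType demo_classify(DemoBox b, double dx, const DemoIF& f, long& nevals) {
    bool pos = false, neg = false;
    for (int i = b.lo; i <= b.hi; ++i) {
        if (f(i * dx) > 0.0) pos = true; else neg = true;
        ++nevals;
    }
    if (pos && neg) return DemoType::Cut;
    return pos ? DemoType::Covered : DemoType::Regular;
}

// Start at a resolution coarsened by 2^num_coarsen_opt; refine and
// re-examine only the provisional cut boxes until the finest level.
DemoResult demo_categorize(int ncells, int box_size, int num_coarsen_opt, const DemoIF& f) {
    DemoResult r;
    std::vector<DemoBox> todo = demo_chop({0, ncells >> num_coarsen_opt}, box_size);
    for (int lev = num_coarsen_opt; lev >= 0; --lev) {
        const double dx = static_cast<double>(1 << lev);
        std::vector<DemoBox> next;
        for (DemoBox b : todo) {
            const DemoType t = demo_classify(b, dx, f, r.nevals);
            if (t == DemoType::Covered) {
                r.covered.push_back({b.lo << lev, b.hi << lev});  // finest-level indices
            } else if (t == DemoType::Cut && lev == 0) {
                r.cut.push_back(b);
            } else if (t == DemoType::Cut) {
                for (DemoBox c : demo_chop({2 * b.lo, 2 * b.hi}, box_size)) next.push_back(c);
            }
        }
        todo = std::move(next);
    }
    return r;
}

int demo_covered_cells(const DemoResult& r) {
    int n = 0;
    for (DemoBox b : r.covered) n += b.hi - b.lo;
    return n;
}

double demo_half_space(double x) { return x - 40.5; }                 // body: x > 40.5
double demo_thin_body(double x) { return 1.5 - std::abs(x - 20.5); }  // body: 19 < x < 22
```

For `demo_half_space` on 64 cells with boxes of 8, starting three levels coarser finds the same cut box and the same covered cells with fewer function evaluations. For `demo_thin_body`, the coarse samples all fall outside the body, so the starting box is classified as regular and the feature is never refined, which is the failure mode the warning above describes.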

commit 557aae8
Author: Erik <epalmer@lbl.gov>
Date:   Wed Jul 6 08:54:24 2022 -0700

    point to new location of AMReX images, AMReX website repo (AMReX-Codes#2867)

commit cbdc658
Author: Axel Huebl <axel.huebl@plasma.ninja>
Date:   Tue Jul 5 01:41:03 2022 +0200

    SENSEI 4.0: Fix Build for Particles (AMReX-Codes#2869)

    ## Summary

    This part causes a compile error now in WarpX.

    cc  @burlen @kwryankrattiger

    ## Additional background

    X-ref: Blocks WarpX 22.07 release ECP-WarpX/WarpX#3211

    Follow-up to:
    - AMReX-Codes#2785
    - AMReX-Codes#2834

commit dc8b734
Author: Andrew Myers <atmyers2@gmail.com>
Date:   Fri Jul 1 17:19:20 2022 -0700

    Cache the neighbor comm tags for the CPU implementation of fillNeighbors. (AMReX-Codes#2862)

    * Cache the neighbor comm tags for the CPU implementation of fillNeighbors.

    * fix areMasksValid function

commit 2b42fb5
Author: drangara <69211175+drangara@users.noreply.github.com>
Date:   Fri Jul 1 18:44:35 2022 -0400

    Remove some hard checks in check_mvmc for 3D (AMReX-Codes#2864)

    Removing some hard checks in 3D coarsening logic as it appears that those are not necessarily bad states, and a soft failure to coarsen should suffice.

commit 19c7068
Author: Erik <epalmer@lbl.gov>
Date:   Fri Jul 1 18:24:24 2022 -0400

    Carry over fix for ngbxy.smallEnd typo (AMReX-Codes#2868)

    This is a typo that got corrected in other places but didn't get fixed here.

commit d736ef2
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Fri Jul 1 11:00:15 2022 -0700

    Update CHANGES for 22.07 (AMReX-Codes#2866)

commit be813d0
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Fri Jul 1 10:29:13 2022 -0700

    Hypre: add version check (AMReX-Codes#2865)

    These HYPRE_SetSp* are only available in hypre >= 22500.

commit 8fb23ec
Author: Jon Rood <jon.rood@nrel.gov>
Date:   Wed Jun 29 16:52:35 2022 -0600

    Refactor Make.nrel to use MPT for MPI with the Intel compiler on Eagle. (AMReX-Codes#2861)

commit 6f9a46c
Author: PaulMullowney <60452402+PaulMullowney@users.noreply.github.com>
Date:   Wed Jun 29 11:09:57 2022 -0600

    Adding control APIs and namespacing for core algorithm paths like SpGEMM, SpMV, and SpTrans. (AMReX-Codes#2859)

    Co-authored-by: Paul Mullowney <Paul.Mullowney@nrel.gov>

commit e4c83cf
Author: Jon Rood <jon.rood@nrel.gov>
Date:   Wed Jun 29 11:08:42 2022 -0600

    Add lib64 library location for ZFP since it may exist there instead of lib. (AMReX-Codes#2860)

commit b2b9150
Author: Burlen Loring <bloring@lbl.gov>
Date:   Tue Jun 28 13:42:41 2022 -0700

    update the SENSEI in situ coupling for SENSEI v4.0.0 (AMReX-Codes#2785)

    In this release, an install of VTK is no longer required.
    To compile AMReX w/ SENSEI use:

    ```cmake
    -DAMReX_SENSEI=ON -DSENSEI_DIR=<path to SENSEI install>/<lib dir>/cmake
    ```

    Note: <lib dir> may be `lib` or `lib64` or something else depending on
    your OS and is determined by CMake at configure time. See the CMake
    GNUInstallDirs documentation for more information.

commit 2c5f475
Author: Andrew Myers <atmyers2@gmail.com>
Date:   Tue Jun 28 12:51:19 2022 -0700

    Write runtime attribs to checkpoints on GPUs (AMReX-Codes#2856)

commit d2cb546
Author: Jon Rood <jon.rood@nrel.gov>
Date:   Tue Jun 28 13:27:02 2022 -0600

    Fix gnu make on Crusher for mpi_gtl_hsa (AMReX-Codes#2857)

    Update environment variable at OLCF for mpi_gtl_hsa.

commit 21fe4b3
Author: Axel Huebl <axel.huebl@plasma.ninja>
Date:   Tue Jun 28 19:53:09 2022 +0200

    CMake: FindDependency CUDAToolkit (AMReX-Codes#2849)

    If we install AMReX with CUDA support using a modern
    CMake, we need to repopulate targets such as `CUDA::curand`
    from `find_dependency` for downstream.
    Downstream users find us via `find_package` and that target
    link dependency turned out to be unpopulated in MFIX.

commit 027f2ff
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Thu Jun 23 16:15:57 2022 -0700

    Fix make help (AMReX-Codes#2854)

    This reverts the change in AMReX-Codes#2845, which fixed an issue with `make print-%`, but broke
    `make help`.  This is now fixed in a different way.  Both `make print-%` and `make help`
    should work now.

commit 3d3ad21
Author: kngott <kngott@lbl.gov>
Date:   Thu Jun 23 13:39:59 2022 -0700

    NERSC Programming Environment prototype (AMReX-Codes#2848)

commit 4872676
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Thu Jun 23 12:41:20 2022 -0700

    GNU Make: No need to query mpif90 if Fortran is not used. (AMReX-Codes#2852)

    This minimizes potential issues.

commit fc0d646
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Thu Jun 23 12:23:55 2022 -0700

    Remove f90doc (AMReX-Codes#2851)

    We no longer use it.

commit 5188a6a
Author: Weiqun Zhang <WeiqunZhang@lbl.gov>
Date:   Thu Jun 23 11:09:15 2022 -0700

    Explicitly invoke python3 (AMReX-Codes#2850)

    According to PEP 394, a python distributor may choose to not provide the
    python command.  In fact, that's what recent versions of macOS do.

commit 2d931f6
Author: Andrew Myers <atmyers2@gmail.com>
Date:   Wed Jun 22 15:03:50 2022 -0500

    Maintain the high end of the 'roundoff domain' in both float and double precision (AMReX-Codes#2839)

    * Maintain the high end of the 'roundoff domain' in both float and double precision

    * fix shadowing

    * fix warning

    * fix float conversion warning

    * fix logic

    * Update Src/Base/AMReX_Geometry.H

    * Update Src/Base/AMReX_Geometry.H
@nmnobre
Contributor

nmnobre commented Nov 11, 2022

Hi @WeiqunZhang,

Thank you for this and for your work in #3024.

This change relies on CL/sycl.hpp to either:

  1. inline the cl namespace thereby exposing the sycl namespace, as in current intel compiler versions;
  2. expose the sycl namespace, as in future intel compiler versions.

The latter is unfortunate as it is non-standard. We should be including sycl/sycl.hpp which, as you know, has never been a part of Intel's production/consumer compilers (as opposed to their open-source version). I suppose AMReX never set out to offer pure sycl compatibility, it was always about intel's dpc++ flavour, so I'm guessing this change was made with that mindset?

Cheers,
-Nuno

@WeiqunZhang
Member Author

@nmnobre What SYCL compiler are you using? If possible, we could add a CI test for it.

@nmnobre
Contributor

nmnobre commented Nov 11, 2022

I'm using both hipSYCL and intel's open-source dpc++ which both pack sycl/sycl.hpp.
I should probably say I'm not sure what the right choice here is.
If it were up to me I'd ask intel to just add sycl/sycl.hpp, but I'm not sure where I'd ask for that, and I'm sure they have their reasons... unless you have insider knowledge they'll be doing that starting with version 2023.1 :P
I'm happy with continuing with my own fork with the patches I need in AMReX for my use case, it's probably wiser for AMReX to support the production-ready compilers?...

@WeiqunZhang
Member Author

Could you let me know what macros the open-source dpc++ compiler defines? dpcpp -dM -E - < /dev/null

@nmnobre
Contributor

nmnobre commented Nov 11, 2022

clang++ -dM -E - < /dev/null gives:

#define _LP64 1
#define __ATOMIC_ACQUIRE 2
#define __ATOMIC_ACQ_REL 4
#define __ATOMIC_CONSUME 1
#define __ATOMIC_RELAXED 0
#define __ATOMIC_RELEASE 3
#define __ATOMIC_SEQ_CST 5
#define __BIGGEST_ALIGNMENT__ 16
#define __BITINT_MAXWIDTH__ 128
#define __BOOL_WIDTH__ 8
#define __BYTE_ORDER__ __ORDER_LITTLE_ENDIAN__
#define __CHAR16_TYPE__ unsigned short
#define __CHAR32_TYPE__ unsigned int
#define __CHAR_BIT__ 8
#define __CLANG_ATOMIC_BOOL_LOCK_FREE 2
#define __CLANG_ATOMIC_CHAR16_T_LOCK_FREE 2
#define __CLANG_ATOMIC_CHAR32_T_LOCK_FREE 2
#define __CLANG_ATOMIC_CHAR_LOCK_FREE 2
#define __CLANG_ATOMIC_INT_LOCK_FREE 2
#define __CLANG_ATOMIC_LLONG_LOCK_FREE 2
#define __CLANG_ATOMIC_LONG_LOCK_FREE 2
#define __CLANG_ATOMIC_POINTER_LOCK_FREE 2
#define __CLANG_ATOMIC_SHORT_LOCK_FREE 2
#define __CLANG_ATOMIC_WCHAR_T_LOCK_FREE 2
#define __CONSTANT_CFSTRINGS__ 1
#define __DBL_DECIMAL_DIG__ 17
#define __DBL_DENORM_MIN__ 4.9406564584124654e-324
#define __DBL_DIG__ 15
#define __DBL_EPSILON__ 2.2204460492503131e-16
#define __DBL_HAS_DENORM__ 1
#define __DBL_HAS_INFINITY__ 1
#define __DBL_HAS_QUIET_NAN__ 1
#define __DBL_MANT_DIG__ 53
#define __DBL_MAX_10_EXP__ 308
#define __DBL_MAX_EXP__ 1024
#define __DBL_MAX__ 1.7976931348623157e+308
#define __DBL_MIN_10_EXP__ (-307)
#define __DBL_MIN_EXP__ (-1021)
#define __DBL_MIN__ 2.2250738585072014e-308
#define __DECIMAL_DIG__ __LDBL_DECIMAL_DIG__
#define __ELF__ 1
#define __FINITE_MATH_ONLY__ 0
#define __FLOAT128__ 1
#define __FLT16_DECIMAL_DIG__ 5
#define __FLT16_DENORM_MIN__ 5.9604644775390625e-8F16
#define __FLT16_DIG__ 3
#define __FLT16_EPSILON__ 9.765625e-4F16
#define __FLT16_HAS_DENORM__ 1
#define __FLT16_HAS_INFINITY__ 1
#define __FLT16_HAS_QUIET_NAN__ 1
#define __FLT16_MANT_DIG__ 11
#define __FLT16_MAX_10_EXP__ 4
#define __FLT16_MAX_EXP__ 16
#define __FLT16_MAX__ 6.5504e+4F16
#define __FLT16_MIN_10_EXP__ (-4)
#define __FLT16_MIN_EXP__ (-13)
#define __FLT16_MIN__ 6.103515625e-5F16
#define __FLT_DECIMAL_DIG__ 9
#define __FLT_DENORM_MIN__ 1.40129846e-45F
#define __FLT_DIG__ 6
#define __FLT_EPSILON__ 1.19209290e-7F
#define __FLT_HAS_DENORM__ 1
#define __FLT_HAS_INFINITY__ 1
#define __FLT_HAS_QUIET_NAN__ 1
#define __FLT_MANT_DIG__ 24
#define __FLT_MAX_10_EXP__ 38
#define __FLT_MAX_EXP__ 128
#define __FLT_MAX__ 3.40282347e+38F
#define __FLT_MIN_10_EXP__ (-37)
#define __FLT_MIN_EXP__ (-125)
#define __FLT_MIN__ 1.17549435e-38F
#define __FLT_RADIX__ 2
#define __FXSR__ 1
#define __GCC_ASM_FLAG_OUTPUTS__ 1
#define __GCC_ATOMIC_BOOL_LOCK_FREE 2
#define __GCC_ATOMIC_CHAR16_T_LOCK_FREE 2
#define __GCC_ATOMIC_CHAR32_T_LOCK_FREE 2
#define __GCC_ATOMIC_CHAR_LOCK_FREE 2
#define __GCC_ATOMIC_INT_LOCK_FREE 2
#define __GCC_ATOMIC_LLONG_LOCK_FREE 2
#define __GCC_ATOMIC_LONG_LOCK_FREE 2
#define __GCC_ATOMIC_POINTER_LOCK_FREE 2
#define __GCC_ATOMIC_SHORT_LOCK_FREE 2
#define __GCC_ATOMIC_TEST_AND_SET_TRUEVAL 1
#define __GCC_ATOMIC_WCHAR_T_LOCK_FREE 2
#define __GCC_HAVE_DWARF2_CFI_ASM 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_1 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_2 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 1
#define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 1
#define __GNUC_MINOR__ 2
#define __GNUC_PATCHLEVEL__ 1
#define __GNUC_STDC_INLINE__ 1
#define __GNUC__ 4
#define __GXX_ABI_VERSION 1002
#define __INT16_C_SUFFIX__ 
#define __INT16_FMTd__ "hd"
#define __INT16_FMTi__ "hi"
#define __INT16_MAX__ 32767
#define __INT16_TYPE__ short
#define __INT32_C_SUFFIX__ 
#define __INT32_FMTd__ "d"
#define __INT32_FMTi__ "i"
#define __INT32_MAX__ 2147483647
#define __INT32_TYPE__ int
#define __INT64_C_SUFFIX__ L
#define __INT64_FMTd__ "ld"
#define __INT64_FMTi__ "li"
#define __INT64_MAX__ 9223372036854775807L
#define __INT64_TYPE__ long int
#define __INT8_C_SUFFIX__ 
#define __INT8_FMTd__ "hhd"
#define __INT8_FMTi__ "hhi"
#define __INT8_MAX__ 127
#define __INT8_TYPE__ signed char
#define __INTMAX_C_SUFFIX__ L
#define __INTMAX_FMTd__ "ld"
#define __INTMAX_FMTi__ "li"
#define __INTMAX_MAX__ 9223372036854775807L
#define __INTMAX_TYPE__ long int
#define __INTMAX_WIDTH__ 64
#define __INTPTR_FMTd__ "ld"
#define __INTPTR_FMTi__ "li"
#define __INTPTR_MAX__ 9223372036854775807L
#define __INTPTR_TYPE__ long int
#define __INTPTR_WIDTH__ 64
#define __INT_FAST16_FMTd__ "hd"
#define __INT_FAST16_FMTi__ "hi"
#define __INT_FAST16_MAX__ 32767
#define __INT_FAST16_TYPE__ short
#define __INT_FAST16_WIDTH__ 16
#define __INT_FAST32_FMTd__ "d"
#define __INT_FAST32_FMTi__ "i"
#define __INT_FAST32_MAX__ 2147483647
#define __INT_FAST32_TYPE__ int
#define __INT_FAST32_WIDTH__ 32
#define __INT_FAST64_FMTd__ "ld"
#define __INT_FAST64_FMTi__ "li"
#define __INT_FAST64_MAX__ 9223372036854775807L
#define __INT_FAST64_TYPE__ long int
#define __INT_FAST64_WIDTH__ 64
#define __INT_FAST8_FMTd__ "hhd"
#define __INT_FAST8_FMTi__ "hhi"
#define __INT_FAST8_MAX__ 127
#define __INT_FAST8_TYPE__ signed char
#define __INT_FAST8_WIDTH__ 8
#define __INT_LEAST16_FMTd__ "hd"
#define __INT_LEAST16_FMTi__ "hi"
#define __INT_LEAST16_MAX__ 32767
#define __INT_LEAST16_TYPE__ short
#define __INT_LEAST16_WIDTH__ 16
#define __INT_LEAST32_FMTd__ "d"
#define __INT_LEAST32_FMTi__ "i"
#define __INT_LEAST32_MAX__ 2147483647
#define __INT_LEAST32_TYPE__ int
#define __INT_LEAST32_WIDTH__ 32
#define __INT_LEAST64_FMTd__ "ld"
#define __INT_LEAST64_FMTi__ "li"
#define __INT_LEAST64_MAX__ 9223372036854775807L
#define __INT_LEAST64_TYPE__ long int
#define __INT_LEAST64_WIDTH__ 64
#define __INT_LEAST8_FMTd__ "hhd"
#define __INT_LEAST8_FMTi__ "hhi"
#define __INT_LEAST8_MAX__ 127
#define __INT_LEAST8_TYPE__ signed char
#define __INT_LEAST8_WIDTH__ 8
#define __INT_MAX__ 2147483647
#define __INT_WIDTH__ 32
#define __LDBL_DECIMAL_DIG__ 21
#define __LDBL_DENORM_MIN__ 3.64519953188247460253e-4951L
#define __LDBL_DIG__ 18
#define __LDBL_EPSILON__ 1.08420217248550443401e-19L
#define __LDBL_HAS_DENORM__ 1
#define __LDBL_HAS_INFINITY__ 1
#define __LDBL_HAS_QUIET_NAN__ 1
#define __LDBL_MANT_DIG__ 64
#define __LDBL_MAX_10_EXP__ 4932
#define __LDBL_MAX_EXP__ 16384
#define __LDBL_MAX__ 1.18973149535723176502e+4932L
#define __LDBL_MIN_10_EXP__ (-4931)
#define __LDBL_MIN_EXP__ (-16381)
#define __LDBL_MIN__ 3.36210314311209350626e-4932L
#define __LITTLE_ENDIAN__ 1
#define __LLONG_WIDTH__ 64
#define __LONG_LONG_MAX__ 9223372036854775807LL
#define __LONG_MAX__ 9223372036854775807L
#define __LONG_WIDTH__ 64
#define __LP64__ 1
#define __MMX__ 1
#define __NO_INLINE__ 1
#define __NO_MATH_INLINES 1
#define __OBJC_BOOL_IS_BOOL 0
#define __OPENCL_MEMORY_SCOPE_ALL_SVM_DEVICES 3
#define __OPENCL_MEMORY_SCOPE_DEVICE 2
#define __OPENCL_MEMORY_SCOPE_SUB_GROUP 4
#define __OPENCL_MEMORY_SCOPE_WORK_GROUP 1
#define __OPENCL_MEMORY_SCOPE_WORK_ITEM 0
#define __ORDER_BIG_ENDIAN__ 4321
#define __ORDER_LITTLE_ENDIAN__ 1234
#define __ORDER_PDP_ENDIAN__ 3412
#define __POINTER_WIDTH__ 64
#define __PRAGMA_REDEFINE_EXTNAME 1
#define __PTRDIFF_FMTd__ "ld"
#define __PTRDIFF_FMTi__ "li"
#define __PTRDIFF_MAX__ 9223372036854775807L
#define __PTRDIFF_TYPE__ long int
#define __PTRDIFF_WIDTH__ 64
#define __REGISTER_PREFIX__ 
#define __SCHAR_MAX__ 127
#define __SEG_FS 1
#define __SEG_GS 1
#define __SHRT_MAX__ 32767
#define __SHRT_WIDTH__ 16
#define __SIG_ATOMIC_MAX__ 2147483647
#define __SIG_ATOMIC_WIDTH__ 32
#define __SIZEOF_DOUBLE__ 8
#define __SIZEOF_FLOAT128__ 16
#define __SIZEOF_FLOAT__ 4
#define __SIZEOF_INT128__ 16
#define __SIZEOF_INT__ 4
#define __SIZEOF_LONG_DOUBLE__ 16
#define __SIZEOF_LONG_LONG__ 8
#define __SIZEOF_LONG__ 8
#define __SIZEOF_POINTER__ 8
#define __SIZEOF_PTRDIFF_T__ 8
#define __SIZEOF_SHORT__ 2
#define __SIZEOF_SIZE_T__ 8
#define __SIZEOF_WCHAR_T__ 4
#define __SIZEOF_WINT_T__ 4
#define __SIZE_FMTX__ "lX"
#define __SIZE_FMTo__ "lo"
#define __SIZE_FMTu__ "lu"
#define __SIZE_FMTx__ "lx"
#define __SIZE_MAX__ 18446744073709551615UL
#define __SIZE_TYPE__ long unsigned int
#define __SIZE_WIDTH__ 64
#define __SSE2_MATH__ 1
#define __SSE2__ 1
#define __SSE_MATH__ 1
#define __SSE__ 1
#define __STDC_HOSTED__ 1
#define __STDC_UTF_16__ 1
#define __STDC_UTF_32__ 1
#define __STDC_VERSION__ 201710L
#define __STDC__ 1
#define __UINT16_C_SUFFIX__ 
#define __UINT16_FMTX__ "hX"
#define __UINT16_FMTo__ "ho"
#define __UINT16_FMTu__ "hu"
#define __UINT16_FMTx__ "hx"
#define __UINT16_MAX__ 65535
#define __UINT16_TYPE__ unsigned short
#define __UINT32_C_SUFFIX__ U
#define __UINT32_FMTX__ "X"
#define __UINT32_FMTo__ "o"
#define __UINT32_FMTu__ "u"
#define __UINT32_FMTx__ "x"
#define __UINT32_MAX__ 4294967295U
#define __UINT32_TYPE__ unsigned int
#define __UINT64_C_SUFFIX__ UL
#define __UINT64_FMTX__ "lX"
#define __UINT64_FMTo__ "lo"
#define __UINT64_FMTu__ "lu"
#define __UINT64_FMTx__ "lx"
#define __UINT64_MAX__ 18446744073709551615UL
#define __UINT64_TYPE__ long unsigned int
#define __UINT8_C_SUFFIX__ 
#define __UINT8_FMTX__ "hhX"
#define __UINT8_FMTo__ "hho"
#define __UINT8_FMTu__ "hhu"
#define __UINT8_FMTx__ "hhx"
#define __UINT8_MAX__ 255
#define __UINT8_TYPE__ unsigned char
#define __UINTMAX_C_SUFFIX__ UL
#define __UINTMAX_FMTX__ "lX"
#define __UINTMAX_FMTo__ "lo"
#define __UINTMAX_FMTu__ "lu"
#define __UINTMAX_FMTx__ "lx"
#define __UINTMAX_MAX__ 18446744073709551615UL
#define __UINTMAX_TYPE__ long unsigned int
#define __UINTMAX_WIDTH__ 64
#define __UINTPTR_FMTX__ "lX"
#define __UINTPTR_FMTo__ "lo"
#define __UINTPTR_FMTu__ "lu"
#define __UINTPTR_FMTx__ "lx"
#define __UINTPTR_MAX__ 18446744073709551615UL
#define __UINTPTR_TYPE__ long unsigned int
#define __UINTPTR_WIDTH__ 64
#define __UINT_FAST16_FMTX__ "hX"
#define __UINT_FAST16_FMTo__ "ho"
#define __UINT_FAST16_FMTu__ "hu"
#define __UINT_FAST16_FMTx__ "hx"
#define __UINT_FAST16_MAX__ 65535
#define __UINT_FAST16_TYPE__ unsigned short
#define __UINT_FAST32_FMTX__ "X"
#define __UINT_FAST32_FMTo__ "o"
#define __UINT_FAST32_FMTu__ "u"
#define __UINT_FAST32_FMTx__ "x"
#define __UINT_FAST32_MAX__ 4294967295U
#define __UINT_FAST32_TYPE__ unsigned int
#define __UINT_FAST64_FMTX__ "lX"
#define __UINT_FAST64_FMTo__ "lo"
#define __UINT_FAST64_FMTu__ "lu"
#define __UINT_FAST64_FMTx__ "lx"
#define __UINT_FAST64_MAX__ 18446744073709551615UL
#define __UINT_FAST64_TYPE__ long unsigned int
#define __UINT_FAST8_FMTX__ "hhX"
#define __UINT_FAST8_FMTo__ "hho"
#define __UINT_FAST8_FMTu__ "hhu"
#define __UINT_FAST8_FMTx__ "hhx"
#define __UINT_FAST8_MAX__ 255
#define __UINT_FAST8_TYPE__ unsigned char
#define __UINT_LEAST16_FMTX__ "hX"
#define __UINT_LEAST16_FMTo__ "ho"
#define __UINT_LEAST16_FMTu__ "hu"
#define __UINT_LEAST16_FMTx__ "hx"
#define __UINT_LEAST16_MAX__ 65535
#define __UINT_LEAST16_TYPE__ unsigned short
#define __UINT_LEAST32_FMTX__ "X"
#define __UINT_LEAST32_FMTo__ "o"
#define __UINT_LEAST32_FMTu__ "u"
#define __UINT_LEAST32_FMTx__ "x"
#define __UINT_LEAST32_MAX__ 4294967295U
#define __UINT_LEAST32_TYPE__ unsigned int
#define __UINT_LEAST64_FMTX__ "lX"
#define __UINT_LEAST64_FMTo__ "lo"
#define __UINT_LEAST64_FMTu__ "lu"
#define __UINT_LEAST64_FMTx__ "lx"
#define __UINT_LEAST64_MAX__ 18446744073709551615UL
#define __UINT_LEAST64_TYPE__ long unsigned int
#define __UINT_LEAST8_FMTX__ "hhX"
#define __UINT_LEAST8_FMTo__ "hho"
#define __UINT_LEAST8_FMTu__ "hhu"
#define __UINT_LEAST8_FMTx__ "hhx"
#define __UINT_LEAST8_MAX__ 255
#define __UINT_LEAST8_TYPE__ unsigned char
#define __USER_LABEL_PREFIX__ 
#define __VERSION__ "Clang 16.0.0"
#define __WCHAR_MAX__ 2147483647
#define __WCHAR_TYPE__ int
#define __WCHAR_WIDTH__ 32
#define __WINT_MAX__ 4294967295U
#define __WINT_TYPE__ unsigned int
#define __WINT_UNSIGNED__ 1
#define __WINT_WIDTH__ 32
#define __amd64 1
#define __amd64__ 1
#define __clang__ 1
#define __clang_literal_encoding__ "UTF-8"
#define __clang_major__ 16
#define __clang_minor__ 0
#define __clang_patchlevel__ 0
#define __clang_version__ "16.0.0 "
#define __clang_wide_literal_encoding__ "UTF-32"
#define __code_model_small__ 1
#define __gnu_linux__ 1
#define __k8 1
#define __k8__ 1
#define __linux 1
#define __linux__ 1
#define __llvm__ 1
#define __seg_fs __attribute__((address_space(257)))
#define __seg_gs __attribute__((address_space(256)))
#define __tune_k8__ 1
#define __unix 1
#define __unix__ 1
#define __x86_64 1
#define __x86_64__ 1
#define linux 1
#define unix 1

@WeiqunZhang
Member Author

OK. We should be able to use __INTEL_LLVM_COMPILER to detect Intel's oneAPI compiler.
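
A minimal sketch of such a check, not taken from this PR or from AMReX: `demo_compiler_family` is a hypothetical helper, and the returned strings are arbitrary labels. It assumes, consistent with the macro list above, that the open-source DPC++ build defines `__clang__` but not `__INTEL_LLVM_COMPILER`.

```cpp
#include <string>

// Report which compiler family the predefined macros point to.
// The Intel check comes first because Intel's oneAPI compiler is
// clang-based and also defines __clang__.
std::string demo_compiler_family() {
#if defined(__INTEL_LLVM_COMPILER)
    return "intel-oneapi";
#elif defined(__clang__)
    return "clang";  // would include the open-source DPC++ build discussed here
#elif defined(__GNUC__)
    return "gcc";
#else
    return "unknown";
#endif
}
```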

@nmnobre
Contributor

nmnobre commented Nov 11, 2022

Indeed. But honestly, you are too kind, I don't want you to clutter the code if this is something I'd be the sole user of.
We could wait for 2023.1 (is that the next version?) and see if they include sycl/sycl.hpp, at which point it's a trivial patch for us.

@WeiqunZhang
Member Author

Thank you for your contribution! I will add a commit to #3024. Both CL/sycl.hpp and sycl/sycl.hpp work in the latest Intel compiler. When I get time, I will try to set up a CI with hipSYCL.
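
One way to prefer the standard header while still accepting the older path is `__has_include` (available since C++17). The sketch below is an assumption about how such a selection could look, not the change made in #3024; `DEMO_SYCL_HEADER` and `demo_sycl_header` are hypothetical names, and the code only reports which header it would pick instead of including it, so it also compiles without any SYCL installation.

```cpp
#include <string>

// Prefer the standard SYCL 2020 header location; fall back to the
// older CL/ path if only that one is present.
#if defined(__has_include)
#  if __has_include(<sycl/sycl.hpp>)
#    define DEMO_SYCL_HEADER "sycl/sycl.hpp"
#  elif __has_include(<CL/sycl.hpp>)
#    define DEMO_SYCL_HEADER "CL/sycl.hpp"
#  endif
#endif
#ifndef DEMO_SYCL_HEADER
#  define DEMO_SYCL_HEADER ""  // no SYCL header found on the include path
#endif

// Returns the header path that would be selected, or an empty string.
std::string demo_sycl_header() { return DEMO_SYCL_HEADER; }
```

In real code the same conditions would guard the corresponding `#include` directives.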
