
Conversation

@prathi-wind (Collaborator) commented Nov 10, 2025

User description

Description

Add OpenMP support to MFC on the Cray Compiler. Passes all test cases and adds CI to test on Frontier.

Also adds the mixed precision mode that was developed for Gordon Bell.

Fixes #(issue) [optional]

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Something else

Scope

  • This PR comprises a set of related changes with a common goal

If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration

  • Test A
  • Test B

Test Configuration:

  • What computers and compilers did you use to test this:

Checklist

  • I have added comments for the new code
  • I added Doxygen docstrings to the new code
  • I have made corresponding changes to the documentation (docs/)
  • I have added regression tests to the test suite so that people can verify in the future that the feature is behaving as expected
  • I have added example cases in examples/ that demonstrate my new feature performing as expected.
    They run to completion and demonstrate "interesting physics"
  • I ran ./mfc.sh format before committing my code
  • New and existing tests pass locally with my changes, including with GPU capability enabled (both NVIDIA hardware with NVHPC compilers and AMD hardware with CRAY compilers) and disabled
  • This PR does not introduce any repeated code (it follows the DRY principle)
  • I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any code source files (anything in src/simulation)

To make sure the code is performing as expected on GPU devices, I have:

  • Checked that the code compiles using NVHPC compilers
  • Checked that the code compiles using CRAY compilers
  • Ran the code on either V100, A100, or H100 GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Ran the code on MI200+ GPUs and ensured the new features performed as expected (the GPU results match the CPU results)
  • Enclosed the new feature via nvtx ranges so that they can be identified in profiles
  • Ran a Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
  • Ran a Rocprof Systems profile using ./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace, and have attached the output file and plain text results to this PR.
  • Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature

PR Type

Enhancement, Bug fix


Description

  • Add OpenMP support to MFC on the Cray Compiler with comprehensive case optimization

  • Implement mixed precision support throughout the codebase for improved performance and memory efficiency

  • Refactor boundary condition arrays from negative indexing (-1:1) to positive indexing (1:2) for optimization

  • Add case optimization guards to skip unnecessary 3D computations in 2D simulations (WENO, viscous, MHD, capillary)

  • Introduce simplex noise generation module for procedural perturbation support

  • Fix GPU compatibility issues in chemistry and FFT modules for both NVIDIA and AMD compilers

  • Update MPI operations to support mixed precision with proper type conversions and 64-bit indexing

  • Refactor bubble Eulerian projection data structure for improved memory layout

  • Fix MUSCL workspace array naming conflicts and GPU parallel loop syntax issues

  • Add CI testing on Frontier supercomputer for Cray Compiler validation


Diagram Walkthrough

flowchart LR
  A["Cray Compiler<br/>OpenMP Support"] --> B["Case Optimization<br/>2D/3D Guards"]
  A --> C["Mixed Precision<br/>wp/stp Types"]
  B --> D["WENO, Viscous,<br/>MHD Modules"]
  C --> E["Boundary Arrays<br/>MPI Operations"]
  C --> F["Bubble EL,<br/>QBMM Modules"]
  G["GPU Compatibility<br/>Fixes"] --> H["Chemistry<br/>FFT Modules"]
  I["Simplex Noise<br/>Module"] --> J["Perturbation<br/>Support"]
  D --> K["Performance<br/>Optimization"]
  E --> K
  H --> K

File Walkthrough

Relevant files
Enhancement (17 files)
m_boundary_common.fpp
Refactor boundary arrays and add case optimization support

src/common/m_boundary_common.fpp

  • Changed boundary condition array indexing from (-1:1) to (1:2) to
    support case optimization
  • Updated all references from negative indices (-1, 1) to positive
    indices (1, 2) throughout the module
  • Added conditional compilation directives #:if not
    MFC_CASE_OPTIMIZATION or num_dims > 2 to skip 3D boundary operations
    in optimized 2D cases
  • Changed data type from real(wp) to real(stp) for mixed precision
    support in buffer parameters
  • Updated MPI I/O operations to handle mixed precision with MPI_BYTE and
    element count calculations
+249/-231
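
To make the indexing change concrete, here is a minimal sketch (not lifted from the diff; the integer_field type with an %sf member is an assumption, while the guard and the bc_z assignments echo the snippets quoted later in this page):

! Second index is now 1 (beginning face) / 2 (ending face) instead of -1 / 1.
type(integer_field), dimension(1:3, 1:2) :: bc_type      ! previously dimension(1:3, -1:1)

! Buffers use the storage precision kind in mixed precision builds.
real(stp), dimension(:), allocatable :: buff_send        ! previously real(wp)

! 3D boundary handling is compiled out of case-optimized 2D builds.
#:if not MFC_CASE_OPTIMIZATION or num_dims > 2
    if (p > 0) then
        bc_type(3, 1)%sf(:, :, :) = int(min(bc_z%beg, 0), kind=1)
        bc_type(3, 2)%sf(:, :, :) = int(min(bc_z%end, 0), kind=1)
    end if
#:endif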
m_weno.fpp
Add case optimization guards to WENO reconstruction schemes

src/simulation/m_weno.fpp

  • Added #:include 'case.fpp' directive for case optimization support
  • Wrapped WENO order 5 and 7 reconstruction code with conditional
    compilation #:if not MFC_CASE_OPTIMIZATION or weno_num_stencils > N
  • Nested TENO stencil calculations within additional optimization guards
    for higher-order schemes
  • Maintained existing WENO algorithm logic while enabling compile-time
    optimization for reduced stencil cases
+286/-274
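
A hedged sketch of the guard pattern described above; the stencil-count threshold shown is illustrative, since the walkthrough only states that weno_num_stencils is compared against a scheme-dependent N:

#:include 'case.fpp'

#:if not MFC_CASE_OPTIMIZATION or weno_num_stencils > 2
    ! Higher-order (WENO5/WENO7/TENO) stencil work is only compiled when the
    ! case-optimized build actually requires the larger stencil count.
    ! ... reconstruction for the wider stencils ...
#:endif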
m_bubbles_EL.fpp
Refactor bubble Eulerian projection data structure for mixed precision

src/simulation/m_bubbles_EL.fpp

  • Changed q_beta from type(vector_field) to type(scalar_field),
    dimension(:), allocatable for better memory layout
  • Updated allocation from q_beta%vf(1:q_beta_idx) to
    q_beta(1:q_beta_idx) with individual scalar field setup
  • Changed data type from real(wp) to real(stp) for mixed precision
    support in gradient calculations
  • Updated s_gradient_dir subroutine signature to accept raw arrays
    instead of scalar field types
  • Updated all references from q_beta%vf(i)%sf to q_beta(i)%sf throughout
    the module
+50/-48 
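
A minimal before/after sketch of the q_beta restructuring (the @:ALLOCATE macro and the allocation bounds are assumed MFC conventions, not copied from this diff):

! Before: one vector_field wrapping all projection components
type(vector_field) :: q_beta
@:ALLOCATE(q_beta%vf(1:q_beta_idx))

! After: a flat array of scalar fields, each set up individually
type(scalar_field), dimension(:), allocatable :: q_beta
allocate (q_beta(1:q_beta_idx))
do i = 1, q_beta_idx
    @:ALLOCATE(q_beta(i)%sf(0:m, 0:n, 0:p))
end do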
m_boundary_conditions.fpp
Update boundary condition array indexing to positive indices

src/pre_process/m_boundary_conditions.fpp

  • Updated boundary condition array indexing from (-1:1) to (1:2) in
    function signatures
  • Added conditional logic to map old negative indices (-1) to new
    positive index (1) and (1) to (2)
  • Updated all bc_type array accesses to use new indexing scheme with
    explicit if-statements for location mapping
+20/-12 
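
The location mapping described above, sketched with illustrative names (loc is the legacy -1/1 location argument and patch_bc stands in for the boundary identifier being stored):

! Legacy callers pass loc = -1 (beginning) or loc = 1 (end);
! the array itself is now indexed 1:2.
if (loc == -1) then
    face = 1
else
    face = 2
end if
bc_type(dir, face)%sf(:, :, :) = int(patch_bc, kind=1)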
m_viscous.fpp
Add case optimization conditionals to viscous module         

src/simulation/m_viscous.fpp

  • Added case.fpp include directive for case optimization support
  • Wrapped viscous stress tensor computation blocks with
    MFC_CASE_OPTIMIZATION conditional to skip 3D-specific calculations in
    2D cases
  • Reorganized gradient computation loops with conditional compilation
    for dimensional optimization
  • Improved code indentation and structure for better readability
+319/-316
m_riemann_solvers.fpp
Optimize MHD calculations and improve GPU parallelization

src/simulation/m_riemann_solvers.fpp

  • Added case.fpp include for case optimization support
  • Wrapped MHD-related 3D vector operations with MFC_CASE_OPTIMIZATION
    conditionals to optimize 2D cases
  • Expanded private variable lists in GPU parallel loops for better
    memory management
  • Fixed duplicate variable assignments and improved variable
    initialization in HLLC solver
  • Added explicit type conversions for mixed precision support
+78/-35 
m_mpi_common.fpp
Support mixed precision and dynamic dimensionality in MPI

src/common/m_mpi_common.fpp

  • Added case.fpp include directive for case optimization
  • Changed halo_size from integer to integer(kind=8) for 64-bit support
  • Fixed dimension indexing to use num_dims instead of hardcoded value 3
    for 2D/3D compatibility
  • Added explicit type conversions between wp and stp precision kinds in
    MPI buffer operations
+23/-23 
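
A sketch of the two changes described above; the buffer and field names are illustrative, and only halo_size, wp, and stp are taken from the walkthrough:

integer(kind=8) :: halo_size   ! was a default integer; 64-bit avoids overflow for large halos

! Pack: convert from working precision (wp) to the storage precision (stp)
! carried by the MPI buffers.
buff_send(r) = real(q_comm(i)%sf(j, k, l), kind=stp)

! ... neighbor exchange ...

! Unpack: convert back to working precision.
q_comm(i)%sf(j, k, l) = real(buff_recv(r), kind=wp)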
m_cbc.fpp
Optimize CBC module with case-specific conditionals           

src/simulation/m_cbc.fpp

  • Added case.fpp include for case optimization support
  • Moved dpres_ds variable declaration from module level to local scope
    within subroutine
  • Added dpres_ds to private variable list in GPU parallel loop
  • Wrapped dpi_inf_dt assignment with MFC_CASE_OPTIMIZATION conditional
    for single-fluid optimization
  • Improved GPU update directive formatting
+7/-8     
m_global_parameters.fpp
Add simplex noise perturbation parameters                               

src/pre_process/m_global_parameters.fpp

  • Added new simplex_perturb logical flag for simplex noise perturbation
    support
  • Added simplex_params derived type variable to store simplex noise
    parameters
  • Initialized simplex perturbation parameters in module initialization
    routine
+12/-0   
m_qbmm.fpp
Mixed precision support and case optimization for QBMM     

src/simulation/m_qbmm.fpp

  • Changed pb parameter type from real(wp) to real(stp) for mixed
    precision support
  • Wrapped bubble model coefficient calculations with preprocessor
    conditionals for case optimization
  • Added missing loop variables i1, i2, j to GPU parallel loop private
    list
  • Added TODO comment regarding rhs_pb and rhs_mv precision types
+98/-87 
m_simplex_noise.fpp
New simplex noise generation module                                           

src/pre_process/m_simplex_noise.fpp

  • New module implementing 3D and 2D Perlin simplex noise functions
  • Includes gradient lookup tables and permutation vectors for noise
    generation
  • Provides f_simplex3d and f_simplex2d functions for procedural noise
+245/-0 
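
A hypothetical usage sketch only; the actual argument lists of f_simplex2d and f_simplex3d are not shown on this page, so the calls and scaling below are assumptions:

use m_simplex_noise, only: f_simplex2d, f_simplex3d

real(wp) :: amp, freq, noise

! Perturb a primitive field with procedural noise evaluated at a cell centre.
noise = f_simplex3d(freq*x_cc(i), freq*y_cc(j), freq*z_cc(k))
q_prim_vf(mom_idx%beg)%sf(i, j, k) = q_prim_vf(mom_idx%beg)%sf(i, j, k) + amp*noise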
m_fftw.fpp
FFT filter GPU implementation refactoring                               

src/simulation/m_fftw.fpp

  • Refactored GPU FFT filter implementation to use direct device
    addresses instead of pointers
  • Removed #:block UNDEF_CCE wrapper and simplified GPU data management
  • Restructured loop logic to process initial ring separately before main
    loop
  • Improved compatibility with both NVIDIA and AMD GPU compilers
+115/-130
m_start_up.fpp
Mixed precision I/O and time stepping improvements             

src/simulation/m_start_up.fpp

  • Changed WP_MOK from 8 bytes to 4 bytes for mixed precision I/O
  • Updated MPI file read calls to use mpi_io_type multiplier for data
    size
  • Added logic to copy q_cons_ts(1) to q_cons_ts(2) for multi-stage time
    steppers
  • Fixed NaN checking to use explicit real() conversion for mixed
    precision
  • Added GPU updates for bubble and chemistry parameters
  • Removed unused GPU declare directive for q_cons_temp
+88/-54 
m_rhs.fpp
Mixed precision and GPU compatibility in RHS module           

src/simulation/m_rhs.fpp

  • Added conditional GPU declare directives for OpenACC-specific
    declarations
  • Changed loop variable types to integer(kind=8) for large array
    indexing
  • Wrapped flux allocation logic with if (.not. igr) condition
  • Updated bc_type parameter dimension from 1:3, -1:1 to 1:3, 1:2
  • Changed pb_in, mv_in parameters to real(stp) type for mixed precision
  • Added TODO comments about precision type consistency
+57/-45 
m_icpp_patches.fpp
Mixed precision support for patch identification arrays   

src/pre_process/m_icpp_patches.fpp

  • Added conditional compilation for patch_id_fp array type based on
    MFC_MIXED_PRECISION flag
  • Uses integer(kind=1) for mixed precision mode, standard integer
    otherwise
  • Applied consistently across all patch subroutines for memory
    efficiency
+75/-4   
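
The conditional declaration described above, sketched with assumed array bounds and intent:

#:if MFC_MIXED_PRECISION
    integer(kind=1), dimension(0:m, 0:n, 0:p), intent(inout) :: patch_id_fp
#:else
    integer, dimension(0:m, 0:n, 0:p), intent(inout) :: patch_id_fp
#:endif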
m_ibm.fpp
Mixed precision support for IBM module                                     

src/simulation/m_ibm.fpp

  • Changed pb_in and mv_in parameters to real(stp) type for mixed
    precision
  • Added missing variables to GPU parallel loop private list
  • Added type casting for pb_in and mv_in in interpolation calls
  • Changed levelset_in access to use explicit real() conversion
  • Updated pb_in, mv_in intent from INOUT to IN in interpolation
    subroutine
+7/-5     
inline_capillary.fpp
Case optimization for capillary tensor calculations           

src/simulation/include/inline_capillary.fpp

  • Wrapped 3D capillary tensor calculations with preprocessor conditional
  • Only computes 3D components when not using case optimization or when
    num_dims > 2
  • Reduces unnecessary computations for 2D simulations
+7/-6     
Bug fix (2 files)
m_muscl.fpp
Rename MUSCL arrays and fix GPU loop syntax                           

src/simulation/m_muscl.fpp

  • Renamed MUSCL workspace arrays from v_rs_ws_x/y/z to
    v_rs_ws_x/y/z_muscl to avoid naming conflicts
  • Fixed GPU parallel loop syntax and indentation consistency
  • Corrected #:endcall directives to include full macro name for clarity
  • Improved code formatting and nested subroutine indentation
+158/-158
m_chemistry.fpp
GPU compatibility fixes for chemistry module                         

src/common/m_chemistry.fpp

  • Removed sequential GPU loop directives that were causing compilation
    issues
  • Added temporary variable T_in for type conversion in temperature
    calculations
  • Wrapped large GPU parallel loop with #:block UNDEF_AMD to handle AMD
    compiler differences
  • Added missing variables to GPU parallel loop private and copyin lists
+113/-112
Additional files (48 files)
bench.yml +8/-0     
submit-bench.sh +1/-1     
submit-bench.sh +1/-1     
test.yml +0/-4     
CMakeLists.txt +18/-7   
case.py +104/-0 
load_amd.sh +7/-0     
setupNB.sh +5/-0     
3dHardcodedIC.fpp +113/-2 
macros.fpp +5/-4     
omp_macros.fpp +25/-1   
m_derived_types.fpp +26/-9   
m_helper.fpp +6/-6     
m_precision_select.f90 +14/-0   
m_variables_conversion.fpp +10/-34 
m_data_input.f90 +11/-10 
m_assign_variables.fpp +18/-4   
m_checker.fpp +23/-0   
m_data_output.fpp +32/-34 
m_initial_condition.fpp +35/-14 
m_mpi_proxy.fpp +29/-2   
m_perturbation.fpp +90/-1   
m_start_up.fpp +3/-2     
m_acoustic_src.fpp +8/-11   
m_body_forces.fpp +1/-1     
m_bubbles_EE.fpp +4/-4     
m_bubbles_EL_kernels.fpp +15/-15 
m_data_output.fpp +25/-26 
m_derived_variables.fpp +15/-13 
m_global_parameters.fpp +6/-11   
m_hyperelastic.fpp +1/-1     
m_hypoelastic.fpp +4/-4     
m_igr.fpp +1616/-1385
m_mhd.fpp +1/-1     
m_sim_helpers.fpp +31/-26 
m_surface_tension.fpp +38/-34 
m_time_steppers.fpp +86/-37 
p_main.fpp +2/-0     
golden-metadata.txt +154/-0 
golden.txt +10/-0   
build.py +2/-1     
lock.py +1/-1     
case_dicts.py +16/-0   
input.py +1/-1     
state.py +3/-3     
modules +4/-0     
pyproject.toml +3/-2     
frontier.mako +11/-10 

prathi-wind and others added 30 commits July 22, 2025 13:16
…, started work on enter and exit data, compiles
…e, add mappers to derived types, change how allocate is done
…types, removed rest of pure functions, fix issue with acoustic on nvfortran
Copilot finished reviewing on behalf of sbryngelson November 12, 2025 17:21
Comment on lines +1944 to 1959
bc_type(1, 1)%sf(:, :, :) = int(min(bc_x%beg, 0), kind=1)
bc_type(1, 2)%sf(:, :, :) = int(min(bc_x%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(1,1)%sf,bc_type(1,2)%sf]')

if (n > 0) then
bc_type(2, -1)%sf(:, :, :) = bc_y%beg
bc_type(2, 1)%sf(:, :, :) = bc_y%end
$:GPU_UPDATE(device='[bc_type(2,-1)%sf,bc_type(2,1)%sf]')

if (p > 0) then
bc_type(3, -1)%sf(:, :, :) = bc_z%beg
bc_type(3, 1)%sf(:, :, :) = bc_z%end
$:GPU_UPDATE(device='[bc_type(3,-1)%sf,bc_type(3,1)%sf]')
end if
bc_type(2, 1)%sf(:, :, :) = int(min(bc_y%beg, 0), kind=1)
bc_type(2, 2)%sf(:, :, :) = int(min(bc_y%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(2,1)%sf,bc_type(2,2)%sf]')
#:if not MFC_CASE_OPTIMIZATION or num_dims > 2
if (p > 0) then
bc_type(3, 1)%sf(:, :, :) = int(min(bc_z%beg, 0), kind=1)
bc_type(3, 2)%sf(:, :, :) = int(min(bc_z%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(3,1)%sf,bc_type(3,2)%sf]')
end if
#:endif
end if
Contributor
Suggestion: Correct the default boundary condition assignment by removing the min(..., 0) logic, which incorrectly handles MPI processor ranks. The previous logic of direct assignment from bc_...%... should be restored with a type cast. [possible issue, importance: 10]

Suggested change
bc_type(1, 1)%sf(:, :, :) = int(min(bc_x%beg, 0), kind=1)
bc_type(1, 2)%sf(:, :, :) = int(min(bc_x%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(1,1)%sf,bc_type(1,2)%sf]')
if (n > 0) then
bc_type(2, -1)%sf(:, :, :) = bc_y%beg
bc_type(2, 1)%sf(:, :, :) = bc_y%end
$:GPU_UPDATE(device='[bc_type(2,-1)%sf,bc_type(2,1)%sf]')
if (p > 0) then
bc_type(3, -1)%sf(:, :, :) = bc_z%beg
bc_type(3, 1)%sf(:, :, :) = bc_z%end
$:GPU_UPDATE(device='[bc_type(3,-1)%sf,bc_type(3,1)%sf]')
end if
bc_type(2, 1)%sf(:, :, :) = int(min(bc_y%beg, 0), kind=1)
bc_type(2, 2)%sf(:, :, :) = int(min(bc_y%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(2,1)%sf,bc_type(2,2)%sf]')
#:if not MFC_CASE_OPTIMIZATION or num_dims > 2
if (p > 0) then
bc_type(3, 1)%sf(:, :, :) = int(min(bc_z%beg, 0), kind=1)
bc_type(3, 2)%sf(:, :, :) = int(min(bc_z%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(3,1)%sf,bc_type(3,2)%sf]')
end if
#:endif
end if
bc_type(1, 1)%sf(:, :, :) = int(bc_x%beg, kind=1)
bc_type(1, 2)%sf(:, :, :) = int(bc_x%end, kind=1)
$:GPU_UPDATE(device='[bc_type(1,1)%sf,bc_type(1,2)%sf]')
if (n > 0) then
bc_type(2, 1)%sf(:, :, :) = int(bc_y%beg, kind=1)
bc_type(2, 2)%sf(:, :, :) = int(bc_y%end, kind=1)
$:GPU_UPDATE(device='[bc_type(2,1)%sf,bc_type(2,2)%sf]')
#:if not MFC_CASE_OPTIMIZATION or num_dims > 2
if (p > 0) then
bc_type(3, 1)%sf(:, :, :) = int(bc_z%beg, kind=1)
bc_type(3, 2)%sf(:, :, :) = int(bc_z%end, kind=1)
$:GPU_UPDATE(device='[bc_type(3,1)%sf,bc_type(3,2)%sf]')
end if
#:endif
end if

Collaborator
I believe this suggestion is irrelevant. bc_type is only used for negative values of bc_x[y,z]%beg[end]. If bc_x[y,z]%beg[end] is positive, MPI communication is used instead and bc_type is never touched.

Comment on lines +311 to +519
if (shear_stress) then ! Shear stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do

if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
alpha_visc_sum = 0._wp

if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do

alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)

end if

if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)

if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real

if (Re_size(i) > 0) Re_visc(i) = 0._wp
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do

Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)

end do
end if
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

alpha_visc_sum = 0._wp
tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(1)

if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do
tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
q_prim_vf(momxe)%sf(j, k, l))/ &
y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
Re_visc(1)

alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
$:GPU_LOOP(parallelism='[seq]')
do i = 2, 3
tau_Re_vf(contxe + i)%sf(j, k, l) = &
tau_Re_vf(contxe + i)%sf(j, k, l) - &
tau_Re(2, i)

tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
end do

end if
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if

if (bulk_stress) then ! Bulk stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do

if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real
if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

if (Re_size(i) > 0) Re_visc(i) = 0._wp
if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)

end do
end if
end if

tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(1)

tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
q_prim_vf(momxe)%sf(j, k, l))/ &
y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
Re_visc(1)

$:GPU_LOOP(parallelism='[seq]')
do i = 2, 3
tau_Re_vf(contxe + i)%sf(j, k, l) = &
tau_Re_vf(contxe + i)%sf(j, k, l) - &
tau_Re(2, i)

tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
end do

end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
alpha_visc_sum = 0._wp

if (bulk_stress) then ! Bulk stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)

if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
end if

if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

alpha_visc_sum = 0._wp

if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do

alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)

end if

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do

if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real

if (Re_size(i) > 0) Re_visc(i) = 0._wp
if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do
do i = 1, 2
Re_visc(i) = dflt_real

if (Re_size(i) > 0) Re_visc(i) = 0._wp
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do

Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)

end do
end do
end if
end if
end if

tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(2)
tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(2)

tau_Re_vf(momxb + 1)%sf(j, k, l) = &
tau_Re_vf(momxb + 1)%sf(j, k, l) - &
tau_Re(2, 2)
tau_Re_vf(momxb + 1)%sf(j, k, l) = &
tau_Re_vf(momxb + 1)%sf(j, k, l) - &
tau_Re(2, 2)

tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)

end do
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
#:endcall GPU_PARALLEL_LOOP
end if
Contributor
Suggestion: Refactor the duplicated logic within the shear_stress and bulk_stress blocks into a separate subroutine to improve code maintainability. [general, importance: 7]

Suggested change
if (shear_stress) then ! Shear stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
alpha_visc_sum = 0._wp
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do
alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
end if
if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real
if (Re_size(i) > 0) Re_visc(i) = 0._wp
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do
Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
end do
end if
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
alpha_visc_sum = 0._wp
tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(1)
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do
tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
q_prim_vf(momxe)%sf(j, k, l))/ &
y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
Re_visc(1)
alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
$:GPU_LOOP(parallelism='[seq]')
do i = 2, 3
tau_Re_vf(contxe + i)%sf(j, k, l) = &
tau_Re_vf(contxe + i)%sf(j, k, l) - &
tau_Re(2, i)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
end do
end if
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
if (bulk_stress) then ! Bulk stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real
if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
if (Re_size(i) > 0) Re_visc(i) = 0._wp
if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
end do
end if
end if
tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(1)
tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
q_prim_vf(momxe)%sf(j, k, l))/ &
y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
Re_visc(1)
$:GPU_LOOP(parallelism='[seq]')
do i = 2, 3
tau_Re_vf(contxe + i)%sf(j, k, l) = &
tau_Re_vf(contxe + i)%sf(j, k, l) - &
tau_Re(2, i)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
end do
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
alpha_visc_sum = 0._wp
if (bulk_stress) then ! Bulk stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
end if
if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
alpha_visc_sum = 0._wp
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do
alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
end if
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real
if (Re_size(i) > 0) Re_visc(i) = 0._wp
if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do
do i = 1, 2
Re_visc(i) = dflt_real
if (Re_size(i) > 0) Re_visc(i) = 0._wp
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do
Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
end do
end do
end if
end if
end if
tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(2)
tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(2)
tau_Re_vf(momxb + 1)%sf(j, k, l) = &
tau_Re_vf(momxb + 1)%sf(j, k, l) - &
tau_Re(2, 2)
tau_Re_vf(momxb + 1)%sf(j, k, l) = &
tau_Re_vf(momxb + 1)%sf(j, k, l) - &
tau_Re(2, 2)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)
end do
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
#:endcall GPU_PARALLEL_LOOP
end if
if (shear_stress) then ! Shear stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re, rho_visc, gamma_visc, pi_inf_visc, alpha_visc_sum]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
call s_compute_common_stress_terms(j, k, l, alpha_rho_visc, alpha_visc, rho_visc, gamma_visc, pi_inf_visc, Re_visc)
tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(1)
tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
q_prim_vf(momxe)%sf(j, k, l))/ &
y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
Re_visc(1)
$:GPU_LOOP(parallelism='[seq]')
do i = 2, 3
tau_Re_vf(contxe + i)%sf(j, k, l) = &
tau_Re_vf(contxe + i)%sf(j, k, l) - &
tau_Re(2, i)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
end do
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
if (bulk_stress) then ! Bulk stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re, rho_visc, gamma_visc, pi_inf_visc, alpha_visc_sum]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
call s_compute_common_stress_terms(j, k, l, alpha_rho_visc, alpha_visc, rho_visc, gamma_visc, pi_inf_visc, Re_visc)
tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(2)
tau_Re_vf(momxb + 1)%sf(j, k, l) = &
tau_Re_vf(momxb + 1)%sf(j, k, l) - &
tau_Re(2, 2)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds OpenMP support for the Cray Compiler Environment (CCE) and introduces mixed precision functionality developed for Gordon Bell submissions. The changes span toolchain configuration, simulation code optimization, and new simplex noise perturbation capabilities.

Key Changes:

  • Added mixed precision support with a new stp (storage precision) type alongside the existing wp (working precision) type
  • Implemented case-specific optimizations that conditionally compile code based on case parameters fixed at build time
  • Updated MPI I/O operations to handle mixed precision data types
  • Added simplex noise module for initial condition perturbations
  • Enhanced Frontier HPC template with better resource management (sbcast for binaries, nvme storage)
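
As a rough illustration of how the two kinds can coexist (the real contents of src/common/m_precision_select.f90 are not shown on this page, so the kind choices and the preprocessor guard below are assumptions):

module m_precision_select
    use iso_fortran_env, only: real32, real64
    implicit none

    integer, parameter :: wp = real64      ! working precision for arithmetic
#ifdef MFC_MIXED_PRECISION
    integer, parameter :: stp = real32     ! storage precision for large arrays and buffers
#else
    integer, parameter :: stp = wp
#endif
end module m_precision_select

Under such a scheme, real(stp) state arrays cut memory traffic while real(wp) temporaries keep full precision inside kernels, which is consistent with the wp-to-stp conversions mentioned for the MPI buffers and I/O paths elsewhere in this PR.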

Reviewed Changes

Copilot reviewed 65 out of 67 changed files in this pull request and generated 1 comment.

Summary per file
File Description
toolchain/templates/frontier.mako Added nvme constraint, improved binary broadcast with sbcast, streamlined profiler placement
toolchain/pyproject.toml Updated pyrometheus dependency to development branch for OpenMP testing
toolchain/modules Duplicate entry for CSCS Santis configuration (lines 92-95)
toolchain/mfc/state.py Added mixed precision flag to MFCConfig, improved code formatting
toolchain/mfc/run/input.py Updated real_type selection logic to include mixed precision mode
toolchain/mfc/run/case_dicts.py Added simplex_params parameter definitions for density/velocity perturbations
toolchain/mfc/lock.py Incremented lock version to 8
toolchain/mfc/build.py Added MFC_MIXED_PRECISION CMake flag
tests/43B5FEBD/golden-metadata.txt New golden metadata file documenting test configuration
src/simulation/p_main.fpp Added unused status variable declaration
src/simulation/m_weno.fpp Added case optimization guards to conditionally compile WENO stencil code
src/simulation/m_viscous.fpp Wrapped 3D viscous stress calculations in case optimization guards
src/simulation/m_time_steppers.fpp Converted to mixed precision (stp) for time-stepping arrays, reorganized probe storage
src/simulation/m_surface_tension.fpp Added explicit type conversion for sqrt argument, wrapped 3D code
src/simulation/m_start_up.fpp Changed MPI I/O word size to 4 bytes, extensive mixed precision conversions
src/simulation/m_sim_helpers.fpp Wrapped 3D CFL calculations in case optimization guards
src/simulation/m_riemann_solvers.fpp Expanded private variable lists in parallel loops for thread safety
src/simulation/m_rhs.fpp Changed loop iterators to 8-byte integers, wrapped OpenACC directives
src/simulation/m_qbmm.fpp Added case optimization for bubble model coefficient calculations
src/simulation/m_muscl.fpp Renamed reconstruction workspace arrays with _muscl suffix, fixed indentation
src/simulation/m_mhd.fpp Added missing private variables to parallel loop
src/simulation/m_ibm.fpp Converted ghost point routines to use mixed precision
src/simulation/m_hypoelastic.fpp Added private variables to GPU loops
src/simulation/m_hyperelastic.fpp Added missing private variable to parallel loop
src/simulation/m_global_parameters.fpp Cleaned up GPU data transfer directives
src/simulation/m_fftw.fpp Refactored FFT filtering for improved device management
src/simulation/m_derived_variables.fpp Updated time-stepping array references
src/simulation/m_data_output.fpp Renamed downsampling temporary arrays, updated MPI I/O types
src/simulation/m_cbc.fpp Made dpres_ds a local variable, wrapped case-specific code
src/simulation/m_bubbles_EL_kernels.fpp Converted to mixed precision with explicit type conversions
src/simulation/m_bubbles_EL.fpp Changed q_beta from vector_field to scalar_field array
src/simulation/m_bubbles_EE.fpp Added private variables and explicit type conversions
src/simulation/m_body_forces.fpp Removed unnecessary parentheses
src/simulation/m_acoustic_src.fpp Updated loop structure and allocation
src/simulation/include/inline_capillary.fpp Wrapped 3D capillary stress code in optimization guards
src/pre_process/m_start_up.fpp Added simplex_perturb to namelist, added finalization call
src/pre_process/m_simplex_noise.fpp New module implementing 2D/3D simplex noise for perturbations
src/pre_process/m_perturbation.fpp Added simplex noise perturbation routine
src/pre_process/m_mpi_proxy.fpp Added MPI broadcast for simplex parameters
src/pre_process/m_initial_condition.fpp Updated bc_type indexing, added simplex perturbation call
src/pre_process/m_global_parameters.fpp Added simplex_perturb parameters
setupNB.sh New setup script for NVHPC compilers on specific system

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


@cursor bot left a comment

This PR is being reviewed by Cursor Bugbot


Bug: Array Index Mismatch Breaks Deallocation

The deallocation code attempts to deallocate bc_type(1, -1)%sf, bc_type(2, -1)%sf, and bc_type(3, -1)%sf, but the bc_type array dimension was changed from (-1:1) to (1:2) throughout the codebase. The second dimension no longer has index -1, causing an out-of-bounds access. The indices should be 1 and 2 instead of -1 and 1.

src/post_process/m_data_input.f90#L671-L676

deallocate (bc_type(1, -1)%sf, bc_type(1, 1)%sf)
if (n > 0) then
deallocate (bc_type(2, -1)%sf, bc_type(2, 1)%sf)
if (p > 0) then
deallocate (bc_type(3, -1)%sf, bc_type(3, 1)%sf)



Collaborator
This file needs to be deleted, moved to misc/, or incorporated into the toolchain somehow. This PR doesn't add AMD compilers anyway, so my suggestion would be to delete it for now.

"pyrometheus == 1.0.5",

#"pyrometheus == 1.0.5",
"pyrometheus @ git+https://github.com/wilfonba/pyrometheus-wilfong.git@OpenMPTest",
Collaborator
I need to merge this with the Pyro upstream...

Collaborator
The relevant PR has been merged. Just waiting on an official release now

Member
Is the release done now? If not, message Esteban.

@wilfonba (Collaborator) left a comment

Approve for benchmark

@wilfonba (Collaborator) commented Nov 13, 2025

I addressed a bunch of my own comments and pushed the changes. Hopefully the weird Phoenix test failure goes away; it didn't look like an MFC code problem. The rest of the comments are more relevant to general support of half precision, so I don't think they're very urgent at the moment.

@wilfonba wilfonba self-requested a review November 13, 2025 22:54
@wilfonba (Collaborator) left a comment

Approve for benchmark

call s_symmetry(q_prim_vf, 3, 1, k, l, pb_in, mv_in)
case (BC_PERIODIC)
call s_periodic(q_prim_vf, 3, 1, k, l, pb_in, mv_in)
case (BC_SlIP_WALL)
Member
typo (though it already existed)

@anandrdbz anandrdbz merged commit 7169711 into MFlowCode:master Nov 14, 2025
44 of 47 checks passed
