
Conversation

@prathi-wind (Collaborator) commented Nov 10, 2025

User description

Description

Add OpenMP support to MFC on the Cray Compiler. Passes all test cases and adds CI to test on Frontier.

Also adds the mixed precision mode that was developed for Gordon Bell.

Fixes #(issue) [optional]

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Something else

Scope

  • This PR comprises a set of related changes with a common goal

If you cannot check the above box, please split your PR into multiple PRs that each have a common goal.

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.
Provide instructions so we can reproduce.
Please also list any relevant details for your test configuration

  • Test A
  • Test B

Test Configuration:

  • What computers and compilers did you use to test this:

Checklist

  • I have added comments for the new code
  • I added Doxygen docstrings to the new code
  • I have made corresponding changes to the documentation (docs/)
  • I have added regression tests to the test suite so that people can verify in the future that the feature is behaving as expected
  • I have added example cases in examples/ that demonstrate my new feature performing as expected.
    They run to completion and demonstrate "interesting physics"
  • I ran ./mfc.sh format before committing my code
  • New and existing tests pass locally with my changes, including with GPU capability enabled (both NVIDIA hardware with NVHPC compilers and AMD hardware with CRAY compilers) and disabled
  • This PR does not introduce any repeated code (it follows the DRY principle)
  • I cannot think of a way to condense this code and reduce any introduced additional line count

If your code changes any code source files (anything in src/simulation)

To make sure the code is performing as expected on GPU devices, I have:

  • Checked that the code compiles using NVHPC compilers
  • Checked that the code compiles using CRAY compilers
  • Ran the code on either V100, A100, or H100 GPUs and ensured the new feature performed as expected (the GPU results match the CPU results)
  • Ran the code on MI200+ GPUs and ensured the new features performed as expected (the GPU results match the CPU results)
  • Enclosed the new feature via nvtx ranges so that they can be identified in profiles
  • Ran a Nsight Systems profile using ./mfc.sh run XXXX --gpu -t simulation --nsys, and have attached the output file (.nsys-rep) and plain text results to this PR
  • Ran a Rocprof Systems profile using ./mfc.sh run XXXX --gpu -t simulation --rsys --hip-trace, and have attached the output file and plain text results to this PR.
  • Ran my code using various numbers of different GPUs (1, 2, and 8, for example) in parallel and made sure that the results scale similarly to what happens if you run without the new code/feature

PR Type

Enhancement, Bug fix


Description

  • Add OpenMP support to MFC on the Cray Compiler with comprehensive case optimization

  • Implement mixed precision support throughout the codebase for improved performance and memory efficiency

  • Refactor boundary condition arrays from negative indexing (-1:1) to positive indexing (1:2) for optimization

  • Add case optimization guards to skip unnecessary 3D computations in 2D simulations (WENO, viscous, MHD, capillary)

  • Introduce simplex noise generation module for procedural perturbation support

  • Fix GPU compatibility issues in chemistry and FFT modules for both NVIDIA and AMD compilers

  • Update MPI operations to support mixed precision with proper type conversions and 64-bit indexing

  • Refactor bubble Eulerian projection data structure for improved memory layout

  • Fix MUSCL workspace array naming conflicts and GPU parallel loop syntax issues

  • Add CI testing on Frontier supercomputer for Cray Compiler validation


Diagram Walkthrough

flowchart LR
  A["Cray Compiler<br/>OpenMP Support"] --> B["Case Optimization<br/>2D/3D Guards"]
  A --> C["Mixed Precision<br/>wp/stp Types"]
  B --> D["WENO, Viscous,<br/>MHD Modules"]
  C --> E["Boundary Arrays<br/>MPI Operations"]
  C --> F["Bubble EL,<br/>QBMM Modules"]
  G["GPU Compatibility<br/>Fixes"] --> H["Chemistry<br/>FFT Modules"]
  I["Simplex Noise<br/>Module"] --> J["Perturbation<br/>Support"]
  D --> K["Performance<br/>Optimization"]
  E --> K
  H --> K

File Walkthrough

Relevant files
Enhancement (17 files)
m_boundary_common.fpp
Refactor boundary arrays and add case optimization support

src/common/m_boundary_common.fpp

  • Changed boundary condition array indexing from (-1:1) to (1:2) to
    support case optimization
  • Updated all references from negative indices (-1, 1) to positive
    indices (1, 2) throughout the module
  • Added conditional compilation directives #:if not
    MFC_CASE_OPTIMIZATION or num_dims > 2 to skip 3D boundary operations
    in optimized 2D cases
  • Changed data type from real(wp) to real(stp) for mixed precision
    support in buffer parameters
  • Updated MPI I/O operations to handle mixed precision with MPI_BYTE and
    element count calculations
+249/-231
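
To make the indexing change concrete, here is a minimal sketch (not lifted from the diff; the integer_field type with an %sf member is an assumption, while the guard and the bc_z assignments echo the snippets quoted later in this page):

! Second index is now 1 (beginning face) / 2 (ending face) instead of -1 / 1.
type(integer_field), dimension(1:3, 1:2) :: bc_type      ! previously dimension(1:3, -1:1)

! Buffers use the storage precision kind in mixed precision builds.
real(stp), dimension(:), allocatable :: buff_send        ! previously real(wp)

! 3D boundary handling is compiled out of case-optimized 2D builds.
#:if not MFC_CASE_OPTIMIZATION or num_dims > 2
    if (p > 0) then
        bc_type(3, 1)%sf(:, :, :) = int(min(bc_z%beg, 0), kind=1)
        bc_type(3, 2)%sf(:, :, :) = int(min(bc_z%end, 0), kind=1)
    end if
#:endif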
m_weno.fpp
Add case optimization guards to WENO reconstruction schemes

src/simulation/m_weno.fpp

  • Added #:include 'case.fpp' directive for case optimization support
  • Wrapped WENO order 5 and 7 reconstruction code with conditional
    compilation #:if not MFC_CASE_OPTIMIZATION or weno_num_stencils > N
  • Nested TENO stencil calculations within additional optimization guards
    for higher-order schemes
  • Maintained existing WENO algorithm logic while enabling compile-time
    optimization for reduced stencil cases
+286/-274
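
A hedged sketch of the guard pattern described above; the stencil-count threshold shown is illustrative, since the walkthrough only states that weno_num_stencils is compared against a scheme-dependent N:

#:include 'case.fpp'

#:if not MFC_CASE_OPTIMIZATION or weno_num_stencils > 2
    ! Higher-order (WENO5/WENO7/TENO) stencil work is only compiled when the
    ! case-optimized build actually requires the larger stencil count.
    ! ... reconstruction for the wider stencils ...
#:endif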
m_bubbles_EL.fpp
Refactor bubble Eulerian projection data structure for mixed precision

src/simulation/m_bubbles_EL.fpp

  • Changed q_beta from type(vector_field) to type(scalar_field),
    dimension(:), allocatable for better memory layout
  • Updated allocation from q_beta%vf(1:q_beta_idx) to
    q_beta(1:q_beta_idx) with individual scalar field setup
  • Changed data type from real(wp) to real(stp) for mixed precision
    support in gradient calculations
  • Updated s_gradient_dir subroutine signature to accept raw arrays
    instead of scalar field types
  • Updated all references from q_beta%vf(i)%sf to q_beta(i)%sf throughout
    the module
+50/-48 
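
A minimal before/after sketch of the q_beta restructuring (the @:ALLOCATE macro and the allocation bounds are assumed MFC conventions, not copied from this diff):

! Before: one vector_field wrapping all projection components
type(vector_field) :: q_beta
@:ALLOCATE(q_beta%vf(1:q_beta_idx))

! After: a flat array of scalar fields, each set up individually
type(scalar_field), dimension(:), allocatable :: q_beta
allocate (q_beta(1:q_beta_idx))
do i = 1, q_beta_idx
    @:ALLOCATE(q_beta(i)%sf(0:m, 0:n, 0:p))
end do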
m_boundary_conditions.fpp
Update boundary condition array indexing to positive indices

src/pre_process/m_boundary_conditions.fpp

  • Updated boundary condition array indexing from (-1:1) to (1:2) in
    function signatures
  • Added conditional logic to map old negative indices (-1) to new
    positive index (1) and (1) to (2)
  • Updated all bc_type array accesses to use new indexing scheme with
    explicit if-statements for location mapping
+20/-12 
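
The location mapping described above, sketched with illustrative names (loc is the legacy -1/1 location argument and patch_bc stands in for the boundary identifier being stored):

! Legacy callers pass loc = -1 (beginning) or loc = 1 (end);
! the array itself is now indexed 1:2.
if (loc == -1) then
    face = 1
else
    face = 2
end if
bc_type(dir, face)%sf(:, :, :) = int(patch_bc, kind=1)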
m_viscous.fpp
Add case optimization conditionals to viscous module         

src/simulation/m_viscous.fpp

  • Added case.fpp include directive for case optimization support
  • Wrapped viscous stress tensor computation blocks with
    MFC_CASE_OPTIMIZATION conditional to skip 3D-specific calculations in
    2D cases
  • Reorganized gradient computation loops with conditional compilation
    for dimensional optimization
  • Improved code indentation and structure for better readability
+319/-316
m_riemann_solvers.fpp
Optimize MHD calculations and improve GPU parallelization

src/simulation/m_riemann_solvers.fpp

  • Added case.fpp include for case optimization support
  • Wrapped MHD-related 3D vector operations with MFC_CASE_OPTIMIZATION
    conditionals to optimize 2D cases
  • Expanded private variable lists in GPU parallel loops for better
    memory management
  • Fixed duplicate variable assignments and improved variable
    initialization in HLLC solver
  • Added explicit type conversions for mixed precision support
+78/-35 
m_mpi_common.fpp
Support mixed precision and dynamic dimensionality in MPI

src/common/m_mpi_common.fpp

  • Added case.fpp include directive for case optimization
  • Changed halo_size from integer to integer(kind=8) for 64-bit support
  • Fixed dimension indexing to use num_dims instead of hardcoded value 3
    for 2D/3D compatibility
  • Added explicit type conversions between wp and stp precision kinds in
    MPI buffer operations
+23/-23 
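
A sketch of the two changes described above; the buffer and field names are illustrative, and only halo_size, wp, and stp are taken from the walkthrough:

integer(kind=8) :: halo_size   ! was a default integer; 64-bit avoids overflow for large halos

! Pack: convert from working precision (wp) to the storage precision (stp)
! carried by the MPI buffers.
buff_send(r) = real(q_comm(i)%sf(j, k, l), kind=stp)

! ... neighbor exchange ...

! Unpack: convert back to working precision.
q_comm(i)%sf(j, k, l) = real(buff_recv(r), kind=wp)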
m_cbc.fpp
Optimize CBC module with case-specific conditionals           

src/simulation/m_cbc.fpp

  • Added case.fpp include for case optimization support
  • Moved dpres_ds variable declaration from module level to local scope
    within subroutine
  • Added dpres_ds to private variable list in GPU parallel loop
  • Wrapped dpi_inf_dt assignment with MFC_CASE_OPTIMIZATION conditional
    for single-fluid optimization
  • Improved GPU update directive formatting
+7/-8     
m_global_parameters.fpp
Add simplex noise perturbation parameters                               

src/pre_process/m_global_parameters.fpp

  • Added new simplex_perturb logical flag for simplex noise perturbation
    support
  • Added simplex_params derived type variable to store simplex noise
    parameters
  • Initialized simplex perturbation parameters in module initialization
    routine
+12/-0   
m_qbmm.fpp
Mixed precision support and case optimization for QBMM     

src/simulation/m_qbmm.fpp

  • Changed pb parameter type from real(wp) to real(stp) for mixed
    precision support
  • Wrapped bubble model coefficient calculations with preprocessor
    conditionals for case optimization
  • Added missing loop variables i1, i2, j to GPU parallel loop private
    list
  • Added TODO comment regarding rhs_pb and rhs_mv precision types
+98/-87 
m_simplex_noise.fpp
New simplex noise generation module                                           

src/pre_process/m_simplex_noise.fpp

  • New module implementing 3D and 2D Perlin simplex noise functions
  • Includes gradient lookup tables and permutation vectors for noise
    generation
  • Provides f_simplex3d and f_simplex2d functions for procedural noise
+245/-0 
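
A hypothetical usage sketch only; the actual argument lists of f_simplex2d and f_simplex3d are not shown on this page, so the calls and scaling below are assumptions:

use m_simplex_noise, only: f_simplex2d, f_simplex3d

real(wp) :: amp, freq, noise

! Perturb a primitive field with procedural noise evaluated at a cell centre.
noise = f_simplex3d(freq*x_cc(i), freq*y_cc(j), freq*z_cc(k))
q_prim_vf(mom_idx%beg)%sf(i, j, k) = q_prim_vf(mom_idx%beg)%sf(i, j, k) + amp*noise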
m_fftw.fpp
FFT filter GPU implementation refactoring                               

src/simulation/m_fftw.fpp

  • Refactored GPU FFT filter implementation to use direct device
    addresses instead of pointers
  • Removed #:block UNDEF_CCE wrapper and simplified GPU data management
  • Restructured loop logic to process initial ring separately before main
    loop
  • Improved compatibility with both NVIDIA and AMD GPU compilers
+115/-130
m_start_up.fpp
Mixed precision I/O and time stepping improvements             

src/simulation/m_start_up.fpp

  • Changed WP_MOK from 8 bytes to 4 bytes for mixed precision I/O
  • Updated MPI file read calls to use mpi_io_type multiplier for data
    size
  • Added logic to copy q_cons_ts(1) to q_cons_ts(2) for multi-stage time
    steppers
  • Fixed NaN checking to use explicit real() conversion for mixed
    precision
  • Added GPU updates for bubble and chemistry parameters
  • Removed unused GPU declare directive for q_cons_temp
+88/-54 
m_rhs.fpp
Mixed precision and GPU compatibility in RHS module           

src/simulation/m_rhs.fpp

  • Added conditional GPU declare directives for OpenACC-specific
    declarations
  • Changed loop variable types to integer(kind=8) for large array
    indexing
  • Wrapped flux allocation logic with if (.not. igr) condition
  • Updated bc_type parameter dimension from 1:3, -1:1 to 1:3, 1:2
  • Changed pb_in, mv_in parameters to real(stp) type for mixed precision
  • Added TODO comments about precision type consistency
+57/-45 
m_icpp_patches.fpp
Mixed precision support for patch identification arrays   

src/pre_process/m_icpp_patches.fpp

  • Added conditional compilation for patch_id_fp array type based on
    MFC_MIXED_PRECISION flag
  • Uses integer(kind=1) for mixed precision mode, standard integer
    otherwise
  • Applied consistently across all patch subroutines for memory
    efficiency
+75/-4   
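
The conditional declaration described above, sketched with assumed array bounds and intent:

#:if MFC_MIXED_PRECISION
    integer(kind=1), dimension(0:m, 0:n, 0:p), intent(inout) :: patch_id_fp
#:else
    integer, dimension(0:m, 0:n, 0:p), intent(inout) :: patch_id_fp
#:endif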
m_ibm.fpp
Mixed precision support for IBM module                                     

src/simulation/m_ibm.fpp

  • Changed pb_in and mv_in parameters to real(stp) type for mixed
    precision
  • Added missing variables to GPU parallel loop private list
  • Added type casting for pb_in and mv_in in interpolation calls
  • Changed levelset_in access to use explicit real() conversion
  • Updated pb_in, mv_in intent from INOUT to IN in interpolation
    subroutine
+7/-5     
inline_capillary.fpp
Case optimization for capillary tensor calculations           

src/simulation/include/inline_capillary.fpp

  • Wrapped 3D capillary tensor calculations with preprocessor conditional
  • Only computes 3D components when not using case optimization or when
    num_dims > 2
  • Reduces unnecessary computations for 2D simulations
+7/-6     
Bug fix (2 files)
m_muscl.fpp
Rename MUSCL arrays and fix GPU loop syntax                           

src/simulation/m_muscl.fpp

  • Renamed MUSCL workspace arrays from v_rs_ws_x/y/z to
    v_rs_ws_x/y/z_muscl to avoid naming conflicts
  • Fixed GPU parallel loop syntax and indentation consistency
  • Corrected #:endcall directives to include full macro name for clarity
  • Improved code formatting and nested subroutine indentation
+158/-158
m_chemistry.fpp
GPU compatibility fixes for chemistry module                         

src/common/m_chemistry.fpp

  • Removed sequential GPU loop directives that were causing compilation
    issues
  • Added temporary variable T_in for type conversion in temperature
    calculations
  • Wrapped large GPU parallel loop with #:block UNDEF_AMD to handle AMD
    compiler differences
  • Added missing variables to GPU parallel loop private and copyin lists
+113/-112
Additional files (48 files)
bench.yml +8/-0     
submit-bench.sh +1/-1     
submit-bench.sh +1/-1     
test.yml +0/-4     
CMakeLists.txt +18/-7   
case.py +104/-0 
load_amd.sh +7/-0     
setupNB.sh +5/-0     
3dHardcodedIC.fpp +113/-2 
macros.fpp +5/-4     
omp_macros.fpp +25/-1   
m_derived_types.fpp +26/-9   
m_helper.fpp +6/-6     
m_precision_select.f90 +14/-0   
m_variables_conversion.fpp +10/-34 
m_data_input.f90 +11/-10 
m_assign_variables.fpp +18/-4   
m_checker.fpp +23/-0   
m_data_output.fpp +32/-34 
m_initial_condition.fpp +35/-14 
m_mpi_proxy.fpp +29/-2   
m_perturbation.fpp +90/-1   
m_start_up.fpp +3/-2     
m_acoustic_src.fpp +8/-11   
m_body_forces.fpp +1/-1     
m_bubbles_EE.fpp +4/-4     
m_bubbles_EL_kernels.fpp +15/-15 
m_data_output.fpp +25/-26 
m_derived_variables.fpp +15/-13 
m_global_parameters.fpp +6/-11   
m_hyperelastic.fpp +1/-1     
m_hypoelastic.fpp +4/-4     
m_igr.fpp +1616/-1385
m_mhd.fpp +1/-1     
m_sim_helpers.fpp +31/-26 
m_surface_tension.fpp +38/-34 
m_time_steppers.fpp +86/-37 
p_main.fpp +2/-0     
golden-metadata.txt +154/-0 
golden.txt +10/-0   
build.py +2/-1     
lock.py +1/-1     
case_dicts.py +16/-0   
input.py +1/-1     
state.py +3/-3     
modules +4/-0     
pyproject.toml +3/-2     
frontier.mako +11/-10 

prathi-wind and others added 30 commits July 22, 2025 13:16
…, started work on enter and exit data, compiles
…e, add mappers to derived types, change how allocate is done
…types, removed rest of pure functions, fix issue with acoustic on nvfortran
Copilot finished reviewing on behalf of sbryngelson November 12, 2025 17:21
Comment on lines +1944 to 1959
bc_type(1, 1)%sf(:, :, :) = int(min(bc_x%beg, 0), kind=1)
bc_type(1, 2)%sf(:, :, :) = int(min(bc_x%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(1,1)%sf,bc_type(1,2)%sf]')

if (n > 0) then
bc_type(2, -1)%sf(:, :, :) = bc_y%beg
bc_type(2, 1)%sf(:, :, :) = bc_y%end
$:GPU_UPDATE(device='[bc_type(2,-1)%sf,bc_type(2,1)%sf]')

if (p > 0) then
bc_type(3, -1)%sf(:, :, :) = bc_z%beg
bc_type(3, 1)%sf(:, :, :) = bc_z%end
$:GPU_UPDATE(device='[bc_type(3,-1)%sf,bc_type(3,1)%sf]')
end if
bc_type(2, 1)%sf(:, :, :) = int(min(bc_y%beg, 0), kind=1)
bc_type(2, 2)%sf(:, :, :) = int(min(bc_y%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(2,1)%sf,bc_type(2,2)%sf]')
#:if not MFC_CASE_OPTIMIZATION or num_dims > 2
if (p > 0) then
bc_type(3, 1)%sf(:, :, :) = int(min(bc_z%beg, 0), kind=1)
bc_type(3, 2)%sf(:, :, :) = int(min(bc_z%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(3,1)%sf,bc_type(3,2)%sf]')
end if
#:endif
end if
Contributor
Suggestion: Correct the default boundary condition assignment by removing the min(..., 0) logic, which incorrectly handles MPI processor ranks. The previous logic of direct assignment from bc_...%... should be restored with a type cast. [possible issue, importance: 10]

Suggested change
bc_type(1, 1)%sf(:, :, :) = int(min(bc_x%beg, 0), kind=1)
bc_type(1, 2)%sf(:, :, :) = int(min(bc_x%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(1,1)%sf,bc_type(1,2)%sf]')
if (n > 0) then
bc_type(2, -1)%sf(:, :, :) = bc_y%beg
bc_type(2, 1)%sf(:, :, :) = bc_y%end
$:GPU_UPDATE(device='[bc_type(2,-1)%sf,bc_type(2,1)%sf]')
if (p > 0) then
bc_type(3, -1)%sf(:, :, :) = bc_z%beg
bc_type(3, 1)%sf(:, :, :) = bc_z%end
$:GPU_UPDATE(device='[bc_type(3,-1)%sf,bc_type(3,1)%sf]')
end if
bc_type(2, 1)%sf(:, :, :) = int(min(bc_y%beg, 0), kind=1)
bc_type(2, 2)%sf(:, :, :) = int(min(bc_y%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(2,1)%sf,bc_type(2,2)%sf]')
#:if not MFC_CASE_OPTIMIZATION or num_dims > 2
if (p > 0) then
bc_type(3, 1)%sf(:, :, :) = int(min(bc_z%beg, 0), kind=1)
bc_type(3, 2)%sf(:, :, :) = int(min(bc_z%end, 0), kind=1)
$:GPU_UPDATE(device='[bc_type(3,1)%sf,bc_type(3,2)%sf]')
end if
#:endif
end if
bc_type(1, 1)%sf(:, :, :) = int(bc_x%beg, kind=1)
bc_type(1, 2)%sf(:, :, :) = int(bc_x%end, kind=1)
$:GPU_UPDATE(device='[bc_type(1,1)%sf,bc_type(1,2)%sf]')
if (n > 0) then
bc_type(2, 1)%sf(:, :, :) = int(bc_y%beg, kind=1)
bc_type(2, 2)%sf(:, :, :) = int(bc_y%end, kind=1)
$:GPU_UPDATE(device='[bc_type(2,1)%sf,bc_type(2,2)%sf]')
#:if not MFC_CASE_OPTIMIZATION or num_dims > 2
if (p > 0) then
bc_type(3, 1)%sf(:, :, :) = int(bc_z%beg, kind=1)
bc_type(3, 2)%sf(:, :, :) = int(bc_z%end, kind=1)
$:GPU_UPDATE(device='[bc_type(3,1)%sf,bc_type(3,2)%sf]')
end if
#:endif
end if

Collaborator
I believe this suggestion is irrelevant. bc_type is only used for negative values of bc_x[y,z]%beg[end]. If bc_x[y,z]%beg[end] is positive, MPI communication is used instead and bc_type is never touched.

Comment on lines +311 to +519
if (shear_stress) then ! Shear stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do

if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
alpha_visc_sum = 0._wp

if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do

alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)

end if

if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)

if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real

if (Re_size(i) > 0) Re_visc(i) = 0._wp
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do

Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)

end do
end if
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

alpha_visc_sum = 0._wp
tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(1)

if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do
tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
q_prim_vf(momxe)%sf(j, k, l))/ &
y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
Re_visc(1)

alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
$:GPU_LOOP(parallelism='[seq]')
do i = 2, 3
tau_Re_vf(contxe + i)%sf(j, k, l) = &
tau_Re_vf(contxe + i)%sf(j, k, l) - &
tau_Re(2, i)

tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
end do

end if
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if

if (bulk_stress) then ! Bulk stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do

if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real
if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

if (Re_size(i) > 0) Re_visc(i) = 0._wp
if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)

end do
end if
end if

tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(1)

tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
q_prim_vf(momxe)%sf(j, k, l))/ &
y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
Re_visc(1)

$:GPU_LOOP(parallelism='[seq]')
do i = 2, 3
tau_Re_vf(contxe + i)%sf(j, k, l) = &
tau_Re_vf(contxe + i)%sf(j, k, l) - &
tau_Re(2, i)

tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
end do

end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
alpha_visc_sum = 0._wp

if (bulk_stress) then ! Bulk stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)

if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
end if

if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp

alpha_visc_sum = 0._wp

if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do

alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)

end if

$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do

if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real

if (Re_size(i) > 0) Re_visc(i) = 0._wp
if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do
do i = 1, 2
Re_visc(i) = dflt_real

if (Re_size(i) > 0) Re_visc(i) = 0._wp
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do

Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)

end do
end do
end if
end if
end if

tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(2)
tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(2)

tau_Re_vf(momxb + 1)%sf(j, k, l) = &
tau_Re_vf(momxb + 1)%sf(j, k, l) - &
tau_Re(2, 2)
tau_Re_vf(momxb + 1)%sf(j, k, l) = &
tau_Re_vf(momxb + 1)%sf(j, k, l) - &
tau_Re(2, 2)

tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)

end do
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
#:endcall GPU_PARALLEL_LOOP
end if
Contributor
Suggestion: Refactor the duplicated logic within the shear_stress and bulk_stress blocks into a separate subroutine to improve code maintainability. [general, importance: 7]

Suggested change
if (shear_stress) then ! Shear stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
alpha_visc_sum = 0._wp
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do
alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
end if
if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real
if (Re_size(i) > 0) Re_visc(i) = 0._wp
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do
Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
end do
end if
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
alpha_visc_sum = 0._wp
tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(1)
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do
tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
q_prim_vf(momxe)%sf(j, k, l))/ &
y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
Re_visc(1)
alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
$:GPU_LOOP(parallelism='[seq]')
do i = 2, 3
tau_Re_vf(contxe + i)%sf(j, k, l) = &
tau_Re_vf(contxe + i)%sf(j, k, l) - &
tau_Re(2, i)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
end do
end if
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
if (bulk_stress) then ! Bulk stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real
if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
if (Re_size(i) > 0) Re_visc(i) = 0._wp
if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
end do
end if
end if
tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(1)
tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
q_prim_vf(momxe)%sf(j, k, l))/ &
y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
Re_visc(1)
$:GPU_LOOP(parallelism='[seq]')
do i = 2, 3
tau_Re_vf(contxe + i)%sf(j, k, l) = &
tau_Re_vf(contxe + i)%sf(j, k, l) - &
tau_Re(2, i)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
end do
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
alpha_visc_sum = 0._wp
if (bulk_stress) then ! Bulk stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = q_prim_vf(i)%sf(j, k, l)
if (bubbles_euler .and. num_fluids == 1) then
alpha_visc(i) = 1._wp - q_prim_vf(E_idx + i)%sf(j, k, l)
else
alpha_visc(i) = q_prim_vf(E_idx + i)%sf(j, k, l)
end if
end do
alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
if (bubbles_euler) then
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
end if
if (mpp_lim .and. (model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else if ((model_eqns == 2) .and. (num_fluids > 2)) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids - 1
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
else
rho_visc = alpha_rho_visc(1)
gamma_visc = gammas(1)
pi_inf_visc = pi_infs(1)
end if
else
rho_visc = 0._wp
gamma_visc = 0._wp
pi_inf_visc = 0._wp
alpha_visc_sum = 0._wp
if (mpp_lim) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
alpha_rho_visc(i) = max(0._wp, alpha_rho_visc(i))
alpha_visc(i) = min(max(0._wp, alpha_visc(i)), 1._wp)
alpha_visc_sum = alpha_visc_sum + alpha_visc(i)
end do
alpha_visc = alpha_visc/max(alpha_visc_sum, sgm_eps)
end if
$:GPU_LOOP(parallelism='[seq]')
do i = 1, num_fluids
rho_visc = rho_visc + alpha_rho_visc(i)
gamma_visc = gamma_visc + alpha_visc(i)*gammas(i)
pi_inf_visc = pi_inf_visc + alpha_visc(i)*pi_infs(i)
end do
if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do i = 1, 2
Re_visc(i) = dflt_real
if (Re_size(i) > 0) Re_visc(i) = 0._wp
if (viscous) then
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do
do i = 1, 2
Re_visc(i) = dflt_real
if (Re_size(i) > 0) Re_visc(i) = 0._wp
$:GPU_LOOP(parallelism='[seq]')
do q = 1, Re_size(i)
Re_visc(i) = alpha_visc(Re_idx(i, q))/Res_viscous(i, q) &
+ Re_visc(i)
end do
Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
Re_visc(i) = 1._wp/max(Re_visc(i), sgm_eps)
end do
end do
end if
end if
end if
tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(2)
tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(2)
tau_Re_vf(momxb + 1)%sf(j, k, l) = &
tau_Re_vf(momxb + 1)%sf(j, k, l) - &
tau_Re(2, 2)
tau_Re_vf(momxb + 1)%sf(j, k, l) = &
tau_Re_vf(momxb + 1)%sf(j, k, l) - &
tau_Re(2, 2)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)
end do
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
#:endcall GPU_PARALLEL_LOOP
end if
if (shear_stress) then ! Shear stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re, rho_visc, gamma_visc, pi_inf_visc, alpha_visc_sum]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
call s_compute_common_stress_terms(j, k, l, alpha_rho_visc, alpha_visc, rho_visc, gamma_visc, pi_inf_visc, Re_visc)
tau_Re(2, 2) = -(2._wp/3._wp)*grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(1)
tau_Re(2, 3) = ((grad_z_vf(2)%sf(j, k, l) - &
q_prim_vf(momxe)%sf(j, k, l))/ &
y_cc(k) + grad_y_vf(3)%sf(j, k, l))/ &
Re_visc(1)
$:GPU_LOOP(parallelism='[seq]')
do i = 2, 3
tau_Re_vf(contxe + i)%sf(j, k, l) = &
tau_Re_vf(contxe + i)%sf(j, k, l) - &
tau_Re(2, i)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(contxe + i)%sf(j, k, l)*tau_Re(2, i)
end do
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if
if (bulk_stress) then ! Bulk stresses
#:call GPU_PARALLEL_LOOP(collapse=3, private='[alpha_visc, alpha_rho_visc, Re_visc, tau_Re, rho_visc, gamma_visc, pi_inf_visc, alpha_visc_sum]')
do l = is3_viscous%beg, is3_viscous%end
do k = -1, 1
do j = is1_viscous%beg, is1_viscous%end
call s_compute_common_stress_terms(j, k, l, alpha_rho_visc, alpha_visc, rho_visc, gamma_visc, pi_inf_visc, Re_visc)
tau_Re(2, 2) = grad_z_vf(3)%sf(j, k, l)/y_cc(k)/ &
Re_visc(2)
tau_Re_vf(momxb + 1)%sf(j, k, l) = &
tau_Re_vf(momxb + 1)%sf(j, k, l) - &
tau_Re(2, 2)
tau_Re_vf(E_idx)%sf(j, k, l) = &
tau_Re_vf(E_idx)%sf(j, k, l) - &
q_prim_vf(momxb + 1)%sf(j, k, l)*tau_Re(2, 2)
end do
end do
end do
#:endcall GPU_PARALLEL_LOOP
end if

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR adds OpenMP support for the Cray Compiler Environment (CCE) and introduces mixed precision functionality developed for Gordon Bell submissions. The changes span toolchain configuration, simulation code optimization, and new simplex noise perturbation capabilities.

Key Changes:

  • Added mixed precision support with a new stp (storage precision) type alongside the existing wp (working precision) type
  • Implemented case-specific optimizations that conditionally compile code based on case parameters fixed at build time
  • Updated MPI I/O operations to handle mixed precision data types
  • Added simplex noise module for initial condition perturbations
  • Enhanced Frontier HPC template with better resource management (sbcast for binaries, nvme storage)
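
As a rough illustration of how the two kinds can coexist (the real contents of src/common/m_precision_select.f90 are not shown on this page, so the kind choices and the preprocessor guard below are assumptions):

module m_precision_select
    use iso_fortran_env, only: real32, real64
    implicit none

    integer, parameter :: wp = real64      ! working precision for arithmetic
#ifdef MFC_MIXED_PRECISION
    integer, parameter :: stp = real32     ! storage precision for large arrays and buffers
#else
    integer, parameter :: stp = wp
#endif
end module m_precision_select

Under such a scheme, real(stp) state arrays cut memory traffic while real(wp) temporaries keep full precision inside kernels, which is consistent with the wp-to-stp conversions mentioned for the MPI buffers and I/O paths elsewhere in this PR.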

Reviewed Changes

Copilot reviewed 65 out of 67 changed files in this pull request and generated 1 comment.

Summary per file
File Description
toolchain/templates/frontier.mako Added nvme constraint, improved binary broadcast with sbcast, streamlined profiler placement
toolchain/pyproject.toml Updated pyrometheus dependency to development branch for OpenMP testing
toolchain/modules Duplicate entry for CSCS Santis configuration (lines 92-95)
toolchain/mfc/state.py Added mixed precision flag to MFCConfig, improved code formatting
toolchain/mfc/run/input.py Updated real_type selection logic to include mixed precision mode
toolchain/mfc/run/case_dicts.py Added simplex_params parameter definitions for density/velocity perturbations
toolchain/mfc/lock.py Incremented lock version to 8
toolchain/mfc/build.py Added MFC_MIXED_PRECISION CMake flag
tests/43B5FEBD/golden-metadata.txt New golden metadata file documenting test configuration
src/simulation/p_main.fpp Added unused status variable declaration
src/simulation/m_weno.fpp Added case optimization guards to conditionally compile WENO stencil code
src/simulation/m_viscous.fpp Wrapped 3D viscous stress calculations in case optimization guards
src/simulation/m_time_steppers.fpp Converted to mixed precision (stp) for time-stepping arrays, reorganized probe storage
src/simulation/m_surface_tension.fpp Added explicit type conversion for sqrt argument, wrapped 3D code
src/simulation/m_start_up.fpp Changed MPI I/O word size to 4 bytes, extensive mixed precision conversions
src/simulation/m_sim_helpers.fpp Wrapped 3D CFL calculations in case optimization guards
src/simulation/m_riemann_solvers.fpp Expanded private variable lists in parallel loops for thread safety
src/simulation/m_rhs.fpp Changed loop iterators to 8-byte integers, wrapped OpenACC directives
src/simulation/m_qbmm.fpp Added case optimization for bubble model coefficient calculations
src/simulation/m_muscl.fpp Renamed reconstruction workspace arrays with _muscl suffix, fixed indentation
src/simulation/m_mhd.fpp Added missing private variables to parallel loop
src/simulation/m_ibm.fpp Converted ghost point routines to use mixed precision
src/simulation/m_hypoelastic.fpp Added private variables to GPU loops
src/simulation/m_hyperelastic.fpp Added missing private variable to parallel loop
src/simulation/m_global_parameters.fpp Cleaned up GPU data transfer directives
src/simulation/m_fftw.fpp Refactored FFT filtering for improved device management
src/simulation/m_derived_variables.fpp Updated time-stepping array references
src/simulation/m_data_output.fpp Renamed downsampling temporary arrays, updated MPI I/O types
src/simulation/m_cbc.fpp Made dpres_ds a local variable, wrapped case-specific code
src/simulation/m_bubbles_EL_kernels.fpp Converted to mixed precision with explicit type conversions
src/simulation/m_bubbles_EL.fpp Changed q_beta from vector_field to scalar_field array
src/simulation/m_bubbles_EE.fpp Added private variables and explicit type conversions
src/simulation/m_body_forces.fpp Removed unnecessary parentheses
src/simulation/m_acoustic_src.fpp Updated loop structure and allocation
src/simulation/include/inline_capillary.fpp Wrapped 3D capillary stress code in optimization guards
src/pre_process/m_start_up.fpp Added simplex_perturb to namelist, added finalization call
src/pre_process/m_simplex_noise.fpp New module implementing 2D/3D simplex noise for perturbations
src/pre_process/m_perturbation.fpp Added simplex noise perturbation routine
src/pre_process/m_mpi_proxy.fpp Added MPI broadcast for simplex parameters
src/pre_process/m_initial_condition.fpp Updated bc_type indexing, added simplex perturbation call
src/pre_process/m_global_parameters.fpp Added simplex_perturb parameters
setupNB.sh New setup script for NVHPC compilers on specific system

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


@cursor bot left a comment

This PR is being reviewed by Cursor Bugbot


Bug: Array Index Mismatch Breaks Deallocation

The deallocation code attempts to deallocate bc_type(1, -1)%sf, bc_type(2, -1)%sf, and bc_type(3, -1)%sf, but the bc_type array dimension was changed from (-1:1) to (1:2) throughout the codebase. The second dimension no longer has index -1, causing an out-of-bounds access. The indices should be 1 and 2 instead of -1 and 1.

src/post_process/m_data_input.f90#L671-L676

deallocate (bc_type(1, -1)%sf, bc_type(1, 1)%sf)
if (n > 0) then
deallocate (bc_type(2, -1)%sf, bc_type(2, 1)%sf)
if (p > 0) then
deallocate (bc_type(3, -1)%sf, bc_type(3, 1)%sf)



Collaborator
This file needs to be deleted, moved to misc/, or incorporated into the toolchain somehow. This PR doesn't add AMD compilers anyway, so my suggestion would be to delete it for now.

"pyrometheus == 1.0.5",

#"pyrometheus == 1.0.5",
"pyrometheus @ git+https://github.com/wilfonba/pyrometheus-wilfong.git@OpenMPTest",
Collaborator
I need to merge this with the Pyro upstream...

Collaborator
The relevant PR has been merged. Just waiting on an official release now

Member
Is the release done now? If not, message Esteban.

@wilfonba (Collaborator) left a comment

Approve for benchmark

@wilfonba (Collaborator) commented Nov 13, 2025

I addressed a bunch of my own comments and pushed the changes. Hopefully the weird Phoenix test failure goes away; it didn't look like an MFC code problem. The rest of the comments are more relevant to general support of half precision, so I don't think they're very urgent at the moment.

@wilfonba wilfonba self-requested a review November 13, 2025 22:54
@wilfonba (Collaborator) left a comment

Approve for benchmark

call s_symmetry(q_prim_vf, 3, 1, k, l, pb_in, mv_in)
case (BC_PERIODIC)
call s_periodic(q_prim_vf, 3, 1, k, l, pb_in, mv_in)
case (BC_SlIP_WALL)
Member
typo (though it already existed)

@anandrdbz anandrdbz merged commit 7169711 into MFlowCode:master Nov 14, 2025
44 of 47 checks passed
