diff --git a/.github/workflows/cleanliness.yml b/.github/workflows/cleanliness.yml
index ec472dce98..b02df12898 100644
--- a/.github/workflows/cleanliness.yml
+++ b/.github/workflows/cleanliness.yml
@@ -41,8 +41,8 @@ jobs:
- name: Setup Ubuntu
run: |
sudo apt update -y
- sudo apt install -y tar wget make cmake gcc g++ python3 python3-dev "openmpi-*" libopenmpi-dev
-
+ sudo apt install -y tar wget make cmake gcc g++ python3 python3-dev "openmpi-*" libopenmpi-dev libblas-dev liblapack-dev
+
- name: Build
run: |
(cd pr && /bin/bash mfc.sh build -j $(nproc) --debug 2> ../pr.txt)
diff --git a/.github/workflows/coverage.yml b/.github/workflows/coverage.yml
index 7487d8e550..d2c1b4ea4a 100644
--- a/.github/workflows/coverage.yml
+++ b/.github/workflows/coverage.yml
@@ -30,7 +30,7 @@ jobs:
- name: Setup Ubuntu
run: |
sudo apt update -y
- sudo apt install -y tar wget make cmake gcc g++ python3 python3-dev "openmpi-*" libopenmpi-dev
+ sudo apt install -y tar wget make cmake gcc g++ python3 python3-dev "openmpi-*" libopenmpi-dev libblas-dev liblapack-dev
- name: Build
run: /bin/bash mfc.sh build -j $(nproc) --gcov
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
index db618bea46..7eecc105c8 100644
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -53,7 +53,7 @@ jobs:
run: |
brew update
brew upgrade
- brew install coreutils python cmake fftw hdf5 gcc@15 boost open-mpi
+ brew install coreutils python cmake fftw hdf5 gcc@15 boost open-mpi lapack
echo "FC=gfortran-15" >> $GITHUB_ENV
echo "BOOST_INCLUDE=/opt/homebrew/include/" >> $GITHUB_ENV
@@ -62,7 +62,8 @@ jobs:
run: |
sudo apt update -y
sudo apt install -y cmake gcc g++ python3 python3-dev hdf5-tools \
- libfftw3-dev libhdf5-dev openmpi-bin libopenmpi-dev
+ libfftw3-dev libhdf5-dev openmpi-bin libopenmpi-dev \
+ libblas-dev liblapack-dev
- name: Setup Ubuntu (Intel)
if: matrix.os == 'ubuntu' && matrix.intel == true
diff --git a/README.md b/README.md
index ca3b73ca9d..80cf296db9 100644
--- a/README.md
+++ b/README.md
@@ -90,7 +90,7 @@ It's rather straightforward.
We'll give a brief intro. here for MacOS.
Using [brew](https://brew.sh), install MFC's dependencies:
```shell
-brew install coreutils python cmake fftw hdf5 gcc boost open-mpi
+brew install coreutils python cmake fftw hdf5 gcc boost open-mpi lapack
```
You're now ready to build and test MFC!
Put it to a convenient directory via
@@ -199,7 +199,7 @@ They are organized below.
* [Fypp](https://fypp.readthedocs.io/en/stable/fypp.html) metaprogramming for code readability, performance, and portability
* Continuous Integration (CI)
- * Approx. 500 Regression tests with each PR.
+ * 500+ regression tests with each PR.
* Performed with GNU (GCC), Intel (oneAPI), Cray (CCE), and NVIDIA (NVHPC) compilers on NVIDIA and AMD GPUs.
* Line-level test coverage reports via [Codecov](https://app.codecov.io/gh/MFlowCode/MFC) and `gcov`
* Benchmarking to avoid performance regressions and identify speed-ups
diff --git a/docs/documentation/getting-started.md b/docs/documentation/getting-started.md
index d407a59a1c..885c528cb9 100644
--- a/docs/documentation/getting-started.md
+++ b/docs/documentation/getting-started.md
@@ -14,7 +14,6 @@ cd MFC
MFC can be built in multiple ways on various operating systems.
Please select your desired configuration from the list bellow:
-
*nix
- **On supported clusters:** Load environment modules
@@ -31,15 +30,14 @@ sudo apt upgrade
sudo apt install tar wget make cmake gcc g++ \
python3 python3-dev python3-venv \
openmpi-bin libopenmpi-dev \
- libhdf5-dev libfftw3-dev
+ libhdf5-dev libfftw3-dev \
+ libblas-dev liblapack-dev
```
-If you wish to build MFC using [NVidia's NVHPC SDK](https://developer.nvidia.com/hpc-sdk),
+If you wish to build MFC using [NVIDIA's NVHPC SDK](https://developer.nvidia.com/hpc-sdk),
first follow the instructions [here](https://developer.nvidia.com/nvidia-hpc-sdk-downloads).
-
-
Windows
On Windows, you can either use Intel Compilers with the standard Microsoft toolchain,
@@ -96,16 +94,13 @@ You will also have access to the `.sln` Microsoft Visual Studio solution files f
-
-
-
MacOS
Using [Homebrew](https://brew.sh/) you can install the necessary dependencies
before configuring your environment:
```shell
-brew install coreutils python cmake fftw hdf5 gcc boost open-mpi
+brew install coreutils python cmake fftw hdf5 gcc boost open-mpi lapack
echo -e "export BOOST_INCLUDE='$(brew --prefix --installed boost)/include'" | tee -a ~/.bash_profile ~/.zshrc
. ~/.bash_profile 2>/dev/null || . ~/.zshrc 2>/dev/null
! [ -z "${BOOST_INCLUDE+x}" ] && echo 'Environment is ready!' || echo 'Error: $BOOST_INCLUDE is unset. Please adjust the previous commands to fit with your environment.'
diff --git a/docs/documentation/gpuDebugging.md b/docs/documentation/gpuDebugging.md
deleted file mode 100644
index 3400137c39..0000000000
--- a/docs/documentation/gpuDebugging.md
+++ /dev/null
@@ -1,156 +0,0 @@
-# Debugging Tools and Tips for GPUs
-
-## Compiler agnostic tools
-
-## OpenMP tools
-```bash
-OMP_DISPLAY_ENV=true | false | verbose
-```
-- Prints out the internal control values and environment variables at the beginning of the program if `true` or `verbose`
-- `verbose` will also print out vendor-specific internal control values and environment variables
-
-```bash
-OMP_TARGET_OFFLOAD = MANDATORY | DISABLED | DEFAULT
-```
-- Quick way to turn off off-load (`DISABLED`) or make it abort if a GPU isn't found (`MANDATORY`)
-- Great first test: does the problem disappear when you drop back to the CPU?
-
-```bash
-OMP_THREAD_LIMIT=
-```
-- Sets the maximum number of OpenMP threads to use in a contention group
-- Might be useful in checking for issues with contention or race conditions
-
-```bash
-OMP_DISPLAY_AFFINITY=TRUE
-```
-- Will display affinity bindings for each OpenMP thread, containing hostname, process identifier, OS thread identifier, OpenMP thread identifier, and affinity binding.
-
-## Cray Compiler Tools
-
-### Cray General Options
-
-```bash
-CRAY_ACC_DEBUG: 0 (off), 1, 2, 3 (very noisy)
-```
-
-- Dumps a time-stamped log line (`ACC: ...`) for every allocation, data transfer, kernel launch, wait, etc. Great first stop when "nothing seems to run on the GPU".
-- Outputs on STDERR by default. Can be changed by setting `CRAY_ACC_DEBUG_FILE`.
- - Recognizes `stderr`, `stdout`, and `process`.
- - `process` automatically generates a new file based on `pid` (each MPI process will have a different file)
-- While this environment variable specifies ACC, it can be used for both OpenACC and OpenMP
-
-```bash
-CRAY_ACC_FORCE_EARLY_INIT=1
-```
-
-- Force full GPU initialization at program start so you can see start-up hangs immediately
-- Default behavior without an environment variable is to defer initialization on first use
-- Device initialization includes initializing the GPU vendor’s low-level device runtime library (e.g., libcuda for NVIDIA GPUs) and establishing all necessary software contexts for interacting with the device
-
-### Cray OpenACC Options
-
-```bash
-CRAY_ACC_PRESENT_DUMP_SAVE_NAMES=1
-```
-- Will cause `acc_present_dump()` to output variable names and file locations in addition to variable mappings
-- Add `acc_present_dump()` around hotspots to help find problems with data movements
- - Helps more if adding `CRAY_ACC_DEBUG` environment variable
-
-## NVHPC Compiler Options
-
-### NVHPC General Options
-
-```bash
-STATIC_RANDOM_SEED=1
-```
-- Forces the seed returned by `RANDOM_SEED` to be constant, so it generates the same sequence of random numbers
-- Useful for testing issues with randomized data
-
-```bash
-NVCOMPILER_TERM=option[,option]
-```
-- `[no]debug`: Enables/disables just-in-time debugging (debugging invoked on error)
-- `[no]trace`: Enables/disables stack traceback on error
-
-### NVHPC OpenACC Options
-
-```bash
-NVCOMPILER_ACC_NOTIFY=
-```
-- Assign the environment variable to a bitmask to print out information to stderr for the following
- - kernel launches: 1
- - data transfers: 2
- - region entry/exit: 4
- - wait operation of synchronizations with the device: 8
- - device memory allocations and deallocations: 16
-- 1 (kernels only) is the usual first step.3 (kernels + copies) is great for "why is it so slow?"
-
-```bash
-NVCOMPILER_ACC_TIME=1
-```
-- Lightweight profiler
-- prints a tidy end-of-run table with per-region and per-kernel times and bytes moved
-- Do not use with CUDA profiler at the same time
-
-```bash
-NVCOMPILER_ACC_DEBUG=1
-```
-- Spews everything the runtime sees: host/device addresses, mapping events, present-table look-ups, etc.
-- Great for "partially present" or "pointer went missing" errors.
-- [Doc for NVCOMPILER_ACC_DEBUG](https://docs.nvidia.com/hpc-sdk/archive/20.9/pdf/hpc209openacc_gs.pdf)
- - Ctrl+F for `NVCOMPILER_ACC_DEBUG`
-
-### NVHPC OpenMP Options
-
-```bash
-LIBOMPTARGET_PROFILE=run.json
-```
-- Emits a Chrome-trace (JSON) timeline you can open in chrome://tracing or Speedscope
-- Great lightweight profiler when Nsight is overkill.
-- Granularity in µs via `LIBOMPTARGET_PROFILE_GRANULARITY` (default 500).
-
-```bash
-LIBOMPTARGET_INFO=
-```
-- Prints out different types of runtime information
-- Human-readable log of data-mapping inserts/updates, kernel launches, copies, waits.
-- Perfect first stop for "why is nothing copied?"
-- Flags
- - Print all data arguments upon entering an OpenMP device kernel: 0x01
- - Indicate when a mapped address already exists in the device mapping table: 0x02
- - Dump the contents of the device pointer map at kernel exit: 0x04
- - Indicate when an entry is changed in the device mapping table: 0x08
- - Print OpenMP kernel information from device plugins: 0x10
- - Indicate when data is copied to and from the device: 0x20
-
-```bash
-LIBOMPTARGET_DEBUG=1
-```
-- Developer-level trace (host-side)
-- Much noisier than `INFO`
-- Only works if the runtime was built with `-DOMPTARGET_DEBUG`.
-
-```bash
-LIBOMPTARGET_JIT_OPT_LEVEL=-O{0,1,2,3}
-```
-- This environment variable can be used to change the optimization pipeline used to optimize the embedded device code as part of the device JIT.
-- The value corresponds to the `-O{0,1,2,3}` command line argument passed to clang.
-
-```bash
-LIBOMPTARGET_JIT_SKIP_OPT=1
-```
-- This environment variable can be used to skip the optimization pipeline during JIT compilation.
-- If set, the image will only be passed through the backend.
-- The backend is invoked with the `LIBOMPTARGET_JIT_OPT_LEVEL` flag.
-
-## Compiler Documentation
-
-- [Cray & OpenMP Docs](https://cpe.ext.hpe.com/docs/24.11/cce/man7/intro_openmp.7.html#environment-variables)
-- [Cray & OpenACC Docs](https://cpe.ext.hpe.com/docs/24.11/cce/man7/intro_openacc.7.html#environment-variables)
-- [NVHPC & OpenACC Docs](https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html?highlight=NVCOMPILER_#environment-variables)
-- [NVHPC & OpenMP Docs](https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html?highlight=NVCOMPILER_#id2)
-- [LLVM & OpenMP Docs](https://openmp.llvm.org/design/Runtimes.html)
- - NVHPC is built on top of LLVM
-- [OpenMP Docs](https://www.openmp.org/spec-html/5.1/openmp.html)
-- [OpenACC Docs](https://www.openacc.org/sites/default/files/inline-files/OpenACC.2.7.pdf)
diff --git a/docs/documentation/gpuParallelization.md b/docs/documentation/gpuParallelization.md
index 8579914485..c40d3c57d9 100644
--- a/docs/documentation/gpuParallelization.md
+++ b/docs/documentation/gpuParallelization.md
@@ -564,3 +564,160 @@ Uses FYPP eval directive using `$:`
------------------------------------------------------------------------------------------
+
+# Debugging Tools and Tips for GPUs
+
+## Compiler-Agnostic Tools
+
+### OpenMP Tools
+```bash
+OMP_DISPLAY_ENV=true | false | verbose
+```
+- Prints out the internal control values and environment variables at the beginning of the program if `true` or `verbose`
+- `verbose` will also print out vendor-specific internal control values and environment variables
+
+```bash
+OMP_TARGET_OFFLOAD=MANDATORY | DISABLED | DEFAULT
+```
+- Quick way to disable offloading (`DISABLED`) or to abort if no GPU is available (`MANDATORY`)
+- Great first test: does the problem disappear when you drop back to the CPU?
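+
+A quick CPU-fallback check might look like the following sketch; `./my_app` is a hypothetical placeholder for your executable or launcher command:
+```bash
+OMP_TARGET_OFFLOAD=DISABLED ./my_app   # run target regions on the host
+OMP_TARGET_OFFLOAD=MANDATORY ./my_app  # fail instead of silently falling back
+```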
+
+```bash
+OMP_THREAD_LIMIT=
+```
+- Sets the maximum number of OpenMP threads to use in a contention group
+- Might be useful in checking for issues with contention or race conditions
+
+```bash
+OMP_DISPLAY_AFFINITY=TRUE
+```
+- Displays the affinity binding of each OpenMP thread, including the hostname, process identifier, OS thread identifier, and OpenMP thread identifier.
+
+## Cray Compiler Tools
+
+### Cray General Options
+
+```bash
+CRAY_ACC_DEBUG=0 | 1 | 2 | 3   # 0 = off, 3 = very noisy
+```
+
+- Dumps a time-stamped log line (`ACC: ...`) for every allocation, data transfer, kernel launch, wait, etc. Great first stop when "nothing seems to run on the GPU".
+- Writes to stderr by default. Change the destination by setting `CRAY_ACC_DEBUG_FILE`.
+ - Recognizes `stderr`, `stdout`, and `process`.
+ - `process` automatically generates a new file based on `pid` (each MPI process will have a different file)
+- Despite the `ACC` in its name, this variable applies to both OpenACC and OpenMP
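+
+A minimal sketch of a per-rank debug run, assuming a Slurm launcher (`srun`) and a hypothetical executable `./my_app`:
+```bash
+export CRAY_ACC_DEBUG=2             # moderate verbosity
+export CRAY_ACC_DEBUG_FILE=process  # one pid-based log file per process
+srun -n 4 ./my_app
+```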
+
+```bash
+CRAY_ACC_FORCE_EARLY_INIT=1
+```
+
+- Forces full GPU initialization at program start, so start-up hangs show up immediately
+- Without this variable, initialization is deferred until first use
+- Device initialization includes initializing the GPU vendor’s low-level device runtime library (e.g., libcuda for NVIDIA GPUs) and establishing all necessary software contexts for interacting with the device
+
+### Cray OpenACC Options
+
+```bash
+CRAY_ACC_PRESENT_DUMP_SAVE_NAMES=1
+```
+- Causes `acc_present_dump()` to output variable names and file locations in addition to variable mappings
+- Call `acc_present_dump()` around hotspots to help track down data-movement problems
+ - Most useful when combined with the `CRAY_ACC_DEBUG` environment variable
+
+## NVHPC Compiler Options
+
+### NVHPC General Options
+
+```bash
+STATIC_RANDOM_SEED=1
+```
+- Forces the seed returned by `RANDOM_SEED` to be constant, so it generates the same sequence of random numbers
+- Useful for testing issues with randomized data
+
+```bash
+NVCOMPILER_TERM=option[,option]
+```
+- `[no]debug`: Enables/disables just-in-time debugging (debugging invoked on error)
+- `[no]trace`: Enables/disables stack traceback on error
+
+### NVHPC OpenACC Options
+
+```bash
+NVCOMPILER_ACC_NOTIFY=
+```
+- Set the environment variable to a bitmask to print the following information to stderr:
+ - kernel launches: 1
+ - data transfers: 2
+ - region entry/exit: 4
+ - wait operations or synchronizations with the device: 8
+ - device memory allocations and deallocations: 16
+- 1 (kernels only) is the usual first step; 3 (kernels + copies) is great for "why is it so slow?"
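+
+The bits add up, so kernel launches plus data transfers is 1 + 2 = 3. A minimal sketch, with `./my_app` as a hypothetical placeholder:
+```bash
+# 3 = 1 (kernel launches) + 2 (data transfers); output goes to stderr
+NVCOMPILER_ACC_NOTIFY=3 ./my_app 2> notify.log
+```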
+
+```bash
+NVCOMPILER_ACC_TIME=1
+```
+- Lightweight profiler
+- Prints a tidy end-of-run table with per-region and per-kernel times and bytes moved
+- Do not use at the same time as a CUDA profiler
+
+```bash
+NVCOMPILER_ACC_DEBUG=1
+```
+- Spews everything the runtime sees: host/device addresses, mapping events, present-table look-ups, etc.
+- Great for "partially present" or "pointer went missing" errors.
+- [Doc for NVCOMPILER_ACC_DEBUG](https://docs.nvidia.com/hpc-sdk/archive/20.9/pdf/hpc209openacc_gs.pdf)
+ - Ctrl+F for `NVCOMPILER_ACC_DEBUG`
+
+### NVHPC OpenMP Options
+
+```bash
+LIBOMPTARGET_PROFILE=run.json
+```
+- Emits a Chrome-trace (JSON) timeline you can open in chrome://tracing or Speedscope
+- Great lightweight profiler when Nsight is overkill.
+- Granularity in µs via `LIBOMPTARGET_PROFILE_GRANULARITY` (default 500).
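+
+A sketch of a finer-grained trace; `./my_app` is a hypothetical placeholder:
+```bash
+# Chrome-trace JSON timeline with 100 µs granularity instead of the default 500
+LIBOMPTARGET_PROFILE=run.json LIBOMPTARGET_PROFILE_GRANULARITY=100 ./my_app
+```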
+
+```bash
+LIBOMPTARGET_INFO=
+```
+- Prints out different types of runtime information
+- Human-readable log of data-mapping inserts/updates, kernel launches, copies, waits.
+- Perfect first stop for "why is nothing copied?"
+- Flags (bitmask; combine by OR-ing the values)
+ - Print all data arguments upon entering an OpenMP device kernel: 0x01
+ - Indicate when a mapped address already exists in the device mapping table: 0x02
+ - Dump the contents of the device pointer map at kernel exit: 0x04
+ - Indicate when an entry is changed in the device mapping table: 0x08
+ - Print OpenMP kernel information from device plugins: 0x10
+ - Indicate when data is copied to and from the device: 0x20
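+
+As a rough sketch, the flags combine with a bitwise OR; the value below is passed in decimal, and `./my_app` is a hypothetical placeholder:
+```bash
+# 0x01 (kernel arguments) | 0x20 (host/device copies) = 0x21 = 33
+LIBOMPTARGET_INFO=$((0x01 | 0x20)) ./my_app
+```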
+
+```bash
+LIBOMPTARGET_DEBUG=1
+```
+- Developer-level trace (host-side)
+- Much noisier than `LIBOMPTARGET_INFO`
+- Only works if the runtime was built with `-DOMPTARGET_DEBUG`.
+
+```bash
+LIBOMPTARGET_JIT_OPT_LEVEL={0,1,2,3}
+```
+- Changes the optimization pipeline used to optimize the embedded device code during device JIT compilation.
+- The value corresponds to the `-O{0,1,2,3}` command line argument passed to clang.
+
+```bash
+LIBOMPTARGET_JIT_SKIP_OPT=1
+```
+- Skips the optimization pipeline during JIT compilation.
+- If set, the image is only passed through the backend.
+- The backend still uses the optimization level set by `LIBOMPTARGET_JIT_OPT_LEVEL`.
+
+## Compiler Documentation
+
+- [Cray & OpenMP Docs](https://cpe.ext.hpe.com/docs/24.11/cce/man7/intro_openmp.7.html#environment-variables)
+- [Cray & OpenACC Docs](https://cpe.ext.hpe.com/docs/24.11/cce/man7/intro_openacc.7.html#environment-variables)
+- [NVHPC & OpenACC Docs](https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html?highlight=NVCOMPILER_#environment-variables)
+- [NVHPC & OpenMP Docs](https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html?highlight=NVCOMPILER_#id2)
+- [LLVM & OpenMP Docs](https://openmp.llvm.org/design/Runtimes.html)
+ - NVHPC is built on top of LLVM
+- [OpenMP Docs](https://www.openmp.org/spec-html/5.1/openmp.html)
+- [OpenACC Docs](https://www.openacc.org/sites/default/files/inline-files/OpenACC.2.7.pdf)
diff --git a/docs/documentation/papers.md b/docs/documentation/papers.md
index 4250c78639..51b7629f09 100644
--- a/docs/documentation/papers.md
+++ b/docs/documentation/papers.md
@@ -1,5 +1,19 @@
# Papers
+MFC 5.0: An exascale many-physics flow solver. [Wilfong, B., Le Berre, H., Radhakrishnan, A., Gupta, A., Vaca-Revelo, D., Adam, D., Yu, H., Lee, H., Chreim, J. R., Carcana Barbosa, M., Zhang, Y., Cisneros-Garibay, E., Gnanaskandan, A., Rodriguez Jr., M., Budiardja, R. D., Abbott, S., Colonius, T., & Bryngelson, S. H. (2025) arXiv:2503.07953. Equal contribution.](https://doi.org/10.48550/arXiv.2503.07953)
+
+```bibtex
+@article{Wilfong_2025,
+ author = {Wilfong, Benjamin and {Le Berre}, Henry and Radhakrishnan, Anand and Gupta, Ansh and Vaca-Revelo, Diego and Adam, Dimitrios and Yu, Haocheng and Lee, Hyeoksu and Chreim, Jose Rodolfo and {Carcana Barbosa}, Mirelys and Zhang, Yanjun and Cisneros-Garibay, Esteban and Gnanaskandan, Aswin and {Rodriguez Jr.}, Mauro and Budiardja, Reuben D. and Abbott, Stephen and Colonius, Tim and Bryngelson, Spencer H.},
+ title = {{MFC 5.0: A}n exascale many-physics flow solver},
+ journal = {arXiv preprint arXiv:2503.07953},
+ year = {2025},
+ doi = {10.48550/arXiv.2503.07953}
+}
+```
+
+
+
MFC: An open-source high-order multi-component, multi-phase, and multi-scale compressible flow solver. [S. H. Bryngelson, K. Schmidmayer, V. Coralic, K. Maeda, J. Meng, T. Colonius (2021) Computer Physics Communications **266**, 107396](https://doi.org/10.1016/j.cpc.2020.107396)
```bibtex
diff --git a/docs/documentation/readme.md b/docs/documentation/readme.md
index 5ca45150e4..4ce5b6b6e1 100644
--- a/docs/documentation/readme.md
+++ b/docs/documentation/readme.md
@@ -3,15 +3,14 @@
## User Documentation
- [Getting Started](getting-started.md)
-- [Testing MFC](testing.md)
+- [Testing](testing.md)
- [Case Files](case.md)
- [Example Cases](examples.md)
-- [Running MFC](running.md)
+- [Running](running.md)
- [Flow Visualization](visualization.md)
- [Performance](expectedPerformance.md)
-- [GPU Parallelization](gpuParallelization.md)
-- [GPU Debugging](gpuDebugging.md)
-- [MFC's Authors](authors.md)
+- [GPU Offloading](gpuParallelization.md)
+- [Authors](authors.md)
- [References](references.md)
## Code/API Documentation