From 063f5549c38823c9c2a02701c5b760c8a8820869 Mon Sep 17 00:00:00 2001 From: Spencer Bryngelson Date: Sun, 3 Aug 2025 14:51:33 -0400 Subject: [PATCH 1/2] lapack stuff --- .github/workflows/cleanliness.yml | 4 ++-- .github/workflows/coverage.yml | 2 +- .github/workflows/test.yml | 5 +++-- README.md | 4 ++-- docs/documentation/getting-started.md | 13 ++++--------- docs/documentation/papers.md | 14 ++++++++++++++ 6 files changed, 26 insertions(+), 16 deletions(-) diff --git a/.github/workflows/cleanliness.yml b/.github/workflows/cleanliness.yml index ec472dce98..b02df12898 100644 --- a/.github/workflows/cleanliness.yml +++ b/.github/workflows/cleanliness.yml @@ -41,8 +41,8 @@ jobs: - name: Setup Ubuntu run: | sudo apt update -y - sudo apt install -y tar wget make cmake gcc g++ python3 python3-dev "openmpi-*" libopenmpi-dev - + sudo apt install -y tar wget make cmake gcc g++ python3 python3-dev "openmpi-*" libopenmpi-dev libblas-dev liblapack-dev + - name: Build run: | (cd pr && /bin/bash mfc.sh build -j $(nproc) --debug 2> ../pr.txt) diff --git a/.github/workflows/coverage.yml b/.github/workflows/coverage.yml index 7487d8e550..d2c1b4ea4a 100644 --- a/.github/workflows/coverage.yml +++ b/.github/workflows/coverage.yml @@ -30,7 +30,7 @@ jobs: - name: Setup Ubuntu run: | sudo apt update -y - sudo apt install -y tar wget make cmake gcc g++ python3 python3-dev "openmpi-*" libopenmpi-dev + sudo apt install -y tar wget make cmake gcc g++ python3 python3-dev "openmpi-*" libopenmpi-dev libblas-dev liblapack-dev - name: Build run: /bin/bash mfc.sh build -j $(nproc) --gcov diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml index db618bea46..7eecc105c8 100644 --- a/.github/workflows/test.yml +++ b/.github/workflows/test.yml @@ -53,7 +53,7 @@ jobs: run: | brew update brew upgrade - brew install coreutils python cmake fftw hdf5 gcc@15 boost open-mpi + brew install coreutils python cmake fftw hdf5 gcc@15 boost open-mpi lapack echo "FC=gfortran-15" >> $GITHUB_ENV 
echo "BOOST_INCLUDE=/opt/homebrew/include/" >> $GITHUB_ENV @@ -62,7 +62,8 @@ jobs: run: | sudo apt update -y sudo apt install -y cmake gcc g++ python3 python3-dev hdf5-tools \ - libfftw3-dev libhdf5-dev openmpi-bin libopenmpi-dev + libfftw3-dev libhdf5-dev openmpi-bin libopenmpi-dev \ + libblas-dev liblapack-dev - name: Setup Ubuntu (Intel) if: matrix.os == 'ubuntu' && matrix.intel == true diff --git a/README.md b/README.md index ca3b73ca9d..80cf296db9 100644 --- a/README.md +++ b/README.md @@ -90,7 +90,7 @@ It's rather straightforward. We'll give a brief intro. here for MacOS. Using [brew](https://brew.sh), install MFC's dependencies: ```shell -brew install coreutils python cmake fftw hdf5 gcc boost open-mpi +brew install coreutils python cmake fftw hdf5 gcc boost open-mpi lapack ``` You're now ready to build and test MFC! Put it to a convenient directory via @@ -199,7 +199,7 @@ They are organized below. * [Fypp](https://fypp.readthedocs.io/en/stable/fypp.html) metaprogramming for code readability, performance, and portability * Continuous Integration (CI) - * Approx. 500 Regression tests with each PR. + * > 500 Regression tests with each PR. * Performed with GNU (GCC), Intel (oneAPI), Cray (CCE), and NVIDIA (NVHPC) compilers on NVIDIA and AMD GPUs. * Line-level test coverage reports via [Codecov](https://app.codecov.io/gh/MFlowCode/MFC) and `gcov` * Benchmarking to avoid performance regressions and identify speed-ups diff --git a/docs/documentation/getting-started.md b/docs/documentation/getting-started.md index d407a59a1c..885c528cb9 100644 --- a/docs/documentation/getting-started.md +++ b/docs/documentation/getting-started.md @@ -14,7 +14,6 @@ cd MFC MFC can be built in multiple ways on various operating systems. Please select your desired configuration from the list bellow: -

*nix

- **On supported clusters:** Load environment modules @@ -31,15 +30,14 @@ sudo apt upgrade sudo apt install tar wget make cmake gcc g++ \ python3 python3-dev python3-venv \ openmpi-bin libopenmpi-dev \ - libhdf5-dev libfftw3-dev + libhdf5-dev libfftw3-dev \ + libblas-dev liblapack-dev ``` -If you wish to build MFC using [NVidia's NVHPC SDK](https://developer.nvidia.com/hpc-sdk), +If you wish to build MFC using [NVIDIA's NVHPC SDK](https://developer.nvidia.com/hpc-sdk), first follow the instructions [here](https://developer.nvidia.com/nvidia-hpc-sdk-downloads). -
-

Windows

On Windows, you can either use Intel Compilers with the standard Microsoft toolchain, @@ -96,16 +94,13 @@ You will also have access to the `.sln` Microsoft Visual Studio solution files f
- - -

MacOS

Using [Homebrew](https://brew.sh/) you can install the necessary dependencies before configuring your environment: ```shell -brew install coreutils python cmake fftw hdf5 gcc boost open-mpi +brew install coreutils python cmake fftw hdf5 gcc boost open-mpi lapack echo -e "export BOOST_INCLUDE='$(brew --prefix --installed boost)/include'" | tee -a ~/.bash_profile ~/.zshrc . ~/.bash_profile 2>/dev/null || . ~/.zshrc 2>/dev/null ! [ -z "${BOOST_INCLUDE+x}" ] && echo 'Environment is ready!' || echo 'Error: $BOOST_INCLUDE is unset. Please adjust the previous commands to fit with your environment.' diff --git a/docs/documentation/papers.md b/docs/documentation/papers.md index 4250c78639..51b7629f09 100644 --- a/docs/documentation/papers.md +++ b/docs/documentation/papers.md @@ -1,5 +1,19 @@ # Papers +MFC 5.0: An exascale many-physics flow solver. [Wilfong, B., Le Berre, H., Radhakrishnan, A., Gupta, A., Vaca-Revelo, D., Adam, D., Yu, H., Lee, H., Chreim, J. R., Carcana Barbosa, M., Zhang, Y., Cisneros-Garibay, E., Gnanaskandan, A., Rodriguez Jr., M., Budiardja, R. D., Abbott, S., Colonius, T., & Bryngelson, S. H. (2025) MFC 5.0: An exascale many-physics flow solver. arXiv:2503.07953. Equal contribution.](https://doi.org/10.48550/arXiv.2503.07953) + +```bibtex +@article{Wilfong_2025, + author = {Wilfong, Benjamin and {Le Berre}, Henry and Radhakrishnan, Anand and Gupta, Ansh and Vaca-Revelo, Diego and Adam, Dimitrios and Yu, Haocheng and Lee, Hyeoksu and Chreim, Jose Rodolfo and {Carcana Barbosa}, Mirelys and Zhang, Yanjun and Cisneros-Garibay, Esteban and Gnanaskandan, Aswin and {Rodriguez Jr.}, Mauro and Budiardja, Reuben D. and Abbott, Stephen and Colonius, Tim and Bryngelson, Spencer H.}, + title = {{MFC 5.0: A}n exascale many-physics flow solver}, + journal = {arXiv preprint arXiv:2503.07953}, + year = {2025}, + doi = {10.48550/arXiv.2503.07953} +} +``` + +
+ MFC: An open-source high-order multi-component, multi-phase, and multi-scale compressible flow solver. [S. H. Bryngelson, K. Schmidmayer, V. Coralic, K. Maeda, J. Meng, T. Colonius (2021) Computer Physics Communications **266**, 107396](https://doi.org/10.1016/j.cpc.2020.107396) ```bibtex From c52fff2417e9fafb9ea4bb586adc786d9352da7f Mon Sep 17 00:00:00 2001 From: Spencer Bryngelson Date: Sun, 3 Aug 2025 14:54:21 -0400 Subject: [PATCH 2/2] fixup gpu docs --- docs/documentation/gpuDebugging.md | 156 ---------------------- docs/documentation/gpuParallelization.md | 157 +++++++++++++++++++++++ docs/documentation/readme.md | 9 +- 3 files changed, 161 insertions(+), 161 deletions(-) delete mode 100644 docs/documentation/gpuDebugging.md diff --git a/docs/documentation/gpuDebugging.md b/docs/documentation/gpuDebugging.md deleted file mode 100644 index 3400137c39..0000000000 --- a/docs/documentation/gpuDebugging.md +++ /dev/null @@ -1,156 +0,0 @@ -# Debugging Tools and Tips for GPUs - -## Compiler agnostic tools - -## OpenMP tools -```bash -OMP_DISPLAY_ENV=true | false | verbose -``` -- Prints out the internal control values and environment variables at the beginning of the program if `true` or `verbose` -- `verbose` will also print out vendor-specific internal control values and environment variables - -```bash -OMP_TARGET_OFFLOAD = MANDATORY | DISABLED | DEFAULT -``` -- Quick way to turn off off-load (`DISABLED`) or make it abort if a GPU isn't found (`MANDATORY`) -- Great first test: does the problem disappear when you drop back to the CPU? - -```bash -OMP_THREAD_LIMIT= -``` -- Sets the maximum number of OpenMP threads to use in a contention group -- Might be useful in checking for issues with contention or race conditions - -```bash -OMP_DISPLAY_AFFINITY=TRUE -``` -- Will display affinity bindings for each OpenMP thread, containing hostname, process identifier, OS thread identifier, OpenMP thread identifier, and affinity binding. 
- -## Cray Compiler Tools - -### Cray General Options - -```bash -CRAY_ACC_DEBUG: 0 (off), 1, 2, 3 (very noisy) -``` - -- Dumps a time-stamped log line (`ACC: ...`) for every allocation, data transfer, kernel launch, wait, etc. Great first stop when "nothing seems to run on the GPU". -- Outputs on STDERR by default. Can be changed by setting `CRAY_ACC_DEBUG_FILE`. - - Recognizes `stderr`, `stdout`, and `process`. - - `process` automatically generates a new file based on `pid` (each MPI process will have a different file) -- While this environment variable specifies ACC, it can be used for both OpenACC and OpenMP - -```bash -CRAY_ACC_FORCE_EARLY_INIT=1 -``` - -- Force full GPU initialization at program start so you can see start-up hangs immediately -- Default behavior without an environment variable is to defer initialization on first use -- Device initialization includes initializing the GPU vendor’s low-level device runtime library (e.g., libcuda for NVIDIA GPUs) and establishing all necessary software contexts for interacting with the device - -### Cray OpenACC Options - -```bash -CRAY_ACC_PRESENT_DUMP_SAVE_NAMES=1 -``` -- Will cause `acc_present_dump()` to output variable names and file locations in addition to variable mappings -- Add `acc_present_dump()` around hotspots to help find problems with data movements - - Helps more if adding `CRAY_ACC_DEBUG` environment variable - -## NVHPC Compiler Options - -### NVHPC General Options - -```bash -STATIC_RANDOM_SEED=1 -``` -- Forces the seed returned by `RANDOM_SEED` to be constant, so it generates the same sequence of random numbers -- Useful for testing issues with randomized data - -```bash -NVCOMPILER_TERM=option[,option] -``` -- `[no]debug`: Enables/disables just-in-time debugging (debugging invoked on error) -- `[no]trace`: Enables/disables stack traceback on error - -### NVHPC OpenACC Options - -```bash -NVCOMPILER_ACC_NOTIFY= -``` -- Assign the environment variable to a bitmask to print out information to 
stderr for the following - - kernel launches: 1 - - data transfers: 2 - - region entry/exit: 4 - - wait operation of synchronizations with the device: 8 - - device memory allocations and deallocations: 16 -- 1 (kernels only) is the usual first step.3 (kernels + copies) is great for "why is it so slow?" - -```bash -NVCOMPILER_ACC_TIME=1 -``` -- Lightweight profiler -- prints a tidy end-of-run table with per-region and per-kernel times and bytes moved -- Do not use with CUDA profiler at the same time - -```bash -NVCOMPILER_ACC_DEBUG=1 -``` -- Spews everything the runtime sees: host/device addresses, mapping events, present-table look-ups, etc. -- Great for "partially present" or "pointer went missing" errors. -- [Doc for NVCOMPILER_ACC_DEBUG](https://docs.nvidia.com/hpc-sdk/archive/20.9/pdf/hpc209openacc_gs.pdf) - - Ctrl+F for `NVCOMPILER_ACC_DEBUG` - -### NVHPC OpenMP Options - -```bash -LIBOMPTARGET_PROFILE=run.json -``` -- Emits a Chrome-trace (JSON) timeline you can open in chrome://tracing or Speedscope -- Great lightweight profiler when Nsight is overkill. -- Granularity in µs via `LIBOMPTARGET_PROFILE_GRANULARITY` (default 500). - -```bash -LIBOMPTARGET_INFO= -``` -- Prints out different types of runtime information -- Human-readable log of data-mapping inserts/updates, kernel launches, copies, waits. -- Perfect first stop for "why is nothing copied?" -- Flags - - Print all data arguments upon entering an OpenMP device kernel: 0x01 - - Indicate when a mapped address already exists in the device mapping table: 0x02 - - Dump the contents of the device pointer map at kernel exit: 0x04 - - Indicate when an entry is changed in the device mapping table: 0x08 - - Print OpenMP kernel information from device plugins: 0x10 - - Indicate when data is copied to and from the device: 0x20 - -```bash -LIBOMPTARGET_DEBUG=1 -``` -- Developer-level trace (host-side) -- Much noisier than `INFO` -- Only works if the runtime was built with `-DOMPTARGET_DEBUG`. 
- -```bash -LIBOMPTARGET_JIT_OPT_LEVEL=-O{0,1,2,3} -``` -- This environment variable can be used to change the optimization pipeline used to optimize the embedded device code as part of the device JIT. -- The value corresponds to the `-O{0,1,2,3}` command line argument passed to clang. - -```bash -LIBOMPTARGET_JIT_SKIP_OPT=1 -``` -- This environment variable can be used to skip the optimization pipeline during JIT compilation. -- If set, the image will only be passed through the backend. -- The backend is invoked with the `LIBOMPTARGET_JIT_OPT_LEVEL` flag. - -## Compiler Documentation - -- [Cray & OpenMP Docs](https://cpe.ext.hpe.com/docs/24.11/cce/man7/intro_openmp.7.html#environment-variables) -- [Cray & OpenACC Docs](https://cpe.ext.hpe.com/docs/24.11/cce/man7/intro_openacc.7.html#environment-variables) -- [NVHPC & OpenACC Docs](https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html?highlight=NVCOMPILER_#environment-variables) -- [NVHPC & OpenMP Docs](https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html?highlight=NVCOMPILER_#id2) -- [LLVM & OpenMP Docs](https://openmp.llvm.org/design/Runtimes.html) - - NVHPC is built on top of LLVM -- [OpenMP Docs](https://www.openmp.org/spec-html/5.1/openmp.html) -- [OpenACC Docs](https://www.openacc.org/sites/default/files/inline-files/OpenACC.2.7.pdf) diff --git a/docs/documentation/gpuParallelization.md b/docs/documentation/gpuParallelization.md index 8579914485..c40d3c57d9 100644 --- a/docs/documentation/gpuParallelization.md +++ b/docs/documentation/gpuParallelization.md @@ -564,3 +564,160 @@ Uses FYPP eval directive using `$:`
------------------------------------------------------------------------------------------ + +# Debugging Tools and Tips for GPUs + +## Compiler agnostic tools + +## OpenMP tools +```bash +OMP_DISPLAY_ENV=true | false | verbose +``` +- Prints out the internal control values and environment variables at the beginning of the program if `true` or `verbose` +- `verbose` will also print out vendor-specific internal control values and environment variables + +```bash +OMP_TARGET_OFFLOAD = MANDATORY | DISABLED | DEFAULT +``` +- Quick way to turn off off-load (`DISABLED`) or make it abort if a GPU isn't found (`MANDATORY`) +- Great first test: does the problem disappear when you drop back to the CPU? + +```bash +OMP_THREAD_LIMIT= +``` +- Sets the maximum number of OpenMP threads to use in a contention group +- Might be useful in checking for issues with contention or race conditions + +```bash +OMP_DISPLAY_AFFINITY=TRUE +``` +- Will display affinity bindings for each OpenMP thread, containing hostname, process identifier, OS thread identifier, OpenMP thread identifier, and affinity binding. + +## Cray Compiler Tools + +### Cray General Options + +```bash +CRAY_ACC_DEBUG: 0 (off), 1, 2, 3 (very noisy) +``` + +- Dumps a time-stamped log line (`ACC: ...`) for every allocation, data transfer, kernel launch, wait, etc. Great first stop when "nothing seems to run on the GPU". +- Outputs on STDERR by default. Can be changed by setting `CRAY_ACC_DEBUG_FILE`. + - Recognizes `stderr`, `stdout`, and `process`. 
+ - `process` automatically generates a new file based on `pid` (each MPI process will have a different file) +- While this environment variable specifies ACC, it can be used for both OpenACC and OpenMP + +```bash +CRAY_ACC_FORCE_EARLY_INIT=1 +``` + +- Force full GPU initialization at program start so you can see start-up hangs immediately +- Default behavior without an environment variable is to defer initialization until first use +- Device initialization includes initializing the GPU vendor’s low-level device runtime library (e.g., libcuda for NVIDIA GPUs) and establishing all necessary software contexts for interacting with the device + +### Cray OpenACC Options + +```bash +CRAY_ACC_PRESENT_DUMP_SAVE_NAMES=1 +``` +- Will cause `acc_present_dump()` to output variable names and file locations in addition to variable mappings +- Add `acc_present_dump()` around hotspots to help find problems with data movements + - Helps more if adding `CRAY_ACC_DEBUG` environment variable + +## NVHPC Compiler Options + +### NVHPC General Options + +```bash +STATIC_RANDOM_SEED=1 +``` +- Forces the seed returned by `RANDOM_SEED` to be constant, so it generates the same sequence of random numbers +- Useful for testing issues with randomized data + +```bash +NVCOMPILER_TERM=option[,option] +``` +- `[no]debug`: Enables/disables just-in-time debugging (debugging invoked on error) +- `[no]trace`: Enables/disables stack traceback on error + +### NVHPC OpenACC Options + +```bash +NVCOMPILER_ACC_NOTIFY= +``` +- Assign the environment variable to a bitmask to print out information to stderr for the following + - kernel launches: 1 + - data transfers: 2 + - region entry/exit: 4 + - wait operation of synchronizations with the device: 8 + - device memory allocations and deallocations: 16 +- 1 (kernels only) is the usual first step. 3 (kernels + copies) is great for "why is it so slow?"
+ +```bash +NVCOMPILER_ACC_TIME=1 +``` +- Lightweight profiler +- prints a tidy end-of-run table with per-region and per-kernel times and bytes moved +- Do not use with CUDA profiler at the same time + +```bash +NVCOMPILER_ACC_DEBUG=1 +``` +- Spews everything the runtime sees: host/device addresses, mapping events, present-table look-ups, etc. +- Great for "partially present" or "pointer went missing" errors. +- [Doc for NVCOMPILER_ACC_DEBUG](https://docs.nvidia.com/hpc-sdk/archive/20.9/pdf/hpc209openacc_gs.pdf) + - Ctrl+F for `NVCOMPILER_ACC_DEBUG` + +### NVHPC OpenMP Options + +```bash +LIBOMPTARGET_PROFILE=run.json +``` +- Emits a Chrome-trace (JSON) timeline you can open in chrome://tracing or Speedscope +- Great lightweight profiler when Nsight is overkill. +- Granularity in µs via `LIBOMPTARGET_PROFILE_GRANULARITY` (default 500). + +```bash +LIBOMPTARGET_INFO= +``` +- Prints out different types of runtime information +- Human-readable log of data-mapping inserts/updates, kernel launches, copies, waits. +- Perfect first stop for "why is nothing copied?" +- Flags + - Print all data arguments upon entering an OpenMP device kernel: 0x01 + - Indicate when a mapped address already exists in the device mapping table: 0x02 + - Dump the contents of the device pointer map at kernel exit: 0x04 + - Indicate when an entry is changed in the device mapping table: 0x08 + - Print OpenMP kernel information from device plugins: 0x10 + - Indicate when data is copied to and from the device: 0x20 + +```bash +LIBOMPTARGET_DEBUG=1 +``` +- Developer-level trace (host-side) +- Much noisier than `INFO` +- Only works if the runtime was built with `-DOMPTARGET_DEBUG`. + +```bash +LIBOMPTARGET_JIT_OPT_LEVEL=-O{0,1,2,3} +``` +- This environment variable can be used to change the optimization pipeline used to optimize the embedded device code as part of the device JIT. +- The value corresponds to the `-O{0,1,2,3}` command line argument passed to clang. 
+ +```bash +LIBOMPTARGET_JIT_SKIP_OPT=1 +``` +- This environment variable can be used to skip the optimization pipeline during JIT compilation. +- If set, the image will only be passed through the backend. +- The backend is invoked with the `LIBOMPTARGET_JIT_OPT_LEVEL` flag. + +## Compiler Documentation + +- [Cray & OpenMP Docs](https://cpe.ext.hpe.com/docs/24.11/cce/man7/intro_openmp.7.html#environment-variables) +- [Cray & OpenACC Docs](https://cpe.ext.hpe.com/docs/24.11/cce/man7/intro_openacc.7.html#environment-variables) +- [NVHPC & OpenACC Docs](https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html?highlight=NVCOMPILER_#environment-variables) +- [NVHPC & OpenMP Docs](https://docs.nvidia.com/hpc-sdk/compilers/hpc-compilers-user-guide/index.html?highlight=NVCOMPILER_#id2) +- [LLVM & OpenMP Docs](https://openmp.llvm.org/design/Runtimes.html) + - NVHPC is built on top of LLVM +- [OpenMP Docs](https://www.openmp.org/spec-html/5.1/openmp.html) +- [OpenACC Docs](https://www.openacc.org/sites/default/files/inline-files/OpenACC.2.7.pdf) diff --git a/docs/documentation/readme.md b/docs/documentation/readme.md index 5ca45150e4..4ce5b6b6e1 100644 --- a/docs/documentation/readme.md +++ b/docs/documentation/readme.md @@ -3,15 +3,14 @@ ## User Documentation - [Getting Started](getting-started.md) -- [Testing MFC](testing.md) +- [Testing](testing.md) - [Case Files](case.md) - [Example Cases](examples.md) -- [Running MFC](running.md) +- [Running](running.md) - [Flow Visualization](visualization.md) - [Performance](expectedPerformance.md) -- [GPU Parallelization](gpuParallelization.md) -- [GPU Debugging](gpuDebugging.md) -- [MFC's Authors](authors.md) +- [GPU Offloading](gpuParallelization.md) +- [Authors](authors.md) - [References](references.md) ## Code/API Documentation
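
Reviewer note on the relocated GPU debugging section: the bitmask variables (`NVCOMPILER_ACC_NOTIFY`, `LIBOMPTARGET_INFO`) take the OR of the flag values listed in the docs. The following is a minimal sketch of how a first-pass debugging environment might be assembled from those documented values. The variable names and flag values come from the text moved by this patch; the executable name `./my_gpu_app` is a hypothetical placeholder and is left commented out.

```bash
#!/usr/bin/env bash
# Sketch only: flag values are taken from the documentation in this patch;
# the launch command at the bottom is a hypothetical placeholder.

# Build bitmasks by OR-ing the documented flag values.
acc_notify=$(( 1 | 2 ))            # kernel launches + data transfers
omptarget_info=$(( 0x01 | 0x20 ))  # kernel data arguments + host/device copies

export NVCOMPILER_ACC_NOTIFY="$acc_notify"
export LIBOMPTARGET_INFO="$omptarget_info"
export OMP_TARGET_OFFLOAD=MANDATORY  # abort if a GPU isn't found

echo "NVCOMPILER_ACC_NOTIFY=$NVCOMPILER_ACC_NOTIFY"
echo "LIBOMPTARGET_INFO=$LIBOMPTARGET_INFO"

# ./my_gpu_app   # hypothetical binary; replace with the real launch command
```

Computing the masks with shell arithmetic keeps the intent readable next to each flag, which may be easier to maintain than hard-coding the combined number.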