
@sbryngelson (Member) commented Aug 3, 2025

User description

LAPACK fixups: adds LAPACK package installs to the CI runners and fixes up some docs.


PR Type

Enhancement, Documentation


Description

  • Add LAPACK dependencies to CI workflows and documentation

  • Consolidate GPU debugging documentation into main file

  • Update README and documentation formatting

  • Fix minor typos and improve documentation structure


Diagram Walkthrough

flowchart LR
  CI["CI Workflows"] -- "add dependencies" --> LAPACK["LAPACK Libraries"]
  DOCS["Documentation"] -- "consolidate" --> GPU["GPU Debugging Guide"]
  README["README"] -- "update" --> DEPS["Dependencies"]
  STRUCTURE["Doc Structure"] -- "improve" --> FORMAT["Formatting"]

File Walkthrough

Relevant files

| Category | File | Change | Lines |
|---|---|---|---|
| Dependencies | cleanliness.yml | Add LAPACK dependencies to Ubuntu setup | +2/-2 |
| Dependencies | coverage.yml | Add LAPACK dependencies to Ubuntu setup | +1/-1 |
| Dependencies | test.yml | Add LAPACK to macOS and Ubuntu setups | +3/-2 |
| Documentation | README.md | Update dependencies and test count description | +2/-2 |
| Documentation | getting-started.md | Add LAPACK deps and fix formatting | +4/-9 |
| Documentation | gpuDebugging.md | Remove standalone GPU debugging file | +0/-156 |
| Documentation | gpuParallelization.md | Consolidate GPU debugging content into main file | +157/-0 |
| Documentation | papers.md | Add new MFC 5.0 paper citation | +14/-0 |
| Documentation | readme.md | Update documentation index structure and titles | +4/-5 |

@sbryngelson marked this pull request as ready for review August 4, 2025 00:57
Copilot AI review requested due to automatic review settings August 4, 2025 00:57
@sbryngelson requested a review from a team as a code owner August 4, 2025 00:57
@sbryngelson merged commit 9fbc566 into MFlowCode:master Aug 4, 2025
27 of 29 checks passed

Copilot (Contributor) left a comment

Pull Request Overview

This PR focuses on LAPACK-related improvements and documentation reorganization. The changes add LAPACK library dependencies to build systems and consolidate GPU-related documentation.

Key changes:

  • Adds LAPACK package installation to CI workflows and documentation
  • Consolidates GPU debugging documentation into the main GPU parallelization file
  • Updates documentation navigation structure with cleaner naming

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 1 comment.

| File | Description |
|---|---|
| docs/documentation/readme.md | Simplifies navigation link names by removing the "MFC" prefix |
| docs/documentation/papers.md | Adds new MFC 5.0 paper citation and BibTeX entry |
| docs/documentation/gpuParallelization.md | Incorporates GPU debugging content from separate file |
| docs/documentation/gpuDebugging.md | File deleted; content moved to gpuParallelization.md |
| docs/documentation/getting-started.md | Adds LAPACK dependencies to Ubuntu and macOS installation instructions |
| README.md | Adds LAPACK to macOS installation command and updates test count description |
| .github/workflows/test.yml | Adds LAPACK packages to both Ubuntu and macOS CI environments |
| .github/workflows/coverage.yml | Adds LAPACK packages to Ubuntu CI environment |
| .github/workflows/cleanliness.yml | Adds LAPACK packages to Ubuntu CI environment |
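The workflow changes add BLAS/LAPACK development packages to the Ubuntu runners (the review comment that follows shows them as libblas-dev and liblapack-dev). As a minimal sketch that is not part of this PR, the hypothetical helper `missing_packages` reports which of those packages are absent on a Debian/Ubuntu machine, assuming `dpkg-query` is available:

```shell
# Hypothetical helper: print the given Debian/Ubuntu packages that are not installed.
# Returns 0 if all are installed, 1 if any are missing, 2 if dpkg-query is unavailable.
missing_packages() {
  if ! command -v dpkg-query >/dev/null 2>&1; then
    echo "dpkg-query not found; this sketch assumes a Debian/Ubuntu system" >&2
    return 2
  fi
  local missing="" pkg status
  for pkg in "$@"; do
    # ${Status} is a dpkg-query format field, e.g. "install ok installed".
    status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null) || status=""
    case "$status" in
      "install ok installed") ;;
      *) missing="$missing $pkg" ;;
    esac
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  return 0
}

# Usage:
# missing_packages libblas-dev liblapack-dev || sudo apt install -y libblas-dev liblapack-dev
```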

sudo apt update -y
sudo apt install -y tar wget make cmake gcc g++ python3 python3-dev "openmpi-*" libopenmpi-dev
sudo apt install -y tar wget make cmake gcc g++ python3 python3-dev "openmpi-*" libopenmpi-dev libblas-dev liblapack-dev
Copilot AI commented Aug 4, 2025

There are trailing spaces at the end of this line that should be removed.

Suggested change
sudo apt install -y tar wget make cmake gcc g++ python3 python3-dev "openmpi-*" libopenmpi-dev libblas-dev liblapack-dev
sudo apt install -y tar wget make cmake gcc g++ python3 python3-dev "openmpi-*" libopenmpi-dev libblas-dev liblapack-dev
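To catch this class of issue locally before pushing, a minimal grep-based sketch can list every line that ends in a space or tab. `find_trailing_whitespace` is a hypothetical helper, not the check MFC's CI actually runs:

```shell
# Hypothetical helper: list lines that end in a space or tab.
# Prints file:line:content for each hit; returns 0 if any were found, 1 if none.
find_trailing_whitespace() {
  grep -HnE '[[:blank:]]+$' -- "$@"
}

# Example: check the workflow files before committing.
# find_trailing_whitespace .github/workflows/*.yml
```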


qodo-merge-pro bot commented Aug 4, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Duplicate Content

The GPU debugging content from the deleted gpuDebugging.md file has been appended to gpuParallelization.md, creating a very long file that mixes two different topics. This could make the documentation harder to navigate and maintain.

# Debugging Tools and Tips for GPUs

## Compiler-agnostic tools

## OpenMP tools
```bash
OMP_DISPLAY_ENV=true | false | verbose
```
  • Prints out the internal control values and environment variables at the beginning of the program if true or verbose
  • verbose will also print out vendor-specific internal control values and environment variables
OMP_TARGET_OFFLOAD=MANDATORY | DISABLED | DEFAULT
  • Quick way to turn off offloading (DISABLED) or to make the program abort if a GPU isn't found (MANDATORY)
  • Great first test: does the problem disappear when you drop back to the CPU?
OMP_THREAD_LIMIT=<positive_integer>
  • Sets the maximum number of OpenMP threads to use in a contention group
  • Might be useful in checking for issues with contention or race conditions
OMP_DISPLAY_AFFINITY=TRUE
  • Displays the affinity binding of each OpenMP thread, showing the hostname, process identifier, OS thread identifier, OpenMP thread identifier, and affinity binding.
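The OMP_TARGET_OFFLOAD test above (does the problem disappear on the CPU?) can be scripted. A minimal sketch, assuming a POSIX-style shell; `compare_offload_modes`, `./my_app`, and `input.dat` are hypothetical names:

```shell
# Hypothetical helper: run the same command with offload disabled, then mandatory.
#   DISABLED  -> target regions run on the host (CPU fallback)
#   MANDATORY -> the program aborts if no GPU is available
compare_offload_modes() {
  local mode
  for mode in DISABLED MANDATORY; do
    echo "=== OMP_TARGET_OFFLOAD=$mode ==="
    # OMP_DISPLAY_ENV=true makes the OpenMP runtime print its settings at startup.
    OMP_TARGET_OFFLOAD="$mode" OMP_DISPLAY_ENV=true "$@" \
      || echo "command exited with status $? under OMP_TARGET_OFFLOAD=$mode"
  done
}

# Usage (./my_app and input.dat are placeholders):
# compare_offload_modes ./my_app input.dat
```

If the failure appears only under MANDATORY, the offloaded path is the place to look.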

## Cray Compiler Tools

### Cray General Options

CRAY_ACC_DEBUG: 0 (off), 1, 2, 3 (very noisy)
  • Dumps a time-stamped log line (ACC: ...) for every allocation, data transfer, kernel launch, wait, etc. A great first stop when "nothing seems to run on the GPU".
  • Writes to stderr by default. This can be changed by setting CRAY_ACC_DEBUG_FILE.
    • Recognizes stderr, stdout, and process.
    • process automatically generates a new file based on the PID, so each MPI process writes to a different file.
  • Although the variable name says ACC, it applies to both OpenACC and OpenMP.
CRAY_ACC_FORCE_EARLY_INIT=1
  • Forces full GPU initialization at program start so that start-up hangs show up immediately
  • Without this variable, initialization is deferred until first use
  • Device initialization includes initializing the GPU vendor’s low-level device runtime library (e.g., libcuda for NVIDIA GPUs) and establishing all necessary software contexts for interacting with the device
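Combining the Cray options above for an MPI run might look like the following sketch. The debug level and the commented launch line (`srun`, `./my_app`) are illustrative assumptions, not settings recommended by MFC:

```shell
# Cray (CCE) runtime debugging, using the variables described above.
export CRAY_ACC_DEBUG=2              # 0 = off, 3 = very noisy
export CRAY_ACC_DEBUG_FILE=process   # one log file per process, i.e. per MPI rank
export CRAY_ACC_FORCE_EARLY_INIT=1   # initialize the GPU at startup to expose start-up hangs

# Then launch as usual, for example (placeholder launcher and binary):
# srun -n 4 ./my_app
```

Level 2 is an arbitrary middle ground here; drop to 1 if the per-rank logs grow too large.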

### Cray OpenACC Options

CRAY_ACC_PRESENT_DUMP_SAVE_NAMES=1
  • Causes acc_present_dump() to output variable names and file locations in addition to variable mappings
  • Add acc_present_dump() calls around hotspots to help find problems with data movement
    • More helpful when combined with the CRAY_ACC_DEBUG environment variable

## NVHPC Compiler Options

### NVHPC General Options

STATIC_RANDOM_SEED=1
  • Forces the seed returned by RANDOM_SEED to be constant, so it generates the same sequence of random numbers
  • Useful for testing issues with randomized data
NVCOMPILER_TERM=option[,option]
  • [no]debug: Enables/disables just-in-time debugging (debugging invoked on error)
  • [no]trace: Enables/disables stack traceback on error

### NVHPC OpenACC Options

NVCOMPILER_ACC_NOTIFY=<bitmask>
  • Set to a bitmask to print the following information to stderr:
    • kernel launches: 1
    • data transfers: 2
    • region entry/exit: 4
    • wait operations or synchronizations with the device: 8
    • device memory allocations and deallocations: 16
  • 1 (kernels only) is the usual first step. 3 (kernels + copies) is great for "why is it so slow?"
NVCOMPILER_ACC_TIME=1
  • Lightweight profiler
  • Prints a tidy end-of-run table with per-region and per-kernel times and bytes moved
  • Do not use at the same time as the CUDA profiler
NVCOMPILER_ACC_DEBUG=1
  • Spews everything the runtime sees: host/device addresses, mapping events, present-table look-ups, etc.
  • Great for "partially present" or "pointer went missing" errors.
  • Doc for NVCOMPILER_ACC_DEBUG
    • Ctrl+F for NVCOMPILER_ACC_DEBUG
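Because NVCOMPILER_ACC_NOTIFY is a bitmask, settings combine by a bitwise OR of the values listed above. A minimal sketch in shell arithmetic; the per-bit variable names are hypothetical labels, not names defined by NVHPC:

```shell
# Bit values for NVCOMPILER_ACC_NOTIFY, taken from the list above.
# The variable names are illustrative labels only.
KERNEL_LAUNCHES=1
DATA_TRANSFERS=2
REGION_ENTRY_EXIT=4
WAIT_SYNC=8
DEVICE_ALLOCS=16

# Kernels + data transfers (3): the "why is it so slow?" setting.
export NVCOMPILER_ACC_NOTIFY=$(( KERNEL_LAUNCHES | DATA_TRANSFERS ))
echo "NVCOMPILER_ACC_NOTIFY=$NVCOMPILER_ACC_NOTIFY"   # prints NVCOMPILER_ACC_NOTIFY=3

# Everything (1|2|4|8|16 = 31); best kept to short runs since the output is large.
# export NVCOMPILER_ACC_NOTIFY=$(( KERNEL_LAUNCHES | DATA_TRANSFERS | REGION_ENTRY_EXIT | WAIT_SYNC | DEVICE_ALLOCS ))
```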

### NVHPC OpenMP Options

LIBOMPTARGET_PROFILE=run.json
  • Emits a Chrome-trace (JSON) timeline you can open in chrome://tracing or Speedscope
  • Great lightweight profiler when Nsight is overkill.
  • Granularity in µs via LIBOMPTARGET_PROFILE_GRANULARITY (default 500).
LIBOMPTARGET_INFO=<bitmask>
  • Prints out different types of runtime information
  • Human-readable log of data-mapping inserts/updates, kernel launches, copies, waits.
  • Perfect first stop for "why is nothing copied?"
  • Flags
    • Print all data arguments upon entering an OpenMP device kernel: 0x01
    • Indicate when a mapped address already exists in the device mapping table: 0x02
    • Dump the contents of the device pointer map at kernel exit: 0x04
    • Indicate when an entry is changed in the device mapping table: 0x08
    • Print OpenMP kernel information from device plugins: 0x10
    • Indicate when data is copied to and from the device: 0x20
LIBOMPTARGET_DEBUG=1
  • Developer-level trace (host-side)
  • Much noisier than INFO
  • Only works if the runtime was built with -DOMPTARGET_DEBUG.
LIBOMPTARGET_JIT_OPT_LEVEL=-O{0,1,2,3}
  • This environment variable can be used to change the optimization pipeline used to optimize the embedded device code as part of the device JIT.
  • The value corresponds to the -O{0,1,2,3} command line argument passed to clang.
LIBOMPTARGET_JIT_SKIP_OPT=1
  • This environment variable can be used to skip the optimization pipeline during JIT compilation.
  • If set, the image will only be passed through the backend.
  • The backend is invoked with the LIBOMPTARGET_JIT_OPT_LEVEL flag.
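The LIBOMPTARGET_INFO flags listed above combine the same way. A sketch assuming the runtime accepts the combined value as a decimal integer; the INFO_* names are hypothetical labels for those flags:

```shell
# Flag values for LIBOMPTARGET_INFO, taken from the list above.
# The INFO_* names are illustrative labels only.
INFO_KERNEL_ARGS=0x01       # data arguments on kernel entry
INFO_MAPPING_EXISTS=0x02    # mapped address already present
INFO_MAP_DUMP_EXIT=0x04     # device pointer map at kernel exit
INFO_MAPPING_CHANGED=0x08   # mapping-table entry changed
INFO_KERNEL_INFO=0x10       # kernel information from device plugins
INFO_DATA_COPIES=0x20       # copies to and from the device

# "Why is nothing copied?": kernel arguments plus host/device copies.
export LIBOMPTARGET_INFO=$(( $INFO_KERNEL_ARGS | $INFO_DATA_COPIES ))
printf 'LIBOMPTARGET_INFO=%d (0x%02x)\n' "$LIBOMPTARGET_INFO" "$LIBOMPTARGET_INFO"
# prints LIBOMPTARGET_INFO=33 (0x21)
```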

## Compiler Documentation


[Formatting Issue](https://github.com/MFlowCode/MFC/pull/974/files#diff-f0ed60cdca489a7d35d2ca0e370c91a362590e65e4499a4a1e852635058ee11bR17-R41)

HTML details tags have been removed inconsistently, leaving orphaned summary tags and potentially breaking the document structure. The removal appears incomplete.

````markdown
  <summary><h2>*nix</h2></summary>

- **On supported clusters:** Load environment modules

```shell
. ./mfc.sh load
```

- **Via Aptitude:**

```shell
sudo apt update
sudo apt upgrade
sudo apt install tar wget make cmake gcc g++      \
                 python3 python3-dev python3-venv \
                 openmpi-bin libopenmpi-dev       \
                 libhdf5-dev libfftw3-dev         \
                 libblas-dev liblapack-dev
```

If you wish to build MFC using [NVIDIA's NVHPC SDK](https://developer.nvidia.com/hpc-sdk),
first follow the instructions [here](https://developer.nvidia.com/nvidia-hpc-sdk-downloads).

  <summary><h2>Windows</h2></summary>
````


qodo-merge-pro bot commented Aug 4, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
General
Remove orphaned summary tag

The Windows section has a `<summary>` tag without an opening `<details>` tag, creating malformed HTML. This will cause rendering issues in the documentation.

docs/documentation/getting-started.md [37-41]

 If you wish to build MFC using [NVIDIA's NVHPC SDK](https://developer.nvidia.com/hpc-sdk),
 first follow the instructions [here](https://developer.nvidia.com/nvidia-hpc-sdk-downloads).
 
 
-  <summary><h2>Windows</h2></summary>
+## Windows
Suggestion importance[1-10]: 7


Why: The suggestion correctly identifies an orphaned `<summary>` tag in the new code, which results from the PR removing its corresponding `<details>` tag, and proposes a valid fix to improve the HTML structure.

Medium
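One rough way to catch this kind of problem in other docs is to compare how many `<details>` and `<summary>` opening tags a file contains. This is a hypothetical heuristic sketch, not a tool used in the PR; it counts tags rather than parsing HTML, so misordered or nested pairs can slip through:

```shell
# Hypothetical heuristic: compare counts of <details> and <summary> opening tags.
# A <summary> left behind after its <details> was removed shows up as a mismatch.
check_details_summary() {
  local file="$1" details summary
  details=$( { grep -o '<details[ >]' "$file" || true; } | wc -l )
  summary=$( { grep -o '<summary[ >]' "$file" || true; } | wc -l )
  # Arithmetic expansion strips the padding some wc implementations add.
  if [ $(( details )) -ne $(( summary )) ]; then
    echo "$file: $(( details )) <details> vs $(( summary )) <summary>"
    return 1
  fi
  return 0
}

# Usage:
# check_details_summary docs/documentation/getting-started.md
```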

@sbryngelson sbryngelson deleted the lapack branch August 10, 2025 03:55