Conversation

Collaborator

@wilfonba wilfonba commented Sep 13, 2025

User description

Description

This PR adds scripts for automated testing of weak scaling, strong scaling, and absolute performance on OLCF Frontier. Details on how to run the tests are included in examples/scaling/FRONTIER_BENCH.md. An example of the benchmark output:


Weak Scaling - Memory: ~64GB, RDMA: F
 nodes  time_avg  efficiency  rel_perf
    16  1.040951    1.000000       1.0
   128  1.047134    0.994095       1.0
  1024  1.063446    0.978847       1.0
  8192  1.068788    0.973955       1.0

Weak Scaling - Memory: ~64GB, RDMA: T
 nodes  time_avg  efficiency  rel_perf
    16  0.959884    1.000000       1.0
   128  0.962885    0.996884       1.0
  1024  0.965518    0.994165       1.0
  8192  0.988542    0.971010       1.0

Strong Scaling - Memory: ~4096GB, RDMA: T
 nodes  time_avg   speedup  efficiency  rel_perf
     8  0.955644  1.000000    1.000000  1.000000
    64  0.149160  6.406820    0.800852  1.000003
   512  0.040367 23.674092    0.369908  0.999991
  4096  0.021175 45.130758    0.088146  1.000000

Strong Scaling - Memory: ~4096GB, RDMA: F
 nodes  time_avg   speedup  efficiency  rel_perf
     8  1.034303  1.000000    1.000000  1.000000
    64  0.171773  6.021347    0.752668  0.999998
   512  0.046694 22.150719    0.346105  0.999997
  4096  0.026555 38.948862    0.076072  1.000015

Grind Time - Single Device
 memory  grind_time  rel_perf
      8    1.309068       1.0
     16    1.258899       1.0
     32    1.144731       1.0
     64    1.144664       1.0
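
For context, the derived columns follow the standard definitions: weak-scaling efficiency is the base time divided by the time at n nodes, and strong-scaling efficiency is the speedup divided by the growth in node count. A minimal sketch consistent with the tables above (the function names are illustrative, not from the PR):

def weak_efficiency(t_base, t_n):
    # Ideal weak scaling keeps the time constant as nodes grow.
    return t_base / t_n

def strong_speedup(t_base, t_n):
    return t_base / t_n

def strong_efficiency(t_base, t_n, nodes_base, nodes_n):
    # Speedup normalized by the growth in node count.
    return strong_speedup(t_base, t_n) / (nodes_n / nodes_base)

# Strong scaling at 64 nodes (RDMA: T):
#   speedup    = 0.955644 / 0.149160 ≈ 6.4068
#   efficiency = 6.4068 / (64 / 8)   ≈ 0.8009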


PR Type

Tests, Enhancement


Description

  • Add automated benchmark suite for OLCF Frontier

  • Implement weak/strong scaling and grind time tests

  • Create analysis tools with reference data comparison

  • Provide comprehensive documentation and submission scripts


Diagram Walkthrough

flowchart LR
  A["Submit Scripts"] --> B["Case Generation"]
  B --> C["Pre-process"]
  C --> D["Simulation"]
  D --> E["Log Files"]
  E --> F["Analysis Script"]
  F --> G["Performance Report"]
  H["Reference Data"] --> F

File Walkthrough

Relevant files

Tests (5 files)
  analyze.py          Performance analysis and comparison tool          +197/-0
  submit_all.sh       Master submission script for all benchmarks       +27/-0
  submit_grind.sh     Single device performance test submission         +102/-0
  submit_strong.sh    Strong scaling benchmark submission script        +116/-0
  submit_weak.sh      Weak scaling benchmark submission script          +116/-0

Enhancement (1 file)
  case.py             Enhanced case configuration with scaling options  +117/-170

Miscellaneous (2 files)
  export.py           Simplified data export functionality              +0/-5
  submit.sh           Remove old submission script                      +0/-73

Configuration changes (1 file)
  build.sh            Build script with module loading                  +3/-1

Documentation (3 files)
  FRONTIER_BENCH.md   Comprehensive benchmark documentation             +92/-0
  README.md           Updated scaling test description                  +3/-4
  reference.metadata  Reference data collection metadata                +18/-0

Copilot AI review requested due to automatic review settings September 13, 2025 00:38
Contributor

Copilot AI left a comment


Pull Request Overview

This PR introduces automated benchmarking scripts for OLCF Frontier to test weak scaling, strong scaling, and absolute performance of the MFC simulation code. The system provides comprehensive testing capabilities with configurable problem sizes and node counts.

Key changes include:

  • Addition of automated job submission scripts for different scaling scenarios
  • Implementation of performance analysis tools with comparison to reference data
  • Consolidation of case configuration with improved parameter handling

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.

File                                   Description
examples/scaling/submit_weak.sh        Automated weak scaling job submission script with configurable nodes and memory
examples/scaling/submit_strong.sh      Automated strong scaling job submission script with configurable nodes and memory
examples/scaling/submit_grind.sh       Single device performance testing script for grind time measurements
examples/scaling/submit_all.sh         Master script to orchestrate all benchmark types
examples/scaling/submit.sh             Legacy script removal
examples/scaling/reference.metadata    Reference data collection metadata and environment information
examples/scaling/reference.dat         Baseline performance data for comparison
examples/scaling/export.py             Updated data export functionality with formatting improvements
examples/scaling/case.py               Enhanced case configuration with improved scaling logic and parameter handling
examples/scaling/build.sh              Updated build script with proper module loading
examples/scaling/analyze.py            New analysis script for processing benchmark results and comparing to reference data
examples/scaling/README.md             Updated documentation for new benchmark system
examples/scaling/FRONTIER_BENCH.md     Comprehensive documentation for running benchmarks on OLCF Frontier

@qodo-merge-pro
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Duplicate Code

The parsing logic for weak and strong scaling sections is nearly identical with only minor differences in field names. This code duplication makes maintenance harder and increases the risk of bugs when modifications are needed.

if header.startswith("Weak Scaling"):
    # Parse metadata from header
    mem_match = re.search(r"Memory: ~(\d+)GB", header)
    rdma_match = re.search(r"RDMA: (\w)", header)
    memory = int(mem_match.group(1)) if mem_match else None
    rdma = rdma_match.group(1) if rdma_match else None

    for _, row in df.iterrows():
        records.append({
            "scaling": "weak",
            "nodes": int(row["nodes"]),
            "memory": memory,
            "rdma": rdma,
            "phase": "sim",
            "time_avg": row["time_avg"],
            "efficiency": row["efficiency"]
        })

elif header.startswith("Strong Scaling"):
    mem_match = re.search(r"Memory: ~(\d+)GB", header)
    rdma_match = re.search(r"RDMA: (\w)", header)
    memory = int(mem_match.group(1)) if mem_match else None
    rdma = rdma_match.group(1) if rdma_match else None

    for _, row in df.iterrows():
        records.append({
            "scaling": "strong",
            "nodes": int(row["nodes"]),
            "memory": memory,
            "rdma": rdma,
            "phase": "sim",
            "time_avg": row["time_avg"],
            "speedup": row["speedup"],
            "efficiency": row["efficiency"]
        })
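
One way to remove the duplication is a shared helper that parses the common header fields and only branches on the columns present. A minimal sketch, assuming the surrounding analyze.py structure (parse_scaling_block is a hypothetical helper, not part of the PR):

import re

def parse_scaling_block(header, df, scaling):
    # Shared header parsing for both weak- and strong-scaling sections.
    mem_match = re.search(r"Memory: ~(\d+)GB", header)
    rdma_match = re.search(r"RDMA: (\w)", header)
    memory = int(mem_match.group(1)) if mem_match else None
    rdma = rdma_match.group(1) if rdma_match else None

    records = []
    for _, row in df.iterrows():
        record = {
            "scaling": scaling,
            "nodes": int(row["nodes"]),
            "memory": memory,
            "rdma": rdma,
            "phase": "sim",
            "time_avg": row["time_avg"],
            "efficiency": row["efficiency"],
        }
        if "speedup" in df.columns:  # only strong-scaling tables carry speedup
            record["speedup"] = row["speedup"]
        records.append(record)
    return records

The two branches then reduce to records.extend(parse_scaling_block(header, df, "weak")) and records.extend(parse_scaling_block(header, df, "strong")).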
Possible Issue

The closest_three_factors function may return None if no valid triplet is found, but the calling code doesn't handle this case. This could lead to unpacking None values and runtime errors.

def closest_three_factors(n):
    best_triplet = None
    min_range = float('inf')

    # Iterate over possible first factor a
    for a in range(1, int(n ** (1/3)) + 2):  # a should be around the cube root of n
        if n % a == 0:
            n1 = n // a  # Remaining part

            # Iterate over possible second factor b
            for b in range(a, int(math.sqrt(n1)) + 2):  # b should be around sqrt of n1
                if n1 % b == 0:
                    c = n1 // b  # Third factor

                    triplet_range = c - a  # Spread of the numbers
                    if triplet_range < min_range:
                        min_range = triplet_range
                        best_triplet = (a, b, c)

    return best_triplet
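
Note that for any n ≥ 1 the a = 1, b = 1 iteration always yields the trivial triplet (1, 1, n), so None can only arise for degenerate inputs; still, an explicit guard at the call site makes the failure mode obvious. A sketch (the surrounding variable names are assumptions):

triplet = closest_three_factors(nranks)
if triplet is None:
    raise RuntimeError(f"Cannot factor {nranks} ranks into three integers.")
px, py, pz = triplet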
Logic Error

In the nxyz_from_ncells_weak function, the check pairs a fixed rank cutoff (nranks > 64) with a fixed threshold of 4 partitions per direction. Since neither value is derived from the other, the logic may not scale properly for different node configurations and could cause unexpected failures.

if any(N < 4 for N in ND) and nranks > 64:
    raise RuntimeError(f"Cannot represent {nranks} ranks with at least 4 partitions in each direction.")
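
The fixed cutoff of 64 is presumably 4³, the smallest rank count for which four partitions in every direction is achievable. Deriving one value from the other keeps them consistent if either changes; a sketch (the constant names are illustrative, not from the PR):

MIN_PARTS = 4                 # minimum partitions required per direction
RANK_CUTOFF = MIN_PARTS ** 3  # 64; beyond this many ranks, MIN_PARTS per direction is enforced

if any(N < MIN_PARTS for N in ND) and nranks > RANK_CUTOFF:
    raise RuntimeError(f"Cannot represent {nranks} ranks with at least "
                       f"{MIN_PARTS} partitions in each direction.")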

@qodo-merge-pro
Contributor

qodo-merge-pro bot commented Sep 13, 2025

PR Code Suggestions ✨

No code suggestions found for the PR.

@codecov

codecov bot commented Sep 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.91%. Comparing base (1bf4e9a) to head (7ed700d).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #998   +/-   ##
=======================================
  Coverage   40.91%   40.91%           
=======================================
  Files          70       70           
  Lines       20270    20270           
  Branches     2520     2520           
=======================================
  Hits         8293     8293           
  Misses      10439    10439           
  Partials     1538     1538           

☔ View full report in Codecov by Sentry.

@sbryngelson sbryngelson merged commit fbdaecf into MFlowCode:master Sep 15, 2025
50 of 55 checks passed
@sbryngelson sbryngelson deleted the OLCFBenchmark branch September 15, 2025 18:12