Conversation

Collaborator

@wilfonba wilfonba commented Sep 13, 2025

User description

Description

This PR adds scripts for automated testing of weak scaling, strong scaling, and absolute performance on OLCF Frontier. Details on how to run the tests are included in examples/scaling/FRONTIER_BENCH.md. An example of the benchmark output:


Weak Scaling - Memory: ~64GB, RDMA: F
 nodes  time_avg  efficiency  rel_perf
    16  1.040951    1.000000       1.0
   128  1.047134    0.994095       1.0
  1024  1.063446    0.978847       1.0
  8192  1.068788    0.973955       1.0

Weak Scaling - Memory: ~64GB, RDMA: T
 nodes  time_avg  efficiency  rel_perf
    16  0.959884    1.000000       1.0
   128  0.962885    0.996884       1.0
  1024  0.965518    0.994165       1.0
  8192  0.988542    0.971010       1.0

Strong Scaling - Memory: ~4096GB, RDMA: T
 nodes  time_avg   speedup  efficiency  rel_perf
     8  0.955644  1.000000    1.000000  1.000000
    64  0.149160  6.406820    0.800852  1.000003
   512  0.040367 23.674092    0.369908  0.999991
  4096  0.021175 45.130758    0.088146  1.000000

Strong Scaling - Memory: ~4096GB, RDMA: F
 nodes  time_avg   speedup  efficiency  rel_perf
     8  1.034303  1.000000    1.000000  1.000000
    64  0.171773  6.021347    0.752668  0.999998
   512  0.046694 22.150719    0.346105  0.999997
  4096  0.026555 38.948862    0.076072  1.000015

Grind Time - Single Device
 memory  grind_time  rel_perf
      8    1.309068       1.0
     16    1.258899       1.0
     32    1.144731       1.0
     64    1.144664       1.0
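
For context, the derived columns follow the standard definitions: weak-scaling efficiency is the base time divided by the time at n nodes, and strong-scaling efficiency is the speedup divided by the growth in node count. A minimal sketch consistent with the tables above (the function names are illustrative, not from the PR):

def weak_efficiency(t_base, t_n):
    # Ideal weak scaling keeps the time constant as nodes grow.
    return t_base / t_n

def strong_speedup(t_base, t_n):
    return t_base / t_n

def strong_efficiency(t_base, t_n, nodes_base, nodes_n):
    # Speedup normalized by the growth in node count.
    return strong_speedup(t_base, t_n) / (nodes_n / nodes_base)

# Strong scaling at 64 nodes (RDMA: T):
#   speedup    = 0.955644 / 0.149160 ≈ 6.4068
#   efficiency = 6.4068 / (64 / 8)   ≈ 0.8009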


PR Type

Tests, Enhancement


Description

  • Add automated benchmark suite for OLCF Frontier

  • Implement weak/strong scaling and grind time tests

  • Create analysis tools with reference data comparison

  • Provide comprehensive documentation and submission scripts


Diagram Walkthrough

flowchart LR
  A["Submit Scripts"] --> B["Case Generation"]
  B --> C["Pre-process"]
  C --> D["Simulation"]
  D --> E["Log Files"]
  E --> F["Analysis Script"]
  F --> G["Performance Report"]
  H["Reference Data"] --> F

File Walkthrough

Relevant files

Tests (5 files)
  analyze.py          Performance analysis and comparison tool          +197/-0
  submit_all.sh       Master submission script for all benchmarks       +27/-0
  submit_grind.sh     Single device performance test submission         +102/-0
  submit_strong.sh    Strong scaling benchmark submission script        +116/-0
  submit_weak.sh      Weak scaling benchmark submission script          +116/-0

Enhancement (1 file)
  case.py             Enhanced case configuration with scaling options  +117/-170

Miscellaneous (2 files)
  export.py           Simplified data export functionality              +0/-5
  submit.sh           Remove old submission script                      +0/-73

Configuration changes (1 file)
  build.sh            Build script with module loading                  +3/-1

Documentation (3 files)
  FRONTIER_BENCH.md   Comprehensive benchmark documentation             +92/-0
  README.md           Updated scaling test description                  +3/-4
  reference.metadata  Reference data collection metadata                +18/-0

Copilot AI review requested due to automatic review settings September 13, 2025 00:38
Contributor

Copilot AI left a comment


Pull Request Overview

This PR introduces automated benchmarking scripts for OLCF Frontier to test weak scaling, strong scaling, and absolute performance of the MFC simulation code. The system provides comprehensive testing capabilities with configurable problem sizes and node counts.

Key changes include:

  • Addition of automated job submission scripts for different scaling scenarios
  • Implementation of performance analysis tools with comparison to reference data
  • Consolidation of case configuration with improved parameter handling

Reviewed Changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 7 comments.

File                                   Description
examples/scaling/submit_weak.sh        Automated weak scaling job submission script with configurable nodes and memory
examples/scaling/submit_strong.sh      Automated strong scaling job submission script with configurable nodes and memory
examples/scaling/submit_grind.sh       Single device performance testing script for grind time measurements
examples/scaling/submit_all.sh         Master script to orchestrate all benchmark types
examples/scaling/submit.sh             Legacy script removal
examples/scaling/reference.metadata    Reference data collection metadata and environment information
examples/scaling/reference.dat         Baseline performance data for comparison
examples/scaling/export.py             Updated data export functionality with formatting improvements
examples/scaling/case.py               Enhanced case configuration with improved scaling logic and parameter handling
examples/scaling/build.sh              Updated build script with proper module loading
examples/scaling/analyze.py            New analysis script for processing benchmark results and comparing to reference data
examples/scaling/README.md             Updated documentation for new benchmark system
examples/scaling/FRONTIER_BENCH.md     Comprehensive documentation for running benchmarks on OLCF Frontier

@qodo-merge-pro
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Duplicate Code

The parsing logic for weak and strong scaling sections is nearly identical with only minor differences in field names. This code duplication makes maintenance harder and increases the risk of bugs when modifications are needed.

if header.startswith("Weak Scaling"):
    # Parse metadata from header
    mem_match = re.search(r"Memory: ~(\d+)GB", header)
    rdma_match = re.search(r"RDMA: (\w)", header)
    memory = int(mem_match.group(1)) if mem_match else None
    rdma = rdma_match.group(1) if rdma_match else None

    for _, row in df.iterrows():
        records.append({
            "scaling": "weak",
            "nodes": int(row["nodes"]),
            "memory": memory,
            "rdma": rdma,
            "phase": "sim",
            "time_avg": row["time_avg"],
            "efficiency": row["efficiency"]
        })

elif header.startswith("Strong Scaling"):
    mem_match = re.search(r"Memory: ~(\d+)GB", header)
    rdma_match = re.search(r"RDMA: (\w)", header)
    memory = int(mem_match.group(1)) if mem_match else None
    rdma = rdma_match.group(1) if rdma_match else None

    for _, row in df.iterrows():
        records.append({
            "scaling": "strong",
            "nodes": int(row["nodes"]),
            "memory": memory,
            "rdma": rdma,
            "phase": "sim",
            "time_avg": row["time_avg"],
            "speedup": row["speedup"],
            "efficiency": row["efficiency"]
        })
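
One way to remove the duplication is a shared helper that parses the common header fields and only branches on the columns present. A minimal sketch, assuming the surrounding analyze.py structure (parse_scaling_block is a hypothetical helper, not part of the PR):

import re

def parse_scaling_block(header, df, scaling):
    # Shared header parsing for both weak- and strong-scaling sections.
    mem_match = re.search(r"Memory: ~(\d+)GB", header)
    rdma_match = re.search(r"RDMA: (\w)", header)
    memory = int(mem_match.group(1)) if mem_match else None
    rdma = rdma_match.group(1) if rdma_match else None

    records = []
    for _, row in df.iterrows():
        record = {
            "scaling": scaling,
            "nodes": int(row["nodes"]),
            "memory": memory,
            "rdma": rdma,
            "phase": "sim",
            "time_avg": row["time_avg"],
            "efficiency": row["efficiency"],
        }
        if "speedup" in df.columns:  # only strong-scaling tables carry speedup
            record["speedup"] = row["speedup"]
        records.append(record)
    return records

The two branches then reduce to records.extend(parse_scaling_block(header, df, "weak")) and records.extend(parse_scaling_block(header, df, "strong")).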
Possible Issue

The closest_three_factors function may return None if no valid triplet is found, but the calling code doesn't handle this case. This could lead to unpacking None values and runtime errors.

def closest_three_factors(n):
    best_triplet = None
    min_range = float('inf')

    # Iterate over possible first factor a
    for a in range(1, int(n ** (1/3)) + 2):  # a should be around the cube root of n
        if n % a == 0:
            n1 = n // a  # Remaining part

            # Iterate over possible second factor b
            for b in range(a, int(math.sqrt(n1)) + 2):  # b should be around sqrt of n1
                if n1 % b == 0:
                    c = n1 // b  # Third factor

                    triplet_range = c - a  # Spread of the numbers
                    if triplet_range < min_range:
                        min_range = triplet_range
                        best_triplet = (a, b, c)

    return best_triplet
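
Note that for any n ≥ 1 the a = 1, b = 1 iteration always yields the trivial triplet (1, 1, n), so None can only arise for degenerate inputs; still, an explicit guard at the call site makes the failure mode obvious. A sketch (the surrounding variable names are assumptions):

triplet = closest_three_factors(nranks)
if triplet is None:
    raise RuntimeError(f"Cannot factor {nranks} ranks into three integers.")
px, py, pz = triplet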
Logic Error

In the nxyz_from_ncells_weak function, the check pairs a fixed rank cutoff (nranks > 64) with a fixed threshold of 4 partitions per direction. Since neither value is derived from the other, the logic may not scale properly for different node configurations and could cause unexpected failures.

if any(N < 4 for N in ND) and nranks > 64:
    raise RuntimeError(f"Cannot represent {nranks} ranks with at least 4 partitions in each direction.")
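
The fixed cutoff of 64 is presumably 4³, the smallest rank count for which four partitions in every direction is achievable. Deriving one value from the other keeps them consistent if either changes; a sketch (the constant names are illustrative, not from the PR):

MIN_PARTS = 4                 # minimum partitions required per direction
RANK_CUTOFF = MIN_PARTS ** 3  # 64; beyond this many ranks, MIN_PARTS per direction is enforced

if any(N < MIN_PARTS for N in ND) and nranks > RANK_CUTOFF:
    raise RuntimeError(f"Cannot represent {nranks} ranks with at least "
                       f"{MIN_PARTS} partitions in each direction.")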

@qodo-merge-pro
Contributor

qodo-merge-pro bot commented Sep 13, 2025

PR Code Suggestions ✨

No code suggestions found for the PR.

@codecov

codecov bot commented Sep 13, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 40.91%. Comparing base (1bf4e9a) to head (7ed700d).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #998   +/-   ##
=======================================
  Coverage   40.91%   40.91%           
=======================================
  Files          70       70           
  Lines       20270    20270           
  Branches     2520     2520           
=======================================
  Hits         8293     8293           
  Misses      10439    10439           
  Partials     1538     1538           

☔ View full report in Codecov by Sentry.

@sbryngelson sbryngelson merged commit fbdaecf into MFlowCode:master Sep 15, 2025
50 of 55 checks passed
@sbryngelson sbryngelson deleted the OLCFBenchmark branch September 15, 2025 18:12