Update GPU configuration for graupel #1104

Merged
iomaganaris merged 2 commits into main from graupel_gpu_opt
Mar 12, 2026

@iomaganaris
Collaborator

Set gpu_maxnreg to a lower value. This increases GPU occupancy (tested on GH200). Even if the lower limit causes excessive register spilling, that should not be a problem for graupel, because I expect the spilling to happen only when the mask is enabled. When the mask is disabled (most of the time), we get the benefit of higher occupancy.
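
To illustrate why a lower register cap can raise occupancy, here is a rough sketch of a register-limited occupancy estimate. It assumes the per-SM limits of compute capability 9.0 (65536 32-bit registers and 2048 resident threads per SM) and ignores register allocation granularity, block size, and shared memory. The function name and the cap values in the loop are hypothetical and are not the values used in this PR.

```python
# Simplified, register-only occupancy estimate (illustrative sketch).
# Assumed per-SM limits for compute capability 9.0 (Hopper):
REGISTERS_PER_SM = 65536    # 32-bit registers available on one SM
MAX_THREADS_PER_SM = 2048   # maximum resident threads on one SM


def register_limited_occupancy(regs_per_thread: int) -> float:
    """Fraction of an SM's thread slots usable when registers are the only limit."""
    if regs_per_thread <= 0:
        raise ValueError("regs_per_thread must be positive")
    # Each resident thread needs its own registers, so the register file
    # bounds how many threads can be resident at once.
    resident_threads = min(MAX_THREADS_PER_SM, REGISTERS_PER_SM // regs_per_thread)
    return resident_threads / MAX_THREADS_PER_SM


if __name__ == "__main__":
    # Hypothetical register caps, chosen only to show the trend.
    for cap in (255, 128, 64, 32):
        print(f"cap={cap:3d} regs/thread -> occupancy <= {register_limited_occupancy(cap):.0%}")
```

In this simplified model, halving the per-thread register cap doubles the upper bound on occupancy until the thread limit is reached. Whether that pays off in practice depends on how much spilling the cap introduces, which is why the change was measured.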

@iomaganaris iomaganaris requested review from edopao and havogt March 11, 2026 14:03
@edopao
Contributor

edopao commented Mar 12, 2026

Can you please measure (approximately) the speedup provided by this change on top of main? I would like to know if its benefits are visible without self-copy removal, to decide whether this change should go separately or together with self-copy removal.

@github-actions

Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run default
  • cscs-ci run distributed

Optional Tests

To run benchmarks you can use:

  • cscs-ci run benchmark-bencher

To run tests and benchmarks with the DaCe backend you can use:

  • cscs-ci run dace

To run test levels ignored by the default test suite (mostly simple datatest for static fields computations) you can use:

  • cscs-ci run extra

For more detailed information please look at CI in the EXCLAIM universe.

@iomaganaris
Collaborator Author

R02B06 main

For 100 iterations it took 0.6509003639221191 seconds!
For 100 iterations it took 0.6452944278717041 seconds!
For 100 iterations it took 0.6502292156219482 seconds!
For 100 iterations it took 0.6481423377990723 seconds!

R02B08 main

For 100 iterations it took 8.613034963607788 seconds!
For 100 iterations it took 8.910228967666626 seconds!
For 100 iterations it took 8.931719303131104 seconds!
For 100 iterations it took 8.839605808258057 seconds!

vs
R02B06 graupel_gpu_opt

For 100 iterations it took 0.6054491996765137 seconds!
For 100 iterations it took 0.6052908897399902 seconds!
For 100 iterations it took 0.5998942852020264 seconds!
For 100 iterations it took 0.5863747596740723 seconds!

R02B08 graupel_gpu_opt

For 100 iterations it took 8.290613174438477 seconds!
For 100 iterations it took 8.158649444580078 seconds!
For 100 iterations it took 8.326894760131836 seconds!
For 100 iterations it took 8.067954063415527 seconds!

Averaged over the four runs, that's a ~8% increase in performance for R02B06 and ~7% for R02B08.
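
For reference, a minimal sketch of how these percentages can be reproduced from the timings above. It assumes "increase in performance" means the ratio of mean wall times (main over graupel_gpu_opt) minus one. The helper name `speedup_percent` is illustrative and not part of the repository.

```python
from statistics import mean

# Wall times in seconds for 100 iterations, copied from the runs above.
timings = {
    "R02B06": {
        "main": [0.6509003639221191, 0.6452944278717041,
                 0.6502292156219482, 0.6481423377990723],
        "graupel_gpu_opt": [0.6054491996765137, 0.6052908897399902,
                            0.5998942852020264, 0.5863747596740723],
    },
    "R02B08": {
        "main": [8.613034963607788, 8.910228967666626,
                 8.931719303131104, 8.839605808258057],
        "graupel_gpu_opt": [8.290613174438477, 8.158649444580078,
                            8.326894760131836, 8.067954063415527],
    },
}


def speedup_percent(baseline: list[float], optimized: list[float]) -> float:
    """Percentage speedup, defined as mean(baseline) / mean(optimized) - 1."""
    return (mean(baseline) / mean(optimized) - 1.0) * 100.0


if __name__ == "__main__":
    for grid, runs in timings.items():
        pct = speedup_percent(runs["main"], runs["graupel_gpu_opt"])
        print(f"{grid}: {pct:.1f}% faster than main")
```

Measured instead as a reduction in wall time (one minus the inverse ratio), the numbers come out slightly lower, so the choice of definition matters when comparing against other benchmarks.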

@iomaganaris
Collaborator Author

cscs-ci run default

@iomaganaris
Collaborator Author

cscs-ci run distributed

Contributor

@edopao edopao left a comment

Thank you for the time measurement, very good results!

@iomaganaris iomaganaris merged commit 3ce46a0 into main Mar 12, 2026
48 checks passed
jcanton added a commit that referenced this pull request Mar 18, 2026
* main: (29 commits)
  Scheduled Halo Exchange (#980)
  Add missing metrics fields to `test_parallel_grid_manager.py` test (#1114)
  Muphys: Lowering with single precision (#1101)
  Add single-rank lsq pseudoinv factory test (#1099)
  Cleanup Diffusion config (#1060)
  Fortran bindings: fix numpy allocation and cleanups (#1112)
  fix: fix gt4py metrics extractor in the StencilTest benchmarking (#1111)
  py2fgen: don't recompile if unchanged (#1110)
  CI for standalone_driver (#1070)
  Update mpi4py and pymetis groups to make them optional (#1100)
  Bump mshick/add-pr-comment from 2 to 3 (#1109)
  Use inout fields for full_muphys as well (#1108)
  Update GPU configuration for graupel (#1104)
  Move the mask of _q_t_update outside in graupel (#1093)
  Update gt4py to v1.1.7 (#1105)
  cleanup for ugly if condition of single node default in lsq coeffs (#1103)
  Domain decomposition and halo construction (#540)
  Muphys: Add flag to wait for graupel completion (#1095)
  Give each gt4py program a return type hint (#1087)
  Turn data download off for distributed CI (#1092)
  ...