Update GPU configuration for graupel #1104
Can you please measure (approximately) the speedup this change provides on top of main? I would like to know whether its benefits are visible without self-copy removal, to decide whether this change should go in separately or together with self-copy removal.
Mandatory Tests: Please make sure you run these tests via comment before you merge!
Optional Tests: To run benchmarks, you can use:
To run tests and benchmarks with the DaCe backend, you can use:
To run test levels ignored by the default test suite (mostly simple datatests for static field computations), you can use:
For more detailed information, please look at CI in the EXCLAIM universe.
That's an ~8% increase in performance on average for
cscs-ci run default
cscs-ci run distributed |
edopao left a comment
Thank you for the time measurement, very good results!
* main: (29 commits)
  Scheduled Halo Exchange (#980)
  Add missing metrics fields to `test_parallel_grid_manager.py` test (#1114)
  Muphys: Lowering with single precision (#1101)
  Add single-rank lsq pseudoinv factory test (#1099)
  Cleanup Diffusion config (#1060)
  Fortran bindings: fix numpy allocation and cleanups (#1112)
  fix: fix gt4py metrics extractor in the StencilTest benchmarking (#1111)
  py2fgen: don't recompile if unchanged (#1110)
  CI for standalone_driver (#1070)
  Update mpi4py and pymetis groups to make them optional (#1100)
  Bump mshick/add-pr-comment from 2 to 3 (#1109)
  Use inout fields for full_muphys as well (#1108)
  Update GPU configuration for graupel (#1104)
  Move the mask of _q_t_update outside in graupel (#1093)
  Update gt4py to v1.1.7 (#1105)
  cleanup for ugly if condition of single node default in lsq coeffs (#1103)
  Domain decomposition and halo construction (#540)
  Muphys: Add flag to wait for graupel completion (#1095)
  Give each gt4py program a return type hint (#1087)
  Turn data download off for distributed CI (#1092)
  ...
Set `gpu_maxnreg` to a lower value. Capping the number of registers per thread lets more warps be resident on each SM at the same time, which increases GPU occupancy (tested on GH200). Even if this causes excessive register spilling, that is not a problem for graupel, because I expect the spilling to happen only when the mask is enabled. When the mask is disabled (most of the time), we get the benefit of higher occupancy.
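To illustrate why a lower register cap raises occupancy, here is a simplified theoretical-occupancy model. It is a sketch, not how gt4py/DaCe or the CUDA toolchain computes anything. The limits are assumptions taken from NVIDIA's published compute capability 9.0 (Hopper, the GPU in GH200) figures: 65,536 32-bit registers, 64 resident warps, and 32 resident blocks per SM, with registers allocated per warp in units of 256. The model ignores shared memory and other limiters. The function name `theoretical_occupancy` is hypothetical, and the description does not say how `gpu_maxnreg` is passed to the compiler.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SMLimits:
    """Per-SM hardware limits; defaults are assumed compute capability 9.0 values."""
    registers_per_sm: int = 65536
    max_warps_per_sm: int = 64
    max_blocks_per_sm: int = 32
    warp_size: int = 32
    reg_alloc_unit: int = 256  # registers are reserved per warp in chunks of this size


def theoretical_occupancy(regs_per_thread: int, block_size: int, sm: SMLimits = SMLimits()) -> float:
    """Fraction of the SM's warp slots that can be resident (shared memory ignored)."""
    warps_per_block = math.ceil(block_size / sm.warp_size)
    # Registers reserved for one warp, rounded up to the allocation unit.
    regs_per_warp = math.ceil(regs_per_thread * sm.warp_size / sm.reg_alloc_unit) * sm.reg_alloc_unit
    # Whole blocks permitted by the register file and by the warp-slot limit.
    blocks_by_regs = (sm.registers_per_sm // regs_per_warp) // warps_per_block
    blocks_by_warps = sm.max_warps_per_sm // warps_per_block
    active_blocks = min(blocks_by_regs, blocks_by_warps, sm.max_blocks_per_sm)
    return active_blocks * warps_per_block / sm.max_warps_per_sm


if __name__ == "__main__":
    # Hypothetical register caps for a 256-thread block.
    for regs in (255, 128, 64, 32):
        occ = theoretical_occupancy(regs, block_size=256)
        print(f"{regs:3d} regs/thread -> occupancy {occ:.3f}")
```

Under these assumptions, a 256-thread block at 255 registers per thread fits only one block (12.5% occupancy) per SM, while a 64-register cap allows four (50%). Whether that turns into a speedup depends on where the extra spilling lands. The argument above is that, for graupel, it should be confined to the masked path.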