Update RDHPCS Hera resource for eupd task #2636
Conversation
@wx20jjung @HenryWinterbottom-NOAA Interesting, I'm not used to seeing a resource fix that means fewer tasks and nodes...if this is a memory issue, then doesn't fewer nodes mean not enough memory? The change in this PR really only reduces the task count for this job on Hera; it will still be using 8 threads and 5 ppn (as it was before). Since the issue that this PR aims to fix reported that the failure was intermittent and resolvable upon rerun, was this tested over many cycles to ensure it's good and fixes the problem?
Another question...why change the default nth_eupd to be 5 threads instead of 8? That change means that every machine not already specified with a machine if-block in that section will now use 5 threads instead of 8 (e.g. Orion, Hercules, and Jet). Was that change tested on those machines at C384?
Kate Friedman - NOAA Federal,
The underlying problem is memory per node, not total memory. The eupd step does not have OpenMP statements, so having threads greater than 1 just shuts down cores on the node. Using 8 tasks can sometimes cause a memory-use problem within a node. Adding more nodes does not solve this memory failure, as it is not a total-memory problem. I am not allowed to log in to the cluster nodes to monitor memory usage, so I do not know what the optimum configuration should be. The global workflow is also not set up to call tasks-per-node for this step, which would help optimize the node (and memory) usage. I suspect 6 or 7 tasks (and 1 thread) would be the optimum use for the 40-core nodes on Hera and Jet.
I have been running the 5-task / 8-thread combination at C384 on Hera and Jet (kjet, 40-core nodes) for several months now with no failures. I can't comment on any of the other machines or model resolutions.
…On Thu, May 30, 2024 at 10:12 AM Kate Friedman wrote:
@wx20jjung found a solution to be to change the runtime layout to 5 PEs per node with 8 threads (instead of 8 PEs/5 threads) and 80 PEs total (instead of 270). This resulted in much shorter wait times and only about 5 minutes longer run time.
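To make the per-node memory arithmetic in the explanation above concrete, here is a rough sketch (the ~96 GB of usable memory per 40-core Hera node is a nominal figure assumed for illustration, not a number taken from this thread):
# Both layouts keep all 40 cores occupied or reserved, but differ in how
# many MPI tasks share one node's memory (~96 GB assumed):
#   8 tasks x 5 threads per node: 96 / 8 = ~12 GB per MPI task
#   5 tasks x 8 threads per node: 96 / 5 = ~19 GB per MPI task
# Since eupd has no OpenMP directives, the extra threads idle their cores;
# the layout's real effect is fewer tasks competing for one node's memory.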
@wx20jjung Thanks for that explanation, that helps my understanding of the issue!
So, the global-workflow already has/had 8 threads and 5 tasks (ppn = 40 / 8 threads) for C384, so this PR is only lowering the total task count and thus the resulting node count. It seems like we already have the thread/ppn solution that was working for you. Does lowering the task and node count help further then (what this PR does)? I suspect a different resource configuration is needed.
The global workflow is also not set up to call tasks-per-node for this step, which would help optimize the node (and memory) usage.
The global-workflow config.resources has npe_node_JOB variables that can be adjusted if needed. We generally just set them like this for each job:
export npe_node_eupd=$(( npe_node_max / nth_eupd ))
...but, if needed, we can set this differently for a job/resolution. Currently we set resources based on the following variables and calculations (showing this PR's eupd resources as an example):
npe_node_max=40 (total number of PEs per node on Hera)
npe_eupd=80 (total number of tasks for the job)
nth_eupd=8 (threads for the job)
npe_node_eupd=40/8=5 (PEs per node for the job)
--> nodes = npe_eupd / npe_node_eupd = 80/5 = 16
I am not allowed to log in to the cluster nodes to monitor memory usage, so I do not know what the optimum configuration should be.
We can add a memory command to a job to get the memory information printed in the log if needed. It's messy output, with some error messages that can be ignored...which is why we don't have it on by default on Hera. Let me know if that would help to determine the memory needed.
I suspect 6 or 7 tasks (and 1 thread) would be the optimum use for the 40-core nodes on Hera and Jet.
Perhaps stepping back from what I went through above...what would you suggest for the resulting xml resource statement? The current result from this PR would be: <nodes>16:ppn=5:tpp=8</nodes>
Sounds like this may be what you're suggesting: <nodes>13:ppn=6:tpp=1</nodes> (note: the node value is a round-down using 6 ppn, which doesn't divide evenly into 80 tasks; it may end up as 14 nodes, one would have to run the setup_xml step to see).
Let us know what would potentially be a better resource configuration for C384 eupd. Note: the resource configuration method in global-workflow is being redesigned now, so feel free to provide a resource suggestion that doesn't have the current calculation constraints and we can see if we can accommodate it.
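As a quick check on that round-down caveat (a hypothetical shell calculation, not output from the setup_xml step): 80 tasks at 6 per node need 14 nodes, since 80/6 must round up rather than down.
npe_eupd=80
ppn=6
nodes=$(( (npe_eupd + ppn - 1) / ppn ))   # ceiling division: (80+5)/6 = 14
echo "<nodes>${nodes}:ppn=${ppn}:tpp=1</nodes>"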
Kate Friedman - NOAA Federal,
First, a clarification. The version(s) of global-workflow I am using have the "old" configuration of npe_eupd=270, nth_eupd=5. I changed these in my versions so that npe_eupd=80, ppn=5, tpp=8, or <nodes>16:ppn=5:tpp=8</nodes>, to keep the jobs from failing on Hera and Jet. This keeps the *.xml consistent with the config.* file.
From this point on, I have to be careful, as grant funding is not allowed to "transition items to operations" and I am already in trouble for transitioning code to EMC. So, these are only suggestions.
My first suggestion is to identify the total memory needed for a specific number of ensembles, resolution, and observation data volume. You only need this info for a few cycles. This should identify how many nodes you will need. If possible, also check the memory requirements for each MPI task. The nature of this failure suggests the memory requirement for each task is not balanced. There are probably one or more "outliers". The compiler and hardware vendors, and RDHPCS, should be able to help with this. You will need to assume all the tasks use the maximum (outlier) memory. There is no "one size fits all" configuration for the complex workflow you have. Your configurations seem to be set up for task/thread ratios per node. SLURM gives you a lot of options on how to pack a node. I do not know what the defaults are on the various machines. S4 was set up to fill a node before moving on to the next node. Some put a task on each node (round robin) until it runs out of tasks. I suggest a scenario where you put as many tasks on a node as possible to keep the MPI communication traffic across the network to a minimum. Any distribution scenario is messy and will have to be tailored for each job and the node configuration.
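Two standard SLURM facilities line up with these suggestions (a sketch; the job ID and executable name are placeholders, and accounting-field availability varies by site):
# Per-task memory for a completed job: MaxRSS is the peak resident set
# size; MaxRSSTask/MaxRSSNode identify the "outlier" task and its node.
sacct -j <jobid> --format=JobID,Elapsed,MaxRSS,MaxRSSTask,MaxRSSNode
# Task placement: block fills each node before moving to the next
# (minimizing off-node MPI traffic); cyclic round-robins tasks across
# the allocated nodes.
srun --distribution=block ./eupd.x
srun --distribution=cyclic ./eupd.x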
parm/config/gfs/config.resources (Outdated)
@@ -1045,13 +1045,16 @@ case ${step} in
        ;;
    "C384")
        export npe_eupd=270
-       export nth_eupd=8
+       export nth_eupd=5
@HenryWinterbottom-NOAA
Is this change necessary? This will impact all machines except the ones noted in the if-block below.
Also, in the if-block, Hera is reset to 8; 8 is the develop version.
Is the only change needed here on line 1056? For Hera, npe_eupd=80?
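A sketch of the shape that suggestion implies (the surrounding machine if-block and the ${machine} variable are assumed from context; the exact lines in config.resources may differ):
"C384")
    export npe_eupd=270
    export nth_eupd=8
    if [[ ${machine} = "HERA" ]]; then
        export npe_eupd=80   # Hera-only override: 80 tasks -> 16 nodes at 5 ppn
    fi
    ;;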
No, thank you for catching my oversight. Updating only line 1056 results in
<task name="enkfgdaseupd" cycledefs="gdas" maxtries="&MAXTRIES;">
<command>/scratch1/NCEPDEV/da/Henry.Winterbottom/trunk/global-workflow.gwdev_issue_2454/jobs/rocoto/eupd.sh</command>
<jobname><cyclestr>x002_gwdev_issue_2454_enkfgdaseupd_@H</cyclestr></jobname>
<account>fv3-cpu</account>
<queue>batch</queue>
<partition>hera</partition>
<walltime>00:30:00</walltime>
<nodes>16:ppn=5:tpp=8</nodes>
<native>--export=NONE</native>
Which is consistent with @wx20jjung's configuration:
<task name="enkfgdaseupd" cycledefs="gdas" maxtries="&MAXTRIES;">
<command>/scratch1/NCEPDEV/da/Henry.Winterbottom/trunk/global-workflow.gwdev_issue_2454/jobs/rocoto/eupd.sh</command>
<jobname><cyclestr>x002_gwdev_issue_2454_enkfgdaseupd_@H</cyclestr></jobname>
<account>fv3-cpu</account>
<queue>batch</queue>
<partition>hera</partition>
<walltime>00:30:00</walltime>
<nodes>16:ppn=5:tpp=8</nodes>
<native>--export=NONE</native>
I pushed the correction.
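For reference, under a SLURM backend a Rocoto <nodes> spec of this shape typically translates to an sbatch request along these lines (a sketch of the usual mapping, not output captured from this PR):
# <nodes>16:ppn=5:tpp=8</nodes> becomes approximately:
#SBATCH --nodes=16
#SBATCH --ntasks-per-node=5
#SBATCH --cpus-per-task=8
# 16 nodes x 5 MPI tasks = 80 tasks, each reserving 8 of a node's 40 cores.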
@wx20jjung Do the changes in this PR resolve the issue reported in #2454?
LGTM
Conditionally approved
CI Passed Hera at …
…bal-workflow into feature/move_jcb
* 'feature/move_jcb' of https://github.com/danholdaway/global-workflow:
  Add COM template for JEDI obs (NOAA-EMC#2678)
  Link both global-nest fix files and non-nest ones at the same time (NOAA-EMC#2632)
  Update ufs-weather-model (NOAA-EMC#2663)
  Add ability to process ocean/ice products specific to GEFS (NOAA-EMC#2561)
  Update cleanup job to use COMIN/COMOUT (NOAA-EMC#2649)
  Add overwrite to create experiment in BASH CI (NOAA-EMC#2676)
  Add handling to select CRTM cloud optical table based on cloud scheme and update calcanal_gfs.py (NOAA-EMC#2645)
  Update RDHPCS Hera resource for `eupd` task (NOAA-EMC#2636)
This PR addresses issue #2454. The following is accomplished:
As per @wx20jjung, the resources for the eupd task have been updated for RDHPCS Hera to account for the memory issues for which the C384 gdaseupd job fails.
Resolves gdaseupd memory issues on Hera #2454
Type of change
Change characteristics
How has this been tested?
This has been tested by @wx20jjung during experiment applications. A Rocoto workflow (i.e., XML) was provided containing the following job information for the gdaseupd task:
The changes to parm/config/gfs/config.resources result in the following:
Checklist