ARCTICGRIS PE-layout is very slow... #1098

ekluzek · 2020-08-06T20:12:57Z

The ARCTICGRIS PE-layout uses only 8 nodes and runs at about half a simulated year per wallclock day on cheyenne.
It has 5X as many points as f09, a time-step that's an 8th of the size, and it runs with a 6th of the number of processors.

Note, CONUS and ARCTIC grids are also only using 8 nodes (for any machine). They don't have particular setups for cheyenne.

Furthermore, the fv3 grids have particular setups that seem to be set up for cheyenne but are labeled as any machine. They should probably have general setups separate from the cheyenne-specific ones.

adamrher · 2020-08-07T20:42:31Z

Erik, can you dig out the core hours / syr for the ARCTICGRIS and ARCTIC grids? I am trying to compare these numbers with my runs, and I tend to run with more than 8 nodes.

ekluzek · 2020-08-07T21:43:28Z

For a short test run this is what I have (which wouldn't be very accurate, but maybe ballpark):

ARCTICGRIS:
total pes active : 288
mpi tasks per node : 36
pe count for cost estimate : 288

ARCTIC:

total pes active : 288
mpi tasks per node : 36
pe count for cost estimate : 288

Overall Metrics:
Model Cost: 654.98 pe-hrs/simulated_year
Model Throughput: 10.55 simulated_years/day

Overall Metrics:
Model Cost: 25263.68 pe-hrs/simulated_year
Model Throughput: 0.27 simulated_years/day
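
As a rough cross-check of the metrics above: the cost and throughput figures in this thread appear consistent with cost ≈ PE count × 24 / throughput. The sketch below assumes that relationship holds; the function name `cost_pe_hrs_per_syr` is hypothetical and is not taken from the model's timing scripts.

```python
def cost_pe_hrs_per_syr(pe_count, throughput_syr_per_day):
    """Approximate model cost in pe-hrs per simulated year.

    Assumes cost = PE count * 24 wallclock hours / (simulated years per day).
    """
    return pe_count * 24.0 / throughput_syr_per_day


# Values reported in this comment (both runs used 288 PEs).
fast = cost_pe_hrs_per_syr(288, 10.55)  # reported cost: 654.98
slow = cost_pe_hrs_per_syr(288, 0.27)   # reported cost: 25263.68

print(f"faster run: {fast:.1f} pe-hrs/syr")
print(f"slower run: {slow:.1f} pe-hrs/syr")
```

Small differences from the reported costs are expected, because the throughput values are rounded to two decimals.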

Can you post what PE layouts you used for your simulations?

adamrher · 2020-08-07T21:56:42Z

For an ARCTIC I-compset I have:
total pes active : 1800
mpi tasks per node : 36
pe count for cost estimate : 1800

Overall Metrics:
Model Cost: 656.34 pe-hrs/simulated_year
Model Throughput: 65.82 simulated_years/day

I don't have any ARCTICGRIS I-compset runs. But I can estimate it from an F-compset run:
total pes active : 7680
mpi tasks per node : 36
pe count for cost estimate : 7704

Overall Metrics:
Model Cost: 55893.74 pe-hrs/simulated_year
Model Throughput: 3.31 simulated_years/day

Estimating CTSM cost = pe-hrs/syr * (LND Run Time/TOT Run Time) = 55893.74 * (403.304 s / 13166.622 s) = 1712.0692 pe-hrs / syr. This seems right to me, as the cost of ARCTICGRIS is about (2-3)X of the ARCTIC grid.
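
The estimate above can be reproduced with a few lines of Python. This is only a sketch of the same arithmetic: it assumes the land share of total run time is a fair proxy for the land share of total cost, and the variable names are illustrative.

```python
# Numbers quoted in the comment above (ARCTICGRIS F-compset run).
total_cost = 55893.74     # pe-hrs / simulated year, whole model
lnd_run_time = 403.304    # seconds, LND Run Time
tot_run_time = 13166.622  # seconds, TOT Run Time

# Scale the total cost by the land component's share of run time.
lnd_fraction = lnd_run_time / tot_run_time
ctsm_cost = total_cost * lnd_fraction

# Compare with the ARCTIC I-compset cost quoted earlier in the same comment.
arctic_cost = 656.34      # pe-hrs / simulated year
ratio = ctsm_cost / arctic_cost

print(f"land fraction of run time: {lnd_fraction:.4f}")
print(f"estimated CTSM cost: {ctsm_cost:.2f} pe-hrs/syr")
print(f"ARCTICGRIS / ARCTIC cost ratio: {ratio:.2f}")
```

In an F-compset the components may not all run on the same set of PEs, so the time-fraction scaling is an approximation rather than an exact accounting.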

How long was your ARCTICGRIS run? Are those numbers inflated due to a long initialization?

Adam

ekluzek · 2020-08-07T22:08:07Z

Mine are just from really short 9-step test runs. They should take initialization into account, but they aren't going to be at all accurate for such a short test. I don't think they had DEBUG on, but the ARCTICGRIS simulation might have been running out of memory and been dog-slow because of that.

adamrher · 2020-08-07T22:33:36Z

Just for clarity, how confident are you that those numbers take into account initialization? I thought @PeterHjortLauritzen conveyed to me that they don't account for initialization.

ekluzek · 2020-08-07T22:51:19Z

I'm not very confident. But it does report the initialization time, so there's no reason it couldn't take it into account. I could also look into the code to check for sure. I also know not to really believe this really short test with too few nodes.

adamrher · 2020-08-07T23:22:09Z

ok. not a high priority, so don't feel obliged to do so.

ekluzek added the "enhancement" label (new capability or improved behavior of existing capability) Aug 6, 2020

ekluzek self-assigned this Aug 6, 2020

ekluzek added the "next" label (this should get some attention in the next week or two; normally each Thursday SE meeting) Aug 11, 2020

ekluzek mentioned this issue Aug 11, 2020

Default PE layouts for some of the new SE/FV3 grids are problematic #1105

Closed

billsacks removed the "next" label (this should get some attention in the next week or two; normally each Thursday SE meeting) Aug 13, 2020

ekluzek mentioned this issue Aug 18, 2020

Adjust FV3/SE PE layouts #1111

Merged

ekluzek closed this as completed in #1111 Aug 19, 2020
