ARCTICGRIS PE-layout is very slow... #1098
Erik, can you dig out the core-hours/syr for the ARCTICGRIS and ARCTIC grids? I am trying to compare these numbers with my runs, and I tend to run with more than 8 nodes.
For a short test run, this is what I have (it won't be very accurate, but it may be ballpark):
ARCTICGRIS: ARCTIC: total pes active : 288 Overall Metrics: Overall Metrics:

Can you post what PE layouts you used for your simulations?
For an ARCTIC I-compset run I have:
Overall Metrics:
I don't have any ARCTICGRIS I-compset runs, but I can estimate the cost from an F-compset run:
Overall Metrics:
Estimating CTSM cost = pe-hrs/syr * (LND Run Time / TOT Run Time) = 55893.74 * (403.304 s / 13166.622 s) = 1712.0692 pe-hrs/syr.
This seems right to me, as the cost of ARCTICGRIS is about 2-3x that of the ARCTIC grid. How long was your ARCTICGRIS run? Are those numbers inflated by a long initialization?
Adam
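The CTSM estimate above just scales the whole-model cost by the land component's share of the run time. A minimal sketch of that arithmetic, assuming (as the estimate does) that the LND share of wall-clock run time is also its share of the PE-hours; the function name `component_cost` is hypothetical:

```python
def component_cost(total_pe_hrs_per_syr: float,
                   component_run_time_s: float,
                   total_run_time_s: float) -> float:
    """Scale the whole-model cost by one component's share of run time.

    Assumption: the component's fraction of wall-clock run time equals its
    fraction of PE-hours, which is only approximate for concurrent layouts.
    """
    return total_pe_hrs_per_syr * (component_run_time_s / total_run_time_s)


# Numbers quoted above for the ARCTICGRIS F-compset run.
ctsm_cost = component_cost(55893.74, 403.304, 13166.622)
print(f"Estimated CTSM cost: {ctsm_cost:.2f} pe-hrs/syr")  # 1712.07
```

If the land model runs concurrently on its own subset of PEs, a better estimate would multiply the LND run time by the number of land PEs instead of scaling the total.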
Mine are just from really short 9-step test runs. The numbers should take initialization into account, but they aren't going to be at all accurate for such a short test. I don't think they had DEBUG on, but the ARCTICGRIS simulation might have been running out of memory and was dog-slow because of that.
Just for clarity, how confident are you that those numbers take initialization into account? I thought @PeterHjortLauritzen told me that they don't account for initialization.
I'm not very confident. It does report the initialization time, so there's no reason it couldn't take it into account, and I could look into the code to check for sure. But I also know not to trust such a short test with too few nodes.
OK. It's not a high priority, so don't feel obliged to do so.
The ARCTICGRIS PE-layout uses only 8 nodes and runs at about half a simulated year per wall-clock day on cheyenne.
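To compare this throughput with the pe-hrs/syr figures in the comments, it can be converted directly. A sketch under an assumption not stated in the thread: cheyenne nodes have 36 cores each, so 8 nodes would be 288 PEs (consistent with the "total pes active : 288" reported for the short test, though that run's layout isn't confirmed). The function name is hypothetical:

```python
CORES_PER_NODE = 36  # assumption: cheyenne's 36-core nodes


def pe_hours_per_sim_year(nodes: int,
                          sim_years_per_wall_day: float,
                          cores_per_node: int = CORES_PER_NODE) -> float:
    """Convert throughput (syr per wall-clock day) into PE-hours per syr."""
    pes = nodes * cores_per_node
    wall_hours_per_sim_year = 24.0 / sim_years_per_wall_day
    return pes * wall_hours_per_sim_year


cost = pe_hours_per_sim_year(nodes=8, sim_years_per_wall_day=0.5)
print(f"{cost:.0f} pe-hrs/syr")  # 13824
```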
It has 5x as many points as f09 and a time step one-eighth the size, yet it runs with one-sixth the number of processors.
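Those three ratios compound. A back-of-the-envelope sketch, assuming cost is proportional to grid points times number of time steps and that the model scales perfectly with processor count (both idealizations; the function name is hypothetical):

```python
def relative_wall_time(point_ratio: float,
                       timestep_ratio: float,
                       processor_ratio: float) -> float:
    """Wall-clock time per simulated year relative to a reference grid.

    Assumes work ~ (grid points) x (time steps) and perfect strong scaling.
    """
    steps_ratio = 1.0 / timestep_ratio      # smaller time step -> more steps
    work_ratio = point_ratio * steps_ratio  # total work vs. the reference
    return work_ratio / processor_ratio     # fewer PEs -> more wall time


# ARCTICGRIS vs. f09, using the ratios quoted above.
ratio = relative_wall_time(point_ratio=5, timestep_ratio=1 / 8,
                           processor_ratio=1 / 6)
print(f"~{ratio:.0f}x the f09 wall time per simulated year")  # ~240x
```

Real scaling is not perfect and per-point cost differs between grids, so this is an order-of-magnitude argument for a larger PE layout, not a prediction.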
Note that the CONUS and ARCTIC grids also use only 8 nodes (for any machine); they don't have cheyenne-specific setups.
Furthermore, the fv3 grids have particular setups that appear to be tuned for cheyenne but are labeled as applying to any machine. They should probably have general setups separate from the cheyenne-specific ones.