Auto merge partitioned netcdf files #5
…and remove existing combined file when root pe opens a partitioned one.
|
Mentioning @mnlevy1981 and @gustavo-marques. |
|
What is the suggested MOM IO strategy for fully coupled compsets? Is this something that can be set as default or does it require user modifications? |
jedwards4b
left a comment
I think that it would have been better to integrate MOM IO with PIO as with other component models, but I understand the motivation for doing it this way.
|
@jedwards4b I think this approach is lightweight and more maintainable in the long run. I do have an FMS branch where I integrated PIO into the FMS diagnostic manager. That integration is mostly complete, with two remaining issues: compatibility with auto-masking and handling of section files. However, that branch introduces significant changes, over 15,000 new lines of code, which makes me concerned about maintainability and potential fragility. Given the scope of these modifications, I opted for this simpler approach. That said, I’m happy to continue pursuing the full PIO integration if I can get support for resolving the remaining issues and, more importantly, assistance with ongoing maintenance, including porting to newer versions of FMS as needed. Here’s the branch with the PIO integration: |
…mbine via a barrier
mnlevy1981
left a comment
I compared a 1-year run of G1850MARBL_JRA on the 2/3° grid with parallel I/O against the current out-of-the-box configuration, and saw a modest speedup with parallel I/O:
- The actual runtime of the model (from "case.run starting" to "case.run success" in the CaseStatus file) decreased from 7852 sec (6142 pe-hrs/sim-year, 11 SYPD) to 7594 sec (5940.2 pe-hrs/sim-year, 11.4 SYPD) -- a 3% reduction in cost of the run
- The reported Model Cost in the timing file decreased from 5920.28 pe-hrs/simulated_year (11.42 SYPD) to 5644.11 pe-hrs/simulated_year (11.97 SYPD) -- a 5% reduction in cost of the run
- Another number of interest: my first test with the parallel I/O enabled was running at a reported 12.3 SYPD; this run failed in the "combine the various netCDF patches" stage, but a rough estimate would be an "actual runtime" (per CaseStatus) of 7395 sec (5785 pe-hrs/sim-year, 11.68 SYPD), or a 5.75% decrease in runtime over the serial I/O.
So I think this is good for a 3% - 6% performance boost out of the box with the MARBL tracers active.
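For anyone who wants to redo the arithmetic, here is a minimal sketch of how the first two percentages follow from the quoted figures. The helper name `percent_reduction` is made up for illustration and is not part of any CESM tool.

```python
def percent_reduction(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Figures quoted in the comment above.
wallclock = percent_reduction(7852, 7594)          # CaseStatus runtime, seconds
model_cost = percent_reduction(5920.28, 5644.11)   # timing file, pe-hrs/simulated_year

print(f"wallclock reduction:  {wallclock:.1f}%")
print(f"model cost reduction: {model_cost:.1f}%")
```

Both values round to the 3% and 5% quoted in the bullets.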
|
@mnlevy1981, I wonder if MARBL would benefit from increased |
|
If you don't mind running the tests, my CESM sandbox is I didn't fix the PE layout until after running |
|
Sorry, my last comment was pointing to testing for merging the |
by determining which IO PE combines which file based on the order of IO PES (as opposed to IO PE index, which is not guaranteed to be contiguous when land block elimination is enabled)
This is one of three PRs to enable automated parallel IO in MOM6 within CESM by making use of the existing parallel IO implementation that comes with FMS. This series of PRs (1) enables FMS parallel IO, (2) makes sure the IO PE layout is compatible with auto-land block elimination (i.e., masking), and (3) automatically merges partitioned files after a run is completed.
The major change in this PR is the introduction of the mppnccombine tool into the FMS diag_manager. When an IO_LAYOUT is specified, FMS now automatically merges the partitioned files when the diag manager closes the open netcdf files. For domain files, this is done in a round-robin fashion, where each IO PE takes care of the auto-merging of a history file stream. For section files, the IO PE with the smallest PE index takes care of the merging. The auto merge feature is enabled by the newly added auto_merge_nc namelist parameter.
testing: aux_mom.derecho
status: b4b, except for masking in static files due to changing compute layout.
Performance: varies depending on the resolution, pe count and IO layout, from no enhancement or degradation to more than 50% enhancement.
This PR should be evaluated in conjunction with:
NCAR/MOM6#342
ESCOMP/MOM_interface#235
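To illustrate the merge assignment described above, here is a rough sketch in Python (the actual change lives in the Fortran diag_manager and may differ in detail). It assumes that "order of IO PEs" means position in the sorted list of IO PE indices. The function name, the PE indices, and the file names are all hypothetical.

```python
def assign_mergers(io_pes, domain_files, section_files):
    """Map each partitioned file to the IO PE that would merge it."""
    # Position in this list, not the raw PE index, drives the assignment,
    # so gaps left by land block elimination do not matter.
    ordered = sorted(io_pes)
    owners = {}
    # Domain files: hand out history streams round-robin over the IO PEs.
    for i, name in enumerate(domain_files):
        owners[name] = ordered[i % len(ordered)]
    # Section files: always merged by the IO PE with the smallest index.
    for name in section_files:
        owners[name] = ordered[0]
    return owners


# Hypothetical, non-contiguous IO PE indices and file stream names.
io_pes = [16, 0, 12, 4]
domain_files = ["stream_a", "stream_b", "stream_c", "stream_d", "stream_e"]
section_files = ["section_x"]

owners = assign_mergers(io_pes, domain_files, section_files)
for name, pe in owners.items():
    print(f"{name}: merged by IO PE {pe}")
```

With four IO PEs and five streams, the fifth stream wraps back to the first IO PE, which is the round-robin behavior the description refers to.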