
Enable auto IO LAYOUT (via FMS parallel IO) #235

Merged

@alperaltuntas merged 4 commits into ESCOMP:main from alperaltuntas:auto_merge_changes on Apr 11, 2025

Conversation

@alperaltuntas (Member) commented Mar 4, 2025

This is one of three PRs to enable automated parallel IO in MOM6 within CESM by making use of the existing parallel IO implementation that comes with FMS. Together, this series of PRs (1) enables FMS parallel IO, (2) ensures the IO PE layout is compatible with auto land block elimination (i.e., masking), and (3) automatically merges partitioned files after a run completes.

Changes:

  • In MOM_input, set a default value for TARGET_IO_PES to enable FMS parallel IO for tx2_3v2 and tx0.25v1.
  • In input.nml, turn on auto-merging of partitioned netcdf files by setting the newly added auto_merge_nc parameter (see the namelist sketch after this list).
  • Update lbe.py, which is used only for prototyping, to reflect the actual new land block elimination code in NCAR/MOM6#342 ("Add option to auto-compute an IO LAYOUT when auto masking is on").
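
For reference, a minimal sketch of the input.nml change (the enclosing namelist group name used here is an assumption; auto_merge_nc is newly introduced in ESCOMP/FMS#5, so consult that PR for its actual placement):

    ! Group name fms2_io_nml is assumed here, not confirmed.
    &fms2_io_nml
        auto_merge_nc = .true.   ! merge partitioned netcdf files after the run
    /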

Testing: aux_mom.derecho
Status: b4b, except for masking in static files due to the changed compute layout.
Performance: varies with resolution, PE count, and IO layout, ranging from no improvement (or slight degradation) to a speedup of more than 50%.

This PR should be evaluated in conjunction with:
NCAR/MOM6#342
ESCOMP/FMS#5

@alperaltuntas changed the title from "Enable auto IO LAYOUT" to "Enable auto IO LAYOUT (via FMS parallel IO)" on Mar 4, 2025
@mnlevy1981 (Collaborator) left a comment


I have two runs going: one using cesm3_0_alpha06c out of the box, and one with updates to FMS, MOM_interface, and MOM6 to use the new I/O features. I'll make sure they are bit-for-bit and also look at performance changes on the 2/3° grid with MARBL tracers turned on before approving or requesting changes. I did have a couple of comments based on reading through the diffs in this PR.

Comment on lines +52 to +53
$OCN_GRID in ["tx2_3v2", "tx0.25v1"]: .true.
else: .false.
@mnlevy1981 (Collaborator):

Can we use the description field in this file? Or just add comments? I'm assuming you only want to set auto_merge_nc = .true. for grids with a reasonable default for TARGET_IO_PES, but it would be nice to have a comment either confirming that or explaining why these are the only two grids that use auto_merge_nc out of the box
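
For illustration, a hypothetical sketch of what such an entry could look like, assuming the file uses a description/value layout (the key name, wording, and structure here are illustrative, not the actual template):

    auto_merge_nc:
        description: |
            Automatically merge partitioned netcdf output files after the run.
            Enabled by default only for grids that also have a sensible
            TARGET_IO_PES default.
        value:
            $OCN_GRID in ["tx2_3v2", "tx0.25v1"]: .true.
            else: .false.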

Comment thread: cime_config/tools/lbe.py
Comment on lines +145 to +148
if max_feasible_p == 0: # first iteration
p_up = int(np.ceil(npes / glob_ocn_frac))
else:
p_up = max_feasible_p
@mnlevy1981 (Collaborator):

The comment about this being the first iteration makes me wonder why we don't just initialize max_feasible_p = int(np.ceil(npes / glob_ocn_frac)) and then we can always set p_up = max_feasible_p; but then I see

                if max_feasible_p == 0:
                    print("^^^^^^^^^^^^^^^ first feasible layout ^^^^^^^^^^^^^^^")
                    max_feasible_p = p

And I'm not sure if that's changing max_feasible_p from int(np.ceil(npes / glob_ocn_frac)) or not (maybe p = p_up could result in an extreme aspect ratio and then the first time in this block is when p = p_up - 1?)

I guess I have two points in this comment:

  1. The way this loop works isn't completely clear, and maybe I should spend a few minutes walking through the steps in detail
  2. At first glance, it seems like we could start with a non-zero max_feasible_p and skip a couple of if blocks in this code (a sketch of this follows below).
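
A minimal sketch of the simplification suggested in point 2, with illustrative inputs (the real npes and glob_ocn_frac come from lbe.py); whether it actually preserves behavior hinges on the question above about when max_feasible_p is first set:

    import numpy as np

    # Illustrative values; in lbe.py these come from the total PE count and
    # the global ocean fraction of the grid.
    npes = 128
    glob_ocn_frac = 0.65

    # Suggested alternative: seed max_feasible_p with the analytic upper
    # bound, so the inner loop can always set p_up = max_feasible_p and the
    # "first iteration" special case disappears.
    max_feasible_p = int(np.ceil(npes / glob_ocn_frac))
    p_up = max_feasible_p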

@alperaltuntas (Member, Author):

For a number of divisions p to be feasible, it must (1) have a reasonable aspect ratio and (2) eliminate enough land blocks. The first p that meets these criteria is max_feasible_p (which is usually far less than p_up). Additionally, there is now a third criterion, the existence of a compatible IO_LAYOUT, but this only becomes relevant after identifying max_feasible_p (which is identified at the end of the first iteration of the outer loop).

I'm not sure why we'd set max_feasible_p to the initial upper bound p_up = int(np.ceil(npes / glob_ocn_frac)). That said, p_up can indeed be initialized to int(np.ceil(npes / glob_ocn_frac)) and later updated to max_feasible_p once it's found (at the end of the first iteration of the outer for loop). However, I opted for an explicit if block at the beginning of the inner for loop to make it clear that the loop has a varying iteration bound.
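
A schematic of the control flow described above (this is not the actual lbe.py code; is_feasible, the inputs, and the loop bounds are illustrative stand-ins):

    import numpy as np

    npes, glob_ocn_frac = 128, 0.65       # illustrative inputs

    def is_feasible(p):
        # Stand-in for the real checks: (1) reasonable aspect ratio and
        # (2) enough land blocks eliminated.
        return p % 7 == 0                 # placeholder predicate

    max_feasible_p = 0
    for iteration in range(2):            # outer loop (schematic)
        if max_feasible_p == 0:           # first iteration: analytic upper bound
            p_up = int(np.ceil(npes / glob_ocn_frac))
        else:                             # later iterations: tighten the bound
            p_up = max_feasible_p
        for p in range(p_up, 0, -1):      # inner loop over candidate divisions
            if is_feasible(p):
                if max_feasible_p == 0:
                    max_feasible_p = p    # first feasible layout found
                # ... third criterion: check for a compatible IO_LAYOUT ...
                break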

@mnlevy1981 (Collaborator) left a comment

Details of my testing in ESCOMP/FMS#5 (review)

@mnlevy1981 (Collaborator) commented:

Also, based on my testing, I think we want to keep the current layout (so it would be nice to bring in #233).

@mnlevy1981 (Collaborator) commented:

Can we merge this in? I'd like to get 66749e3 into #148 and merge that as well, so we fix #232 along with an issue where G1850MARBL_JRA compsets request too many nodes on derecho and let some of them sit idle.

@alperaltuntas (Member, Author) commented:

> Can we merge this in? I'd like to get 66749e3 into #148 and merge that as well, so we fix #232 along with an issue where G1850MARBL_JRA compsets request too many nodes on derecho and let some of them sit idle.

There's an issue I've identified. I'll try to get it fixed and merge this PR within the week.

@mnlevy1981 added this to the cesm3_0_beta06 milestone on Apr 10, 2025
@alperaltuntas self-assigned this on Apr 10, 2025
@alperaltuntas merged commit 8c8dffb into ESCOMP:main on Apr 11, 2025