Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support the creation of multiple user_nl files for a component #3999

Closed
billsacks opened this issue Jun 15, 2021 · 23 comments · Fixed by #4058
Closed

Support the creation of multiple user_nl files for a component #3999

billsacks opened this issue Jun 15, 2021 · 23 comments · Fixed by #4058

Comments

@billsacks
Copy link
Member

CISM in CESM will soon support running multiple ice sheets in the same case. Each of these ice sheets will have its own config file (similar to a namelist file), with different settings. Therefore, in a run containing Greenland and Antarctica (for example) we need a way for users to be able to separately specify settings in the user_nl_cism file that apply (1) to both ice sheets; (2) just Greenland; (3) just Antarctica.

Based on discussion here ESCOMP/CISM-wrapper#50 the user_nl_cism file will look like this:

! Settings that apply to all ice sheets go here
myvar = myvalue

[gris]
! Settings that apply to Greenland (if present) go here
foo = value1
bar = value2

[ais]
! Settings that apply to Antarctica (if present) go here
foo = value3
baz = value4

I propose adding a function to CIME (in user_mod_support.py) to parse a file like this, outputting a new file that just contains the requested subset of settings. In CISM's buildnml, we would then call this function three times:

First, we'd request a file that just contains settings that apply to all ice sheets (this is needed to create the top-level cism_in file): separate_user_nl(..., section=None), which would produce:

! Settings that apply to all ice sheets go here
myvar = myvalue

Then we would call separate_user_nl(..., section='gris'), which would produce:

! Settings that apply to all ice sheets go here
myvar = myvalue

[gris]
! Settings that apply to Greenland (if present) go here
foo = value1
bar = value2

Finally, we would call separate_user_nl(..., section='ais'), which would produce:

! Settings that apply to all ice sheets go here
myvar = myvalue

[ais]
! Settings that apply to Antarctica (if present) go here
foo = value3
baz = value4

(Note: It's intentional that the settings that apply to all ice sheets appear in all cases. This section can contain a mix of settings that (a) go in cism_in, which contains non-icesheet-specific settings, and (b) go in all cism.config files – i.e., ice sheet-specific settings, but for which the user wants the same setting to apply to all ice sheets.)

I hope to implement this in the next week or so. I am opening this to gather any feedback on:

  1. Are there any objections to putting this functionality in CIME? I could put it in CISM, but I prefer putting it in CIME for two reasons: (i) I would like to cover this functionality with unit tests; putting it in CISM would require me to build up a new python unit testing infrastructure in CISM. (ii) Maybe this could be useful for other components for some related or different purpose.

  2. Are there any objections to the format of the user_nl file proposed above? We are not completely tied to this format, though we (@whlipscomb, @mvertens and I) prefer it over other options we considered (see Determine how to handle user_nl_cism for multiple ice sheets ESCOMP/CISM-wrapper#50).

@jedwards4b
Copy link
Contributor

@billsacks What about a separate file for each grid? So you would have user_nl_cism, user_nl_cism.gris and user_nl_cism.ais ? I believe that this implementation could be done with fewer less disruptive changes in cime.

@gold2718
Copy link

I agree with @jedwards4b, this feels more like you are running two instances of an ice sheet model so using multiple namelist files feels most CIME-like.

@mvertens
Copy link
Contributor

@jedwards4b @gold2718 - As @whlipscomb pointed out, we will eventually want to support multiple paleo ice sheets: Laurentide, Eurasian, Fennoscandian, and so on. In addition to Greenland (gris) and Antarctica (ais), we'd have to carry around other user_nl_* files for ice sheets that are irrelevant to most people running the model. I believe that this would be awkward for the user and support. What if a user wanted to use data from a previous run - but use less ice sheets. Rather than just editing one user_nl_cism file they would have to separately edit and create only a required subset of files.

@billsacks
Copy link
Member Author

billsacks commented Jun 15, 2021

What about a separate file for each grid? So you would have user_nl_cism, user_nl_cism.gris and user_nl_cism.ais ?

The problem is related to the CIME code for prestaging the user_nl files to the case directory. As alluded to by @whlipscomb and @mvertens, I think it comes down to how much we care about having (potentially many) unnecessary user_nl files staged in the case directory. It would be easy to extend the CIME code that prestages files so that, instead of just staging user_nl_COMP, it stages any file named user_nl_COMP or user_nl_COMP.* (for example). What would be harder, and possibly substantially harder, would be having CIME prestage only those files that are actually needed for the given case. I can't see a way to do this without tying CIME to CISM's XML variables, and it's possible that it would require changing CIME whenever someone wants to introduce a new ice sheet, which is something we'd like to avoid.

To be clear, even with the proposed approach of having a single user_nl_cism with different sections, there may still be unused sections in your user_nl_cism file. For example, in a Greenland-only run, you would ignore the [ais] section. But I think the feeling is that it's more confusing to have completely ignored files (like user_nl_cism.ais) rather than just ignored sections in the user_nl_cism file.

@jedwards4b I'm also wondering if you can explain why you view my proposal as disruptive. It would mean adding one function that could be invoked by components, but wouldn't impact any existing code. That said, I'm fine with keeping this in CISM if you object to its inclusion in CIME.

@jedwards4b
Copy link
Contributor

@mvertens I don't think that we need to "carry around" any user_nl_ files that are not used for the current case. I think that most of this functionality can be handled in cism's buildnml file, using multiple files will require few if any changes in cime. It is known when buildnml is invoked what ice sheets are active for the case. I think that the distinction you are trying to make between support for multiple files and support for multiple sections in a single file is weak at best.

@billsacks
Copy link
Member Author

@jedwards4b I have a feeling you were writing your reply before mine came through. The tie-in with CIME is the need to initially stage the user_nl files to the case directory, which is something handled by CIME, not by CISM's buildnml.

@ghost
Copy link

ghost commented Jun 15, 2021 via email

@jedwards4b
Copy link
Contributor

Perhaps this points out a flaw in cime - the component buildnml knows what user_nl_* to expect in a given case, the responsibility to put them there, if not already present, should be added here. I think the addition of new files with the already defined and tested file format is going to be less disruptive than adding a new format modification which may have multiple unintended consequences.

@jedwards4b
Copy link
Contributor

jedwards4b commented Jun 15, 2021

jedwards - very sorry for your trouble - I don't see how we can remove you from the thread once referenced - you will have to remove yourself using the unsubscribe button on the lower right of the issue page.

@billsacks
Copy link
Member Author

billsacks commented Jun 15, 2021

jedwards sorry that was my fault for not adding the "4b". I apologize.

@billsacks
Copy link
Member Author

@jedwards4b - after talking with @mvertens we would really like to move ahead with the originally-proposed solution of having a single user_nl_cism file with multiple sections. We feel this will be most clear / least confusing to users.

So the question for you at this point is just whether you object to having a function that parses this file in the CIME lib. I want to reiterate that this function would not be called in the normal / existing flow of code. It would just be available for component buildnmls to call this function if they want. So it has no potential for unintended consequences for other components. That said, if you object to putting it in CIME, I can put it in CISM instead.

@gold2718
Copy link

@jedwards4b, I am trying to remove jedwards by editing the comments that had a typo in a reference.

@gold2718
Copy link

I think I withdraw my opposition to this change as long as sections are not required. We are moving in a different direction in CAM but I am interested in learning more about your approach.

... outputting a new file that just contains the requested subset of settings ...

Where is this file output? Why is a file needed instead of some sort of return object?

What is output in the run directory (i.e., one file, multiple files)?

@jedwards4b
Copy link
Contributor

@billsacks I think that (1) would be easier and require few if any changes in cime, but I don't object if you want to proceed with adding separate_user_nl(...) - I'm curious - what is produced in the run directory, that is what does cism actually read for input?

@billsacks
Copy link
Member Author

Perhaps this points out a flaw in cime - the component buildnml knows what user_nl_* to expect in a given case, the responsibility to put them there, if not already present, should be added here.

I see your rationale, but it seems like this would require a bigger change – introducing a whole new step at which a component-specific script is called, in the create_newcase process.

as long as sections are not required

Right, this would be completely optional and not impact other components in any way.

We are moving in a different direction in CAM

I would be interested to hear what that direction is – at least if you are solving a similar problem. One (important) rationale I see for putting this new code in CIME is to try to facilitate code sharing between components if other components have substantially similar needs. Also, if two components are solving basically the same problem, it would be nice for users if the user interface mechanism is the same, rather than having arbitrary differences between components. We could adjust our approach in CISM to remain consistent with other components if there is already a very similar thing being done by others.

Where is this file output? Why is a file needed instead of some sort of return object?

CISM's buildnml code (as for other components) currently calls CIME's function, create_namelist_infile:

###############################################################################
def create_namelist_infile(case, user_nl_file, namelist_infile, infile_text=""):
###############################################################################
lines_input = []
if os.path.isfile(user_nl_file):
with open(user_nl_file, "r") as file_usernl:
lines_input = file_usernl.readlines()
else:
logger.warning("WARNING: No file {} found in case directory".format(user_nl_file))
lines_output = []
lines_output.append("&comp_inparm \n")
if infile_text:
lines_output.append(infile_text)
logger.debug("file_infile {} ".format(infile_text))
for line in lines_input:
match1 = re.search(r"^[\&\/\!]", line)
match2 = re.search(r"\$([\w\_])+", line)
if match1 is None and match2 is not None:
line = case.get_resolved_value(line)
if match1 is None:
lines_output.append(line)
lines_output.append("/ \n")
with open(namelist_infile, "w") as file_infile:
file_infile.write("\n".join(lines_output))

(Side note: this argues for putting a new function in this buildnml.py, not in user_mod_support.py.)

This function expects a pre-existing file (typically $CASEDIR/user_nl_COMP) and creates a new file after doing a bit of reformatting / processing (I think typically $CASEDIR/Buildconf/namelist). Then the path to this temporary namelist file is passed to the namelist generation code.

What I'm proposing is keeping effectively that same flow, but now, instead of using the whole contents of user_nl_COMP, there will be an initial subsetting step before calling create_namelist_infile. I could do this all at once via an extra option to the existing create_namelist_infile function, but in the interest of modularity, testability and avoiding potential disruption, I feel it's best to do this in a separate preprocessing step. So instead of the flow, user_nl_COMP -> [create_namelist_infile] -> Buildconf/namelist, CISM (and any other component that chooses to use this new function) will have the flow user_nl_COMP -> [separate_user_nl] -> Buildconf/user_nl_COMP_subset -> [create_namelist_infile] -> Buildconf/namelist (where names in brackets are function calls and others are file names).

What is output in the run directory (i.e., one file, multiple files)?

I'm curious - what is produced in the run directory, that is what does cism actually read for input?

If you are running with N ice sheets, the final result is N+1 files: a cism_in file that has general settings that apply to all ice sheets (read by the "cism-wrapper" layer), and N cism.config files (read by CISM itself), one for each ice sheet. e.g., in a run with Greenland & Antarctica, you would have cism_in, cism.ais.config and cism.gris.config.

@jedwards4b
Copy link
Contributor

If you are running with N ice sheets, the final result is N+1 files: a cism_in file that has general settings that apply to all ice sheets (read by the "cism-wrapper" layer), and N cism.config files (read by CISM itself), one for each ice sheet. e.g., in a run with Greenland & Antarctica, you would have cism_in, cism.ais.config and cism.gris.config.

This would seem to argue for separate files in the case directory as the easiest solution for the user. They would simply copy the file back from the run directory to user_{filename} in the case directory and edit as needed. I think that this is a common practice, especially if the changes are complex.

@billsacks
Copy link
Member Author

I have been thinking about this more, especially inspired by this comment from @jedwards4b:

the component buildnml knows what user_nl_* to expect in a given case, the responsibility to put them there, if not already present, should be added here.

That idea is growing on me, if (1) @whlipscomb is happy with having multiple user_nl files, and (2) @jedwards4b (and anyone else who cares to weigh in) is comfortable with adding another linkage to the component's buildnml script. (@mvertens has given her okay if others are happy with this.) Here are some thoughts and questions. For questions, please see bold text below.

Specifically, I think it would be relatively easy to add functionality to CISM's buildnml to achieve staging an overall user_nl_cism plus one user_nl_cism_ICESHEET for each ice sheet actually present in this case.

(Q1) @whlipscomb - would you be happy with this solution (i.e., option (1) from ESCOMP/CISM-wrapper#50) if we would only have user_nl files for the ice sheets actually present in this case?

@jedwards4b and other cime developers: The implementation I imagine for this is: In case_setup.py: _build_usernl_files: Add a call to a function that determines which user_nl files are needed for this component. This is done by looking for a function named get_user_nl_list in the component's buildnml module – I guess using the same re.search approach as is used in run_sub_or_cmd (though we wouldn't actually use that function, because we don't want to run this as a command):

# Before attempting to load the script make sure it contains the subroutine
# we are expecting
with open(cmd, 'r') as fd:
for line in fd.readlines():
if re.search(r"^def {}\(".format(subname), line):
do_run_cmd = False
break

If present, then it calls that function to get a list of user_nl files needed for this component; if not present, then it will assume that we just need user_nl_COMP for the given COMP (i.e., the same as now).

(Q2) @jedwards4b and anyone else: does this implementation – and specifically the optional new connection from cime to components' buildnml – seem reasonable?

If people like this, then I have two additional questions, for @jedwards4b, @whlipscomb, @mvertens and anyone else:

(Q3) Tradeoff between ease of adding a new ice sheet and specificity of explanatory text at the top of each file: I can see two ways to implement this:

  1. Store separate template files for each possible ice sheet (e.g., user_nl_cism_gris, user_nl_cism_ais, etc. in the CISM-wrapper repository, then just copy each one into place.
  2. Store only a generic user_nl_cism in the CISM-wrapper repository, then copy that file to identical files named user_nl_cism, user_nl_cism_gris, user_nl_cism_ais, etc. in the case directory.

The advantage of (1) is that we could have comments specific to this ice sheet. For example, in user_nl_cism we could have: ! This file is for settings that apply only to the Greenland Ice Sheet; with (2), we would just have a generic comment like, ! The user_nl_cism file is for settings applying to all ice sheets, user_nl_cism.gris is for settings that apply only to the Greenland Ice Sheet, etc.. The disadvantage of (1) is that it adds one more step that someone would need to do when adding a new ice sheet. I'm leaning slightly towards (2), but don't feel strongly about it, and want to hear other people's thoughts on how important it is to have specific comments for each user_nl file.

(Q4) How to name the files: I would propose names like user_nl_cism_gris, using the short name that we are using in other contexts, including for the grid names and cism.config file names. Are people (especially @whlipscomb) happy with this? An alternative is to use the full icesheet name (like user_nl_cism_greenland). (Note that I prefer an underscore separator over a period because in user_nl_cism.gris, gris can be interpreted as the file type extension.)

Finally, I'll just point out that this:

They would simply copy the file back from the run directory to user_{filename} in the case directory and edit as needed. I think that this is a common practice, especially if the changes are complex.

doesn't work (I think) for CISM, even with a single ice sheet, because I think that the format of the cism.config file is incompatible with the format of the user_nl file. I also think that this is a bad practice that we should discourage, for a number of reasons (documenting what you've changed in your case, and the potential to accidentally override some setting whose default value depends on another namelist variable's value).

@whlipscomb
Copy link
Contributor

@billsacks, thanks for continuing to think about the problem carefully.

For (Q1), I'm OK with having one user_nl_cism file for changes common to all ice sheets, along with additional files like user_nl_cism_gris for changes unique to a single instance. (Provided we don't have user_nl_* files for ice sheets that aren't part of the case, as you said.) I think this is no harder for users to understand than having multiple sections in a single user_nl_cism file, and might actually be less error-prone.

If I understand correctly, a user would have two ways to make a change common to all ice sheets: (1) Change user_nl_cism, or (2) Change both user_nl_cism_gris and user_nl_cism_ais. Is that right? The second way takes more typing but requires editing fewer files. I could imagine that some users might prefer (2).

For (Q3), I don't feel strongly one way or the other.

For (Q4), my preference is to use the short names: gris, ais, and so on. I'm fine with underscores.

Thanks again!

@billsacks
Copy link
Member Author

Thanks @whlipscomb !

If I understand correctly, a user would have two ways to make a change common to all ice sheets: (1) Change user_nl_cism, or (2) Change both user_nl_cism_gris and user_nl_cism_ais. Is that right? The second way takes more typing but requires editing fewer files. I could imagine that some users might prefer (2).

That's right. However, we still need the overall user_nl_cism for changes to variables in cism_in (i.e., CISM-wrapper's namelist). In addition, it could help with backwards compatibility with old case setups, and I think this will help a bit with system testing, though wouldn't be critical for that purpose.

We could say that user_nl_cism is just for changes to cism_in, and you need to use the other files for any changes to the cism.config files. I'm not sure if you were starting to go there with your thinking. However, I don't think it will take much extra code to support having user_nl_cism changes apply to all cism.config files, so as long as you feel that supporting this gives more advantage than confusion, then I'll stick with that plan. Either way is fine with me, so just let me know your preference.

@whlipscomb
Copy link
Contributor

@billsacks, I like the idea that changes to user_nl_cism apply to all cism.config files as well as cism_in, and I'm OK with having more than way to change settings for a particular ice sheet. I just wanted to make sure I understood the logic.

@jedwards4b
Copy link
Contributor

@billsacks This looks good, and I think that the way you are implementing it will allow other components to take advantage of the feature if and when they are ready to do so. Thanks

@billsacks
Copy link
Member Author

billsacks commented Jun 17, 2021

Thanks @jedwards4b - and thanks for your great suggestion of having the components' buildnml handle this responsibility. Once I saw how this could be done, this really did feel like the right approach to me.

@billsacks
Copy link
Member Author

Darn, sorry again jedwards . Since I made the mistake once now your name is coming up in the auto-complete in this thread. I'll try to be more careful!

@billsacks billsacks changed the title Add code to separate user_nl settings into different groups Support the creation of multiple user_nl files for a component Jul 30, 2021
billsacks added a commit to billsacks/cime that referenced this issue Jul 30, 2021
billsacks added a commit to billsacks/cime that referenced this issue Jul 30, 2021
billsacks added a commit that referenced this issue Aug 4, 2021
Support the creation of multiple user_nl files for a component

In CESM, we are extending CISM to allow multiple user_nl_cism* files: an overall user_nl_cism, plus a separate file for each ice sheet (user_nl_cism_ais, user_nl_cism_gris, etc.). This PR adds support for staging multiple user_nl files for a component at case.setup time, via an optional function that can be defined in the component's buildnml. See https://github.com/ESCOMP/CISM-wrapper/blob/23b9a83c81db4a952f889496f5ca1dbe62f91178/cime_config/buildnml#L66-L78 for the definition of this function for CISM.

To support this change, I have rewritten the CIME functionality that determines if a given function is present in a python module so that it is more robust and covered by unit tests.

Test suite:
- scripts_regression_tests on izumi: most tests pass; for failures, there were similar failures in a run from an unmodified copy of cime6.0.3 (some of the same tests failed, though there were a few differences that appeared to be random system issues)
- manual testing of the changes to `run_sub_or_cmd`, ensuring that the `file_contains_python_function` logic is working correctly
- extensive manual testing of the case_setup changes in this PR together with https://github.com/ESCOMP/CISM-wrapper/tree/multiple_user_nl_cisms/ – both with and without the new `get_user_nl_list` function

Test baseline: N/A
Test namelist changes: none
Test status: bit for bit

Fixes #3999

User interface changes?: Allows components to define a `get_user_nl_list` function in their buildnml file, which supports the creation of multiple user_nl files for the component

Update gh-pages html (Y/N)?: N
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants