Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stream data input now set in new config_component.xml variables #1929

Closed
wants to merge 31 commits into from

Conversation

mvertens
Copy link

@mvertens mvertens commented Dec 19, 2022

Description of changes

stream data input now set in new config_component.xml variables

Specific notes

Stream input for the following modules:

FireDataBaseType.F90
SoilMoistureStreamMod.F90
UrbanTimeVarType.F90
ch4FInundatedStreamType.F90
laiStreamMod.F90
ndepStreamMod.F90

is now set in config_components.xml (instead of namelist_defaults_ctsm.xml) via the following new xml variables:

<entry id="CLM_STREAM_XXX_YEAR_FIRST">
<entry id="CLM_STREAM_XXX_YEAR_LAST">
<entry id="CLM_STREAM_XXX_YEAR_ALIGN">
<entry id="CLM_STREAM_XXX_DATA_FILENAME">
<entry id="CLM_STREAM_XXX_MESH_FILENAME">
<entry id="CLM_STREAM_XXX_MAPALGO">
<entry id="CLM_STREAM_XXX_TINTALGO">

where

XXX= [NDEP, POPDENS, LIGHTNG,URBANTV, LAI, SOILM, CH4FINUNDATED]

Moving forward the NDEP stream will be removed from CTSM totally as input will always come from either DATM or CAM.
These changes greatly simplify both the current use case logic, remove a lot of entries from namelist_defaults_ctsm.xml and
also simplify the logic in CLMBuildNamelist.pm.

NOTE: build-namelist_test.pl still needs to have some updates to work with these changes. @ekluzek - Its not clear to me how often this tool is still used.

Contributors other than yourself, if any: @billsacks

CTSM Issues Fixed (include github issue #):

Are answers expected to change (and if so in what way)? No

Any User Interface Changes (namelist or namelist defaults changes)? Yes (see explanation above)

Testing performed, if any: ./run_sys_tests -s aux_clm -c ctsm5.1.dev115 --skip-generate were run on both cheyenne and izumi and both showed all test passing that were not expected failures

@billsacks
Copy link
Member

Notes from a meeting with @mvertens and @ekluzek -

@ekluzek plans to lead the review of this. I'm removing myself as a formal reviewer but will be available for any discussions needed or if @ekluzek wants to turn it over to me.

Mariana hasn't tried to get the build-namelist perl test working. This would require significant rework. She asks for this to be deferred to later in order to move this PR along. @ekluzek would like to look into whether he can get it working fairly quickly. I didn't say this in the meeting, but my feeling is that, if @ekluzek can't get this working fairly quickly, I'd support bringing this in with those tests failing for now, and then we get them working in a follow-up PR so that this doesn't hold up other work.

NEON use cases have been changed; Mariana thinks she has this working, but still needs to run the izumi testing.

Mariana has put in a CAM PR with the ndep changes in CAM; has put in xml variables to parallel the ones in CTSM. There are relatively small differences, maybe due to time sequencing? It would be interesting to compare CAM vs. datm; Mariana feels those should be close. In any case, Mariana feels pretty comfortable that CAM is working correctly; if datm and CAM are close then she would feel even more comfortable.

This PR can come in before the CAM one (or anything else). But the CAM one needs to come in before changing CTSM to get ndep from atm rather than internally.

@mvertens
Copy link
Author

@ekluzek @billsacks - my izumi tests are now passing with no unexpected errors and are bfb with ctsm5.1.dev115 .

@jedwards4b
Copy link
Contributor

What is the status of this PR? Please provide a timeline for merging. @ekluzek @billsacks

@ekluzek ekluzek self-assigned this Jan 9, 2023
@ekluzek ekluzek added the priority: high High priority task to fix soon, e.g., because it is a problem in important configurations label Jan 9, 2023
@ekluzek
Copy link
Contributor

ekluzek commented Jan 9, 2023

@jedwards4b @billsacks and I discussed this last Friday. I'm going to do a review and suggest some level of changes. Both @billsacks and I will work on the changes if @mvertens is unable to. We get the impression she is not in a position to work on this right now, which is fine we can take it on. It is important for the build-namelist unit testing to be functional, so that is a task that I will take on. The changes here also make me want to reexamine #585 and the CTSM software team will look into that as a longer term change.

@wwieder
Copy link
Contributor

wwieder commented Jan 9, 2023

As this comes to main, it's also likely worth bringing up relevant changes at a CTSM science meeting so others are aware of what's going on. To echo's @jedwards4b question, what's the timeline for this PR?

Copy link
Contributor

@ekluzek ekluzek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the work in here to make streams control names to be similar to each other. The names came in at different times previously and weren't synchronized. It helps to keep them consistent. That does improve readability which is great.

This does have both user-facing and developer facing changes though, so I want to discuss with the CTSM community about how this impacts them. This also requires a pretty extensive update to the user's guide, which is not done here.

However, I think there are some real disadvantages to this approach. Simply moving data from namelist defaults to config_component means that preview_namelist can no longer adapt and change the namelist based on other settings. And config_component is tied to the COMPSET which is fixed for a given case, so many changes now require setting up a new case whereas they didn't before. If it could be dependent on different variables the best we could do is to have the user rerun case.setup to get the updates. But, that's not something that users typically do.

Another problem is that most of the streams variables aren't used for a typical case, but the user can change them and not realize that it won't make a difference to the simulation. The error checking that prevents that from happening was stripped out of this. This makes the model harder to use and understand. Since we have a large user-base that uses CTSM, but isn't an expert on it (because they are primarily interested in the atmosphere for example), it's bad for the users to allow them to change things and not tell them that it won't affect their simulation. It's also bad to have them set to something that it LOOKS like it will do something -- but it really doesn't.

An example case for this is to setup a SP case. In that case there are 47 stream xml variables that the user will see, but only 6 of them actually impact the simualtion. But, the other 41 are set to values such that it looks like they do. We can improve this so it doesn't work this way, but that will take effort and it still has the limitation that I talk about at the beginning that the STREAMS xml variables are more or less locked in after create_newcase (in terms of reading the config_component.xml database anyway).

I do appreciate the fact that a lot of work was done here to figure this out and make sure it worked. However, I'm also not seeing how this offers improvement on current behavior (other than a consistent naming convention).

@@ -216,7 +216,51 @@ OPTIONS
-version Echo the SVN tag name used to check out this CLM distribution.
-vichydro Toggle to turn on VIC hydrologic parameterizations (default is off)
This turns on the namelist variable: use_vichydro

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This long list of command line options highlights some issues with the current perl build-namelist. In general a user-interface for a script that has dozens of command line inputs is indicative of something that needs to be redesigned. But, really this is just pointing out the need for #585. So the CTSM group will be discussing this and come up with a plan for a better long term solution.

@@ -168,7 +168,7 @@ OPTIONS
(default is 0) (standard option with land-ice model is 10)
-glc_use_antarctica Set defaults appropriate for runs that include Antarctica
-help [or -h] Print usage to STDOUT.
-light_res <value> Resolution of lightning dataset to use for CN fire (360x720 or 94x192)
-light_res <value> Resolution of lightning dataset to use for CN fire (none or active)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A capability that is now removed with this PR is that you can't easily change the resolution for the lightning data between half-degree to T62. It does hardwire it so that T62 is done for older versions and half-degree for CLM5.2. And technically you can get the other if you look up the code and figure out what the file needs to be. So it's still possible -- just not as easy as flipping one number in CLM_BLDNML_OPTS.

I will check with CTSM folks if there is an importance in being able to easily change the resolution of lightning datasets. If so a way to do it would be with a CLM_LIGHT_RES xml variable.

@@ -283,57 +327,162 @@ sub process_commandline {
envxml_dir => ".",
vichydro => 0,
maxpft => "default",
);
#
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way another way of doing this and perhaps one that we should use here is to implement this similar to GLC_TWO_WAY_COUPLING and have build-namelist read the env_run.xml file directly, rather than inputting all of this in the command line.

my $fire_method = remove_leading_and_trailing_quotes( $nl->get_value('fire_method') );
if ( defined($fire_method) && $val ne "none" ) {
if ( $fire_method eq "nofire" ) {
$log->fatal_error("-$var option used with fire_method='nofire'. -$var can ONLY be used without the nofire option");
}
}
my $stream_fldfilename_lightng = remove_leading_and_trailing_quotes( $nl->get_value('stream_fldfilename_lightng') );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be taken out.

@@ -495,6 +644,10 @@ sub read_envxml_case_files {
}
foreach my $attr (keys %envxml) {
if ( $envxml{$attr} =~ m/\$/ ) {
my $subst = $nl->{'inputdata_rootdir'};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking this one over.. It's switching a $DIN_LOC_ROOT prefix to env variables that have the path to $DIN_LOC_ROOT hardcoded in the value. It might be better to construct the config_component.xml file to be this way from the get go.

@@ -161,6 +159,544 @@
used by expert users.</desc>
</entry>

<!-- ================================== -->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A problem with moving the namelist defaults into config_component is that the namelist defaults can be reevaluated anytime that preview_namelist is rerun. However, the config_component can only be reevaluated when the case is remade or case.setup is rerun. So the values are more locked in than in the case with existing inside the namelist defaults.

<!-- NDEP stream functionality -->
<!-- ================================== -->

<entry id="CLM_STREAM_NDEP_YEAR_FIRST">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the stream xml values are being set to something that looks like it's being used when it actually may not be. LAI-streams and soil-moisture streams are default off for almost all cases. And some streams are only used for BGC type cases, so SP cases will set roughly 40 variables that look like they do something when they actually don't.

Minimally this needs to be made clear in the documentation. Better, would be to have them set to something like UNSET -- unless they actually are used. But, doing that well would also tie the settings to looking at the "-bgc" option that's part of the CLM_BLDNMML_OPTS variable, and I'm not sure that can be done.

</values>
<group>run_component_ctsm</group>
<file>env_run.xml</file>
<desc>Nitrogen deposition data year first</desc>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also require changes to the user's guide to explain the change in variables.

Simulation year that aligns with stream_year_first_ndep value
</entry>

<entry id="stream_fldfilename_ndep" type="char*256" category="datasets"
input_pathname="abs" group="ndepdyn_nml" valid_values="" >
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the input_pathname part shoudl be removed here. That is basically the way to indicate it's a file.

defined($nl->get_value('model_year_align_lightng')) ||
defined($nl->get_value('lightng_tintalgo' )) ||
defined($nl->get_value('stream_fldfilename_lightng')) ) {
$log->fatal_error("When bgc is SP (NOT CN or BGC or FATES) or fire is turned off none of: stream_year_first_lightng,\n" .
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This removes a lot of the error checking which I really don't think should be done here. That ensures that the user isn't changing things that won't have any affect on their specific case. A real downside of this is that you can freely change the CLM_STREAMS_* settings for variables that would have no affect on your case and not know about it.

@ekluzek ekluzek added the tag: next this issue should get some attention in the next week or two label Jan 11, 2023
@billsacks
Copy link
Member

I talked about this yesterday with @mvertens . Based on this discussion, I think the best path forward would be to keep the streamlined and consolidated logic for how default values are set for the streams variables, but move those settings into namelist_defaults rather than making all of these streams variables appear as xml variables in config_component.xml. i.e., most streams variables would go back to just being namelist variables rather than being env_run xml variables. The main downside we see of this is that there would be inconsistency in how different stream variables are treated throughout CESM – some being handled as xml variables and others as namelist variables – but @ekluzek raises some good points about why there are downsides to having all of these variables appear in env_run.xml.

One piece of this that would take a bit of work might be getting the time period from the compset passed through to build-namelist so that it can be used to set the defaults for xml variables. Currently (i.e., before this PR) that was handled via use cases, but @mvertens and I would really like to move away from use cases for this purpose (and maybe more broadly), because that was a very confusing aspect of the original implementation. We imagine that it shouldn't be too hard to get the compset's time period to pass through to build-namelist, but we haven't looked into it carefully.

@mvertens feels that it makes sense to keep the ndep-related variables in xml, because that provides a stepping-stone and consistency with the move to having NDEP come from CAM... and since these will be removed soon anyway, it shouldn't matter too much what is done for them.

@mvertens also clarified that other component changes do not directly depend on this PR, but there is still some urgency on this to allow us to move to getting NDEP from CAM and to run cplhist-forced cases. But this can wait for a couple of weeks - e.g., until @mvertens is back from some upcoming travel.

My initial feeling is that it might be easiest to get to this compromise solution by starting over from master, rather than trying to back out the changes here... we would be guided heavily by the changes here, but my sense is that the compromise solution would end up backing out many of the changes in this PR. However, I haven't looked at this closely enough to know if that's really what would work best.

@ekluzek ekluzek added type: code cleanup improving internal code structure PR status: later PR: CTSM admin. team is NOT ready for this to come in at this time, but will later and removed priority: high High priority task to fix soon, e.g., because it is a problem in important configurations tag: next this issue should get some attention in the next week or two labels Jan 12, 2023
@ekluzek
Copy link
Contributor

ekluzek commented Jan 12, 2023

We discussed this at the CTSM software meeting this morning. I think we should do @billsacks recommendations, but will need to identify who is going to work on this. @mvertens is on travel right now, so we will wait until she is back from that and discuss with her. We've clarified that this work is not tied up in work that requires it, so there is flexibility on the timing for this.

@billsacks asks if we should start fresh with the proposed changes. I propose that when we get ready to work on this that we close this PR, but still start with this branch. Some of the changes here will be reverted, but other changes will retain @mvertens authorship on lines that are kept.

@ekluzek ekluzek marked this pull request as draft January 12, 2023 21:37
@mvertens mvertens closed this May 12, 2023
Upcoming tags automation moved this from In progress - master to Master Tags/Issues Done May 12, 2023
@mvertens mvertens deleted the feature/ndep_changes branch July 17, 2023 08:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR status: later PR: CTSM admin. team is NOT ready for this to come in at this time, but will later type: code cleanup improving internal code structure
Projects
Upcoming tags
Done (non release/external)
Development

Successfully merging this pull request may close these issues.

None yet

5 participants