Forthcoming FATES satellite phenology ERS test fails #1485

Closed
glemieux opened this issue Sep 13, 2021 · 4 comments · Fixed by #1562
Labels
bug something is working incorrectly

Comments

@glemieux
Collaborator

glemieux commented Sep 13, 2021

Brief summary of bug

In upcoming PR #1182, I've added two exact restart (ERS) regression tests for checking the FATES nocomp and sp modes. Both tests run, but the sp mode test fails the COMPARE_base_rest check.

General bug information

CTSM version you are using: [output of git describe]: ctsm5.1.dev054 (merge into PR branch)

Does this bug cause significantly incorrect results in the model's science? [Yes / No]: No

Configurations affected: [Fill this in if known.]

Details of bug

As this is a new test, the scientific 'baseline' that we have been comparing against is: /glade/scratch/rfisher/archive/seb_CLM5-SPfates-def_rewindtest0

We've tried a number of modifications to the code to develop a hypothesis about what might be at issue here. Each of the following changes results in a passing restart check. Both change the science relative to expected values, but they may help narrow the direction of the investigation:

  1. If the SatellitePhenology input filter is reverted to match the dev054 master use of the nolakep filter
  2. If hlm_sp_tlai is hardcoded to 1.0 in the dynamics_driv procedure here

Important details of your setup / configuration so we can reproduce the bug

[Specify anything relevant: the compset, resolution, machine, compiler, any xml or namelist changes, etc. You don't have to repeat anything that you have already noted above.]

Important output or errors that show the problem

An example of the RMS values:

 RMS GPP                              1.2990E-15            NORMALIZED  9.3304E-11
 RMS GPP                              2.1805E-15            NORMALIZED  1.5623E-10
 RMS NPP                              2.9692E-15            NORMALIZED  2.3944E-10
 RMS NPP_BY_AGE                       1.1223E-15            NORMALIZED  6.3350E-10
 RMS GPP                              4.2219E-15            NORMALIZED  2.9210E-10
 RMS NPP                              2.9692E-15            NORMALIZED  2.3321E-10
 RMS NPP_BY_AGE                       1.1223E-15            NORMALIZED  6.1702E-10
 RMS GPP                              8.6293E-15            NORMALIZED  5.8525E-10
 RMS NPP                              8.9077E-15            NORMALIZED  6.8931E-10
 RMS NPP_BY_AGE                       3.3668E-15            NORMALIZED  1.8237E-09
 RMS GPP                              1.3454E-14            NORMALIZED  9.1870E-10
 RMS NEP                              2.6937E-14            NORMALIZED  1.5189E-09
 RMS NPP                              1.4846E-14            NORMALIZED  1.1480E-09
 RMS NPP_BY_AGE                       5.6113E-15            NORMALIZED  3.0373E-09
 RMS GPP                              5.7336E-14            NORMALIZED  3.9440E-09
 RMS NEP                              6.8923E-14            NORMALIZED  3.9319E-09
 RMS NPP                              5.6746E-14            NORMALIZED  4.4236E-09
 RMS NPP_BY_AGE                       2.1448E-14            NORMALIZED  1.1704E-08
 RMS GPP                              7.8817E-14            NORMALIZED  5.3616E-09
 RMS NEP                              1.0230E-13            NORMALIZED  5.7934E-09
 RMS NPP                              7.9399E-14            NORMALIZED  6.1155E-09
 RMS NPP_BY_AGE                       3.0010E-14            NORMALIZED  1.6180E-08
 RMS BTRAN                            4.1001E-08            NORMALIZED  2.0020E-07
 RMS GPP                              6.2544E-14            NORMALIZED  4.2769E-09
 RMS H2OSOI                           8.6843E-09            NORMALIZED  5.3278E-08
 RMS HR                               3.7262E-12            NORMALIZED  2.6965E-06
 RMS NEP                              4.8798E-12            NORMALIZED  2.7815E-07
 RMS NPP                              8.8459E-14            NORMALIZED  6.8576E-09
 RMS NPP_BY_AGE                       3.3434E-14            NORMALIZED  1.8143E-08
 RMS T_SCALAR                         1.9021E-07            NORMALIZED  5.1479E-07
 RMS BTRAN                            6.8671E-07            NORMALIZED  3.3903E-06
 RMS GPP                              1.2209E-13            NORMALIZED  8.5802E-09
 RMS H2OSOI                           1.9069E-07            NORMALIZED  1.1676E-06
 RMS HR                               4.9060E-11            NORMALIZED  3.5217E-05
 RMS NEP                              6.0937E-11            NORMALIZED  3.5792E-06
 RMS NPP                              1.3527E-12            NORMALIZED  1.0798E-07
 RMS NPP_BY_AGE                       5.1125E-13            NORMALIZED  2.8570E-07
 RMS TOTSOMC                          2.2544E-06            NORMALIZED  3.7139E-09
 RMS T_SCALAR                         2.8031E-06            NORMALIZED  7.5689E-06
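
For readers unfamiliar with this kind of output, the sketch below shows one way an RMS difference and a normalized RMS difference between a base field and a rest field could be computed. It is illustrative only: the function names are hypothetical, and the normalization (dividing by the mean absolute magnitude of both fields) is an assumption, so the tool that produced the values above may normalize differently.

```python
import math


def rms_diff(base, rest):
    """Root-mean-square of the element-wise difference between two fields."""
    if len(base) != len(rest):
        raise ValueError("fields must have the same length")
    return math.sqrt(sum((b - r) ** 2 for b, r in zip(base, rest)) / len(base))


def normalized_rms_diff(base, rest):
    """RMS difference divided by the mean absolute magnitude of both fields.

    ASSUMPTION: this normalization is illustrative; the comparison tool
    behind the table above may use a different definition.
    """
    scale = sum(abs(b) + abs(r) for b, r in zip(base, rest)) / (2 * len(base))
    if scale == 0.0:
        return 0.0  # both fields are identically zero
    return rms_diff(base, rest) / scale


# Made-up data: the two fields differ by a tiny amount in one element.
base = [1.0, 2.0, 3.0, 4.0]
rest = [1.0, 2.0, 3.0, 4.0 + 2.0e-9]

print(f"RMS {rms_diff(base, rest):.4E}  NORMALIZED {normalized_rms_diff(base, rest):.4E}")
```

An exact restart test expects both numbers to be exactly zero for every variable, so even differences near machine precision, like the early GPP values above, count as a failure.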

Screenshot of the comparison (right) between the base (left) and rest (middle) output files for GPP:
[Image: Screenshot from 2021-09-13 13-52-27]

It should be noted that TLAI is reported in the output and passes the restart.

@billsacks billsacks added the bug something is working incorrectly label Sep 13, 2021
@billsacks
Member

> It should be noted that TLAI is reported in the output and passes the restart.

The history file variable would give the value at the end of the time step. Is it possible that there is some use of TLAI in the first time step, before it is read in? i.e., could there be some sequence like this?:

  • Use current value of TLAI
  • Read TLAI
  • Update history variables
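
The ordering hypothesized above can be sketched with a toy model. Everything here is a placeholder rather than CTSM or FATES code: `tlai_forcing`, the GPP update rule, and the assumption that the restarted run begins with a zero TLAI in memory are all invented for illustration. The sketch only shows how a variable can match in the history output while a quantity derived from it does not.

```python
def tlai_forcing(step):
    """Hypothetical prescribed LAI for a given time step."""
    return 1.0 + 0.1 * step


def run(steps, tlai_in_memory, gpp=0.0):
    """Advance the toy model and return per-step history of (tlai, gpp)."""
    history = []
    tlai = tlai_in_memory
    for step in steps:
        gpp += 0.5 * tlai            # 1. use the TLAI value currently in memory
        tlai = tlai_forcing(step)    # 2. read TLAI for this step
        history.append((tlai, gpp))  # 3. update history at the end of the step
    return history


# "base": one continuous run over steps 0..5.
base = run(range(6), tlai_in_memory=tlai_forcing(-1))

# "rest": restart after step 2. GPP is restored from the restart state, but
# (by assumption) the in-memory TLAI is not, so the first step uses 0.0.
rest = run(range(3, 6), tlai_in_memory=0.0, gpp=base[2][1])

tlai_matches = all(b[0] == r[0] for b, r in zip(base[3:], rest))
gpp_matches = all(b[1] == r[1] for b, r in zip(base[3:], rest))
print("TLAI history matches:", tlai_matches)
print("GPP history matches:", gpp_matches)
```

In this toy setup the TLAI written to history is identical in both runs because it is always written after the read, while GPP diverges from the first step after the restart.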

@glemieux glemieux changed the title from "Forthcoming FATES satellite phenology ERS test fails COMPARE_base_rest" to "Forthcoming FATES satellite phenology ERS test fails" Nov 29, 2021
@glemieux
Collaborator Author

This test has been failing on the RUN soon after the restart for a few tags. The order of magnitude of the differences appears to be the same as with the COMPARE_base_rest differences reported previously. After running various combinations of ctsm and fates tags, I determined that this test switched from a COMPARE_base_rest failure mode to RUN with fates tag sci.1.47.1_api.17.0.0.

Talking with @rgknox and reviewing the changes, our hypothesis is that the change NGEET/fates@343ab78 in update_hlm_dynamics is causing the RUN failure. I'm going to attempt to add a logic check for use_fates_sp to this part of the code to avoid updating the treelai and similar variables to test the hypothesis.

@glemieux
Collaborator Author

glemieux commented Nov 30, 2021

> Talking with @rgknox and reviewing the changes, our hypothesis is that the change NGEET/fates@343ab78 in update_hlm_dynamics is causing the RUN failure. I'm going to attempt to add a logic check for use_fates_sp to this part of the code to avoid updating the treelai and similar variables to test the hypothesis.

Adding the above logic check avoided the RUN failure, as hypothesized. The test now results in the previously noted COMPARE_base_rest failure.

One thing I hadn't noted previously is that the comparison is failing on coupler output as well as fates output. I'm not sure yet how these variables are updated, or whether this sheds any light on the issue at hand:

 RMS l2x_Sl_tref                      4.5231E-07            NORMALIZED  1.6989E-09
 RMS l2x_Sl_qref                      5.3033E-12            NORMALIZED  1.1714E-09
 RMS l2x_Sl_t                         1.0785E-06            NORMALIZED  4.0713E-09
 RMS l2x_Sl_snowh                     2.0129E-11            NORMALIZED  3.4402E-10
 RMS l2x_Sl_tsrf00                    1.4686E-05            NORMALIZED  5.4073E-08
 RMS l2x_Fall_lat                     5.0102E-08            NORMALIZED  4.6301E-09
 RMS l2x_Fall_sen                     9.2369E-06            NORMALIZED  4.1709E-07
 RMS l2x_Fall_lwup                    3.2997E-06            NORMALIZED  1.1303E-08
 RMS l2x_Fall_evap                    1.7829E-14            NORMALIZED  4.1769E-09

@glemieux
Collaborator Author

Update: After isolating the output to a single site in the high northern latitudes (as seen in the GPP plot above), I traced the issue to some of the incoming pfts having their cohort-level vcmax25top, jmax25top, tpu25top, and kp25top values set to zero in this check:
https://github.com/NGEET/fates/blob/b27149270b459fc2518dc2a5bacfc48ccd256528/biogeochem/EDCohortDynamicsMod.F90#L1995-L2021

It appears that sum(frac_leaf_aclass(1:nleafage)) is evaluating as zero here for some of those pfts immediately after the restart. I'm not sure why this would be yet, but @ckoven had me test replacing the assignment to zero with the relevant, non-scaled values, and this results in the ERS passing COMPARE_base_rest for sp mode.
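
A minimal sketch of the kind of change described above, written in Python rather than the Fortran of the linked check. The function name, the `NEARZERO` tolerance, and the choice of the first age class parameter as the "non-scaled" value are assumptions made for illustration; they are not taken from the FATES source.

```python
NEARZERO = 1.0e-30  # hypothetical tolerance for "effectively zero"


def age_weighted_rate(rate_by_age, leaf_c_by_age, fallback_to_unscaled=True):
    """Weight a per-age-class rate (e.g. vcmax25top) by leaf age fractions.

    rate_by_age:   parameter value for each leaf age class
    leaf_c_by_age: leaf carbon in each age class (may sum to zero)
    """
    total = sum(leaf_c_by_age)
    if total > NEARZERO:
        fracs = [c / total for c in leaf_c_by_age]
        return sum(r * f for r, f in zip(rate_by_age, fracs))
    if fallback_to_unscaled:
        # ASSUMPTION: use the first age class parameter, unscaled, instead
        # of zero when the age fractions sum to zero.
        return rate_by_age[0]
    return 0.0  # original behaviour described above: rate set to zero


# Made-up numbers: the age fractions sum to zero right after a restart.
print(age_weighted_rate([50.0, 40.0], [0.0, 0.0], fallback_to_unscaled=False))
print(age_weighted_rate([50.0, 40.0], [0.0, 0.0]))
print(age_weighted_rate([50.0, 40.0], [1.0, 1.0]))
```

With the fallback, a cohort whose leaf age fractions sum to zero keeps a non-zero rate, which is the behaviour that let the sp mode test pass in the experiment described above. Why the fractions are zero after the restart remains an open question.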

I've tested this out on the full fates suite comparing against the latest fates dev tag, and everything expected passes, although most of the non-FatesColdDef test mods have DIFFs against the baseline.
