
C13_HR, C13_NBP, FPI values result in numeric conversion not representable error #741

Closed
olyson opened this issue May 31, 2019 · 22 comments · Fixed by #883
Labels
tag: bug - impacts science (bug causing incorrect science results) · type: bug (something is working incorrectly)

Comments

@olyson
Contributor

olyson commented May 31, 2019

Brief summary of bug

C13_HR, C13_NBP, FPI values result in numeric conversion not representable error

General bug information

CTSM version you are using: relcisofix.n01_release-clm5.0.20

Does this bug cause significantly incorrect results in the model's science? Yes

Configurations affected: --compset BSSP585_BPRPcmip6 --res f09_g17

Details of bug

Original message from Keith Lindsay:

I'm running the esm-ssp585 CESM2 CMIP6 experiment.
The model is aborting when writing the monthly clm history files for 2046-01 with the error message
NetCDF: Numeric conversion not representable

Suspecting that the problem is that CLM is trying to write an r8 value that is outside the range of r4 into a netCDF float (= r4),
I've rerun the last segment with hist_ndens=1 to get r8 output.

Sure enough, the run went past where the original run aborted, and there are 3 variables with values outside the range of r4 values.

All of the out of range values are at the same location:
(i,j)=(40,168) (lon,lat)=(50E,68.3N)
The variables and excessive values are:
C13_HR: -6.8144E+46
C13_NBP: 6.8144E+46
FPI: -1.4824E+100

The largest r4 value is ~3.4028E+38
The following vars are also unusually large, but not outside r4 limits
C14_HR: -2.9915E+36
C14_NBP: 2.9915E+36

Values for HR and NBP at the same location are -1.837E-41 and -9.452E-18 respectively; pretty small.
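As a quick check of the representability claim, a NumPy sketch using the numbers above:

```python
import numpy as np

# Largest finite single-precision (r4) value, ~3.4028E+38
r4_max = np.finfo(np.float32).max

# C13_HR from the failing history write, stored in double precision (r8).
# Converting it to r4 overflows to -inf, which is what triggers
# "NetCDF: Numeric conversion not representable" on write.
print(np.isinf(np.float32(-6.8144e46)))   # True: not representable in r4

# The C14 values are unusually large but still within r4 range
print(abs(np.float32(-2.9915e36)) < r4_max)   # True: representable
```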

Any suggestions on what I should do to further diagnose this and/or work around it?

The original case is
CASE=b.e21.BSSP585_BPRPcmip6.f09_g17.CMIP6-esm-ssp585.001
CASEROOT=/glade/work/cmip6/cases/C4MIP/b.e21.BSSP585_BPRPcmip6.f09_g17.CMIP6-esm-ssp585.001
RUNDIR=/glade/scratch/cmip6/b.e21.BSSP585_BPRPcmip6.f09_g17.CMIP6-esm-ssp585.001/run

My clone, where I changed hist_ndens and reran, is
CASE=b.e21.BSSP585_BPRPcmip6.f09_g17.CMIP6-esm-ssp585.001
CASEROOT=/glade/scratch/klindsay/temp_caseroots/b.e21.BSSP585_BPRPcmip6.f09_g17.CMIP6-esm-ssp585.001
RUNDIR=/glade/scratch/klindsay/b.e21.BSSP585_BPRPcmip6.f09_g17.CMIP6-esm-ssp585.001/run

@olyson olyson added the tag: bug - impacts science bug causing incorrect science results label May 31, 2019
@olyson olyson self-assigned this May 31, 2019
@olyson
Contributor Author

olyson commented May 31, 2019

Keith O and I have a hunch that this is likely caused by a very small inorganic N pool size (likely SMIN_NH4) that results in the negative FPI value you're receiving.

There's a limiter already in the code to make sure we're not dividing by a negative or zero value, but nothing for the numerator.

Try adding the inner conditional shown below to the end of SoilBiogeochemCompetitionMod.F90 [line 935], along with the appropriate end if:

This should hopefully get you past the error, and also allow some additional history files to be written out that we can look at.

        if (potential_immob(c) > 0.0_r8) then
           if (actual_immob(c) > 0.0_r8) then
              fpi(c) = actual_immob(c) / potential_immob(c)
           else

@olyson
Contributor Author

olyson commented May 31, 2019

Will (and others),

Thanks for taking a look. I applied the patch below to the clone of my case with hist_ndens=1 and reran it.

--- /gpfs/u/home/cmip6/cesm_tags/BSSP585_BPRP_n001/cime/../components/clm/src/soilbiogeochem/SoilBiogeochemCompetitionMod.F90	2019-05-15 13:16:00.935004097 -0600
+++ SourceMods/src.clm/SoilBiogeochemCompetitionMod.F90	2019-05-20 15:06:26.059764128 -0600
@@ -933,7 +933,11 @@ subroutine SoilBiogeochemCompetition (bo
             end if
 
             if (potential_immob(c) > 0.0_r8) then
-               fpi(c) = actual_immob(c) / potential_immob(c)
+               if (actual_immob(c) > 0.0_r8) then
+                  fpi(c) = actual_immob(c) / potential_immob(c)
+               else
+                  fpi(c) = 0._r8
+               end if
             else
                fpi(c) = 1._r8
             end if
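In Python terms, the patched fpi logic amounts to the following sketch (not the Fortran itself):

```python
def fpi(actual_immob: float, potential_immob: float) -> float:
    """Fraction of potential immobilization, guarded so that a negative
    numerator (e.g. from a tiny negative inorganic N pool) cannot
    produce a negative fpi."""
    if potential_immob > 0.0:
        if actual_immob > 0.0:
            return actual_immob / potential_immob
        return 0.0  # new guard: non-positive numerator -> 0
    return 1.0      # pre-existing branch for non-positive denominator

print(fpi(-1e-300, 1e-200))   # 0.0, instead of a huge negative ratio
```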

The good news is that the patch did get rid of the large FPI value, and all other fields were unchanged.

The bad news is that the large magnitude values in the variables C13_HR, C13_NBP, C14_HR, and C14_NBP,
at the same grid point, are still present. The C13_HR and C13_NBP values are outside the range of r4,
so I would not be able to run with hist_ndens=2.

CASE=b.e21.BSSP585_BPRPcmip6.f09_g17.CMIP6-esm-ssp585.001
CASEROOT=/glade/scratch/klindsay/temp_caseroots/b.e21.BSSP585_BPRPcmip6.f09_g17.CMIP6-esm-ssp585.001
RUNDIR=/glade/scratch/klindsay/b.e21.BSSP585_BPRPcmip6.f09_g17.CMIP6-esm-ssp585.001/run

I extracted all clm2.h0 values from the problem child gridpoint into
$RUNDIR/b.e21.BSSP585_BPRPcmip6.f09_g17.CMIP6-esm-ssp585.001.clm2.h0.2046-01.pt.nc
Some relevant values are:
C13_HR =
-6.81443291122197e+46 ;
HR =
-1.83704239242707e-41 ;
LITTERC_HR =
-1.83704239242707e-41 ;
HR_vr =
0,
0,
0,
0,
3.33734231432754e-144,
1.57498056441182e-144,
5.9121634920142e-145,
1.93438605151145e-145,
4.49655604321222e-146,
8.09904191108604e-147,
1.13762985262988e-147,
1.2568091653275e-148,
1.09278960068717e-149,
6.76582666436599e-151,
2.34741439808028e-152,
4.57370983130404e-154,
5.08349126820701e-156,
3.40453874087456e-158,
-1.76638691579526e-41,
0 ;

It kinda looks like the decomp cascade is somehow producing small magnitude negative HR at depth.

It looks like the computation of isotopic HR occurs in the block lines 480-495 of CNCIsoFluxMod.F90

Perhaps the (isostate / gross state) ratio is large, despite being a ratio of small numbers,
and this is leading to large magnitude negative iso HR.

What are your thoughts on adding (soilbiogeochem_cf%decomp_cascade_hr_vr_col(cc,j,l) > 0._r8)
to the conditional in line 485 of CNCIsoFluxMod.F90, and maybe line 502 as well?

Or does it make more sense to try to eliminate the negative HR value?

@olyson
Contributor Author

olyson commented May 31, 2019

I sent an email just to Will and Keith O at the same moment that Keith L sent his. I will repeat it here for the benefit of the whole list, though it repeats some of what Keith L just said:

The possibility that we saw in our software meeting was that decomp_cpools_vr_col could be very close to 0 in this block of code in CNCIsoFluxMod:

           if ( soilbiogeochem_cs%decomp_cpools_vr_col(cc,j,cdp) /= 0._r8) then
              iso_soilbiogeochem_cf%decomp_cascade_hr_vr_col(cc,j,l)  =  &
                  soilbiogeochem_cf%decomp_cascade_hr_vr_col(cc,j,l) * &
                  (iso_soilbiogeochem_cs%decomp_cpools_vr_col(cc,j,cdp) &
                     / soilbiogeochem_cs%decomp_cpools_vr_col(cc,j,cdp)) * 1._r8
           else
              iso_soilbiogeochem_cf%decomp_cascade_hr_vr_col(cc,j,l) = 0._r8
           end if

Possibly what's happening is: one of the cpools is being updated to be close to 0 (when it should probably be exactly 0), and truncation of this near-zero value doesn't happen until later.
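This failure mode can be illustrated numerically; the magnitudes below are illustrative only, not taken from the run:

```python
# Gross HR flux and pools after transient land-use adjustment.
# The gross pool should be exactly 0 but is left at a tiny
# round-off-level value; the isotope pool is small but much larger.
gross_hr   = -1.8e-41    # small-magnitude negative HR
iso_pool   = 1.0e-30     # C13 pool
gross_pool = -1.0e-140   # near-zero pool that escaped truncation

# iso HR = gross HR * (iso pool / gross pool): a ratio of tiny
# numbers blows up to an enormous value (~1.8e+69 here)
iso_hr = gross_hr * (iso_pool / gross_pool)
print(iso_hr)

# With the pool truncated to exactly 0, the /= 0._r8 guard takes the
# safe branch instead and the iso flux is set to 0
gross_pool = 0.0
iso_hr = 0.0 if gross_pool == 0.0 else gross_hr * (iso_pool / gross_pool)
print(iso_hr)   # 0.0
```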

One possible place where I see an update of decomp_cpools_vr_col without an associated truncation is in CStateUpdateDynPatch.

Bill

@olyson
Contributor Author

olyson commented May 31, 2019

Hi,

Before testing the mod to CNCIsoFluxMod.F90, I set 'hist_nhtfrq(1) = 1' to get output every timestep.
The large magnitude values for C13_HR, C13_NBP, C14_HR, and C14_NBP occur in the first timestep of the year.
I think this is consistent with Bill's highlighting CStateUpdateDynPatch as a possible culprit.

I put in the following patch, and the model no longer generates large magnitude values in the first timestep for
C13_HR, C13_NBP, C14_HR, and C14_NBP

Despite changing these quantities in the first timestep of the year, there were no changes to values in subsequent timesteps.

/gpfs/u/home/cmip6/cesm_tags/BSSP585_BPRP_n001/cime/../components/clm/src/biogeochem/CNCIsoFluxMod.F90	2019-05-15 13:16:00.034407733 -0600
+++ SourceMods/src.clm/CNCIsoFluxMod.F90	2019-05-20 19:42:44.939207535 -0600
@@ -482,7 +482,8 @@ subroutine CIsoFlux1(num_soilc, filter_s
          do j = 1, nlevdecomp
             do l = 1, ndecomp_cascade_transitions
                cdp = cascade_donor_pool(l)
-               if ( soilbiogeochem_cs%decomp_cpools_vr_col(cc,j,cdp) /= 0._r8) then
+               if ( soilbiogeochem_cs%decomp_cpools_vr_col(cc,j,cdp) /= 0._r8 .and. &
+                    soilbiogeochem_cf%decomp_cascade_hr_vr_col(cc,j,l) > 0._r8) then
                   iso_soilbiogeochem_cf%decomp_cascade_hr_vr_col(cc,j,l)  =  &
                       soilbiogeochem_cf%decomp_cascade_hr_vr_col(cc,j,l) * &
                       (iso_soilbiogeochem_cs%decomp_cpools_vr_col(cc,j,cdp) &
@@ -499,7 +500,8 @@ subroutine CIsoFlux1(num_soilc, filter_s
          do j = 1, nlevdecomp
             do l = 1, ndecomp_cascade_transitions
                cdp = cascade_donor_pool(l)
-               if ( soilbiogeochem_cs%decomp_cpools_vr_col(cc,j,cdp) /= 0._r8) then
+               if ( soilbiogeochem_cs%decomp_cpools_vr_col(cc,j,cdp) /= 0._r8 .and. &
+                    soilbiogeochem_cf%decomp_cascade_ctransfer_vr_col(cc,j,l) > 0._r8) then
                   iso_soilbiogeochem_cf%decomp_cascade_ctransfer_vr_col(cc,j,l)  =  &
                       soilbiogeochem_cf%decomp_cascade_ctransfer_vr_col(cc,j,l) * &
                       (iso_soilbiogeochem_cs%decomp_cpools_vr_col(cc,j,cdp) &

I tried removing the patch to SoilBiogeochemCompetitionMod.F90, but the large FPI returned.
So I think that patch is needed.

Keith

@olyson
Contributor Author

olyson commented May 31, 2019

FYI, I reran my test without changing line 502 of CNCIsoFluxMod.F90, and the large values are still gone from the history file.

@billsacks billsacks added the type: bug something is working incorrectly label Jun 1, 2019
@olyson
Contributor Author

olyson commented Jun 4, 2019

I've cloned Keith's case to do some investigation.
On the first time step of the year, I see numerous small negative values of state variables

cs_soil%decomp_cpools_vr_col(c,j,i_met_lit)
cs_soil%decomp_cpools_vr_col(c,j,i_cel_lit)
cs_soil%decomp_cpools_vr_col(c,j,i_lig_lit)
cs_soil%decomp_cpools_vr_col(c,j,i_cwd)

These variables are zero initially and then become small negative because of small negative values of the fluxes:

cf_veg%dwt_frootc_to_litr_met_c_col(c,j)
cf_veg%dwt_frootc_to_litr_cel_c_col(c,j)
cf_veg%dwt_frootc_to_litr_lig_c_col(c,j)
cf_veg%dwt_livecrootc_to_cwdc_col(c,j)
cf_veg%dwt_deadcrootc_to_cwdc_col(c,j)

The state variables don't seem to be truncated until much later on in the calling sequence.
I put in a call to SoilBiogeochemPrecisionControl in CNVegetationFacade.F90 within subroutine DynamicAreaConservation here:

call NStateUpdateDynPatch(bounds, num_soilc_with_inactive, filter_soilc_with_inactive, &
     this%cnveg_nitrogenflux_inst, this%cnveg_nitrogenstate_inst, &
     soilbiogeochem_nitrogenstate_inst)
call t_stopf('CNUpdateDynPatch')

!KO
call t_startf('SoilBiogeochemPrecisionControl')
call SoilBiogeochemPrecisionControl(num_soilc_with_inactive, filter_soilc_with_inactive, &
soilbiogeochem_carbonstate_inst, c13_soilbiogeochem_carbonstate_inst, &
c14_soilbiogeochem_carbonstate_inst,soilbiogeochem_nitrogenstate_inst)
call t_stopf('SoilBiogeochemPrecisionControl')
!KO

call t_startf('dyn_cnbal_col')
call dyn_cnbal_col(bounds, clump_index, column_state_updater, &
     soilbiogeochem_carbonstate_inst, c13_soilbiogeochem_carbonstate_inst, &
     c14_soilbiogeochem_carbonstate_inst, soilbiogeochem_nitrogenstate_inst, &
     ch4_inst)
call t_stopf('dyn_cnbal_col')

This seemed to get rid of large values of C13_HR, C13_NBP, C14_HR, C14_NBP, FPI, small negative HR_vr at depth, and small negative values of HR and NBP. And I was able to run with hist_ndens=2 (r4).

But, at this point I don't understand why the fluxes are small negative and whether that is expected.
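The truncation that SoilBiogeochemPrecisionControl performs can be sketched roughly as follows; the threshold `ccrit` here is a hypothetical stand-in, not the value the routine actually uses:

```python
import numpy as np

def truncate_pools(pools, ccrit=1e-8):
    """Zero out pool values whose magnitude is below a critical
    threshold, so downstream iso/gross ratios see exact zeros
    (and take the guarded branch) instead of round-off residuals."""
    pools = np.asarray(pools, dtype=float).copy()
    pools[np.abs(pools) < ccrit] = 0.0
    return pools

# Round-off-level residuals become exact zeros; real values survive
print(truncate_pools([-1.0e-140, 5.0, -2.0e-12]))
```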

@billsacks
Member

From discussion today:

  • Look at which term is negative in:
          ! fine root litter carbon fluxes
          cnveg_carbonflux_inst%dwt_frootc_to_litr_met_c_col(c,j) = &
               cnveg_carbonflux_inst%dwt_frootc_to_litr_met_c_col(c,j) + &
               (dwt_frootc_to_litter(p)*pftcon%fr_flab(patch%itype(p)))/dt &
               * soilbiogeochem_state_inst%froot_prof_patch(p,j)

Is it dwt_frootc_to_litter or froot_prof_patch?

  • If the culprit is dwt_frootc_to_litter, then is the issue that frootc_patch is negative going into update_patch_state (called from DynamicPatchAdjustments in CNVegCarbonStateType)? Or is there something else causing this?
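Either factor going negative flips the sign of the whole term; a quick check of the term's structure (names mirror the Fortran, values hypothetical):

```python
dt = 1800.0                      # time step in seconds (hypothetical)
fr_flab = 0.25                   # labile fraction for this PFT (hypothetical)
froot_prof = 0.1                 # root profile weight at this level (>= 0)
dwt_frootc_to_litter = -3.0e-40  # round-off-level negative root C transfer

# Same structure as the dwt_frootc_to_litr_met_c_col increment above:
# with froot_prof, fr_flab, and dt all positive, a negative
# dwt_frootc_to_litter is the only way the flux can go negative.
flux = (dwt_frootc_to_litter * fr_flab) / dt * froot_prof
print(flux < 0)   # True
```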

@ekluzek
Contributor

ekluzek commented Jul 9, 2019

Keith Lindsay reports that the exact change he ran with that works is as follows:

--- /gpfs/u/home/cmip6/cesm_tags/BSSP585_BPRP_n001/components/clm/src/soilbiogeochem/SoilBiogeochemCompetitionMod.F90	2019-05-15 13:16:00.935004097 -0600
+++ ./SoilBiogeochemCompetitionMod.F90	2019-05-21 17:23:16.623107000 -0600
@@ -933,7 +933,11 @@ subroutine SoilBiogeochemCompetition (bo
             end if
 
             if (potential_immob(c) > 0.0_r8) then
-               fpi(c) = actual_immob(c) / potential_immob(c)
+               if (actual_immob(c) > 0.0_r8) then
+                  fpi(c) = actual_immob(c) / potential_immob(c)
+               else
+                  fpi(c) = 0._r8
+               end if
             else
                fpi(c) = 1._r8
             end if

@olyson
Contributor Author

olyson commented Jul 12, 2019

After spending several hours looking at the wrong variety of carbon, I found that dwt_frootc_to_litter is the culprit. So next I'll start by looking at frootc_patch.

@olyson
Contributor Author

olyson commented Jul 13, 2019

frootc_patch is negative going into update_patch_state.
That exact negative value is actually in the restart file and is read in (as frootc_13).
And in restart files prior, and in the initial file that was used for the simulation:
b.e21.BHIST_BPRP.f09_g17.CMIP6-esm-hist.002.clm2.r.2015-01-01-00000.nc

Would it make sense that it is an inactive patch (it does seem to be only one pft on the suspect column) that becomes active on 2046-01-01? Alternative hypothesis, anyone?

@billsacks
Member

Thanks for digging, @olyson ! I can't tell from your last comment: are you saying that it is an inactive patch, or is that just a hypothesis? If the latter, you should be able to check pfts1d_active on the restart file (along with other possibly helpful things like pfts1d_wtcol, pfts1d_wtlnd, pfts1d_wtxy, pfts1d_itypveg). (Or, if you prefer, you can give me enough information that I can find the restart file and the right point on it, and I'd be happy to poke around at some of these things myself.)

It doesn't really make sense to me that this would be an inactive point going in to the land use change, because I think this dwt term is generated from a shrinking patch – so, presumably, one that had non-zero area to begin with (though it's possible that a patch could be shrinking on its column while the column itself has 0 area... I'd need to think more about that). If you do find that the patch was inactive on the restart file, I can help give this some more thought.

@olyson
Copy link
Contributor Author

olyson commented Jul 13, 2019

Just a hypothesis. I'll check the restart file.
I was just trying to think about why, if the patch had the same negative value for frootc_patch since the beginning of the simulation, it suddenly caused problems at year 2046.

@olyson
Contributor Author

olyson commented Jul 17, 2019

Per Bill's suggestion I found the correct pft number in the restart file (I couldn't get GetGlobalIndex to work because of error "calling within a threaded region", but I narrowed it down using lat/lon, pft type, etc) and found that it did have that exact negative value for frootc_c13 (and it's active). I had been using the local pft number.

Then, looking at the code, I realized that we don't do precision control for frootc:
prec_control_for_froot = .not. use_nguardrail
and use_nguardrail=.true. for this simulation.
So, given that there are hundreds of pfts with negative frootc in the restart file, I've abandoned this line of inquiry.

The other thing I investigated was I went back to the fpi problem that Will and I suggested a fix for. I found that there was also a large negative value for fpi_vr for the original problem column noted by Keith L. that comes from a large negative value for fpi_nh4_vr. I applied a similar fix to the calculation for fpi_nh4_vr:

              if (potential_immob_vr(c,j) > 0.0_r8) then
                 if (actual_immob_nh4_vr(c,j) > 0.0_r8) then
                    fpi_nh4_vr(c,j) = actual_immob_nh4_vr(c,j) / potential_immob_vr(c,j)
                 else
                    fpi_nh4_vr(c,j) = 0.0_r8
                 end if
              else
                 fpi_nh4_vr(c,j) = 0.0_r8
              end if

Along with the other fix to fpi (and not using the fix to CNCIsoFluxMod.F90), that allowed the model to run to the end of the month and put out a history file in single precision.
However, this did not work for my clone of Simone's case (b.e21.BWSSP370cmip6.f09_g17.CMIP6-SSP3-7.0-WACCM.003).

Since these approaches are all some form of precision control anyway, I tried my original approach of calling SoilBiogeochemPrecisionControl (with no other code modifications), and this worked for Simone's case (worked previously for Keith L.'s case as well).

@billsacks
Member

Thanks a lot, @olyson !

So, for the dwt flux issue, is this summary correct? (I have less understanding of the fpi issue): frootc can sometimes be negative; this is intentional. Negative frootc causes negative dwt_frootc_to_litter if the patch in question is shrinking. The resulting negative fluxes cause problems in the ciso calculation. This can be worked around by inserting an extra precision control call between the calculation of the dwt fluxes and the ciso fluxes, so that small negative dwt fluxes are set to 0.

I'm okay with fixing this by putting in this extra call to precision control, as long as there's a comment explaining why this is needed. I'm thinking: For the sake of code maintenance, it could be helpful to know which precision control calls are needed for broad purposes, and which are just needed for very select reasons. Eventually we could consider removing the latter, either fixing the root cause of the problem or just doing something like precision control on the small number of variables that need it at that point.

@olyson
Contributor Author

olyson commented Jul 17, 2019

That sums it up nicely. I'm not sure of the cause of the fpi issue, if it's related to the negative fluxes, or why the precision control call fixes it. Probably worth further investigation.

@olyson
Contributor Author

olyson commented Jul 17, 2019

FYI, I've now got GetGlobalIndex working and confirmed that I'm looking at the right pft/column/etc.

@olyson
Contributor Author

olyson commented Jul 19, 2019

After talking with Dave and Bill, we agreed to go with the added call to SoilBiogeochemPrecisionControl in CNVegetationFacade.F90 within subroutine DynamicAreaConservation to solve this issue. The SoilBiogeochemPrecisionControl performs precision control on the C12, C13, and C14 decomp_cpools_vr_col.
The diffs are as follows:

[image: diff of CNVegetationFacade.F90; reproduced as text in the next comment]

I have not done any testing on this other than verifying it works on Keith L.'s case and Simone's case as described above.

@olyson
Contributor Author

olyson commented Jul 19, 2019

diff --git a/glade/u/home/cmip6/cesm_tags/BSSP585_BPRP_n001/components/clm/src/biogeochem/CNVegetationFacade.F90 b/CNVegetationFacade.F90
index fed7889..c3b4220 100644
--- a/glade/u/home/cmip6/cesm_tags/BSSP585_BPRP_n001/components/clm/src/biogeochem/CNVegetationFacade.F90
+++ b/CNVegetationFacade.F90
@@ -721,6 +721,13 @@ contains
          soilbiogeochem_nitrogenstate_inst)
     call t_stopf('CNUpdateDynPatch')

+    ! This call fixes issue #741 by performing precision control on decomp_cpools_vr_col
+    call t_startf('SoilBiogeochemPrecisionControl')
+    call SoilBiogeochemPrecisionControl(num_soilc_with_inactive, filter_soilc_with_inactive,  &
+         soilbiogeochem_carbonstate_inst, c13_soilbiogeochem_carbonstate_inst, &
+         c14_soilbiogeochem_carbonstate_inst,soilbiogeochem_nitrogenstate_inst)
+    call t_stopf('SoilBiogeochemPrecisionControl')
+
     call t_startf('dyn_cnbal_col')
     call dyn_cnbal_col(bounds, clump_index, column_state_updater, &
          soilbiogeochem_carbonstate_inst, c13_soilbiogeochem_carbonstate_inst, &

billsacks pushed a commit to billsacks/ctsm that referenced this issue Jul 19, 2019
When running a transient case with C isotopes, people occasionally ran
into a problem whereby C13_HR, C13_NBP, FPI values result in numeric
conversion not representable error.

At least part of the problem can be explained as: frootc can sometimes
be negative; this is intentional. Negative frootc causes negative
dwt_frootc_to_litter if the patch in question is shrinking. The
resulting negative fluxes cause problems in the ciso calculation. This
can be worked around by inserting an extra precision control call
between the calculation of the dwt fluxes and the ciso fluxes, so that
small negative dwt fluxes are set to 0.

For more details, see ESCOMP#741

Resolves ESCOMP#741
@billsacks
Member

@olyson - I have run the test suite off of the release branch. All tests are passing with this change, but we're getting answer changes. As is often the case, it's hard to tell definitively from the test suite whether these answer changes are scientifically meaningful or are essentially roundoff level.

Transient cases show answer changes, as expected. In addition:

  • Two present-day tests that specify changes in soil layer structure (ERI_D_Ld9.ne30_g16.I2000Clm50BgcCruGs.cheyenne_intel.clm-vrtlay and ERS_D_Ld3.f10_f10_musgs.I2000Clm50BgcCruGs.cheyenne_intel.clm-deepsoil_bedrock) have changes in many fields (for the latter: mostly looks roundoff-level, but a few greater-than-roundoff-level diffs)
  • The present-day ciso test, ERP_D_Ld5.f10_f10_musgs.I2000Clm50BgcCruGs.cheyenne_gnu.clm-ciso_flexCN_FUN, has changes in a few c13 and c14 fields
  • There are changes in many fields in SSP tests; I don't understand these tests well enough to know if this is expected
  • ERP_D_P36x2_Ld3.f10_f10_musgs.I2000Clm50BgcCruGs.cheyenne_intel.clm-noFUN_flexCN (and a similar test on hobart) have roundoff-level changes in SMINN_vr, SMIN_NH4_vr and SMIN_NO3_vr (I'm not concerned about this)

@billsacks
Member

@olyson I should add: I'm not too concerned about the diffs in non-transient cases: with this extra precision control call, it's not too surprising that we're getting answer changes in a variety of situations. So I think that, if you verify that behavior looks okay (around roundoff-level diffs) in a transient case, like we discussed on Friday, then that's probably sufficient to sign off on these changes.

@olyson
Contributor Author

olyson commented Jul 29, 2019

I've completed a full transient simulation with the proposed fix. Diagnostics compared to a control are here:

http://webext.cgd.ucar.edu/I20TR/clm50_cesm20R_2deg_GSWP3V1_issue741_hist/lnd/clm50_cesm20R_2deg_GSWP3V1_issue741_hist.1995_2014-clm50_cesm20R_2deg_GSWP3V1_hist.1995_2014/setsIndex.html

I don't see any scientifically meaningful differences between these simulations, but I'm happy to discuss with others who want to take a look.

billsacks added a commit that referenced this issue Jul 30, 2019
Add CN prec. control call to fix problems related to small neg. values

Small negative values (roughly roundoff-level different from zero) in
frootc (and possibly other quantities) were occasionally creating
problems with carbon isotope fluxes and FPI in the first time step of
the year, at the time of transient landcover change. This tag fixes the
problem by introducing an extra call to SoilBiogeochemPrecisionControl
in between computing the patch-level transient landcover fluxes and
moving these to column-level. In particular, this truncates small
negative values of decomp_cpools_vr_col to zero, which prevents the
previous blow-ups.

For most of the problematic fields, the explanation seems to be: frootc
can sometimes be negative; this is intentional. Negative frootc causes
negative dwt_frootc_to_litter if the patch in question is shrinking. The
resulting negative fluxes cause problems in the ciso calculation. This
can be worked around by inserting an extra precision control call
between the calculation of the dwt fluxes and the ciso fluxes, so that
small negative dwt fluxes are set to 0.

This does not necessarily fully explain the issue with FPI, but the
insertion of the extra precision control call fixes that issue, too.

For more details, see the discussion in
#741

Resolves #741
@billsacks
Member

Fixed on the release branch in release-clm5.0.26. This still needs to come to master; this will be done in the big batch of tags that @ekluzek is bringing from the release branch to master soon.

ekluzek added a commit to ekluzek/CTSM that referenced this issue Aug 8, 2019
Add CN prec. control call to fix problems related to small neg. values

Small negative values (roughly roundoff-level different from zero) in
frootc (and possibly other quantities) were occasionally creating
problems with carbon isotope fluxes and FPI in the first time step of
the year, at the time of transient landcover change. This tag fixes the
problem by introducing an extra call to SoilBiogeochemPrecisionControl
in between computing the patch-level transient landcover fluxes and
moving these to column-level. In particular, this truncates small
negative values of decomp_cpools_vr_col to zero, which prevents the
previous blow-ups.

For most of the problematic fields, the explanation seems to be: frootc
can sometimes be negative; this is intentional. Negative frootc causes
negative dwt_frootc_to_litter if the patch in question is shrinking. The
resulting negative fluxes cause problems in the ciso calculation. This
can be worked around by inserting an extra precision control call
between the calculation of the dwt fluxes and the ciso fluxes, so that
small negative dwt fluxes are set to 0.

This does not necessarily fully explain the issue with FPI, but the
insertion of the extra precision control call fixes that issue, too.

For more details, see the discussion in
ESCOMP#741

Resolves ESCOMP#741
fritzt pushed a commit to CESM-GC/CTSM that referenced this issue May 19, 2021
When running a transient case with C isotopes, people occasionally ran
into a problem whereby C13_HR, C13_NBP, FPI values result in numeric
conversion not representable error.

At least part of the problem can be explained as: frootc can sometimes
be negative; this is intentional. Negative frootc causes negative
dwt_frootc_to_litter if the patch in question is shrinking. The
resulting negative fluxes cause problems in the ciso calculation. This
can be worked around by inserting an extra precision control call
between the calculation of the dwt fluxes and the ciso fluxes, so that
small negative dwt fluxes are set to 0.

For more details, see ESCOMP#741

Resolves ESCOMP#741