Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ice shelf wetland fix in mksurfdata_map can lead to glacier+lake > 100% on surface datasets #673

Closed
ekluzek opened this issue Apr 3, 2019 · 16 comments · Fixed by #883
Closed
Assignees
Labels
type: bug something is working incorrectly

Comments

@ekluzek
Copy link
Contributor

ekluzek commented Apr 3, 2019

Brief summary of bug

Creating a conus_30_x8 surface dataset and then trying to run with it doesn't work.

General bug information

**CTSM version you are using:**release-clm5.0.20

Does this bug cause significantly incorrect results in the model's science? No

Configurations affected: conus_30_x8

Details of bug

Important details of your setup / configuration so we can reproduce the bug

Setup a case for ne30...

SMS_Ld1_D.ne30_g17.I1850Clm50BgcCrop.cheyenne_intel.clm-default

That case will fail because it needs a new ne30np4 surface dataset. Pointing to a new one it then works.

Here's some changes to get it setup for the new grid..

./xmlchange LND2GLC_FMAPNAME="$DIN_LOC_ROOT/cpl/gridmaps/ne0np4CONUS.ne30x8/map_ne0CONUSne30x8_TO_gland4km_aave.190227.nc"
./xmlchange LND2GLC_SMAPNAME="$DIN_LOC_ROOT/cpl/gridmaps/ne0np4CONUS.ne30x8/map_ne0CONUSne30x8_TO_gland4km_blin.190227.nc"
./xmlchange GLC2LND_SMAPNAME="$DIN_LOC_ROOT/cpl/gridmaps/gland4km/map_gland4km_TO_ne0CONUSne30x8_aave.190227.nc"
./xmlchange GLC2LND_FMAPNAME="$DIN_LOC_ROOT/cpl/gridmaps/gland4km/map_gland4km_TO_ne0CONUSne30x8_aave.190227.nc"
./xmlchange LND2ROF_FMAPNAME="$DIN_LOC_ROOT/lnd/clm2/mappingdata/maps/ne0np4CONUS.ne30x8/map_ne0CONUSne30x8_TO_0.5x0.5_nomask_aave.190227.nc"
./xmlchange ROF2LND_FMAPNAME="$DIN_LOC_ROOT/lnd/clm2/mappingdata/maps/ne0np4CONUS.ne30x8/map_0.5x0.5_nomask_TO_ne0CONUSne30x8_aave.190227.nc"
./xmlchange ATM_DOMAIN_FILE="domain.lnd.conus_30_x8_gx1v6.170119.nc"
./xmlchange LND_DOMAIN_FILE="domain.lnd.conus_30_x8_gx1v6.170119.nc"
./xmlchange ATM_DOMAIN_PATH="$DIN_LOC_ROOT/atm/datm7/domain.clm"
./xmlchange LND_DOMAIN_PATH="$DIN_LOC_ROOT/atm/datm7/domain.clm"

Add this to user_nl_clm...

fsurdat = '/gpfs/fs1/scratch/erik/release-clm5.0.20/tools/mksurfdata_map/surfdata_conus_30_x8_hist_78pfts_CMIP6_simyr1850_c190402.nc'

Since, it dies early on, there probably aren't problems with the above mapping files to cause the problem.

Important output or errors that show the problem

Here's the error from the cesm.log file:

check_weights ERROR: at g = 38823 total PFT weight is
1.45000000000000 active_only = F
check_weights ERROR: at g = 38823 total col weight is
1.45000000000000 active_only = F
check_weights ERROR: at g = 38823 total lunit weight is
1.45000000000000 active_only = F

If you are seeing this message at the beginning of a run with
use_init_interp = .true. and init_interp_method = "use_finidat_areas",
and you are seeing weights less than 1, then a likely cause is:
For the above-mentioned grid cell(s):
The matching input grid cell had some non-zero-weight subgrid type
that is not present in memory in the new run.

ENDRUN:
ERROR: ERROR in subgridWeightsMod.F90 at line 684
Image PC Routine Line Source
cesm.exe 0000000004E9528D Unknown Unknown Unknown
cesm.exe 00000000041A7901 shr_abort_mod_mp_ 114 shr_abort_mod.F90
cesm.exe 00000000041A7782 shr_abort_mod_mp_ 61 shr_abort_mod.F90
cesm.exe 0000000000873EC4 abortutils_mp_end 50 abortutils.F90
cesm.exe 0000000000D7DC5E subgridweightsmod 684 subgridWeightsMod.F90
cesm.exe 0000000000CEDF74 reweightmod_mp_re 55 reweightMod.F90
cesm.exe 00000000008ADD8E clm_initializemod 227 clm_initializeMod.F90

@ekluzek ekluzek self-assigned this Apr 3, 2019
@billsacks
Copy link
Member

Some findings from a similar failure from @fischer-ncar . Note that @fischer-ncar generated the surface dataset from release-clm5.0.18, and ran with ctsm1.0.30; he found that the error went away if he generated the surface dataset from ctsm1.0.30 (which does not have the Antarctica wetlands fix):

This death occurs before the finidat file is read, it doesn't seem we can blame the finidat file.

I suggested running with changes like the following (adjusting the gridcell index appropriately):

diff --git a/src/main/subgridWeightsMod.F90 b/src/main/subgridWeightsMod.F90
index b2c882a4..0c38afed 100644
--- a/src/main/subgridWeightsMod.F90
+++ b/src/main/subgridWeightsMod.F90
@@ -657,6 +657,10 @@ subroutine check_weights (bounds, active_only)
 
     do l = bounds%begl,bounds%endl
        g = lun%gridcell(l)
+       if (g == 3838) then
+          write(iulog,*) 'WJS: g=3838: ', grc%latdeg(g), grc%londeg(g), grc%area(g)
+          write(iulog,*) 'WJS: ', l, lun%itype(l), lun%wtgcell(l), lun%active(l)
+       end if
        if ((active_only .and. lun%active(l)) .or. .not. active_only) then
           sumwtgcell(g) = sumwtgcell(g) + lun%wtgcell(l)
        end if

@fischer-ncar did this and got the following output:

check_weights ERROR: at g =         1321 total col weight is    1.45000000000000      active_only =  F
 WJS: g=1321:   -81.3524976604657        313.420729004390        2309.23302769917    
 WJS:         1321           1  0.000000000000000E+000 T
 WJS: g=1321:   -81.3524976604657        313.420729004390        2309.23302769917    
 WJS:         7154           5  0.450000000000000      T
 WJS: g=1321:   -81.3524976604657        313.420729004390        2309.23302769917    
 WJS:        10162           4   1.00000000000000      T
 check_weights ERROR: at g =         1321 total lunit weight is    1.45000000000000      active_only =  F

So this is a grid cell in Antarctica, and the surface dataset says that it has 100% glacier and 45% lake in that grid cell. I have confirmed that the issue exists on the surface dataset – so this is a problem in the creation of the surface dataset, not in the CTSM code:

In [1]: surf = Dataset('/gpfs/fs1/work/fischer/code/release-clm5.0.18_fsurfdata_map/tools/mksurfdata_map/surfdata_ne0CONUSne30x8_hist_16pfts_Irrig_CMIP6_simyr2000_c190328.nc')

In [3]: np.where(np.abs(surf.variables['LONGXY'][:] - 313.420729004390) < 0.0001)
Out[3]: (array([43665]),)

In [4]: np.where(np.abs(surf.variables['LATIXY'][:] + 81.3524976604657) < 0.0001)
Out[4]: (array([43665]),)

In [5]: g = 43665

In [6]: surf.variables['PCT_LAKE'][g]
Out[6]: 45.0

In [7]: surf.variables['PCT_GLACIER'][g]
Out[7]: 100.0

From eyeballing the raw data files, I think they have lake in the given area (though only for a small number of 3' grid cells), while having 100% or close to 100% glacier.

But I can't understand why:

(1) mksurfdata_map failed to renormalize these % coverages to equal 100%

(2) mksurfdata_map failed to catch this problem of pct_glacier + pct_lake > 100%

(3) CTSM's surfrdMod failed to catch this problem of pct_glacier + pct_lake > 100% (which should have triggered an error prior to the more cryptic error you got)

@billsacks billsacks added the type: bug something is working incorrectly label Apr 3, 2019
@fischer-ncar
Copy link
Contributor

I've generated a new fsurdat file using release-clm5.0.20 for ne120np4. I'm getting a similar error. When I switch to an older fsurdat, I don't get the error.

371: check_weights ERROR: at g = 492658 total PFT weight is
371: 1.58000000000000 active_only = F
371: check_weights ERROR: at g = 492658 total col weight is
371: 1.58000000000000 active_only = F
371: check_weights ERROR: at g = 492658 total lunit weight is
371: 1.58000000000000 active_only = F
371:
371: If you are seeing this message at the beginning of a run with
371: use_init_interp = .true. and init_interp_method = "use_finidat_areas",
371: and you are seeing weights less than 1, then a likely cause is:
371: For the above-mentioned grid cell(s):
371: The matching input grid cell had some non-zero-weight subgrid type
371: that is not present in memory in the new run.
371:
371: ENDRUN:
371: ERROR: ERROR in subgridWeightsMod.F90 at line 684
371:Image PC Routine Line Source
371:cesm.exe 0000000002E7BCED Unknown Unknown Unknown
371:cesm.exe 00000000025B3A12 shr_abort_mod_mp_ 114 shr_abort_mod.F90
371:cesm.exe 000000000199BA7F abortutils_mp_end 50 abortutils.F90
371:cesm.exe 0000000001AEBAFB subgridweightsmod 684 subgridWeightsMod.F90
371:cesm.exe 0000000001ACB190 reweightmod_mp_re 55 reweightMod.F90
371:cesm.exe 00000000019ACBBF clm_initializemod 231 clm_initializeMod.F90
371:cesm.exe 0000000001993721 lnd_comp_mct_mp_l 200 lnd_comp_mct.F90
371:cesm.exe 000000000042AB28 component_mod_mp_ 257 component_mod.F90
371:cesm.exe 0000000000419E9A cime_comp_mod_mp_ 1334 cime_comp_mod.F90
371:cesm.exe 0000000000427BFE MAIN__ 122 cime_driver.F90

My new fsurdat file is.
fsurdat = '/gpfs/fs1/work/fischer/code/release-clm5.0.20_clean/tools/mksurfdata_map/surfdata_ne120np4_hist_16pfts_Irrig_CMIP6_simyr2000_c190501.nc'

@billsacks billsacks added the branch tag: release Changes go on release branch as well as master label May 20, 2019
@billsacks
Copy link
Member

I'm not sure if this is the source of the problem, but it seems like there could be a problem with this line:

https://github.com/ESCOMP/ctsm/blob/02f8258c5299e1e67f72cd8febaf4bc784183ebd/tools/mksurfdata_map/src/mksurfdat.F90#L708

If pctgla + pctlak > 100, we'd end up with pctwet negative, which probably wouldn't be handled properly in following code.

@billsacks
Copy link
Member

Ah, yes, I'm pretty sure that's exactly the source of the problem, based on looking back at my earlier comments and checking PCT_WETLAND (which I should have checked before):

In [5]: surf.variables['PCT_LAKE'][g]
Out[5]: 45.0

In [6]: surf.variables['PCT_GLACIER'][g]
Out[6]: 100.0

In [7]: surf.variables['PCT_WETLAND'][g]
Out[7]: -45.0

This would explain why the other normalizations and error checks didn't catch the problem.

@billsacks
Copy link
Member

I'm not sure what the correct fix here. My first inclination was to simply change:

pctwet(n)    = 100._r8 - pctgla(n) - pctlak(n)

to:

pctwet(n)    = max(100._r8 - pctgla(n) - pctlak(n), 0._r8)

However, I think that (along with the original code) wouldn't be quite right in the case where pctwet(n) was greater than 0 to begin with. For example, imagine that, going into this block of code, we had pctgla = 50, pctlak = 50 and pctwet = 50, for a point outside the pftdata mask. Then, upon execution of this line, we'd end up with pctwet = 0, which seems wrong. So maybe the rule should be that we shouldn't ever decrease pctwet as a result of this logic. So we'd have:

! Anything that isn't already lake or glacier will be considered wetland. But don't allow a decrease in any already-existing wetland area due to this adjustment.
pctwet(n)    = max(100._r8 - pctgla(n) - pctlak(n), pctwet(n))

(It should be okay for pctwet+pctgla+pctlak to be something other than 100 at this point: the renormalizations later should sort that out, I think.)

@ekluzek
Copy link
Contributor Author

ekluzek commented May 20, 2019

A general thing we should do is to check for PCT of types being negative, and if so handle appropriately.

@billsacks
Copy link
Member

Actually, another possible fix would be to say: The real intent here is to make sure that there is some landunit present anywhere. If there is already some glacier or lake, we shouldn't need to do anything with wetland (just like, I believe: If we have anything more than 1e-6 pct vegetation, we don't do anything with wetland). Then we would replace this:

https://github.com/ESCOMP/ctsm/blob/02f8258c5299e1e67f72cd8febaf4bc784183ebd/tools/mksurfdata_map/src/mksurfdat.F90#L704-L709

with something like:

  if (pctgla(n) < 1.e-6_r8 .and. pctlak(n) < 1.e-6_r8) then
     pctgla(n)    = 0._r8
     pctlak(n)    = 0._r8
     pctwet(n)    = 100._r8
  end if

(There is no else clause: if either pctgla or pctlak are non-zero, then we'll just fill the grid cell with one of those land units, which seems like a reasonable, and probably most-correct thing to do.)

@billsacks billsacks changed the title Trying to run with new conus_30_x8 grid fails Ice shelf wetland fix in mksurfdata_map can lead to glacier+lake > 100% on surface datasets May 20, 2019
@billsacks
Copy link
Member

billsacks commented May 20, 2019

Feeling from today's discussion:

On release:

pctwet(n)    = max(100._r8 - pctgla(n) - pctlak(n), 0._r8)

and check to make sure it doesn't change any of our existing surface datasets.

On master, go with something like:

  if (pctgla(n) < 1.e-6_r8 .and. pctlak(n) < 1.e-6_r8) then
     pctgla(n)    = 0._r8
     pctlak(n)    = 0._r8
     pctwet(n)    = 100._r8
  end if

(So wetland should only be 0 or 100 if you're using no_inlandwet.)

@ekluzek
Copy link
Contributor Author

ekluzek commented May 21, 2019

I added a check for negative percent types and found that most of the high resolution cases have this problem while mid or low resolution do not. So ne60np4, 360x720cru, 0.125x0.125, f05, f02 as well as conus30x8, ne120, ne240 as we already know fail. But f09, f19, f10, f45 are all fine for example.

ekluzek added a commit to ekluzek/CTSM that referenced this issue May 22, 2019
ekluzek added a commit to ekluzek/CTSM that referenced this issue Jun 3, 2019
Fix a couple small issues. Correct end year for ndep for SSP's so can run to the end of 2100. Some fixes to mksurfdata_map for high
resolution surface datasets. Have 2-degree WACCM-CMIP6DECK match a user-mod directory without carbon isotopes on. Remove the
ne120np4 and conus_30_x8 surface dataset files, as they can't be used (see ESCOMP#673).

Remove 8x16, 32x64 resolutions as they are no longer needed and there are problems with them. Add in the mapping files needed for
94x192. Check that special landunit percent area's is not less than 0.0, and don't let PCT_WET be less than zero for areas with
ocean (see ESCOMP#673).

Change some of the longer single point tests to use Qian forcing (as faster, less memory, less problems). Add compsets for this.
This change was also done on master.
@ekluzek
Copy link
Contributor Author

ekluzek commented Oct 7, 2019

None of these resolution are on the release branch, so the release branch is OK.

@ekluzek ekluzek removed the branch tag: release Changes go on release branch as well as master label Oct 7, 2019
@billsacks
Copy link
Member

@ekluzek are you still planning to put in place the one-line fix noted above for the release branch so that it will work to create high-res surface datasets from the release branch in the future?

@ekluzek
Copy link
Contributor Author

ekluzek commented Oct 7, 2019

Actually this is already put in place. Just looks like the issue wasn't referenced.

This commit: 027faaf and release-clm5.0.24

ekluzek added a commit to ekluzek/CTSM that referenced this issue May 1, 2020
@ekluzek
Copy link
Contributor Author

ekluzek commented May 2, 2020

We talked about having a different fix for this on master than on the release branch. The problem is that the suggested fix for master doesn't work. So I end up doing something that's functionally equivalent to what's on the release branch.

So I'll bring the release branch version to master and we should decide if we really do want different behavior on master than on the release branch.

ekluzek added a commit to ekluzek/CTSM that referenced this issue May 2, 2020
@billsacks
Copy link
Member

@ekluzek when you say that the suggested fix for master doesn't work: can you please describe in what way it doesn't work? Looking at the code that was in place before you reverted things in 162c35a: It doesn't look like the previous code correctly captured the above plan: In particular, as noted in #673 (comment), there should not have been an else clause, but there was one in your earlier implementation (though I'm not sure if that was related to how this wasn't working).

@billsacks
Copy link
Member

@ekluzek when you say that the suggested fix for master doesn't work: can you please describe in what way it doesn't work? Looking at the code that was in place before you reverted things in 162c35a: It doesn't look like the previous code correctly captured the above plan: In particular, as noted in #673 (comment), there should not have been an else clause, but there was one in your earlier implementation (though I'm not sure if that was related to how this wasn't working).

Also, I wonder if your problems related to this comment:

#553 (comment)

It doesn't look like that was addressed in the changes in your commit.

I thought it best to open a new issue to capture the remaining changes needed, since these were spread across both this issue and #553 . If you agree, let's move further discussion to the new issue - #1001 .

@ekluzek
Copy link
Contributor Author

ekluzek commented May 4, 2020

I'll move more discussion over to #1001. But, to complete this here. The problem I ran into was that my first try at what I understood was asked for -- resulted in mksurfdata_map dying. Some refinements I tried, ran but gave identical answers as the release code version. So I gave up and left it identical to the release branch version.

You are right I didn't try moving the truncation of the percent fields. That would likely make a difference. I suppose that might be the right thing to do here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug something is working incorrectly
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants