mksurfdata_map: replace source (SRC) files of various masks with SRC files with no mask #823

Merged
billsacks merged 45 commits into ESCOMP:master
May 25, 2021

slevis-lmwg
Contributor

@slevis-lmwg slevis-lmwg commented Oct 17, 2019

Based on notes that I collected in a meeting with @ekluzek and @billsacks on 2019/9/24:

Description of changes

1) mkmapdata.sh requires no change in method; only changes in the list of SRC files.

Collapse multiple SRC files of a given resolution to a single "nomask" SRC file. A single weight (aka map) file will result for each nomask case. Apply the mask in mksurfdata instead (see (2) below).

Create nomask files for SRC resolutions that don’t have them. Not for UGRID file(s) if I understood Erik correctly, but this comment may apply only to this UGRID file: UGRID_1km-merge-10min_HYDRO1K-merge-nomask_c130402.nc.

2) mksurfdata
mkgridmapMod has a loop over ns in subroutine *default (currently in use) and in subroutine *sourcemask; I should now use the latter, except for UGRID file(s) (again, the exception may apply only to the above-mentioned UGRID file).

Frac_src will be passed in as an arg now.
Frac_dst has same meaning as weight_norm, so pass weight_norm through subroutines to replace tgridmap%frac_dst.

Start with mklai. Switch the call from *default to *sourcemask. Mklai with the right mask should be identical (test 1).

mkindexmapMod has a similar loop over ns that will need changing. Look out for any additional places that may need changing. If subroutine *sourcemask2 is not in use, remove it.

After mklai, work with mkpft and the rest.

May need to change the mask to “nomask” in some raw datasets.
Bill was looking at one with “topo” in the name and decided that this one has no masking.
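
The approach in (2), applying the source mask at remap time instead of baking it into the mapping file, can be sketched as follows. This is a minimal Python illustration, not the Fortran in mkgridmapMod: the function name `areaave_srcmask`, the `(dst, src, wt)` triple layout, and all numbers are hypothetical.

```python
def areaave_srcmask(weights, src, mask_src, nodata=0.0):
    """Area-average src onto the destination grid, masking at remap time.

    weights: (dst_index, src_index, wt) triples from a "nomask" map.
    Returns (dst, wtnorm); wtnorm[no] is the sum of wt * mask_src over the
    source cells that overlap destination cell no.
    """
    n_dst = 1 + max(no for no, _, _ in weights)
    wtnorm = [0.0] * n_dst
    dst = [0.0] * n_dst
    for no, ni, wt in weights:
        wtnorm[no] += wt * mask_src[ni]
        dst[no] += wt * mask_src[ni] * src[ni]
    for no in range(n_dst):
        # Destination cells with no unmasked source data get the nodata value.
        dst[no] = dst[no] / wtnorm[no] if wtnorm[no] > 0.0 else nodata
    return dst, wtnorm


# Hypothetical nomask map: two destination cells, four source cells.
weights = [(0, 0, 0.5), (0, 1, 0.5), (1, 2, 0.25), (1, 3, 0.75)]
src = [10.0, 30.0, 4.0, 8.0]
mask_src = [1.0, 0.0, 1.0, 1.0]  # source cell 1 is masked out

dst, wtnorm = areaave_srcmask(weights, src, mask_src)
print(dst)     # destination cell 0 only sees source cell 0
print(wtnorm)
```

In this sketch the per-destination normalization plays the role that frac_dst played when the mask was part of the map, which is why a single nomask map per SRC resolution can serve several masks.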

3) Last step...
Mksurfdata.pl -debug will generate a namelist that I can modify by hand. Run manually with input from the namelist. This could end up with roundoff diffs.

Specific notes

Contributors other than yourself, if any:
@ekluzek @billsacks

CTSM Issues Fixed (include github issue #):
Fixes #286
Fixes #938

Are answers expected to change (and if so in what way)?
Expecting roundoff changes

Any User Interface Changes (namelist or namelist defaults changes)?
namelist_defaults_ctsm.xml
namelist_defaults_ctsm_tools.xml
namelist_definition_ctsm.xml
all change, but this should be transparent to users working with default source (SRC) and destination (DST) resolutions. Users working with user-generated SRC resolutions will now need to create only one nomask SRC file per custom SRC resolution. They will then need to apply any associated custom masks at the "surface-dataset" generation stage of the process (see (2) above).

Progress and testing performed:
Thus far I am close to completing step (1) above. I have:

  • Created 3 nomask SRC files that were not available previously for certain default SRC resolutions
  • Run qsub regridbatch.sh for the nomask list of SRC files; this has generated weight (aka map) files for all SRC/DST combinations, except the 1km DST grid as far as I can tell.

1. I modified the list of SRC resolutions in mkmapdata.sh and in
checkmapfiles.ncl to only include the "nomask" grid for each resolution.
I removed the 360x720cru SRC resolution entirely because the only
difference from the 0.5x0.5 SRC file was that longitudes were given
0 to 360 instead of -180 to 180.
2. I placed three new files in
/glade/p/cesmdata/cseg/inputdata/mappingdata/grids that were not
available previously:
- SCRIPgrid_0.25x0.25_nomask_c191014.nc
- SCRIPgrid_0.9x1.25_nomask_c191014.nc
- SCRIPgrid_3x3min_nomask_c191014.nc
3. Modified namelist_defaults_ctsm, namelist_defaults_ctsm_tools, and
namelist_definition_ctsm to be consistent with (1) and (2).
4. qsub regridbatch.sh appears to work for all SRC grids.
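
For reference, the two longitude conventions mentioned in item 1 (0 to 360 versus -180 to 180) describe the same points. A small Python sketch of the conversion follows; the function names and the sample cell-center values are illustrative assumptions, not taken from the SRC files.

```python
def to_minus180_180(lon):
    """Map a longitude in degrees to the [-180, 180) convention."""
    return (lon + 180.0) % 360.0 - 180.0


def to_0_360(lon):
    """Map a longitude in degrees to the [0, 360) convention."""
    return lon % 360.0


# Illustrative cell centers in the 0-to-360 convention.
lons_0_360 = [0.25, 179.75, 180.25, 359.75]
converted = [to_minus180_180(x) for x in lons_0_360]
print(converted)
```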
@billsacks
Member

Can one of you explain why the UGRID file should be treated differently from the others?

@ekluzek
Collaborator

ekluzek commented Oct 18, 2019

Can one of you explain why the UGRID file should be treated differently from the others?

@billsacks the ugrid required different arguments to ESMF regrid weights. If that's still the case for OCGIS then it will need to be kept. But, if OCGIS detects the filetype -- you might not need to keep track of it.

@billsacks
Member

Thanks @ekluzek . Does that imply that we need to keep source masking done at the time of mapping file creation? @slevisconsulting seems to imply this, but I don't understand why we need to treat the UGRID file differently in that respect.

@ekluzek ekluzek added the "PR status: work in progress" and "enhancement" (new capability or improved behavior of existing capability) labels Oct 18, 2019
@ekluzek ekluzek self-assigned this Oct 18, 2019
Collaborator

@ekluzek ekluzek left a comment

This is just the first step in the process. This removes the maps that will be redundant when mksurfdata is changed to use the maps separately. Those two changes are linked together so more work is required before this can come in.

This part looks good though.

@ekluzek
Collaborator

ekluzek commented Oct 18, 2019

Thanks @ekluzek . Does that imply that we need to keep source masking done at the time of mapping file creation? @slevisconsulting seems to imply this, but I don't understand why we need to treat the UGRID file differently in that respect.

Ahhh, yes. OK I reread @slevisconsulting note about that. The reason the one UGRID might as well be handled differently is because it's the only 1km grid file. So you might as well keep the mask associated with it. It would actually move extra work into mksurfdata to manage the mask separately. For grids with several different masks this makes sense, but this is a high resolution grid and there's only one, so it will slow mksurfdata down to manage the mask separately.

This does mean not everything is handled the same way -- but I think it's OK for this case. If it were a low resolution map and there was only one of them, you might as well handle it like the others. But, for the high resolution map we probably don't want to both pay the cost in changing mksurfdata AND pay the cost of making it slower by changing it.

@billsacks
Member

Okay, I see your points @ekluzek . On the other hand, we're going to pay a long-term cost in terms of the potential confusion of having things done differently for different raw datasets. I'm okay with this as long as this is documented somewhere - i.e., we should document that, in nearly all cases, we use mapping files with no masking and apply the mask separately in mksurfdata_map; but there is this one exception (and give the reasons you laid out above).

@slevis-lmwg
Contributor Author

slevis-lmwg commented Oct 18, 2019

Keeping record of a few issues that I have encountered.

  1. ./mksurfdata.pl -d generated a few namelists for me to work with; however, the program ended with this error:
    ERROR: could NOT find a mapping file for this resolution: '94x192' and type: lak at 3x3min and MODIS-wCsp

  2. An example of a generated namelist file:
    surfdata_'48x96'_hist_78pfts_CMIP6_simyr1850_c191018.namelist
    whereas the namelist generated by ./mksurfdata.pl -res 48x96 looks like this:
    surfdata_48x96_hist_78pfts_CMIP6_simyr1850_c191018.namelist

The contents of these namelists differ (shown here in opposite order):
51,52c51,52
< fsurdat = 'surfdata_48x96_hist_78pfts_CMIP6_simyr1850_c191018.nc'
< fsurlog = 'surfdata_48x96_hist_78pfts_CMIP6_simyr1850_c191018.log'
---
> fsurdat = 'surfdata_'48x96'_hist_78pfts_CMIP6_simyr1850_c191018.nc'
> fsurlog = 'surfdata_'48x96'_hist_78pfts_CMIP6_simyr1850_c191018.log'

The latter formatting caused me trouble when I tried running
./mksurfdata_map < surfdata_'48x96'_hist_78pfts_CMIP6_simyr2000_c191018.namelist
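
One way a file name with embedded single quotes can cause trouble at a shell prompt is that the shell removes the quotes before the program ever sees the name. The Python sketch below uses `shlex` to mimic POSIX quote removal; it only illustrates that mechanism and does not establish that it was the cause of the error reported here.

```python
import shlex

name = "surfdata_'48x96'_hist_78pfts_CMIP6_simyr1850_c191018.namelist"

# A POSIX shell strips unescaped quote characters while splitting words,
# so the name passed on differs from the one that was typed.
seen_by_shell = shlex.split(name)[0]
print(seen_by_shell)

# Quoting the whole name preserves the embedded single quotes.
safe = shlex.quote(name)
assert shlex.split(safe) == [name]
print(safe)
```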

@slevis-lmwg
Contributor Author

Following up on my previous post:
I think I understand the reasoning for this syntax. The .nc and the .log files do not exist, and this is a way of showing that. I guess the problem is that the error I got just told me that I had a namelist error.

- mkdomainMod.F90: Removed error check confirming that the input domain
mask should equal the gridmap mask because the gridmap mask (SRC mask
found in the "nomask" mapping file) now equals 1 everywhere
- mkgridmapMod.F90: Replaced wtnorm with gridmap%frac_dst in
subroutine gridmap_areaave_srcmask
- mklaiMod.F90: Set mask_src equal to tdomain%frac instead of equal to 1

Answers over continent appear unchanged. Answers over coastal areas
appear different by more than round off. I don't know if that's a
problem.
@slevis-lmwg
Contributor Author

@ekluzek (and @billsacks ?) pls check the work in my latest commit and let me know if my changes make sense to you when you get a chance.

Note that I did not pass frac_src and frac_dst as arguments the way we had discussed. As far as I could tell, it was not necessary.

Also note that these mods resulted in bigger than round off changes along coastlines. Do you interpret this as a problem with my approach? Or maybe in my approach I didn't capture everything that I should have...

@billsacks
Member

Note that I did not pass frac_src and frac_dst as arguments the way we had discussed. As far as I could tell, it was not necessary.

For frac_dst, see my above comment (#823 (comment)).

For frac_src, I'm thinking that when you said this in your initial comment:

Frac_src will be passed in as an arg now.

we actually meant mask_src (though I can't remember for sure). For lai, this is already passed in, which is why you didn't need to do that step. You'll need to pass in mask_src for most other fields.

@billsacks
Member

Also note that these mods resulted in bigger than round off changes along coastlines. Do you interpret this as a problem with my approach? Or maybe in my approach I didn't capture everything that I should have...

I interpret that as a possible problem that at least warrants further investigation.

I'm just thinking: It may be worth the time to set up an artificial grid for testing purposes. For example, if you have a 0.5 deg raw dataset (is that the resolution of LAI?) you could set up an exactly 1 deg model grid. Then it would be relatively easy to see what the values in each model grid cell should be.
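
The artificial-grid idea can be sketched in a few lines of Python. The example assumes a source grid that is exactly twice as fine as the model grid in each direction and treats the four source cells inside a model cell as equal in area (ignoring the latitude dependence of real cell areas), so every weight is 0.25 and the expected value is a plain mean. The function names and grid sizes are hypothetical.

```python
def coarsen_2x_weights(n_lat_dst, n_lon_dst):
    """(dst, src, wt) triples for an exact 2x coarsening of a row-major grid."""
    n_lon_src = 2 * n_lon_dst
    weights = []
    for j in range(n_lat_dst):
        for i in range(n_lon_dst):
            no = j * n_lon_dst + i
            for dj in (0, 1):
                for di in (0, 1):
                    ni = (2 * j + dj) * n_lon_src + (2 * i + di)
                    weights.append((no, ni, 0.25))
    return weights


def remap(weights, src, n_dst):
    """Apply the weights: dst[no] = sum of wt * src[ni]."""
    dst = [0.0] * n_dst
    for no, ni, wt in weights:
        dst[no] += wt * src[ni]
    return dst


# 4x4 source grid holding 0..15 in row-major order, 2x2 destination grid.
src = [float(k) for k in range(16)]
weights = coarsen_2x_weights(2, 2)
dst = remap(weights, src, 4)
print(dst)  # each entry is the mean of one 2x2 block of the source grid
```

With a setup like this, any value that differs from the block mean by more than roundoff points at the mask handling rather than at the grid geometry.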

@ekluzek
Collaborator

ekluzek commented Oct 21, 2019

I agree with @billsacks that I wouldn't expect the change to change results to more than roundoff. It's possible we could be convinced that answer changes beyond that would be OK, but I wouldn't expect it. One of the only reasons I can think of to expect answers to change beyond roundoff is if there's a bug in how the mask is handled either inside mksurfdata_map or in the ESMF regridder.

@slevis-lmwg
Contributor Author

we actually meant mask_src (though I can't remember for sure). For lai, this is already passed in, which is why you didn't need to do that step. You'll need to pass in mask_src for most other fields.

Based on your comment @billsacks I could see the problem stemming from my setting
mask_src(:) = tdomain%frac(:)
instead of
mask_src(:) = tdomain%mask(:)

I will look into it.
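
To see why the choice between `tdomain%frac` and `tdomain%mask` matters mainly along coastlines, consider this Python sketch. It assumes that the mask is 1 wherever the land fraction is greater than 0, which is an assumption made for illustration; the weights and values are invented.

```python
def masked_average(weights, src, mask_src):
    """Weighted average of src with weights scaled by mask_src (mask or fraction)."""
    num = sum(wt * mask_src[ni] * src[ni] for ni, wt in weights)
    den = sum(wt * mask_src[ni] for ni, wt in weights)
    return num / den


# One hypothetical destination cell overlapping two source cells equally.
weights = [(0, 0.5), (1, 0.5)]
src = [2.0, 6.0]

# Coastal case: source cell 1 is only 25% land.
frac = [1.0, 0.25]
mask = [1.0 if f > 0.0 else 0.0 for f in frac]
with_mask = masked_average(weights, src, mask)
with_frac = masked_average(weights, src, frac)
print(with_mask, with_frac)

# Interior case: both source cells are fully land, so mask and frac coincide.
interior = masked_average(weights, src, [1.0, 1.0])
print(interior)
```

Away from the coast the two choices give the same answer, which is consistent with differences that show up only in coastal cells.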

Results now are the same as before the code modifications to within
roundoff and roughly -1e-9 to 1e-9
@@ -57,7 +57,7 @@ module mkpctPftTypeMod
   end interface pct_pft_type

   ! !PRIVATE TYPES:
-  real(r8), parameter :: tol = 1.e-12_r8 ! tolerance for checking equality
+  real(r8), parameter :: tol = 1.e-9_r8 ! tolerance for checking equality
Contributor Author

I had to raise the tolerance for this test to pass:
convert_from_p2g ERROR: default_pct_p2l must sum to 100
sum(default_pct_p2l) = 100.000000000149

Member

@billsacks billsacks Oct 22, 2019

This, together with the still-slightly-larger-than-expected differences from before, is making me nervous. This amounts to a relative error of 1e-12, which is about 4 orders of magnitude greater than double precision roundoff.
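
The arithmetic behind that estimate can be checked with a short Python sketch. The sum is the value from the error message above; the assumption that the check compares the absolute deviation from 100 against `tol` is mine and is not confirmed by the thread.

```python
import math
import sys

total = 100.000000000149          # value reported in the error message
abs_err = abs(total - 100.0)
rel_err = abs_err / 100.0
eps = sys.float_info.epsilon      # about 2.2e-16 for IEEE double precision
orders_above_eps = math.log10(rel_err / eps)

# Assumption: the check compares abs(sum - 100) against tol.
passes_old_tol = abs_err <= 1.0e-12
passes_new_tol = abs_err <= 1.0e-9

print(f"relative error = {rel_err:.2e}")
print(f"orders of magnitude above epsilon = {orders_above_eps:.1f}")
print(passes_old_tol, passes_new_tol)
```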

I feel like we should either (1) trace this greater error back to the source and convince ourselves that this is an acceptable level of error, or (2) play with the source-masking remapping algorithm to try to reduce roundoff-level errors.

Regarding (2), the first thing I'd try is to change this:

          dst_array(no) = dst_array(no) + wt*mask_src(ni)*src_array(ni)/wtnorm(no)

I'd delete the division by wtnorm(no) here. Instead, I'd change the where block at the end of the routine:

    where (wtnorm == 0._r8)
       dst_array = nodata
    end where

to:

    where (wtnorm == 0._r8)
       dst_array = nodata
    elsewhere
       dst_array = dst_array / wtnorm
    end where

My gut feeling is that just doing this division once will be less prone to introducing errors in the final result, though I'm not sure of this.
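
The two normalization orders can be compared side by side in a small Python sketch. The data are random and purely illustrative, and the function names are hypothetical. The sketch shows only that the two forms agree to roundoff on well-conditioned input; it does not show which one is more accurate for the real datasets.

```python
import random


def wtnorm_for(weights, mask_src, n_dst):
    """Sum of wt * mask_src over the source cells of each destination cell."""
    wtnorm = [0.0] * n_dst
    for no, ni, wt in weights:
        wtnorm[no] += wt * mask_src[ni]
    return wtnorm


def divide_each_term(weights, src, mask_src, wtnorm):
    """Divide by wtnorm inside the accumulation loop."""
    dst = [0.0] * len(wtnorm)
    for no, ni, wt in weights:
        if wtnorm[no] != 0.0:
            dst[no] += wt * mask_src[ni] * src[ni] / wtnorm[no]
    return dst


def divide_once(weights, src, mask_src, wtnorm, nodata=0.0):
    """Accumulate first, then divide each destination value once."""
    dst = [0.0] * len(wtnorm)
    for no, ni, wt in weights:
        dst[no] += wt * mask_src[ni] * src[ni]
    return [d / w if w != 0.0 else nodata for d, w in zip(dst, wtnorm)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n_dst, per_cell = 5, 8
n_src = n_dst * per_cell
weights = [(no, no * per_cell + k, rng.random())
           for no in range(n_dst) for k in range(per_cell)]
src = [rng.uniform(1.0, 100.0) for _ in range(n_src)]
mask_src = [0.0 if ni % 3 == 0 else 1.0 for ni in range(n_src)]

wtnorm = wtnorm_for(weights, mask_src, n_dst)
a = divide_each_term(weights, src, mask_src, wtnorm)
b = divide_once(weights, src, mask_src, wtnorm)
max_rel = max(abs(x - y) / abs(y) for x, y in zip(a, b))
print(max_rel)
```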

Member

Also, if it would help, I'm happy to help with this investigation if you give me directions for how to reproduce what you've been doing for testing.

Contributor Author

@billsacks I tried your suggestion, but it did not help. I have now pushed new changes that correct the problem.

@slevis-lmwg
Contributor Author

Latest commit:
Results still remain the same as prior to the code modifications to within roundoff and roughly -1e-9 to 1e-9

@ekluzek
Collaborator

ekluzek commented Oct 22, 2019

@slevisconsulting I'm confused. You say the latest commit is to within roundoff, but then you state it's within +-1.e-9 -- roundoff is roughly +-1.e-15. +-1.e-9 might be OK, but it is more than roundoff, unless the numbers being differenced are larger than order 1.0.
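
The dependence on magnitude can be made concrete with a relative-difference helper. This is a Python sketch with invented values; `max_rel_diff` is a hypothetical name and not a function from the CTSM tools.

```python
def max_rel_diff(a, b):
    """Largest |x - y| / max(|x|, |y|) over pairs, skipping pairs that are both zero."""
    worst = 0.0
    for x, y in zip(a, b):
        scale = max(abs(x), abs(y))
        if scale > 0.0:
            worst = max(worst, abs(x - y) / scale)
    return worst


# The same absolute shift of 1e-9 applied to values of order 1 and of order 1e6.
small = [1.0, 2.0]
small_shifted = [x + 1e-9 for x in small]
large = [1.0e6, 2.0e6]
large_shifted = [x + 1e-9 for x in large]

print(max_rel_diff(small, small_shifted))  # well above double-precision roundoff
print(max_rel_diff(large, large_shifted))  # comparable to double-precision roundoff
```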

@slevis-lmwg
Contributor Author

@slevisconsulting I'm confused. You say the latest commit is to within roundoff, but then you state it's within +-1.e-9 -- roundoff is roughly +-1.e-15. +-1.e-9 might be OK, but it is more than roundoff, unless the numbers being differenced are larger than order 1.0.

@ekluzek thank you for the correction. I'm sorry for using the wrong terminology. Is this level of change acceptable?

@billsacks
Member

Based on your comment @billsacks I could see the problem stemming from my setting
mask_src(:) = tdomain%frac(:)
instead of
mask_src(:) = tdomain%mask(:)

Ah, yes - good catch. I think it's arguable which one is correct to use in the remapping, but it makes sense that using the mask would be consistent with what was being done when we were relying on the ESMF regridder for source masking.

@ekluzek
Collaborator

ekluzek commented May 6, 2021

In terms of your last two questions. We still want to change from scrip grid to ESMF mesh files. Is that what the UNSTRUCT format is? You might as well include them in what you are doing.

The rimport script will fail if a file doesn't exist. I would suggest before you use it though that you do....

cd $CSMDATA
ls `cat list`

So use the list to check for the files' existence beforehand.
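
The same pre-check can be written so that it reports only the missing entries. This is a minimal Python sketch; `read_list` and `missing_files` are hypothetical helpers, and the demo paths are made up so the example runs anywhere.

```python
import os


def read_list(list_path):
    """Read one relative path per line, ignoring blank lines."""
    with open(list_path) as f:
        return [line.strip() for line in f if line.strip()]


def missing_files(relative_paths, root):
    """Return the entries of relative_paths that do not exist under root."""
    return [p for p in relative_paths
            if not os.path.exists(os.path.join(root, p))]


# Demo with made-up entries: "." exists, the other path should not.
demo = missing_files([".", "no_such_dir_for_demo/no_such_file.nc"], os.getcwd())
print(demo)
```

For the real check, `root` would be the inputdata directory and the paths would come from `read_list` applied to the list file; an empty result means every listed file is present.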

@slevis-lmwg
Contributor Author

In terms of your last two questions. We still want to change from scrip grid to ESMF mesh files. Is that what the UNSTRUCT format is? You might as well include them in what you are doing.

The rimport script will fail if a file doesn't exist. I would suggest before you use it though that you do....

cd $CSMDATA
ls `cat list`

So use the list to check for the files' existence beforehand.

I think I did this correctly:
cd /glade/p/cesm/cseg/inputdata
ls `cat /glade/work/slevis/git/mksurfdata_maps_wo_src_masks/PR_823_new_file_list`
and this wrote out the list without error. If a path or filename had been wrong, it would have returned "not found", right?

@slevis-lmwg
Contributor Author

With BobO's correction, these tests that were failing previously now PASS:
make -f /glade/work/slevis/git/mksurfdata_maps_wo_src_masks/tools/mksurfdata_map/Makefile.data crop-global-present-ne120np4
make -f /glade/work/slevis/git/mksurfdata_maps_wo_src_masks/tools/mksurfdata_map/Makefile.data crop-global-present-ne16np4
I believe that all the tests recommended by @ekluzek and @billsacks are OK now (i.e. only expected failures).

@slevis-lmwg
Contributor Author

I think this PR is ready for final review and other near-merge activities. I expect this to be my last post until I return to work on 5/18.

In my local branch directory /glade/work/slevis/git/mksurfdata_maps_wo_src_masks I have a draft PR_823_new_file_list that's up-to-date to the best of my knowledge. I have not run rimport on it yet.

In /glade/work/slevis/git/mksurfdata_maps_wo_src_masks/docs I have draft Change* files (named Change*_DRAFT). I did not update them this week, so the contents are from 3/2020.

Now that I have replaced the map_3x3min_nomask_to_ne120 and ne16 files and tests pass, should I go back and remove all map_3x3min_nomask_to_ne files as if I never created them?

@billsacks
Member

billsacks commented May 19, 2021

Thank you for all of your work on this, @slevisconsulting! It really sounds like you're close now!

I generated a new set of map_ files for 0.9x1.25 and then created new fsurdat files with the baseline code and with the branch. Only PCT_NAT_PFT triggers the error check for max relative difference > 1e-13 with a max relative difference = 9e-12. A year-and-a-half ago this number was 8e-13. Do you consider this change a red flag @billsacks ?

This is probably fine. However – partly connected to this increase in error, but more just for its own sake – if you haven't already done so, I'd suggest doing a quick review of the diffs that have come to master since the original version of this branch in the files modified by this PR. That is, look quickly at the diffs between ctsm1.0.dev080 and ctsm5.1.dev039 in relevant files: mksurfdata_map, mkmapdata and relevant parts of namelist_defaults_ctsm.xml and namelist_defaults_ctsm_tools.xml. This is something that I often do when I'm bringing an old branch up to date, to try to catch semantic conflicts that wouldn't be picked up as textual conflicts in the git merge. An example of the kind of thing to look out for is the introduction of a new call to gridmap_areaave which needs to be changed according to the other changes in this PR. My hope is that there haven't been too many changes on master to the relevant files, so hopefully reviewing the changes made to these files on master won't be too time consuming.

You had a couple of other questions about whether to keep various files and possibly import them to the inputdata repository. I don't have my head back in this enough to answer your specific questions, but my general rule is: keep any files that are actually needed (i.e., pointed to in xml), and delete the others (or move them to some other location outside of inputdata if they might be needed in the future but aren't currently referenced).

@slevis-lmwg
Contributor Author

[...] more just for its own sake – if you haven't already done so, I'd suggest doing a quick review of the diffs that have come to master since the original version of this branch in the files modified by this PR. That is, look quickly at the diffs between ctsm1.0.dev080 and ctsm5.1.dev039 in relevant files: mksurfdata_map, mkmapdata and relevant parts of namelist_defaults_ctsm.xml and namelist_defaults_ctsm_tools.xml. This is something that I often do when I'm bringing an old branch up to date, to try to catch semantic conflicts that wouldn't be picked up as textual conflicts in the git merge. An example of the kind of thing to look out for is the introduction of a new call to gridmap_areaave which needs to be changed according to the other changes in this PR. My hope is that there haven't been too many changes on master to the relevant files, so hopefully reviewing the changes made to these files on master won't be too time consuming.

@billsacks originally I had done this by looking at
https://github.com/ESCOMP/CTSM/pull/823/files
Since you brought it up, I repeated this confirmation by doing the following:
git diff ctsm5.1.dev038 > dif_dev038.txt
git diff ctsm1.0.dev080 5dfe8dd5d33fe691834aeba72cfa379d38e062b4 > dif_dev080.txt
diff dif_dev038.txt dif_dev080.txt > dif_new_vs_old.txt
...and, as far as I can tell, I see expected diffs only and nothing out of the ordinary.

@billsacks
Member

@billsacks originally I had done this by looking at
https://github.com/ESCOMP/CTSM/pull/823/files
Since you brought it up, I repeated this confirmation by doing the following:
git diff ctsm5.1.dev038 > dif_dev038.txt
git diff ctsm1.0.dev080 5dfe8dd5d33fe691834aeba72cfa379d38e062b4 > dif_dev080.txt
diff dif_dev038.txt dif_dev080.txt > dif_new_vs_old.txt
...and, as far as I can tell, I see expected diffs only and nothing out of the ordinary.

That's very helpful, but not really what I meant. Looking at the diffs on master (not bringing your branch into the picture at all) between those two ctsm tags can be helpful with these long-lived branches. For example, if someone introduced an entirely new module in mksurfdata_map that needs to be adjusted similarly to how you adjusted the existing modules, you wouldn't be able to see that just by looking at the diffs between your branch and master.

I don't feel it's critical that you do this, but it tends to make me more comfortable to do that one extra quick check when I'm working with such a long-lived branch.
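
A diff restricted to master, between the two tags and limited to the relevant paths, could be assembled as in the Python sketch below. The tag names come from the discussion above; the paths are assumptions about the repository layout and may need adjusting, and the command is only built and printed here, not run.

```python
def master_diff_command(old_tag, new_tag, paths):
    """Build a git command that diffs two tags, limited to the given paths."""
    return ["git", "diff", old_tag, new_tag, "--", *paths]


# Assumed locations of the relevant files inside a CTSM clone.
paths = [
    "tools/mksurfdata_map",
    "tools/mkmapdata",
    "bld/namelist_files/namelist_defaults_ctsm.xml",
    "bld/namelist_files/namelist_defaults_ctsm_tools.xml",
]
cmd = master_diff_command("ctsm1.0.dev080", "ctsm5.1.dev039", paths)
print(" ".join(cmd))

# To run it from the top of a clone:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Because neither argument is the feature branch, the output contains only what changed on master, which is where a newly added module or a new gridmap_areaave call would show up.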

@slevis-lmwg
Contributor Author

I tried the rimport command
./rimport -list /glade/work/slevis/git/mksurfdata_maps_wo_src_masks/PR_823_new_file_list
and got the following error. Should I hit "accept permanently" or should I try something else?

svn import  /glade/p/cesm/cseg/inputdata/lnd/clm2/mappingdata/maps/0.1x0.1/map_0.125x0.125_nomask_to_0.1x0.1_nomask_aave_da_c200206.nc https://svn-ccsm-inputdata.cgd.ucar.edu/trunk/inputdata/lnd/clm2/mappingdata/maps/0.1x0.1/map_0.125x0.125_nomask_to_0.1x0.1_nomask_aave_da_c200206.nc
Error validating server certificate for 'https://svn-ccsm-inputdata.cgd.ucar.edu:443':
 - The certificate is not issued by a trusted authority. Use the
   fingerprint to validate the certificate manually!
Certificate information:
 - Hostname: *.cgd.ucar.edu
 - Valid: from Nov 18 00:00:00 2019 GMT until Nov 17 23:59:59 2021 GMT
 - Issuer: InCommon, Internet2, Ann Arbor, MI, US
 - Fingerprint: 1A:41:6E:31:C5:F7:99:DD:B7:72:CC:C6:30:B9:E1:C7:90:82:D3:C8
(R)eject, accept (t)emporarily or accept (p)ermanently?

@billsacks
Member

Accept permanently.

@slevis-lmwg
Contributor Author

rimport ran to completion.

I have updated the ChangeLog.

Question: will you want me to commit/push the changes to mkmapdata.sh for accessing BobO's new ESMF build?

@ekluzek
Collaborator

ekluzek commented May 19, 2021

@slevisconsulting you should update mkmapdata.sh to point to the version of ESMF needed for this to work correctly. Is Bob O's version now the latest version of ESMF out there? If there's a later version that has his fixes in place you might not need it, but use the latest one that has the fixes needed in place.

Add NEON sites. Add cime/cdeps support and capability to run NEON tower sites.

This also brings in Negin's first version of the subset_data.py script to create
datasets for single-point and regional cases. And a version of it was run to
create the NEON surface datasets.

To set up a NEON site do the following

cd cime/scripts
./create_newcase --case myNEONtest --res CLM_USRDAT --compset IHist1PtClm51Bgc \
--user-mods-dir NEON/NIWO --driver nuopc --run-unsupported

(There's also a I1PtClm51Bgc compset that can be used for fixed conditions)

Note, also that several externals were updated to the version in cesm2_3_alpha03a. This means that
answers change when CISM is turned on. Some of the grids for compsets tested were updated to
use the new grid name "gris" in place of the older "gland".
@billsacks
Member

This old post reminded me:

  1. I made the new sfc datasets and landuse.timeseries files by running with the "all" option,
    but they are sitting in /glade/scratch/slevis/temp_work/surfdata/branch_results. Shall I copy them to /glade/p/cesmdata/cseg/inputdata/lnd/clm2/surfdata_map and rerun the rimport command for them?
  2. When you consider this PR ready for merge, do you want me to run the full test-suites?

@ekluzek and I talked about this today. What we want is similar to what was noted in this old comment: #823 (comment) (the list of testing to run). In particular, you should run the tools tests (I think you already have) and a single system test to make sure there are no errors in the xml files. It would probably be a good idea to re-verify that you can make all surface datasets via Makefile.data again (if you haven't done that since the merge up to the latest master).

Before too long, we'd like to update all of the surface datasets so that they are consistent with these tools changes, but we (including @negin513) decided it makes sense to wait a bit on this in case there are other answer changes that will come in with the toolchain work.

We also decided this can be the next tag to go in, given that you seem just about ready. So please go ahead with the final steps at your convenience.

@slevis-lmwg
Contributor Author

[...] run the tools tests (I think you already have) and a single system test to make sure there are no errors in the xml files. It would probably be a good idea to re-verify that you can make all surface datasets via Makefile.data again (if you haven't done that since the merge up to the latest master).

That's right, I completed these tests when I merged to dev038. At your recommendation, I am rerunning the Makefile.data test now that I have merged to dev039. As before, I am getting the following failures that I consider "expected" because they also occur in the baseline:

  • failure in landuse-timeseries-smallville
  • failure in global-present-nldas
  • invalid resolution warnings for ne30pg2 and ne0np4ARCTICGRISne30x8

I will post an update in the next couple of hours if anything unexpected happens with the last 8 jobs in this test, which is in progress.

We also decided this can be the next tag to go in, given that you seem just about ready. So please go ahead with the final steps at your convenience.

Sounds good. Let me know if you think of anything else that needs to happen.

@billsacks
Member

@slevisconsulting - I remembered one more thing that could be good to do before finalizing this PR, if you haven't already: Rerunning CIME's check_map tool on some (or all) of the new mapping files. I don't feel that this absolutely needs to be done, but the problems uncovered March, 2020 manifested as conservation errors in the check_map tool. See #823 (comment) for details. It would probably be good to at least check whether the 3min to spectral element (ne) mapping files are now good according to that tool. And if you can quickly (e.g., with a script) iterate over all of the new mapping files, it wouldn't hurt to just run this on everything to verify that the latest ESMF version properly conserves. I don't want this to become a big rabbit hole, though, so please feel free to ignore if this seems like something that would take more time than the value it provides.

@billsacks
Member

@slevisconsulting - I remembered one more thing that could be good to do before finalizing this PR, if you haven't already: Rerunning CIME's check_map tool on some (or all) of the new mapping files. I don't feel that this absolutely needs to be done, but the problems uncovered March, 2020 manifested as conservation errors in the check_map tool. See #823 (comment) for details. It would probably be good to at least check whether the 3min to spectral element (ne) mapping files are now good according to that tool. And if you can quickly (e.g., with a script) iterate over all of the new mapping files, it wouldn't hurt to just run this on everything to verify that the latest ESMF version properly conserves. I don't want this to become a big rabbit hole, though, so please feel free to ignore if this seems like something that would take more time than the value it provides.

I'm also comfortable with deferring this check until after the tag is merged, in order to just get this in and done: Any issues uncovered are likely to be issues with the ESMF mapping tool, and as such might take a long time to resolve. Again, though, in either case, no need to spend significant time on this.
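
Iterating over the mapping files with a script might look like the Python sketch below. It assumes that a check script is invoked with a single map file as its argument and that a failure shows up as a non-zero exit status or a line containing "FAILED"; both are assumptions about the tool's behavior. The stand-in runner and its output strings are invented so the sketch can be exercised without the real tool.

```python
import glob
import os
import subprocess


def find_map_files(maps_root):
    """List map_*.nc files one directory level below maps_root."""
    return sorted(glob.glob(os.path.join(maps_root, "*", "map_*.nc")))


def check_maps(map_files, check_script="./check_map.sh", runner=subprocess.run):
    """Run the check script on each file.

    A file counts as passing when the exit status is 0 and the captured
    output has no "FAILED" text. A crash gives a non-zero status, so it
    is recorded as a failure too.
    """
    results = {}
    for path in map_files:
        proc = runner([check_script, path], capture_output=True, text=True)
        results[path] = proc.returncode == 0 and "FAILED" not in proc.stdout
    return results


class FakeResult:
    """Stand-in for subprocess.CompletedProcess, used only in the demo."""

    def __init__(self, stdout, returncode=0):
        self.stdout = stdout
        self.returncode = returncode


def fake_runner(cmd, **kwargs):
    """Pretend to run the check; the output strings are made up."""
    if "bad" in cmd[1]:
        return FakeResult(" FAILED: conservation error (made-up output)")
    return FakeResult("made-up passing output")


demo = check_maps(["map_good.nc", "map_bad.nc"], runner=fake_runner)
print(demo)
```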

@slevis-lmwg
Contributor Author

slevis-lmwg commented May 21, 2021

Hmm, I tried the following with both the current module loads, as well as with the ones needed to get BobO's fix (see further below):

cd check_maps/src/
gmake
cd ..
module load esmf_libs/8.0.0
module load esmf-8.0.0-ncdfio-uni-O
./check_map.sh /glade/p/cesm/cseg/inputdata/lnd/clm2/mappingdata/maps/<dir>/<map_file.nc>

or

module use /glade/p/cesmdata/cseg/PROGS/modulefiles/esmfpkgs/intel/19.1.1
module load esmf-8.2.0.b06-ncdfio-mpt-g

Both worked for this file that I generated last year map_0.125x0.125_nomask_to_0.1x0.1_nomask_aave_da_c200206.nc

Both failed for this file that I generated last year map_0.25x0.25_nomask_to_0.1x0.1_nomask_aave_da_c200309.nc:

 FAILED: conservation error =   2.895461648222408E-012  in test            3

And both returned a seg fault for the files created earlier this month, e.g. map_3x3min_nomask_to_ne16np4_nomask_aave_da_c210506.nc:

forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image              PC                Routine            Line        Source      
ESMF_RegridWeight  0000000000448D13  Unknown               Unknown  Unknown
libpthread.so.0    00007FFAF3317B00  Unknown               Unknown  Unknown
libesmf.so         00007FFAF7C32822  Unknown               Unknown  Unknown
ESMF_RegridWeight  00000000004289E4  offlinetester_IP_        1395  ESMF_RegridWeightGenCheck.F90
ESMF_RegridWeight  00000000004105FA  MAIN__                    368  ESMF_RegridWeightGenCheck.F90
ESMF_RegridWeight  0000000000406AA2  Unknown               Unknown  Unknown
libc.so.6          00007FFAF21BB6E5  __libc_start_main     Unknown  Unknown
ESMF_RegridWeight  00000000004069A9  Unknown               Unknown  Unknown

So I think that you're right about this being a new can of worms...

@billsacks
Member

I'm not too concerned about the conservation error of 2.9e-12: I have a feeling that is just slightly greater than the tolerance, which is fairly arbitrary.

But the seg fault on the new file indeed seems like a can of worms. Darn. I guess we can just move ahead without trying to do those tests.

@slevis-lmwg
Contributor Author

Ok. And would you like me to report the seg fault to someone in the ESMF group?

@billsacks
Member

Ok. And would you like me to report the seg fault to someone in the ESMF group?

No. This is a CIME tool, not an ESMF tool. It's not clear to me at this point where the problem lies.

@slevis-lmwg
Contributor Author

partly connected to this increase in error, but more just for its own sake – if you haven't already done so, I'd suggest doing a quick review of the diffs that have come to master since the original version of this branch in the files modified by this PR. That is, look quickly at the diffs between ctsm1.0.dev080 and ctsm5.1.dev039 in relevant files: mksurfdata_map, mkmapdata and relevant parts of namelist_defaults_ctsm.xml and namelist_defaults_ctsm_tools.xml. This is something that I often do when I'm bringing an old branch up to date, to try to catch semantic conflicts that wouldn't be picked up as textual conflicts in the git merge. An example of the kind of thing to look out for is the introduction of a new call to gridmap_areaave which needs to be changed according to the other changes in this PR. My hope is that there haven't been too many changes on master to the relevant files, so hopefully reviewing the changes made to these files on master won't be too time consuming.

I went over the diffs brought in with the dev038 merge focusing on the following files and didn't spot anything concerning:

  • mkmapdata.sh
  • namelist_defaults_ctsm*.xml
  • mksurfdata_map codes: @billsacks could you doublecheck my work in file mkpftMod.F90 because I did have to resolve conflicts?

I also typed git grep 'call gridmap' in /glade/work/slevis/git/mksurfdata_maps_wo_src_masks/tools/mksurfdata_map/src and everything appeared in order to me.

@billsacks
Member

  • mksurfdata_map codes: @billsacks could you doublecheck my work in file mkpftMod.F90 because I did have to resolve conflicts?

I just double checked this. Your conflict resolution looks good to me.

@billsacks
Member

@slevisconsulting - FYI, I'm making some minor adjustments to the ChangeLog entry: This tag doesn't actually change answers for the model itself, even though it will change answers once we update surface datasets. I'm clarifying that in the ChangeLog. Also, even when we do bring in new surface datasets, I wouldn't call the changes "significant" (according to the section near the top), since they are only about roundoff-level; so I'm unchecking the checkboxes for significant answer changes.

@billsacks billsacks merged commit 18c3cb9 into ESCOMP:master May 25, 2021
@slevis-lmwg slevis-lmwg deleted the mksurfdata_maps_wo_src_masks branch May 26, 2021 17:07
Labels
enhancement new capability or improved behavior of existing capability
Projects
Status: Done (non release/external)
6 participants