Remove unused fields from surface datasets #629

Closed
ekluzek opened this issue Jan 31, 2019 · 12 comments · Fixed by #883

@ekluzek
Collaborator

ekluzek commented Jan 31, 2019

There are some unused fields on the surface datasets. These should be removed in order to save disk space:

PCT_GLC_MEC_GIC
PCT_GLC_MEC_ICESHEET
PCT_GLC_GIC
PCT_GLC_ICESHEET
F0
P3
ZWT0

There are also a number of fields that could be eliminated for cases where anthropogenic influences are removed:

URBAN_REGION_ID
CONST_HARVEST_VH1
CONST_HARVEST_VH2
CONST_FERTNITRO_CFT
CONST_HARVEST_SH1
CONST_HARVEST_SH2
CONST_HARVEST_SH3
CONST_GRAZING
gdp
abm
NLEV_IMPROAD
CV_IMPROAD
CV_WALL
CV_ROOF
TK_IMPROAD
TK_WALL
TK_ROOF
ALB_WALL_DIF
ALB_ROOF_DIF
ALB_WALL_DIR
ALB_ROOF_DIR
ALB_IMPROAD_DIF
ALB_IMPROAD_DIR
ALB_PERROAD_DIF
ALB_PERROAD_DIR
WTROAD_PERV
WTLUNIT_ROOF
WIND_HGT_CANYON
T_BUILDING_MIN
THICK_WALL
THICK_ROOF
HT_ROOF
EM_WALL
EM_ROOF
EM_PERROAD
EM_IMPROAD
CANYON_HWR
UNREPRESENTED_CFT_LULCC
UNREPRESENTED_PFT_LULCC

And finally, there are several fields used only for VIC (which come from f09 tunings):

binfl
Dsmax
Ds
Ws
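
The three groups above amount to a small selection rule. A minimal sketch in Python, assuming two hypothetical switches (`no_anthro`, `use_vic`) that do not correspond to any existing tool option; the field names come from the lists above, with the anthropogenic group abbreviated:

```python
# Field groups taken from the lists in this issue.
# The anthropogenic group is abbreviated for readability.
ALWAYS_UNUSED = {
    "PCT_GLC_MEC_GIC", "PCT_GLC_MEC_ICESHEET", "PCT_GLC_GIC",
    "PCT_GLC_ICESHEET", "F0", "P3", "ZWT0",
}
ANTHRO_ONLY = {"URBAN_REGION_ID", "CONST_GRAZING", "gdp", "abm", "CANYON_HWR"}
VIC_ONLY = {"binfl", "Dsmax", "Ds", "Ws"}


def fields_to_drop(no_anthro: bool, use_vic: bool) -> set[str]:
    """Return the set of surface-dataset fields that could be omitted.

    no_anthro and use_vic are hypothetical switches, for illustration only.
    """
    drop = set(ALWAYS_UNUSED)
    if no_anthro:
        drop |= ANTHRO_ONLY
    if not use_vic:
        drop |= VIC_ONLY
    return drop


def keep_fields(all_fields: list[str], no_anthro: bool, use_vic: bool) -> list[str]:
    """Filter a list of field names, preserving their original order."""
    drop = fields_to_drop(no_anthro, use_vic)
    return [name for name in all_fields if name not in drop]
```

For example, `keep_fields(["PCT_GLC", "F0", "gdp", "binfl"], no_anthro=False, use_vic=False)` keeps only `PCT_GLC` and `gdp`.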

@ekluzek ekluzek added enhancement new capability or improved behavior of existing capability priority: high High priority to fix/merge soon, e.g., because it is a problem in important configurations labels Jan 31, 2019
@ekluzek ekluzek added this to the cesm2.1.1 milestone Jan 31, 2019
@ekluzek ekluzek self-assigned this Jan 31, 2019
@billsacks
Member

@whlipscomb I think the presence of PCT_GLC_MEC_GIC, PCT_GLC_MEC_ICESHEET, PCT_GLC_GIC and PCT_GLC_ICESHEET dates back to some of my earliest work as LIWG liaison (in 2011). Do you have a sense of whether anyone actually uses these fields for post-processing? The context is that we're trying to reduce the size of the surface datasets if possible.

@ekluzek
Collaborator Author

ekluzek commented Feb 5, 2019

Note that VIC tests are only run for f09, f19, and f10 for 2000 conditions with Crop (and CN) off.

@ekluzek
Collaborator Author

ekluzek commented Feb 5, 2019

Many urban fields are really lookup tables for a relatively small number of regions throughout the globe. So we could also save space by referencing the lookup tables and keeping just the region ID on the surface dataset. That would also make it easier to bring in new urban fields, because we wouldn't have to put them on the surface datasets before adopting them. @olyson
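
To illustrate the lookup-table idea, here is a minimal sketch in Python. The table contents, the `expand_urban_field` helper, and the convention that region ID 0 means "no urban region" are all assumptions made for the example, not the actual file layout:

```python
# Hypothetical lookup table: one row of urban properties per region ID.
# The numbers are placeholders, not real urban data.
URBAN_LOOKUP = {
    1: {"HT_ROOF": 10.0, "CANYON_HWR": 0.5},
    2: {"HT_ROOF": 25.0, "CANYON_HWR": 1.5},
}


def expand_urban_field(region_ids, field):
    """Rebuild a per-gridcell field from region IDs.

    Assumes (for this sketch only) that region ID 0 marks a gridcell
    with no urban region, which maps to None.
    """
    return [URBAN_LOOKUP[r][field] if r != 0 else None for r in region_ids]


# Only the region IDs would need to live on the surface dataset.
region_ids = [1, 0, 2, 1]
ht_roof = expand_urban_field(region_ids, "HT_ROOF")
```

With this layout, adding a new urban property would only mean adding a column to the lookup table, since the per-gridcell data stays a single integer field.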

@ekluzek
Collaborator Author

ekluzek commented Feb 5, 2019

For urban fields, note issue #633 which handles them by pointing to the urban field region lookup table as a separate file. We won't do this one right away, but should consider it for future changes.

@ekluzek
Collaborator Author

ekluzek commented Feb 13, 2019

In response to this...

@whlipscomb I think the presence of PCT_GLC_MEC_GIC, PCT_GLC_MEC_ICESHEET, PCT_GLC_GIC and PCT_GLC_ICESHEET dates back to some of my earliest work as LIWG liaison (in 2011). Do you have a sense of whether anyone actually uses these fields for post-processing? The context is that we're trying to reduce the size of the surface datasets if possible.

If the usage is for postprocessing, including these fields in the input datasets is the wrong solution. We are at a point where we need to be careful about creating input surface datasets because of space. For postprocessing you probably only need the fields at one resolution, and you can get by with older versions of the dataset. In contrast, the input datasets are carefully version controlled, and the surface datasets are specifically for things that need to be regridded to the resolution the model is running at. Older surface datasets will still have these fields on them, and the raw data that we created them from will have them as well.

@whlipscomb

@billsacks and @ekluzek, I don't know whether people are currently using these fields. But in general it could be useful to distinguish between GIC (i.e., peripheral glaciers and ice caps) and the main ice sheet, for someone who is post-processing CLM glacier output. Would it be possible to include these fields in non-input datasets, as @ekluzek suggests?

I'm not sure whether or not these fields should be included at each supported resolution. Ideally, these fields should (at any resolution) satisfy pct_glc_gic + pct_glc_icesheet = pct_glc, where pct_glc is the glacier percentage in the surface dataset. If this equality holds for one resolution, would mapping preserve it at other resolutions? I think this would depend on whether the mapping is linear.
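
On the linearity question: if the mapping is a fixed weighted sum of source cells, the equality should carry over, because remapping a sum gives the sum of the remapped parts. A small Python sketch with made-up weights and percentages (none of these numbers come from a real mapping file):

```python
import math


def remap(weights, source):
    """Linear remap: each destination value is a weighted sum of source values."""
    return [sum(w * s for w, s in zip(row, source)) for row in weights]


# Made-up example: 3 source cells mapped to 2 destination cells.
weights = [[0.5, 0.5, 0.0],
           [0.0, 0.25, 0.75]]
pct_glc_gic = [10.0, 0.0, 4.0]
pct_glc_icesheet = [30.0, 20.0, 0.0]
pct_glc = [g + i for g, i in zip(pct_glc_gic, pct_glc_icesheet)]

gic_dst = remap(weights, pct_glc_gic)
ice_dst = remap(weights, pct_glc_icesheet)
glc_dst = remap(weights, pct_glc)

# Linearity: remap(a) + remap(b) equals remap(a + b), up to round-off.
sum_preserved = all(
    math.isclose(g + i, t) for g, i, t in zip(gic_dst, ice_dst, glc_dst)
)
```

Any nonlinear step applied after the mapping (for example a threshold or a separate renormalization of each field) could break the equality, so this only covers the purely linear case.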

@billsacks
Member

@whlipscomb I understand why these fields are useful in principle, but I was thinking that, if they're not being used (much) in practice, we should save the disk space. Would it be reasonable to send a message to liwg-core asking if anyone is using these fields? As for @ekluzek 's suggestions, I'm not sure I understand these, but maybe we could discuss further in person.

@ekluzek would a compromise be to have an optional flag that turns on these fields? Is that something you're doing for some other fields now? Then, if there are a few liwg members who want them, we could provide instructions for how they could create a surface dataset on the fly with the necessary fields.

@ekluzek
Collaborator Author

ekluzek commented Feb 13, 2019

@billsacks I could add them as optional. I did that for VIC. I still think that's the wrong solution. But I think we should talk to some liwg-core members who need this and narrow down what the real requirements are. Note, since it's been available until now, you just create it with an older version. But getting the real requirements from the people who are using (or would use) them is really the important thing to do here.

@whlipscomb

@ekluzek and @billsacks, I may have misunderstood. I agree that we don't need these fields in the standard surface data sets. I'd just like to have them available in case someone wants them. @ekluzek, what do you mean when you say "you just create it with an older version"? Do you mean an older version of the scripts that process raw data to create surface data sets? If so, then I think I'd prefer an optional flag (turned off by default) in a current script, rather than relying on older scripts.

I'd like to make the decision based on what makes sense scientifically, rather than a survey of what current LIWG members are doing. People may not know these fields are there. (I'd forgotten myself.) But it's common practice, when post-processing Greenland ice sheet output, to mask out GICs.

@ekluzek
Collaborator Author

ekluzek commented Feb 13, 2019

@whlipscomb let's set up a meeting to discuss this. My point is to understand the scientific requirements here and provide a solution that best meets them. To get at that, we both probably need to learn what people are doing, as well as make sure we are providing a solution that they can take advantage of even if they haven't to this point.

@billsacks
Member

In discussion with @ekluzek and @whlipscomb about the above glacier-related fields:

Bill L says: these diagnostic fields, while not critical, are very helpful to have for post-processing.

The easiest thing would be to add an option (like '-mask_gic') to mksurfdata.pl to add these diagnostic fields. Then LIWG scientists could run this tool with this option. While having mksurfdata_map do this isn't ideal, it's probably a lot easier than making a separate tool.
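
The opt-in behavior could look roughly like the following. This is a Python sketch of the pattern only: the real tool is a Perl script, and the `--mask_gic` flag name and the `output_fields` helper are hypothetical, mirroring the suggestion above:

```python
import argparse

# Diagnostic glacier fields named earlier in this issue.
GIC_DIAGNOSTIC_FIELDS = [
    "PCT_GLC_GIC", "PCT_GLC_ICESHEET",
    "PCT_GLC_MEC_GIC", "PCT_GLC_MEC_ICESHEET",
]


def build_parser():
    """Build a parser with a hypothetical opt-in switch (off by default)."""
    parser = argparse.ArgumentParser(
        description="Sketch of an opt-in diagnostic-field switch"
    )
    parser.add_argument(
        "--mask_gic",
        action="store_true",
        help="also write the GIC/ice-sheet diagnostic fields",
    )
    return parser


def output_fields(base_fields, mask_gic):
    """Return the fields to write; diagnostics are appended only on request."""
    extra = GIC_DIAGNOSTIC_FIELDS if mask_gic else []
    return list(base_fields) + extra
```

Because the switch defaults to off, standard surface datasets would stay small, and only users who ask for the diagnostics would pay the disk-space cost.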

@billsacks
Member

Talking with @ekluzek - our recollection is that this was done, but some of the changes for PCT_GLC need to be backed out or reworked. Probably do that on release branch as well as master.
