
Reduce disk space on surface datasets by using the urban lookup table #633

Open
ekluzek opened this issue Feb 5, 2019 · 6 comments

ekluzek commented Feb 5, 2019

There are 34 urban fields on the surface dataset that are output as double precision at the resolution of the output grid. We could save space, and make it easier to bring in updates to the urban fields, by keeping only the region ID on the surface dataset and referencing the lookup table for everything else.

Roughly 25% of the surface dataset is made up of such urban fields. We could save about 50 GB per new set of surface datasets made. The entire raw dataset (with region ID at 0.05x0.05 degree resolution) is only about 1 GB.

@olyson @dlawrenncar

@ekluzek ekluzek added the enhancement new capability or improved behavior of existing capability label Feb 5, 2019
@ekluzek ekluzek self-assigned this Feb 5, 2019
ekluzek commented Feb 21, 2019

I went over this with @olyson. We agree it would be good to do. It will save disk space, and it will also make it easier for Keith to do urban updates: he won't have to create new surface datasets in order to try out a new set of urban parameters, he can just point to the new dataset. So this change is both more flexible for Keith and his work, and also saves maybe 100 GB of data each time we create new surface datasets at all the different standard resolutions. It also makes the dataset easier to read and understand, and bundles all the urban fields into one file (so it's more obvious that this data isn't needed for Paleo work, for example).

The surface dataset will still have the region ID and the density index; mksurfdata_map will read those two fields from the raw urban dataset. Then, at initialization, CLM will use the land unit's density index and region ID to look up and populate all the urban parameters.
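A minimal sketch of that initialization-time lookup (all names, dimensions, and values here are illustrative, not the actual CTSM code):

```fortran
! Hypothetical sketch: each urban landunit's (region ID, density index)
! pair indexes directly into a lookup table read from the raw urban file,
! replacing 34 gridded double-precision fields on the surface dataset.
program urban_lookup_sketch
  implicit none
  integer, parameter :: nregions = 33, ndensity = 3
  real :: canyon_hwr_tab(nregions, ndensity)   ! table read from the raw urban file
  real :: canyon_hwr                           ! value for one urban landunit
  integer :: region_id, dens_idx

  canyon_hwr_tab = 0.0
  canyon_hwr_tab(5, 2) = 1.2    ! pretend this came from the netCDF lookup table

  ! per-landunit indices come from the surface dataset
  region_id = 5
  dens_idx  = 2

  ! one table reference replaces a full gridded field
  canyon_hwr = canyon_hwr_tab(region_id, dens_idx)
  print *, canyon_hwr
end program urban_lookup_sketch
```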

We could separate the urban dataset into two files, one used by mksurfdata_map and one by CLM. But we figure we might as well keep them on the same raw data file to start: both CLM and mksurfdata_map will point to the same file and just use different pieces of information from it. Keeping everything in one file also allows the current tools to continue working with it.

Keith will look into whether he has time to work on this. If not, we don't know when it will happen, but it will be on the list of priorities. The change to mksurfdata_map is simple -- just removing things -- so @ekluzek will do that. The part in CTSM is more involved, but it's mainly moving code from mksurfdata_map into CTSM itself, so in principle it isn't hard.

olyson commented Mar 25, 2019

I will create a branch to start this. Should I branch off master or clm release?

ekluzek commented Mar 25, 2019

Branch off of master. This will just go on master and not the release branch.

olyson commented Apr 1, 2019

I have something on a branch that is working. It runs for a global test case and passes this test (including being bit-for-bit with ctsm1.0.dev031):

ERP_Ld3.f09_g17.I1850Clm50BgcCropCru.cheyenne_intel.clm-ciso

Basically, it replaces the UrbanInput code in UrbanParamsMod.F90 that reads in the urban parameters from the surface dataset with a call to mkurbanpar, the urban lookup table routine, that is in tools/mksurfdata_map/src.

But to do this, I had to add in the following modules from tools/mksurfdata_map/src because of dependencies:
src/biogeophys/mkgridmapMod.F90
src/biogeophys/mkindexmapMod.F90
src/biogeophys/mkncdio.F90
src/biogeophys/mkvarctl.F90

Not sure if that is desirable. It might be possible to extract exactly what is needed from these four modules and put it in other existing, more appropriate modules.

Another issue is that there may be a more elegant way to execute the following code which is used to fill the urbinp arrays:

```fortran
do p = 1, size(params_scalar)
   call lookup_and_check_err(params_scalar(p)%name, params_scalar(p)%fill_val, &
        params_scalar(p)%check_invalid, urban_skip_abort_on_invalid_data_check, &
        data_scalar_o, 0)
   if (params_scalar(p)%name == "CANYON_HWR") then
      urbinp%canyon_hwr = data_scalar_o
   else if (params_scalar(p)%name == "EM_IMPROAD") then
      urbinp%em_improad = data_scalar_o
   else if (params_scalar(p)%name == "EM_PERROAD") then
      urbinp%em_perroad = data_scalar_o
   else if (params_scalar(p)%name == "EM_ROOF") then
      urbinp%em_roof = data_scalar_o
   else if (params_scalar(p)%name == "EM_WALL") then
      urbinp%em_wall = data_scalar_o
   else if (params_scalar(p)%name == "HT_ROOF") then
      urbinp%ht_roof = data_scalar_o
   else if (params_scalar(p)%name == "THICK_ROOF") then
      urbinp%thick_roof = data_scalar_o
   else if (params_scalar(p)%name == "THICK_WALL") then
      urbinp%thick_wall = data_scalar_o
   else if (params_scalar(p)%name == "T_BUILDING_MIN") then
      urbinp%t_building_min = data_scalar_o
   else if (params_scalar(p)%name == "WIND_HGT_CANYON") then
      urbinp%wind_hgt_canyon = data_scalar_o
   else if (params_scalar(p)%name == "WTLUNIT_ROOF") then
      urbinp%wtlunit_roof = data_scalar_o
   else if (params_scalar(p)%name == "WTROAD_PERV") then
      urbinp%wtroad_perv = data_scalar_o
   else if (params_scalar(p)%name == "NLEV_IMPROAD") then
      urbinp%nlev_improad = data_scalar_o
   end if
end do
```

For example, I'm wondering whether Fortran has any built-in uppercase-to-lowercase conversion functions.
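On the case-conversion question: standard Fortran has no intrinsic for this, but a small portable helper built on the ACHAR/IACHAR intrinsics is the usual approach. This is an illustrative sketch, not existing CTSM code:

```fortran
! Sketch of a portable lowercase helper; Fortran has no intrinsic
! case-conversion function, so this is commonly hand-rolled.
module str_case_mod
  implicit none
contains
  pure function to_lower(s) result(lower)
    character(len=*), intent(in) :: s
    character(len=len(s)) :: lower
    integer :: i, ic
    do i = 1, len(s)
      ic = iachar(s(i:i))
      if (ic >= iachar('A') .and. ic <= iachar('Z')) then
        lower(i:i) = achar(ic + 32)   ! shift into the lowercase ASCII range
      else
        lower(i:i) = s(i:i)           ! leave digits, '_', etc. untouched
      end if
    end do
  end function to_lower
end module str_case_mod

program demo
  use str_case_mod
  implicit none
  print *, to_lower('CANYON_HWR')   ! canyon_hwr
end program demo
```

A SELECT CASE on the name string would also tidy up the if-else chain without needing any case conversion, since Fortran string comparison is case-sensitive and the names are already consistently uppercase.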

olyson commented Jun 24, 2020

Erik and I discussed whether this new feature should be required before bringing in the new set of urban datasets described in PR #591. In the interest of supporting the urban user community and providing them with the new datasets by default in a timely manner, we decided to table this feature until after the new urban datasets are brought in.

Another barrier is that the THESIS urban properties tool, now hosted here:

https://github.com/olyson/THESISUrbanPropertiesTool

would have to be reworked to produce a separate table of urban properties consistent with the approach above. I don't have the resources to do that at this time.

The new urban datasets and the THESIS urban properties tool are documented in Oleson and Feddema (2019), and it's important to have a CTSM version that includes these datasets. This is now slated to be part of CTSM5.2.

ekluzek commented Apr 10, 2024

@olyson and @briandobbins this is part of what I was talking about in email.

From the comments above we see that @olyson got started on this, but the THESISUrbanPropertiesTool would have needed to be reworked, which was too big a lift. From that it's clear this couldn't be done for CESM3.
