Reduce disk space on surface datasets by using the urban lookup table #633
Comments
I went over this with @olyson. We agree it would be good to do. It will save disk space and also make it easier for Keith to do urban updates: he won't have to create new surface datasets in order to try out new urban dataset parameters, he can just point to the new dataset. So this change is more flexible for Keith and his work, and also saves maybe 100 GBytes of data each time we create new surface datasets at all the different standard resolutions. It also makes the surface dataset easier to read and understand, and bundles all the urban fields into one file (so it's more obvious, for example, that this data isn't needed for paleo work).

The surface dataset will still have the region_id and the density index, so mksurfdata_map will read those two from the raw urban dataset. Then CLM at initialization will take the density index of the landunit and the region ID to populate all the urban parameters.

We could separate the urban dataset into two files, one used by mksurfdata_map and one by CLM. But we figure we might as well keep them on the same raw data file in the beginning. Both CLM and mksurfdata_map will point to the same file and just use different bits of information from it. Keeping it on the same file also allows the current tools to continue working with it.

Keith will look into whether he has time to work on this. If not, we don't know when it will happen, but it will be on the list of priorities. The change to mksurfdata_map is simple -- just removing things -- so @ekluzek will do that. The part in CTSM is more involved, but it's mainly moving the code from mksurfdata_map into CTSM itself, so in principle it isn't hard.
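The initialization-time lookup described above -- taking a landunit's region ID and density index and using them to populate the urban parameters from the raw urban file -- could be sketched roughly like this. This is only an illustration; all names (fill_urban_params, lookup, canyon_hwr) are hypothetical, not actual CTSM variables or routines:

```fortran
! Hypothetical sketch: the surface dataset supplies only region_id and
! density_index per urban landunit; every other urban parameter is read
! from the raw urban lookup table, dimensioned (region, density class).
! Names are illustrative, not actual CTSM code.
subroutine fill_urban_params(region_id, density_index, lookup, canyon_hwr)
  integer, intent(in)  :: region_id(:)      ! per-landunit urban region ID
  integer, intent(in)  :: density_index(:)  ! per-landunit density class
  real(8), intent(in)  :: lookup(:,:)       ! table values: (region, density)
  real(8), intent(out) :: canyon_hwr(:)     ! one of the 34 urban fields
  integer :: l

  do l = 1, size(region_id)
     ! Each urban field becomes a simple two-index table lookup
     canyon_hwr(l) = lookup(region_id(l), density_index(l))
  end do
end subroutine fill_urban_params
```

The same two indices would drive the lookup for each of the 34 urban fields, which is what makes swapping in an updated urban dataset a file change rather than a surface-dataset regeneration.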
I will create a branch to start this. Should I branch off master or the clm release branch?
Branch off of master. This will just go on master and not the release branch.
I have something on a branch that is working. It runs for a global test case and passes this test (including bit-for-bit with ctsm1.0.dev031):

ERP_Ld3.f09_g17.I1850Clm50BgcCropCru.cheyenne_intel.clm-ciso

Basically, it replaces the UrbanInput code in UrbanParamsMod.F90 that reads the urban parameters from the surface dataset with a call to mkurbanpar, the urban lookup table routine in tools/mksurfdata_map/src. But to do this, I had to add in the following modules from tools/mksurfdata_map/src because of dependencies:

Not sure if that is desirable. It might be possible to extract exactly what is needed from these four routines and put it in other existing appropriate modules.

Another issue is that there may be a more elegant way to execute the following code, which is used to fill the urbinp arrays:

  do p = 1, size(params_scalar)

For example, I'm wondering if there are any upper- to lower-case conversion functions in Fortran.
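On the case-conversion question: standard Fortran has no intrinsic upper- or lower-case conversion function, so codes typically write a small helper by hand. A minimal sketch, assuming ASCII character handling (the name to_lower and the demo string are illustrative, not CTSM code):

```fortran
! Hypothetical helper, not part of CTSM: lower-cases ASCII letters by
! shifting their character codes; all other characters pass through.
program demo_to_lower
  implicit none
  write(*, '(a)') to_lower('CANYON_HWR')   ! writes: canyon_hwr
contains
  pure function to_lower(str) result(lowered)
    character(len=*), intent(in) :: str
    character(len=len(str))      :: lowered
    integer :: i, ic
    do i = 1, len(str)
       ic = iachar(str(i:i))
       if (ic >= iachar('A') .and. ic <= iachar('Z')) then
          ! 'a' - 'A' = 32 in the ASCII collating sequence
          lowered(i:i) = achar(ic + 32)
       else
          lowered(i:i) = str(i:i)
       end if
    end do
  end function to_lower
end program demo_to_lower
```

A helper like this could normalize the parameter names before the loop over params_scalar, avoiding duplicated upper/lower-case branches.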
Erik and I had a discussion about whether this new feature should be required before bringing in the new set of urban datasets described in PR#591. In the interest of supporting the urban user community and providing them with the new datasets by default in a timely manner, we decided to table this new feature until after the new urban datasets are brought in. https://github.com/olyson/THESISUrbanPropertiesTool would have to be reworked to produce a separate table of urban properties consistent with the above approach.
@olyson and @briandobbins, this is part of what I was talking about in email. From the comments above we see that @olyson got started on this, but the THESISUrbanPropertiesTool would need to be updated, which was too big of a lift. From that it's obvious this couldn't be done for CESM3.
There are 34 urban fields on the surface dataset that are output in double precision at the resolution of the output grid. We could save space, and make it easier to bring in updates to the urban fields, by keeping just the region ID on the surface dataset and referencing the lookup table.
Roughly 25% of the surface dataset is made up of such urban fields. We could save about 50 GBytes per new set of surface datasets made. The entire raw dataset (with region ID at 0.05x0.05 degree resolution) is only about one GByte.
@olyson @dlawrenncar