Munge GCM downloads into GLM-ready variables #266
Conversation
@jread-usgs looking for your review specifically on the
#' @description the final GCM driver data will need to be daily, but geoknife
#' returns hourly values. This step summarizes the data into daily values. It
#' creates a file with the exact same name, except that the "_raw" part of the
#' `in_file` filepath is replaced with "_daily".
Might be good to add the conversion steps to this function description
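For instance (purely a sketch of wording, not the actual roxygen in this PR), the description could spell out the steps discussed in this thread:

```r
#' @description The final GCM driver data need to be daily, but geoknife
#'   returns hourly values. This function (1) renames the GDP variables to
#'   their GLM equivalents, (2) applies any needed unit/variable conversions,
#'   and (3) summarizes the hourly values into daily values, writing a file
#'   whose name matches `in_file` with "_raw" replaced by "_daily".
```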
I focused on the `munge_to_glm` function.
) %>%

# Simply rename GDP variables into GLM variables
mutate(RelHum = qas,
`RelHum` isn't equivalent to `qas`, so conversion is needed here. To calculate `RelHum`, you need specific humidity, pressure, and air temperature (see here). See helper function here.
`ps` is the surface air pressure in hPa (see here); it would need to be part of your download and would need to be converted to mb from hPa to use that helper function above.
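Not the project's helper, but a minimal sketch of one common specific-humidity-to-relative-humidity conversion (Magnus-type saturation vapor pressure), assuming `q` is specific humidity in kg/kg, `p` is surface pressure in mb, and `airtemp` is air temperature in deg C:

```r
# Sketch only -- the linked helper function should be preferred.
sh_to_rh <- function(q, p, airtemp) {
  # vapor pressure (mb) from specific humidity and surface pressure
  e  <- q * p / (0.622 + 0.378 * q)
  # saturation vapor pressure (mb), Magnus-type approximation over water
  es <- 6.112 * exp(17.67 * airtemp / (airtemp + 243.5))
  # relative humidity (%), capped at 100
  pmin(100 * e / es, 100)
}

sh_to_rh(q = 0.010, p = 1000, airtemp = 20)  # ~68%
```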
# Convert from hourly to daily data
group_by(time, cell) %>%
# TODO: should we `na.rm = TRUE`?
summarize(across(.cols = -WindSpeed, .fns = ~ mean(.x, na.rm = FALSE)),
nice way of doing this 👍
Agreed! This is a really clean approach.
This gets a bit muddied by the precip need - "meters/day". Wouldn't this need a `sum` of the hourly rate and not a `mean`?
I think I can just follow the pattern I did for `WindSpeed`, where it has its own method defined.
Since it's a rate, in units of m/day already, rather than the depth value that some models use, I think using mean() is fine, right @jread-usgs ?
Yes, `mean` would be equivalent in this case. If it were m/hour, though, we'd need to treat it differently.
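A tiny numeric check of that point (made-up values, assuming `Rain` is a rate in m/day reported at each hourly timestep):

```r
library(dplyr)

# 24 hourly rows for one day, each reporting rain as a rate in m/day.
# Averaging the rate preserves the daily value; summing would inflate it
# by a factor of 24. If the variable were a per-hour depth (m/hour),
# sum() would be the right aggregation instead.
hourly <- tibble(date = as.Date("1980-06-01"), Rain = rep(0.012, 24))

hourly %>%
  group_by(date) %>%
  summarize(Rain_mean = mean(Rain),  # 0.012 m/day (correct daily rate)
            Rain_sum  = sum(Rain))   # 0.288 (overcounted)
```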
n = length(time),
.groups = "keep") %>%
ungroup() %>%
# This drops Jan 1, 1980
hmmm...we lose the first day of the year because it isn't a complete day? Bummer. Are we working in the appropriate timezone for that summation?
) %>%

# Create a column with just the date to use for summarizing
mutate(date = as.Date(DateTime)) %>%
I'm not running this so I can't tell quite what you are seeing here, but I am wondering if we're calculating `date` using the timezone of the local machine running this code instead of the timezone of the dataset itself. If so, that may explain why we're losing Jan 1 1980 (but funny, as I'd expect we'd also lose the leading date in each of the GCM time chunks...).
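For reference, a small base-R illustration of how the `tz` argument changes the calendar date that `as.Date()` returns for a POSIXct (the timestamps here are made up):

```r
# as.Date() on a POSIXct defaults to tz = "UTC", regardless of the timezone
# attached to the data, so late-evening local timestamps can roll over to
# the next UTC date.
x <- as.POSIXct("1980-01-01 22:00:00", tz = "America/Chicago")
as.Date(x)                          # "1980-01-02" (interpreted in UTC)
as.Date(x, tz = "America/Chicago")  # "1980-01-01"
```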
#' creates a file with the exact same name, except that the "_raw" part of the
#' `in_file` filepath is replaced with "_daily".
#' @param in_file filepath to a feather file containing the hourly geoknife data
munge_to_glm <- function(in_file) {
Might want to call this `munge_notaro_to_glm`, since this function is specific to a particular set of variables and units.
In the future, might want to pull out the generic munging components into a separate function and keep the munge prep work that is specific to a particular source in a unique function (e.g., could use the same generic for NLDAS and Notaro GCMs, but unique prep functions for both). But probably not important now.
Hmmm what would you say are the "generic munging components" here? Feels like most are Notaro specific
I see generic as: going from hourly to daily, writing the files, and perhaps some of the variable changes. Probably good to just ignore this since I agree there isn't a clear divide and NLDAS isn't handled here currently anyhow.
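If that split were ever revisited, it might look roughly like this (the function names and the feather I/O via `arrow` are hypothetical, just to illustrate the generic vs. source-specific divide):

```r
library(dplyr)

# Source-specific prep: rename/derive Notaro (GDP) variables into GLM names.
prep_notaro_vars <- function(hourly_data) {
  mutate(hourly_data, RelHum = qas)  # placeholder; real unit conversions go here
}

# Generic munging: collapse hourly values to daily and write the output file.
hourly_to_daily_glm <- function(hourly_data, out_file) {
  daily <- hourly_data %>%
    mutate(date = as.Date(DateTime, tz = "UTC")) %>%
    group_by(date, cell) %>%
    summarize(across(-DateTime, mean), .groups = "drop")
  arrow::write_feather(daily, out_file)
  out_file
}

munge_notaro_to_glm <- function(in_file) {
  out_file <- gsub("_raw", "_daily", in_file)
  arrow::read_feather(in_file) %>%
    prep_notaro_vars() %>%
    hourly_to_daily_glm(out_file)
}
```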
#' @param dim_time_input vector of GCM driver data dates
#' @param dim_cell_input vector of all GCM driver grid cells (whether or not data was pulled)
#' @param vars_info variables and descriptions to store in NetCDF
#' @param global_att global attribute description for the observations (e.g. notaro_ACCESS_1980_1999)
Looks like you may have tweaked the format of this global attribute description since providing this example.
Oh yes. I should probably just delete all of this documentation, since I anticipate the arguments passed to the function will change immensely with #252, but this runs in the pipeline, so I was keeping it for now.
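In case it's useful down the road, a minimal `ncdf4` sketch of how a description like `notaro_ACCESS_1980_1999` could be stamped onto the file as a global attribute (the file name here is made up):

```r
library(ncdf4)

# varid = 0 targets the file's global attributes rather than a variable.
nc <- nc_open("gcm_output.nc", write = TRUE)
ncatt_put(nc, varid = 0, attname = "description",
          attval = "notaro_ACCESS_1980_1999")
nc_close(nc)
```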
Commenting just to say that I am seeing all these comments and starting to work through them! Thanks for these very helpful reviews @jread-usgs and @hcorson-dosch
Okay - done with my review! I looked over the `targets` code as well as the `munge_glm()` function. Thanks for all of the detailed comments and function descriptions, Lindsay. Those really helped. I think this is looking great, and will be good to merge in once the unit conversions are finalized.
Working my way through the following list after combing through your reviews:
Pausing for now (just relative humidity and tz stuff left) because the updated downscaled, debiased GCM data might make some of this moot. See #273
Decided that merging this, while noting those two incomplete items - timezone issues & correct relative humidity conversions - is the best way to keep moving forward. Given that the new GCMs are already daily and mostly in the units we need, these two concerns likely won't be present anyway.
In addition to the munging, this adds
Putting this up here now but just want it to sit for now. Need to do the following before it can be reviewed:
- add `retry()` to GCM download code
- add some basic optimization to the pivoting code (waiting to do this later given mixed results. See GCM pipeline: scale up to MN #258 (comment))