Munge GCM downloads into GLM-ready variables #266

Contributor

@lindsayplatt lindsayplatt commented Dec 21, 2021

In addition to the munging, this adds

Putting this up here now but just want it to sit for now. Need to do the following before it can be reviewed:

@lindsayplatt lindsayplatt marked this pull request as ready for review January 5, 2022 20:46
@lindsayplatt
Contributor Author

@jread-usgs looking for your review specifically on the munge_to_glm() function, but feel free to comment on other aspects. @hcorson-dosch looking for your review on my targets code. Note that I deleted all of my old NetCDF munging code but left the top-level function, generate_gcm_nc(), in place with all its params; for now it just saves a blank .nc file (this will get filled out in #252).

#' @description the final GCM driver data will need to be daily, but geoknife
#' returns hourly values. This step summarizes the data into daily values. It
#' creates a file with the exact same name, except that the "_raw" part of the
#' `in_file` filepath is replaced with "_daily".
Contributor

Might be good to add the conversion steps to this function description

@jordansread jordansread left a comment

I focused on the munge_to_glm function.

) %>%

# Simply rename GDP variables into GLM variables
mutate(RelHum = qas,

RelHum isn't equivalent to qas, so a conversion is needed here. To calculate RelHum, you need specific humidity, pressure, and air temperature (see here). See the helper function here.

ps is the surface air pressure in hPa (see here). It would need to be part of your download, and it would need to be converted from hPa to mb to use the helper function above.
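
For illustration, the conversion described above could look roughly like the sketch below. This is not the linked helper function; `spec_hum_to_rel_hum()` and its argument names are hypothetical, and it assumes specific humidity in kg/kg, air temperature in degrees C, and pressure in mb. Note that 1 hPa equals 1 mb, so if `ps` really arrives in hPa the pressure step is numerically a no-op; if it arrives in Pa instead, it would need dividing by 100.

```r
# Hypothetical helper (not the function linked above): convert specific
# humidity to relative humidity.
#   q      specific humidity, kg/kg (assumed units)
#   temp_c air temperature, degrees C (assumed units)
#   p_mb   surface pressure, millibars (1 hPa == 1 mb)
spec_hum_to_rel_hum <- function(q, temp_c, p_mb) {
  # Saturation vapor pressure (mb), Bolton (1980) approximation
  es <- 6.112 * exp((17.67 * temp_c) / (temp_c + 243.5))
  # Actual vapor pressure (mb) from specific humidity and pressure
  e <- q * p_mb / (0.378 * q + 0.622)
  # Relative humidity in percent, clamped to the physical range 0-100
  pmin(pmax(100 * e / es, 0), 100)
}

# Made-up inputs, just to exercise the function
rel_hum <- spec_hum_to_rel_hum(q = 0.01, temp_c = 20, p_mb = 1013.25)
```

The function is vectorized, so it could be called inside a `mutate()` on whole columns once `ps` is part of the download.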

# Convert from hourly to daily data
group_by(time, cell) %>%
# TODO: should we `na.rm = TRUE`?
summarize(across(.cols = -WindSpeed, .fns = ~ mean(.x, na.rm = FALSE)),

nice way of doing this 👍

Contributor

Agreed! This is a really clean approach.

Contributor Author

This gets a bit muddied by the precip need - "meters/day". Wouldn't this need a sum of the hourly rate and not a mean?

Contributor Author

I think I can just follow the pattern I did for WindSpeed where it has its own method defined.

Contributor

@hcorson-dosch-usgs hcorson-dosch-usgs Jan 6, 2022

Since it's a rate, in units of m/day already, rather than the depth value that some models use, I think using mean() is fine, right @jread-usgs ?

yes, mean would be equivalent in this case. If it were m/hour though, we'd need to treat it differently
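
A small base R sketch of why the two cases differ, using made-up hourly values for a single day and cell (the variable names here are illustrative, not from the pipeline):

```r
# Made-up hourly precipitation for one day and one cell
precip_m_per_day  <- c(rep(0, 18), rep(0.048, 6))  # hourly rates expressed in m/day
precip_m_per_hour <- precip_m_per_day / 24          # the same rates expressed in m/hour

# Rates already in m/day: the daily value is the mean of the hourly rates
daily_from_mean <- mean(precip_m_per_day)

# Rates in m/hour: each value is the depth for that hour, so they are summed
daily_from_sum <- sum(precip_m_per_hour)

# Both routes give the same daily total
stopifnot(isTRUE(all.equal(daily_from_mean, daily_from_sum)))
```

Taking a mean of the m/hour values instead would understate the daily total by a factor of 24.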

n = length(time),
.groups = "keep") %>%
ungroup() %>%
# This drops Jan 1, 1980

hmmm...we lose the first day of the year because it isn't a complete day? Bummer. Are we working in the appropriate timezone for that summation?

) %>%

# Create a column with just the date to use for summarizing
mutate(date = as.Date(DateTime)) %>%

I'm not running this so I can't tell quite what you are seeing here, but I am wondering if we're calculating date using the timezone of the local machine running this code instead of the timezone of the dataset itself. If so, that may explain why we're losing Jan 1 1980 (but funny, as I'd expect we'd also lose the leading date in each of the GCM time chunks...).
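
One way to check this is to pass `tz` explicitly to `as.Date()` so the result does not depend on the machine or on the method's default. The sketch below uses a single illustrative timestamp and assumes the dataset is in UTC, which this thread does not confirm:

```r
# Illustrative timestamp only; treating the data as UTC is an assumption
DateTime <- as.POSIXct("1980-01-01 03:00:00", tz = "UTC")

# Date taken in the dataset's (assumed) timezone
date_utc <- as.Date(DateTime, tz = "UTC")

# Date taken in a US Central timezone: the same instant is 21:00 the day before
date_central <- as.Date(DateTime, tz = "America/Chicago")

format(date_utc)      # "1980-01-01"
format(date_central)  # "1979-12-31"
```

If dates are derived in a timezone behind the data's, the first few hours of each day slide onto the previous date, which would leave a partial day at the start of a time chunk.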

#' creates a file with the exact same name, except that the "_raw" part of the
#' `in_file` filepath is replaced with "_daily".
#' @param in_file filepath to a feather file containing the hourly geoknife data
munge_to_glm <- function(in_file) {

might want to call this munge_notaro_to_glm since this function is specific to a particular set of variables and units.

In the future, might want to pull out the generic munging components into a separate function and keep the munge prep work that is specific to a particular source in a unique function (e.g., could use the same generic for NLDAS and Notaro GCMs, but a unique prep function for each). But probably not important now.

Contributor Author

Hmmm what would you say are the "generic munging components" here? Feels like most are Notaro specific

I see generic as: going from hourly to daily, writing the files, and perhaps some of the variable changes. Probably good to just ignore this since I agree there isn't a clear divide and NLDAS isn't handled here currently anyhow.

#' @param dim_time_input vector of GCM driver data dates
#' @param dim_cell_input vector of all GCM driver grid cells (whether or not data was pulled)
#' @param vars_info variables and descriptions to store in NetCDF
#' @param global_att global attribute description for the observations (e.g. notaro_ACCESS_1980_1999)
Contributor

Looks like you may have tweaked the format of this global attribute description since providing this example.

Contributor Author

Oh yes. I should probably just delete all of this documentation, since I anticipate the arguments passed to the function will change immensely with #252. But this runs in the pipeline, so I was keeping it for now.

@lindsayplatt
Contributor Author

Commenting just to say that I am seeing all these comments and starting to work through them! Thanks for these very helpful reviews @jread-usgs and @hcorson-dosch

Contributor

@hcorson-dosch-usgs hcorson-dosch-usgs left a comment

Okay - done with my review! I looked over the targets code as well as the munge_to_glm() function. Thanks for all of the detailed comments and function descriptions, Lindsay. Those really helped. I think this is looking great, and it will be good to merge once the unit conversions are finalized.

@lindsayplatt
Contributor Author

lindsayplatt commented Jan 6, 2022

Working my way through the following list after combing through your reviews:

  • change fxn name to include "Notaro"
  • add more description about the unit conversions in the munge fxn descr
  • edit rain conversion to be meters/day
  • edit relhum conversion (need to download additional variable from GDP)
  • investigate conversion of datetimes to dates (check tz etc)
  • see if tz fix means we are still losing day 1
  • add variable descriptions
  • update comment about downloading to mention each time period, too
  • fix glm_vars_info to be correct order
  • add appropriate units to glm_vars_info
  • add check for units assumptions coming back from GDP
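
For the last item, a units check might look something like the sketch below. Everything here is hypothetical: `check_units()` is not an existing function in the pipeline, how units come back from GDP is not shown in this thread, and the expected values are placeholders apart from `ps` in hPa, which comes from the review above.

```r
# Hypothetical helper: compare the units reported for each variable against
# the units the munge code assumes, and stop with a clear message on mismatch.
check_units <- function(reported_units, expected_units) {
  vars <- names(expected_units)
  missing_vars <- setdiff(vars, names(reported_units))
  if (length(missing_vars) > 0) {
    stop("No units reported for: ", paste(missing_vars, collapse = ", "))
  }
  mismatched <- vars[reported_units[vars] != expected_units[vars]]
  if (length(mismatched) > 0) {
    stop("Unexpected units for: ", paste(mismatched, collapse = ", "))
  }
  invisible(TRUE)
}

# Placeholder expectations; "kg/kg" for qas is an assumption for illustration
expected <- c(ps = "hPa", qas = "kg/kg")

# Matching units pass silently
ok <- check_units(c(ps = "hPa", qas = "kg/kg"), expected)

# A mismatch raises an error naming the offending variable
bad <- tryCatch(
  check_units(c(ps = "Pa", qas = "kg/kg"), expected),
  error = function(e) conditionMessage(e)
)
```

Failing early like this would surface a units change in the download before any conversions are applied.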

@lindsayplatt
Contributor Author

Pausing for now (just relative humidity and tz stuff left) because the updated downscaled, debiased GCM data might make some of this moot. See #273

@lindsayplatt
Contributor Author

lindsayplatt commented Jan 7, 2022

Decided that merging this, but noting those two incomplete items - timezone issues & correct relative humidity conversions - is the best way to keep moving forward. Given the new GCMs are already daily and mostly in the units we need already, these two concerns likely won't be present anyways.

@lindsayplatt lindsayplatt mentioned this pull request Jan 7, 2022
6 tasks
@lindsayplatt lindsayplatt merged commit cb58c82 into DOI-USGS:gcm_driver_data_munge_pipeline Jan 7, 2022