-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate input tables #7
Conversation
@kdorheim Re failing tests, no worries for now, thanks. Will look at this shortly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of goodness here @kdorheim – nice work! – though I think many opportunities to clarify code, improve comments, and improve robustness.
R/convert_RCMIP.R
Outdated
# Make sure data exists for the scenario(s) selected to process. | ||
data_scns <- unique(emiss_data$Scenario, conc_data$Scenario) | ||
missing <- !scenario %in% data_scns | ||
assertthat::assert_that(sum(missing) == 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be clearer to say
available <- scenario %in% data_scns
assert_that(all(available), ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes!
R/convert_RCMIP.R
Outdated
# unit, ect. These columns will be used to transform the data from being wide to long so that each row | ||
# corresponds to concenration for a specific year. | ||
id_vars <- which(!grepl(pattern = "[[:digit:]]{4}", x = names(conc_data))) | ||
conc_long <- data.table::melt.data.table(data = conc_data, id.vars = id_vars, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider putting importFrom data.table melt.table.data
in the header
R/convert_RCMIP.R
Outdated
|
||
# Determine the columns that contain identifier information, such as the model, scneairo, region, variable, | ||
# unit, ect. These columns will be used to transform the data from being wide to long so that each row | ||
# corresponds to concenration for a specific year. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"concentration"
R/convert_RCMIP.R
Outdated
# Determine the columns that contain identifier information, such as the model, scneairo, region, variable, | ||
# unit, ect. These columns will be used to transform the data from being wide to long so that each row | ||
# corresponds to concenration for a specific year. | ||
id_vars <- which(!grepl(pattern = "[[:digit:]]{4}", x = names(conc_data))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the columns just years? If yes perhaps make the pattern more specific, i.e. "^[[:digit:]]{4}$"
R/convert_RCMIP.R
Outdated
variable.factor = FALSE) | ||
|
||
# Concatenate the long emissions and concetnration data tables together and subset so that | ||
# only the scenarios of intrest will be converted. Remove the NA entries that arose when converted from |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"concentration...interest...converting"
#' @return a formated unit string | ||
#' @author Alexey Shiklomanov | ||
#' @noRd | ||
parse_chem <- function(unit) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An internal comment or two might be useful...a bit hard to follow this code
R/helper_fxns.R
Outdated
} | ||
|
||
|
||
#' Drop " " from the begning of strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"beginning"
R/helper_fxns.R
Outdated
for(i in cols){ | ||
|
||
assertthat::assert_that(is.character(df[[i]]) | is.factor(df[[i]])) | ||
df[[i]] <- gsub(pattern = '^ ', replacement = '', x = df[[i]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be greatly simplified by using base R's trimws
function?
R/helper_fxns.R
Outdated
# TODO add some sort of method to make sure that the data frame contains all of the required | ||
# emissions or constraints. Otherwise errors will not be triggered until trying to run the | ||
# Hector core. | ||
assertthat::assert_that(sum(emis, conc) == 1, msg = 'input data should include either emissions or constrained data not both.') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely time to importFrom assertthat assert_that
I'd say.
R/helper_fxns.R
Outdated
|
||
# Transform the data frame into the wide format that Hector expects. | ||
input_data <- x[ , list(Date = year, variable, value)] | ||
input_data <- dcast(input_data, Date ~ variable, value.var = 'value') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dcast
? Where is this coming from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kdorheim Great work! The changes you made for @bpbond were great. The following are a few high-level comments to go along with what is inline:
-
Some functions duplicate quite a bit of functionality for different constraints. For example, from
R/generate_fxns.R
thegenerate_input_tables
function could be broken down into (1) a function that processes a constraint being passed, and (2) a function that uses function (1) to process each constraint and then return your tables. This allows you to reduce the size of your codebase, reduce the possibility for error since you only have to make changes to a block of code when needed, and
allow you to write succinct tests that target specific functionality. -
Remove hard-coded values in functions where possible. These could either be passed in through a YAML config file or set as defaults for arguments. This will prevent folks for having to mess with your code when something like a new year range is needed or a header changes in a file.
@@ -18,4 +18,5 @@ LazyData: true | |||
Roxygen: list(markdown = TRUE) | |||
RoxygenNote: 7.1.0 | |||
Suggests: | |||
testthat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious as to why you are not specifying a version constraint for data.table
or zoo
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah! because I forgot 😬 thanks for pointing that out.
R/convert_RCMIP.R
Outdated
|
||
# Remove trailing spaces from the RCMIP inputs. | ||
cols_to_modify <- which(names(raw_inputs) %in% c("Model", "Scenario", "Region", "Variable", "Unit", "Mip_Era")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will these column names ever change? If so, then they should be passed into the function or read in from a YAML file. This comment applies to all hard-coded names thereafter and expected_year
value that on line 81 that could be a default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They shouldn't ever change, they are really only relevant to the RCMIP files because of some funky formatting.
hmmm this is true and what I was aiming to do 🙈... Which is why |
@crvernon and @bpbond thanks for taking a look at this! I really appreciate your input, I'm struggling with getting package checks to pass because of a dependency with assertthat. But as soon as that passes I'm going to merge this and start working on the functions that will be used to set up the ini files. |
Codecov Report
@@ Coverage Diff @@
## master #7 +/- ##
===========================================
+ Coverage 54.54% 93.06% +38.52%
===========================================
Files 1 3 +2
Lines 11 101 +90
===========================================
+ Hits 6 94 +88
- Misses 5 7 +2
Continue to review full report at Codecov.
|
@crvernon and @bpbond this is a larger PR than I would have liked and plan on doing smaller PRs in the future. However I felt a PR of this size was needed to give an idea of the package structure.
The objective here is to create the ability to generate Hector input csv files and ini files for the CMIP6 scenarios with a minimal effort from a user. Right now it is quite a hassle and has been a reoccurring problem for the RCMIP and hector calibration work. So this should be something that is ideally easy for any user to use and makes our lives easier when we have to generate new scenarios in the future.
In this PR there are helper functions that would be useful for developers/advanced users that are trying to generate their own hector inputs. But I think that the average Hector user would be interacting with generate function, which would allow users to generate the Hector csv inputs and ini files (not implemented yet) that are canonical aka the RCP, SPPs, and DECK scenarios.
If you have comments, concerns, or would like to chat before looking over this PR please let me know. I look forward to working with the both of you on this and hearing your feedback. Thank you very much!