Generate input tables #7

kdorheim · 2020-05-28T17:01:53Z

@crvernon and @bpbond this is a larger PR than I would have liked, and I plan on doing smaller PRs in the future. However, I felt a PR of this size was needed to give an idea of the package structure.

The objective here is to make it possible to generate Hector input csv files and ini files for the CMIP6 scenarios with minimal effort from the user. Right now it is quite a hassle and has been a recurring problem for the RCMIP and Hector calibration work. So this should ideally be easy for any user to use, and it should make our lives easier when we have to generate new scenarios in the future.

In this PR there are helper functions that would be useful for developers and advanced users who are trying to generate their own Hector inputs. But I think the average Hector user would interact with the generate function, which would allow users to generate the Hector csv inputs and ini files (not implemented yet) for the canonical scenarios, aka the RCP, SSP, and DECK scenarios.

If you have comments, concerns, or would like to chat before looking over this PR please let me know. I look forward to working with the both of you on this and hearing your feedback. Thank you very much!

kdorheim · 2020-05-28T17:41:51Z

@bpbond and @crvernon the tests are failing on GitHub Actions because I don't have rpackageutils importing properly for the tests. Do you want me to try to work that out before or after you take a look at the PR?

bpbond · 2020-05-29T15:27:22Z

@kdorheim Re failing tests, no worries for now, thanks. Will look at this shortly.

bpbond

Lots of goodness here @kdorheim – nice work! – though I think there are many opportunities to clarify code, improve comments, and improve robustness.

bpbond · 2020-05-29T15:31:51Z

R/convert_RCMIP.R

+  # Make sure data exists for the scenario(s) selected to process. 
+  data_scns <- unique(emiss_data$Scenario, conc_data$Scenario)
+  missing   <- !scenario %in% data_scns
+  assertthat::assert_that(sum(missing) == 0,


It might be clearer to say

available <- scenario %in% data_scns
assert_that(all(available), ...)
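
A minimal base-R sketch of the suggested check, for illustration only. The scenario names and the `check_scenarios` helper are made up, and `stop()` stands in for `assert_that` so the snippet runs without extra packages. The sketch also combines the two scenario vectors with `c()` before calling `unique()`, on the assumption that the intent was to pool both sources: the second positional argument of `unique()` is `incomparables`, not a second vector.

```r
# Hypothetical stand-ins for emiss_data$Scenario and conc_data$Scenario.
emiss_scns <- c("ssp119", "ssp245", "ssp245")
conc_scns  <- c("ssp245", "ssp585")

# Pool both vectors first, then de-duplicate.
data_scns <- unique(c(emiss_scns, conc_scns))

# Hypothetical helper: error out if any requested scenario has no data.
check_scenarios <- function(scenario, data_scns) {
  available <- scenario %in% data_scns
  if (!all(available)) {
    stop("No data for scenario(s): ",
         paste(scenario[!available], collapse = ", "))
  }
  invisible(TRUE)
}

check_scenarios(c("ssp119", "ssp585"), data_scns)
```

Testing `all(available)` states the requirement directly, and `scenario[!available]` gives the error message the names of the missing scenarios for free.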

bpbond · 2020-05-29T15:38:54Z

R/convert_RCMIP.R

+  # unit, ect. These columns will be used to transform the data from being wide to long so that each row
+  # corresponds to concenration for a specific year.
+  id_vars <- which(!grepl(pattern = "[[:digit:]]{4}", x = names(conc_data)))
+  conc_long <- data.table::melt.data.table(data = conc_data, id.vars = id_vars,


Consider putting importFrom data.table melt.data.table in the header

bpbond · 2020-05-29T15:39:10Z

R/convert_RCMIP.R

+
+  # Determine the columns that contain identifier information, such as the model, scneairo, region, variable,
+  # unit, ect. These columns will be used to transform the data from being wide to long so that each row
+  # corresponds to concenration for a specific year.


"concentration"

bpbond · 2020-05-29T15:39:50Z

R/convert_RCMIP.R

+  # Determine the columns that contain identifier information, such as the model, scneairo, region, variable,
+  # unit, ect. These columns will be used to transform the data from being wide to long so that each row
+  # corresponds to concenration for a specific year.
+  id_vars <- which(!grepl(pattern = "[[:digit:]]{4}", x = names(conc_data)))


Are the columns just years? If yes perhaps make the pattern more specific, i.e. "^[[:digit:]]{4}$"
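
To illustrate the difference the anchors make, here is a small base-R sketch. The column names are invented, and "Model2100" is a deliberately contrived name that contains four digits without being a year column.

```r
# Hypothetical column names; "Model2100" is made up to show a false match.
col_names <- c("Model", "Scenario", "Model2100", "2015", "2100")

# Unanchored: matches any name containing four consecutive digits.
loose  <- grepl(pattern = "[[:digit:]]{4}", x = col_names)

# Anchored: matches only names that are exactly four digits.
strict <- grepl(pattern = "^[[:digit:]]{4}$", x = col_names)

col_names[loose]
col_names[strict]

# Identifier columns are everything that is not a year column.
id_vars <- which(!strict)
```

With the unanchored pattern, "Model2100" would be treated as a year column and dropped from `id_vars`; the anchored pattern keeps it as an identifier.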

bpbond · 2020-05-29T15:45:42Z

R/convert_RCMIP.R

+                                            variable.factor = FALSE)
+
+  # Concatenate the long emissions and concetnration data tables together and subset so that
+  # only the scenarios of intrest will be converted. Remove the NA entries that arose when converted from


"concentration...interest...converting"

bpbond · 2020-05-29T16:10:33Z

R/helper_fxns.R

+#' @return a formated unit string
+#' @author Alexey Shiklomanov
+#' @noRd 
+parse_chem <- function(unit) {


An internal comment or two might be useful...a bit hard to follow this code

bpbond · 2020-05-29T16:11:55Z

R/helper_fxns.R

+}
+
+
+#' Drop " " from the begning of strings


"beginning"

bpbond · 2020-05-29T16:12:30Z

R/helper_fxns.R

+  for(i in cols){
+
+    assertthat::assert_that(is.character(df[[i]]) | is.factor(df[[i]]))
+    df[[i]] <-  gsub(pattern = '^ ', replacement = '', x = df[[i]])


Could this be greatly simplified by using base R's trimws function?
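
A short base-R comparison, as a sketch with made-up strings. Note that the behaviours are not identical: the `gsub` pattern in the diff strips a single leading space, while `trimws` strips all leading and/or trailing whitespace depending on `which`.

```r
# Made-up example strings with one leading, two leading, and one trailing space.
x <- c(" World", "  ssp245", "CO2 ")

# Pattern from the diff: removes at most one leading space per string.
one_space <- gsub(pattern = "^ ", replacement = "", x = x)

# trimws: removes all leading whitespace only.
left_only <- trimws(x, which = "left")

# trimws default: removes leading and trailing whitespace.
both_ends <- trimws(x)
```

Because `trimws` is vectorised, the explicit loop over columns could shrink to one call per column, for example inside `lapply`.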

bpbond · 2020-05-29T16:13:28Z

R/helper_fxns.R

+  # TODO add some sort of method to make sure that the data frame contains all of the required 
+  # emissions or constraints. Otherwise errors will not be triggered until trying to run the 
+  # Hector core. 
+  assertthat::assert_that(sum(emis, conc)  == 1, msg = 'input data should include either emissions or constrained data not both.')


Definitely time to importFrom assertthat assert_that I'd say.

bpbond · 2020-05-29T16:14:18Z

R/helper_fxns.R

+
+  # Transform the data frame into the wide format that Hector expects. 
+  input_data <- x[ , list(Date = year, variable, value)]
+  input_data <- dcast(input_data, Date ~ variable, value.var = 'value') 


dcast? Where is this coming from?
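
One way to make the origin explicit is to namespace the call. The sketch below assumes the intended function is the `dcast` exported by data.table (reshape2 exports a function of the same name); the variable names and values are invented for illustration, and the code is guarded so it only runs when data.table is installed.

```r
if (requireNamespace("data.table", quietly = TRUE)) {
  # Toy long-format table; variable names and values are illustrative only.
  long <- data.table::data.table(
    year     = c(2015, 2015, 2016, 2016),
    variable = c("ffi_emissions", "luc_emissions",
                 "ffi_emissions", "luc_emissions"),
    value    = c(9.7, 1.1, 9.8, 1.0)
  )

  # Same shape of call as in the diff, with the namespace spelled out.
  input_data <- long[, list(Date = year, variable, value)]
  wide <- data.table::dcast(input_data, Date ~ variable, value.var = "value")
  print(wide)
}
```

An `@importFrom data.table dcast` tag in the roxygen header would be the alternative if the unqualified call is preferred.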

kdorheim · 2020-06-03T18:55:18Z

@bpbond thanks for the suggestions! @crvernon, whenever you have a chance to take a look at this, that would be great!

crvernon

@kdorheim Great work! The changes you made for @bpbond were great. The following are a few high-level comments to go along with what is inline:

Some functions duplicate quite a bit of functionality for different constraints. For example, from R/generate_fxns.R the generate_input_tables function could be broken down into (1) a function that processes a constraint being passed, and (2) a function that uses function (1) to process each constraint and then return your tables. This allows you to reduce the size of your codebase, reduce the possibility for error since you only have to make changes to a block of code when needed, and allow you to write succinct tests that target specific functionality.
Remove hard-coded values in functions where possible. These could either be passed in through a YAML config file or set as defaults for arguments. This will prevent folks from having to mess with your code when something like a new year range is needed or a header changes in a file.
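
The split described in the first point might look roughly like the base-R sketch below. Everything here is hypothetical: the function names, the column layout, the constraint names, and the default year range are placeholders, not the package's actual code.

```r
# (1) Process a single constraint. All names and defaults are hypothetical.
process_constraint <- function(data, constraint, years = 1750:2100) {
  stopifnot(is.data.frame(data),
            all(c("year", "variable", "value") %in% names(data)))
  keep <- data$variable == constraint & data$year %in% years
  out <- data[keep, c("year", "value")]
  names(out) <- c("Date", constraint)
  rownames(out) <- NULL
  out
}

# (2) Apply (1) to each constraint and return a named list of tables.
generate_tables <- function(data, constraints, years = 1750:2100) {
  tables <- lapply(constraints,
                   function(cn) process_constraint(data, cn, years))
  names(tables) <- constraints
  tables
}

# Toy input with two made-up constraints.
toy <- data.frame(
  year     = rep(2015:2016, times = 2),
  variable = rep(c("constraint_a", "constraint_b"), each = 2),
  value    = c(400, 402, 1.0, 1.1)
)
tables <- generate_tables(toy, c("constraint_a", "constraint_b"),
                          years = 2015:2016)
```

With this layout a test can target `process_constraint` on a two-row data frame, and the year range is an argument rather than a value buried in the function body.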

crvernon · 2020-06-03T22:00:01Z

DESCRIPTION

@@ -18,4 +18,5 @@ LazyData: true
 Roxygen: list(markdown = TRUE)
 RoxygenNote: 7.1.0
 Suggests: 
-    testthat


Just curious as to why you are not specifying a version constraint for data.table or zoo?

ah! because I forgot 😬 thanks for pointing that out.

crvernon · 2020-06-03T22:05:02Z

R/convert_RCMIP.R


+  # Remove trailing spaces from the RCMIP inputs. 
+  cols_to_modify <- which(names(raw_inputs) %in% c("Model", "Scenario", "Region", "Variable", "Unit", "Mip_Era"))


Will these column names ever change? If so, then they should be passed into the function or read in from a YAML file. This comment applies to all hard-coded names thereafter and to the expected_year value on line 81, which could be a default.

They shouldn't ever change; they are really only relevant to the RCMIP files because of some funky formatting.
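
Even if the names are stable, exposing them as a default argument costs little. A hedged sketch in base R follows: `trim_id_columns` and the toy `raw_inputs` are hypothetical, and only the six column names come from the diff.

```r
# Hypothetical helper: the RCMIP identifier columns are a default argument
# rather than a value hard-coded in the function body.
trim_id_columns <- function(df,
                            id_cols = c("Model", "Scenario", "Region",
                                        "Variable", "Unit", "Mip_Era")) {
  cols_to_modify <- intersect(names(df), id_cols)
  for (col in cols_to_modify) {
    df[[col]] <- trimws(as.character(df[[col]]))
  }
  df
}

# Toy input with stray spaces; the values are made up.
raw_inputs <- data.frame(Model = " unspecified", Scenario = "ssp245 ",
                         `2015` = 1, check.names = FALSE,
                         stringsAsFactors = FALSE)
clean <- trim_id_columns(raw_inputs)
```

Callers who never touch `id_cols` get the current behaviour, and a differently formatted file would only need a different argument.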

kdorheim · 2020-06-05T14:07:37Z

Some functions duplicate quite a bit of functionality for different constraints. For example, from R/generate_fxns.R the generate_input_tables function could be broken down into (1) a function that processes a constraint being passed, and (2) a function that uses function (1) to process each constraint and then return your tables. This allows you to reduce the size of your codebase, reduce the possibility for error since you only have to make changes to a block of code when needed, and allow you to write succinct tests that target specific functionality.

Hmmm, this is true and what I was aiming to do 🙈... which is why convert_rcmipCMIP6_hector is separate from save_hector_table and wrapped inside generate_input_tables. Hmmm, it is a tad confusing though. I'll go back to the drawing board to try to streamline some of the code. Thanks!

kdorheim · 2020-06-05T16:19:59Z

@crvernon and @bpbond thanks for taking a look at this! I really appreciate your input. I'm struggling with getting package checks to pass because of a dependency on assertthat, but as soon as that passes I'm going to merge this and start working on the functions that will be used to set up the ini files.

codecov-commenter · 2020-06-12T16:18:07Z

Codecov Report

Merging #7 into master will increase coverage by 38.52%.
The diff coverage is 93.06%.

@@             Coverage Diff             @@
##           master       #7       +/-   ##
===========================================
+ Coverage   54.54%   93.06%   +38.52%     
===========================================
  Files           1        3        +2     
  Lines          11      101       +90     
===========================================
+ Hits            6       94       +88     
- Misses          5        7        +2

Impacted Files	Coverage Δ
R/generate_fxns.R	`92.85% <92.85%> (ø)`
R/helper_fxns.R	`93.02% <93.02%> (ø)`
R/convert_RCMIP.R	`93.33% <93.33%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e9b6ea1...b6295ab.

kdorheim added 8 commits May 28, 2020 10:28

Add the conversion table for RCMIP inputs.

428dd5d

Add data.table dependencies

2a221f3

Ignore planning files.

dd302d6

Add functions that will help with the conversion of RCMIP to hector CMIP6 inputs.

63afe96

update tests

f4dacd3

Changes to help pass tests

f73f7e5

Add function to converthe RCMIP results.

5c0feed

Add generate wrapper

d31a53d

kdorheim requested review from bpbond and crvernon May 28, 2020 17:01

bpbond reviewed May 29, 2020

Changes in response to BBL PR.

9428c1b

crvernon reviewed Jun 3, 2020

kdorheim added 3 commits June 5, 2020 10:16

Remove legacy script.

620b224

Remove the renv files in attempt to fix the check that.

655eec3

Streamline code and remove dependencies and todos.

00d02d2

kdorheim added 9 commits June 11, 2020 23:13

Remove renv

db3afec

Udate the github actions

7c38c2d

Add dependencies to make passing easier.

7994ffb

Update github actions

0124818

github actions from scratch

5b4d42a

Updating remotes information to fix the github actions issues

0f27596

spciefy CRAN mirror

54a35a3

Add biogas

9436415

Fix typo

d625afd

kdorheim added 4 commits June 12, 2020 10:43

Add coverage test and update the RCMD check to build on different platforms

ba1edf4

Fix ymal syntax error

735a2e3

Specify CRAN mirrior

ce43136

clean up github actions

b6295ab

kdorheim merged commit 5757ab1 into master Jun 12, 2020

kdorheim deleted the generate_input_tables branch June 12, 2020 16:33

