Format Record db.query (Shiny Ingest App) #2012

LiamBurke24 · 2018-07-24T02:18:54Z

This PR adds functionality to insert a format record and an optional format-variables record in the BETY db. The emphasis here is that both of these records can be created simultaneously improving the ease of registering new data with BETY.

Description

I included the insert.format.vars function that allows the user to register a format record in the BETY "formats" and "format_variables" tables. This query includes all the input fields found in the BETY GUI: name, header, skip, mimetype_id, and notes.

Additionally it includes the following inputs for the format-variables record: name, units, storage type, column number, format_id, variable_id.

Motivation and Context

This change adds more information to the formats table than the db.query included in the dbfiles.input.insert function. More importantly, it lays the groundwork for completing the shiny app with a GUI for registering the format-variables record.

Review Time Estimate

Immediately
Within one week
When possible

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

My change requires a change to the documentation.
I have updated the CHANGELOG.md.
I have updated the documentation accordingly.
I have read the CONTRIBUTING document.
I have added tests to cover my changes.
All new and existing tests passed.

… format record.

mdietze · 2018-07-24T10:10:47Z

CHANGELOG.md

  - "Test BETY" button allows users create a record in BETY with `dbfile.input.insert`
+  - Added separate `db.query` to create format record. 


Would be more helpful if you named the new function in the log, rather than just referencing the functions called within your new function

I will change this when I rewrite the function db.files.

mdietze · 2018-07-24T10:11:03Z

shiny/Data-Ingest/server_files/ingest_workflow_svr.R


  output$FormatRecordOut <- renderPrint({print(FormatRecordList)})

+  cmd <- paste0("INSERT INTO formats ",


This needs to be it's own function in the DB package, not here in the shiny app.

This function needs to return the new Format ID.

I'm fairly confident you don't need to include the NOW() bits -- the existence of that in other functions is a legacy that I thought had been cleaned up, but evidently not as you're emulating it

This function also needs to take an (optional) formats-variables table and insert all those relationships.

Got it. I can drop the NOW() bits, but there are columns for date created and date updated in the formats table. Just want to confirm that you want to lose that functionality?

Would it be easier/ more efficient to create a separate format-variables table query since I'd be writing to two separate tables? Just double checking before I get started. Thanks!

NOW: I'm 99% sure that those will be updated by postGRES automatically if you don't manually set them. I'd drop and then confirm that they're being set.

Creating Formats and Formats-Variables separately is exactly the pain in arse that we currently have to deal with in the BETY webpage and specifically what we're trying to avoid. Exactly analogous to why we create dbfiles and inputs at the same time.

NOW: sounds good.

F-V: Ok got it. Just to clarify, by the same time, you mean in the same function?

Would you like me to resolve the mimetype_id in this new db.files function? I am currently resolving the ID within the shiny app. Thanks!

Yes, in the same function

mimetype_id would need to be an input to the function

OK! This revision should have all the changes you asked for!

…riables record

… Fixed query = query. now query = cmd.

…global environment and write to list as before. Replace manual db query with dbfile.format.variable.insert

mdietze · 2018-07-25T09:30:49Z

shiny/Data-Ingest/server_files/ingest_workflow_svr.R

+  FormatRecordList$NewFormatName <- "Test_Format_Name_99" #input$NewFormatName
+  FormatRecordList$HeaderBoolean <- "TRUE" #ifelse((input$HeaderBoolean == "Yes"), "TRUE", "FALSE")
+  FormatRecordList$SkipLines <- "3" #input$SkipLines #This should appear only if header = TRUE
+  FormatRecordList$FormatNotes <- "These are some test notes" #input$FormatNotes


Looks like all of these are still set at testing values.

Got it. Good catch. Fixing this uncovered a bug that took me over an hour to squash!

mdietze · 2018-07-25T09:35:26Z

base/db/R/dbfiles.R

+#' bety <- betyConnect()
+#'   dbfile.format.variable.insert('Test_Format', header = TRUE, skip = 2, mimetype_id = 1060, con = bety$con, format_variables = FALSE)
+#' }
+dbfile.format.variable.insert <- function(format_name, header, skip, mimetype_id, format_notes = "", con, format_variables = FALSE, 


Please drop dbfile from the name as this function isn't inserting anything into that table. Given that this function meant to parrallel query.format.vars, why not call it insert.format.vars?

Also, when a value is missing, such as format_variables we typically set it to NULL not to FALSE. For the remaining values (description, etc), pass these as a data frame to format_variables rather than passing them individually. Passing them individually is a pain and will be error prone (misalignment of vectors)

An additional note on argument order: I recommend putting the con argument before format_notes, so that all the arguments without default values come at the beginning of the call and all the arguments with default values come after them.

This is partly because it's kind to the user to put "important" (~= "have to set them every time" ~= "function can't guess it") arguments first, but also so that unnamed values get matched to the required arguments before the optional ones. As currently written, the function requires a minimum of five arguments but dbfile.format.variable.insert('Test_Format', TRUE, 2, 1060, bety$con) will set format_notes to bety$con and then complain that argument "con" is missing, with no default.

(This second point is also a reason to name all arguments when calling any function as a user, but that ship never sailed in the R world)

@mdietze I included dbfile in the name since it appeared to be a convention for all the query functions in the dbfiles.R document. I can change the name.

@infotroph Thanks. Hahaha. Good point.

mdietze · 2018-07-25T09:41:11Z

base/db/R/dbfiles.R

+dbfile.format.variable.insert <- function(format_name, header, skip, mimetype_id, format_notes = "", con, format_variables = FALSE, 
+                                          description = NA, units = NA, var_notes = NA, var_name = NA, max = NA, min = NA, standard_name = NA, label = NA, type = NA){
+
+  formatid <- get.id(


Really don't follow what you're doing here. It looks like you're using get.id to create the format record, but with only just the name and mimetype. Next, you use a second query to create a DIFFERENT format record, that has header, skip, mimetype_id, notes, name, but doesn't include the returning id necessary to get the format ID. Therefore, this second record with more information is never returned. You need to make a choice between these two ways of creating records and do just one, not both.

@mdietze My mistake. I thought that get.id resolved the format id but I now see that this was a mistake.

can you clarify what I need to do to return the format id from the second type of query?

Never mind. I see this in the comments below. Thanks.

mdietze · 2018-07-25T09:44:26Z

base/db/R/dbfiles.R

+
+    #### Format Variables Query ####
+    if(format_variables == TRUE){
+    cmd_fv <- paste0("INSERT INTO variables ",


This query is wrong. First, you need to loop over all the variables (virtually no files will just have one column). Second, you need to be inserting records into formats_variables not variables as the goal is to associate variables with a specific format. As such you've got to revise all the columns being passed. You also need to use the correct format ID when creating these relationships

Got it. Makes sense.

infotroph · 2018-07-25T11:28:51Z

base/db/R/dbfiles.R

+                   "(header, skip, mimetype_id, notes, name) VALUES (",
+                   header, ", ", skip, ", '", mimetype_id, "', '", format_notes, "','", format_name, "')")
+
+    db.query(query = cmd, con = con)


I see that you're trying to parallel existing code here, but there are two very serious problems here that really ought to be fixed in the existing code (#395) and should not be repeated in new functions:

db.query is a wrapper around DBI::dbGetQuery, which is documented to be for SELECT-like statements only. We should not be using it for inserts and updates.

Manually generating SQL by pasting together strings and variables is extremely likely to produce SQL injection vulnerabilities, and we'll need to be even more cautious of this as we expose more functionality through Shiny and the like.

We don't yet have a fully formed standard method (hence the open status of #395), but in general instead of building an SQL string inside of any PEcAn function we want to assemble the data that we want to insert and then let an existing function to the translation to SQL. When I needed to generate inserts for #1903 I used DBI::dbWriteTable, and I think @ashiklom intended PEcAn.db::db_merge_into to be useful for similar cases.

@infotroph I see you point and agree that the use of raw SQL is potentially dangerous. DBI::dbWriteTable seems like it is a better solution in most instances. That said, it appears that this function only returns TRUE/FALSE and you can't specify the "RETURNING" argument which I've asked @LiamBurke24 to add. I find this to be an unfortunate and odd design limitation.

@mdietze I think I follow, but to be clear: "this function" being DBI::dbWriteTable, and "RETURNING" being the SQL keyword, as in INSERT INTO ... VALUES ... RETURNING id? If so: Yeah, I've wished for that as well. Should we consider adding that to db_merge_into?

Yes, that's what I meant. Would be good if @ashiklom chimed in about db_merge_into since I haven't used that before and I don't completely follow what it's doing internally. But yes, if there's a way to add RETURNING id

In the mean time, what do you want @LiamBurke24 to do immediately? Fields generated via pull-down menus and other queries should be safe (e.g. mimetype ID), but is there a way to properly sanitize anything that's coming in via a text field (e.g. format name)?

So the point of db_merge_into is to try to solve somewhat elegantly what I felt was the very common problem of updating an SQL table with a local table based on one or more "key" columns. The objective is to have a function that, if one were to run it twice with the same arguments, would maybe do something the first time (if the target SQL table was missing rows compared to the local input, those rows would be added) but would basically always do nothing the second time (because the table has already been updated). I was hoping this would allow us to simplify code for adding various things to the database by eliminating the need for logic like the following (which is, effectively, what db_merge_into does using dplyr filtering joins:

1. Get SQL table IDs 2. Subset local table to IDs not in SQL table 3. If local table has any remaining rows after subset, write them to SQL

For an example of this in action, see the unit test.

The db_merge_into function uses insert_table as a backend, which was designed to be a more robust and user-friendly version of DBI::dbWriteTable that also does things like ensure that column types match and automatically subset and align input data to columns that are in the target SQL table.

I was not aware of RETURNING as an SQL statement when I came up with this -- that's good to know about for the future. We could add it to db_merge_into, but I'm not sure it's necessary. db_merge_into currently returns a dplyr "lazy SQL query" data.frame (which can easily be loaded in memory with just a dplyr::collect) which includes the full target SQL table (including ID columns) inner-joined with the input data. So the IDs and any other columns of interest from the SQL table are already in the output.

We don't yet have a fully formed standard method (hence the open status of #395), but in general instead of building an SQL string inside of any PEcAn function we want to assemble the data that we want to insert and then let an existing function to the translation to SQL.

I strongly agree with this. More generally, creating a well-documented API for inserting data into bety should be a top priority for our next releases, as it could make a lot of tasks much easier and safer.

@LiamBurke24 you want to try revising to use db_merge_into? Sound like this would also eliminate the need for a loop in the formats_variables insert.

@LiamBurke24 I tried to make db_merge_into easy to use, but ping me on Slack if you have questions. The general idea is to first construct a data.frame that looks something like your target SQL table (in this case, formats_variables) in terms of column names, and then specify which columns determine the uniqueness of the join via a character vector by argument. E.g.

input_formats <- data.frame( header = header, skip = skip, mimetype_id = mimetype_id, notes = format_notes, name = format_name ) inserted_data <- db_merge_into(input_formats, "formats_variables", con, by = c("format_name", "mimetype_id"))

The by argument is optional -- if omitted, it joins based on all columns the two data frames have in common (i.e. if any of the common columns are different, then it assumes you meant to insert a new record). In practice, you usually only want to check some of the columns -- e.g. above, if something with the same format_name and mimetype_id already exists, then you probably don't want to add a new format with the same name and id but, say, slightly different notes.

Wow. Thanks @mdietze @ashiklom and @infotroph! Yes, I will revise to use db_merge_into. If this works, should I delete the dbfiles.format.variable.insert function? I'm not attached to it I just didn't know if it would be of any use.

Or is the idea that I should use db_merge_into instead of the SQL queries to make a function in the dbfiles.R document? Am I making any sense? haha.

Never mind. This question is addressed below. Sorry.

ashiklom

A few minor suggestions here and there.

ashiklom · 2018-07-25T13:49:55Z

base/db/R/dbfiles.R

@@ -636,3 +636,63 @@ dbfile.id <- function(type, file, con, hostname=PEcAn.remote::fqdn()) {
    invisible(NA)
  }
 }
+
+#' Function to Insert Records into the Formats and Variables Tables
+#' @name dbfile.format.variable.insert


This is an unnecessary roxygen tag. You should remove it.

Also, we already know it's a function, so you should drop the "Function to" part of the title.

Good catch.

ashiklom · 2018-07-25T13:50:26Z

base/db/R/dbfiles.R

+#' @param type ?
+#' @author Liam Burke
+#'
+#' @return formatid


Please add a (brief) description of this. Is it a vector? List? Scalar integer value?

More generally, you should avoid redundancy in your documentation. E.g. TRUE/FALSE Boolean is redundant, as is format_name Format Name. Also, in addition to a general description, I think parameter documentation should generally try to make clear (1) the R type of value (e.g. numeric, factor, character, integer, logical), (2) whether the values have to be a scalar, or can be a vector, and (3) if there are additional format requirements (e.g. units string probably has to be parse-able by udunits2).

Got it. Thanks. This is awesome advice.

ashiklom · 2018-07-25T14:11:32Z

base/db/R/dbfiles.R

+                   "(header, skip, mimetype_id, notes, name) VALUES (",
+                   header, ", ", skip, ", '", mimetype_id, "', '", format_notes, "','", format_name, "')")
+
+    db.query(query = cmd, con = con)


@LiamBurke24 I tried to make db_merge_into easy to use, but ping me on Slack if you have questions. The general idea is to first construct a data.frame that looks something like your target SQL table (in this case, formats_variables) in terms of column names, and then specify which columns determine the uniqueness of the join via a character vector by argument. E.g.

input_formats <- data.frame( header = header, skip = skip, mimetype_id = mimetype_id, notes = format_notes, name = format_name ) inserted_data <- db_merge_into(input_formats, "formats_variables", con, by = c("format_name", "mimetype_id"))

The by argument is optional -- if omitted, it joins based on all columns the two data frames have in common (i.e. if any of the common columns are different, then it assumes you meant to insert a new record). In practice, you usually only want to check some of the columns -- e.g. above, if something with the same format_name and mimetype_id already exists, then you probably don't want to add a new format with the same name and id but, say, slightly different notes.

ashiklom · 2018-07-25T14:16:51Z

shiny/Data-Ingest/server_files/ingest_workflow_svr.R


  output$FormatRecordOut <- renderPrint({print(FormatRecordList)})

+  dbfile.format.variable.insert(header = FormatRecordList$HeaderBoolean, skip = FormatRecordList$SkipLines, mimetype_id = FormatRecordList$NewmimetypeID,
+                                format_notes = FormatRecordList$FormatNotes, format_name = FormatRecordList$NewFormatName, con = bety$con, format_variables = FALSE)
+


Using db_merge_into as discussed above, you may not actually need dbfile.format.variable.insert at all. You may just be able to get away with constructing FormatRecordList as a data.frame instead (with as many rows as you want -- it sounds like you could be adding multiple variables at once, which db_merge_into is specifically designed to handle) and then just insert it all at once with a single db_merge_into statement. A nice advantage of db_merge_into in the context of Shiny is that, by design, it will never insert duplicate rows (at least, with a well-specified by argument), so it can be more-or-less safely called over-and-over in a single session (though there is a bit of computational overhead, so I would avoid a design that calls it hundreds of times).

@mdietze Was this more-or-less what you had in mind?

I think it's still worth having the function, even if it only ends up being 2-3 lines, so that it doesn't have to be recreated in the future. I had asked @LiamBurke24 to make this a function, rather than just having the insert code directly in the shiny app (original version) precisely because I find going through BETY's current multi-step process of creating the format, and then editing that record to associate variables with the format, to be a pain. The fact that it's taking 4 of us having a long discussion to sort out how to do this correctly implies that not having a function could lead to mistakes by others.

Additional point -- I wonder if this function should return the full format-variable object that @bcow has query.format.vars returns rather than just the format ID? I don't think anyone is going to need to create a format and then immediately use it in load_data, but it's worth considering.

@mdietze @ashiklom So we're all on the same page. I am going to use db_merge_into to rewrite the dbfile.format.variable function (which will shortly be renamed to something with out dbfile in the name).

I would also like to clarify what we would like the function to be returning. SHould I start with the format ID and then proceed to the full format-variable object if need be/ time permitting?

Finally, @ashiklom based on what you're saying, db_merge_into looks like a perfect fit for this application.

P.S. where should I be storing this file? In the dbfiles.R document? Or somewhere else?

yes, insert.format.vars is my vote for name

returning format ID is sufficient for now.

put it in it's own file

@mdietze Awesome. Thanks!

Follow-up question: where should I put this new function? Is pecan/base/db/R/ ok?

Sorry, yes it should definitely be in the db package! pecan/base/db/R/

Perfect. Thanks!

…ame and mimetypeID from FormatRecordList to addess a duplicate bug.

… an if else statement

…f/else statements.

…d comments.

…man.

LiamBurke24 · 2018-07-25T23:03:59Z

@mdietze @ashiklom @infotroph Good evening everyone. That took a lot longer than I originally thought. All the changes that I pushed should address your concerns and suggestions. I really appreciate all the fantastic feedback.

…se statement.

…ons.

…s/factors bug, make query dataframe with formatID for formats-variables query, and return formatID.

…vars.

mdietze · 2018-07-30T13:32:25Z

base/db/R/insert.format.vars.R

-#' @param con Bety connection object
+#'
+#' @param con SQL connection to BETYdb
+#' @param format_notes Character string that describes format.


Do these need the format_ part? That seems like a given. Also, it's generally good practice to put the required values before the optional ones, and to put a sensible default on optional variables. So, for example, name and mimetype should come before header, skip, and notes. notes can probably default to NULL, skip to 0, and header to TRUE.

I don't have an issue with axing the format_ part. My thought was that the user may need some extra reinforcement that the names and notes correspond to the formats record and not the variables record.

And good point! Just fixed the order of the arguments and added these default conditions.

mdietze · 2018-07-30T13:42:51Z

base/db/R/insert.format.vars.R

+#' \item name: vector of character strings
+#' \item unit: vector of character strings
+#' \item storage_type: vector of character strings
+#' \item column_number: vector of integers


Similarly, this section should

make it clear which variables are required (variable_id) and which are optional (everything else)

Explain what actually goes in each column (not just its type) and when it's OK to leave values blank

if a column only needs a few values, what should the user put in the empty cells? NULL? NA?

In particular, be aware that name is the variable name in the file (e.g. column head) and only needs to be specified if it's different from the BETY variable name. unit needs to be udunit2 parsable and only needs to be specified if the units are different from the BETY standard. column number only needs to be specified for tabular data that doesn't have a header. storage_type serves 2 purposes. First, it indicates columns stored in a format different from expected (e.g. numbers stored as characters). Second, it stores the POSIX codes used to parse any time variables (e.g. a column with a 4-digit year would be %Y). This last bit should include a "see also" to strptime

Clear on all fronts. These changes are in the most recent commit

mdietze · 2018-07-30T13:49:33Z

base/db/R/insert.format.vars.R

+#' formats_variables_df <- tibble::tibble(
+#'        variable_id = c(327, 2.98e+02), # integer 
+#'        name = c("New Name", "Other New Name"), # 
+#'        unit = c("Yoonits", "Other Units"),


This is a really bad example because:

variable 327 (c2n_root) is unlikely to be in a column named "New Name" and "Yoonits" is most definitely not convertible by udunits2 to the standard units for C:N ratios. Data on C:N ratios are also not going to be used to store the day of year (%j)

variable numbers should not be entered using scientific notation. Given that variable 298 is LE (latent heat flux), the name, units, and storage type entered are just as absurd as the first column.

I apologize for the confusion. This was a placeholder example since I had not made a formal formats-variable record with a real data set at that point. I have now done this and the new example is updated and should address these points.

mdietze · 2018-07-30T13:53:01Z

base/db/R/insert.format.vars.R

+#'        storage_type = c("%j", "%Y"),
+#'        column_number = c(as.integer(22), as.integer(24)),
+#'  )
+#'   insert.format.vars(con = bety$con, format_notes = "Info about format", format_name = "New Format", header = TRUE, skip = 2, mimetype_id = 1090, formats_variables_df)


Improve this example too -- "New Format" is an awful format name, notes is almost as bad (and should probably go last), and make sure to pass the format variable table by name

I also address these suggestions in the new commits.

mdietze · 2018-07-30T13:57:37Z

base/db/R/insert.format.vars.R

+      notes = format_notes,
+      name = format_name,
+      stringsAsFactors = FALSE
+    )


As noted previously, you might want to perform some basic checks on these values before inserting them. For example, if skip is present it needs to be numeric, header needs to be boolean, notes and names need to be character. Mimetype_id needs to be numeric, and if you want to be really good you'll make sure that ID exists before you insert the record. Similarly, you might want to check the format name doesn't already exist. In general, inserting junk into the db creates problems that are remarkably time consuming to clean up, so it's worth the extra effort on inserts to make sure that what you're inserting makes sense.

I perform basic checks on all format types of all inputs and check if the format name exists in the new commits. Can you please clarify what you mean by making sure that the ID exists before inserting the record? Originally I thought you meant the mimetype_id, but this should exist already. I would appreciate some clarification. Thanks!

It's the variable_id that you need to make sure exists. A test for this is implicit in the unit test above

u2 <- dplyr::tbl(con, "variables") %>% dplyr::select(id, units) %>% dplyr::filter(id %in% formats_variables[[1, "variable_id"]]) %>% pull(units)

because this will fail if the variable_id is missing, but that implicit test isn't great as the function will just crash with no error message is being returned

Got it. That was kind of counter intuitive since I keep forgetting that I'm building this function for other applications besides the shiny app. Testing if the variable ID exists is pointless in my app since all the ID's are resolved by the user selecting a variable name.

It actually doesn't make sense in this application since I am not passing the variable name. The user selects BETY variables from a dropdown menu and then the app resolves the variable_id. I included a test but I placed it in a conditional suppress statement so that I can suppress it in my app.

mdietze · 2018-07-30T13:59:29Z

base/db/R/insert.format.vars.R

+    format_id_df <- matrix(data = NA, nrow = n, ncol = 1)
+    for(i in 1:n){
+      format_id_df[i,1] <- format_id
+    }


This for loop is unnecessary, just specify data = format_id (instead of NA) when you create the matrix

Done. Good idea!

mdietze · 2018-07-30T14:04:01Z

base/db/R/insert.format.vars.R

-    ## Insert format record 
-    inserted_formats <- db_merge_into(formats_df, "formats", con = con,  by = c("name", "mimetype_id")) 
+    ## Make query data.frame
+    formats_variables_input <- cbind(format_id_df, formats_variables_df) 


Similar to above, should verify the entries in this table before inserting it. Beyond verifying the correct types, you should also make sure that units are both ud.is.parseable and ud.are.convertible (i.e. grab the BETY units associated with that variable ID and verify that you can convert the data units to the BETY units)

Both of these checks are in the new commits!!

…specific example. Test if all inputs are in the correct format. Convert all NA's to before loading to BETY. Existing format name test in progress.

LiamBurke24 · 2018-08-01T04:15:32Z

Summary of changes since last Review:

rearrange arguments in function in order of importance and update default values
Note and which variables are required in both processes
Explain what should be entered in each column for format_variables
Explain what the user should put in empty cells (NA for sake of clarity and testing. I replace NA's with "" as per the BETY standard.)
Updated the example to use plausible values now that I have more experience with format-variables record registration.
Test if format name already exists
test if units are ud.is.parseable and ud.are.convertable
QUESTION: What ID should I check before making the record? I believe db_merge_into already checks the variable_id and only inserts the record if the record does not exist.

mdietze · 2018-08-01T09:48:57Z

base/db/R/insert.format.vars.R

+#'
+#' @param con SQL connection to BETYdb
+#' @param format_name The name of the format. Type: character string.
+#' @param notes Additional description of the format: character string.


As noted earlier, please put optional variables at the end. This one in particular could be a lot of text, so will probably be best to put last

mdietze · 2018-08-01T09:49:32Z

base/db/R/insert.format.vars.R

+#' @param notes Additional description of the format: character string.
+#' @param header Boolean that indicates the presence of a header in the format. Defaults to "TRUE".
+#' @param skip Integer that indicates the number of lines to skip in the header. Defaults to 0. 
+#' @param mimetype_id The id associated with the mimetype of the format. Type: integer. 


put required variables (e.g. mimetype_id) before optional variables that have defaults

mdietze · 2018-08-01T09:50:19Z

base/db/R/insert.format.vars.R

+#' @details The formats_variables argument must be a 'tibble' and be structured in a specific format so that the SQL query functions properly. All arguments should be passed as vectors so that each entry will correspond with a specific row. All empty values should be specified as NA.  
+#' \describe{
+#' \item{variable_id}{(Required) Vector of integers.}
+#' \item{name}{(Optional) Vector of character strings. Needs only be specified if it differs from the BETY variable name.} 


Clarify what it is

mdietze · 2018-08-01T09:52:29Z

base/db/R/insert.format.vars.R

+#' \describe{
+#' \item{variable_id}{(Required) Vector of integers.}
+#' \item{name}{(Optional) Vector of character strings. Needs only be specified if it differs from the BETY variable name.} 
+#' \item{unit}{(Optional) Vector of type character string. Should be in a format parseable by the udunits library and need only be secified if they differ from the BETY standard.}


Be explicit about what "they" refers to here (the units of the data in the file).

mdietze · 2018-08-01T09:54:01Z

base/db/R/insert.format.vars.R

+#'        name = c("NPP", NA, "YEAR"), # 
+#'        unit = c("g C m-2 yr-1", NA, NA),
+#'        storage_type = c(NA, NA, "%Y"),
+#'        column_number = c(NA, NA, NA),


In this example, column_number is all NA's and thus doesn't need to be specified

mdietze · 2018-08-01T10:01:05Z

base/db/R/insert.format.vars.R

+    )
+  }
+
+  # Test if format name already exists


Same bit of code repeated twice

mdietze · 2018-08-01T10:05:34Z

base/db/R/insert.format.vars.R

+
+  # Test if name is a character string
+  if(!is.character(name)){
+    PEcAn.logger::logger.error(


Ok so right now you're sending a message to the log that this is an error, but then you're letting the function proceed on. You either want to exit here (logger.severe) or return with an error code that your Shiny app knows how to handle. The latter is really worth considering because the error message here will not be shown to the user of the app and an exit would just cause your app to crash, rather than giving the user an opportunity to fix bad entries.

I'll handle this on the shiny side. Thanks!

mdietze · 2018-08-01T10:09:15Z

base/db/R/insert.format.vars.R

+      notes = format_notes,
+      name = format_name,
+      stringsAsFactors = FALSE
+    )


It's the variable_id that you need to make sure exists. A test for this is implicit in the unit test above

u2 <- dplyr::tbl(con, "variables") %>% dplyr::select(id, units) %>% dplyr::filter(id %in% formats_variables[[1, "variable_id"]]) %>% pull(units)

because this will fail if the variable_id is missing, but that implicit test isn't great as the function will just crash with no error message is being returned

mdietze · 2018-08-01T13:31:44Z

base/db/R/insert.format.vars.R

+   formats_variables[is.na(formats_variables)] <- ""
+
+  ###  udunit tests ###
+  for(i in 1:nrow(formats_variables)){


This test should probably be within the if(!is.null(formats_variables)){

it's also better to use seq_along over 1:nrow
Both these changes protect against the sequence ending up over 1:0 and then iterating over empty tables.

Right. I'll move the tests into that if statement. I don't follow on your second point.

mdietze · 2018-08-01T13:33:18Z

shiny/Data-Ingest/server_files/ingest_workflow_svr.R

  FormatRecordList$NewFormatName <- input$NewFormatName
-  FormatRecordList$HeaderBoolean <- input$HeaderBoolean
+  FormatRecordList$HeaderBoolean <- ifelse((input$HeaderBoolean == "Yes"), "TRUE", "FALSE")


mentioned this in the other PR, but why not have input$HeaderBoolean just return TRUE/FALSE instead of Yes/No?

…r row to remove two NAs, added test if variable_id exists and 'suppress' tag to toggle this feature.

LiamBurke24 · 2018-08-01T20:17:23Z

Ok. This should be the last push necessary for this PR.

Merge branch 'develop' of github.com:PecanProject/pecan into link_bety # Conflicts: # CHANGELOG.md

LiamBurke24 added 3 commits July 23, 2018 22:00

Send format inputs to global environment and write db.query to create…

a883bdf

… format record.

Updated CHANGELOG

37f9473

Merge branch 'develop' of github.com:PecanProject/pecan into link_bety

c2c318a

LiamBurke24 self-assigned this Jul 24, 2018

LiamBurke24 requested review from dlebauer, mdietze and tonygardella July 24, 2018 02:18

mdietze requested changes Jul 24, 2018

View reviewed changes

LiamBurke24 added 7 commits July 24, 2018 13:19

Created dbfile.format.variable.insert which inserts a format and a va…

f8b97fb

…riables record

Roxygenized documents AND cleaned up dangling parens

636a610

Edit parameter names. mimety-eid -> mimetype_id. Fixed SQL comma bug.…

94b1965

… Fixed query = query. now query = cmd.

Moved FormatRecordList to inside the ObserveEvent. Store the List in …

fc6f0d4

…global environment and write to list as before. Replace manual db query with dbfile.format.variable.insert

New File: dbfile.format.variable.insert man

2bcb1e8

CHANGELOG

7977f5d

Merge branch 'develop' of github.com:PecanProject/pecan into link_bety

309ba8a

mdietze requested changes Jul 25, 2018

View reviewed changes

infotroph requested changes Jul 25, 2018

View reviewed changes

ashiklom requested changes Jul 25, 2018

View reviewed changes

LiamBurke24 added 10 commits July 25, 2018 12:03

Remove unintentonally hard coded format record list. Remove mimetypeN…

7492e2a

…ame and mimetypeID from FormatRecordList to addess a duplicate bug.

New File & Function: insert.format.variables

491e3a0

Simplified inputs into two data frames and wrapped "db_merge_into" in…

ee5fb16

… an if else statement

New function: insert.format.vars -> NAMESPACE and man

5014308

Add parameter definitions, example. Move return statements into the i…

0b45033

…f/else statements.

Create FormatsRecord_df for loading into insert.format.vars query

7e331e2

Delete depricated dbfile.format.variable.insert function from app. Ad…

b8113aa

…d comments.

Delete 'dbfile.format.variable.insert' from NAMESPACE, dbfiles.R and …

7f313d7

…man.

Remove extraneous spaces from Rd.

1c06854

CHANGELOG

26ea16f

LiamBurke24 added the Topic: BETYdb label Jul 26, 2018

LiamBurke24 added Status: in progress Priority: 02 - Normal and removed Priority: 03 - High Status: Ready Pull request ready for merge, or issue ready to be closed Status: Review required labels Jul 26, 2018

ashiklom approved these changes Jul 26, 2018

View reviewed changes

LiamBurke24 added 8 commits July 26, 2018 19:47

Add format parameters to 'insert.format.vars' formula. Simplify if el…

3c59b07

…se statement.

Merge branch 'develop' of github.com:PecanProject/pecan into link_bety

037287f

Commented out test code

359468d

Commit updated roxygen docs

a201e08

Removed unnecessary line from CHANGELOF as per Chris Black's suggesti…

b636f32

…ons.

insert.format.vars.R: Added detailed param descriptions, fixed string…

db6a06e

…s/factors bug, make query dataframe with formatID for formats-variables query, and return formatID.

updated Roxygendocs

d3eba9f

Remove 'data.frame' for format record query and update insert.format.…

edb9076

…vars.

mdietze requested changes Jul 30, 2018

View reviewed changes

Add format tests and update definitions.

af7027a

mdietze mentioned this pull request Jul 31, 2018

New Data-Ingest App UI and Server Features #2018

Merged

12 tasks

LiamBurke24 added 2 commits July 31, 2018 22:47

Updated parameter descriptions with a greater amount of detail. More …

4646cd1

…specific example. Test if all inputs are in the correct format. Convert all NA's to before loading to BETY. Existing format name test in progress.

Fixed existing mane test. link in roxygen docs is still broken.

26dd2b0

mdietze requested changes Aug 1, 2018

View reviewed changes

LiamBurke24 added 4 commits August 1, 2018 15:31

Reordered parameters in order of being required, changed column_numbe…

26d8662

…r row to remove two NAs, added test if variable_id exists and 'suppress' tag to toggle this feature.

Merge branch 'develop' of github.com:PecanProject/pecan into link_bety

4b02be3

Updated insert.format.vars Rd

3e338f3

Merge branch 'develop' of github.com:PecanProject/pecan into link_bety

46fadcb

LiamBurke24 force-pushed the link_bety branch from 33bfc1b to 46fadcb Compare August 1, 2018 20:13

mdietze approved these changes Aug 1, 2018

View reviewed changes

merge conflicts

18a7938

Merge branch 'develop' of github.com:PecanProject/pecan into link_bety # Conflicts: # CHANGELOG.md

mdietze merged commit 9db3fc4 into PecanProject:develop Aug 1, 2018

		- "Test BETY" button allows users create a record in BETY with `dbfile.input.insert`
		- Added separate `db.query` to create format record.


		output$FormatRecordOut <- renderPrint({print(FormatRecordList)})

		cmd <- paste0("INSERT INTO formats ",

Format Record db.query (Shiny Ingest App) #2012

Format Record db.query (Shiny Ingest App) #2012

Conversation

LiamBurke24 commented Jul 24, 2018 • edited

Description

Motivation and Context

Review Time Estimate

Types of changes

Checklist:

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LiamBurke24 Jul 24, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ashiklom left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LiamBurke24 commented Jul 25, 2018 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LiamBurke24 commented Aug 1, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LiamBurke24 commented Jul 24, 2018 •

edited

LiamBurke24 Jul 24, 2018 •

edited

LiamBurke24 commented Jul 25, 2018 •

edited