Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The draft for read.settings compatibility with benchmarking #1095

Merged
merged 49 commits into from
Oct 27, 2016

Conversation

bcow
Copy link
Contributor

@bcow bcow commented Oct 17, 2016

Major changes:

  • Added benchmarking stage to workflow.R
  • Reworking the process of reading settings:
    • read.settings reads the pecan.xml file
    • add additional settings such a benchmarking settings
    • update settings and write pecan.CHECKED.xml
    • This new configuration works with normal pecan.xml files - You can test by doing runs with my pecan
  • New improved benchmarking.workflow.R that can be used when not using workflow.R

Example XML files

Can be used to test benchmarking (both with workflow.R and benchmark.workflow.R
(I'll check them off when they are working)

1 variable, 1 metric, 1 site, 1 model

  • settings.file: modules/benchmark/inst/scripts/bm.1var.1metric.1site.1model.xml
  • New Run
  • Existing Run

2 variables, 2 metric, 1 site, 1 model

  • settings.file: modules/benchmark/inst/scripts/bm.2var.2metric.1site.1model.xml
  • New Run
  • Existing Run

2 variables, 2 metric, 1 site, 1 model

  • settings.file: modules/benchmark/inst/scripts/bm.2var.2metric.2site.1model.xml
  • settings will now be a multisite object
  • New Run
  • Existing Run

bcow added 10 commits August 18, 2016 14:50
… didn't call it BRR because I thought reference runs weren't always specifically for benchmarking.
…ipt. Simply because it was too hard to read, I can always recombine them.
…ifferent settings types (ie XML or reference run id's).

Also, trying to figure out where I'm adding in supplemental settings (such as benchmarking settings)
…rep settings XML for insertion into the reference_runs table
Copy link
Member

@mdietze mdietze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think all the requested changes are quick fixes. Would love to see a quick turn-around and have this pulled today.

@@ -69,5 +69,6 @@ tests/BC*
compile_on_geo.sh
documentation/tutorials/*/*.html
pecan.Rproj
shiny/BenchmarkReport/*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shiny/BenchmarkReport exists in the repo already, so a commit that adds it to the ignore seems like a bad idea. Could you remove that. Better yet, if you've made changes to BenchmarkReport could you commit those changes too?

@@ -9,48 +9,49 @@
##'
##' @author Betsy Cowdery

create.BRR <- function(ensemble.id, workflow, con) {
create.BRR <- function(ens_wf, con){
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed argument to something hard to understand, but didn't update the documentation (param still listed as ensemble.id)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️ Updated the documentation

# If the ensemble run was done on localhost, turn into a BRR
cnd1 <- ens_wf$hostname == fqdn()
cnd2 <- ens_wf$hostname == 'test-pecan.bu.edu' & fqdn() == 'pecan2.bu.edu'
cnd3 <- ens_wf$hostname == 'pecan2.bu.edu' & fqdn() == 'test-pecan.bu.edu'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to figure out how to avoid having hacks like this hardcoded (Fix not required in this PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You commented on this in the last PR but I still don't have any ideas on what to do with it.
It's just a special case that involves our pecan servers. Could simplify it to:

ens_wf$hostname == fqdn()  | length(grep("pecan.bu.edu", c(fqdn(), "test-pecan.bu.edu")))==2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to add some text as a comment on the reason why we can not run a benchmark reference run on any of those other machines, what conditions do we need to be added to the list of exceptions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I'm just trying to find a way to say that test-pecan, and pecan2 are actually the "same" machine in that the files are accessible from either.
But this is skirting around the issue that if one wants to turn an existing ensemble into a benchmark reference run and that ensemble was not run on localhost, getting all the information to populate the settings section of the reference run table id going to be difficult / may not be possible. This is because we may not be able to read the pecan.xml file on the remote host and the ensemble settings aren't saved anywhere else. This is something that @mdietze and I talked about earlier this week and maybe Mike can explain it better than I can.

Copy link
Member

@robkooper robkooper Oct 27, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if you look at the settings$outdir and see if that exists on the local machine, if so you can (I think) assume that all the files are locally.

# Case in which we would need a remote connection to get the pecan.xml file - not functional
settings_xml <- toString(listToXml(clean, "pecan"))

ref_run <- db.query(paste0(" SELECT * from reference_runs where settings = '", settings_xml,"'"),con)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is OK for now, but in the long run it's a pretty weak test -- an settings that is LOGICALLY the same but not character-for-character identical (e.g. whitespace, order of tags) wouldn't come up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I will put this on the todo list.

ref_run <- db.query(paste0("INSERT INTO reference_runs",
"(model_id, settings, user_id, created_at, updated_at)",
"VALUES(",ens_wf$model_id,", '",settings_xml,"' , ",user_id,
", NOW() , NOW()) RETURNING *;"),con)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per #1083 we're supposed to remove created_at and updated_at NOW(),NOW() from SQL. The database inserts these automatically and @dlebauer noted it is a bug since most machines insert NOW() in local time, not UTC. @tonygardella make sure this goes into the style guide.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️ Removed all created_at, updated_at, NOW(),NOW() in benchmarking

settings <- tbl(bety,"reference_runs") %>%
filter(id == settings$benchmark$reference_run_id) %>%
dplyr::select(settings) %>% collect() %>% unlist() %>%
xmlToList(.,"pecan") %>% append(settings,.) %>% Settings()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impressive that you can do the query and insert all in one line. Probably would have taken me a page of code.

library(XML)
library(lubridate)
library(PEcAn.DB)
library(PEcAn.utils)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

packages

library(XML)
library(lubridate)
library(PEcAn.DB)
library(PEcAn.utils)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

packages

write.settings <- function(settings, outputfile = "pecan.CHECKED.xml"){
library(XML)
library(PEcAn.DB)
library(PEcAn.utils)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can you think of a more appropriate name for this function? write.settings sounds like a function that reads in a settings object and writes it to file. This does a lot more than that! Unfortunately clean.settings and update.settings are already in use.
  2. packages

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

settings2pecan.CHECKED?
Or should we not call the other function update.settings?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK for today, but renaming the update.settings function so you can name this function update is definitely an option.


# Write pecan.CHECKED.xml
settings <- write.settings(settings, outputfile = "pecan.CHECKED.xml")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the call to calc.benchmarks at the end of workflow.R

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✔️ Added it!

bcow and others added 20 commits October 26, 2016 14:03
…xml settings file may ony contain ensemble ids (and not BRR ids)
… are used elsewhere in the benchmarking workflow. This shouldn't affect anyone else's work. Adding a special case for annual data. This may need more work.
@mdietze mdietze merged commit 3431df5 into PecanProject:master Oct 27, 2016
@mdietze mdietze mentioned this pull request Nov 1, 2016
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants