first take at expanding predictSolute #219

Merged: 16 commits, Aug 25, 2017
Conversation

@wdwatkins (Contributor)

#199

One warning is for the vignette; there is also an existing S3 consistency warning from predictSolute.loadComp.

R/loadComp.R Outdated
#use aggregate solute to aggregate to agg.by, but warn and return NA for uncertainty
if(agg.by != "unit") {
preds <- aggregateSolute(preds, metadata = getMetadata(load.model), agg.by = agg.by,
format = flux.or.conc)
@wdwatkins (Contributor, Author), Aug 17, 2017

@aappling-usgs aggregateSolute has both flux rate and flux total options, which of those is appropriate for these? Should one of them be eliminated?

As written, I think it resolves to flux rate via partial matching.

@aappling-usgs (Contributor)

Good Q. I like your suggestion to take out flux total entirely and just accept conc or flux [interpreted as flux rate] instead. Let's eliminate flux total everywhere in this function, and add a warning just before

format <- match.arg.loadflex(format, c("conc", "flux rate", "flux total"))

(line ~126; specifics of the above line will change to only accept 'conc' & 'flux'). The warning should check for 'flux total' and, if that was the format passed in, explain that the option is deprecated, and you can multiply flux rate by the duration of each period if a total flux is desired. This will catch very few people now that you've made aggregateSolute internal, but I think it'll help keep things clear for the next year or so.
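The deprecation path described above might look something like the sketch below. This is illustrative only: match.arg.loadflex is the package's internal matcher named in the thread, and the exact warning wording is a placeholder, not what was merged.

```r
# Sketch of the proposed deprecation check (illustrative; exact wording
# and placement are assumptions based on the review comment above)
if(identical(format, "flux total")) {
  warning("format='flux total' is deprecated; flux rates are returned instead. ",
          "Multiply each flux rate by the duration of its period ",
          "if you need a total flux.")
  format <- "flux"
}
format <- match.arg.loadflex(format, c("conc", "flux"))
```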

@aappling-usgs (Contributor)

I see this S3 consistency warning in the Appveyor build:

* checking S3 generic/method consistency ... WARNING
predictSolute:
  function(load.model, flux.or.conc, newdata, interval, level,
           lin.or.log, se.fit, se.pred, date, attach.units, agg.by,
           ...)
predictSolute.loadComp:
  function(load.model, flux.or.conc, newdata, interval, level,
           lin.or.log, se.fit, se.pred, date, attach.units, fit.reg,
           fit.resid, fit.resid.raw, agg.by, ...)
See section 'Generic functions and methods' in the 'Writing R
Extensions' manual.

Is this warning related to the one you're seeing in the vignette? Could you copy that warning here so I can see?

@wdwatkins (Contributor, Author) commented Aug 17, 2017

The vignette warning is just because I made aggregateSolute internal. We'll just need to adjust the vignette to use the new predictSolute argument instead.

@aappling-usgs (Contributor)

OK, could you create an issue for updating the vignette (unless it's easy enough to do in this PR)?

@aappling-usgs (Contributor)

Also, when I check locally, I'm seeing an error rather than a warning in the vignette. Is this different from what you're seeing?

Quitting from lines 120-124 (intro_to_loadflex.Rmd) 
Error: processing vignette 'intro_to_loadflex.Rmd' failed with diagnostics:
could not find function "aggregateSolute"

@wdwatkins (Contributor, Author) commented Aug 17, 2017

Yeah, that is the one. Apparently errors in vignettes generate warnings in R CMD check:

checking re-building of vignette outputs ... WARNING
Error in re-building vignettes:

@aappling-usgs (Contributor) left a review comment

Good work on a tricky refactor. I've spun off a few additional issues I think we can address in separate PRs. The comments I left in here are things we should probably address in this PR. The big ones will be fixing the S3 consistency warning, deprecating the flux total option, and ensuring that there's a date column for any set of arguments to predLoad/predConc. And then some more minor clarifications and tidyings.

"In the meantime, please consider reporting instantaneous uncertainties only, ",
"or using predLoad(getFittedModel(load.model), by=[format]) ",
"In the meantime, please consider reporting instantaneous uncertainties only",
"(by setting agg.by = \"unit\"), or using predLoad(getFittedModel(load.model), by=[format]) ",
@aappling-usgs (Contributor)
👍

agg_preds$CI_lower <- agg_preds$Value - CI_quantile*agg_preds$SE
agg_preds$CI_upper <- agg_preds$Value + CI_quantile*agg_preds$SE
#returning NAs since this is not accurate
agg_preds$CI_lower <- NA
@aappling-usgs (Contributor)

lines 284-285 have similar needs, and all these calculations are now extraneous all the time, right? I think it's time to trust git to keep a copy of this code for us, and just replace the whole if(ci.agg) block with

if(ci.agg) {
  agg_preds$CI_lower <- NA
  agg_preds$CI_upper <- NA
}

@aappling-usgs (Contributor)
(spinning off some additional cleanup tasks into #220)

@@ -227,7 +228,7 @@ aggregateSolute <- function(
agg_preds <- as.data.frame(summarise(
preds_filt,
Value = mean(preds),
SE = if(se.agg | ci.agg) SEofSum(dates, se.preds, cormat.function) else NA, # why isn't SEofSum divided by n()??
SE = NA, # why isn't SEofSum divided by n()?? #returning NAs since this is not accurate
@aappling-usgs (Contributor)

You can take out the "# why isn't SEofSum divided by n()??" comment.

@@ -101,8 +102,7 @@
#' time.step=as.difftime(1, units="days"), max.tao=as.difftime(10, units="days"))
#' aggregateSolute(preds_example, metadata=metadata_example, format="conc", agg.by="month",
#' cormat.function=new_correlation_assumption)
#'
#' @export
#' }
@aappling-usgs (Contributor)

spinning off some thoughts about deprecation messaging into issue #221 for later cleanup

# Get around data size constraints of predLoad and predConc by splitting the
# prediction into smaller chunks as necessary. chunk.size is the maximum
# number of values per chunk; rloadest can handle at most 176000 values by
# default. nchunks is the number of chunks required.
chunk.size <- 176000
chunk.size <- 175872 #multiple of 96, so breaks iv data on day boundaries
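For context on the new constant, here is the arithmetic behind it (my own check, assuming 15-minute instantaneous data, which is what the "multiple of 96" comment implies):

```r
# 96 fifteen-minute observations per day, so any chunk size that is a
# multiple of 96 ends each chunk exactly at a day boundary
96 * 1832      # 175872, which stays under rloadest's 176000-row default
175872 %% 96   # 0, confirming chunks break on whole days
```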
@aappling-usgs (Contributor)

good idea (and inspired #224)

R/loadReg2.R Outdated
nchunks <- ceiling(nrow(newdata) / chunk.size)
datachunks <- lapply(1:nchunks, function(i) {
newdata[((i-1)*chunk.size + 1):min(i*chunk.size, nrow(newdata)),]
})

browser()
@aappling-usgs (Contributor)

definitely don't want this in here

@wdwatkins (Contributor, Author)

would be nice if R CMD check caught those

@aappling-usgs (Contributor)

so true!

R/loadReg2.R Outdated
agg.by = agg.by)
agg.by = "total" #going to rloadest
if(nrow(newdata) > 176000) {
stop(paste(strwrap("Sorry, rloadest can't handle more than 176,000 data points,
@aappling-usgs (Contributor)

change to 175,872 to reflect new chunk.size below?

R/loadReg2.R Outdated
agg.by = "total" #going to rloadest
if(nrow(newdata) > 176000) {
stop(paste(strwrap("Sorry, rloadest can't handle more than 176,000 data points,
and loadflex does not currently support aggregating uncertainties.
@aappling-usgs (Contributor)

this line will probably confuse users who don't understand how rloadest and loadflex interact. I think you could change this whole message to:

Sorry, rloadest can't aggregate more than 176,000 data points at a time.
Please change the agg.by argument to a shorter period, or else
supply a shorter period of data for newdata

R/loadReg2.R Outdated
# Add dates if requested
if(date) {
if(!is.data.frame(preds)) {
preds <- data.frame(fit=preds)
}
# prepend the date column
preds <- data.frame(date=getCol(metadata, newdata, "date"), preds)
#preds <- data.frame(date=getCol(metadata, newdata, "date"), preds)
#predLoad returns the dates, don't need to get them from metadata
@aappling-usgs (Contributor)

are you sure? if we just created the data.frame in line 468 above, then preds will be a data.frame with only a fit column, no dates...

(however, if that's the only case when a date column needs to be added, it does look like these lines could be moved into the if(!is.data.frame(preds)) block just above, right?)

@wdwatkins (Contributor, Author)

Hmm, I see that, although predLoad and predConc both always return data frames containing the dates, so I wasn't sure why they would ever need to be reattached.

@aappling-usgs (Contributor)

I thought I remembered that with certain arguments predLoad just returned a vector... let me see if I can recreate that. Maybe I'm misremembering.

@aappling-usgs (Contributor)

ok, I've looked and still think the date column needs to be added - always, in fact. preds is not the direct output from predLoad or predConc. It's either a vector (https://github.com/wdwatkins/loadflex/blob/9e4e520e0b88a04ce594aa1e7b5048e26dae0e2a/R/loadReg2.R#L394-L397) or a data.frame with columns fit, lwr, and upr (https://github.com/wdwatkins/loadflex/blob/9e4e520e0b88a04ce594aa1e7b5048e26dae0e2a/R/loadReg2.R#L406-L420)

@wdwatkins (Contributor, Author)

Ok, I will leave it then. I think this could be streamlined in the future, though; I still don't see a reason for dropping the dates and then appending them again later.

@aappling-usgs (Contributor)

that's a reasonable suggestion.

@wdwatkins (Contributor, Author)

should probably add/change some tests so the new agg.by arguments get used.

@aappling-usgs (Contributor)

when you said, "should probably add/change some tests", did you mean that you planned to do so for this PR? if not, could you create an issue for it?

could you also add updates or an issue for the vignette? would rather not have a broken vignette on the master branch for long.

@wdwatkins (Contributor, Author)

Might as well do it now.

@wdwatkins (Contributor, Author)

@aappling-usgs actually it might be better if you do the vignette; you would probably have a better idea of how to realistically incorporate this change into the workflow. I will make an issue.

@wdwatkins (Contributor, Author)

@aappling-usgs this should be good to go now, pending any additional review

@aappling-usgs (Contributor)

cool! good work.

@aappling-usgs aappling-usgs merged commit 286a7c2 into DOI-USGS:master Aug 25, 2017