-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
first take at expanding predictSolute #219
Conversation
…some to aggregateSolute
R/loadComp.R
Outdated
#use aggregate solute to aggregate to agg.by, but warn and return NA for uncertainty | ||
if(agg.by != "unit") { | ||
preds <- aggregateSolute(preds, metadata = getMetadata(load.model), agg.by = agg.by, | ||
format = flux.or.conc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aappling-usgs aggregateSolute
has both flux rate
and flux total
options, which of those is appropriate for these? Should one of them be eliminated?
It is going to flux rate
as is via partial matching I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good Q. I like your suggestion to take out flux total
entirely and just accept conc
or flux
[interpreted as flux rate] instead. Let's eliminate flux total
everywhere in this function, and add a warning just before
format <- match.arg.loadflex(format, c("conc", "flux rate", "flux total"))
(line ~126; specifics of the above line will change to only accept 'conc' & 'flux'). The warning should check for 'flux total' and, if that was the format
passed in, explain that the option is deprecated, and you can multiply flux rate by the duration of each period if a total flux is desired. This will catch very few people now that you've made aggregateSolute internal, but i think it'll help keep things clear for the next year or so.
I see this S3 consistency warning in the Appveyor build:
Is this warning related to the one you're seeing in the vignette? Could you copy that warning here so I can see? |
The vignette warning is just because I made |
OK, could you create an issue for updating the vignette (unless it's easy enough to do in this PR)? |
Also, when I Check locally, I'm seeing an error rather than a warning in the vignette. Is this different from what you're seeing?
|
Yeah that is the one. Apparently errors in vignettes generate warnings in
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work on a tricky refactor. I've spun off a few additional issues i think we can address in separate PRs. The comments I left in here are things we should probably address in this PR. The big ones will be fixing the S3 consistency warning, deprecating the flux total
option, and ensuring that there's a date
column for any set of arguments to predLoad
/predConc
. And then some more minor clarifications and tidyings.
"In the meantime, please consider reporting instantaneous uncertainties only, ", | ||
"or using predLoad(getFittedModel(load.model), by=[format]) ", | ||
"In the meantime, please consider reporting instantaneous uncertainties only", | ||
"(by setting agg.by = \"unit\"), or using predLoad(getFittedModel(load.model), by=[format]) ", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
R/aggregateSolute.R
Outdated
agg_preds$CI_lower <- agg_preds$Value - CI_quantile*agg_preds$SE | ||
agg_preds$CI_upper <- agg_preds$Value + CI_quantile*agg_preds$SE | ||
#returning NAs since this is not accurate | ||
agg_preds$CI_lower <- NA |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lines 284-285 have similar needs, and all these calculations are now extraneous all the time, right? i think it's time to trust git to keep a copy of this code for us, and just replace the whole if(ci.agg)
block with
if(ci.agg) {
agg_preds$CI_lower <- NA
agg_preds$CI_upper <- NA
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(spinning off some additional cleanup tasks into #220)
R/loadComp.R
Outdated
#use aggregate solute to aggregate to agg.by, but warn and return NA for uncertainty | ||
if(agg.by != "unit") { | ||
preds <- aggregateSolute(preds, metadata = getMetadata(load.model), agg.by = agg.by, | ||
format = flux.or.conc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good Q. I like your suggestion to take out flux total
entirely and just accept conc
or flux
[interpreted as flux rate] instead. Let's eliminate flux total
everywhere in this function, and add a warning just before
format <- match.arg.loadflex(format, c("conc", "flux rate", "flux total"))
(line ~126; specifics of the above line will change to only accept 'conc' & 'flux'). The warning should check for 'flux total' and, if that was the format
passed in, explain that the option is deprecated, and you can multiply flux rate by the duration of each period if a total flux is desired. This will catch very few people now that you've made aggregateSolute internal, but i think it'll help keep things clear for the next year or so.
R/aggregateSolute.R
Outdated
@@ -227,7 +228,7 @@ aggregateSolute <- function( | |||
agg_preds <- as.data.frame(summarise( | |||
preds_filt, | |||
Value = mean(preds), | |||
SE = if(se.agg | ci.agg) SEofSum(dates, se.preds, cormat.function) else NA, # why isn't SEofSum divided by n()?? | |||
SE = NA, # why isn't SEofSum divided by n()?? #returning NAs since this is not accurate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can take out the # why isn't SEofSum divided by n()??
comment
@@ -101,8 +102,7 @@ | |||
#' time.step=as.difftime(1, units="days"), max.tao=as.difftime(10, units="days")) | |||
#' aggregateSolute(preds_example, metadata=metadata_example, format="conc", agg.by="month", | |||
#' cormat.function=new_correlation_assumption) | |||
#' | |||
#' @export | |||
#' } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spinning off some thoughts about deprecation messaging into issue #221 for later cleanup
# Get around data size constraints of predLoad and predConc by splitting the | ||
# prediction into smaller chunks as necessary. chunk.size is the maximum | ||
# number of values per chunk; rloadest can handle at most 176000 values by | ||
# default. nchunks is the number of chunks required. | ||
chunk.size <- 176000 | ||
chunk.size <- 175872 #multiple of 96, so breaks iv data on day boundaries |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea (and inspired #224)
R/loadReg2.R
Outdated
nchunks <- ceiling(nrow(newdata) / chunk.size) | ||
datachunks <- lapply(1:nchunks, function(i) { | ||
newdata[((i-1)*chunk.size + 1):min(i*chunk.size, nrow(newdata)),] | ||
}) | ||
|
||
browser() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely don't want this in here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice if R CMD CHECK caught those
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so true!
R/loadReg2.R
Outdated
agg.by = agg.by) | ||
agg.by = "total" #going to rloadest | ||
if(nrow(newdata) > 176000) { | ||
stop(paste(strwrap("Sorry, rloadest can't handle more than 176,000 data points, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to 175,872 to reflect new chunk.size below?
R/loadReg2.R
Outdated
agg.by = "total" #going to rloadest | ||
if(nrow(newdata) > 176000) { | ||
stop(paste(strwrap("Sorry, rloadest can't handle more than 176,000 data points, | ||
and loadflex does not currently support aggregating uncertainties. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line will probably confuse users who don't understand how rloadest and loadflex interact. I think you could change this whole message to:
Sorry, rloadest can't aggregate more than 176,000 data points at a time.
Please change the agg.by argument to a shorter period, or else
supply a shorter period of data for newdata"
R/loadReg2.R
Outdated
# Add dates if requested | ||
if(date) { | ||
if(!is.data.frame(preds)) { | ||
preds <- data.frame(fit=preds) | ||
} | ||
# prepend the date column | ||
preds <- data.frame(date=getCol(metadata, newdata, "date"), preds) | ||
#preds <- data.frame(date=getCol(metadata, newdata, "date"), preds) | ||
#predLoad returns the dates, don't need to get them from metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are you sure? if we just created the data.frame in line 468 above, then preds will be a data.frame with only a fit
column, no dates...
(however, if that's the only case when a date column needs to be added, it does look like these lines could be moved into the if(!is.data.frame(preds))
block just above, right?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I see that, although predLoad
and predConc
both always return data frames containing the dates, so I wasn't sure why they would ever need to be reattached.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i thought i remembered that with certain arguments predLoad just returned a vector...let me see if i can recreate that. maybe i'm misremembering
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, i've looked and still think the date column needs to be added - always, in fact. preds
is not the direct output from predLoad
or predConc
. it's either a vector (https://github.com/wdwatkins/loadflex/blob/9e4e520e0b88a04ce594aa1e7b5048e26dae0e2a/R/loadReg2.R#L394-L397) or a data.frame with columns fit
, lwr
, and upr
(https://github.com/wdwatkins/loadflex/blob/9e4e520e0b88a04ce594aa1e7b5048e26dae0e2a/R/loadReg2.R#L406-L420)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I will leave it then. I think this could be streamlined in the future though, I still don't see a reason for dropping the dates and then appending them again later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's a reasonable suggestion.
should probably add/change some tests so the new |
when you said, "should probably add/change some tests", did you mean that you planned to do so for this PR? if not, could you create an issue for it? could you also add updates or an issue for the vignette? would rather not have a broken vignette on the master branch for long. |
Might as well do it now. |
@aappling-usgs actually it might be better if you you do the vignette, you would probably have a better idea how to realistically incorporate this change into the workflow. I will make an issue. |
@aappling-usgs this should be good to go now, pending any additional review |
cool! good work. |
#199
One warning is for the vignette, there is also an existing S3 consistency warning from
predictSolute.loadComp
.