Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expand predictSolute to predict by time period #199

Open
aappling-usgs opened this issue Mar 20, 2017 · 6 comments
Open

expand predictSolute to predict by time period #199

aappling-usgs opened this issue Mar 20, 2017 · 6 comments
Assignees

Comments

@aappling-usgs
Copy link
Contributor

see also #174, which we resolved for batch mode but would like to correct more systematically throughout the package.

predictions need to happen across the full time period[s] of interest because they need to accommodate correlation among errors in estimates due to parameter uncertainty.

@aappling-usgs aappling-usgs created this issue from a note in ANA (Backlog) Mar 20, 2017
@aappling-usgs aappling-usgs added this to Do in maintenance Jun 16, 2017
@aappling-usgs
Copy link
Contributor Author

we could alternatively rewrite aggregateSolute to accept a model rather than predictions and to only include confidence intervals when it can do it well, by wrapping rloadest functionality or using approaches for interpolation/composite that embrace autocorrelation of errors better

@aappling-usgs
Copy link
Contributor Author

Basic challenge: to produce a monthly or annual estimate, we need more information than just the predictions. But the current structure of the package separates instantaneous/unit predictions from aggregation in such a way that the information isn't available when we need it:

  • predictSolute currently takes in a model and returns predictions at the temporal resolution of the discharge data (instantaneous/unit resolution).
  • aggregateSolute currently takes in a set of instantaneous predictions and returns monthly, annual, etc. predictions. But it doesn't take in a model object and therefore doesn't actually have enough information to do its job.

This mismatch exists because I didn't understand the uncertainty propagation problem completely enough 2 years ago. @wdwatkins, you and I have explored aspects of this problem since then. It's a different problem for each model type; I've backlogged the GitHub issues for fixing this problem for composite and interpolation and lm() models, but it's immediately fixable for loadReg2 models, and this issue is about restructuring a bit so that our fix for loadReg2 also paves the way for eventual fixes for the other model types.

I've proposed two possible solutions above but am leaning toward the first, which is consistent with the title of this issue: let's modify predictSolute to accept an argument that specifies the temporal resolution of interest and to return predictions at that resolution.

  • that new argument can take exactly the same form and name as agg.by in the current aggregateSolute function.
  • for loadReg2 models, predictSolute should translate the contents of the agg.by argument into something that can be passed to rloadest::predLoad to request load and uncertainty estimates at the desired temporal scale. rloadest::predLoad does uncertainty propagation well for all scales, as long as you ask it specifically for the scale you want.
  • for loadLm, loadInterp, and loadComp models, predictSolute should always return load or concentration estimates, but should only return numeric uncertainty estimates if the resolution is instantaneous (otherwise return NAs instead). Use the same warning language currently in place within aggregateSolute to explain why we're returning NAs instead of uncertainties when they're requested for aggregate estimates from these models. I hope we can add uncertainty estimates to some of these models later, but those are hard challenges for another day. This structural change will take a little tiny bite out of those challenges, and that's all I want for now.
  • we're essentially deprecating the aggregateSolute function here. It's possible that we can keep some of the code in place and use aggregateSolute as a helper to the revised predictSolute function, e.g., for aggregating loads (but not uncertainties) for loadLm, loadInterp, and loadComp models. But I think we should be thinking of predictSolute as the new go-to function for prediction at all temporal resolutions, such that aggregateSolute should no longer be needed by external users.

@wdwatkins wdwatkins self-assigned this Jul 31, 2017
@wdwatkins
Copy link
Contributor

wdwatkins commented Aug 1, 2017

  • send to rloadest for loadReg2 models
  • return NAs for other models
  • add some examples?

@wdwatkins
Copy link
Contributor

So it seems the mean water year and mean calendar year options will need to be eliminated, since those can't go into rloadest::predLoad? Or can we still incorporate those two options afterwords?

@aappling-usgs
Copy link
Contributor Author

Hmm, yep, those are harder. For loadflexBatch we restricted the data to complete years and then used predLoad('total') - do you agree that approach is about as rigorous as we could hope for? The loadflexBatch code is at https://github.com/USGS-R/loadflexBatch/blob/master/batchHelperFunctions.R#L268. If it sounds like a good long-term solution to you, then the next question is how hard it would be to add that logic to predictSolute.loadReg2 - what do you think?

@wdwatkins
Copy link
Contributor

Mm yeah I forgot that accomplishes the same thing. That should be doable, we might be able to just pull that code into a loadflex function so it stays in one place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
ANA
Backlog
Development

No branches or pull requests

2 participants