check SEs for aggregateSolute/summarizePreds #174

Closed
aappling-usgs opened this issue Feb 2, 2017 · 4 comments

@aappling-usgs

Rich notes that the current SEs coming out of summarizePreds look excessively small, and the CIs excessively tight. Check that

  • we're using se.pred, not se.fit
  • we're using load SEs, not conc SEs

And then see if there's anything else that needs fixing to make sure these end up producing believable annual and multi-year predictions.

@aappling-usgs aappling-usgs created this issue from a note in ANA (Near-Term Priority) Feb 2, 2017
@aappling-usgs

We are using se.pred and load SEs, yes.

@aappling-usgs

aappling-usgs commented Mar 18, 2017

Both annual and multi-year SEs are smaller than for FLUXMASTER (about 20% of the FLUXMASTER values for MOGU). The corresponding values straight from rloadest are closer, though still ~60-80% of FLUXMASTER. I wonder if FLUXMASTER, like ESTIMATOR (apparently), makes some assumption of >0 correlation among daily values? Note from rloadest::LOADSEP:

See Tim Cohn's ESTIMATOR code (from Jan 2002) for an inner loop that adds correlation of random variability between predictions. This loop is ommitted here as "we lack the data to fit a larger AR model"

Either way, it does seem that multi-day estimates in loadflex need to be aggregated with components for both natural variability and uncertainty in the mean, if these estimates are to match the conventions used by rloadest and FLUXMASTER. See these references for the best explanations I've found so far:

  1. Section 6 of Cohn WRR 2005 (10.1029/2004WR003833)
  2. https://onlinecourses.science.psu.edu/stat501/node/274
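
For reference, the two components can be written out for a single prediction from an ordinary linear regression. This is the textbook decomposition rather than anything taken from the loadflex or rloadest source; the symbols $x_0$ (predictor vector for the new day), $\hat\beta$ (fitted coefficients), and $\hat\sigma^2$ (residual variance) are notation assumed here, and the retransformation from log space used by load models is ignored:

```latex
\begin{align*}
\mathrm{se.fit}(x_0)^2  &= x_0^\top \,\widehat{\mathrm{Cov}}(\hat\beta)\, x_0
  && \text{(uncertainty in the mean)} \\
\mathrm{se.pred}(x_0)^2 &= \mathrm{se.fit}(x_0)^2 + \hat\sigma^2
  && \text{(plus natural variability)}
\end{align*}
```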

The main issue, I think, is that aggregateSolute assumes (has to, as it's currently structured) 0 correlation between daily estimates. I thought this was fine because the LOADEST comments and Cohn 2005 section 6.2 indicate the autocorrelation is unknown and might as well be assumed to be zero, but it turns out both of those comments refer to natural variability alone. Parameter uncertainty is correlated, at least for regression models, and that correlation must be taken into account when summing parameter-based estimates. This is why the FLUXMASTER documentation includes a note that the total period of interest must be summed all at once. I have no idea what the parameter uncertainty might be for a composite or interpolation model. For rloadest models, we could at least move to using predLoad for entire intervals (expand the options in predictSolute()) rather than predicting and then aggregating separately.
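
To make the effect of the zero-correlation assumption concrete, here is a minimal sketch in Python. It is not the loadflex, rloadest, or FLUXMASTER algorithm: it assumes a simple linear-space regression with one predictor, independent residuals, and made-up calibration data, and every function name in it is hypothetical. It compares the SE of a 365-day total when the shared coefficient uncertainty is treated as independent between days versus when its covariance between days is included.

```python
import math

def fit_simple_ols(x, y):
    """Fit y = b0 + b1*x; return residual variance and the 2x2 coefficient covariance."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in resid) / (n - 2)
    # Cov(beta) = sigma2 * (X'X)^-1, written out for the two-coefficient case
    var_b0 = sigma2 * (1 / n + xbar ** 2 / sxx)
    var_b1 = sigma2 / sxx
    cov_b01 = -sigma2 * xbar / sxx
    return sigma2, ((var_b0, cov_b01), (cov_b01, var_b1))

def fit_cov(xi, xj, cov):
    """Covariance between the fitted means at xi and xj (they share b0 and b1)."""
    (v00, v01), (_, v11) = cov
    return v00 + v01 * (xi + xj) + v11 * xi * xj

def se_of_sum(x_new, sigma2, cov, correlated):
    """SE of the summed predictions over x_new.

    correlated=False keeps only each day's own variance (the zero-correlation
    assumption); correlated=True adds the between-day covariance of the fit.
    Residual (natural) variability is assumed independent in both cases.
    """
    total = len(x_new) * sigma2
    for i, xi in enumerate(x_new):
        for j, xj in enumerate(x_new):
            if i == j or correlated:
                total += fit_cov(xi, xj, cov)
    return math.sqrt(total)

# Made-up calibration data (8 samples) and 365 daily predictor values
x_cal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_cal = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
x_days = [2.0 + 0.01 * k for k in range(365)]

sigma2, cov = fit_simple_ols(x_cal, y_cal)
se_naive = se_of_sum(x_days, sigma2, cov, correlated=False)
se_full = se_of_sum(x_days, sigma2, cov, correlated=True)
print("SE of total, days treated as independent:", se_naive)
print("SE of total, parameter covariance included:", se_full)
```

In this toy setup the independent-days SE comes out smaller than the SE that includes the between-day covariance, which is the direction of the discrepancy described above. The size of the gap depends entirely on the invented data and should not be read as an estimate for any real site.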

@aappling-usgs

Bootstrapping could work for composite and interpolation models: http://www.sciencedirect.com/science/article/pii/S1364815215300220
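
A rough sketch of what that could look like for an interpolation model, in Python. This is a generic nonparametric bootstrap under stated assumptions, not the method from the linked paper and not loadflex code: the observations are resampled with replacement, duplicate sample times are dropped, a plain linear interpolation stands in for the real model, and all names and data are hypothetical. The point is that the period total is recomputed inside each replicate, so any correlation among the daily values carries through to the SE of the total without being modeled explicitly.

```python
import random
import statistics

def interp_linear(t_obs, y_obs, t):
    """Linear interpolation at time t, held flat beyond the observed range."""
    if t <= t_obs[0]:
        return y_obs[0]
    if t >= t_obs[-1]:
        return y_obs[-1]
    for k in range(1, len(t_obs)):
        if t <= t_obs[k]:
            w = (t - t_obs[k - 1]) / (t_obs[k] - t_obs[k - 1])
            return y_obs[k - 1] + w * (y_obs[k] - y_obs[k - 1])

def period_total(t_obs, y_obs, days):
    """Sum of interpolated daily values over the whole period."""
    return sum(interp_linear(t_obs, y_obs, d) for d in days)

def bootstrap_total_se(t_obs, y_obs, days, n_boot=500, seed=1):
    """Bootstrap SE of the period total (t_obs must be strictly increasing)."""
    rng = random.Random(seed)
    n = len(t_obs)
    totals = []
    for _ in range(n_boot):
        # Resample observation indices with replacement, then drop duplicates
        idx = sorted(set(rng.randrange(n) for _ in range(n)))
        if len(idx) < 2:
            continue  # cannot interpolate from a single point
        t_b = [t_obs[i] for i in idx]
        y_b = [y_obs[i] for i in idx]
        totals.append(period_total(t_b, y_b, days))
    return statistics.stdev(totals)

# Made-up roughly monthly observations (day of year, daily load)
t_obs = [10, 40, 75, 100, 135, 160, 195, 220, 255, 285, 315, 350]
y_obs = [5.0, 6.5, 9.0, 14.0, 11.0, 8.0, 6.0, 5.5, 5.0, 6.0, 7.5, 6.0]
days = list(range(1, 366))

print("annual total:", period_total(t_obs, y_obs, days))
print("bootstrap SE of annual total:", bootstrap_total_se(t_obs, y_obs, days))
```

Resampling individual observations ignores serial dependence in the samples themselves, so a block or residual bootstrap may be more appropriate in practice; the sketch only shows the mechanics.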

@aappling-usgs

Closing because this is in an acceptable place for the batch mode application. See #199 for next steps.
