investigation group idea: how to validate new features #24
It is good to see attention to this issue. I think that the criteria as described so far may not be strong enough to prevent bloat. (3) is quite weak. (1) is best, especially if it is explicitly designed to compare to other plausible methods. Another consideration is the formal NOAA R2O transition track, per:
I think we can define specific statistical criteria that need to be met to make (3) more robust (e.g., a new feature needs to show that all of its parameters are estimable and identifiable with minimal bias).
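For what that criterion might look like in practice, here is a minimal sketch of a scripted simulation self-test. The logistic selectivity curve, sample sizes, and 5% tolerance are illustrative assumptions rather than FIMS code: simulate data from known parameter values, refit by maximum likelihood, and require small median relative bias before the feature passes.

```python
# Minimal sketch of a simulation self-test for a new feature.
# Assumptions: a toy logistic selectivity curve stands in for the feature,
# with 200 fish per age, 200 replicates, and a 5% relative-bias tolerance.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
ages = np.arange(1, 16)
true_pars = np.array([5.0, 1.2])   # a50, slope: hypothetical true values

def selectivity(pars, ages):
    a50, slope = pars
    return 1.0 / (1.0 + np.exp(-slope * (ages - a50)))

def nll(pars, ages, n, k):
    # binomial negative log-likelihood for "k selected out of n" at each age
    p = np.clip(selectivity(pars, ages), 1e-8, 1 - 1e-8)
    return -np.sum(k * np.log(p) + (n - k) * np.log(1 - p))

def one_replicate():
    n = np.full(ages.shape, 200)
    k = rng.binomial(n, selectivity(true_pars, ages))
    fit = minimize(nll, x0=[4.0, 1.0], args=(ages, n, k), method="Nelder-Mead")
    return fit.x

estimates = np.array([one_replicate() for _ in range(200)])
rel_bias = np.median((estimates - true_pars) / true_pars, axis=0)
print("median relative bias (a50, slope):", rel_bias)
assert np.all(np.abs(rel_bias) < 0.05), "feature fails the bias criterion"
```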
Not so sure it is that easy. Estimability will likely depend on the configuration of other parts of the model unrelated to the feature.
@timjmiller but then wouldn't that mean the feature is validated, if it only breaks when unrelated features break? As a concrete example, I tried doing self-testing on the double normal selex in SS using Simulation Based Calibration (https://arxiv.org/abs/1804.06788) and it failed really badly. This means that the parameterization is fundamentally incompatible with integration, at least for some configurations. I suspect this would be true for MLEs too, that there'd be large biases in parameters. So, presuming it fails the validation criteria, would we not include it in FIMS? I see a big gray area there. For me, the validation steps are to test for bugs in the code. This is a slightly different issue.
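For readers unfamiliar with the SBC check referenced here (Talts et al., https://arxiv.org/abs/1804.06788), this is a hedged sketch of the basic recipe using a conjugate normal-mean model so the posterior can be sampled exactly; in a real check the posterior draws would come from fitting the feature (e.g., via MCMC), and non-uniform ranks flag the kind of miscalibration described above. The model, prior, and uniformity test are illustrative assumptions.

```python
# Sketch of Simulation Based Calibration on a conjugate toy model.
# Assumptions: normal data with known sd, normal prior on the mean,
# 1000 simulations, 19 posterior draws per simulation.
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(42)
n_sims, n_draws, n_obs, sigma = 1000, 19, 20, 1.0
prior_mu, prior_sd = 0.0, 2.0

ranks = np.empty(n_sims, dtype=int)
for s in range(n_sims):
    mu_true = rng.normal(prior_mu, prior_sd)        # draw parameter from the prior
    y = rng.normal(mu_true, sigma, size=n_obs)      # simulate data given that parameter
    # exact conjugate posterior for the mean (stands in for fitting the feature)
    post_var = 1.0 / (1.0 / prior_sd**2 + n_obs / sigma**2)
    post_mean = post_var * (prior_mu / prior_sd**2 + y.sum() / sigma**2)
    draws = rng.normal(post_mean, np.sqrt(post_var), size=n_draws)
    ranks[s] = int(np.sum(draws < mu_true))         # SBC rank statistic

# if simulation and estimation agree, ranks are uniform on 0..n_draws
counts = np.bincount(ranks, minlength=n_draws + 1)
stat, pval = chisquare(counts)
print("SBC rank uniformity, chi-square p-value:", pval)
```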
Yeah, it might work for some model configurations and not others, but the tests are not going to include the full set of possible configurations. I think if it works in some configurations of interest, then that would be sufficient?
@timjmiller yeah, that's what I'm thinking too; that proves it is coded right. Presumably we can set up some data-saturated situations where we've hit asymptotic land and everything should work there too.
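A rough sketch of what such a data-saturated check could look like, using a toy normal model and assumed sample sizes rather than anything FIMS-specific: estimates from a correctly coded feature should converge on the true values as the data grow.

```python
# Toy consistency check: refit at increasing sample sizes and watch the bias
# shrink.  The normal model, true values, and sample sizes are assumptions.
import numpy as np

rng = np.random.default_rng(7)
true_mean, true_sd = 2.0, 0.5

for n in (50, 500, 5000, 50000):
    y = rng.normal(true_mean, true_sd, size=n)
    mu_hat, sd_hat = y.mean(), y.std()   # closed-form MLEs for this toy model
    print(f"n={n:>6}: bias(mean)={mu_hat - true_mean:+.4f}  bias(sd)={sd_hat - true_sd:+.4f}")
# expectation: the biases shrink roughly like 1/sqrt(n); a persistent offset
# at large n would point to a coding or identifiability problem in the feature
```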
@timjmiller and @Cole-Monnahan-NOAA, I agree. As the complexity of a model grows, it becomes challenging to demonstrate estimability for parameters under all possible combinations of model settings, and I don't think this bar should be a requirement for a feature to be included in FIMS. At the bare minimum, however, a new feature should demonstrate feature-specific parameter estimability under a suite of examples in which the feature will most likely be used. Ideally, it would also be helpful to document cases where estimability fails.
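One way to archive such a suite is a set of parameterized tests, one per configuration of interest, with the failing cases documented alongside. A minimal sketch, assuming pytest; the configuration names, the toy check_estimability() helper, and the 5% threshold are hypothetical placeholders rather than FIMS code.

```python
# Hypothetical feature-specific test suite over a handful of configurations.
import numpy as np
import pytest

CONFIGS = {
    "age_comp_only":       dict(n_years=30, obs_per_year=100),
    "index_plus_age_comp": dict(n_years=30, obs_per_year=400),
    "short_series":        dict(n_years=10, obs_per_year=100),
}

def check_estimability(n_years, obs_per_year, true_value=0.3, n_reps=100):
    """Toy stand-in: simulate a proportion observed n_years * obs_per_year
    times, estimate it, and return the absolute relative bias."""
    rng = np.random.default_rng(0)
    n = n_years * obs_per_year
    estimates = rng.binomial(n, true_value, size=n_reps) / n
    return abs(estimates.mean() - true_value) / true_value

@pytest.mark.parametrize("name,config", CONFIGS.items())
def test_feature_estimability(name, config):
    assert check_estimability(**config) < 0.05, (
        f"estimability criterion fails for configuration {name!r}"
    )
```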
We might be able to leverage the rOpenSci standards for statistical software, in particular the time series, spatial, and Bayesian sections.
Those are new to me but definitely worth investigating and considering.
I like the idea of having, and archiving, feature-specific tests. Making those tests demonstrate performance is harder.
Here's a war story about how subtle a difference can be:
This came up in discussions for the software design spec #20 - we want to avoid bloat so we should only add features if they are a documented best practice. We need some principles for what constitutes a best practice. Some ideas: