Add std to show for PerformanceEvaluation
#766
Conversation
Codecov Report
Coverage Diff (base: dev, PR: #766):

|          | dev    | #766   | +/-    |
|----------|--------|--------|--------|
| Coverage | 85.85% | 85.92% | +0.06% |
| Files    | 36     | 36     |        |
| Lines    | 3451   | 3460   | +9     |
| Hits     | 2963   | 2973   | +10    |
| Misses   | 488    | 487    | -1     |
Continue to review full report at Codecov.
Nice idea. The only issue I can think of is that …
Well spotted! Based on your suggestion, I got to thinking about what would be the best solution for usability. I have now implemented a conditional standard deviation column in 9cdd7ef. It is only shown when there is more than one fold. From the second …
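For reference, here is a minimal Julia sketch (not the code from 9cdd7ef) of what such a conditional column could look like, assuming the per-fold scores are available as a plain vector; the measure name and values are made up for illustration:

```julia
# Minimal sketch, not the actual commit: build a result row with a
# standard-deviation entry that is only added when there is more than one fold.
using Statistics

per_fold = [0.12, 0.15, 0.11, 0.14, 0.13]   # hypothetical per-fold scores

row = ["cross_entropy", string(round(mean(per_fold), sigdigits = 3))]
if length(per_fold) > 1                      # a single fold has no spread to report
    push!(row, string(round(std(per_fold), sigdigits = 3)))
end
println(join(row, " | "))
```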
@rikhuijzer Thanks for this work 👍🏾
Thanks for the reviews, both. Good comments. I have now implemented some of your suggestions, @ablaom. It now looks as follows:
I'm not super happy (yet) with the lengthy note though.
For me, the most important thing is to see whether something is wrong, that is, whether the reported cross-validation average makes sense or whether the scores fluctuate enormously.
If people only use the score to check whether something is wrong, then it should be fine. I personally get the objections that others have raised, but also find them a bit pedantic. There are many ways to shoot yourself in the foot with resampling techniques, so adding an explicit note may lead the reader to conclude that that is the only worry! Maybe we should link to a special resampling page which gives some guidelines, such as "you can overfit a CV if you manually tune your model to your data" and "the variability estimate of CV is unreliable due to ...".
Could you tell me why this is the case? If we keep it simple with a …
That's what I did again 👍
@rikhuijzer thanks for taking my suggestions and comments on board. I think we're on the same page. And you've changed my mind about adding the standard error as a (redundant) field. I think I'd prefer not to have a usage warning in the display; that seems like overkill. What if we change the heading "sterr" in the table to "1.96*SE" or "1.96 * std err" (which would be more consistent with the standard meaning of "standard error") and relegate the warning to the …
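For clarity, a small sketch (with made-up numbers, and variable names that are illustrative rather than the PR's) of how the proposed "1.96*SE" heading relates to the per-fold standard deviation:

```julia
# Illustrative only: "1.96*SE" is the half-width of an approximate 95%
# confidence interval for the mean of the per-fold scores.
using Statistics

per_fold = [0.12, 0.15, 0.11, 0.14, 0.13]    # hypothetical per-fold scores
se = std(per_fold) / sqrt(length(per_fold))  # standard error of the mean
println("std      = ", round(std(per_fold), sigdigits = 3))
println("1.96*SE  = ", round(1.96 * se, sigdigits = 3))
```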
It now looks like this:
I'm not so sure what to put where in the …
@rikhuijzer, @ablaom shouldn't we also add the standard errors calculated as a field of the …
@OkonSamuel I agree with @rikhuijzer that we leave this out as a field, for the reasons he gives in his comment.
I suggest adding something here. Maybe something along these lines (as a separate paragraph):
As always: if any of you spots mistakes in this PR, feel free to modify it. You should both have the right permissions to do so 😄
@rikhuijzer Thanks for this contribution! 🚀
👍🏾
Sure. Thanks for the reviews, @ablaom and @OkonSamuel. Much appreciated again!
Suggestion to add the standard deviation to the `PerformanceEvaluation` output: this is useful to spot more easily how much variation there is in the reported measurement.

EDIT: End result after reviewer comments
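A hypothetical end-to-end example of where the extra column helps. The dataset, model, and field names below are assumptions about the MLJ API for illustration, not part of this PR, and require MLJ plus a DecisionTree model interface to be installed:

```julia
# Hypothetical usage sketch: run a cross-validated evaluation and compare the
# reported mean with the spread of the per-fold scores.
using MLJ, Statistics

X, y = @load_boston                                        # small regression dataset
Tree = @load DecisionTreeRegressor pkg=DecisionTree verbosity=0
e = evaluate(Tree(), X, y, resampling=CV(nfolds=5), measure=rms)

println("mean rms         = ", e.measurement[1])           # aggregate over folds
println("per-fold rms     = ", e.per_fold[1])
println("std across folds = ", std(e.per_fold[1]))         # the quantity this PR surfaces
```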