Add R-square (coefficient of determination) #679
|
I don't get it (yet): EDIT: This was resolved by not setting the aggregation field via |
|
I've read |
Codecov Report
@@ Coverage Diff @@
## dev #679 +/- ##
==========================================
- Coverage 85.29% 85.17% -0.13%
==========================================
Files 39 39
Lines 3414 3426 +12
==========================================
+ Hits 2912 2918 +6
- Misses 502 508 +6
|
Thanks @rikhuijzer for this PR.
Now, the reason your previous commit errored is that you defined the wrong aggregation in your metadata function: aggregation = RSquared(). Although |
ablaom
left a comment
Thanks for this contribution @rikhuijzer !
As @OkonSamuel correctly points out, "aggregation" refers to how scores are combined when we resample to estimate the expected value of the score on unseen data. So if I am doing 3-fold cross-validation, I'll get 3 R-squared scores, and the question is how I should combine them. Since R-squared is rather non-linear (in the sense I described above), I don't believe there's an obvious choice, so falling back to Mean(), as you currently have, is my recommendation.
If this was number-of-false-positives in classification, you would use Count().
For rms it's possible to define an aggregation function f with the property that f([rms(A1), rms(A2), rms(A3)]) = rms(cat(A1, A2, A3)) (assuming A1, A2, A3 are data sets of the same size), so that case has a special aggregator.
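To make the aggregation point concrete, here is a minimal, language-neutral sketch in plain Python. It is not MLJBase code: the helper names (`rms`, `r_squared`, `root_mean_square`) and the three folds of data are hypothetical, chosen only for illustration. It checks that taking the root mean square of per-fold rms scores reproduces the rms on the pooled data when the folds have equal size, while averaging per-fold R² scores does not reproduce the pooled R².

```python
import math


def rms(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def root_mean_square(scores):
    """Candidate aggregator f for per-fold rms scores (equal fold sizes)."""
    return math.sqrt(sum(s ** 2 for s in scores) / len(scores))


# Hypothetical (y_true, y_pred) pairs for three folds of equal size.
folds = [
    ([1.0, 2.0, 3.0], [1.5, 2.0, 2.5]),
    ([4.0, 6.0, 8.0], [5.0, 6.0, 9.0]),
    ([10.0, 11.0, 15.0], [9.0, 12.0, 13.0]),
]

# Concatenate all folds, as cat(A1, A2, A3) would.
pooled_true = [t for y, _ in folds for t in y]
pooled_pred = [p for _, y_hat in folds for p in y_hat]

per_fold_rms = [rms(y, y_hat) for y, y_hat in folds]
per_fold_r2 = [r_squared(y, y_hat) for y, y_hat in folds]

rms_aggregated = root_mean_square(per_fold_rms)
rms_pooled = rms(pooled_true, pooled_pred)

r2_mean = sum(per_fold_r2) / len(per_fold_r2)
r2_pooled = r_squared(pooled_true, pooled_pred)

print("rms: aggregated vs pooled:", rms_aggregated, rms_pooled)
print("R^2: mean of folds vs pooled:", r2_mean, r2_pooled)
```

The two rms values agree up to floating-point rounding, whereas the two R² values differ, because each fold's R² is normalised by that fold's own variance around its own mean.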
|
Thanks both for the comments, @ablaom @OkonSamuel 😄. I've implemented them. |
|
Now that the weighted |
ablaom
left a comment
Looks good to me. Thanks again 🙏🏾
Oops, good catch. They should both be set to |
|
Never mind, I'll do it when I merge your other PR. |
This PR adds the R² metric.
R² is more informative for linear regressions than MSE and RMSE (Chicco et al., 2021) because it is dimensionless: it is at most 1 and is often read as the fraction of variance explained.
The value -3 in the test was calculated manually by me. All looks good.
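For readers who want to repeat a hand calculation of this kind, here is a minimal sketch in plain Python. It assumes the usual definition R² = 1 - SS_res / SS_tot. The data below is hypothetical and is not the data used in this PR's test; it was picked only because it also yields a negative value.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean) ** 2 for t in y_true)               # total sum of squares
    return 1 - ss_res / ss_tot


# Hypothetical data: predictions that are worse than always predicting the mean.
y = [1.0, 2.0, 3.0]
y_hat = [-1.0, 0.0, 3.0]

# By hand: mean(y) = 2, SS_tot = 1 + 0 + 1 = 2, SS_res = 4 + 4 + 0 = 8,
# so R^2 = 1 - 8 / 2 = -3.
r2 = r_squared(y, y_hat)
print(r2)
```

A negative score simply means the predictions have a larger squared error than the constant prediction mean(y).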
EDIT: It doesn't work yet in CV. I get an error for _check(::RSquare, Vector{Int}). I'll debug it tonight or so.