Evaluate models against multiple measures (loss functions) simultaneously. #98
Comments
Agreed - this is similar to mlr's benchmark results table, once coerced to data frame. Regarding the "measures", I think all should come - optimally by default - with confidence intervals (conditional on fitted machine, i.e., performance guarantees for re-using fitted machine). Otherwise the comparison is, at best, meaningless, and at worst, misleading... The "quick hack" would be to define measures that are confidence interval widths of other measures, though this is sub-optimal because:
Also, to re-iterate: I don't think the CV results are a good source to compute standard errors from, naively.
@ablaom I can write a quick fix for this: if measures is a list, return a namedtuple, else return only the result. Should I go ahead?
Yes, thanks! At present measures are functions, so you can just keep the types unspecified (abstract). Later when we implement LossFunctions or similar, we can worry about types.
@ablaom Yeah, a quick fix would be of the form:
Any comments?
Sounds good. Rather than dispatching on being Function (which may change) I would dispatch on not being a vector. Ie, Note, you have two
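The dispatch suggestion above might look roughly like the following. This is only a minimal sketch, not the code from the thread: `apply_measures` is a hypothetical helper, and `rms` and `rmsl` are stand-in definitions written for illustration rather than MLJ's own measures.

```julia
# Stand-in measures (assumed definitions, for illustration only).
rms(yhat, y)  = sqrt(sum((yhat .- y) .^ 2) / length(y))
rmsl(yhat, y) = sqrt(sum((log.(yhat) .- log.(y)) .^ 2) / length(y))

# Fallback method: anything that is not a vector is treated as a single
# measure, so the argument type stays unspecified (no `::Function`).
apply_measures(measure, yhat, y) = measure(yhat, y)

# Vector method: evaluate every measure and return a named tuple keyed
# by each function's name. `apply_measures` is a hypothetical name.
function apply_measures(measures::AbstractVector, yhat, y)
    ks = Tuple(nameof(m) for m in measures)      # e.g. (:rms, :rmsl)
    vs = Tuple(m(yhat, y) for m in measures)
    return NamedTuple{ks}(vs)
end

y    = [1.0, 2.0, 3.0]
yhat = [1.0, 2.0, 5.0]

single = apply_measures(rms, yhat, y)            # a plain number
multi  = apply_measures([rms, rmsl], yhat, y)    # named tuple with keys :rms, :rmsl
```

Dispatching on `AbstractVector` keeps the single-measure path open to any callable, which matters if measures later stop being plain functions. One caveat of keying by `nameof` is that anonymous functions would get generated names.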
Original issue is resolved. Discussion of a more elaborate design has begun at Turing and I will open a new thread in due course.
At present one can only specify a single measure in calls to evaluate!, as in this example:

One wants to be able to specify, for example, measure=[rms, rmsl, rmsp]. Maybe change "measure" to "measures".

At present the return value is a single number for Holdout, and a column vector for CV. I suggest this remain the form of output in the single-measure case. For multiple measures, return a named tuple, with keys like [:rms, :rmsl, :rmsp] and each value being a number or vector according to what would have been returned in the single-measure case. (Side note: such a named tuple is a Tables.jl table.)

I think this is mostly orthogonal to decisions about implementing loss functions #91, so could proceed forthwith.
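To make the proposed return shapes concrete for the CV case, here is a small sketch. It assumes hand-written stand-ins for `rms` and `rmsl` and made-up per-fold data; `folds` and `per_fold` are hypothetical names and nothing here calls MLJ itself.

```julia
# Stand-in measures (assumed definitions, for illustration only).
rms(yhat, y)  = sqrt(sum((yhat .- y) .^ 2) / length(y))
rmsl(yhat, y) = sqrt(sum((log.(yhat) .- log.(y)) .^ 2) / length(y))

# Made-up (prediction, target) pairs, one per fold of a 3-fold CV.
folds = [
    ([1.0, 2.0], [1.0, 2.0]),
    ([2.0, 4.0], [1.0, 3.0]),
    ([3.0, 1.0], [3.0, 2.0]),
]

# One value per fold for a given measure (hypothetical helper).
per_fold(measure) = [measure(yhat, y) for (yhat, y) in folds]

# Single measure: a vector with one entry per fold, as at present.
single = per_fold(rms)

# Multiple measures: a named tuple whose values have the same shape as
# the single-measure result, so each key is one column of per-fold scores.
multi = (rms = per_fold(rms), rmsl = per_fold(rmsl))
```

Because every value in `multi` is a vector of the same length, the result can be read column by column, for example `multi.rmsl[2]` for the second fold.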