name, aliases cleanup for measures #450
Comments
ablaom added a commit that referenced this issue on Nov 18, 2020.
Get rid of |
For what it's worth I just read through your proposal and it seems very reasonable to me; I think (B) in particular is very useful to make it more user friendly 👍
tl;dr: see proposals A, B, C below.
I realize that measures is due for a thorough review and expansion (see, in particular, https://github.com/alan-turing-institute/MLJBase.jl/issues/299) but this is a moving target (see, e.g., the very substantial #430) and I would like to fix a couple of minor but annoying things before things run away from us:
The main issue is a sad lack of consistency about the `name` attribute of a measure and a confusion about what it precisely represents. This makes programmatic selection of measures buggy.

Note we already have the `docstring` trait, which has (mostly) included a list of aliases for common instances of a measure. Recall that a measure is a type and that some measures have fields; for example, `CrossEntropy` has `eps`, the cutoff threshold to prevent `NaN`.

In MLJ, models also have a name attribute and this is the string version of the model type; it is not literally the model type because model code is loaded on demand only. So far, we load all measures into scope. However, this might change in the future; think of measures for probabilistic predictions created by MCMC. My first proposal is:
(A) For the `name` attribute of each measure: this should just be a "derived" trait that returns the string version of the type name (as for models), as in `"AUC"`, `"FScore"`, `"CrossEntropy"` (better would be `"LogLoss"`; see https://github.com/alan-turing-institute/MLJBase.jl/issues/299#issuecomment-721520032), `"ZeroOneLoss"`, etc.

Next, rather than throwing aliases for default instances into the doc-string, we should:
(B) Add a trait, `aliases` or `instance_aliases`; this returns a vector of strings, whose corresponding `Symbol` expressions can be evaluated when relevant code has been loaded. For example, `TruePositiveRate` gets the value `["true_positive_rate", "truepositive_rate", "tpr", "sensitivity", "recall", "hit_rate"]`. This trait would be listed second in the "registry" (what you print out with `measures()`) to maximise discoverability.

The final clean-up item would be:
(C) `FScore()` should just work; it currently doesn't.

These changes are technically breaking and would require a new release, but I can't imagine they would be that disruptive.
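To make A, B and C concrete, here is a rough sketch of how they could fit together. It is only an illustration under assumptions: the `Measure` supertype, the `beta` field on `FScore`, the `find_measure` helper and the `registry` vector are hypothetical stand-ins and not the MLJBase definitions, and `instance_aliases` is just one of the two candidate names from (B).

```julia
# Hypothetical stand-ins for illustration; NOT the actual MLJBase types.
abstract type Measure end

struct TruePositiveRate <: Measure end

# (C) every field gets a default, so `FScore()` works with no arguments.
# The field name `beta` is an assumption made for this sketch.
Base.@kwdef struct FScore <: Measure
    beta::Float64 = 1.0
end

# (A) `name` is derived from the type name, never declared per measure.
name(::Type{M}) where {M<:Measure} = string(nameof(M))
name(m::Measure) = name(typeof(m))

# (B) aliases of the default instance; the fallback is "no aliases".
instance_aliases(::Type{<:Measure}) = String[]
instance_aliases(::Type{TruePositiveRate}) =
    ["true_positive_rate", "truepositive_rate", "tpr",
     "sensitivity", "recall", "hit_rate"]

# Programmatic selection by type name or alias (hypothetical helper).
function find_measure(key::AbstractString, types)
    for M in types
        (key == name(M) || key in instance_aliases(M)) && return M
    end
    return nothing
end

# Stand-in for whatever `measures()` would search over.
registry = [TruePositiveRate, FScore]
```

With this in place, `find_measure("tpr", registry)` and `find_measure("TruePositiveRate", registry)` pick out the same type, and nothing about `name` has to be kept in sync by hand.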
Thoughts anyone?
edit
And I forgot: