name, aliases cleanup for measures #450

Closed · 4 tasks done
ablaom opened this issue Nov 16, 2020 · 4 comments · Fixed by #461

ablaom commented Nov 16, 2020

tl;dr: see proposals A, B, C below.

I realize that measures are due for a thorough review and expansion (see, in particular, https://github.com/alan-turing-institute/MLJBase.jl/issues/299), but this is a moving target (see, eg, the very substantial #430) and I would like to fix a couple of minor but annoying things before they run away from us:

The main issue is a sad lack of consistency in the name attribute of a measure, and confusion about what it precisely represents. This makes programmatic selection of measures buggy (see the sketch after the listing below):

(name = area_under_curve, ...)
 (name = accuracy, ...)
 (name = balanced_accuracy, ...)
 (name = cross_entropy, ...)
 (name = FScore, ...)
 (name = false_discovery_rate, ...)
 (name = false_negative, ...)
 (name = false_negative_rate, ...)
 (name = false_positive, ...)
 (name = false_positive_rate, ...)
 (name = l1, ...)
 (name = l2, ...)
 (name = log_cosh, ...)
 (name = mae, ...)
 (name = mape, ...)
 (name = matthews_correlation, ...)
 (name = misclassification_rate, ...)
 (name = negative_predictive_value, ...)
 
 (name = L1HingeLoss(), ...)
 (name = L2HingeLoss(), ...)
 (name = L2MarginLoss(), ...)
 (name = LogitMarginLoss(), ...)
 (name = ModifiedHuberLoss(), ...)
 (name = PerceptronLoss(), ...)
 (name = SigmoidLoss(), ...)
 (name = SmoothedL1HingeLoss(), ...)
 (name = ZeroOneLoss(), ...)
 (name = HuberLoss(), ...)
 (name = L1EpsilonInsLoss(), ...)
 (name = L2EpsilonInsLoss(), ...)
 (name = LPDistLoss(), ...)
 (name = LogitDistLoss(), ...)
 (name = PeriodicLoss(), ...)
 (name = QuantileLoss(), ...)
 (name = confusion_matrix, ...)
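
To make the problem concrete, here is a minimal sketch of the kind of programmatic lookup that breaks, assuming only that measures() returns entries carrying a name field as in the listing above:

    using MLJBase

    # the sort of selection one would like to write; because `name` is
    # sometimes a snake-case instance name (l2) and sometimes a constructor
    # call (L2HingeLoss()), string comparisons like this cannot be trusted:
    filter(m -> string(m.name) == "l2", measures())             # finds l2
    filter(m -> string(m.name) == "L2HingeLoss()", measures())  # but the hinge loss needs this spelling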

Note we already have the docstring trait, which (mostly) includes a list of aliases for common instances of a measure. Recall that a measure is a type and that some measures have fields; for example CrossEntropy has eps, the cutoff threshold used to prevent NaN.
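
For instance, assuming its keyword constructor (the eps value here is purely illustrative):

    using MLJBase

    # CrossEntropy is the type; cross_entropy is a ready-made instance of it.
    # The eps field clips predicted probabilities away from zero, so the log
    # never returns -Inf/NaN:
    m = CrossEntropy(eps = 1e-10)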

In MLJ, models also have a name attribute, and it is the string version of the model type; it is not literally the model type because model code is loaded on demand only. So far, we load all measures into scope. However, this might change in the future; think of measures for probabilistic predictions created by MCMC. My first proposal is:

  • A. Make the name of a measure the string version of the measure type, just as for models; so, for example, the entry for area_under_curve gets the name "AreaUnderCurve".

Next, rather than throwing aliases for default instances into the doc-string, we should:

  • B. Add a new trait called aliases or instance_aliases; this returns a vector of strings, whose corresponding Symbol expressions can be evaluated when relevant code has been loaded. For example TruePositiveRate gets the value ["true_positive_rate", "truepositive_rate", "tpr", "sensitivity", "recall", "hit_rate"]. This trait would be listed second in the "registry" (what you print out with measures()) to maximise discoverability.
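
With the new trait exposed in the registry, alias-based lookup could become a one-liner; a hypothetical sketch (the aliases field below is the proposed trait, not existing API):

    using MLJBase

    # resolve any alias to its registry entry, assuming each entry carries
    # the proposed aliases trait:
    matching(alias::String) = filter(m -> alias in m.aliases, measures())

    matching("tpr")   # would return the entry for TruePositiveRate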

The final clean-up item would be:

  • C. Ensure that every measure has a default keyword constructor with defaults for all arguments. So FScore() should just work; it currently doesn't.
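
Schematically, C just means definitions along the following lines (the field name β and its default value are assumptions for illustration only):

    # every field gets a default, so the zero-argument constructor just works:
    struct FScore{T<:Real}
        β::T
    end
    FScore(; β = 1.0) = FScore(β)

    FScore()            # equivalent to FScore(β = 1.0)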

These changes are technically breaking and would require a new release, but I can't imagine they would be that disruptive.

Thoughts anyone?

edit

And I forgot:

  • D. Make sure all current measure type names are exported, as well as their aliases.

ablaom commented Nov 16, 2020

cc @ven-k @OkonSamuel @tlienart @azev77

ablaom commented Nov 23, 2020

  • Also: import LossFunctions methods instead of using them, and make LossFunctions loss instances callable, to give them the same behaviour as the other measures. So, for example, I can do ZeroOneLoss()(yhat, y) where yhat is a vector of UnivariateFinite distributions (rather than floats, as in the LossFunctions API).
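
A rough sketch of the adaptor intended, assuming the conversion from UnivariateFinite predictions and observed labels to the ±1 "agreements" that LossFunctions.jl expects is done by a helper (to_agreement below is hypothetical):

    import LossFunctions            # `import` rather than `using`, as proposed
    const LF = LossFunctions

    # make every margin loss callable on MLJ-style (yhat, y) arguments;
    # to_agreement is a hypothetical helper returning a vector of ±1 margins:
    function (loss::LF.MarginLoss)(yhat, y)
        LF.value.(Ref(loss), to_agreement(yhat, y))
    end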

ablaom commented Nov 23, 2020

Get rid of beta as type parameter in FScore.
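
That is, roughly (FScoreOld here is schematic, standing in for the current parametric definition):

    # currently (schematic): β is part of the type itself, so each value of β
    # gives a distinct concrete type:
    struct FScoreOld{β} end
    FScoreOld{0.5}()

    # the proposal: make β an ordinary field with a keyword default, as in the
    # FScore sketch under C above, so β no longer appears in the type.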

ablaom self-assigned this Nov 23, 2020
tlienart (Collaborator) commented

For what it's worth I just read through your proposal and it seems very reasonable to me; I think (B) in particular is very useful to make it more user friendly 👍
