name, aliases cleanup for measures #450

Closed · 4 tasks done
ablaom opened this issue Nov 16, 2020 · 4 comments · Fixed by #461

ablaom commented Nov 16, 2020

tl;dr: see proposals A, B, C below.

I realize that measures are due for a thorough review and expansion (see, in particular, https://github.com/alan-turing-institute/MLJBase.jl/issues/299), but this is a moving target (see, eg, the very substantial #430) and I would like to fix a couple of minor but annoying things before they run away from us:

The main issue is a sad lack of consistency in the name attribute of a measure, and confusion about what it precisely represents. This makes programmatic selection of measures buggy (see the sketch after the listing below):

(name = area_under_curve, ...)
 (name = accuracy, ...)
 (name = balanced_accuracy, ...)
 (name = cross_entropy, ...)
 (name = FScore, ...)
 (name = false_discovery_rate, ...)
 (name = false_negative, ...)
 (name = false_negative_rate, ...)
 (name = false_positive, ...)
 (name = false_positive_rate, ...)
 (name = l1, ...)
 (name = l2, ...)
 (name = log_cosh, ...)
 (name = mae, ...)
 (name = mape, ...)
 (name = matthews_correlation, ...)
 (name = misclassification_rate, ...)
 (name = negative_predictive_value, ...)
 
 (name = L1HingeLoss(), ...)
 (name = L2HingeLoss(), ...)
 (name = L2MarginLoss(), ...)
 (name = LogitMarginLoss(), ...)
 (name = ModifiedHuberLoss(), ...)
 (name = PerceptronLoss(), ...)
 (name = SigmoidLoss(), ...)
 (name = SmoothedL1HingeLoss(), ...)
 (name = ZeroOneLoss(), ...)
 (name = HuberLoss(), ...)
 (name = L1EpsilonInsLoss(), ...)
 (name = L2EpsilonInsLoss(), ...)
 (name = LPDistLoss(), ...)
 (name = LogitDistLoss(), ...)
 (name = PeriodicLoss(), ...)
 (name = QuantileLoss(), ...)
 (name = confusion_matrix, ...)
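
To make the problem concrete, here is a minimal sketch of the kind of programmatic lookup that breaks, assuming only that measures() returns entries carrying a name field as in the listing above:

    using MLJBase

    # the sort of selection one would like to write; because `name` is
    # sometimes a snake-case instance name (l2) and sometimes a constructor
    # call (L2HingeLoss()), string comparisons like this cannot be trusted:
    filter(m -> string(m.name) == "l2", measures())             # finds l2
    filter(m -> string(m.name) == "L2HingeLoss()", measures())  # but the hinge loss needs this spelling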

Note we already have the docstring trait, which (mostly) includes a list of aliases for common instances of a measure. Recall that a measure is a type and that some measures have fields; for example CrossEntropy has eps, the cutoff threshold used to prevent NaN.
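
For instance, assuming its keyword constructor (the eps value here is purely illustrative):

    using MLJBase

    # CrossEntropy is the type; cross_entropy is a ready-made instance of it.
    # The eps field clips predicted probabilities away from zero, so the log
    # never returns -Inf/NaN:
    m = CrossEntropy(eps = 1e-10)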

In MLJ, models also have a name attribute, and it is the string version of the model type; it is not literally the model type because model code is loaded on demand only. So far, we load all measures into scope. However, this might change in the future; think of measures for probabilistic predictions created by MCMC. My first proposal is:

  • A. Make the name of a measure the string version of the measure type, just as for models; so, for example, the entry for area_under_curve gets the name "AreaUnderCurve".

Next, rather than throwing aliases for default instances into the doc-string, we should:

  • B. Add a new trait called aliases or instance_aliases; this returns a vector of strings, whose corresponding Symbol expressions can be evaluated when relevant code has been loaded. For example TruePositiveRate gets the value ["true_positive_rate", "truepositive_rate", "tpr", "sensitivity", "recall", "hit_rate"]. This trait would be listed second in the "registry" (what you print out with measures()) to maximise discoverability.
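
With the new trait exposed in the registry, alias-based lookup could become a one-liner; a hypothetical sketch (the aliases field below is the proposed trait, not existing API):

    using MLJBase

    # resolve any alias to its registry entry, assuming each entry carries
    # the proposed aliases trait:
    matching(alias::String) = filter(m -> alias in m.aliases, measures())

    matching("tpr")   # would return the entry for TruePositiveRate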

The final clean-up item would be:

  • C. Ensure that every measure has a default keyword constructor with defaults for all arguments. So FScore() should just work; it currently doesn't.
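
Schematically, C just means definitions along the following lines (the field name β and its default value are assumptions for illustration only):

    # every field gets a default, so the zero-argument constructor just works:
    struct FScore{T<:Real}
        β::T
    end
    FScore(; β = 1.0) = FScore(β)

    FScore()            # equivalent to FScore(β = 1.0)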

These changes are technically breaking and would require a new release, but I can't imagine they would be that disruptive.

Thoughts anyone?

edit

And I forgot:

  • D. Make sure all current measure type names are exported, as well as their aliases.

ablaom commented Nov 16, 2020

cc @ven-k @OkonSamuel @tlienart @azev77

ablaom commented Nov 23, 2020

  • Also: import LossFunctions methods instead of using them, and make LossFunctions loss instances callable, to give them the same behaviour as the other measures. So, for example, I can do ZeroOneLoss()(yhat, y) where yhat is a vector of UnivariateFinite distributions (rather than floats, as in the LossFunctions API).
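
A rough sketch of the adaptor intended, assuming the conversion from UnivariateFinite predictions and observed labels to the ±1 "agreements" that LossFunctions.jl expects is done by a helper (to_agreement below is hypothetical):

    import LossFunctions            # `import` rather than `using`, as proposed
    const LF = LossFunctions

    # make every margin loss callable on MLJ-style (yhat, y) arguments;
    # to_agreement is a hypothetical helper returning a vector of ±1 margins:
    function (loss::LF.MarginLoss)(yhat, y)
        LF.value.(Ref(loss), to_agreement(yhat, y))
    end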

ablaom commented Nov 23, 2020

Get rid of beta as type parameter in FScore.
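
That is, roughly (FScoreOld here is schematic, standing in for the current parametric definition):

    # currently (schematic): β is part of the type itself, so each value of β
    # gives a distinct concrete type:
    struct FScoreOld{β} end
    FScoreOld{0.5}()

    # the proposal: make β an ordinary field with a keyword default, as in the
    # FScore sketch under C above, so β no longer appears in the type.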

ablaom self-assigned this Nov 23, 2020
tlienart (Collaborator) commented

For what it's worth I just read through your proposal and it seems very reasonable to me; I think (B) in particular is very useful to make it more user friendly 👍
