Evaluate models against multiple measures (loss functions) simultaneously. #98

Closed
ablaom opened this issue Mar 7, 2019 · 6 comments
Labels: enhancement (New feature or request), good first issue (Good for newcomers)

ablaom (Member) commented Mar 7, 2019

At present one can only specify a single measure in calls to evaluate!, as in this example:

julia> using MLJ
julia> task = load_boston();
julia> model = KNNRegressor(K=7);
julia> mach = machine(model, task);
julia> resampling = Holdout()
# Holdout @ 1…81: 
fraction_train          =>   0.7
shuffle                 =>   false

julia> evaluate!(mach, resampling=resampling, measure=rms)
7.929028718838733

One wants to be able to specify, for example, measure=[rms, rmsl, rmsp]. Maybe change "measure" to "measures".

At present the return value is a single number for Holdout and a column vector for CV.

I suggest this remain the form of output in the single-measure case. For multiple measures, return a named tuple, with keys like [:rms, :rmsl, :rmsp] and each value being a number or vector according to what would have been returned in the single-measure case. (Side note: such a named tuple is a Tables.jl table.)
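For concreteness, a rough sketch of the proposed shapes (values purely illustrative; rms, rmsl and rmsp assumed to remain plain functions, and CV assumed to be the cross-validation counterpart of Holdout):

julia> evaluate!(mach, resampling=Holdout(), measure=rms)   # unchanged: a number
7.929028718838733

julia> evaluate!(mach, resampling=Holdout(), measure=[rms, rmsl, rmsp])
(rms = 7.93, rmsl = 0.39, rmsp = 0.12)                      # one entry per measure

julia> evaluate!(mach, resampling=CV(), measure=[rms, rmsl, rmsp])
(rms = [7.9, 8.4, 7.1], rmsl = [0.38, 0.41, 0.35], rmsp = [0.12, 0.14, 0.10])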

I think this is mostly orthogonal to decisions about implementing loss functions #91, so could proceed forthwith.

ablaom added the enhancement and good first issue labels Mar 7, 2019
fkiraly (Collaborator) commented Mar 7, 2019

Agreed - this is similar to mlr's benchmark results table, once coerced to a data frame.
Though I think the current design is better - i.e., intermediate results (e.g., samples of predictions, samples of losses) are stored somewhere, but what is returned is the aggregate results table, probably the primary object of interest. Is that what you were planning?

Regarding the "measures": I think all should come - ideally by default - with confidence intervals (conditional on the fitted machine, i.e., performance guarantees for re-using the fitted machine).

Otherwise the comparison is, at best, meaningless, and at worst, misleading...

The "quick hack" would be to define measures that are confidence interval widths of other measures, though this is sub-optimal because:

  • many CI are easy to compute with their "parent measures"
  • the user may wish to specify ahead significance level, or significance-level-after-multiple-testing-correction

Also, to reiterate: I don't think the CV results are a good source to compute standard errors from, naively.
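For what it's worth, a minimal sketch of what such a wrapper could look like, conditional on the fitted machine as above (the names, the fixed z-value and the signature are illustrative only, not an actual MLJ API):

using Statistics

# wrap a per-observation loss in a new "measure" returning a normal-approximation
# confidence-interval half-width for its mean on the test set
function ci_halfwidth(loss; z = 1.96)              # z ≈ 97.5% normal quantile
    (yhat, y) -> z * std(loss.(yhat, y)) / sqrt(length(y))
end

mae_ci = ci_halfwidth((ŷ, y) -> abs(ŷ - y))        # half-width for mean absolute error
# mae_ci(yhat, y[test]) then behaves like any other measure function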

ayush1999 (Contributor) commented
@ablaom I can write a quick fix for this: if measures is a list, return a named tuple, else return only the result. Should I go ahead?

ablaom assigned ablaom and unassigned ablaom Mar 16, 2019
ablaom (Member, Author) commented Mar 16, 2019

Yes, thanks! At present measures are functions, so you can just keep the types unspecified (abstract). Later when we implement LossFunctions or similar, we can worry about types.

ayush1999 (Contributor) commented Mar 16, 2019

@ablaom Yeah, a quick fix would be of the form:

if measures isa Function
    return measures(yhat, y[test])
else
    res = Dict()
    for measure in measures
        res[string(measure)] = measure(yhat, y[test])
    end
    # convert the Dict to a NamedTuple
    return NamedTuple{Tuple(Symbol.(keys(res)))}(Tuple(values(res)))
end

Any comments?

ablaom (Member, Author) commented Mar 18, 2019

Sounds good. Rather than dispatching on being a Function (which may change) I would dispatch on not being a vector, i.e., if !(measures isa AbstractVector), etc.

Note that you have two evaluate! methods to fix: one for Holdout and one for CV. In the second case you return a vector for each measure; in the first, just a number.
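A minimal sketch of the suggested check (hypothetical helper name, not the actual evaluate! internals):

# dispatch on "not a vector" rather than on Function, so non-Function measures
# added later still work
function _apply_measures(measures, yhat, y)
    if !(measures isa AbstractVector)
        return measures(yhat, y)                    # single measure: as before
    end
    names = Tuple(Symbol.(string.(measures)))
    return NamedTuple{names}(Tuple(m(yhat, y) for m in measures))
end
# in the Holdout method each entry is a number; in the CV method the same helper
# is applied per fold and the per-measure results collected into vectors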

ablaom (Member, Author) commented Mar 20, 2019

The original issue is resolved. Discussion of a more elaborate design has begun at Turing, and I will open a new thread in due course.
