Evaluate models against multiple measures (loss functions) simultaneously. #98

Closed
ablaom opened this issue Mar 7, 2019 · 6 comments
Labels: enhancement (New feature or request), good first issue (Good for newcomers)

ablaom (Member) commented Mar 7, 2019

At present one can only specify a single measure in calls to evaluate!, as in this example:

julia> using MLJ
julia> task = load_boston();
julia> model = KNNRegressor(K=7);
julia> mach = machine(model, task);
julia> resampling = Holdout()
# Holdout @ 1…81: 
fraction_train          =>   0.7
shuffle                 =>   false

julia> evaluate!(mach, resampling=resampling, measure=rms)
7.929028718838733

One wants to be able to specify, for example, measure=[rms, rmsl, rmsp]. Maybe change "measure" to "measures".

At present the return value is a single number for Holdout and a column vector for CV.

I suggest this remain the form of output in the single-measure case. For multiple measures, return a named tuple, with keys like [:rms, :rmsl, :rmsp] and each value being a number or vector according to what would have been returned in the single-measure case. (Side note: such a named tuple is a Tables.jl table.)
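For concreteness, a rough sketch of the proposed shapes (values purely illustrative; rms, rmsl and rmsp assumed to remain plain functions, and CV assumed to be the cross-validation counterpart of Holdout):

julia> evaluate!(mach, resampling=Holdout(), measure=rms)   # unchanged: a number
7.929028718838733

julia> evaluate!(mach, resampling=Holdout(), measure=[rms, rmsl, rmsp])
(rms = 7.93, rmsl = 0.39, rmsp = 0.12)                      # one entry per measure

julia> evaluate!(mach, resampling=CV(), measure=[rms, rmsl, rmsp])
(rms = [7.9, 8.4, 7.1], rmsl = [0.38, 0.41, 0.35], rmsp = [0.12, 0.14, 0.10])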

I think this is mostly orthogonal to decisions about implementing loss functions #91, so could proceed forthwith.

ablaom added the enhancement and good first issue labels Mar 7, 2019
fkiraly (Collaborator) commented Mar 7, 2019

Agreed - this is similar to mlr's benchmark results table, once coerced to a data frame.
Though I think the current design is better - i.e., intermediate results (e.g., samples of predictions, samples of losses) are stored somewhere, but what is returned is the aggregate results table, probably the primary object of interest. Is that what you were planning?

Regarding the "measures": I think all should come - ideally by default - with confidence intervals (conditional on the fitted machine, i.e., performance guarantees for re-using the fitted machine).

Otherwise the comparison is, at best, meaningless, and at worst, misleading...

The "quick hack" would be to define measures that are confidence interval widths of other measures, though this is sub-optimal because:

  • many CI are easy to compute with their "parent measures"
  • the user may wish to specify ahead significance level, or significance-level-after-multiple-testing-correction

Also, to reiterate: I don't think the CV results are a good source to compute standard errors from, naively.
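For what it's worth, a minimal sketch of what such a wrapper could look like, conditional on the fitted machine as above (the names, the fixed z-value and the signature are illustrative only, not an actual MLJ API):

using Statistics

# wrap a per-observation loss in a new "measure" returning a normal-approximation
# confidence-interval half-width for its mean on the test set
function ci_halfwidth(loss; z = 1.96)              # z ≈ 97.5% normal quantile
    (yhat, y) -> z * std(loss.(yhat, y)) / sqrt(length(y))
end

mae_ci = ci_halfwidth((ŷ, y) -> abs(ŷ - y))        # half-width for mean absolute error
# mae_ci(yhat, y[test]) then behaves like any other measure function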

ayush1999 (Contributor) commented
@ablaom I can write a quick fix for this: if measures is a list, return a named tuple, else return only the result. Should I go ahead?

ablaom assigned ablaom and unassigned ablaom Mar 16, 2019
ablaom (Member, Author) commented Mar 16, 2019

Yes, thanks! At present measures are functions, so you can just keep the types unspecified (abstract). Later when we implement LossFunctions or similar, we can worry about types.

ayush1999 (Contributor) commented Mar 16, 2019

@ablaom Yeah, a quick fix would be of the form:

if measures isa Function
    return measures(yhat, y[test])
else
    res = Dict()
    for measure in measures
        res[string(measure)] = measure(yhat, y[test])
    end
    # convert the Dict to a NamedTuple
    return NamedTuple{Tuple(Symbol.(keys(res)))}(Tuple(values(res)))
end

Any comments?

ablaom (Member, Author) commented Mar 18, 2019

Sounds good. Rather than dispatching on being a Function (which may change) I would dispatch on not being a vector, i.e., if !(measures isa AbstractVector), etc.

Note that you have two evaluate! methods to fix: one for Holdout and one for CV. In the second case you return a vector for each measure; in the first, just a number.
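A minimal sketch of the suggested check (hypothetical helper name, not the actual evaluate! internals):

# dispatch on "not a vector" rather than on Function, so non-Function measures
# added later still work
function _apply_measures(measures, yhat, y)
    if !(measures isa AbstractVector)
        return measures(yhat, y)                    # single measure: as before
    end
    names = Tuple(Symbol.(string.(measures)))
    return NamedTuple{names}(Tuple(m(yhat, y) for m in measures))
end
# in the Holdout method each entry is a number; in the CV method the same helper
# is applied per fold and the per-measure results collected into vectors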

ablaom (Member, Author) commented Mar 20, 2019

The original issue is resolved. Discussion of a more elaborate design has begun at Turing, and I will open a new thread in due course.
