Revisiting MLJ <-> LossFunctions.jl integration #157

Closed
ablaom opened this issue Apr 3, 2023 · 2 comments

@ablaom

ablaom commented Apr 3, 2023

I am significantly reworking and simplifying the measures (metrics) part of MLJBase, with a
view to moving it out to separate packages, and am revisiting a sticking point: how to
include the metrics from LossFunctions.jl. At present, all LossFunctions.jl objects
(e.g. LPDistLoss()) must be wrapped before they are exposed to MLJ users. This scheme is
rather brittle and I'd like to get rid of it.

The main reason we wrap in MLJBase is that LossFunctions.jl makes all losses callable,
and MLJBase also does this for its measures, but the calling syntax is different. For
example, the order of ground truth y and prediction ŷ in MLJBase is reversed, and there
are other differences. So we wrap the LossFunctions.jl losses to get consistent calling
syntax from within MLJBase.
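
To make the wrapping idea concrete, here is a minimal, language-neutral sketch in Python. It is not code from MLJBase or LossFunctions.jl. The names `foreign_loss` and `WrappedLoss` are hypothetical, and the argument orders shown (foreign loss takes the target first, the wrapper takes the prediction first) are assumptions chosen only to illustrate a reversed calling convention.

```python
from dataclasses import dataclass
from typing import Callable


def foreign_loss(target: float, output: float) -> float:
    """Hypothetical loss from another package, called as loss(target, output).

    It is deliberately asymmetric (under-prediction costs double) so that
    passing the arguments in the wrong order gives a visibly different value.
    """
    diff = target - output
    return 2.0 * diff if diff > 0 else -diff


@dataclass(frozen=True)
class WrappedLoss:
    """Adapter exposing a foreign loss as wrapped(y_hat, y)."""

    loss: Callable[[float, float], float]

    def __call__(self, y_hat: float, y: float) -> float:
        # Swap the arguments so the foreign loss still sees (target, output).
        return self.loss(y, y_hat)


wrapped = WrappedLoss(foreign_loss)
right_order = wrapped(1.0, 3.0)        # prediction 1.0, ground truth 3.0
wrong_order = foreign_loss(1.0, 3.0)   # same numbers, foreign convention misread
```

The two results differ, which is the kind of silent mistake the wrapper is there to prevent. The cost is that every foreign loss needs its own wrapped counterpart.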

Currently LossFunctions.jl is a dependency of MLJBase.jl. Our revised API will live in a
separate StatisticalMeasuresBase.jl package (providing a method call(measure, ...)), with
the MLJBase measures moving to a new StatisticalMeasures.jl package. At present, my
inclination (option 1) is to:

  • drop LossFunctions.jl as (hard) dependency of the new packages

  • implement the new call API for LossFunctions.jl objects directly (no wrapping),
    ideally in LossFunctions.jl itself, or with glue code in StatisticalMeasures.jl which
    makes LossFunctions.jl a soft dependency

  • only make losses provided by StatisticalMeasures.jl callable; measures in other packages
    implementing the call API are free to implement different calling syntax
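
As a rough illustration of option 1, the sketch below uses Python's `functools.singledispatch` to stand in for a generic `call(measure, ...)` function. It is only an analogy and not the StatisticalMeasuresBase.jl API. The classes `NativeMeasure` and `ForeignLoss` are hypothetical, as are their argument orders. The point is that a foreign object gets a `call` method directly, with no wrapper type, while keeping its own (different) calling syntax.

```python
from functools import singledispatch


class NativeMeasure:
    """Hypothetical measure owned by the new package; callable as m(y_hat, y)."""

    def __call__(self, y_hat: float, y: float) -> float:
        return call(self, y_hat, y)


class ForeignLoss:
    """Hypothetical loss from another package; callable as loss(target, output)."""

    def __call__(self, target: float, output: float) -> float:
        # Asymmetric on purpose: only under-prediction is penalized.
        return max(target - output, 0.0)


@singledispatch
def call(measure, y_hat: float, y: float) -> float:
    """Uniform entry point: always (measure, prediction, ground truth)."""
    raise NotImplementedError(f"call is not implemented for {type(measure).__name__}")


@call.register
def _(measure: NativeMeasure, y_hat: float, y: float) -> float:
    return abs(y_hat - y)


@call.register
def _(measure: ForeignLoss, y_hat: float, y: float) -> float:
    # Glue code: translate to the foreign calling convention, no wrapper object.
    return measure(y, y_hat)


via_api = call(ForeignLoss(), 1.0, 3.0)   # uniform syntax
direct = ForeignLoss()(1.0, 3.0)          # foreign syntax, arguments misread
native = NativeMeasure()(1.0, 3.0)
```

Here `via_api` and `direct` disagree, which mirrors the user confusion described in the next paragraph: the uniform `call` is consistent, but calling the foreign object directly follows a different convention.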

The downside is inevitable confusion for users who try to call LossFunctions.jl objects
as if the StatisticalMeasures.jl syntax applied.

I have also considered just porting the code from LossFunctions.jl to the new package, if
there was no objection here.

But perhaps others have a better suggestion?

Unfortunately, just extending LossFunctions.jl itself to accommodate the measures in
MLJBase is not an option, as its API is too specialized for the needs of MLJ.

@juliohm
Member

juliohm commented Apr 4, 2023

@ablaom I remember that, back in the day, MLJ measures mixed a bunch of different concepts that are not formally loss functions, but just scores of some sort, with tons of traits that didn't help in dispatch.

Can we organize a plan where LossFunctions.jl continues to exist as a self-contained package with just loss functions? We can then make sure that the API of LossFunctions.jl remains simple, revive the API of PenaltyFunctions.jl, and combine the two in a more generic ScoreFunctions.jl package that takes them as dependencies and provides a common API.

Any attempt to MLJ-ize the generic packages developed in JuliaML is of course not ideal. We don't want to commit to a particular way of doing ML in Julia, and would like to just provide well-tested building blocks for the different frameworks that exist and will exist in the future.

@ablaom
Author

ablaom commented Apr 5, 2023

Your suggestion may be a good one, although it's not clear to me that PenaltyFunctions.jl could meet the needs of statistical metrics in the generality we need them in MLJ. If I'm wrong, and a framework along those lines is created, I will be happy to use it, but I don't have the resources to contribute much to such a project. My priority is to provide a standalone metrics pkg that will meet MLJ's needs now and in the longer term, as an upgrade to the substantial built-in framework that exists already.

With this in mind, I am still interested in any comment you have to make on my post above, which you have not addressed.

> Can we organize a plan where LossFunctions.jl continues to exist as a self-contained package with just loss functions?

I have no problem with LossFunctions.jl existing as a self-contained package. I'm only pointing out that the decision to make loss functions callable has raised issues in a related project and was wondering if someone might have a suggestion on how to mitigate the problem.
