Feature request: ability to convert scitype warnings into errors #908

DilumAluthge · 2022-03-04T08:16:42Z

Sometimes, when I do mach = MLJ.machine(model, X, y), it prints one or more warnings of the form:

Warning: the scitype of `X` is incompatible with...
Warning: the scitype of `y` is incompatible with...

Is there a way to convert those warnings to errors that are thrown immediately?

The text was updated successfully, but these errors were encountered:

ablaom · 2022-03-06T21:17:08Z

Interesting request.

I guess these are some possibilities:

change the default behavior to "throw an error" (with a keyword argument to override - the error message explains this override option)
add a kwarg to force error if required
introduce global variable to determine behaviour
throw an error in all cases

Option 1 is breaking but if that's the best behaviour that doesn't bother me (we have new breaking release in the works which already tweaks the type checking behaviour; see here). I'm not so keen on 3 myself - I try to avoid "hidden knowledge", which this would probably be. Not big on 4 either.

Thoughts?

@tlienart @OkonSamuel

ablaom · 2022-03-06T21:18:48Z

For the record, if the model declares Unkown scitypes (the fallback for models that don't declare anything) then type checks are completely ignored.

tlienart · 2022-03-07T07:15:30Z

IMO this should be mitigated by how frequent this is going to happen for a beginner user. And given it's quite common (e.g. people trying class stuff with continuous etc) if they get an error by default it will be really off-putting.

In that sense I'd prefer 3, a bit like you could have a global toggle for verbosity or indeed for colouring

ablaom · 2022-03-07T19:56:38Z

@tlienart Thanks for that. I am finding, however, that at least 90% of the time the warning ultimately leads to an error downstream (fit or predict). Perhaps you have a different experience?

tlienart · 2022-03-07T20:31:39Z

you're far better placed to judge this; my impression from working with folks who were used to scikitlearn is that with scikitlearn users typically don't think about data types (they should but they don't) and sklearn often kind of does the right thing anyway.
If I recall properly, you can get warnings in MLJ about harmless situations e.g. when a binary or numeric categorical feature gets cast to float, things should still work even though it's not ideal.

Maybe being strict is better here but then there must be significant documentation exposed by the code when errors show up about what to do (without having the users needing to read the entire scitypes docs, even if they should they won't want to do that whereas they might read a well crafted docstring). Just my impression though, I don't have as much experience as you guys with MLJ users

ablaom · 2022-03-07T20:35:40Z

@DilumAluthge Your thoughts?

DilumAluthge · 2022-03-07T20:49:07Z

I definitely don't like option 4, because I do think that users should have the ability to choose between the warning and the error.

Can you explain a little bit what option 3 would be? By "global option", do you mean e.g. an environment variable ENV["JULIA_MLJ_SCITYPE_WERROR"] = "error"? I'm not a huge fan of using global environment variables in situations like this.

I think the main two options to consider are option 1 and option 2. In both of those cases, when the user wants to specify what behavior they want, they would do one of the two following:

# The user does this if they want the scitype warning to just be a warning:
mach = MLJ.machine(model, X, y; scitype_check = :warn)

# The user does this if they want the scitype warning to be an error:
mach = MLJ.machine(model, X, y; scitype_check = :error)

And then, the only difference between option 1 and option 2 is what the default value of the scitype_check kwarg would be, i.e.:

Option 1: The default value is scitype_check = :error.
Option 2: The default value is scitype_check = :warn.

I think @tlienart has a good point about how new users might have a bad experience with option 1. So I think my recommendation would be to go with option 2. (But it would be good to get some more details on what the interface for option 3 would look like.)

ablaom · 2022-03-16T20:49:18Z

Okay, let's go with option 2: new kwarg, default is :warn.

davnn · 2022-03-30T12:05:05Z

Okay, let's go with option 2: new kwarg, default is :warn.

I would propose to combine 2 & 3 and, instead of using check_level=1, use check_level=DEFAULT_CHECK_LEVEL and provide a set_default_checklevel to let users configure the default.

global DEFAULT_CHECK_LEVEL = 1
set_default_checklevel(i::Integer) = (global DEFAULT_CHECK_LEVEL = i)

I'm not 100% sure if that works, though.

Edit: We might also discuss the naming of the check_level, imho something like check_scitypes would be more descriptive, or not?

ablaom · 2022-03-30T20:40:41Z

Edit: We might also discuss the naming of the check_level, imho something like check_scitypes would be more descriptive, or not?

How about scitype_check_level?

ablaom · 2022-03-31T02:56:10Z

@davnn 's suggestion sounds reasonable, and now implemented at JuliaAI/MLJBase.jl#753.

davnn · 2022-03-31T12:31:26Z

Edit: We might also discuss the naming of the check_level, imho something like check_scitypes would be more descriptive, or not?

How about scitype_check_level?

Maybe scitype_check is enough, or does that sound too much like a boolean argument? We also don't use verbosity_level after all..

@davnn 's suggestion sounds reasonable, and now implemented at JuliaAI/MLJBase.jl#753.

Thanks a lot for investigating how to implement such package-wide options - interesting and nice to know that it works!

ablaom · 2022-04-01T02:39:51Z

@davnn Thanks for the feedback.

I might imagine verbose could be a Bool, but not verbosity.

I think I'll stick with scitype_check_level to be sure there's no confusion.

ablaom self-assigned this Mar 25, 2022

ablaom mentioned this issue Mar 29, 2022

Allow user to control the level of scitype checks in machine constructor JuliaAI/MLJBase.jl#753

Merged

ablaom closed this as completed in JuliaAI/MLJBase.jl#753 May 3, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature request: ability to convert scitype warnings into errors #908

Feature request: ability to convert scitype warnings into errors #908

DilumAluthge commented Mar 4, 2022 •

edited

ablaom commented Mar 6, 2022 •

edited

ablaom commented Mar 6, 2022

tlienart commented Mar 7, 2022 •

edited

ablaom commented Mar 7, 2022

tlienart commented Mar 7, 2022 •

edited

ablaom commented Mar 7, 2022

DilumAluthge commented Mar 7, 2022

ablaom commented Mar 16, 2022

davnn commented Mar 30, 2022 •

edited

ablaom commented Mar 30, 2022

ablaom commented Mar 31, 2022

davnn commented Mar 31, 2022

ablaom commented Apr 1, 2022

Feature request: ability to convert scitype warnings into errors #908

Feature request: ability to convert scitype warnings into errors #908

Comments

DilumAluthge commented Mar 4, 2022 • edited

ablaom commented Mar 6, 2022 • edited

ablaom commented Mar 6, 2022

tlienart commented Mar 7, 2022 • edited

ablaom commented Mar 7, 2022

tlienart commented Mar 7, 2022 • edited

ablaom commented Mar 7, 2022

DilumAluthge commented Mar 7, 2022

ablaom commented Mar 16, 2022

davnn commented Mar 30, 2022 • edited

ablaom commented Mar 30, 2022

ablaom commented Mar 31, 2022

davnn commented Mar 31, 2022

ablaom commented Apr 1, 2022

DilumAluthge commented Mar 4, 2022 •

edited

ablaom commented Mar 6, 2022 •

edited

tlienart commented Mar 7, 2022 •

edited

tlienart commented Mar 7, 2022 •

edited

davnn commented Mar 30, 2022 •

edited