Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request: ability to convert scitype warnings into errors #908

Closed
DilumAluthge opened this issue Mar 4, 2022 · 13 comments · Fixed by JuliaAI/MLJBase.jl#753
Closed
Assignees

Comments

@DilumAluthge
Copy link
Member

DilumAluthge commented Mar 4, 2022

Sometimes, when I do mach = MLJ.machine(model, X, y), it prints one or more warnings of the form:

Warning: the scitype of `X` is incompatible with...
Warning: the scitype of `y` is incompatible with...

Is there a way to convert those warnings to errors that are thrown immediately?

@ablaom
Copy link
Member

ablaom commented Mar 6, 2022

Interesting request.

I guess these are some possibilities:

  1. change the default behavior to "throw an error" (with a keyword argument to override - the error message explains this override option)
  2. add a kwarg to force error if required
  3. introduce global variable to determine behaviour
  4. throw an error in all cases

Option 1 is breaking but if that's the best behaviour that doesn't bother me (we have new breaking release in the works which already tweaks the type checking behaviour; see here). I'm not so keen on 3 myself - I try to avoid "hidden knowledge", which this would probably be. Not big on 4 either.

Thoughts?

@tlienart @OkonSamuel

@ablaom
Copy link
Member

ablaom commented Mar 6, 2022

For the record, if the model declares Unkown scitypes (the fallback for models that don't declare anything) then type checks are completely ignored.

@tlienart
Copy link
Collaborator

tlienart commented Mar 7, 2022

IMO this should be mitigated by how frequent this is going to happen for a beginner user. And given it's quite common (e.g. people trying class stuff with continuous etc) if they get an error by default it will be really off-putting.

In that sense I'd prefer 3, a bit like you could have a global toggle for verbosity or indeed for colouring

@ablaom
Copy link
Member

ablaom commented Mar 7, 2022

@tlienart Thanks for that. I am finding, however, that at least 90% of the time the warning ultimately leads to an error downstream (fit or predict). Perhaps you have a different experience?

@tlienart
Copy link
Collaborator

tlienart commented Mar 7, 2022

you're far better placed to judge this; my impression from working with folks who were used to scikitlearn is that with scikitlearn users typically don't think about data types (they should but they don't) and sklearn often kind of does the right thing anyway.
If I recall properly, you can get warnings in MLJ about harmless situations e.g. when a binary or numeric categorical feature gets cast to float, things should still work even though it's not ideal.

Maybe being strict is better here but then there must be significant documentation exposed by the code when errors show up about what to do (without having the users needing to read the entire scitypes docs, even if they should they won't want to do that whereas they might read a well crafted docstring). Just my impression though, I don't have as much experience as you guys with MLJ users

@ablaom
Copy link
Member

ablaom commented Mar 7, 2022

@DilumAluthge Your thoughts?

@DilumAluthge
Copy link
Member Author

I definitely don't like option 4, because I do think that users should have the ability to choose between the warning and the error.

Can you explain a little bit what option 3 would be? By "global option", do you mean e.g. an environment variable ENV["JULIA_MLJ_SCITYPE_WERROR"] = "error"? I'm not a huge fan of using global environment variables in situations like this.

I think the main two options to consider are option 1 and option 2. In both of those cases, when the user wants to specify what behavior they want, they would do one of the two following:

# The user does this if they want the scitype warning to just be a warning:
mach = MLJ.machine(model, X, y; scitype_check = :warn)

# The user does this if they want the scitype warning to be an error:
mach = MLJ.machine(model, X, y; scitype_check = :error)

And then, the only difference between option 1 and option 2 is what the default value of the scitype_check kwarg would be, i.e.:

  1. Option 1: The default value is scitype_check = :error.
  2. Option 2: The default value is scitype_check = :warn.

I think @tlienart has a good point about how new users might have a bad experience with option 1. So I think my recommendation would be to go with option 2. (But it would be good to get some more details on what the interface for option 3 would look like.)

@ablaom
Copy link
Member

ablaom commented Mar 16, 2022

Okay, let's go with option 2: new kwarg, default is :warn.

@davnn
Copy link
Collaborator

davnn commented Mar 30, 2022

Okay, let's go with option 2: new kwarg, default is :warn.

I would propose to combine 2 & 3 and, instead of using check_level=1, use check_level=DEFAULT_CHECK_LEVEL and provide a set_default_checklevel to let users configure the default.

global DEFAULT_CHECK_LEVEL = 1
set_default_checklevel(i::Integer) = (global DEFAULT_CHECK_LEVEL = i)

I'm not 100% sure if that works, though.

Edit: We might also discuss the naming of the check_level, imho something like check_scitypes would be more descriptive, or not?

@ablaom
Copy link
Member

ablaom commented Mar 30, 2022

Edit: We might also discuss the naming of the check_level, imho something like check_scitypes would be more descriptive, or not?

How about scitype_check_level?

@ablaom
Copy link
Member

ablaom commented Mar 31, 2022

@davnn 's suggestion sounds reasonable, and now implemented at JuliaAI/MLJBase.jl#753.

@davnn
Copy link
Collaborator

davnn commented Mar 31, 2022

Edit: We might also discuss the naming of the check_level, imho something like check_scitypes would be more descriptive, or not?

How about scitype_check_level?

Maybe scitype_check is enough, or does that sound too much like a boolean argument? We also don't use verbosity_level after all..

@davnn 's suggestion sounds reasonable, and now implemented at JuliaAI/MLJBase.jl#753.

Thanks a lot for investigating how to implement such package-wide options - interesting and nice to know that it works!

@ablaom
Copy link
Member

ablaom commented Apr 1, 2022

@davnn Thanks for the feedback.

I might imagine verbose could be a Bool, but not verbosity.

I think I'll stick with scitype_check_level to be sure there's no confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants