How to know which models are regression models? #191

Closed
juliohm opened this issue Jul 30, 2019 · 9 comments

Contributor

juliohm commented Jul 30, 2019

Is your feature request related to a problem? Please describe.
I am interested in asserting that models are compatible with tasks. For example, I have defined a set of learning tasks here, and would like to make sure that a DecisionTreeRegressor can be used for a RegressionTask but a DecisionTreeClassifier cannot.

Describe the solution you'd like
I would like to have a trait for the models that tells whether or not the model is a regressor, a classifier, or a clustering model.

Describe alternatives you've considered
Didn't think of alternatives. Type traits seem like a good solution.

Additional context
I have strategies implemented for applying MLJ models in spatial problems. Being able to distinguish the models according to their nature is essential to a robust implementation.

I am happy to provide a PR if it is welcome.
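
A minimal sketch of the kind of model-level trait requested above, written in the usual Julia trait-dispatch style. Every name here (`LearningKind`, `learningkind`, `iscompatible`, and the `Toy*` types) is hypothetical and chosen only for illustration; none of them is part of MLJ or MLJBase.

```julia
# All names below are hypothetical; none of them belong to MLJ or MLJBase.
abstract type LearningKind end
struct Regression     <: LearningKind end
struct Classification <: LearningKind end
struct Clustering     <: LearningKind end

# Toy stand-ins for real model types.
struct ToyTreeRegressor end
struct ToyTreeClassifier end

# The trait is a function of the model alone; no data is involved.
learningkind(::ToyTreeRegressor)  = Regression()
learningkind(::ToyTreeClassifier) = Classification()

# Toy stand-in for a learning task.
struct ToyRegressionTask end

# Compatibility is decided by dispatching on the trait value.
iscompatible(model, task) = iscompatible(learningkind(model), task)
iscompatible(::Regression, ::ToyRegressionTask)   = true
iscompatible(::LearningKind, ::ToyRegressionTask) = false
```

With these definitions, `iscompatible(ToyTreeRegressor(), ToyRegressionTask())` is `true` and the same call with `ToyTreeClassifier()` is `false`, without either side looking at data.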

Member

ablaom commented Jul 30, 2019

There is a trait. It's called MLJBase.target_scitype_union.

You could do this:

julia> models(x -> x[:is_supervised] && x[:target_scitype_union]==Continuous)
Dict{Any,Any} with 7 entries:
  "MultivariateStats" => Any["RidgeRegressor"]
  "MLJ"               => Any["MLJ.Constant.DeterministicConstantRegressor", "KN…
  "DecisionTree"      => Any["DecisionTreeRegressor"]
  "ScikitLearn"       => Any["SVMLRegressor", "ElasticNet", "ElasticNetCV", "SV…
  "LIBSVM"            => Any["EpsilonSVR", "NuSVR"]
  "GLM"               => Any["OLSRegressor"]
  "XGBoost"           => Any["XGBoostRegressor"]

Try info(model) or info("model name") to get other searchable traits.
The current task interface also takes care of this kind of thing automatically:

julia> task = load_boston()
SupervisedTask @ 4…88

julia> task.target_scitype_union
Continuous

julia> models(task)
Dict{Any,Any} with 6 entries:
  "MultivariateStats" => Any["RidgeRegressor"]
  "MLJ"               => Any["MLJ.Constant.DeterministicConstantRegressor", "KN…
  "DecisionTree"      => Any["DecisionTreeRegressor"]
  "ScikitLearn"       => Any["SVMLRegressor", "ElasticNet", "ElasticNetCV", "SV…
  "LIBSVM"            => Any["EpsilonSVR", "NuSVR"]
  "XGBoost"           => Any["XGBoostRegressor"]

The list is smaller because this task matches on other traits at the same time.
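
The predicate-based search shown in this comment can be pictured with a toy registry. This is only a sketch under stated assumptions: `TOY_REGISTRY`, `matching_models`, the model names, and the use of plain symbols in place of scientific types are all hypothetical and do not reflect how MLJ stores its model metadata.

```julia
# Hypothetical metadata registry; the entries are toy values, not MLJ's real registry.
const TOY_REGISTRY = Dict(
    "ToyRidgeRegressor" => Dict(:is_supervised => true,  :target_scitype_union => :Continuous),
    "ToyTreeClassifier" => Dict(:is_supervised => true,  :target_scitype_union => :Finite),
    "ToyKMeans"         => Dict(:is_supervised => false, :target_scitype_union => :Unknown),
)

# Return the sorted names of all models whose metadata satisfies `pred`.
matching_models(pred) = sort([name for (name, meta) in TOY_REGISTRY if pred(meta)])

# Same shape of predicate as in the comment above, applied to the toy registry.
regressors = matching_models(x -> x[:is_supervised] && x[:target_scitype_union] == :Continuous)
```

The predicate sees one metadata dictionary at a time, so further conditions can be combined with `&&` in the same way.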

@ablaom ablaom closed this as completed Jul 30, 2019
Contributor Author

juliohm commented Jul 31, 2019

I dislike this API. It comes back to issue #166, where tasks contain information about the data. Because of this design choice, a model is considered a regression model because it is being applied to continuous data. So if I apply a classifier to data that happens to be Float64, does the classifier become a regressor?

Can we have a trait for the model itself which is independent of the data?

Contributor Author

juliohm commented Jul 31, 2019

We already have this information in most model names (DecisionTreeRegressor, SVMLRegressor), which indicates that this property or trait should belong to the model without any reference to the data.

Issue #166 is really at the core of all my concerns regarding the current design.

Contributor Author

juliohm commented Jul 31, 2019

In other words, given an MLJBase.Model, how do I know it is a regression model? Do I need to create a task to extract this information?

Member

ablaom commented Jul 31, 2019

I think you misunderstand. target_scitype_union is a model trait:

julia> @load DecisionTreeRegressor
julia> MLJ.target_scitype_union(DecisionTreeRegressor())
Continuous

So you don't need tasks to do this, but, in the current design, a task infers this trait from the data it wraps.
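
The distinction drawn here, a trait fixed by the model versus a value a task infers from its data, can be sketched as follows. All names (`ToyRegressor`, `ToyClassifier`, `ToyTask`, `target_kind`, `matches`) are hypothetical, and inferring the kind from the element type is a deliberate simplification, not MLJ's implementation.

```julia
# Hypothetical names throughout; this is not MLJ's implementation.
struct ToyRegressor end
struct ToyClassifier end

# Model trait: fixed by the model type, independent of any data.
target_kind(::ToyRegressor)  = :Continuous
target_kind(::ToyClassifier) = :Finite

# A toy task that wraps a target vector and infers its kind from the element type.
struct ToyTask{T}
    y::Vector{T}
end
target_kind(::ToyTask{<:AbstractFloat})  = :Continuous
target_kind(::ToyTask{<:AbstractString}) = :Finite

# A model suits a task when the two independently obtained values agree.
matches(model, task) = target_kind(model) == target_kind(task)
```

In this sketch, wrapping Float64 data in a task does not turn `ToyClassifier` into a regressor; `matches(ToyClassifier(), ToyTask([1.0, 2.5]))` simply returns `false`.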

Contributor Author

juliohm commented Jul 31, 2019

I see, thanks for the clarification. That is perfect.

Member

ablaom commented Jul 31, 2019

BTW, expect a small change soon to the trait definitions to make them more general (less biased towards tabular data) and conceptually simpler.

Contributor Author

juliohm commented Jul 31, 2019

Looking forward to it 👍

Collaborator

fkiraly commented Aug 1, 2019

(You should be able to use scitype to get this information easily, right, @ablaom?)
