How to know which models are regression models? #191

juliohm · 2019-07-30T15:48:32Z

Is your feature request related to a problem? Please describe.
I am interested in asserting that models are compatible with tasks. For example, I have defined a set of learning tasks here, and would like to make sure that a DecisionTreeRegressor can be used for a RegressionTask but a DecisionTreeClassifier cannot.

Describe the solution you'd like
I would like to have a trait for the models that tells whether or not the model is a regressor, a classifier, or a clustering model.

Describe alternatives you've considered
Didn't think of alternatives. Type traits seem like a good solution.

Additional context
I have strategies implemented for applying MLJ models in spatial problems. Being able to distinguish the models according to their nature is essential to a robust implementation.

I am happy to provide a PR if it is welcome.

ablaom · 2019-07-30T23:30:57Z

There is a trait. It's called MLJBase.target_scitype_union.

You could do this:

julia> models(x -> x[:is_supervised] && x[:target_scitype_union]==Continuous)
Dict{Any,Any} with 7 entries:
  "MultivariateStats" => Any["RidgeRegressor"]
  "MLJ"               => Any["MLJ.Constant.DeterministicConstantRegressor", "KN…
  "DecisionTree"      => Any["DecisionTreeRegressor"]
  "ScikitLearn"       => Any["SVMLRegressor", "ElasticNet", "ElasticNetCV", "SV…
  "LIBSVM"            => Any["EpsilonSVR", "NuSVR"]
  "GLM"               => Any["OLSRegressor"]
  "XGBoost"           => Any["XGBoostRegressor"]

Try info(model) or info("model name") to get other searchable traits.
The current task interface also takes care of this kind of thing automatically:

julia> task = load_boston()
SupervisedTask @ 4…88

julia> task.target_scitype_union
Continuous

julia> models(task)
Dict{Any,Any} with 6 entries:
  "MultivariateStats" => Any["RidgeRegressor"]
  "MLJ"               => Any["MLJ.Constant.DeterministicConstantRegressor", "KN…
  "DecisionTree"      => Any["DecisionTreeRegressor"]
  "ScikitLearn"       => Any["SVMLRegressor", "ElasticNet", "ElasticNetCV", "SV…
  "LIBSVM"            => Any["EpsilonSVR", "NuSVR"]
  "XGBoost"           => Any["XGBoostRegressor"]

A smaller list, because this task has other traits being matched simultaneously.
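
As a rough illustration of how a predicate-based query like the one above can work, here is a self-contained sketch that does not use MLJ. The `REGISTRY` table, its entries, the "HypotheticalPkg" package, and the `find_models` helper are hypothetical stand-ins, not MLJ's actual metadata or implementation; only the trait keys `:is_supervised` and `:target_scitype_union` are taken from the call above.

```julia
# Hypothetical stand-ins for scientific types (not MLJ's definitions).
abstract type Continuous end
abstract type Finite end

# Hypothetical registry: package name => model name => trait dictionary.
const REGISTRY = Dict(
    "DecisionTree" => Dict(
        "DecisionTreeRegressor"  => Dict(:is_supervised => true, :target_scitype_union => Continuous),
        "DecisionTreeClassifier" => Dict(:is_supervised => true, :target_scitype_union => Finite),
    ),
    "HypotheticalPkg" => Dict(
        "HypotheticalClusterer" => Dict(:is_supervised => false),
    ),
)

# Return package => model names for every model whose traits satisfy `predicate`.
function find_models(predicate)
    result = Dict{String,Vector{String}}()
    for (pkg, models) in REGISTRY
        matches = sort(String[name for (name, traits) in models if predicate(traits)])
        isempty(matches) || (result[pkg] = matches)
    end
    return result
end

# Same predicate shape as the query above; `&&` short-circuits for unsupervised
# entries, so the missing :target_scitype_union key is never looked up.
regressors = find_models(x -> x[:is_supervised] && x[:target_scitype_union] == Continuous)
```

The point of the sketch is that the query inspects only metadata attached to each model, so no data or task is needed to answer it.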

juliohm · 2019-07-31T00:17:26Z

I dislike this API. It boils down to that issue #166 of tasks containing information about data. Now, because of this design choice, a model is considered a regression model because it is being applied to continuous data. So if I apply a classifier to data, which happens to be Float64, this classifier will be a regressor?

Can we have a trait for the model itself which is independent of the data?

juliohm · 2019-07-31T00:20:20Z

We already have this information in most model names (DecisionTreeRegressor, SVMLRegressor), which indicates that this property or trait should belong to the model without any reference to the data.

Issue #166 is really at the core of all my concerns regarding the current design.

juliohm · 2019-07-31T00:22:36Z

In other words, given an MLJBase.Model, how do I know it is a regression model? Do I need to create a task to extract this information?

ablaom · 2019-07-31T00:29:53Z

I think you misunderstand. target_scitype_union is a model trait:

julia> @load DecisionTreeRegressor
julia> MLJ.target_scitype_union(DecisionTreeRegressor())
Continuous

So you don't need tasks to do this, but, in the current design, a task infers this trait from the data it wraps.
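
A minimal sketch of the compatibility check asked about at the top of the thread, assuming the trait is defined on the model alone. Everything here is a hypothetical stand-in written for illustration (`ToyRegressor`, `ToyClassifier`, the two task types, `required_target`, `iscompatible`); the local `target_scitype_union` only mimics the shape of the MLJ trait and is not MLJ's implementation.

```julia
# Hypothetical stand-ins for scientific types (not MLJ's definitions).
abstract type Continuous end
abstract type Finite end

# Hypothetical models.
struct ToyRegressor end
struct ToyClassifier end

# Trait on the model: the scientific type of the targets it predicts.
# It depends only on the model's type, never on any data.
target_scitype_union(::ToyRegressor) = Continuous
target_scitype_union(::ToyClassifier) = Finite

# Hypothetical tasks and the target type each one requires.
struct RegressionTask end
struct ClassificationTask end
required_target(::RegressionTask) = Continuous
required_target(::ClassificationTask) = Finite

# A model fits a task when its target trait falls under what the task requires.
iscompatible(model, task) = target_scitype_union(model) <: required_target(task)
```

With these definitions, `iscompatible(ToyRegressor(), RegressionTask())` holds while `iscompatible(ToyClassifier(), RegressionTask())` does not, and neither call touches a dataset.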

juliohm · 2019-07-31T00:35:43Z

I see, thanks for the clarification. That is perfect.

ablaom · 2019-07-31T00:50:56Z

BTW. Expect a small change soon to the trait definitions to make them more general (less biased to Tabular data) and conceptually simpler.

juliohm · 2019-07-31T01:02:54Z

Looking forward to it 👍

fkiraly · 2019-08-01T15:23:00Z

(you should be able to use scitype to get this information easily, right, @ablaom ?)

ablaom closed this as completed Jul 30, 2019
