Classification task: no models #153

Closed
davidbp opened this issue Jun 3, 2019 · 10 comments
Labels
bug Something isn't working

@davidbp
Contributor

davidbp commented Jun 3, 2019

I wanted to see which models MLJ would propose for the iris dataset, and got an empty dict.

julia> task = load_iris()
SupervisedTask{} @ 1…54


julia> models(task)
Dict{String,Any} with 0 entries

julia> task = load_boston()
SupervisedTask{} @ 2…35


julia> models(task)
Dict{String,Any} with 6 entries:
  "MultivariateStats" => Any["RidgeRegressor"]
  "MLJ"               => Any["KNNRegressor"]
  "DecisionTree"      => Any["DecisionTreeRegressor"]
  "ScikitLearn"       => Any["ElasticNet", "ElasticNetCV", "SVMRegressor", "SVMLRegressor", "SVMNuRegressor"]
  "LIBSVM"            => Any["EpsilonSVR", "NuSVR"]
  "XGBoost"           => Any["XGBoostRegressor"]

If I simply call models() with no arguments, I can clearly see a number of classification options:

models()
Dict{Any,Any} with 9 entries:
  "MultivariateStats" => Any["RidgeRegressor", "PCA"]
  "MLJ"               => Any["ConstantClassifier", "Standardizer", "SimpleRidgeRegressor", "ProbabilisticNetwork", "OneHotEncoder", "Resampler", "KNNRegressor", "UnivariateBoxCoxTransform…
  "DecisionTree"      => Any["DecisionTreeRegressor", "DecisionTreeClassifier"]
  "ScikitLearn"       => Any["SVMLClassifier", "SVMNuClassifier", "ElasticNet", "ElasticNetCV", "SVMRegressor", "SVMLRegressor", "SVMNuRegressor", "SVMClassifier"]
  "LIBSVM"            => Any["EpsilonSVR", "LinearSVC", "NuSVR", "NuSVC", "SVC", "OneClassSVM"]
  "Clustering"        => Any["KMeans", "KMedoids"]
  "GLM"               => Any["OLSRegressor", "GLMCountRegressor"]
  "NaiveBayes"        => Any["GaussianNBClassifier", "MultinomialNBClassifier"]
  "XGBoost"           => Any["XGBoostCount", "XGBoostRegressor", "XGBoostClassifier"]

Some of the conditions in the filter inside .julia/packages/MLJ/CbtVd/src/loading.jl seem to produce unexpected behaviour.
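
Judging from the later comments in this thread, the filter keeps a model when the task's target_scitype_union is a subtype (<:) of the model's target_scitype_union. The sketch below illustrates that kind of check under stated assumptions: `Finite`, `Multiclass`, `Continuous`, the nested `model_metadata` dictionary and `matching_models` are hypothetical stand-ins, not MLJ's actual definitions or the code in loading.jl.

```julia
# Hypothetical stand-ins for the scientific types (not MLJ's own definitions)
abstract type Finite{N} end
abstract type Multiclass{N} <: Finite{N} end
abstract type Continuous end

# Hypothetical metadata layout: package => model name => trait dictionary
model_metadata = Dict(
    "LIBSVM" => Dict(
        "SVC"        => Dict{Symbol,Any}(:target_scitype_union => Finite),
        "EpsilonSVR" => Dict{Symbol,Any}(:target_scitype_union => Continuous),
    ),
)

# Keep a model when the task's target scitype is a subtype of what the model accepts
function matching_models(target_scitype, metadata)
    result = Dict{String,Vector{String}}()
    for (pkg, pkg_models) in metadata, (name, traits) in pkg_models
        if target_scitype <: traits[:target_scitype_union]
            push!(get!(result, pkg, String[]), name)
        end
    end
    return result
end

matching_models(Multiclass{3}, model_metadata)  # only "SVC" under "LIBSVM"
```

Because the test is a plain <:, it only succeeds if the type stored in the metadata is the very same Finite that the task's type was declared under; that is the check that fails in this issue.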

My info

  [add582a8] MLJ v0.2.2
  [a7f614a8] MLJBase v0.2.2
  [d491faf4] MLJModels v0.2.3
  [a93c6f00] DataFrames v0.18.3
@davidbp davidbp closed this as completed Jun 3, 2019
@davidbp davidbp reopened this Jun 3, 2019
@ablaom
Member

ablaom commented Jun 3, 2019

You are right, this is a bug! Thanks. But a curly one:

julia> using MLJ
julia> task = load_iris()
julia> task.target_scitype_union
Multiclass{3}

julia> info("SVMClassifier")[:target_scitype_union]
Finite

julia> Multiclass{3} <: Finite
true

But check this out:

julia> task.target_scitype_union <: info("SVMClassifier")[:target_scitype_union]
false

?????

There appear to be two distinct Finite types, for despite

julia> info("SVMClassifier")[:target_scitype_union] === Finite
true

we have

julia> objectid(Finite)
0x2dc1c6a52a98fbfa

julia> objectid(info("SVMClassifier")[:target_scitype_union])
0x55508700e378482e

My only thought at the moment is that this has something to do with the JSON encoding/decoding of the model metadata. But this is weird.

@ablaom ablaom added the bug Something isn't working label Jun 4, 2019
@davidbp
Contributor Author

davidbp commented Jun 4, 2019

I can reproduce what you explain.

model = "DecisionTreeClassifier"
pkg = "DecisionTree"
task = load_iris()
info_ =  MLJ.metadata()[pkg][model]

### It seems this function "models" in "MLJ/CbtVd/src/loading.jl" breaks because this statement is false
task.target_scitype_union <: info_[:target_scitype_union]
#returns false

Multiclass{3} <: Finite
#returns true

## Even though
task.target_scitype_union
#returns Multiclass{3}

info_[:target_scitype_union]
#returns Finite

Are there 2 "Finite" types defined in different places? Could this be happening because we might have several MLJ versions and MLJBase versions installed?

@davidbp
Contributor Author

davidbp commented Jun 4, 2019

Well

Is this what is causing the problem?

julia> subtypes(info_[:target_scitype_union])
2-element Array{Any,1}:
 Multiclass{N}   
 OrderedFactor{N}

julia> subtypes(Finite)
2-element Array{Any,1}:
 Multiclass   
 OrderedFactor

julia> typeof(Finite{3})
DataType

julia> typeof(Finite)
UnionAll
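
The typeof output above points at the key distinction: a parametric abstract type referenced without its parameter (Finite) is a `UnionAll`, whereas a concrete instantiation (Finite{3}) or a non-parametric abstract type is a `DataType`. A quick check, using hypothetical stand-in definitions rather than MLJ's own:

```julia
# Hypothetical stand-ins (not MLJ's own definitions)
abstract type Finite{N} end
abstract type Multiclass{N} <: Finite{N} end
abstract type Plain end              # no type parameters

@assert Finite isa UnionAll          # i.e. Finite{N} where N
@assert Finite{3} isa DataType
@assert Plain isa DataType

# Subtyping against the UnionAll holds for any parameter value,
# but each instantiation is invariant in N
@assert Multiclass{3} <: Finite
@assert Multiclass{3} <: Finite{3}
@assert !(Multiclass{3} <: Finite{2})
```

Since a deepcopied non-parametric type keeps its objectid further down this thread while the parametric one does not, it seems the Julia version used here treated these two kinds of type object differently in deepcopy; newer Julia versions may behave differently.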

@ablaom
Member

ablaom commented Jun 5, 2019

Thank you indeed for that confirmation! I observed the same things. I think I've got it now. In the definition of metadata() I make a deepcopy of the metadata dictionary, which also copies the types (duh).

julia> abstract type Finite end
julia> deepcopy(Finite)
Finite

And then obviously

julia> objectid(T) == objectid(Finite)
false

ablaom added a commit that referenced this issue Jun 5, 2019
@ablaom
Member

ablaom commented Jun 5, 2019

Resolved.

julia> using MLJ
julia> @load DecisionTreeClassifier
import MLJModels ✔
import DecisionTree ✔
import MLJModels.DecisionTree_.DecisionTreeClassifier ✔

julia> task = load_iris()
julia> models(task)
Dict{String,Any} with 2 entries:
  "ScikitLearn" => Any["SVMNuClassifier", "SVMClassifier", "SVMLClassifier"]
  "LIBSVM"      => Any["LinearSVC", "NuSVC", "SVC"]

julia> task.is_probabilistic = true
julia> models(task)
Dict{String,Any} with 4 entries:
  "MLJ"          => Any["ConstantClassifier", "DecisionTreeClassifier"]
  "DecisionTree" => Any["DecisionTreeClassifier"]
  "NaiveBayes"   => Any["GaussianNBClassifier"]
  "XGBoost"      => Any["XGBoostClassifier"]

@davidbp Thanks again for spotting this and spending time investigating!

@ablaom ablaom closed this as completed Jun 5, 2019
@davidbp
Contributor Author

davidbp commented Jun 5, 2019

Thanks for solving the issue.

I actually thought the deepcopy could be the source of the error, but I tried the following and got the same objectid:

julia> abstract type Foo end
julia> Foo2 = deepcopy(Foo)
Foo
julia> objectid(Foo2)
0x9edca13fe23d228c
julia> objectid(Foo)
0x9edca13fe23d228c

Interestingly enough, this seems to be the same thing you just posted.
My question is: what is T in the snippet you posted above?
If I try this

julia> abstract type Finite end
julia> deepcopy(Finite)
Finite
julia> objectid(T) == objectid(Finite)
false

I get

julia> T
ERROR: UndefVarError: T not defined

I tried to update MLJ and test it again but ...

(v1.1) pkg> update MLJ
  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
 Resolving package versions...
 Installed MLJ ─ v0.2.3
  Updating `~/.julia/environments/v1.1/Project.toml`
  [add582a8] ↑ MLJ v0.2.2 ⇒ v0.2.3
  Updating `~/.julia/environments/v1.1/Manifest.toml`
  [add582a8] ↑ MLJ v0.2.2 ⇒ v0.2.3

julia> using MLJ

julia> @load DecisionTreeClassifier
┌ Info: A model named "DecisionTreeClassifier" is already loaded.
└ Nothing new loaded. 

julia> task = load_iris()
SupervisedTask{} @ 4…01

julia> models(task)
Dict{String,Any} with 0 entries

@davidbp
Contributor Author

davidbp commented Jun 5, 2019

Let me add that if I instead do ]add https://github.com/alan-turing-institute/MLJ.jl I get the behaviour you just described:

julia> models(task)
Dict{String,Any} with 2 entries:
  "ScikitLearn" => Any["SVMNuClassifier", "SVMClassifier", "SVMLClassifier"]
  "LIBSVM"      => Any["LinearSVC", "NuSVC", "SVC"]

I still face some issues with the package manager, and I have no idea why. For me it is always best to simply add the git URL rather than add the package name, because updates give me problems more often than I would like.

@giordano
Member

giordano commented Jun 5, 2019

When you add a git URL or dev a package, the local repo won't be removed by ]rm PackageName, and it will stay on the branch it was on before.

@ablaom
Copy link
Member

ablaom commented Jun 5, 2019

@davidbp

Yes, I didn't faithfully copy my snippet. The behaviour depends on your type having parameters. Here's a corrected version of the example:

julia> abstract type Finite{N} end
julia> T = deepcopy(Finite)
Finite
julia> objectid(T) == objectid(Finite)
false
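
One way to sidestep the problem is to copy only the dictionary containers and reuse every other value, so type objects keep their identity. This is a minimal sketch, assuming a nested-Dict metadata layout; `copy_keep_types` and the stand-in types are hypothetical, and this is not necessarily how the referenced commit changed metadata():

```julia
# Hypothetical stand-ins (not MLJ's own definitions)
abstract type Finite{N} end
abstract type Multiclass{N} <: Finite{N} end

# Hypothetical metadata layout: model name => trait dictionary
meta = Dict("SVMClassifier" => Dict{Symbol,Any}(:target_scitype_union => Finite))

# Recursively rebuild nested Dicts (keeping their key/value types),
# but reuse every non-Dict value as-is, so types are never copied
copy_keep_types(x) = x
copy_keep_types(d::Dict) = typeof(d)(k => copy_keep_types(v) for (k, v) in d)

meta2 = copy_keep_types(meta)
T = meta2["SVMClassifier"][:target_scitype_union]

@assert meta2 !== meta          # the outer container is a new object
@assert T === Finite            # the type object is shared, not copied
@assert Multiclass{3} <: T      # so a subtype-based filter matches again
```

The underlying idea is simply to keep type objects out of deepcopy, so that identity and subtyping checks against them keep working.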

@ablaom
Member

ablaom commented Jun 5, 2019

Regarding your attempt to update MLJ (v0.2.2 ⇒ v0.2.3), after which @load DecisionTreeClassifier reported that the model was already loaded and models(task) still returned an empty Dict:

It looks to me like you updated your environment but didn't restart your REPL session. I say this because you got "A model named DecisionTreeClassifier is already loaded". Updating your environment won't replace code you have already loaded (repeating "using MLJ" has no effect). Sorry, you probably realise this, but this is my guess.
