lazy activation of models not working from within packages #22

Closed
rssdev10 opened this issue May 31, 2019 · 17 comments

@rssdev10

I'm trying to make a module with MLJModel:

module Abc

import XGBoost: dump_model, save, Booster

using MLJ
using MLJBase
import MLJModels

using MLJModels.XGBoost_

function __init__()
    @info "Abc"
end

end

but I get the error: ERROR: LoadError: UndefVarError: XGBoost_ not defined.

It looks like there is an issue with lazy activation in this line of MLJModels:

@require XGBoost = "009559a3-9522-5dbb-924b-0b6ed2b22bb9" include("XGBoost.jl")

One workaround I found is to load XGBoost eagerly and include the interface file directly:

module Abc

using XGBoost
#import XGBoost: dump_model, save, Booster

using MLJ
using MLJBase
import MLJModels

include(joinpath(MLJModels.srcdir, "XGBoost.jl"))
#using MLJModels.XGBoost_

function __init__()
    @info "Abc"
end

end

I also added debug output to the __init__ function of the MLJModels module, and I see that it is called twice. The output looks like this:

[ Info: Precompiling Abc [top-level]
[ Info: MLJModels!!!
[ Info: MLJModels!!!
[ Info: Abc

Maybe it is related to the chain of __init__ methods.
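
For readers unfamiliar with the lazy-activation pattern referred to above, here is a minimal sketch of how a Requires.jl hook is typically registered. It assumes the Requires package is installed; the module name LazyGlue is hypothetical and only stands in for MLJModels, and the file XGBoost.jl is never read unless XGBoost is actually loaded in the session. The isdefined check at the end is one way to see whether the lazily defined submodule exists yet.

```julia
module LazyGlue  # hypothetical stand-in for MLJModels

using Requires

function __init__()
    # Registers a callback rather than loading anything now: the include
    # only runs once XGBoost has been loaded into the current session.
    @require XGBoost="009559a3-9522-5dbb-924b-0b6ed2b22bb9" include("XGBoost.jl")
end

end # module

# Diagnostic: has the lazily defined submodule appeared yet?
println("XGBoost_ defined: ", isdefined(LazyGlue, :XGBoost_))
```

Until the hook has fired, a statement such as using LazyGlue.XGBoost_ would fail with an UndefVarError similar to the one reported here.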

@ablaom
Member

ablaom commented Jun 9, 2019

I'm afraid I cannot reproduce your problem:

julia> module Abc
       
       import XGBoost: dump_model, save, Booster
       
       using MLJ
       using MLJBase
       import MLJModels
       
       using MLJModels.XGBoost_
       
       function __init__()
           @info "Abc"
       end
       
       end
[ Info: Recompiling stale cache file /Users/anthony/.julia/compiled/v1.1/XGBoost/rSeEh.ji for XGBoost [009559a3-9522-5dbb-924b-0b6ed2b22bb9]
[ Info: Abc
Main.Abc

julia> using MLJ

julia> task = load_boston()
SupervisedTask @ 585

julia> model = Abc.XGBoostRegressor()
MLJModels.XGBoost_.XGBoostRegressor(num_round = 1,
                                    booster = "gbtree",
                                    disable_default_eval_metric = 0,
                                    eta = 0.3,
                                    gamma = 0.0,
                                    max_depth = 6,
                                    min_child_weight = 1.0,
                                    max_delta_step = 0.0,
                                    subsample = 1.0,
                                    colsample_bytree = 1.0,
                                    colsample_bylevel = 1.0,
                                    lambda = 1.0,
                                    alpha = 0.0,
                                    tree_method = "auto",
                                    sketch_eps = 0.03,
                                    scale_pos_weight = 1.0,
                                    updater = "grow_colmaker",
                                    refresh_leaf = 1,
                                    process_type = "default",
                                    grow_policy = "depthwise",
                                    max_leaves = 0,
                                    max_bin = 256,
                                    predictor = "cpu_predictor",
                                    sample_type = "uniform",
                                    normalize_type = "tree",
                                    rate_drop = 0.0,
                                    one_drop = 0,
                                    skip_drop = 0.0,
                                    feature_selector = "cyclic",
                                    top_k = 0,
                                    tweedie_variance_power = 1.5,
                                    objective = "reg:linear",
                                    base_score = 0.5,
                                    eval_metric = "rmse",
                                    seed = 0,) @ 189

julia> mach = machine(model, task)
Machine{XGBoostRegressor} @ 199

julia> evaluate!(mach)
┌ Info: Evaluating using cross-validation. 
│ nfolds=6. 
│ shuffle=false 
│ measure=MLJ.rms 
│ operation=StatsBase.predict 
└ Resampling from all rows. 
Cross-validating: 100%[=========================] Time: 0:00:01
6-element Array{Float64,1}:
 15.071084701486205
 16.70750413097405 
 22.12771143813795 
 20.89991496287021 
 15.434870166858115
 11.602463981185641

Do you have MLJModels in your load path? You need both MLJModels and MLJ in your project. Perhaps send me the output of ]status -m, or your Manifest.toml.
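
As a quick check of the load path, the following sketch asks Julia where each package resolves in the active environment. The helper name locate_packages is hypothetical; Base.find_package returns the path of a package's entry file, or nothing if the package cannot be found.

```julia
# Report where (or whether) each named package resolves in the active load path.
function locate_packages(names)
    Dict(name => Base.find_package(name) for name in names)
end

for (name, path) in locate_packages(["MLJ", "MLJModels", "XGBoost"])
    println(name, " => ", path === nothing ? "not found in load path" : path)
end
```

This only shows whether the packages are visible from the current project; it does not replace the full ]status -m output.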

@rssdev10
Author

Please see the attached package:
abc.tar.gz
Run ./build.jl from it.

> I'm afraid I cannot reproduce your problem:

It might be a concurrency issue, and it is not stable. I cannot say that I see it every time, but it is present in most cases.

Julia version 1.0.3, macOS.

(Abc) pkg> status -m
Project Abc v0.1.0
    Status `~/projects/tmp/julia/Abc/Manifest.toml`
  [7d9fca2a] Arpack v0.3.1
  [9e28174c] BinDeps v0.8.10
  [b99e7846] BinaryProvider v0.5.4
  [336ed68f] CSV v0.5.5
  [324d7699] CategoricalArrays v0.5.4
  [34da2185] Compat v2.1.0
  [a93c6f00] DataFrames v0.18.3
  [864edb3b] DataStructures v0.15.0
  [b4f34e82] Distances v0.8.0
  [31c24e10] Distributions v0.20.0
  [cd3eb016] HTTP v0.8.2
  [83e8ac13] IniFile v0.5.0
  [82899510] IteratorInterfaceExtensions v1.0.0
  [682c06a0] JSON v0.20.0
  [2d691ee1] LIBLINEAR v0.5.1
  [b1bec4e5] LIBSVM v0.3.1
  [add582a8] MLJ v0.2.3
  [a7f614a8] MLJBase v0.2.2
  [d491faf4] MLJModels v0.2.3
  [739be429] MbedTLS v0.6.8
  [e1d29d7a] Missings v0.4.1
  [bac558e1] OrderedCollections v1.1.0
  [90014a1f] PDMats v0.9.7
  [69de0a69] Parsers v0.3.5
  [2dfb63ee] PooledArrays v0.5.2
  [92933f4c] ProgressMeter v1.0.0
  [1fd47b50] QuadGK v2.0.4
  [3cdcf5f2] RecipesBase v0.6.0
  [189a3867] Reexport v0.2.0
  [cbe49d4c] RemoteFiles v0.2.1
  [ae029012] Requires v0.5.2
  [79098fc4] Rmath v0.5.0
  [6e75b9c4] ScikitLearnBase v0.4.1
  [a2af1166] SortingAlgorithms v0.3.1
  [276daf66] SpecialFunctions v0.7.2
  [2913bbd2] StatsBase v0.30.0
  [4c63d2b9] StatsFuns v0.8.0
  [3783bdb8] TableTraits v1.0.0
  [bd369af6] Tables v0.2.5
  [30578b45] URIParser v0.4.0
  [ea10d353] WeakRefStrings v0.6.1
  [009559a3] XGBoost v0.3.1
  [2a0f44e3] Base64 
  [ade2ca70] Dates 
  [8bb1440f] DelimitedFiles 
  [8ba89e20] Distributed 
  [9fa8497b] Future 
  [b77e0a4c] InteractiveUtils 
  [76f85450] LibGit2 
  [8f399da3] Libdl 
  [37e2e46d] LinearAlgebra 
  [56ddb016] Logging 
  [d6f4376e] Markdown 
  [a63ad114] Mmap 
  [44cfe95a] Pkg 
  [de0858da] Printf 
  [9abbd945] Profile 
  [3fa0cd96] REPL 
  [9a3f8284] Random 
  [ea8e919c] SHA 
  [9e88b42a] Serialization 
  [1a1011a3] SharedArrays 
  [6462fe0b] Sockets 
  [2f01184e] SparseArrays 
  [10745b16] Statistics 
  [4607b0f0] SuiteSparse 
  [8dfed614] Test 
  [cf7118a7] UUIDs 
  [4ec0a83e] Unicode 

@ablaom
Member

ablaom commented Jul 17, 2019

Strange. I still can't reproduce your problem after activating the environment you sent:

(working) pkg> activate .

(Abc) pkg> instantiate
  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`

julia> module Abc

       import XGBoost: dump_model, save, Booster

       using MLJ
       using MLJBase
       import MLJModels

       using MLJModels.XGBoost_

       function __init__()
           @info "Abc"
       end

       end

Main.Abc

julia> using MLJ

julia> task = load_boston()
model = SupervisedTask{} @ 138

julia> model = Abc.XGBoostRegressor()
MLJModels.XGBoost_.XGBoostRegressor(num_round = 1,
                                    booster = "gbtree",
                                    disable_default_eval_metric = 0,
                                    eta = 0.3,
                                    gamma = 0.0,
                                    max_depth = 6,
                                    min_child_weight = 1.0,
                                    max_delta_step = 0.0,
                                    subsample = 1.0,
                                    colsample_bytree = 1.0,
                                    colsample_bylevel = 1.0,
                                    lambda = 1.0,
                                    alpha = 0.0,
                                    tree_method = "auto",
                                    sketch_eps = 0.03,
                                    scale_pos_weight = 1.0,
                                    updater = "grow_colmaker",
                                    refresh_leaf = 1,
                                    process_type = "default",
                                    grow_policy = "depthwise",
                                    max_leaves = 0,
                                    max_bin = 256,
                                    predictor = "cpu_predictor",
                                    sample_type = "uniform",
                                    normalize_type = "tree",
                                    rate_drop = 0.0,
                                    one_drop = 0,
                                    skip_drop = 0.0,
                                    feature_selector = "cyclic",
                                    top_k = 0,
                                    tweedie_variance_power = 1.5,
                                    objective = "reg:linear",
                                    base_score = 0.5,
                                    eval_metric = "rmse",
                                    seed = 0,) @ 598

julia> mach = machine(model, task)
Machine{XGBoostRegressor} @ 164


julia> evaluate!(mach)
┌ Info: Evaluating using cross-validation. 
│ nfolds=6. 
│ shuffle=false 
│ measure=MLJ.rms 
│ operation=StatsBase.predict 
└ Resampling from all rows. 
Cross-validating: 100%[=========================] Time: 0:00:02
6-element Array{Float64,1}:
 15.071084701486205
 16.70750413097405 
 22.12771143813795 
 20.89991496287021 
 15.434870166858115
 11.602463981185641

julia> versioninfo()
Julia Version 1.0.3
Commit 099e826241 (2018-12-18 01:34 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)
Environment:
  JULIA_PATH = /Applications/Julia-1.1.app/Contents/Resources/julia/bin/julia

This was run on macOS.

@rssdev10
Author

rssdev10 commented Jul 17, 2019

Can you try running it outside the REPL, from the command line, using only ./build.jl? Again, I think there is something like a concurrency issue here.

Also, I have a slightly older laptop:

julia> versioninfo()
Julia Version 1.0.3
Commit 099e826241 (2018-12-18 01:34 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, broadwell)

@ablaom changed the title from "wrong lazy activation of models" to "lazy activation of models not working from within packages" on Jul 18, 2019

@ablaom
Member

ablaom commented Jul 18, 2019

Yes, now I can reproduce your issue. Many thanks for this. I would say we have uncovered a limitation of Requires.jl. Do you agree?

A secondary question is whether the @load macro will work when called within a package, for models in packages with native MLJ interface implementations (i.e., outside of MLJModels). In this case there would be no lazy loading. Unfortunately, no such package exists yet, but we will have some soon (or we could construct a dummy package).

Edit (July 23, 2020): I can confirm that if the interface is provided by a package that does not use Requires.jl, the issue does not occur.

@rssdev10
Author

Yes, it might be a restriction of Requires.jl. See also the double call of __init__ that I mentioned in my first message. But again, I am almost sure that it is a concurrency issue. I found the issue while preparing the code to run as a web service.

So we do have a workaround. As for a fix: now that the issue is confirmed, maybe we should just file the same issue, with my sample, in the Requires.jl issue tracker, if nobody can dive into it now.

Regarding loading of models, for now I'm using Booster(model_file = model_fn) directly, for XGBoost specifically.
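
The direct-loading approach mentioned above can be sketched as follows. This assumes the XGBoost.jl API of the 0.3 series listed in the manifest earlier in the thread (xgboost, save, and the Booster(model_file = ...) constructor); later releases of XGBoost.jl may differ. The random data and the file location are made up purely for illustration.

```julia
using XGBoost

# Illustrative data only: 100 rows, 4 features, random regression target.
X = rand(100, 4)
y = rand(100)

# Train a small booster directly with XGBoost, bypassing the MLJ glue code.
num_round = 2
bst = xgboost(X, num_round, label = y, eta = 0.3, max_depth = 2)

# Persist the model, then restore it as done in the comment above.
model_fn = joinpath(mktempdir(), "xgb.model")  # hypothetical location
save(bst, model_fn)
restored = Booster(model_file = model_fn)

# The restored booster can be used for prediction without MLJModels.XGBoost_.
pred = XGBoost.predict(restored, X)
```

Because nothing here goes through MLJModels, the lazy-activation hook is not involved at all.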

@ablaom
Member

ablaom commented Sep 17, 2019

Although I am doubtful, I thought it worth mentioning that there was a refactor of @load that may possibly resolve this issue. MLJModels 0.4.0 (which now owns the macro) incorporates the changes.

@ablaom
Member

ablaom commented Oct 17, 2019

Update: This issue is unresolved under MLJModels 0.5.0.

@tlienart
Collaborator

@ablaom is this still a (relevant) issue?

@ablaom
Member

ablaom commented Nov 27, 2019

I believe it is still an issue. It seems one can't use MLJ to load models from within a package module. Some clues are provided above and in #321. I suspect (but have not confirmed) that this is a Requires.jl issue. To reproduce, be sure to follow the instructions of @rssdev10 exactly.

@cscherrer

@ablaom @tlienart We're running into this issue as well.

@ablaom
Member

ablaom commented May 29, 2020

Noted. The long-term plan is to "disintegrate" MLJModels into individual packages, eliminating all use of Requires.jl. Then loading a model whose glue code is currently provided by MLJModels should be no different from loading models from packages that natively support the MLJ model interface (e.g., EvoTrees.jl, MLJLinearModels.jl). In these cases, I am not aware of any issue, but let me know if you discover one.

@ablaom
Member

ablaom commented Jul 23, 2020

A partial workaround is here: JuliaAI/MLJ.jl#613 (comment)

@OkonSamuel
Member

I think we had better start the disintegration of MLJModels soon.

@ablaom
Member

ablaom commented Jul 26, 2020

PRs welcome 😄 Happy to provide guidance. The repos are called MLJGLMInterface.jl, and so forth. If you want to start on one, let me know which and I'll get you commit access.

Here's the issue: #244 (comment)

@OkonSamuel
Member

Great. I will work on them in my spare time.

@ablaom mentioned this issue on Nov 9, 2020 (1 task)

@ablaom
Member

ablaom commented Jul 14, 2021

I'm pretty sure this has been resolved by the PR mentioned above.

5 participants