**Describe the bug**
I'm just getting into MLJ and Julia, so please forgive me if this is obvious, but I can't find any detail on this anywhere.
If I include XGBoost in a pipeline, `MLJ.save` does not seem to serialize the pipeline properly. Please see the code below for examples to reproduce.
**To Reproduce**

Non-pipeline example:

Running the following code generates two files in `./model1`, namely `xgb.jlso` and `xgb.xgboost.model`:
import RDatasets
using MLJ
boston = RDatasets.dataset("MASS", "Boston"); # a DataFrame
y, X = unpack(boston, ==(:MedV), colname -> true); # target is MedV, all other columns are features
XGBoostRegressor = @load XGBoostRegressor
xgb = XGBoostRegressor()
xgb_machine = machine(xgb, X, y)
train, test = partition(eachindex(y), 0.7, shuffle=true); # 70:30 split
fit!(xgb_machine, rows=train)
MLJ.save("model1/xgb.jlso", xgb_machine) # serialize the fitted machine
ml2 = machine("model1/xgb.jlso") # restore it from disk
predict(ml2, X)
Pipeline example:

Running the following code generates only one file in `./model2`, namely `xgb_pipeline.jlso`:
import RDatasets
using MLJ
boston = RDatasets.dataset("MASS", "Boston"); # a DataFrame
y, X = unpack(boston, ==(:MedV), colname -> true); # target is MedV, all other columns are features
XGBoostRegressor = @load XGBoostRegressor
xgb = XGBoostRegressor()
pipe = @pipeline(xgb) # wrap the same model in a single-step pipeline
xgb_machine = machine(pipe, X, y)
train, test = partition(eachindex(y), 0.7, shuffle=true); # 70:30 split
fit!(xgb_machine, rows=train)
MLJ.save("model2/xgb_pipeline.jlso", xgb_machine) # serialize the fitted pipeline machine
ml2 = machine("model2/xgb_pipeline.jlso") # restore it from disk
predict(ml2, X)
**Expected behavior**

I expect both of the above code snippets to behave identically. The non-pipeline example returns predictions; the pipeline example returns the following error:
```
┌ Error: Failed to apply the operation predict to the machine Machine{XGBoostRegressor,…} @707, which receives it's data arguments from one or more nodes in a learning network. Possibly, one of these nodes is delivering data that is incompatible with the machine's model.
│ Model (XGBoostRegressor @439):
│   input_scitype = Table{var"#s45"} where var"#s45"<:(AbstractArray{var"#s13",1} where var"#s13"<:Continuous)
│   target_scitype = AbstractArray{Continuous,1}
│   output_scitype = Unknown
│
│ Incoming data:
│ arg of predict    scitype
│ -------------------------------------------
│ Source @676       Table{Union{AbstractArray{Continuous,1}, AbstractArray{Count,1}}}
│
│ Learning network sources:
│ source            scitype
│ -------------------------------------------
│ Source @676       Nothing
│ Source @800       Nothing
└ @ MLJBase /Users/timothy.whittaker/.julia/packages/MLJBase/00RAT/src/composition/learning_networks/nodes.jl:126
Call to XGBoost C function XGBoosterPredict failed: [18:06:35] /workspace/srcdir/xgboost/src/c_api/c_api.cc:498: DMatrix/Booster has not been intialized or has already been disposed.
Stacktrace:
 [1] _apply(::Tuple{Node{Machine{MLJXGBoostInterface.XGBoostRegressor,true}},Machine{MLJXGBoostInterface.XGBoostRegressor,true}}, ::DataFrames.DataFrame; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /Users/timothy.whittaker/.julia/packages/MLJBase/00RAT/src/composition/learning_networks/nodes.jl:132
 [2] _apply at /Users/timothy.whittaker/.julia/packages/MLJBase/00RAT/src/composition/learning_networks/nodes.jl:118 [inlined]
 [3] Node at /Users/timothy.whittaker/.julia/packages/MLJBase/00RAT/src/composition/learning_networks/nodes.jl:113 [inlined]
 [4] predict(::Pipeline254, ::NamedTuple{(:predict,),Tuple{Node{Machine{MLJXGBoostInterface.XGBoostRegressor,true}}}}, ::DataFrames.DataFrame) at /Users/timothy.whittaker/.julia/packages/MLJBase/00RAT/src/operations.jl:114
 [5] predict(::Machine{Pipeline254,true}, ::DataFrames.DataFrame) at /Users/timothy.whittaker/.julia/packages/MLJBase/00RAT/src/operations.jl:83
 [6] top-level scope at In[6]:2
 [7] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1091
```
**Additional context**
<!--
Add any other context about the problem here.
-->
**Versions**
<details>
Julia Version 1.5.4
Commit 69fcb5745b (2021-03-11 19:13 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin18.7.0)
CPU: Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
JULIA_HOME = /Applications/Julia-1.5.app/Contents/Resources/julia
<!--
Please run the following snippet and paste the output here.
using MLJ
Nothing prints when running `using MLJ`, but here are the details of my project:
BSON = "fbb218c0-5317-5bc6-957e-2ee96dd4b1f0"
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
IJulia = "7073ff75-c697-5162-941a-fcdaad2a7d2a"
MLJ = "add582a8-e3ab-11e8-2d5e-e98b27df1bc7"
MLJDecisionTreeInterface = "c6f25543-311c-4c74-83dc-3ea6d1015661"
MLJModels = "d491faf4-2d78-11e9-2867-c94bc002c0b7"
MLJXGBoostInterface = "54119dfa-1dab-4055-a167-80440f4f7a91"
NearestNeighborModels = "636a865e-7cf4-491e-846c-de09b730eb36"
Pandas = "eadc2687-ae89-51f9-a5d9-86b5a6373a9c"
PyCall = "438e738f-606a-5dbb-bf0a-cddfbfd45ab0"
RDatasets = "ce6b1742-4840-55fa-b093-852dadbb1d8b"
Revise = "295af30f-e4ad-537b-8983-00126c2a3abe"
XGBoost = "009559a3-9522-5dbb-924b-0b6ed2b22bb9"
...
-->
</details>
<!-- Thanks for contributing! -->
Thanks for reporting. This is a known issue. Models that are not pure Julia (e.g., models that wrap C code, as XGBoost does) need a custom serialization method implemented for the model. This has been done for XGBoost, so stand-alone XGBoost models (machines) can be serialized, but it currently may not extend to the case where the model sits inside any kind of composite model.
There is an open issue to fix this in special cases, for example for the TunedModel wrapper (#678), but the general problem will take some time to be addressed, as it is a little complicated.
One workaround would be to switch to a pure Julia tree boosting algorithm. I recommend the EvoTrees.jl models, which in MLJ are called EvoTreeRegressor, EvoTreeClassifier, EvoTreeCount, and EvoTreeGaussian. The package is actively maintained, and the authors seem quite open to feature requests if something you would like is missing.
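For anyone who wants to try that workaround, here is a rough sketch of what the pipeline example might look like with `EvoTreeRegressor` swapped in. It is untested against the exact versions in the report and makes a few assumptions: EvoTrees.jl is installed in the active environment, the MLJ version still provides `@pipeline` (as used in the report), synthetic data from `make_regression` stands in for the Boston data, and the `model3` directory and file name are made up for illustration.

```julia
using MLJ

# Synthetic regression data with Continuous features
# (assumption: a stand-in for the Boston data in the report).
X, y = make_regression(200, 4)

# Assumes EvoTrees.jl is installed; this loads the MLJ model type.
EvoTreeRegressor = @load EvoTreeRegressor pkg=EvoTrees verbosity=0
evo = EvoTreeRegressor()

# Same single-step pipeline as in the report, with a pure Julia booster.
pipe = @pipeline(evo)
evo_machine = machine(pipe, X, y)

train, test = partition(eachindex(y), 0.7, shuffle=true) # 70:30 split
fit!(evo_machine, rows=train, verbosity=0)

# Hypothetical output location; create the directory before saving.
mkpath("model3")
MLJ.save("model3/evo_pipeline.jlso", evo_machine)

# Restore the machine and predict on the held-out rows.
restored = machine("model3/evo_pipeline.jlso")
yhat = predict(restored, selectrows(X, test))
```

Since the fitted model here contains no handle to an external C library, the generic serializer should be able to store everything the pipeline needs, which is why the restored machine is expected to predict without the `XGBoosterPredict` error.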