
XGBoostClassifier can't be serialised. Add custom serialisation? #512

Closed
aviatesk opened this issue May 1, 2020 · 7 comments

@aviatesk
Contributor

aviatesk commented May 1, 2020

Describe the bug

A deserialized XGBoostClassifier can't predict.

This only happens with the MLJ machine interface: the native round trip XGBoost.save(xgboost::Booster, fname) followed by predict(Booster(fname), X) works as expected.

/cc @aviks -- pinging you in case you're interested in this.

To Reproduce

using MLJ

n = 100000
X = DataFrame(A = rand(n), B = categorical(rand(1:10, n)))
y = categorical(rand(0:2, n))

@load XGBoostClassifier
xgb_machine = machine(XGBoostClassifier(), X, y)
MLJ.fit!(xgb_machine; verbosity = 0)
MLJ.predict(xgb_machine, X) # <- works
MLJ.save("xgboost.jlso", xgb_machine)

###

xgb_machine′ = machine("xgboost.jlso")
MLJ.predict(xgb_machine′, X) # <- errors

the final line results in the following error:

ERROR: Call to XGBoost C function XGBoosterPredict failed: [16:29:15] src/c_api/c_api.cc:963: DMatrix/Booster has not been intialized or has already been disposed.

Stack trace returned 0 entries:


Stacktrace:
 [1] error(::String, ::String, ::String, ::String) at ./error.jl:42
 [2] XGBoosterPredict(::Ptr{Nothing}, ::Ptr{Nothing}, ::Int32, ::UInt32, ::Array{UInt64,1}) at /Users/aviatesk/.julia/packages/XGBoost/q0cZR/src/xgboost_wrapper_h.jl:15
 [3] ...

Versions

versioninfo():

Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i5-8279U CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
Environment:
  JULIA_PROJECT = @.
  JULIA_EDITOR = atom  -a
  JULIA_NUM_THREADS = 4
  JULIA_PKG_DEVDIR = /Users/aviatesk/julia/packages

(data) pkg> st
Status `~/pigicon-root/pigicon/piginator.jl/data/Project.toml`
  [336ed68f] CSV v0.6.1
  [a93c6f00] DataFrames v0.20.2
  [1313f7d8] DataFramesMeta v0.5.0
  [682c06a0] JSON v0.21.0
  [7acf609c] LightGBM v0.2.1
  [add582a8] MLJ v0.11.2
  [a7f614a8] MLJBase v0.13.2
  [6ee0df7b] MLJLinearModels v0.3.2
  [e80e1ace] MLJModelInterface v0.2.2
  [d491faf4] MLJModels v0.9.9
  [1914dd2f] MacroTools v0.5.5
  [91a5bcdd] Plots v0.29.9
  [2913bbd2] StatsBase v0.33.0
  [f3b207a7] StatsPlots v0.14.5
  [009559a3] XGBoost v0.4.3
@OkonSamuel
Member

cc @aviatesk. The culprit here is the mach.fitresult[1] returned after deserialization: in particular, the booster.handle pointer is invalid. It is worth noting that XGBoost.save calls the libxgboost C library, while MLJ uses the JLSO library (it may be that JLSO doesn't know how to deal with pointers?). @ablaom, @tlienart: maybe for this specific case we should save using the XGBoost library's own save?
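The pointer hypothesis can be checked in isolation: Julia's stdlib Serialization (the format JLSO can use under the hood) deliberately replaces any raw Ptr with C_NULL, since an address from one process is meaningless in another. A minimal, XGBoost-free demonstration:

```julia
using Serialization

# Round-trip a raw pointer through the stdlib serializer. The serializer
# intentionally discards the address, so the deserialized value is null --
# exactly what happens to booster.handle inside the fitresult.
io = IOBuffer()
serialize(io, Ptr{Nothing}(UInt(0x1234)))
seekstart(io)
p = deserialize(io)

p == C_NULL  # true: any later C call through this handle sees a dead object
```

Any C library then reports the handle as uninitialized or disposed, matching the XGBoosterPredict error above.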

@ablaom
Member

ablaom commented May 2, 2020

@OkonSamuel Thanks for investigating!

Maybe for this specific case we should save using the XGBoost library's own save?

This is possible. One overloads the fallbacks recalled below, somewhere in the code where XGBoost has its implementation of the MLJ model API:

MLJModelInterface.save(file, model, fitresult, report; kwargs...) =
    JLSO.save(file,
              :model => model,
              :fitresult => fitresult,
              :report => report; kwargs...)

function MLJModelInterface.restore(file; kwargs...)
    dict = JLSO.load(file)
    return dict[:model], dict[:fitresult], dict[:report]
end

The tricky part is that, for the classifier, the MLJ fitresult has an additional part - a (categorical) element of the target - which is not part of the XGBoost learned parameters. This is essential, to keep track of the categorical encoding and complete pool of classes. So that needs to be saved as well. This issue is likely to recur in other classifiers wanting to use a custom save, so probably worth finding an elegant solution.
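One way to sidestep the pointer problem, in the spirit of the overloads above, is to persist the native XGBoost dump as raw bytes alongside the extra fitresult part. Here is a self-contained sketch of that bundling pattern using only stdlib Serialization; `booster_bytes` and `classes` are hypothetical stand-ins for the native model dump and the categorical pool, not actual MLJ or XGBoost API:

```julia
using Serialization

# Stand-ins: in the real case, `booster_bytes` would come from XGBoost.save
# writing the booster to a temporary file that is then `read`, and `classes`
# from the extra element stored in the MLJ fitresult.
booster_bytes = Vector{UInt8}("pretend-native-xgboost-dump")
classes = ["class_0", "class_1", "class_2"]

# Bundle both pieces into one file; raw bytes contain no pointers, so the
# whole tuple round-trips intact.
file = tempname()
open(io -> serialize(io, (booster_bytes = booster_bytes, classes = classes)),
     file, "w")

restored = open(deserialize, file)
rm(file)

restored.booster_bytes == booster_bytes && restored.classes == classes  # true
```

On restore, the bytes would be written back to a temporary file and handed to the native Booster constructor, rebuilding a live handle.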

@ablaom ablaom changed the title deserialized XGBoostClassifier can't predict XGBoostClassifier can't be serialised. Add custom serialisation? May 8, 2020
@yalwan-iqvia

I'm going to try this for LightGBM models -- I anticipated this issue and put some code in to deal with it, but I never tested it in the context of MLJ serialisation. If it works, I'll come back and point at the code I have implemented within LightGBM.jl, which the XGBoost maintainers could probably also make use of.

@ablaom
Member

ablaom commented Sep 29, 2020

@yalwan-iqvia

The tricky part is that, for the classifier, the MLJ fitresult has an additional part - a (categorical) element of the target - which is not part of the XGBoost learned parameters. This is essential, to keep track of the categorical encoding and complete pool of classes. So that needs to be saved as well.

Do you have any suggestions about how we combine the two bits of information - the XGBoost learned parameters, and the complete pool of classes (eg, the CategoricalValue stored in the fitresult) - into the one file? Or perhaps we have two files and the user just specifies the xgboost one, with the other created/deserialized "silently" ?

@ablaom ablaom self-assigned this Sep 29, 2020
@ablaom
Member

ablaom commented Sep 30, 2020

Okay, I have discovered some flaws in the API on the deserializing/restoring side. Working on a PR for that.

@yalwan-iqvia

yalwan-iqvia commented Sep 30, 2020

So for LightGBM I was only referring to the bit which causes the error the user reported, about a disposed model -- I made the internal structs carry around a serialised string version of the model, so that even after it's deepcopied elsewhere and the pointer is null, it can reload the string-serialised model into memory before attempting operations such as predict. I didn't get around to trying this yet, but I can point you at the code.

Here's our test that this runs correctly after deepcopying:
https://github.com/IQVIA-ML/LightGBM.jl/blob/6384a4b63e5145de92968171337375459aeab3f0/test/ffi/base_utils.jl

Here's the bit where we make sure we still have a valid pointer after a deepcopy:
https://github.com/IQVIA-ML/LightGBM.jl/blob/master/src/wrapper.jl#L51

And finally, for cases which aren't covered by deepcopies somehow, we do this for safety
https://github.com/IQVIA-ML/LightGBM.jl/blob/master/src/fit.jl#L123

and in other places where we think we might continue after a SERde we use this:
https://github.com/IQVIA-ML/LightGBM.jl/blob/master/src/utils.jl#L79

@ablaom
Member

ablaom commented Oct 19, 2020

Once JuliaRegistries/General#23206 is merged, update MLJModels and serialisation should work.

@ablaom ablaom closed this as completed Oct 19, 2020