Proper way to serialize XGBoost model without exporting to a file #7351

Closed
DavorJ opened this issue Oct 20, 2021 · 3 comments · Fixed by #7686

Comments

DavorJ commented Oct 20, 2021

I am using xgboost_1.4.1.1 in R.

Assume I have this model:

data(agaricus.train, package='xgboost')
data(agaricus.test, package='xgboost')
train <- agaricus.train
test <- agaricus.test
bst <- xgboost::xgboost(data = train$data, label = train$label, max_depth = 2,
                        eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")

How do I serialize this model correctly without writing to disk?

I know the documentation explicitly says not to use saveRDS() or serialize() on the bst object, and to use xgboost::xgb.save() instead. But xgboost::xgb.save() can only write to a file.

If I do

raw <- xgboost::xgb.save.raw(bst)
xgboost::xgb.load(raw)

I get a strange warning:

Warning in value[3L] :
The model had been generated by XGBoost version 1.0.0 or earlier and was loaded from a RDS file. We strongly ADVISE AGAINST using saveRDS() function, to ensure that your model can be read in current and upcoming XGBoost releases. Please use xgb.save() instead to preserve models for the long term. For more details and explanation, see https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html

caused by handle <- xgb.Booster.handle(modelfile = modelfile) within xgboost::xgb.load().

Looking into the xgboost R source, the following way seems to work well for loading the raw model:

xgboost::xgb.Booster.complete(xgboost:::xgb.handleToBooster(handle = xgboost::xgb.load.raw(raw)))

Note the unfortunate use of :::.

Can someone advise whether this is the correct way, and if so, whether xgboost::xgb.load() should be fixed? I think the if-structure should be changed to:

  if (typeof(modelfile) == "raw") {
    bst <- xgb.handleToBooster(xgb.load.raw(modelfile))
  } else ...

and handle <- xgb.Booster.handle(modelfile = modelfile) should be moved to the else block?
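
Until that is changed, the workaround above can be wrapped in a small helper. This is only a minimal sketch, assuming xgboost 1.4.x, where xgb.load.raw() returns an xgb.Booster.handle. The name load_booster_from_raw is hypothetical and not part of xgboost, and the helper still relies on the internal xgb.handleToBooster, which may change between releases.

```r
# Hypothetical helper (not part of xgboost): rebuild a full xgb.Booster from
# the raw vector returned by xgboost::xgb.save.raw(), without writing to disk.
# Assumes xgboost 1.4.x, where xgb.load.raw() returns an xgb.Booster.handle.
load_booster_from_raw <- function(model_bytes) {
  # Mirror the proposed type check: only raw vectors are accepted here.
  stopifnot(is.raw(model_bytes))
  # xgb.load.raw() yields a handle, which must be wrapped to get an xgb.Booster.
  handle <- xgboost::xgb.load.raw(model_bytes)
  # xgb.handleToBooster is internal (hence `:::`) and undocumented.
  bst <- xgboost:::xgb.handleToBooster(handle = handle)
  # Restore the remaining fields of the xgb.Booster object.
  xgboost::xgb.Booster.complete(bst)
}
```

With the model above, load_booster_from_raw(xgboost::xgb.save.raw(bst)) should then give back an xgb.Booster rather than a bare handle.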

@DavorJ DavorJ changed the title Proper way to store XGBoost model as raw Proper way to serialize XGBoost model without exporting to a file Oct 20, 2021

hcho3 (Collaborator) commented Oct 20, 2021

According to https://www.rdocumentation.org/packages/xgboost/versions/1.4.1.1/topics/a-compatibility-note-for-saveRDS-save, you should use xgb.load.raw to load a model from a byte array.

DavorJ (Author) commented Oct 21, 2021

Aren't we a bit too quick here with closing this?

xgboost::xgb.load.raw() returns an xgb.Booster.handle (which has limited use), not an xgb.Booster. The solution I propose above converts the handle to an xgb.Booster without an error. But to do so, I have to use an internal, undocumented function (cf. :::). Hence the proposal to adjust xgboost::xgb.load(), which by definition returns an xgb.Booster.

trivialfis (Member) commented

Ran into this recently for #7571. I worked around it in the tests, but we should properly support it in the load function as suggested by @DavorJ. Will work on it, but it might bring a breaking change due to different return types.
