Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update R handles in-place #6903

Merged
merged 5 commits into from
Apr 29, 2021
Merged

Conversation

david-cortes
Copy link
Contributor

Addressing issue #6896

When de-serializing an object in R from raw bytes, the R handles become null as there is no C++ object to which to point to in-memory. In these situations, the C++ objects are recreated from raw bytes as needed, but the R handle objects are not modified in-place, which can lead to de-serializing multiple times.

This PR changes it so that R external pointer objects would be modified in-place in order to avoid redundant operations and memory usage. After the PR:

library(xgboost)
data(mtcars)
xgb_data <- xgb.DMatrix(as.matrix(mtcars[,-1]), label=mtcars[,1,drop=TRUE])
model <- xgb.train(list(objective="reg:squarederror"), xgb_data, nrounds=10)
saveRDS(model, file.path(tempdir(), "model.rds"))
model_new <- readRDS(file.path(tempdir(), "model.rds"))
print(model_new$handle)
<pointer: (nil)>
attr(,"class")
[1] "xgb.Booster.handle"
pred <- predict(model_new, xgb_data)
print(model_new$handle)
<pointer: 0x55609a7541f0>
attr(,"class")
[1] "xgb.Booster.handle"

@david-cortes
Copy link
Contributor Author

All R tests now passing. The C++ failing test is probably not related to this PR.

@trivialfis
Copy link
Member

Thanks, I will look into it. Please ignore the osx test. The Apple clang is not happy with the new libomp.

@trivialfis trivialfis self-assigned this Apr 28, 2021
@hcho3
Copy link
Collaborator

hcho3 commented Apr 28, 2021

@david-cortes Aside: we've issued a guidance saying that users should avoid using saveRDS() / readRDS() for long-term archival of models. See https://www.rdocumentation.org/packages/xgboost/versions/1.4.1.1/topics/a-compatibility-note-for-saveRDS-save. saveRDS() should be used only to save models for the short term. Are you aware of this guidance?

@david-cortes
Copy link
Contributor Author

I'm aware that saveRDS is not recommended, but that doesn't mean it's not useful. For example, when one restarts an R session keeping the variables/objects, these will be auto-saved through save/load. Or perhaps an xgboost model might be stored inside another object (e.g. from some ML wrapper framework), for which only save and saveRDS are usable.

@hcho3
Copy link
Collaborator

hcho3 commented Apr 28, 2021

@david-cortes Got it. Just keep in mind that you will need to migrate existing RDS files using xgb.save / xgb.load when upgrading XGBoost to a later version.

Copy link
Collaborator

@hcho3 hcho3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I got to learn about how objects work in R.

@hcho3 hcho3 merged commit 4e1a8b1 into dmlc:master Apr 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants