Skip to content

Commit

Permalink
[R] Provide better guidance for persisting XGBoost model (#5964)
Browse files Browse the repository at this point in the history
* [R] Provide better guidance for persisting XGBoost model

* Update saving_model.rst

* Add a paragraph about xgb.serialize()
  • Loading branch information
hcho3 committed Aug 1, 2020
1 parent bf2990e commit 5a2dcd1
Show file tree
Hide file tree
Showing 17 changed files with 232 additions and 81 deletions.
2 changes: 1 addition & 1 deletion R-package/DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -64,5 +64,5 @@ Imports:
data.table (>= 1.9.6),
magrittr (>= 1.5),
stringi (>= 0.5.2)
RoxygenNote: 7.1.0
RoxygenNote: 7.1.1
SystemRequirements: GNU make, C++14
64 changes: 55 additions & 9 deletions R-package/R/utils.R
Original file line number Diff line number Diff line change
Expand Up @@ -308,18 +308,64 @@ xgb.createFolds <- function(y, k = 10)
#' @name xgboost-deprecated
NULL

#' Do not use saveRDS() for long-term archival of models. Use xgb.save() instead.
#' Do not use \code{\link[base]{saveRDS}} or \code{\link[base]{save}} for long-term archival of
#' models. Instead, use \code{\link{xgb.save}} or \code{\link{xgb.save.raw}}.
#'
#' It is a common practice to use the built-in \code{saveRDS()} function to persist R objects to
#' the disk. While \code{xgb.Booster} objects can be persisted with \code{saveRDS()} as well, it
#' is not advisable to use it if the model is to be accessed in the future. If you train a model
#' with the current version of XGBoost and persist it with \code{saveRDS()}, the model is not
#' guaranteed to be accessible in later releases of XGBoost. To ensure that your model can be
#' accessed in future releases of XGBoost, use \code{xgb.save()} instead. For more details and
#' explanation, consult the page
#' It is a common practice to use the built-in \code{\link[base]{saveRDS}} function (or
#' \code{\link[base]{save}}) to persist R objects to the disk. While it is possible to persist
#' \code{xgb.Booster} objects using \code{\link[base]{saveRDS}}, it is not advisable to do so if
#' the model is to be accessed in the future. If you train a model with the current version of
#' XGBoost and persist it with \code{\link[base]{saveRDS}}, the model is not guaranteed to be
#' accessible in later releases of XGBoost. To ensure that your model can be accessed in future
#' releases of XGBoost, use \code{\link{xgb.save}} or \code{\link{xgb.save.raw}} instead.
#'
#' @details
#' Use \code{\link{xgb.save}} to save the XGBoost model as a stand-alone file. You may opt into
#' the JSON format by specifying the JSON extension. To read the model back, use
#' \code{\link{xgb.load}}.
#'
#' Use \code{\link{xgb.save.raw}} to save the XGBoost model as a sequence (vector) of raw bytes
#' in a future-proof manner. Future releases of XGBoost will be able to read the raw bytes and
#' re-construct the corresponding model. To read the model back, use \code{\link{xgb.load.raw}}.
#' The \code{\link{xgb.save.raw}} function is useful if you'd like to persist the XGBoost model
#' as part of another R object.
#'
#' Note: Do not use \code{\link{xgb.serialize}} to store models long-term. It persists not only the
#' model but also internal configurations and parameters, and its format is not stable across
#' multiple XGBoost versions. Use \code{\link{xgb.serialize}} only for checkpointing.
#'
#' For more details and explanation about model persistence and archival, consult the page
#' \url{https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html}.
#'
#' @name a-compatibility-note-for-saveRDS
#' @examples
#' data(agaricus.train, package='xgboost')
#' bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max_depth = 2,
#' eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
#'
#' # Save as a stand-alone file; load it with xgb.load()
#' xgb.save(bst, 'xgb.model')
#' bst2 <- xgb.load('xgb.model')
#'
#' # Save as a stand-alone file (JSON); load it with xgb.load()
#' xgb.save(bst, 'xgb.model.json')
#' bst2 <- xgb.load('xgb.model.json')
#'
#' # Save as a raw byte vector; load it with xgb.load.raw()
#' xgb_bytes <- xgb.save.raw(bst)
#' bst2 <- xgb.load.raw(xgb_bytes)
#'
#' # Persist XGBoost model as part of another R object
#' obj <- list(xgb_model_bytes = xgb.save.raw(bst), description = "My first XGBoost model")
#' # Persist the R object. Here, saveRDS() is okay, since it doesn't persist
#' # xgb.Booster directly. What's being persisted is the future-proof byte representation
#' # as given by xgb.save.raw().
#' saveRDS(obj, 'my_object.rds')
#' # Read back the R object
#' obj2 <- readRDS('my_object.rds')
#' # Re-construct xgb.Booster object from the bytes
#' bst2 <- xgb.load.raw(obj2$xgb_model_bytes)
#'
#' @name a-compatibility-note-for-saveRDS-save
NULL

# Lookup table for the deprecated parameters bookkeeping
Expand Down
2 changes: 2 additions & 0 deletions R-package/R/xgb.Booster.R
Original file line number Diff line number Diff line change
Expand Up @@ -111,6 +111,8 @@ xgb.get.handle <- function(object) {
#' eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
#' saveRDS(bst, "xgb.model.rds")
#'
#' # Warning: The resulting RDS file is only compatible with the current XGBoost version.
#' # Refer to the section titled "a-compatibility-note-for-saveRDS-save".
#' bst1 <- readRDS("xgb.model.rds")
#' if (file.exists("xgb.model.rds")) file.remove("xgb.model.rds")
#' # the handle is invalid:
Expand Down
6 changes: 5 additions & 1 deletion R-package/R/xgb.save.R
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,11 @@
#'
#' Note: a model can also be saved as an R-object (e.g., by using \code{\link[base]{readRDS}}
#' or \code{\link[base]{save}}). However, it would then only be compatible with R, and
#' corresponding R-methods would need to be used to load it.
#' corresponding R-methods would need to be used to load it. Moreover, persisting the model with
#' \code{\link[base]{readRDS}} or \code{\link[base]{save}}) will cause compatibility problems in
#' future versions of XGBoost. Consult \code{\link{a-compatibility-note-for-saveRDS-save}} to learn
#' how to persist models in a future-proof way, i.e. to make the model accessible in future
#' releases of XGBoost.
#'
#' @seealso
#' \code{\link{xgb.load}}, \code{\link{xgb.Booster.complete}}.
Expand Down
62 changes: 62 additions & 0 deletions R-package/man/a-compatibility-note-for-saveRDS-save.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

15 changes: 0 additions & 15 deletions R-package/man/a-compatibility-note-for-saveRDS.Rd

This file was deleted.

2 changes: 2 additions & 0 deletions R-package/man/xgb.Booster.complete.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

10 changes: 5 additions & 5 deletions R-package/man/xgb.create.features.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion R-package/man/xgb.cv.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 4 additions & 4 deletions R-package/man/xgb.dump.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

16 changes: 8 additions & 8 deletions R-package/man/xgb.importance.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

12 changes: 6 additions & 6 deletions R-package/man/xgb.model.dt.tree.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

6 changes: 3 additions & 3 deletions R-package/man/xgb.plot.deepness.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading

0 comments on commit 5a2dcd1

Please sign in to comment.