Commit 5a2dcd1

[R] Provide better guidance for persisting XGBoost model (dmlc#5964)
* [R] Provide better guidance for persisting XGBoost model
* Update saving_model.rst
* Add a paragraph about xgb.serialize()
1 parent bf2990e commit 5a2dcd1

17 files changed, 232 additions, 81 deletions

R-package/DESCRIPTION

Lines changed: 1 addition & 1 deletion
```diff
@@ -64,5 +64,5 @@ Imports:
 data.table (>= 1.9.6),
 magrittr (>= 1.5),
 stringi (>= 0.5.2)
-RoxygenNote: 7.1.0
+RoxygenNote: 7.1.1
 SystemRequirements: GNU make, C++14
```

R-package/R/utils.R

Lines changed: 55 additions & 9 deletions
```diff
@@ -308,18 +308,64 @@ xgb.createFolds <- function(y, k = 10)
 #' @name xgboost-deprecated
 NULL
 
-#' Do not use saveRDS() for long-term archival of models. Use xgb.save() instead.
+#' Do not use \code{\link[base]{saveRDS}} or \code{\link[base]{save}} for long-term archival of
+#' models. Instead, use \code{\link{xgb.save}} or \code{\link{xgb.save.raw}}.
 #'
-#' It is a common practice to use the built-in \code{saveRDS()} function to persist R objects to
-#' the disk. While \code{xgb.Booster} objects can be persisted with \code{saveRDS()} as well, it
-#' is not advisable to use it if the model is to be accessed in the future. If you train a model
-#' with the current version of XGBoost and persist it with \code{saveRDS()}, the model is not
-#' guaranteed to be accessible in later releases of XGBoost. To ensure that your model can be
-#' accessed in future releases of XGBoost, use \code{xgb.save()} instead. For more details and
-#' explanation, consult the page
+#' It is a common practice to use the built-in \code{\link[base]{saveRDS}} function (or
+#' \code{\link[base]{save}}) to persist R objects to the disk. While it is possible to persist
+#' \code{xgb.Booster} objects using \code{\link[base]{saveRDS}}, it is not advisable to do so if
+#' the model is to be accessed in the future. If you train a model with the current version of
+#' XGBoost and persist it with \code{\link[base]{saveRDS}}, the model is not guaranteed to be
+#' accessible in later releases of XGBoost. To ensure that your model can be accessed in future
+#' releases of XGBoost, use \code{\link{xgb.save}} or \code{\link{xgb.save.raw}} instead.
+#'
+#' @details
+#' Use \code{\link{xgb.save}} to save the XGBoost model as a stand-alone file. You may opt into
+#' the JSON format by specifying the JSON extension. To read the model back, use
+#' \code{\link{xgb.load}}.
+#'
+#' Use \code{\link{xgb.save.raw}} to save the XGBoost model as a sequence (vector) of raw bytes
+#' in a future-proof manner. Future releases of XGBoost will be able to read the raw bytes and
+#' re-construct the corresponding model. To read the model back, use \code{\link{xgb.load.raw}}.
+#' The \code{\link{xgb.save.raw}} function is useful if you'd like to persist the XGBoost model
+#' as part of another R object.
+#'
+#' Note: Do not use \code{\link{xgb.serialize}} to store models long-term. It persists not only the
+#' model but also internal configurations and parameters, and its format is not stable across
+#' multiple XGBoost versions. Use \code{\link{xgb.serialize}} only for checkpointing.
+#'
+#' For more details and explanation about model persistence and archival, consult the page
 #' \url{https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html}.
 #'
-#' @name a-compatibility-note-for-saveRDS
+#' @examples
+#' data(agaricus.train, package='xgboost')
+#' bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max_depth = 2,
+#' eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
+#'
+#' # Save as a stand-alone file; load it with xgb.load()
+#' xgb.save(bst, 'xgb.model')
+#' bst2 <- xgb.load('xgb.model')
+#'
+#' # Save as a stand-alone file (JSON); load it with xgb.load()
+#' xgb.save(bst, 'xgb.model.json')
+#' bst2 <- xgb.load('xgb.model.json')
+#'
+#' # Save as a raw byte vector; load it with xgb.load.raw()
+#' xgb_bytes <- xgb.save.raw(bst)
+#' bst2 <- xgb.load.raw(xgb_bytes)
+#'
+#' # Persist XGBoost model as part of another R object
+#' obj <- list(xgb_model_bytes = xgb.save.raw(bst), description = "My first XGBoost model")
+#' # Persist the R object. Here, saveRDS() is okay, since it doesn't persist
+#' # xgb.Booster directly. What's being persisted is the future-proof byte representation
+#' # as given by xgb.save.raw().
+#' saveRDS(obj, 'my_object.rds')
+#' # Read back the R object
+#' obj2 <- readRDS('my_object.rds')
+#' # Re-construct xgb.Booster object from the bytes
+#' bst2 <- xgb.load.raw(obj2$xgb_model_bytes)
+#'
+#' @name a-compatibility-note-for-saveRDS-save
 NULL
 
 # Lookup table for the deprecated parameters bookkeeping
```
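The last example in this hunk stores the `xgb.save.raw()` bytes inside an ordinary list and persists that list with `saveRDS()`. The following is a minimal self-contained sketch of the same pattern that additionally checks that the reconstructed model gives the same predictions as the original. It assumes the XGBoost R package API of this commit's era (1.x), writes only to a temporary file, and its variable names are illustrative.

```r
# Sketch (assumes the 1.x XGBoost R package API): persist a model as raw bytes
# inside a plain R list, round-trip the list through saveRDS()/readRDS(),
# and confirm the rebuilt model predicts the same as the original.
library(xgboost)

data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

params <- list(objective = "binary:logistic", max_depth = 2, eta = 1, nthread = 2)
bst <- xgb.train(params = params, data = dtrain, nrounds = 2)

# Bundle the future-proof byte representation with arbitrary metadata.
obj <- list(xgb_model_bytes = xgb.save.raw(bst),
            description = "My first XGBoost model")

# saveRDS() is acceptable here: the list holds raw bytes, not an xgb.Booster.
rds_path <- tempfile(fileext = ".rds")
saveRDS(obj, rds_path)
obj2 <- readRDS(rds_path)
unlink(rds_path)

# Rebuild the model from the bytes and compare predictions on held-out data.
bst2 <- xgb.load.raw(obj2$xgb_model_bytes)
pred_before <- predict(bst, agaricus.test$data)
pred_after <- predict(bst2, agaricus.test$data)
```

Only the raw vector depends on XGBoost's stable model format; the surrounding list can carry any other R metadata you want to keep with the model.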

R-package/R/xgb.Booster.R

Lines changed: 2 additions & 0 deletions
```diff
@@ -111,6 +111,8 @@ xgb.get.handle <- function(object) {
 #' eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
 #' saveRDS(bst, "xgb.model.rds")
 #'
+#' # Warning: The resulting RDS file is only compatible with the current XGBoost version.
+#' # Refer to the section titled "a-compatibility-note-for-saveRDS-save".
 #' bst1 <- readRDS("xgb.model.rds")
 #' if (file.exists("xgb.model.rds")) file.remove("xgb.model.rds")
 #' # the handle is invalid:
```
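The warning added in this hunk suggests a migration path for models that were already archived with `saveRDS()`: while the XGBoost version that wrote the RDS file is still installed, read the object back and re-save it with `xgb.save()`. Below is a minimal sketch under these assumptions: the 1.x R package API, a legacy RDS file simulated in a temporary directory, and `xgb.save()` restoring the invalidated handle of a `readRDS()` result by itself (if it does not in your version, call `xgb.Booster.complete()` first, as the example in this hunk does). The `.json` extension opts into the JSON format described in the `utils.R` documentation.

```r
# Sketch: migrate a model archived with saveRDS() to a stand-alone model file.
# Must run under the SAME XGBoost version that wrote the RDS file.
# Assumes the 1.x R package API; file paths are hypothetical temp files.
library(xgboost)

data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)
params <- list(objective = "binary:logistic", max_depth = 2, eta = 1, nthread = 2)
bst <- xgb.train(params = params, data = dtrain, nrounds = 2)

# Simulate a legacy archive created with saveRDS().
legacy_rds <- tempfile(fileext = ".rds")
saveRDS(bst, legacy_rds)

# Migration step: read the legacy object and re-save it with xgb.save().
# The .json extension selects the JSON model format.
legacy_bst <- readRDS(legacy_rds)
model_path <- tempfile(fileext = ".json")
xgb.save(legacy_bst, model_path)

# A JSON model file starts with an opening brace.
first_char <- readChar(model_path, nchars = 1)

# From now on, load the model with xgb.load() instead of readRDS().
bst_migrated <- xgb.load(model_path)
pred_orig <- predict(bst, agaricus.train$data)
pred_migrated <- predict(bst_migrated, agaricus.train$data)
```

After migration, the RDS file can be retired and the `.json` file kept as the long-term archive.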

R-package/R/xgb.save.R

Lines changed: 5 additions & 1 deletion
```diff
@@ -13,7 +13,11 @@
 #'
 #' Note: a model can also be saved as an R-object (e.g., by using \code{\link[base]{readRDS}}
 #' or \code{\link[base]{save}}). However, it would then only be compatible with R, and
-#' corresponding R-methods would need to be used to load it.
+#' corresponding R-methods would need to be used to load it. Moreover, persisting the model with
+#' \code{\link[base]{readRDS}} or \code{\link[base]{save}}) will cause compatibility problems in
+#' future versions of XGBoost. Consult \code{\link{a-compatibility-note-for-saveRDS-save}} to learn
+#' how to persist models in a future-proof way, i.e. to make the model accessible in future
+#' releases of XGBoost.
 #'
 #' @seealso
 #' \code{\link{xgb.load}}, \code{\link{xgb.Booster.complete}}.
```

R-package/man/a-compatibility-note-for-saveRDS-save.Rd

Lines changed: 62 additions & 0 deletions
Generated file; diff not rendered.

R-package/man/a-compatibility-note-for-saveRDS.Rd

Lines changed: 0 additions & 15 deletions
This file was deleted.

R-package/man/xgb.Booster.complete.Rd

Lines changed: 2 additions & 0 deletions
Generated file; diff not rendered.

R-package/man/xgb.create.features.Rd

Lines changed: 5 additions & 5 deletions
Generated file; diff not rendered.

R-package/man/xgb.cv.Rd

Lines changed: 1 addition & 1 deletion
Generated file; diff not rendered.

R-package/man/xgb.dump.Rd

Lines changed: 4 additions & 4 deletions
Generated file; diff not rendered.