
[RFC] Unify device configuration. #7308

Closed
trivialfis opened this issue Oct 11, 2021 · 6 comments

trivialfis commented Oct 11, 2021

This is a continuation of #4600

Overview

Use global configuration

From my perspective, this method is cleaner and covers both DMatrix and Booster so it's listed first. An easier-to-implement solution is described in the next section.

Define a new device parameter for XGBoost as a global configuration option and remove the existing gpu_id and predictor parameters along with the gpu_hist tree method. For the native Python interface, it will look like this:

with xgboost.config_context(device="CUDA:0"):
    Xy = xgboost.DMatrix(X, y)
    booster = xgboost.train({"tree_method": "hist"}, Xy)
    booster.predict(Xy)

The above code snippet should run on the first CUDA device, using the GPU implementation of the hist tree method. Also, prediction should run on the same device regardless of the location of the input data. The scikit-learn interface will look like this:

clf = xgb.XGBClassifier(device="CUDA:0", tree_method="hist")

while the config context is created internally in each method of XGBClassifier. For R users, we also have the xgb.set_config function that changes global parameters.
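
As a rough illustration of that internal wrapping, the sketch below shows how an estimator could apply the device parameter by entering a config context around every native call. This is a hypothetical sketch only, assuming config_context accepts the proposed device key; the class name and structure are not the actual XGBClassifier implementation.

import xgboost as xgb

class XGBClassifierSketch:
    """Hypothetical estimator, for illustration; not the real XGBClassifier."""

    def __init__(self, device="CPU", tree_method="hist", **kwargs):
        self.device = device
        self.params = {"tree_method": tree_method, **kwargs}

    def fit(self, X, y):
        # Every call into the native library runs inside a config context,
        # so the user only ever sets a single `device` argument.
        with xgb.config_context(device=self.device):
            self._booster = xgb.train(self.params, xgb.DMatrix(X, y))
        return self

    def predict(self, X):
        with xgb.config_context(device=self.device):
            return self._booster.predict(xgb.DMatrix(X))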

The JVM packages are lagging behind, but in theory we can have something similar. For the Java binding, we can define functions similar to the R or Python xgb.set_config to set the global parameter. For the Scala binding, we have high-level estimators like XGBClassifier in Python, so we can handle the configuration internally.

Last but not least, the C interface is the basis of all other interfaces, so its implementation should be trivial.

For handling existing code, my suggestion would be simply to throw an informative error. For example, if the user has specified gpu_hist, then we require device also to be set.

Alternative solution

This might be more practical in the short term. The device parameter doesn't have to be a global parameter; like the currently available gpu_id, it can be a parameter of the booster object. Hence we can keep it that way and reuse the gpu_id parameter. This is still a breaking change due to the other removed parameters, but it requires fewer changes. For the native Python interface, it will look like this:

Xy = xgboost.DMatrix(X, y)
booster = xgboost.train({"tree_method": "hist", "gpu_id": "CUDA:0"}, Xy)
# or, for compatibility reasons:
booster = xgboost.train({"tree_method": "hist", "gpu_id": "0"}, Xy)

# Use the CPU for prediction.
booster.set_param({"gpu_id": "CPU"})
booster.predict(Xy)

Motivation

Device management has been a headache in the past. We removed the n_gpus parameter in the 1.0 release, which helped clean up the code a little, but there are still many other issues in the current device management configuration. The most important one is that we need a single source of truth for the device ordinal. Currently, the device is chosen based on the following inputs:

  • gpu_id parameter: supposedly the only authority, but rarely honored.
  • tree_method parameter: gpu_hist or not.
  • predictor parameter: gpu_predictor or not.
  • data: whether the data is already on the device (like cupy or cuDF).
  • the first iteration, where XGBoost tries to avoid copying data onto the device by using CPU prediction.
  • environment: does the environment have a GPU at all? This matters when a user loads a pickled model on a CPU-only machine.
  • model: the user might continue training on an existing model, in which case we don't want to pull the data onto the GPU for the initial prediction.
  • custom objective: the returned gradient is on the CPU while XGBoost might be running on the GPU.

As one might see, there are too many correlated factors influencing the decision of the device ordinal, and sometimes they conflict with each other. For instance, setting "gpu_hist" leads to gpu_id >= 0:

"gpu_hist" -> gpu_id = 0

then if a user wants to run prediction on the CPU, the predictor might be set:

booster.set_param({"predictor": "cpu_predictor"})

Then what's the current gpu_id? I don't know. The problem is getting worse with inplace prediction and GPU data inputs. Also, with the OneAPI proposal from Intel, we have a growing number of configurations, and the existing approach simply cannot handle the complexity.

Implementation

Depending on which solution is chosen, global parameter or booster parameter, we might opt for a different implementation. But the general state transition should be the same.

  • For compatibility, if gpu_predictor or gpu_hist is chosen, a consistent device must also be specified; otherwise, there will be an error. By consistent, we mean the device should be set to CUDA:x. This is a breaking change, but it can be handled with a crafted error message.
  • If the device is selected to be CUDA, then the tree method must be one of {hist, gpu_hist, auto}. All of them will become gpu_hist internally. For any other tree method, XGBoost will throw a not-implemented error. We can have approx running on the GPU if needed, but that's beyond the scope of this RFC.
  • For inplace prediction, the device will continue to be chosen automatically; no change is needed.
  • For the scikit-learn interface, which uses inplace prediction automatically, the change would be to match the input data type to the device. Or we simply revert the configuration and let the user decide whether inplace prediction is desired. This one is a bit trickier, as inplace prediction reduces memory usage and latency dramatically, especially for dask. We could use more thought on this.
  • As for the heuristic of avoiding copying data to the GPU for the first prediction, my plan is to remove it and make the copy anyway. The memory usage is unlikely to exceed that of quantile sketching. Or we can run prediction in batches, like the initialization of ellpack.

Based on these rules, we have removed predictor, tree_method, the memory-conservation heuristic, and the data input type from the decision-making process. That leaves the environment and the custom objective; these two can continue to be handled as they are.
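
To make the first two rules concrete, here is a minimal sketch of the proposed parameter resolution, written in Python for readability; the function name, error messages, and exception types are assumptions for illustration, not the actual implementation.

def resolve_tree_method(device: str, tree_method: str = "auto") -> str:
    """Map the proposed (device, tree_method) pair to the internal implementation."""
    if tree_method == "gpu_hist" and not device.startswith("CUDA"):
        # Breaking change: raise an informative error instead of guessing a device.
        raise ValueError("gpu_hist requires a consistent device, e.g. device='CUDA:0'.")
    if device.startswith("CUDA"):
        if tree_method not in ("hist", "gpu_hist", "auto"):
            raise NotImplementedError(
                f"tree_method='{tree_method}' is not implemented on CUDA devices."
            )
        return "gpu_hist"  # hist, gpu_hist, and auto all map to the GPU hist implementation
    return tree_method

resolve_tree_method("CUDA:0", "hist")   # -> "gpu_hist"
resolve_tree_method("CPU", "approx")    # -> "approx"
resolve_tree_method("CPU", "gpu_hist")  # raises an informative error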

@trivialfis
Member Author

cc @RAMitchell @hcho3 @wbo4958 @JohnZed @dantegd @vepifanov @ShvetsKS @pseudotensor

This might be the most significant breaking change in a long time. Please help with comments and suggestions.

@trivialfis
Member Author

A previous attempt is at https://github.com/dmlc/xgboost/pull/6971/files . I have written some thoughts in the code comments there, but they are largely summarized here.


wbo4958 commented Oct 11, 2021

Looks like JVM can follow your suggestion easily.

@RAMitchell
Member

First example looks good to me:

with xgboost.config_context(device="CUDA:0"):
    Xy = xgboost.DMatrix(X, y)
    booster = xgboost.train({"tree_method": "hist"}, Xy)
    booster.predict(Xy)

It's going to be slightly tedious to implement, as we have to change every language binding, but it seems like a very positive change.


trivialfis commented Oct 13, 2021

For implementing the change, I would like to create an independent branch in dmlc during development so that we can run CI with incremental changes.

@trivialfis trivialfis added this to 2.0 in 2.0 Roadmap Oct 21, 2021
trivialfis added a commit that referenced this issue Jan 28, 2022
This is the last PR for removing the omp global variable.

* Add a context object to the `DMatrix`. This bridges `DMatrix` with #7308.
* Require the context to be available at booster construction time.
* Add `n_threads` support for the R CSC DMatrix constructor.
* Remove `omp_get_max_threads` from the R glue code.
* Remove threading utilities that rely on the omp global variable.
@trivialfis trivialfis moved this from 2.0 TODO to 2.0 In Progress in 2.0 Roadmap Aug 4, 2022

trivialfis commented Feb 17, 2023

I'm working on this now:

We won't use the global variable and will keep each context local to the booster instead. This saves us from having to handle multi-threaded applications.
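
As a small sketch of what that means in practice, assuming the proposed device parameter is accepted as a regular booster parameter with the spelling used in this RFC, two boosters in the same process can target different devices without touching any shared global state:

import numpy as np
import xgboost as xgb

X, y = np.random.rand(100, 4), np.random.rand(100)
dtrain = xgb.DMatrix(X, y)

# Each booster carries its own context, so concurrent use from multiple
# threads needs no coordination through a global configuration.
booster_cpu = xgb.train({"tree_method": "hist", "device": "CPU"}, dtrain)
booster_gpu = xgb.train({"tree_method": "hist", "device": "CUDA:0"}, dtrain)  # needs a CUDA device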

cc @razdoburdin

2.0 Roadmap automation moved this from 2.0 In Progress to 2.0 Done Jul 31, 2023