
Dev/unified dispatching prototype #7724

Closed

Conversation

razdoburdin (Contributor)

In continuation of #5659 and #6212.
Here I present a way of dispatching between the various devices (CPU / CUDA device / oneAPI device).
This request contains only the changes that are relevant to all devices. The code for oneAPI device support is planned to be added later.

The main idea of the dispatching was discussed in #6212. A new global parameter called device_selector is added. This parameter determines the device on which the calculations will be made, as well as the specific kernel that will be executed. So if the user configures XGBoost with the following parameters:
clf = xgboost.XGBClassifier(... , objective='multi:softmax', tree_method='hist')
the CPU version of the library will be executed. But if the user adds device_selector='oneapi:gpu':
clf = xgboost.XGBClassifier(... , device_selector='oneapi:gpu', objective='multi:softmax', tree_method='hist')
the code specific to oneAPI GPUs will be used.

For CUDA, the corresponding logic is not implemented yet, so in that case device_selector is just an alternative way of setting gpu_id. To preserve backward compatibility with existing user code, gpu_id is given higher priority.

An additional feature added by this request is the independent specification of devices for fitting and prediction; a sketch of the intended usage follows below. If the user specifies device_selector='fit:oneapi:gpu; predict:cpu', the oneAPI GPU will be used for fitting and the CPU for prediction.
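
A minimal sketch of the proposed usage, following the syntax described in this PR (device_selector is the parameter proposed here, not a released option; the toy data and the other estimator arguments are placeholders):

```python
import numpy as np
import xgboost

# Toy data just to keep the example self-contained.
X, y = np.random.rand(100, 4), np.random.randint(0, 3, size=100)

# One device for both fitting and prediction, as proposed in this PR.
clf = xgboost.XGBClassifier(device_selector="oneapi:gpu",
                            objective="multi:softmax", tree_method="hist")

# Independent devices for fitting and prediction.
clf = xgboost.XGBClassifier(device_selector="fit:oneapi:gpu; predict:cpu",
                            objective="multi:softmax", tree_method="hist")

clf.fit(X, y)            # would run on the oneAPI GPU
preds = clf.predict(X)   # would run on the CPU
```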

@trivialfis trivialfis added this to 1.6 In Progress in 2.0 Roadmap via automation Mar 12, 2022
@trivialfis trivialfis moved this from 1.6 In Progress to 2.0 in 2.0 Roadmap Mar 12, 2022
trivialfis (Member)

Thank you for working on this! I also wrote a higher-level RFC #7308 for future device dispatching, which should be complementary to this PR.

I will look into this in more detail later.

dmitry.razdoburdin added 2 commits March 14, 2022 17:48
@@ -31,6 +32,22 @@ struct GenericParameter : public XGBoostParameter<GenericParameter> {
bool fail_on_invalid_gpu_id {false};
bool validate_parameters {false};

/* Device dispatcher object.
Member

Nice!

}

void DeviceSelector::Init(const std::string& user_input_device_selector) {
int fit_position = user_input_device_selector.find(fit_.Prefix());
Member

Do you think it's appropriate that we don't distinguish between predict and fit? Whatever device the user has specified, we will use it everywhere.

Contributor Author

Currently, a user can configure prediction on the CPU and fitting on the GPU by specifying predictor='cpu_predictor', right? The idea here is to provide the user with a unified way of selecting devices for both fitting and prediction.
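
For reference, a minimal sketch of that existing split using the pre-2.0 parameters mentioned here (gpu_hist and cpu_predictor; running it requires a CUDA-enabled build, and the toy data is a placeholder):

```python
import numpy as np
import xgboost

# Toy data just to keep the example self-contained.
X, y = np.random.rand(100, 4), np.random.randint(0, 2, size=100)

# Pre-2.0 parameters: train with the CUDA 'gpu_hist' updater, predict on the CPU.
clf = xgboost.XGBClassifier(tree_method="gpu_hist", predictor="cpu_predictor")
clf.fit(X, y)       # training runs on the GPU
clf.predict(X)      # prediction uses the CPU predictor
```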

Member

We will remove gpu_predictor and gpu_hist (hopefully in this release) as documented in #7308. The expected result is that we will have only one (global) parameter, device, to control the dispatch:

with xgb.config_context(device="sycl:0"):
   booster.predict(X)

Contributor Author

Oh, I see.
If you don't plan to support different devices for fitting and prediction, this feature is inappropriate. Fortunately, it can easily be reduced to a uniform device descriptor for both stages.

Member

That avoids some internal conflicts; it's difficult to configure the state with the current design. We have been working on using https://github.com/dmlc/xgboost/blob/master/include/xgboost/generic_parameters.h as the context object for XGBoost. Maybe we can integrate the device selector in this PR with it?

Member

I will make some progress on setting up the interface and keep you posted. Thank you for working on it.

Contributor Author

Hi @trivialfis,
is there any progress in this direction? Maybe some help from our side could be useful?

Member

@razdoburdin I have run some experiments on this recently; the problem is distributed and multi-threaded environments (like Python async). We need to share the device index between all workers and all threads, which requires some synchronization strategy.

We don't need any synchronization if the device ID is limited to the booster as a local variable. But if we were to extend it to DMatrix as well (for constructing a DMatrix from various sources of data), then the issue becomes a headache.
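
A minimal sketch of the contrast described above (the device_selector spelling follows this PR, the device key in config_context follows RFC #7308 as quoted earlier in the thread; neither is a released option, and the toy data is a placeholder):

```python
import numpy as np
import xgboost as xgb

# Toy data just to keep the sketch self-contained.
dtrain = xgb.DMatrix(np.random.rand(100, 4), label=np.random.randint(0, 2, 100))

# Booster-local: the device travels with each booster's own parameters, so
# concurrent trainings in different threads share no state and need no locks.
booster = xgb.train({"tree_method": "hist", "device_selector": "oneapi:gpu"}, dtrain)

# Process-global: config_context sets configuration visible to every thread in
# the process, so threads wanting different devices would have to coordinate.
with xgb.config_context(device="sycl:0"):
    booster.predict(dtrain)
```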

Contributor Author

Hi @trivialfis, is there a chance to implement this or a similar concept in XGBoost 2.0? Maybe you need some help with it?

Member

Yes, I'm still planning it as a major breaking change for 2.0. I got distracted during 1.7 by the new PySpark interface. Expect some progress next month. Sorry for the slow update.

@trivialfis trivialfis moved this from 2.0 TODO to 2.0 In Progress in 2.0 Roadmap Mar 20, 2023
trivialfis (Member)

This is mostly complete now. #7308 (comment)
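
For context, a minimal sketch of the unified interface this refers to, assuming the global device parameter described in #7308 (the toy data is a placeholder):

```python
import numpy as np
import xgboost

# Toy data just to keep the sketch self-contained.
X, y = np.random.rand(100, 4), np.random.randint(0, 2, size=100)

# A single `device` parameter selects where both training and prediction run,
# replacing gpu_id / gpu_hist / gpu_predictor.
clf = xgboost.XGBClassifier(tree_method="hist", device="cuda")
clf.fit(X, y)
clf.predict(X)
```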

@trivialfis trivialfis closed this Jul 20, 2023
@trivialfis trivialfis moved this from 2.0 In Progress to 2.0 Done in 2.0 Roadmap Jul 20, 2023
@razdoburdin razdoburdin deleted the dev/unified_dispatching_prototype branch May 21, 2024 10:28