
only copy the model once when predicting multiple batches #4457

Merged: 5 commits into dmlc:master from rongou's copy-model-once branch, May 14, 2019

Conversation

rongou (Contributor) commented May 10, 2019

dh::safe_cuda(cudaMemcpyAsync(dh::Raw(tree_group_), model.tree_info.data(),
                              sizeof(int) * model.tree_info.size(),
                              cudaMemcpyHostToDevice));
}
Reviewer (Contributor): This code would look better as a separate method. In my opinion, that is more logical, and it reduces the number of function parameters.

rongou (Author): Done.
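For reference, the extraction the reviewer is suggesting would look roughly like this. This is a minimal sketch reusing the snippet above; the method name CopyTreeGroup is invented for illustration and is not in the PR:

// Hypothetical helper: hoist the host-to-device copy of the per-tree group
// table into its own method, shrinking the parameter list of the caller.
void CopyTreeGroup(const gbm::GBTreeModel& model) {
  dh::safe_cuda(cudaMemcpyAsync(dh::Raw(tree_group_), model.tree_info.data(),
                                sizeof(int) * model.tree_info.size(),
                                cudaMemcpyHostToDevice));
}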

@@ -361,10 +364,14 @@ class GPUPredictor : public xgboost::Predictor {
    DeviceOffsets(batch.offset, batch.data.Size(), &device_offsets);
    batch.data.Reshard(GPUDistribution::Explicit(devices_, device_offsets));

    // TODO(rongou): only copy the model once for all the batches.
    if (batch_offset == 0) {
Reviewer (Contributor): Can this be hoisted out of the for loop?

rongou (Author): Done.

dh::ExecuteIndexShards(&shards_, [&](int idx, DeviceShard& shard) {
  shard.InitModel(model, h_tree_segments, h_nodes);
});
}
Reviewer (Contributor): Consider moving it outside of the loop. You won't need the conditional then.

rongou (Author): Done.
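For reference, the change both reviewers are asking for is a classic loop-invariant hoist. A simplified sketch, with a hypothetical batches range standing in for the real external-memory batch iteration:

// Before: the model copy sits inside the per-batch loop, guarded so that it
// only fires on the first batch.
for (const auto& batch : batches) {
  if (batch_offset == 0) {
    dh::ExecuteIndexShards(&shards_, [&](int idx, DeviceShard& shard) {
      shard.InitModel(model, h_tree_segments, h_nodes);
    });
  }
  // ... predict on this batch and advance batch_offset ...
}

// After: the copy runs exactly once, above the loop, and the guard disappears.
dh::ExecuteIndexShards(&shards_, [&](int idx, DeviceShard& shard) {
  shard.InitModel(model, h_tree_segments, h_nodes);
});
for (const auto& batch : batches) {
  // ... predict on this batch ...
}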

void PredictInternal(const SparsePage& batch, const MetaInfo& info,
                     HostDeviceVector<bst_float>* predictions,
                     size_t tree_begin, size_t tree_end, int n_classes) {
Reviewer (Contributor): Why do you need n_classes?

rongou (Author): It's needed by the prediction kernel.

Reviewer (Contributor): As it belongs to the model, could you move it to InitModel()?

rongou (Author): Done.
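The resolution is to record model-level constants at model-copy time instead of threading them through every predict call. A minimal sketch of the pattern, assuming a hypothetical n_classes_ member and with the method bodies reduced to comments (not the merged code):

class DeviceShard {
 public:
  void InitModel(const gbm::GBTreeModel& model, size_t tree_begin, size_t tree_end) {
    // Cache model-level constants once, when the model is copied to the device.
    n_classes_ = model.param.num_output_group;
    // ... copy nodes, tree segments, and tree groups to the device ...
  }

  void PredictInternal(const SparsePage& batch, const MetaInfo& info,
                       HostDeviceVector<bst_float>* predictions) {
    // The prediction kernel now reads n_classes_ from the shard instead of
    // taking it as a parameter on every call.
  }

 private:
  int n_classes_{0};
};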

@@ -361,10 +364,14 @@ class GPUPredictor : public xgboost::Predictor {
    DeviceOffsets(batch.offset, batch.data.Size(), &device_offsets);
    batch.data.Reshard(GPUDistribution::Explicit(devices_, device_offsets));

    // TODO(rongou): only copy the model once for all the batches.
    if (batch_offset == 0) {
      dh::ExecuteIndexShards(&shards_, [&](int idx, DeviceShard& shard) {
Reviewer (Contributor): Given that the code processing the tree nodes above (lines 335-350) has to do with model initialization, consider moving it (together with calls to DeviceShard::InitModel()) to a separate method of GPUPredictor.

rongou (Author): Done.
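In other words, the host-side flattening of the trees and the per-shard copies end up behind a single method of GPUPredictor. An outline of that shape, assuming the shard-side node type is called DevicePredictionNode and with the flattening itself elided:

void InitModel(const gbm::GBTreeModel& model, size_t tree_begin, size_t tree_end) {
  // Flatten the host model once: per-tree offsets into a flat node array,
  // plus the node array itself, for the trees in [tree_begin, tree_end).
  std::vector<size_t> h_tree_segments;
  std::vector<DevicePredictionNode> h_nodes;
  // ... fill h_tree_segments and h_nodes from model.trees ...

  // Hand the flattened model to every device shard in one pass.
  dh::ExecuteIndexShards(&shards_, [&](int idx, DeviceShard& shard) {
    shard.InitModel(model, h_tree_segments, h_nodes);
  });
}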

void InitModel(const gbm::GBTreeModel &model, size_t tree_begin, size_t tree_end) {
Reviewer (Contributor): Consider placing & consistently, i.e. const gbm::GBTreeModel&, auto& or const gbm::GBTreeModel &, auto &. In two of the InitModel() methods, & is placed differently.

rongou (Author): Done.

-shard.PredictInternal(batch, dmat->Info(), out_preds, model,
-                      h_tree_segments, h_nodes, tree_begin, tree_end);
+shard.PredictInternal(batch, dmat->Info(), out_preds, tree_begin, tree_end,
+                      model.param.num_output_group);
Reviewer (Contributor): I think all three of these parameters can be stored in the shard after InitModel(). However, I'll leave this up to you.

rongou (Author): Done.
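Followed through, everything that is constant across batches lives in the shard once InitModel() has run, so only per-batch arguments remain at the prediction call site. A hypothetical sketch of that end state:

// One-time setup: each shard caches tree_begin, tree_end, and
// num_output_group alongside its device copy of the trees.
dh::ExecuteIndexShards(&shards_, [&](int idx, DeviceShard& shard) {
  shard.InitModel(model, tree_begin, tree_end);
});

// Per-batch work: only batch-specific arguments are passed.
shard.PredictInternal(batch, dmat->Info(), out_preds);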

rongou (Author) commented May 14, 2019

@RAMitchell this is ready to merge. Thanks!

hcho3 (Collaborator) commented May 14, 2019

@RAMitchell Do we want this in 0.90?

hcho3 (Collaborator) commented May 14, 2019

@rongou @canonizer @sriramch Can you do me a favor and explain what this PR does? Is this a follow-up to #4284 (external memory with single GPU) and #4438 (external memory with multiple GPUs)?

rongou (Author) commented May 14, 2019

@hcho3 Yes, it's an optimization plus some refactoring. In external-memory mode, when prediction runs over multiple batches, the model should be copied to the GPU only once instead of once per batch.

RAMitchell (Member) commented:

@hcho3 This is a fairly low-impact change, so it's up to you whether to include it. I will merge so as not to have PRs sitting around.

RAMitchell merged commit a9ec2dd into dmlc:master on May 14, 2019.
rongou deleted the copy-model-once branch on May 15, 2019.