
De-duplicate GPU parameters. #4454

Merged (15 commits) May 29, 2019

Conversation

trivialfis (Member)

  • Only define gpu_id and n_gpus in Learner TrainParam
  • Disable all GPU usage when GPU-related parameters are not specified.

@trivialfis (Member, Author)

@RAMitchell It turns out there are some numeric differences between the CPU and CUDA transforms that weren't caught by the tests before. After disabling the GPU, this test won't pass:

def test_predict(self):

I tried to diff the outputs from the two different settings. The difference mostly comes from GetGradient; it starts at 1e-6 and accumulates. To pass the test, rtol needs to be changed from 1e-5 to 1e-3. Here is the mismatch rate with a 5000x10 dataset:

E               AssertionError: 
E               Not equal to tolerance rtol=0.0001, atol=0
E               
E               Mismatch: 0.04%
E                x: array([-0.108287,  0.163954,  0.096447, ..., -0.173721, -0.115342,
E                      -0.577273], dtype=float32)
E                y: array([-0.108287,  0.163954,  0.096447, ..., -0.173721, -0.115342,
E                      -0.577273], dtype=float32)
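
For reference, the tolerance check in the test follows numpy's assert_allclose rule, |x - y| <= atol + rtol * |y|. A minimal C++ sketch of the same element-wise check (illustrative only, not code from this PR):

#include <cmath>
#include <cstddef>
#include <vector>

// Element-wise check mirroring numpy.testing.assert_allclose:
// a pair passes when |x - y| <= atol + rtol * |y|.
bool AllClose(std::vector<float> const& x, std::vector<float> const& y,
              float rtol = 1e-5f, float atol = 0.0f) {
  if (x.size() != y.size()) return false;
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (std::abs(x[i] - y[i]) > atol + rtol * std::abs(y[i])) return false;
  }
  return true;
}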

@trivialfis (Member, Author)

Another issue is that I'm currently hacking the configuration in Learner TrainParam; honestly, I don't fully understand what's happening in there.

@trivialfis trivialfis marked this pull request as ready for review May 11, 2019 12:16
@trivialfis (Member, Author)

trivialfis commented May 11, 2019

One last thing to do is to either delay the configuration of Objective until after the data is known, or resize the label-checking result vector every time GetGradient is called.

@trivialfis trivialfis changed the title [WIP] Unify GPU parameters. Unify GPU parameters. May 12, 2019
@trivialfis trivialfis requested review from hcho3 and RAMitchell and removed request for hcho3 May 12, 2019 16:38
@RAMitchell (Member) left a comment

So in summary this PR provides the GPUSet::Global() method, where any algorithm can simply fetch the global configuration of GPUs instead of manually reading parameters and configuring this itself. This is nice because it removes quite a bit of code from other algorithms while still allowing them to specify this manually if necessary.

This PR solves the problem where GPUs were being used when the user expects only CPU algorithms. This was happening in the objective functions due to the default parameter of n_gpus=1. This is solved by pushing the configuration into the learner, which has more global knowledge about whether CPU or GPU algorithms should be used.
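
As a rough illustration of the pattern described above (the members shown here are assumptions for the sketch, not the PR's actual GPUSet interface):

// Hypothetical sketch: algorithms consult one process-wide GPU configuration
// instead of each reading gpu_id/n_gpus from its own parameter struct.
class GPUSet {
 public:
  static GPUSet& Global() {          // configured once by the Learner
    static GPUSet set;
    return set;
  }
  void Init(int gpu_id, int n_gpus) {
    gpu_id_ = gpu_id;
    n_gpus_ = n_gpus;
  }
  bool IsEmpty() const { return n_gpus_ == 0; }  // no GPU requested
  int DeviceId() const { return gpu_id_; }

 private:
  int gpu_id_{0};
  int n_gpus_{0};
};

void ComputeGradientExample() {
  if (GPUSet::Global().IsEmpty()) {
    // CPU path: the user never asked for a GPU.
  } else {
    // GPU path, running on device GPUSet::Global().DeviceId().
  }
}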

The danger here is mostly in the very difficult to understand configurations happening in the learner, but this is no different to existing code e.g. where we manually configure updater parameters based on tree_method. It would be good to make sure the behaviour is expected upon serialisation/deserialisation of the model.

I like the way you have named functions in the learner to more explicitly describe the configuration that is happening. I think this area of code can be improved a lot in future.

One feature that would be nice to have is to log information (at some verbosity level) about GPUs selected upon initialisation of the global singleton.

Looks good to me, although I'm not sure if it should go in 0.9. This could be considered a bug fix.

@canonizer can I get a review from you as well please.

@trivialfis (Member, Author)

it would be good to make sure the behaviour is expected upon serialisation/deserialisation of the model.

Good point, will try to make a test later.

One feature that would be nice to have is to log information

Will add a LOG(INFO) in GPUSet::Init.
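
That could look roughly like the following (a sketch only; the function name and message wording are placeholders, assuming dmlc-style logging):

#include <dmlc/logging.h>

// Hypothetical placement of the requested log line during GPU-set initialisation.
void LogSelectedGPUs(int gpu_id, int n_gpus) {
  LOG(INFO) << "XGBoost running on " << n_gpus
            << " GPU(s), starting at device " << gpu_id;
}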

@trivialfis (Member, Author)

@RAMitchell Actually I don't want to merge this PR. I remember there were users who train different models on different GPUs. Making this a global variable will break that. I need to think of something that doesn't break the functionality but also keeps our implementation clean.

@trivialfis trivialfis changed the title Unify GPU parameters. [WIP] Unify GPU parameters. May 13, 2019
@trivialfis (Member, Author)

Suggestions are welcome.

@trivialfis (Member, Author)

trivialfis commented May 15, 2019

@hcho3 @RAMitchell I have a local branch that passes a pointer to const LearnerTrainParam to Predictor, Metric, TreeUpdater, GBM and LinearUpdater via the Create method in each respective class. The change is quite massive and I would like to hear your opinions before proceeding.

The benefit of passing LearnerTrainParam is that we can eliminate the duplication of gpu_id and n_gpus without creating a global variable. As another benefit, nthread is also passed around, creating an opportunity to eliminate the global OpenMP variable.

The downside is that the restructuring is massive, and all Create methods need to accept an additional parameter.
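
To make the proposal concrete, here is a much-simplified sketch (class and parameter names are illustrative, not the exact signatures from the branch):

#include <memory>
#include <string>

// Hypothetical, trimmed-down version of the proposed plumbing.
struct LearnerTrainParam {
  int gpu_id{0};
  int n_gpus{0};   // 0 == CPU only
  int nthread{0};
};

class Predictor {
 public:
  explicit Predictor(LearnerTrainParam const* learner_param)
      : learner_param_{learner_param} {}
  virtual ~Predictor() = default;

  // The factory now takes the learner's parameters instead of each predictor
  // declaring its own gpu_id/n_gpus.
  static std::unique_ptr<Predictor> Create(std::string const& name,
                                           LearnerTrainParam const* learner_param);

 protected:
  LearnerTrainParam const* learner_param_;  // borrowed; owned by Learner
};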

@trivialfis (Member, Author)

Also, this is a blocker for JSON.

@hcho3 (Collaborator)

hcho3 commented May 15, 2019

The current structure of parameter handling causes duplication, as learning parameters are duplicated into objectives, metrics, and updaters. So your proposal has merit. Let me think it over and get back to you.

@trivialfis (Member, Author)

trivialfis commented May 15, 2019

@hcho3 It also fixes the bug where XGBoost chooses the GPU aggressively due to the lack of a global configuration. Currently, if the user doesn't supply the n_gpus=0 parameter, XGBoost will run metrics and objectives on the GPU by default.

Resolved review threads on: src/linear/updater_gpu_coordinate.cu, src/learner.cc, include/xgboost/gbm.h

#if defined(__CUDACC__)
#define DeclareUnifiedTest(name) GPU ## name
#else
#define DeclareUnifiedTest(name) name
#endif

#if defined(__CUDACC__)
#define NGPUS() 1

A contributor left a comment

Could you use #define NGPUS 1, without (), as there are no parameters?

@hcho3 (Collaborator) commented May 28, 2019

Could you use #define NGPUS 1, without (), as there are no parameters?

@trivialfis Second this comment.
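
In other words, the suggestion is to make it an object-like macro, along the lines of:

#if defined(__CUDACC__)
#define NGPUS 1   // no parameter list, so no parentheses needed
#endif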

@trivialfis (Member, Author)

trivialfis commented May 15, 2019

@hcho3 Could you please help take a look at Jenkins' cache? It seems clang-tidy is running on an old copy of this PR: https://xgboost-ci.net/blue/organizations/jenkins/xgboost/detail/PR-4454/10/pipeline

In the latest commit, gpu_predictor.cu:376:3 is not a constructor.

@RAMitchell (Member) left a comment

I like your solution. Agree with the others that a const shared pointer is much better than a raw pointer.

@trivialfis (Member, Author)

trivialfis commented May 16, 2019

@RAMitchell @canonizer Could you please explain why a shared pointer is better?

From my point of view, objects like Predictor do not own LearnerTrainParam, so they are "borrowing" a pointer from Learner. It's like when you access the internal data of std::vector by calling data(), or of std::string by calling c_str(): you get a raw (const) pointer. With the returned raw pointer you are only borrowing its content; you should not manage (deallocate) what it points to, because ownership remains with the std::string/std::vector.

Passing shared_ptr means the ownership is shared. It's a semantic issue. See
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rf-smart
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rr-ptr
and related topics.

I'm open to suggestions, but I need more context to understand why a shared pointer is a better choice.
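
A small standalone example of the ownership distinction being argued here (illustrative only, not code from this PR):

#include <memory>
#include <vector>

struct Config { int gpu_id{0}; };

// Borrowing: the caller only observes the data; the owner stays responsible
// for the memory, exactly like std::vector::data() / std::string::c_str().
int ReadGpuId(Config const* borrowed) { return borrowed->gpu_id; }

int main() {
  std::vector<int> v{1, 2, 3};
  int const* raw = v.data();          // raw pointer = borrow; never delete it
  (void)raw;

  Config owner_side;                  // the owner keeps the object alive
  int id = ReadGpuId(&owner_side);    // callee borrows, does not manage lifetime

  auto shared_a = std::make_shared<Config>();
  auto shared_b = shared_a;           // shared_ptr = shared ownership: the object
                                      // lives until both holders are gone
  (void)id;
  (void)shared_b;
  return 0;
}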

@trivialfis (Member, Author)

@hcho3

Could you please help taking a look in Jenkins' cache?

Never mind, it works. Thanks.

@trivialfis trivialfis changed the title [WIP] Unify GPU parameters. Unify GPU parameters. May 16, 2019
@trivialfis (Member, Author)

trivialfis commented May 17, 2019

Summary

Since this PR does a lot of things, here is a summary for @hcho3 for when the release after 0.90 comes around.

This PR unifies the two parameters gpu_id and n_gpus as part of LearnerTrainParam. Beyond easier code maintenance, the unification fixes the behaviour of XGBoost choosing the GPU aggressively. Previously, XGBoost would run metrics and objectives on a single GPU by default due to the lack of a global configuration. After this PR, Learner will be able to choose between CPU/GPU based on the training algorithm and predictor. Also, clearing out the duplication gets us one step closer to the JSON RFC. In the future we might be able to use nthread from this parameter set to eliminate the use of OpenMP's global variable.

To implement this configuration, I added an extra LearnerTrainParam const* parameter to every factory Create method and stored it as learner_param_ in the base class. The single instance of LearnerTrainParam is stored in the Learner class (not LearnerImpl). Since Learner is the last one to be deallocated (after Predictor, Metric, ...), there's no need for extra memory management for this parameter. I chose a raw pointer over a smart pointer, which is still under discussion with @canonizer and @RAMitchell.

Also, this PR fixes the pickling issue when a model is trained on GPU but later loaded on a CPU-only device.

The added tests for Learner should demonstrate the correctness of the configuration.
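
A simplified sketch of the ownership/lifetime setup described above (names trimmed down; the real classes carry much more state):

#include <memory>

struct LearnerTrainParam { int gpu_id{0}; int n_gpus{0}; };

class Metric {
 public:
  explicit Metric(LearnerTrainParam const* p) : learner_param_{p} {}
 protected:
  LearnerTrainParam const* learner_param_;  // borrowed; never deleted here
};

class Learner {
 public:
  Learner() : metric_{new Metric{&tparam_}} {}
  // Members are destroyed in reverse declaration order, so metric_ (and any
  // other component holding the borrowed pointer) is torn down before tparam_,
  // which is why no extra memory management is needed.
 private:
  LearnerTrainParam tparam_;          // the single instance of the parameters
  std::unique_ptr<Metric> metric_;    // components only borrow &tparam_
};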

@trivialfis (Member, Author)

trivialfis commented May 27, 2019

closes #4494
closes #4361

@hcho3 (Collaborator)

hcho3 commented May 27, 2019

@trivialfis Thanks for fixing the pickling issue.

@pseudotensor (Contributor)

@hcho3 Yes, this works great. Please merge! :)

@hcho3 (Collaborator)

hcho3 commented May 28, 2019

@pseudotensor Thanks for trying it out. I will review this PR and approve it today.

@hcho3 (Collaborator) left a comment

LGTM overall. I have some comments about style.

Resolved review threads on: include/xgboost/metric.h, src/learner.cc, include/xgboost/objective.h, include/xgboost/predictor.h, tests/cpp/test_learner.cc, tests/python-gpu/load_pickle.py, tests/python-gpu/test_pickling.py
@hcho3 hcho3 changed the title Unify GPU parameters. De-duplicate GPU parameters. May 28, 2019
* Change NGPUS.
* Remove static_cast.
* Change obj func Create parameters' order.
@pseudotensor (Contributor)

pseudotensor commented May 28, 2019

ModuleNotFoundError: No module named 'xgboost.training'
That's just from testing, not me.

@hcho3 (Collaborator)

hcho3 commented May 29, 2019

@pseudotensor Can we have a repro script?

@hcho3 (Collaborator)

hcho3 commented May 29, 2019

@trivialfis Currently, the Python tests on Win64 are not idempotent, as they attempt to install and remove the XGBoost package in the system Python environment. So running more than one job on the same machine will cause problems (one job installs XGBoost while another removes it). For now, I limited the number of jobs to 1 for Windows workers (and restarted the tests), but I'd like to make the Win64 tests idempotent in a follow-up PR.

@pseudotensor (Contributor)

@pseudotensor Can we have a repro script?

That's just from the CI, seems fixed.

@trivialfis (Member, Author)

@pseudotensor Sounds like the reason described by @hcho3.

@hcho3 (Collaborator)

hcho3 commented May 29, 2019

Thanks for the clarification. I will follow up with a PR to stabilize the Windows CI.

@trivialfis trivialfis deleted the unify-ngpus branch May 29, 2019 15:09
lock bot locked as resolved and limited conversation to collaborators Aug 27, 2019