
De-duplicate GPU parameters. #4454

Merged (15 commits) May 29, 2019

Conversation

trivialfis (Member)

  • Only define gpu_id and n_gpus in Learner TrainParam
  • Disable all GPU usage when GPU-related parameters are not specified.

@trivialfis (Member, Author)

@RAMitchell It turns out there are some numeric differences between the CPU and CUDA transforms that weren't caught by the tests before. After disabling the GPU, this test won't pass:

def test_predict(self):

I tried to diff the outputs from the two different settings. The difference mostly comes from GetGradient; it starts at 1e-6 and accumulates. To pass the test, rtol needs to be changed from 1e-5 to 1e-3. Here is the mismatch rate with a 5000x10 dataset:

E               AssertionError: 
E               Not equal to tolerance rtol=0.0001, atol=0
E               
E               Mismatch: 0.04%
E                x: array([-0.108287,  0.163954,  0.096447, ..., -0.173721, -0.115342,
E                      -0.577273], dtype=float32)
E                y: array([-0.108287,  0.163954,  0.096447, ..., -0.173721, -0.115342,
E                      -0.577273], dtype=float32)
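
For reference, the tolerance check in the test follows numpy's assert_allclose rule, |x - y| <= atol + rtol * |y|. A minimal C++ sketch of the same element-wise check (illustrative only, not code from this PR):

#include <cmath>
#include <cstddef>
#include <vector>

// Element-wise check mirroring numpy.testing.assert_allclose:
// a pair passes when |x - y| <= atol + rtol * |y|.
bool AllClose(std::vector<float> const& x, std::vector<float> const& y,
              float rtol = 1e-5f, float atol = 0.0f) {
  if (x.size() != y.size()) return false;
  for (std::size_t i = 0; i < x.size(); ++i) {
    if (std::abs(x[i] - y[i]) > atol + rtol * std::abs(y[i])) return false;
  }
  return true;
}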

@trivialfis (Member, Author)

Another issue is that I'm currently hacking the configuration in Learner TrainParam; honestly, I don't fully understand what's happening in there.

@trivialfis trivialfis marked this pull request as ready for review May 11, 2019 12:16
@trivialfis (Member, Author)

trivialfis commented May 11, 2019

One last thing to do is to either delay the configuration of Objective until after the data is known, or resize the label-checking result vector every time GetGradient is called.

@trivialfis trivialfis changed the title [WIP] Unify GPU parameters. Unify GPU parameters. May 12, 2019
@trivialfis trivialfis requested review from hcho3 and RAMitchell and removed request for hcho3 May 12, 2019 16:38
@RAMitchell (Member) left a comment

So in summary this PR provides the GPUSet::Global() method, where any algorithm can simply fetch the global configuration of GPUs instead of manually reading parameters and configuring this itself. This is nice because it removes quite a bit of code from other algorithms while still allowing them to specify this manually if necessary.

This PR solves the problem where GPUs were being used when the user expects only CPU algorithms. This was happening in the objective functions due to the default parameter of n_gpus=1. This is solved by pushing the configuration into the learner, which has more global knowledge about whether CPU or GPU algorithms should be used.
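
As a rough illustration of the pattern described above (the members shown here are assumptions for the sketch, not the PR's actual GPUSet interface):

// Hypothetical sketch: algorithms consult one process-wide GPU configuration
// instead of each reading gpu_id/n_gpus from its own parameter struct.
class GPUSet {
 public:
  static GPUSet& Global() {          // configured once by the Learner
    static GPUSet set;
    return set;
  }
  void Init(int gpu_id, int n_gpus) {
    gpu_id_ = gpu_id;
    n_gpus_ = n_gpus;
  }
  bool IsEmpty() const { return n_gpus_ == 0; }  // no GPU requested
  int DeviceId() const { return gpu_id_; }

 private:
  int gpu_id_{0};
  int n_gpus_{0};
};

void ComputeGradientExample() {
  if (GPUSet::Global().IsEmpty()) {
    // CPU path: the user never asked for a GPU.
  } else {
    // GPU path, running on device GPUSet::Global().DeviceId().
  }
}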

The danger here is mostly in the very difficult to understand configurations happening in the learner, but this is no different to existing code e.g. where we manually configure updater parameters based on tree_method. It would be good to make sure the behaviour is expected upon serialisation/deserialisation of the model.

I like the way you have named functions in the learner to more explicitly describe the configuration that is happening. I think this area of code can be improved a lot in future.

One feature that would be nice to have is to log information (at some verbosity level) about GPUs selected upon initialisation of the global singleton.

Looks good to me, although I'm not sure if it should go in 0.9. This could be considered a bug fix.

@canonizer can I get a review from you as well please.

@trivialfis (Member, Author)

it would be good to make sure the behaviour is expected upon serialisation/deserialisation of the model.

Good point, will try to make a test later.

One feature that would be nice to have is to log information

Will add a LOG(INFO) in GPUSet::Init.
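
That could look roughly like the following (a sketch only; the function name and message wording are placeholders, assuming dmlc-style logging):

#include <dmlc/logging.h>

// Hypothetical placement of the requested log line during GPU-set initialisation.
void LogSelectedGPUs(int gpu_id, int n_gpus) {
  LOG(INFO) << "XGBoost running on " << n_gpus
            << " GPU(s), starting at device " << gpu_id;
}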

@trivialfis (Member, Author)

@RAMitchell Actually I don't want to merge this PR. I remember there were users who train different models on different GPUs. Making this a global variable will break that. I need to think of something that doesn't break the functionality but also keeps our implementation clean.

@trivialfis trivialfis changed the title Unify GPU parameters. [WIP] Unify GPU parameters. May 13, 2019
@trivialfis (Member, Author)

Suggestions are welcome.

@trivialfis (Member, Author)

trivialfis commented May 15, 2019

@hcho3 @RAMitchell I have a local branch that passes a pointer to const LearnerTrainParam to Predictor, Metric, TreeUpdater, GBM and LinearUpdater via the Create method in each respective class. The change is quite massive and I would like to hear your opinions before proceeding.

The benefit of passing LearnerTrainParam is that we can eliminate the duplication of gpu_id and n_gpus without creating a global variable. As another benefit, nthread is also passed around, creating an opportunity to eliminate the global OpenMP variable.

The downside is that the restructuring is massive, and all Create methods need to accept an additional parameter.
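
To make the proposal concrete, here is a much-simplified sketch (class and parameter names are illustrative, not the exact signatures from the branch):

#include <memory>
#include <string>

// Hypothetical, trimmed-down version of the proposed plumbing.
struct LearnerTrainParam {
  int gpu_id{0};
  int n_gpus{0};   // 0 == CPU only
  int nthread{0};
};

class Predictor {
 public:
  explicit Predictor(LearnerTrainParam const* learner_param)
      : learner_param_{learner_param} {}
  virtual ~Predictor() = default;

  // The factory now takes the learner's parameters instead of each predictor
  // declaring its own gpu_id/n_gpus.
  static std::unique_ptr<Predictor> Create(std::string const& name,
                                           LearnerTrainParam const* learner_param);

 protected:
  LearnerTrainParam const* learner_param_;  // borrowed; owned by Learner
};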

@trivialfis (Member, Author)

Also, this is a blocker for JSON.

@hcho3 (Collaborator)

hcho3 commented May 15, 2019

The current structure of parameter handling causes duplication, as learning parameters are duplicated into objectives, metrics, and updaters. So your proposal has merit. Let me think it over and get back to you.

@trivialfis (Member, Author)

trivialfis commented May 15, 2019

@hcho3 It also fixes the bug where XGBoost chooses the GPU aggressively due to the lack of a global configuration. Currently, if the user doesn't supply the n_gpus=0 parameter, XGBoost will run metrics and objectives on the GPU by default.

Resolved review threads on: src/linear/updater_gpu_coordinate.cu, src/learner.cc, include/xgboost/gbm.h

#if defined(__CUDACC__)
#define DeclareUnifiedTest(name) GPU ## name
#else
#define DeclareUnifiedTest(name) name
#endif

#if defined(__CUDACC__)
#define NGPUS() 1

A contributor left a comment

Could you use #define NGPUS 1, without (), as there are no parameters?

@hcho3 (Collaborator) commented May 28, 2019

Could you use #define NGPUS 1, without (), as there are no parameters?

@trivialfis Second this comment.
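
In other words, the suggestion is to make it an object-like macro, along the lines of:

#if defined(__CUDACC__)
#define NGPUS 1   // no parameter list, so no parentheses needed
#endif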

@trivialfis (Member, Author)

trivialfis commented May 15, 2019

@hcho3 Could you please help take a look at Jenkins' cache? It seems clang-tidy is running on an old copy of this PR: https://xgboost-ci.net/blue/organizations/jenkins/xgboost/detail/PR-4454/10/pipeline

In the latest commit, gpu_predictor.cu:376:3 is not a constructor.

@RAMitchell (Member) left a comment

I like your solution. Agree with the others that a const shared pointer is much better than a raw pointer.

@trivialfis (Member, Author)

trivialfis commented May 16, 2019

@RAMitchell @canonizer Could you please explain why a shared pointer is better?

From my point of view, objects like Predictor do not own LearnerTrainParam, so they are "borrowing" a pointer from Learner. It's like when you access the internal data of std::vector by calling data(), or of std::string by calling c_str(): you get a raw (const) pointer. With the returned raw pointer you are only borrowing its content; you should not manage (deallocate) what it points to, because ownership remains with the std::string/std::vector.

Passing shared_ptr means the ownership is shared. It's a semantic issue. See
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rf-smart
https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rr-ptr
and related topics.

I'm open to suggestions, but I need more context to understand why a shared pointer is a better choice.
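
A small standalone example of the ownership distinction being argued here (illustrative only, not code from this PR):

#include <memory>
#include <vector>

struct Config { int gpu_id{0}; };

// Borrowing: the caller only observes the data; the owner stays responsible
// for the memory, exactly like std::vector::data() / std::string::c_str().
int ReadGpuId(Config const* borrowed) { return borrowed->gpu_id; }

int main() {
  std::vector<int> v{1, 2, 3};
  int const* raw = v.data();          // raw pointer = borrow; never delete it
  (void)raw;

  Config owner_side;                  // the owner keeps the object alive
  int id = ReadGpuId(&owner_side);    // callee borrows, does not manage lifetime

  auto shared_a = std::make_shared<Config>();
  auto shared_b = shared_a;           // shared_ptr = shared ownership: the object
                                      // lives until both holders are gone
  (void)id;
  (void)shared_b;
  return 0;
}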

@trivialfis (Member, Author)

@hcho3

Could you please help taking a look in Jenkins' cache?

Never mind, it works. Thanks.

@trivialfis trivialfis changed the title [WIP] Unify GPU parameters. Unify GPU parameters. May 16, 2019
@trivialfis (Member, Author)

trivialfis commented May 17, 2019

Summary

Since this PR does a lot of things, here is a summary for @hcho3 for when the release after 0.90 comes around.

This PR unifies the two parameters gpu_id and n_gpus as part of LearnerTrainParam. Beyond easier code maintenance, the unification fixes the behaviour of XGBoost choosing the GPU aggressively. Previously, XGBoost would run metrics and objectives on a single GPU by default due to the lack of a global configuration. After this PR, Learner will be able to choose between CPU/GPU based on the training algorithm and predictor. Also, clearing out the duplication gets us one step closer to the JSON RFC. In the future we might be able to use nthread from this parameter set to eliminate the use of OpenMP's global variable.

To implement this configuration, I added an extra LearnerTrainParam const* parameter to every factory Create method and stored it as learner_param_ in the base class. The single instance of LearnerTrainParam is stored in the Learner class (not LearnerImpl). Since Learner is the last one to be deallocated (after Predictor, Metric, ...), there's no need for extra memory management for this parameter. I chose a raw pointer over a smart pointer, which is still under discussion with @canonizer and @RAMitchell.

Also, this PR fixes the pickling issue when a model is trained on GPU but later loaded on a CPU-only device.

The added tests for Learner should demonstrate the correctness of the configuration.
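
A simplified sketch of the ownership/lifetime setup described above (names trimmed down; the real classes carry much more state):

#include <memory>

struct LearnerTrainParam { int gpu_id{0}; int n_gpus{0}; };

class Metric {
 public:
  explicit Metric(LearnerTrainParam const* p) : learner_param_{p} {}
 protected:
  LearnerTrainParam const* learner_param_;  // borrowed; never deleted here
};

class Learner {
 public:
  Learner() : metric_{new Metric{&tparam_}} {}
  // Members are destroyed in reverse declaration order, so metric_ (and any
  // other component holding the borrowed pointer) is torn down before tparam_,
  // which is why no extra memory management is needed.
 private:
  LearnerTrainParam tparam_;          // the single instance of the parameters
  std::unique_ptr<Metric> metric_;    // components only borrow &tparam_
};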

@trivialfis (Member, Author)

trivialfis commented May 27, 2019

closes #4494
closes #4361

@hcho3 (Collaborator)

hcho3 commented May 27, 2019

@trivialfis Thanks for fixing the pickling issue.

@pseudotensor (Contributor)

@hcho3 Yes, this works great. Please merge! :)

@hcho3 (Collaborator)

hcho3 commented May 28, 2019

@pseudotensor Thanks for trying it out. I will review this PR and approve it today.

@hcho3 (Collaborator) left a comment

LGTM overall. I have some comments about style.

Resolved review threads on: include/xgboost/metric.h, src/learner.cc, include/xgboost/objective.h, include/xgboost/predictor.h, tests/cpp/test_learner.cc, tests/python-gpu/load_pickle.py, tests/python-gpu/test_pickling.py
@hcho3 hcho3 changed the title Unify GPU parameters. De-duplicate GPU parameters. May 28, 2019
* Change NGPUS.
* Remove static_cast.
* Change obj func Create parameters' order.
@pseudotensor (Contributor)

pseudotensor commented May 28, 2019

ModuleNotFoundError: No module named 'xgboost.training'
That's just from testing, not me.

@hcho3 (Collaborator)

hcho3 commented May 29, 2019

@pseudotensor Can we have a repro script?

@hcho3 (Collaborator)

hcho3 commented May 29, 2019

@trivialfis Currently, the Python tests on Win64 are not idempotent, as they attempt to install and remove the XGBoost package in the system Python environment. So running more than one job on the same machine will cause problems (one job installs XGBoost while another removes it). For now, I limited the number of jobs to 1 for Windows workers (and restarted the tests), but I'd like to make the Win64 tests idempotent in a follow-up PR.

@pseudotensor (Contributor)

@pseudotensor Can we have a repro script?

That's just from the CI, seems fixed.

@trivialfis (Member, Author)

@pseudotensor Sounds like the reason described by @hcho3.

@hcho3 (Collaborator)

hcho3 commented May 29, 2019

Thanks for the clarification. I will follow up with a PR to stabilize the Windows CI.

@trivialfis trivialfis deleted the unify-ngpus branch May 29, 2019 15:09
lock bot locked as resolved and limited conversation to collaborators Aug 27, 2019