De-duplicate GPU parameters. #4454
Conversation
@RAMitchell It turns out there are some numeric differences between the CPU and CUDA transforms that weren't caught by the tests before. After disabling the GPU, this test won't pass:
I tried to diff the outputs from the two different settings. The difference mostly comes from …
Another issue is that I'm currently hacking the configuration in LearnerTrainParam; honestly, I don't fully understand what's happening in there.
One last thing to do is to either delay the configuration of Objective until after the data is known, or resize the label-checking result vector every time.
So in summary this PR provides the GPUSet::Global() method, where any algorithm can simply fetch the global configuration of GPUs instead of manually reading parameters and configuring this. This is nice because it removes quite a bit of code from other algorithms while still allowing them to manually specify this if necessary.
This PR solves the problem where GPUs were being used when the user expects only CPU algorithms. This was happening in the objective functions due to the default parameter of n_gpus=1. This is solved by pushing the configuration into the learner, which has more global knowledge about whether CPU or GPU algorithms should be used.
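As an aside for readers of this thread, here is a minimal sketch of what a globally configured GPU set along these lines could look like. Only the GPUSet::Global() name comes from the discussion above; the Init() signature, the field names, and the CPU-only default of n_gpus = 0 are assumptions for illustration, not xgboost's actual implementation.

```cpp
// Sketch only: a process-wide GPU configuration that components query
// instead of each one re-reading gpu_id/n_gpus from its own parameters.
class GPUSet {
 public:
  // Configured once by the Learner, which knows the user's gpu_id/n_gpus.
  static void Init(int gpu_id, int n_gpus) {
    Global().gpu_id_ = gpu_id;
    Global().n_gpus_ = n_gpus;
    // A log line here (at some verbosity level) would surface the selected
    // devices, as suggested in the review above.
  }
  // Objectives, metrics, and updaters fetch the shared setting.
  static GPUSet& Global() {
    static GPUSet set;
    return set;
  }
  bool Empty() const { return n_gpus_ == 0; }  // 0 means CPU-only
  int DeviceId() const { return gpu_id_; }
  int Size() const { return n_gpus_; }

 private:
  int gpu_id_{0};
  int n_gpus_{0};  // defaulting to 0 avoids the old n_gpus=1 surprise
};
```

With a zero-device default, a component that only checks GPUSet::Global().Empty() never touches the GPU unless the learner explicitly configured one.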
The danger here is mostly in the very difficult to understand configurations happening in the learner, but this is no different to existing code e.g. where we manually configure updater parameters based on tree_method. It would be good to make sure the behaviour is expected upon serialisation/deserialisation of the model.
I like the way you have named functions in the learner to more explicitly describe the configuration that is happening. I think this area of code can be improved a lot in future.
One feature that would be nice to have is to log information (at some verbosity level) about GPUs selected upon initialisation of the global singleton.
Looks good to me, although I'm not sure if it should go in 0.9. This could be considered a bug fix.
@canonizer can I get a review from you as well, please?
Good point, will try to make a test later.
Will add a …
@RAMitchell Actually I don't want to merge this PR. I remember there were users who train different models on different GPUs. Making a global variable will break that. I need to think of something that doesn't break the functionality but also keeps our implementation clean.
Suggestions are welcome.
@hcho3 @RAMitchell I have a local branch that passes a pointer to LearnerTrainParam. The benefit of passing … The downside is that the restructuring is massive, and all …
Also, this is a blocker for JSON.
Current structure of parameter handling causes duplication, as learning parameters are duplicated into objectives, metrics, and updaters. So your proposal has merits. Let me think over this and get back to you.
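A hedged sketch of the pointer-passing restructuring discussed here, under the assumption that the Learner owns a single LearnerTrainParam and hands a const pointer to every component it creates. The member names and the constructors shown are illustrative, not xgboost's real interfaces.

```cpp
// Sketch only: one LearnerTrainParam owned by the Learner and shared by
// pointer, so gpu_id/n_gpus are no longer duplicated in each component.
struct LearnerTrainParam {
  int gpu_id{0};
  int n_gpus{0};
};

class ObjFunction {
 public:
  // The Learner passes its own parameter object at construction time.
  explicit ObjFunction(const LearnerTrainParam* tparam) : tparam_(tparam) {}
  virtual ~ObjFunction() = default;

 protected:
  const LearnerTrainParam* tparam_;  // non-owning; the Learner must outlive this
};

class Learner {
 public:
  // Hypothetical wiring: every objective/metric/updater would receive &tparam_,
  // e.g. ObjFunction::Create(name, &tparam_).
 private:
  LearnerTrainParam tparam_;
};
```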
@hcho3 It also fixes the bug where XGBoost chooses GPUs aggressively due to the lack of a global configuration. Currently, if the user doesn't supply …
tests/cpp/helpers.h (outdated)

```cpp
#if defined(__CUDACC__)
#define DeclareUnifiedTest(name) GPU ## name
#else
#define DeclareUnifiedTest(name) name
#endif

#if defined(__CUDACC__)
#define NGPUS() 1
```
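For context, this is how a macro like DeclareUnifiedTest is typically consumed in a Google Test file; the test-suite and test names below are made up for illustration, not taken from this PR.

```cpp
#include <gtest/gtest.h>

// Same definition as in the diff above: prefix the test name with "GPU"
// when the file is compiled by nvcc, keep the plain name otherwise.
#if defined(__CUDACC__)
#define DeclareUnifiedTest(name) GPU ## name
#else
#define DeclareUnifiedTest(name) name
#endif

// One test body, registered as Objective.SomeTransform in CPU builds
// and Objective.GPUSomeTransform in CUDA builds.
TEST(Objective, DeclareUnifiedTest(SomeTransform)) {
  EXPECT_TRUE(true);  // placeholder body
}
```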
Could you use #define NGPUS 1, without (), as there are no parameters?
@trivialfis Second this comment.
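The distinction being requested, as a tiny self-contained example: an object-like macro is referenced without parentheses, whereas a function-like macro must be invoked with them. (NGPUS_FN is just an illustrative name.)

```cpp
#define NGPUS 1        // object-like: no parameter list, used as plain NGPUS
#define NGPUS_FN() 1   // function-like: must be written as NGPUS_FN()

static_assert(NGPUS == 1, "object-like macro expands directly");
static_assert(NGPUS_FN() == 1, "function-like macro needs the ()");
```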
@hcho3 Could you please help take a look at Jenkins' cache? It seems clang-tidy is running on an old copy of this PR: https://xgboost-ci.net/blue/organizations/jenkins/xgboost/detail/PR-4454/10/pipeline In the latest commit …
I like your solution. Agree with the others that a const shared pointer is much better than a raw pointer.
@RAMitchell @canonizer Could you please explain why a shared pointer is better? From my point of view, objects like … Passing … I'm open to suggestions, but I need more context to understand why a shared pointer is a better choice.
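For the ownership question raised here, a short sketch of the two options under discussion, with assumed names: a raw const pointer ties component lifetime to the Learner, while a std::shared_ptr<const LearnerTrainParam> keeps the parameters alive even if a component somehow outlives the Learner.

```cpp
#include <memory>

struct LearnerTrainParam { int gpu_id{0}; int n_gpus{0}; };

class MetricWithRawPtr {
 public:
  explicit MetricWithRawPtr(const LearnerTrainParam* p) : tparam_(p) {}
 private:
  const LearnerTrainParam* tparam_;  // non-owning: the Learner must stay alive
};

class MetricWithSharedPtr {
 public:
  explicit MetricWithSharedPtr(std::shared_ptr<const LearnerTrainParam> p)
      : tparam_(std::move(p)) {}
 private:
  std::shared_ptr<const LearnerTrainParam> tparam_;  // shares ownership safely
};
```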
Never mind, it works. Thanks.
Summary

Since this PR does a lot of things, here is a summary for @hcho3 when the release after 0.90 hits. This PR unifies the two parameters gpu_id and n_gpus in LearnerTrainParam. To implement such configuration, I added an extra parameter … Also, this PR fixes the pickling issue when a model is trained on GPU but later loaded on CPU-only devices. Added tests for …
@trivialfis Thanks for fixing the pickling issue.
@hcho3 Yes, this works great. Please merge! :)
@pseudotensor Thanks for trying it out. I will review this PR and approve it today.
LGTM overall. I have some comments about style.
* Change NGPUS.
* Remove static_cast.
* Change obj func Create parameters' order.
@pseudotensor Can we have a repro script?
@trivialfis Currently, the Python tests on Win64 are not idempotent, as they attempt to install and remove the XGBoost package in the system Python environment. So running more than one job on the same machine will cause problems (one job installs XGBoost while another removes it). For now, I have limited the number of jobs to 1 for Windows workers (and restarted the tests), but I'd like to make the Win64 tests idempotent in a follow-up PR.
That's just from the CI; it seems fixed.
@pseudotensor Sounds like the reason described by @hcho3.
Thanks for the clarification. I will follow up with a PR to stabilize the Windows CI.