[Feature proposal] Support RAPIDS Memory Manager (RMM) #5861
Comments
To my knowledge, XGBoost uses xgboost/src/common/device_helpers.cuh Lines 397 to 405 in b77e3e3
Can RMM work with Thrust and Cub allocators? |
What @daxiongshu is suggesting is allowing configuring/customizing XGBoost to use something other than hardcoding it to use the For more detail on RMM's interface, see https://github.com/rapidsai/rmm#using-rmm-in-c. |
@jrhemstad Thanks for clarifying. Does RMM provide a replacement with a semantic similar to the cached allocator EDIT. I just found rapidsai/rmm#290. RMM doesn't yet provide an equivalent of |
@trivialfis @RAMitchell Can the use of |
I don't think we need to replace the caching allocator. The Thrust allocator, on the other hand, might be worth thinking about. |
Please elaborate why. Does the caching allocator use a small amount of GPU memory? |
Yup. The allocator used for the host device vector is the Thrust allocator. |
How about the DMatrix? Does it also use the Thrust allocator? Then we can let users use RMM instead of the Thrust allocator and potentially achieve a memory saving. |
Yup, DMatrix is backed by the Thrust allocator. Not sure about ellpack; I will check tomorrow. |
Correct, we don't yet have a wrapper for CUB's caching allocator. As you found, I worked on this in rapidsai/rmm#290, but I quickly saw it was much slower than our pool allocator, so I didn't pursue it further. For example, here are results from one of our benchmarks of doing 1000 allocations/frees of uniformly random sizes in a range For the CUB caching allocator, I used the same parameters that are used in XGBoost here: xgboost/src/common/device_helpers.cuh Line 373 in b77e3e3
As you can see from this (limited) benchmark, the pool implementation is 100-10000x faster than CUB's caching scheme. Code I used to run this is here: rapidsai/rmm#430 Benchmark code is here: https://github.com/rapidsai/rmm/blob/branch-0.15/benchmarks/random_allocations/random_allocations.cpp |
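For readers who want to try this kind of measurement on their own allocator, here is a minimal sketch of a random allocation/free harness. It is not the RMM benchmark linked above: the names `BenchResult` and `random_allocations` are hypothetical, the 50% free probability is an arbitrary choice, and host `std::malloc`/`std::free` stand in for a device allocator so the code runs without a GPU.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <random>
#include <vector>

// Hypothetical result record for one benchmark run.
struct BenchResult {
  std::size_t allocations = 0;
  std::size_t frees = 0;
  double seconds = 0.0;
};

// Performs `num_allocations` allocations of uniformly random size in
// [1, max_bytes], randomly interleaved with frees of live blocks, then
// frees whatever is still live. `alloc` maps a byte count to a pointer;
// `dealloc` releases a pointer returned by `alloc`.
template <typename Alloc, typename Free>
BenchResult random_allocations(Alloc alloc, Free dealloc,
                               std::size_t num_allocations,
                               std::size_t max_bytes, unsigned seed = 42) {
  assert(max_bytes > 0);
  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::size_t> size_dist(1, max_bytes);
  std::bernoulli_distribution free_now(0.5);  // assumption: 50% chance to free
  std::vector<void*> live;
  BenchResult result;

  auto start = std::chrono::steady_clock::now();
  while (result.allocations < num_allocations) {
    if (!live.empty() && free_now(rng)) {
      // Free a randomly chosen live block (swap-and-pop keeps this O(1)).
      std::uniform_int_distribution<std::size_t> pick(0, live.size() - 1);
      std::size_t i = pick(rng);
      dealloc(live[i]);
      live[i] = live.back();
      live.pop_back();
      ++result.frees;
    } else {
      void* p = alloc(size_dist(rng));
      if (p == nullptr) break;  // allocator ran out of memory
      live.push_back(p);
      ++result.allocations;
    }
  }
  // Release everything still outstanding so the run leaves no leaks.
  for (void* p : live) {
    dealloc(p);
    ++result.frees;
  }
  auto stop = std::chrono::steady_clock::now();
  result.seconds = std::chrono::duration<double>(stop - start).count();
  return result;
}
```

Passing two different allocator/deallocator pairs to the same function with the same seed gives a like-for-like comparison of the allocation pattern, although absolute timings on the host say nothing about device allocators.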
I think RMM could also give us more tools to manage and track memory usage. |
Yes, RMM has several benefits beyond performance. Because it defines a single, common interface for device memory allocation, it allows us to build additional tools around that interface, such as logging, leak checking, thread safety, and benchmarks. |
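The point about tooling can be illustrated with a small sketch: once every allocation goes through one interface, an adaptor can wrap any allocator and add tracking without the caller changing. The classes below (`MemoryResource`, `MallocResource`, `TrackingResource`) are hypothetical and are not RMM's actual API; host `std::malloc` stands in for device memory so the example runs anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_map>

// Hypothetical common interface: a single entry point for all allocations.
class MemoryResource {
 public:
  virtual ~MemoryResource() = default;
  virtual void* allocate(std::size_t bytes) = 0;
  virtual void deallocate(void* ptr, std::size_t bytes) = 0;
};

// Stand-in for a device allocator; uses host memory for illustration.
class MallocResource final : public MemoryResource {
 public:
  void* allocate(std::size_t bytes) override { return std::malloc(bytes); }
  void deallocate(void* ptr, std::size_t) override { std::free(ptr); }
};

// Adaptor that wraps any resource and records outstanding allocations,
// which is enough for simple usage tracking and leak checking.
class TrackingResource final : public MemoryResource {
 public:
  explicit TrackingResource(MemoryResource* upstream) : upstream_(upstream) {
    assert(upstream != nullptr);
  }

  void* allocate(std::size_t bytes) override {
    void* p = upstream_->allocate(bytes);
    if (p != nullptr) {
      live_[p] = bytes;
      bytes_in_use_ += bytes;
      if (bytes_in_use_ > peak_bytes_) peak_bytes_ = bytes_in_use_;
    }
    return p;
  }

  void deallocate(void* ptr, std::size_t bytes) override {
    auto it = live_.find(ptr);
    if (it != live_.end()) {
      bytes_in_use_ -= it->second;
      live_.erase(it);
    }
    upstream_->deallocate(ptr, bytes);
  }

  std::size_t bytes_in_use() const { return bytes_in_use_; }
  std::size_t peak_bytes() const { return peak_bytes_; }
  std::size_t outstanding() const { return live_.size(); }  // nonzero at exit suggests a leak

 private:
  MemoryResource* upstream_;
  std::unordered_map<void*, std::size_t> live_;
  std::size_t bytes_in_use_ = 0;
  std::size_t peak_bytes_ = 0;
};
```

Code that only sees a `MemoryResource*` works unchanged whether it is handed the plain resource or the tracking wrapper, which is the property that makes this kind of tooling cheap to add.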
RAPIDS is a GPU-based data analytics framework developed by NVIDIA. A typical all-GPU workflow uses RAPIDS cuDF to read data and create features, and then uses XGBoost on the GPU to train. The two libraries end up competing for GPU memory, which can lead to out-of-memory (OOM) errors because device memory is often limited.
To mitigate this issue, I propose that XGBoost support RMM (RAPIDS Memory Manager) as an alternative memory allocator, so that it can share a memory pool with other RAPIDS libraries.
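The contention described here can be shown with a toy byte-accounting model. Everything in it is an assumption for illustration: `DeviceBudget` and `CachingClient` are hypothetical names, the sizes are arbitrary units, no real memory is touched, and real allocators also have to deal with fragmentation, which this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of a device with a fixed amount of memory (accounting only).
class DeviceBudget {
 public:
  explicit DeviceBudget(std::size_t capacity) : capacity_(capacity) {}
  bool reserve(std::size_t bytes) {
    if (bytes > capacity_ - used_) return false;  // would be an OOM
    used_ += bytes;
    return true;
  }
  std::size_t available() const { return capacity_ - used_; }

 private:
  std::size_t capacity_;
  std::size_t used_ = 0;
};

// Hypothetical caching allocator: freed bytes stay in its own cache for
// reuse instead of being returned to the device.
class CachingClient {
 public:
  explicit CachingClient(DeviceBudget* device) : device_(device) {
    assert(device != nullptr);
  }
  bool allocate(std::size_t bytes) {
    if (bytes <= cached_) {  // serve from the cache
      cached_ -= bytes;
      return true;
    }
    std::size_t need = bytes - cached_;  // top up from the device
    if (!device_->reserve(need)) return false;
    cached_ = 0;
    return true;
  }
  void free(std::size_t bytes) { cached_ += bytes; }  // retained, not returned

 private:
  DeviceBudget* device_;
  std::size_t cached_ = 0;
};

// Two libraries, each with its own cache: memory freed by the first is
// invisible to the second.
bool separate_caches_fit() {
  DeviceBudget device(100);
  CachingClient dataframe(&device), trainer(&device);
  bool loaded = dataframe.allocate(70);  // feature engineering
  assert(loaded);
  dataframe.free(70);                    // freed, but kept in dataframe's cache
  return trainer.allocate(60);           // only 30 units left on the device
}

// Both libraries draw from one shared pool: freed memory is reusable.
bool shared_pool_fits() {
  DeviceBudget device(100);
  CachingClient shared(&device);
  bool loaded = shared.allocate(70);  // feature engineering
  assert(loaded);
  shared.free(70);                    // returned to the shared pool
  return shared.allocate(60);         // training reuses the freed bytes
}
```

In this model the training allocation fails when each library keeps its own cache and succeeds when both use one pool, which is the saving the proposal is after.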