
Distributed external memory support for GPU, Lazy XGBoost GPU application to Dask RAPIDS partitions #5851

Closed
declan-hernon opened this issue Jul 3, 2020 · 4 comments

Comments

@declan-hernon

Hi guys,

I have a couple of feature requests in the same ball park:

  • Distributed external memory support for GPU
  • Lazy XGBoost GPU application to Dask RAPIDS partitions

In a nutshell, I am currently working with a very large data set, and despite all my efforts to minimise the amount of memory used (i.e., ensuring each feature uses no more memory than it needs by specifying data types, and using gradient-based subsampling), it still cannot fit into memory on multiple GPUs.
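
For reference, the kind of memory-saving setup I mean looks roughly like this (a minimal single-GPU sketch; the column names, file path and parameter values are placeholders, not my real pipeline):

```python
import pandas as pd
import xgboost as xgb

# Narrow the dtypes up front so each feature only uses the width it needs
# ("feature_a", "feature_b", "label" and the file path are placeholders).
df = pd.read_csv(
    "train.csv",
    dtype={"feature_a": "float32", "feature_b": "int8", "label": "int8"},
)

dtrain = xgb.DMatrix(df.drop(columns=["label"]), label=df["label"])

params = {
    "tree_method": "gpu_hist",            # GPU histogram algorithm
    "sampling_method": "gradient_based",  # gradient-based subsampling
    "subsample": 0.1,                     # keep roughly 10% of rows per tree
    "objective": "binary:logistic",
}
booster = xgb.train(params, dtrain, num_boost_round=100)
```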

I am on a cloud provider, so of course I could distribute this data to many machines - however, by my reckoning, fitting all the data in memory would require thousands of GPUs. At that scale, it often takes many internal requests to get approval to raise cloud quota limits. On top of this, I don't need to train models particularly fast - typically they will be retrained at most weekly. So why GPUs? Because I think they are fundamentally cheaper than CPUs - a single V100 took only half an hour longer than 72 Xeon chips, despite costing half as much.

Therefore, I think a nice feature would be either the ability to load libsvm-format data across many GPUs in a cluster, or a way to lazily apply XGBoost to Dask partitions rather than forcing Dask to persist everything in memory.
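
To make the second request concrete: as far as I understand, the current Dask GPU path looks roughly like the sketch below, where the full data set has to be persisted across the workers before training (the file path and cluster setup are placeholders):

```python
import dask_cudf
import xgboost as xgb
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

client = Client(LocalCUDACluster())   # one Dask worker per local GPU

# The whole data set ends up persisted in device memory across the workers...
ddf = dask_cudf.read_csv("train-*.csv")
X = ddf.drop(columns=["label"]).persist()
y = ddf["label"].persist()

# ...before it can be handed to XGBoost for training.
dtrain = xgb.dask.DaskDMatrix(client, X, y)
output = xgb.dask.train(
    client,
    {"tree_method": "gpu_hist", "objective": "binary:logistic"},
    dtrain,
    num_boost_round=100,
)
booster = output["booster"]
```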

@trivialfis
Member

Hmm, let me take another look at the gradient-based sampling. I think the external memory support is based on dmlc-core, which splits data by thread and might generate unexpected memory usage.

@trivialfis
Member

@declan-hernon What's the size of your data in memory? And how much memory do you have? Could you try this: #5901 ?

@alokehp

alokehp commented Jan 3, 2022

I just ran into a similar problem in my work. Is there a way to use xgboost's DaskDeviceQuantileDMatrix with the external memory version? Any comment would be really appreciated.

@trivialfis
Member

DaskDeviceQuantileDMatrix is somewhat like external memory in that it iterates over all the partitions on a worker, which avoids a concatenation and an intermediate CSR matrix. It saves memory, but there's an upper bound. A full implementation of external memory for GPU is unlikely, since copying data from external memory to host memory to device memory is just too expensive (at least 100x more than the actual computation).
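
Roughly, using it is a near drop-in replacement for DaskDMatrix (a minimal sketch; the file path and parameters are placeholders):

```python
import dask_cudf
import xgboost as xgb
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

client = Client(LocalCUDACluster())

ddf = dask_cudf.read_csv("train-*.csv")
X = ddf.drop(columns=["label"])
y = ddf["label"]

# The quantized DMatrix is built by iterating over the partitions on each
# worker, so no concatenated copy or intermediate CSR matrix is materialised.
dtrain = xgb.dask.DaskDeviceQuantileDMatrix(client, X, y)

output = xgb.dask.train(
    client,
    {"tree_method": "gpu_hist", "objective": "binary:logistic"},
    dtrain,
    num_boost_round=100,
)
```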

Closing this now.
