I have a couple of feature requests in the same ballpark:

1. Distributed external-memory support for GPU
2. Lazy XGBoost GPU application to Dask RAPIDS partitions
In a nutshell, I am currently working with a very large dataset, and despite all my efforts to minimise memory use (i.e., specifying data types so each feature uses no more memory than it needs, and using gradient-based subsampling), it still cannot fit into memory across multiple GPUs.
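To illustrate the dtype-minimisation point, here is a small pure-Python sketch (the function name is my own, not a library API) that picks the narrowest signed-integer width a column's value range allows:

```python
def smallest_int_width(values):
    """Return the narrowest signed-integer width (in bits) that can hold
    every value in `values` -- the idea behind downcasting feature dtypes."""
    lo, hi = min(values), max(values)
    for bits in (8, 16, 32, 64):
        if -(2 ** (bits - 1)) <= lo and hi < 2 ** (bits - 1):
            return bits
    raise OverflowError("values exceed 64-bit signed range")

# A column of small counts fits in 8 bits, an 8x saving over int64.
print(smallest_int_width([0, 3, 117]))   # 8
print(smallest_int_width([-40000, 12]))  # 32
```

In practice the same idea is what `pandas.to_numeric(..., downcast=...)` applies per column before handing data to XGBoost.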
I am on a cloud provider, so of course I could distribute this data across many machines; however, by my reckoning, fitting all the data in memory would require thousands of GPUs. At that scale, it often takes many internal requests just to get approval to raise cloud quota limits. On top of this, I don't need to train models particularly fast: typically they will be retrained at most weekly. So why GPUs? Because I believe they are fundamentally cheaper than CPUs: a single V100 took only half an hour longer than 72 Xeon cores, at half the cost.
Therefore, I think a nice feature would be the ability to load data in libsvm format across many GPUs in a cluster, or alternatively, a way to lazily apply XGBoost to Dask partitions rather than forcing Dask to persist them in memory.
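To sketch what "lazy" loading could look like, here is a minimal pure-Python libsvm reader (the function name is my own invention, not an XGBoost API) that yields one row at a time instead of materialising the whole file, so partitions could be streamed to each worker on demand:

```python
def iter_libsvm_rows(lines):
    """Lazily parse libsvm-format lines, yielding (label, {index: value})
    one row at a time rather than loading the whole dataset."""
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        label = float(parts[0])
        features = {}
        for tok in parts[1:]:
            idx, val = tok.split(":")
            features[int(idx)] = float(val)
        yield label, features

# Each row is produced on demand -- the generator never holds the file.
sample = ["1 0:0.5 3:1.2", "0 1:2.0"]
rows = list(iter_libsvm_rows(sample))
print(rows[0])  # (1.0, {0: 0.5, 3: 1.2})
```

A real implementation would hand each parsed chunk straight to a per-GPU quantile sketch rather than collecting rows into a list as done here for demonstration.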
Em, let me take another look at gradient-based sampling. I think the external memory support is based on dmlc-core, which splits data by thread and might generate unexpected memory usage.
I just ran into a similar problem in my work. Is there a way to integrate XGBoost's DaskDeviceQuantileDMatrix with the external-memory version? Any comment would be really appreciated.
DaskDeviceQuantileDMatrix is somewhat like external memory, in that it iterates over all partitions on a worker, which avoids both a concatenation and a CSR matrix. It saves memory, but there's an upper bound. A full implementation of external memory for GPU is unlikely, since copying data from external memory to host memory to device memory is just too expensive (at least 100x the cost of the actual computation).
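The memory saving described here can be illustrated with a pure-Python sketch (not XGBoost's actual implementation): statistics are folded over one partition at a time, so peak memory is bounded by the largest partition rather than the concatenated whole:

```python
def streaming_minmax(partitions):
    """Fold per-feature min/max over an iterator of partitions, touching
    one partition at a time -- analogous to how DaskDeviceQuantileDMatrix
    iterates over a worker's partitions instead of concatenating them."""
    lo = hi = None
    for part in partitions:          # only one partition live at a time
        for row in part:
            if lo is None:
                lo, hi = list(row), list(row)
            else:
                lo = [min(a, b) for a, b in zip(lo, row)]
                hi = [max(a, b) for a, b in zip(hi, row)]
    return lo, hi

parts = [[[1.0, 5.0], [2.0, 3.0]], [[0.0, 9.0]]]
print(streaming_minmax(parts))  # ([0.0, 3.0], [2.0, 9.0])
```

The real code accumulates quantile sketches rather than min/max, but the shape is the same: a single pass over partitions, constant state between them.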