I have a couple of feature requests in the same ballpark:

1. Distributed external-memory support for GPU
2. Lazy XGBoost GPU application to Dask RAPIDS partitions
In a nutshell, I am currently working with a very large dataset, and despite all my efforts to minimise memory use (i.e., specifying data types so each feature uses no more memory than it needs, and using gradient-based subsampling), it still cannot fit into memory across multiple GPUs.
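To illustrate the dtype-minimisation point, here is a small pure-Python sketch (the function name is my own, not a library API) that picks the narrowest signed-integer width a column's value range allows:

```python
def smallest_int_width(values):
    """Return the narrowest signed-integer width (in bits) that can hold
    every value in `values` -- the idea behind downcasting feature dtypes."""
    lo, hi = min(values), max(values)
    for bits in (8, 16, 32, 64):
        if -(2 ** (bits - 1)) <= lo and hi < 2 ** (bits - 1):
            return bits
    raise OverflowError("values exceed 64-bit signed range")

# A column of small counts fits in 8 bits, an 8x saving over int64.
print(smallest_int_width([0, 3, 117]))   # 8
print(smallest_int_width([-40000, 12]))  # 32
```

In practice the same idea is what `pandas.to_numeric(..., downcast=...)` applies per column before handing data to XGBoost.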
I am on a cloud provider, so of course I could distribute this data across many machines; however, by my reckoning, fitting all the data in memory would require thousands of GPUs. At that scale, it often takes many internal requests just to get approval to raise cloud quota limits. On top of this, I don't need to train models particularly fast: typically they will be retrained at most weekly. So why GPUs? Because I believe they are fundamentally cheaper than CPUs: a single V100 took only half an hour longer than 72 Xeon cores, at half the cost.
Therefore, I think a nice feature would be the ability to load data in libsvm format across many GPUs in a cluster, or alternatively, a way to lazily apply XGBoost to Dask partitions rather than forcing Dask to persist them in memory.
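To sketch what "lazy" loading could look like, here is a minimal pure-Python libsvm reader (the function name is my own invention, not an XGBoost API) that yields one row at a time instead of materialising the whole file, so partitions could be streamed to each worker on demand:

```python
def iter_libsvm_rows(lines):
    """Lazily parse libsvm-format lines, yielding (label, {index: value})
    one row at a time rather than loading the whole dataset."""
    for line in lines:
        parts = line.split()
        if not parts:
            continue
        label = float(parts[0])
        features = {}
        for tok in parts[1:]:
            idx, val = tok.split(":")
            features[int(idx)] = float(val)
        yield label, features

# Each row is produced on demand -- the generator never holds the file.
sample = ["1 0:0.5 3:1.2", "0 1:2.0"]
rows = list(iter_libsvm_rows(sample))
print(rows[0])  # (1.0, {0: 0.5, 3: 1.2})
```

A real implementation would hand each parsed chunk straight to a per-GPU quantile sketch rather than collecting rows into a list as done here for demonstration.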
Em, let me take another look at gradient-based sampling. I think the external memory support is based on dmlc-core, which splits data by thread and might generate unexpected memory usage.
I just ran into a similar problem in my work. Is there a way to integrate XGBoost's DaskDeviceQuantileDMatrix with the external-memory version? Any comment would be really appreciated.
DaskDeviceQuantileDMatrix is somewhat like external memory, in that it iterates over all partitions on a worker, which avoids both a concatenation and a CSR matrix. It saves memory, but there's an upper bound. A full implementation of external memory for GPU is unlikely, since copying data from external memory to host memory to device memory is just too expensive (at least 100x the cost of the actual computation).
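The memory saving described here can be illustrated with a pure-Python sketch (not XGBoost's actual implementation): statistics are folded over one partition at a time, so peak memory is bounded by the largest partition rather than the concatenated whole:

```python
def streaming_minmax(partitions):
    """Fold per-feature min/max over an iterator of partitions, touching
    one partition at a time -- analogous to how DaskDeviceQuantileDMatrix
    iterates over a worker's partitions instead of concatenating them."""
    lo = hi = None
    for part in partitions:          # only one partition live at a time
        for row in part:
            if lo is None:
                lo, hi = list(row), list(row)
            else:
                lo = [min(a, b) for a, b in zip(lo, row)]
                hi = [max(a, b) for a, b in zip(hi, row)]
    return lo, hi

parts = [[[1.0, 5.0], [2.0, 3.0]], [[0.0, 9.0]]]
print(streaming_minmax(parts))  # ([0.0, 3.0], [2.0, 9.0])
```

The real code accumulates quantile sketches rather than min/max, but the shape is the same: a single pass over partitions, constant state between them.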