@@ -138,6 +138,8 @@ the GPU. Following is a snippet from :ref:`sphx_glr_python_examples_external_mem
 
 # It's important to use RMM for GPU-based external memory to improve performance.
 # If XGBoost is not built with RMM support, a warning will be raised.
+# We use the pool memory resource here; you can also try the `ArenaMemoryResource`
+# for improved handling of memory fragmentation.
 mr = rmm.mr.PoolMemoryResource(rmm.mr.CudaAsyncMemoryResource())
 rmm.mr.set_current_device_resource(mr)
 # Set the allocator for cupy as well.
@@ -278,13 +280,15 @@ determines the time it takes to run inference, even if a C2C link is available.
 Xy_valid = xgboost.ExtMemQuantileDMatrix(it_valid, max_bin = n_bins, ref = Xy_train)
 
 In addition, the GPU implementation relies on an asynchronous memory pool, which is
-subject to memory fragmentation even if the ``CudaAsyncMemoryResource`` is used. You might
-want to start the training with a fresh pool instead of starting training right after the
-ETL process. If you run into out-of-memory errors and you are convinced that the pool is
-not full yet (pool memory usage can be profiled with ``nsight-system``), consider tuning
-the RMM memory resource like using ``rmm.mr.CudaAsyncMemoryResource`` in conjunction with
-``rmm.mr.BinningMemoryResource(mr, 21, 25)`` instead of the
-``rmm.mr.PoolMemoryResource(mr)`` shown in the example.
+subject to memory fragmentation even if the :py:class:`~rmm.mr.CudaAsyncMemoryResource` is
+used. You might want to start the training with a fresh pool instead of starting it
+right after the ETL process. If you run into out-of-memory errors and you are convinced
+that the pool is not full yet (pool memory usage can be profiled with ``nsight-systems``),
+consider tuning the RMM memory resource, such as using the
+:py:class:`~rmm.mr.CudaAsyncMemoryResource` in conjunction with
+:py:class:`BinningMemoryResource(mr, 21, 25) <rmm.mr.BinningMemoryResource>` instead of
+the :py:class:`~rmm.mr.PoolMemoryResource`. Alternatively, the
+:py:class:`~rmm.mr.ArenaMemoryResource` is also an excellent option.
 
 During CPU benchmarking, we used an NVMe connected to a PCIe-4 slot. Other types of
 storage can be too slow for practical usage. However, your system will likely perform some
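
To make the new paragraph concrete, here is a minimal sketch of the two alternative memory-resource setups it suggests. The bin exponents (21 and 25, i.e. 2 MiB to 32 MiB bins) are the values quoted in the paragraph; the choice of upstream resource for the arena option is an assumption, not a tuned recommendation:

.. code-block:: python

    import cupy as cp
    import rmm
    from rmm.allocators.cupy import rmm_cupy_allocator

    # Option 1: async allocator wrapped in a binning resource, which serves
    # allocations between 2^21 and 2^25 bytes from fixed-size bins to curb
    # fragmentation; larger requests fall through to the upstream resource.
    mr = rmm.mr.BinningMemoryResource(rmm.mr.CudaAsyncMemoryResource(), 21, 25)

    # Option 2: arena allocator, which also handles fragmentation well.
    # The plain CUDA upstream here is illustrative.
    # mr = rmm.mr.ArenaMemoryResource(rmm.mr.CudaMemoryResource())

    rmm.mr.set_current_device_resource(mr)
    # Route cupy allocations through RMM as well, mirroring the snippet above.
    cp.cuda.set_allocator(rmm_cupy_allocator)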