# Using GPUs with Dask



- Authors: NCI Virtual Research Environment Team
- Keywords: Dask, GPU
- Creation Date: 2021-May
---

As part of the [RAPIDS](https://rapids.ai/) ecosystem, NVIDIA provide a way to integrate `Dask` with their `CUDA` stack for GPU operations.

This includes `CuPy` for array computations, `cuDF` for `DataFrame`-style operations, and `Dask-CUDA`, an interface with `Dask` for parallel GPU computing.

We are going to go through a quick example of how GPUs can take `Dask` to the next level of performance.

In [1]:
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
cluster = LocalCUDACluster()
client = Client(cluster)


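Notice that we use a different import above: the cluster is a `LocalCUDACluster` from `dask_cuda`, which starts one worker per visible GPU, while `Client` still comes from `dask.distributed`: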

```python
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
cluster = LocalCUDACluster()
client = Client(cluster)

```
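If these imports fail with `ModuleNotFoundError`, the GPU packages are not installed in the active environment. A minimal sketch of a pre-flight check, using only the standard library, is shown below. The helper name `missing_gpu_packages` is hypothetical and is not part of Dask or RAPIDS.

```python
import importlib.util


def missing_gpu_packages(names=("dask_cuda", "cupy")):
    """Return the module names from `names` that cannot be found (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_gpu_packages()
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("dask_cuda and cupy are importable")
```

An importable package does not guarantee a usable GPU, so this only rules out the simplest failure.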

In [5]:
print(client)
print(client.dashboard_link)


We will need a few other packages.

In [2]:
import numpy as np
import dask
import dask.array as da
import cupy as cp


We are going to generate a large array of random data on the GPU.

In [8]:
rs = da.random.RandomState(RandomState=cp.random.RandomState, seed=12)  # <-- we specify cupy here

x = rs.random((100000, 1000), chunks=(10000, 1000))

print(x.nbytes / 1e9)  # size in GB

# This returns almost instantly because Dask is lazy: no data is generated until we persist or compute.


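The size printed by the cell above can be checked by hand. The sketch below assumes the default `float64` dtype (8 bytes per element); the variable names are only for illustration.

```python
shape = (100_000, 1_000)
chunks = (10_000, 1_000)
bytes_per_element = 8  # assumption: float64

total_gb = shape[0] * shape[1] * bytes_per_element / 1e9
chunk_mb = chunks[0] * chunks[1] * bytes_per_element / 1e6
n_chunks = (shape[0] // chunks[0]) * (shape[1] // chunks[1])

print(total_gb)  # 0.8
print(chunk_mb)  # 80.0
print(n_chunks)  # 10
```

Under that assumption the array is 0.8 GB in total, split into 10 chunks of about 80 MB each.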

Notice that our chunks are now `CuPy` arrays. This is important: it shows how flexible `Dask` is in building task graphs over chunks of different array types.
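One way to confirm the chunk type is to compute a single block and inspect it. The sketch below uses a much smaller array and falls back to `NumPy` when `CuPy` or a GPU is not available; that fallback is an addition for illustration, not part of the notebook.

```python
import numpy as np
import dask.array as da

try:
    import cupy as xp
    xp.cuda.runtime.getDeviceCount()  # raises if no usable GPU is found
except Exception:
    xp = np  # assumption: fall back to NumPy so the sketch runs anywhere

rs_small = da.random.RandomState(RandomState=xp.random.RandomState, seed=12)
small = rs_small.random((1_000, 100), chunks=(100, 100))

# Compute only the first block and look at the concrete array type.
first_block = small.blocks[0, 0].compute()
print(type(first_block))
print(small.shape, small.chunks)
```

On a GPU node the printed type should be a `CuPy` array; with the fallback it is a `NumPy` array.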

In [None]:
%%time
x = x.persist()

In [3]:
x


Let's do a little computation on our array, in this case some linear algebra.

We will compute a Singular Value Decomposition (SVD). See [here](https://en.wikipedia.org/wiki/Singular_value_decomposition) for more details.

The ins and outs of the computation are not important here; it is enough to know that it is not a cheap computation.
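For a feel of what the decomposition returns, here is a small sketch in plain `NumPy` on the CPU. It is not the `Dask` call used in this notebook, and the matrix size and variable names are chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(12)
a_small = rng.random((6, 3))  # small tall-and-skinny matrix

# Reduced SVD: a_small = u_small @ diag(s_small) @ vt_small
u_small, s_small, vt_small = np.linalg.svd(a_small, full_matrices=False)

print(u_small.shape, s_small.shape, vt_small.shape)  # (6, 3) (3,) (3, 3)
print(np.allclose(a_small, u_small @ np.diag(s_small) @ vt_small))  # True
```

The three factors multiply back to the original matrix, and the singular values in `s_small` are returned in descending order.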

In [None]:
u, s, v = da.linalg.svd(x)

In [4]:
%%time 
u.compute()


Now let's do the same thing on the CPU using `Dask` without GPU support. First, let's close our first client.
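The cell that closes the GPU client is not shown in this notebook. A minimal sketch of what it might look like follows, assuming the `client` and `cluster` objects created at the top. The helper names `shut_down` and `start_cpu_client` are hypothetical, and using a default `LocalCluster` for the CPU run is an assumption.

```python
def shut_down(client, cluster):
    """Close the client first, then the cluster it is connected to."""
    client.close()
    cluster.close()


def start_cpu_client():
    """Start a CPU-only local cluster and return (client, cluster)."""
    # Imported here so the helper can be defined before distributed is needed.
    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster()  # assumption: default worker and thread counts
    return Client(cluster), cluster


# Usage in the notebook (not executed here):
# shut_down(client, cluster)
# client, cluster = start_cpu_client()
```

Closing the GPU cluster before the CPU run keeps the two timings from competing for the same machine resources.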



In [None]:
rs2 = da.random.RandomState(RandomState=np.random.RandomState, seed=12)  # <-- we specify numpy here

x2 = rs2.random((100000, 1000), chunks=(10000, 1000))

print(x2.nbytes / 1e9)  # size in GB

In [None]:
%%time
x2 = x2.persist()

In [None]:
x2

In [None]:
u2, s2, v2 = da.linalg.svd(x2)

In [None]:
%%time
u2.compute()

How do the execution times on the CPU and the GPU compare?
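Outside a notebook, where `%%time` is not available, the comparison can be made with a small timing helper. This is a sketch: the name `timed` is hypothetical, and the workload below is a stand-in so the example runs anywhere.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workloads; in the notebook you would time the real calls instead,
# for example timed(u.compute) and timed(u2.compute).
_, short_seconds = timed(sum, range(1_000_000))
_, long_seconds = timed(sum, range(2_000_000))
print(f"ratio: {long_seconds / short_seconds:.1f}x")
```

Dividing the CPU time by the GPU time gives the speed-up factor for that operation.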

## Challenge 

Can you compare the CPU vs GPU performance of a different linear algebra operation using `Dask`? Perhaps try an LU decomposition with `p, l, u = da.linalg.lu(a)`, where `a` is a square `Dask` array.

In [9]:
# compare linalg.lu CPU vs GPU performance
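A possible CPU-only starting point is sketched below on a tiny matrix. It assumes that `da.linalg.lu` needs a square matrix with square chunks and that `SciPy` is installed; the sizes are illustrative, and whether the same call runs on `CuPy`-backed chunks is left for you to test.

```python
import numpy as np
import dask.array as da

rng = np.random.default_rng(12)
a_np = rng.random((4, 4))
a = da.from_array(a_np, chunks=(2, 2))  # square matrix, square chunks

p, l, u = da.linalg.lu(a)

# The factors should multiply back to the original matrix.
reconstructed = (p @ l @ u).compute()
print(np.allclose(reconstructed, a_np))
```

Scale the matrix and chunk sizes up before timing, since a matrix this small says nothing about performance.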

## Disclaimers

We have not gone into depth on how Dask-CUDA deals with the threads, blocks and dims of an `NVIDIA` GPU. Tuning these can give you even more performance, depending on the shape, size and arrangement of your data.

This is a specialist topic and is better explained at one of our GPU hackathons. At the end of the session, ask when the next ones are!

## Conclusion

You have now learned how to make your workflows even faster using a GPU and Dask.

**Jump over to [Notebook 6](./dask_ml_06.ipynb) now.** 