In order to use the PTX backend of TornadoVM, you will need a CUDA-compatible device (an NVIDIA GPU with CUDA support).
You will need to set up the CUDA Toolkit. If you don’t have it installed already, you can follow this guide. Make sure you follow the environment setup to add the required environment variables.
Depending on the installation, you might also have to expand your C_INCLUDE_PATH and LD_LIBRARY_PATH variables to include the CUDA headers and libraries.
$ export C_INCLUDE_PATH=/usr/local/cuda/include:${C_INCLUDE_PATH}
$ export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
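The exports above assume that CUDA is installed under /usr/local/cuda. As a minimal sketch, the hypothetical helper below (prepend_path is not part of CUDA or TornadoVM) adds a directory only when it exists and is not already listed, and it avoids leaving a trailing colon when the variable starts out empty. The CUDA_HOME default is an assumption taken from the paths used in this guide.

```shell
# Hypothetical helper: prepend DIR to the colon-separated variable VAR_NAME.
# Does nothing if DIR does not exist or is already listed.
prepend_path() {
    var_name=$1
    dir=$2
    [ -d "$dir" ] || return 0
    # Read the current value of the variable named by $var_name (empty if unset).
    eval "current=\${$var_name:-}"
    case ":$current:" in
        *":$dir:"*) return 0 ;;
    esac
    if [ -n "$current" ]; then
        new_value="$dir:$current"
    else
        new_value="$dir"
    fi
    eval "$var_name=\$new_value"
    export "$var_name"
}

# Assumption: CUDA is installed in /usr/local/cuda unless CUDA_HOME says otherwise.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
prepend_path C_INCLUDE_PATH "$CUDA_HOME/include"
prepend_path LD_LIBRARY_PATH "$CUDA_HOME/lib64"
```

Placing these lines in a shell profile such as ~/.bashrc makes the variables available in new terminals.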
To ensure that the installation has been successful, you can run the following commands: nvidia-smi and nvcc --version.
The output of nvidia-smi should be similar to:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1650 Off | 00000000:01:00.0 Off | N/A |
| N/A 51C P8 1W / N/A | 73MiB / 3914MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1095 G /usr/lib/xorg/Xorg 36MiB |
| 0 1707 G /usr/lib/xorg/Xorg 36MiB |
+-----------------------------------------------------------------------------+
The output of nvcc --version should be similar to:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89
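Before comparing outputs, it can help to confirm that both tools are on the PATH at all. The following is a small sketch of such a check; check_tool is a hypothetical helper name, and the script only reports what it finds rather than changing anything.

```shell
# Hypothetical helper: report whether a command is available on PATH.
# Returns non-zero when the command is missing.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the two tools used in this guide and count how many are missing.
missing=0
for tool in nvidia-smi nvcc; do
    check_tool "$tool" || missing=$((missing + 1))
done
echo "missing tools: $missing"
```

If nvcc is reported as missing while nvidia-smi is found, the driver is likely installed but the CUDA Toolkit bin directory is not on the PATH.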
Install TornadoVM as described here: installation.
Build TornadoVM with the PTX backend selected and run tornado --devices.
The output of the TornadoVM build containing both backends (PTX and OpenCL) should look like this:
Number of Tornado drivers: 2
Total number of devices : 1
Tornado device=0:0
CUDA-PTX -- GeForce GTX 1650
Global Memory Size: 3.8 GB
Local Memory Size: 48.0 KB
Workgroup Dimensions: 3
Max WorkGroup Configuration: [1024, 1024, 64]
Device OpenCL C version: N/A
Total number of devices : 2
Tornado device=1:0
NVIDIA CUDA -- GeForce GTX 1650
Global Memory Size: 3.8 GB
Local Memory Size: 48.0 KB
Workgroup Dimensions: 3
Max WorkGroup Configuration: [1024, 1024, 64]
Device OpenCL C version: OpenCL C 1.2
Tornado device=1:1
Intel(R) OpenCL HD Graphics -- Intel(R) Gen9 HD Graphics NEO
Global Memory Size: 24.8 GB
Local Memory Size: 64.0 KB
Workgroup Dimensions: 3
Max WorkGroup Configuration: [256, 256, 256]
Device OpenCL C version: OpenCL C 2.0
Note that the first Tornado driver will always correspond to the CUDA device detected by the PTX backend.
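To check for the PTX device without reading the listing by eye, the output can be filtered. The sketch below assumes the output format shown above, where each "Tornado device=X:Y" line is followed by the device description and PTX devices are labelled "CUDA-PTX"; list_ptx_devices is a hypothetical helper, and the format may differ between TornadoVM versions.

```shell
# Hypothetical helper: read `tornado --devices` output on stdin and print the
# driver:device index of every device whose description contains "CUDA-PTX".
list_ptx_devices() {
    awk '
        /Tornado device=/ { id = $0 }
        /CUDA-PTX/ {
            sub(/^[ \t]*Tornado device=/, "", id)
            print id
        }
    '
}

# Only query TornadoVM when the launcher is actually on PATH.
if command -v tornado >/dev/null 2>&1; then
    ptx_devices=$(tornado --devices | list_ptx_devices)
    if [ -n "$ptx_devices" ]; then
        echo "PTX devices: $ptx_devices"
    else
        echo "No CUDA-PTX device reported; check that the PTX backend was built."
    fi
fi
```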
In some cases, running nvidia-smi might show the error NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. This can happen because the driver module is not loaded due to a blacklist file.
You can remove this by running: sudo rm /etc/modprobe.d/blacklist-nvidia.conf
On Ubuntu, the driver can also fail to load if it is not selected in prime-select. In order to select it, you can run prime-select nvidia or prime-select on-demand.
You might also need to point the LIBRARY_PATH variable to the libcuda library in order to build TornadoVM. Example:
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda/lib64/stubs
After these changes, a reboot might be required for the driver module to be loaded.
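The driver checks described above can be gathered into a short diagnostic script. This is only a sketch: nvidia_module_loaded is a hypothetical helper that looks for a module named nvidia in lsmod-style output, and the script reports findings without removing files or switching profiles.

```shell
# Hypothetical helper: read lsmod-style output on stdin and succeed only if a
# module named exactly "nvidia" appears in the first column.
nvidia_module_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

# Blacklist file mentioned in this guide.
blacklist_file="/etc/modprobe.d/blacklist-nvidia.conf"

if command -v lsmod >/dev/null 2>&1 && lsmod | nvidia_module_loaded; then
    echo "nvidia kernel module is loaded"
else
    echo "nvidia kernel module is not loaded"
    if [ -e "$blacklist_file" ]; then
        echo "blacklist file present: $blacklist_file"
    fi
    # On Ubuntu, show which graphics profile prime-select currently reports.
    if command -v prime-select >/dev/null 2>&1; then
        echo "prime-select profile: $(prime-select query)"
    fi
fi
```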
We have tested the PTX backend of TornadoVM on the following configurations:
| GPU | Arch | PTX ISA Version | Target | Driver version | CUDA version | Status |
|---|---|---|---|---|---|---|
| RTX 3070 | Ampere | 8.6 | sm_86 | 510.54 | 11.8 | OK |
| RTX 2060 | Turing | 7.5 | sm_75 | 510.54 | 11.6 | OK |
| Quadro GP100 | Pascal | 6.0 | sm_60 | 384.111 | 9.0 | Functional |
| GeForce GTX 1650 | Turing | 6.5 | sm_75 | 440.100 | 10.2 | OK |
| GeForce 930MX | Maxwell | 6.4 | sm_50 | 418.56 | 10.1 | OK |
| GeForce 930MX | Maxwell | 6.5 | sm_50 | 450.36 | 11.0 | OK |
DISCLAIMER: The PTX backend might fail on the Quadro GP100 with driver 384.111, producing segmentation faults in some of the unit tests due to driver issues.