GPU tools

vdeo edited this page Oct 22, 2021 · 13 revisions

This page is frequently updated and tested for recent kernel and driver versions. The current instructions were last updated and tested on May 29, 2020 for the following system configuration:

Ubuntu 20.04 (Server)
Linux 5.6.10-rt5
nvidia 440.82

Steps and commands may vary for other configurations.

Need help? Post questions on the RTC config chat room.

1. NVIDIA driver

1.1. Blacklist Nvidia nouveau driver

echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nvidia-nouveau.conf
echo "options nouveau modeset=0" | sudo tee -a /etc/modprobe.d/blacklist-nvidia-nouveau.conf

Confirm the content of the new modprobe config file:

cat /etc/modprobe.d/blacklist-nvidia-nouveau.conf
	blacklist nouveau
	options nouveau modeset=0

Regenerate initramfs:

sudo update-initramfs -u

Then reboot.
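
After the reboot, it can be worth confirming that the blacklist took effect. The sketch below assumes the kernel lists loaded modules in /proc/modules, one per line with the module name first. The helper name module_loaded is hypothetical, and its optional second argument only exists so the check can be tried on a saved listing.

```shell
# Return success if MODULE appears in a /proc/modules-style listing.
# Usage: module_loaded MODULE [FILE]   (FILE defaults to /proc/modules)
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Only report when the listing is readable on this machine.
if [ -r /proc/modules ]; then
    if module_loaded nouveau; then
        echo "nouveau is still loaded: check the modprobe file and initramfs"
    else
        echo "nouveau is not loaded"
    fi
fi
```

If nouveau is still reported as loaded, the usual suspects are a typo in the modprobe file or an initramfs that was not regenerated.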

1.2. Install driver

cd $HOME
mkdir -p soft && cd soft
chmod +x
./ -x
cd ${HOME}/soft/NVIDIA-Linux-x86_64-440.82/
sudo IGNORE_PREEMPT_RT_PRESENCE=1 ./nvidia-installer

If prompted (depends on version), answer as follows:

  • "Register kernel module" -> NO
  • "Unable to find 32bit install" -> OK
  • "Install 32 bit compatibility" -> NO
  • "An incomplete installation of libglvnd was found. All of the essential libglvnd libraries are present, but one or more optional components are missing. Do you want to install a full copy of libglvnd? This will overwrite any existing libglvnd libraries." -> Install and Overwrite
  • "Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up." -> NO
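
Once the installer finishes, the running driver version can be compared with the one expected for this configuration (440.82 above). The following is a minimal sketch, assuming nvidia-smi is on the PATH after installation. The helper name driver_version_ok is hypothetical, and the optional second argument is there only so the comparison can be exercised without a GPU.

```shell
# Compare the expected driver version with the one reported by the driver.
# Usage: driver_version_ok EXPECTED [ACTUAL]
# When ACTUAL is omitted it is read from nvidia-smi (first GPU only).
driver_version_ok() {
    dv_expected=$1
    dv_actual=${2:-$(nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null | head -n 1)}
    [ -n "$dv_actual" ] && [ "$dv_actual" = "$dv_expected" ]
}
```

For example, `driver_version_ok 440.82 && echo "driver OK"` prints the message only when the loaded driver reports exactly that version.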


2. CUDA

2.1. Install CUDA toolkit

Check which version to use on the NVIDIA CUDA website, and follow the instructions for the runfile (local) installer:

cd ${HOME}/soft
chmod +x

Run the installer:

sudo ./

Note: CUDA may refuse to install, because it usually requires a gcc a few versions older than the system's. See 2.1.1 below.

  • Accept EULA
  • Unselect nvidia driver (already installed)
  • Select install

Note: these lines have been added to the profile file:

export PATH=$PATH:/usr/local/cuda-10.1/bin
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda-10.1/lib64
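
If the profile is sourced more than once, plain appends like the two lines above add the same directory repeatedly, and an unset LD_LIBRARY_PATH leaves an empty leading entry. A minimal sketch of a guard against both follows. path_append is a hypothetical helper, not something the CUDA installer provides.

```shell
# Append DIR to the colon-separated variable VAR unless it is already listed.
# Usage: path_append VAR DIR
path_append() {
    eval "pa_current=\${$1:-}"
    case ":$pa_current:" in
        *":$2:"*) ;;   # already present, nothing to do
        *) eval "export $1=\"\${pa_current:+\$pa_current:}\$2\"" ;;
    esac
}

path_append PATH /usr/local/cuda-10.1/bin
path_append LD_LIBRARY_PATH /usr/local/cuda-10.1/lib64
```

Calling the helper a second time with the same arguments leaves the variable unchanged.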

Note: To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-10.1/bin

2.1.1 Installing alternatives

Use this when the system's gcc is too recent for CUDA. This is the case on Ubuntu 20.04 (gcc suite v9, while CUDA compiles with the gcc suite up to v8). In the commands below, set SYSTEM to the system's gcc major version (9 here) and CUDA_WANTS to the version CUDA requires (8 here).
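
The commands in this section rely on two shell variables, SYSTEM and CUDA_WANTS. One possible way to set them is sketched here, assuming gcc -dumpversion prints either a bare major version or a dotted one. gcc_major is a hypothetical helper, and the value 8 is simply the one quoted for Ubuntu 20.04.

```shell
# Extract the major version from a gcc version string ("9" or "9.3.0").
gcc_major() {
    printf '%s\n' "$1" | cut -d. -f1
}

# System gcc major version (left empty when gcc is not installed).
SYSTEM=$(gcc_major "$(gcc -dumpversion 2>/dev/null)")
# Newest gcc major version accepted by the CUDA release being installed.
# 8 is the value quoted for Ubuntu 20.04; adjust it for other releases.
CUDA_WANTS=8

if [ -n "$SYSTEM" ] && [ "$SYSTEM" -gt "$CUDA_WANTS" ]; then
    echo "gcc $SYSTEM is newer than gcc $CUDA_WANTS: alternatives are needed"
fi
```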

sudo apt install gcc-$CUDA_WANTS g++-$CUDA_WANTS gfortran-$CUDA_WANTS gfortran
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-$SYSTEM $SYSTEM
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-$SYSTEM $SYSTEM
sudo update-alternatives --install /usr/bin/gfortran gfortran /usr/bin/gfortran-$SYSTEM $SYSTEM
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-$CUDA_WANTS $CUDA_WANTS
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-$CUDA_WANTS $CUDA_WANTS
sudo update-alternatives --install /usr/bin/gfortran gfortran /usr/bin/gfortran-$CUDA_WANTS $CUDA_WANTS

Switch the system to the $CUDA_WANTS version:

sudo update-alternatives --set g++ /usr/bin/g++-$CUDA_WANTS
sudo update-alternatives --set gcc /usr/bin/gcc-$CUDA_WANTS
sudo update-alternatives --set gfortran /usr/bin/gfortran-$CUDA_WANTS

After installing CUDA, revert:

sudo update-alternatives --auto g++
sudo update-alternatives --auto gcc
sudo update-alternatives --auto gfortran

2.2. Test

To build and run a given CUDA sample test:

cd /usr/local/cuda/samples/X_XXX/TestName
sudo make
./TestName

We recommend running the following samples to check that everything works:

  • 1_Utilities/deviceQuery
  • 1_Utilities/bandwidthTest
  • 1_Utilities/p2pBandwidthLatencyTest
  • 0_Simple/matrixMulCUBLAS
  • 0_Simple/simpleMultiGPU

Output for 0_Simple/matrixMulCUBLAS:

[Matrix Multiply CUBLAS] - Starting...
GPU Device 0: "GeForce RTX 2080 Ti" with compute capability 7.5

GPU Device 0: "GeForce RTX 2080 Ti" with compute capability 7.5

MatrixA(640,480), MatrixB(480,320), MatrixC(640,320)
Computing result using CUBLAS...done.
Performance= 5042.82 GFlop/s, Time= 0.039 msec, Size= 196608000 Ops
Computing result using host CPU...done.
Comparing CUBLAS Matrix Multiply with CPU results: PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
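
When several samples are run in a row, their verdicts can be checked from saved logs rather than by eye. This is a minimal sketch, assuming the sample prints a line containing PASS or FAIL as matrixMulCUBLAS does above. Samples that print no verdict need a different check, such as their exit status. sample_passed is a hypothetical helper name.

```shell
# Return success if a saved sample log reports PASS and never FAIL.
# Usage: sample_passed LOGFILE
sample_passed() {
    grep -q 'PASS' "$1" && ! grep -q 'FAIL' "$1"
}
```

A typical use would be `./matrixMulCUBLAS > matrixMulCUBLAS.log 2>&1; sample_passed matrixMulCUBLAS.log && echo OK`.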

NOTE: To run the graphics examples, first install:

sudo apt-get install libglu1-mesa libxi-dev libxmu-dev libglu1-mesa-dev freeglut3 freeglut3-dev


3. MAGMA

3.1. Installation

Alternative 1: Install OpenBLAS, LAPACK, and gfortran:

sudo apt-get install libopenblas-base libopenblas-dev
sudo apt-get install liblapack-dev
sudo apt install gfortran

Select the compute capability for GPU(s) installed on your system. See table below for examples. The deviceQuery CUDA sample will indicate those values for the GPUs in the system.

GPU          Compute capability     Architecture
RTX A6000    8.6 (Req. CUDA 11+)    Ampere
RTX3080Ti    8.6                    Ampere
RTX2080Ti    7.5                    Turing
GTX1080Ti    6.1                    Pascal
GTX980Ti     5.2                    Maxwell
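
The compute capabilities in the table map onto the sm_XX names passed to GPU_TARGET by dropping the dot. As an illustration, the sketch below builds the target list from a set of capabilities. cc_to_sm and gpu_target are hypothetical helper names, and the comma separator simply mirrors the cmake command used on this page.

```shell
# Convert a compute capability such as "7.5" into the matching "sm_75" name.
cc_to_sm() {
    printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Build a comma-separated GPU_TARGET value from a list of compute capabilities.
gpu_target() {
    gt_list=""
    for gt_cc in "$@"; do
        gt_list="${gt_list:+$gt_list,}$(cc_to_sm "$gt_cc")"
    done
    printf '%s\n' "$gt_list"
}

gpu_target 7.5 6.1 5.2   # prints sm_75,sm_61,sm_52
```

List only the capabilities of the GPUs actually installed, since every extra target lengthens the build.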

Install magma:

cd ${HOME}/soft
gunzip magma-2.5.3.tar.gz
tar -xvf magma-2.5.3.tar
cd magma-2.5.3
mkdir build
cd build
cmake -DGPU_TARGET='sm_75,sm_61,sm_52' ..
make -j $(nproc)
sudo make install

Add to .bashrc or .profile:

export LD_LIBRARY_PATH=/usr/local/magma/lib:$LD_LIBRARY_PATH
export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/magma/lib/pkgconfig/

Alternative 2: Install using MKL from anaconda:

Install gfortran:

sudo apt install gfortran

Set up .bashrc:

export CONDA_ROOT=$HOME/miniconda3

Download and install Miniconda:

bash -b -p $CONDA_ROOT
conda install -y numpy mkl-include


MAGMA is available here :

Extract the tgz file and go into the new directory:

wget -O - | tar xz
cd magma-2.5.1

Configure with Makefile

You have to create your own based on

sed -i -e 's:/intel64: -Wl,-rpath=$(CUDADIR)/lib64 -Wl,-rpath=$(MKLROOT)/lib:'

Then compile only the shared target (and the tests if you want):

export CUDA_ROOT=/usr/local/cuda
export NCPUS=8


  • sm_XX must match the compute capability of the GPU. For example, sm_60 for the Tesla P100
  • NCPUS is the number of CPUs in your system

To install libraries and include files in a given prefix, run:

sudo GPU_TARGET=sm_XX MKLROOT=$MKLROOT CUDADIR=$CUDA_ROOT make install prefix=/usr/local/magma

Add to .bashrc:

export LD_LIBRARY_PATH=/usr/local/magma/lib:$LD_LIBRARY_PATH
export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/magma/lib/pkgconfig/

3.2. Checking CFLAGS and LIBS

Check that including magma will not override the gnu11 C standard adopted by cacao:

pkg-config --cflags magma

should return :

-DNDEBUG -DADD_ -fopenmp -I/usr/local/magma/include -I/usr/local/cuda/include

If CFLAGS includes "-std=c99", edit the magma.pc file to remove it.

You may also need to tweak LIBS:

pkg-config --libs magma

should return :

-L/usr/local/magma/lib -L/usr/local/cuda/lib64 -lmagma_sparse -lmagma -fopenmp -lopenblas -lcublas -lcusparse -lcudart -lcudadevrt

Note: Tweaks to magma.pc will most likely be required on older Ubuntu distributions (16.04). Installation on Ubuntu 18.04 and 19.04 will likely produce the desired magma.pc output.
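
The CFLAGS check described in this section can also be scripted. The following is a sketch only: forces_c_standard is a hypothetical helper that looks for any -std= flag in a CFLAGS string, and the pkg-config call is skipped when magma.pc cannot be found.

```shell
# Return success if a CFLAGS string forces a C standard (e.g. -std=c99).
forces_c_standard() {
    case " $1 " in
        *" -std="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Only query pkg-config when it is installed and knows about magma.
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists magma; then
    if forces_c_standard "$(pkg-config --cflags magma)"; then
        echo "magma.pc sets a -std= flag: edit the file to remove it"
    else
        echo "magma CFLAGS leave the C standard untouched"
    fi
fi
```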

4. Tensorflow (optional)

4.1. Anaconda

Get the link to the latest installer bash script from

cd ${HOME}/soft
# Accept the default install directory, and say ok to running conda init
# Restart terminal
# By default, anaconda puts the environment name in the shell prompt. To disable this, do
conda config --set changeps1 False

4.2. Tensorflow

Set up Tensorflow. This is now very easy thanks to conda!

# Make sure CUDA and NVIDIA drivers are already installed
# Make a new conda environment containing Tensorflow for GPU
conda create --name tf_gpu tensorflow-gpu

Activate the new environment:

conda activate tf_gpu

Note - to deactivate it and go back to the base environment, do

conda deactivate
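
Scripts that depend on the tf_gpu environment can check that it is active before doing anything else. This is a small sketch, relying on the CONDA_DEFAULT_ENV variable that conda exports on activation. in_conda_env is a hypothetical helper name.

```shell
# Return success if the named conda environment is the active one.
# conda exports CONDA_DEFAULT_ENV when an environment is activated.
in_conda_env() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}
```

For instance, `in_conda_env tf_gpu || echo "run: conda activate tf_gpu"` prints the reminder whenever another environment (or none) is active.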

While in the environment, install other useful packages, e.g.

conda install astropy matplotlib keras ipython

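Test the installation - within python, do:

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

and check it sees all the devices. Note that tf.Session belongs to the TensorFlow 1.x API; on TensorFlow 2.x, tf.config.list_physical_devices('GPU') lists the visible GPUs instead.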

