Start by launching the following notebook:
https://binderhub.ssl-hep.org/v2/gh/research-software-collaborations/courses-hsf-india-december2023/gpugpu_true
Let's check how many, and what type of, GPUs are available on the system. To do this, simply run:
nvidia-smi
Let's try to answer some questions:
- How many GPUs does the system have?
- What type of GPUs does the system have?
- How much global memory does each GPU have?
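Besides nvidia-smi, the same information can be queried programmatically from the CUDA runtime. The sketch below (not part of the exercise files) prints the device count, name, and global memory for each GPU:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("Number of GPUs: %d\n", count);
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // totalGlobalMem is reported in bytes
        printf("GPU %d: %s, %.1f GB global memory\n",
               i, prop.name, prop.totalGlobalMem / 1e9);
    }
    return 0;
}
```

Compare its output against what nvidia-smi reports.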
During the lecture we saw a "Hello World" CUDA kernel. Let's try to run it ourselves! To compile and run the cuda_hello.cu CUDA script, simply do:
nvcc cuda_hello.cu -o cuda_hello -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_80,code=sm_80
./cuda_hello
Let's try to answer some questions:
- What do you observe?
- Why is this happening?
- What can we do to fix this?
If you get stuck you can take a look at the cuda_hello.cu script in the solutions directory.
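For reference, a minimal CUDA "Hello World" typically looks like the sketch below. Note the host-side synchronization after the kernel launch: kernel launches are asynchronous, so without it the program may exit before the device's printf output is flushed (this is a common cause of the behavior seen in this exercise, though your cuda_hello.cu may differ in detail):

```cuda
#include <cstdio>

__global__ void hello() {
    // Runs on the GPU; each thread prints its own coordinates.
    printf("Hello World from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main() {
    hello<<<2, 4>>>();          // launch 2 blocks of 4 threads each
    cudaDeviceSynchronize();    // wait for the kernel (and its output) to finish
    return 0;
}
```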
The goal of this exercise is to write a CUDA kernel that calculates the dot product of two vectors on the GPU.
- Start by taking a look at the file dot_product.cu.
- Find and properly update all parts of the script denoted with FIXME.
To compile and run the CUDA script you can do:
nvcc dot_product.cu -o dot_product -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_80,code=sm_80
./dot_product
Let's try to answer some questions:
- When you don't make use of the CUDA-specific atomicAdd() function, what do you observe? Is the result correct? Do you get consistent results if you run the kernel multiple times?
- Why is this happening?
If you get stuck you can take a look at the dot_product.cu script in the solutions directory.
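To illustrate the role of atomicAdd(), here is one common way to structure such a kernel (a sketch only; the names N, THREADS, and the exact structure of dot_product.cu are assumptions, not the exercise's actual code). Each block first reduces its threads' partial sums in shared memory, then a single thread per block adds the block's result to the global accumulator atomically. A plain `*result += cache[0]` at that last step would be a read-modify-write race between blocks, producing a wrong and run-to-run inconsistent result:

```cuda
#include <cstdio>

const int N = 4096;
const int THREADS = 256;

__global__ void dot(const float *a, const float *b, float *result) {
    __shared__ float cache[THREADS];
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    // Each thread accumulates a partial sum over a grid-strided range.
    float sum = 0.0f;
    for (int i = tid; i < N; i += gridDim.x * blockDim.x)
        sum += a[i] * b[i];
    cache[threadIdx.x] = sum;
    __syncthreads();

    // Tree reduction within the block's shared memory.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            cache[threadIdx.x] += cache[threadIdx.x + s];
        __syncthreads();
    }

    // One atomic add per block avoids a race on the global result.
    if (threadIdx.x == 0)
        atomicAdd(result, cache[0]);
}
```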
The goal of this exercise is to write a CUDA kernel that performs a 2-dimensional square matrix multiplication on the GPU.
- Start by taking a look at the file matrix_multiplication.cu.
- We can represent the 2-D matrix in 1-D as shown in the image below. This will make copying the matrix from the host to the device (and back) easier. The size of each matrix is DSIZE*DSIZE.

- Find and properly update all parts of the script denoted with FIXME.
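With row-major flattening, element (row, col) of a DSIZE×DSIZE matrix lives at index row*DSIZE + col of the 1-D array. A typical kernel built on this indexing is sketched below (the kernel name and signature are illustrative and may not match matrix_multiplication.cu exactly); each thread computes one output element:

```cuda
// Each thread computes one element C[row][col] of C = A * B.
// Row-major flattening: element (row, col) is at index row*DSIZE + col.
__global__ void mmul(const float *A, const float *B, float *C, int DSIZE) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (row < DSIZE && col < DSIZE) {
        float tmp = 0.0f;
        for (int k = 0; k < DSIZE; ++k)
            tmp += A[row * DSIZE + k] * B[k * DSIZE + col];
        C[row * DSIZE + col] = tmp;
    }
}
```

The bounds check matters because the grid is usually rounded up to a whole number of blocks, so some threads fall outside the matrix.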
To compile and run the CUDA script you can do:
nvcc matrix_multiplication.cu -o matrix_multiplication -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_80,code=sm_80
./matrix_multiplication
Let's try to answer some questions:
- Compare the time it takes to perform the matrix multiplication on the CPU and the GPU.
- What do you observe? (Keep in mind that you are running on shared resources. You can try invoking the program a few times and then select the best run.)
- Try changing the size of the matrix by multiplying DSIZE by 2, 4, and 8.
- What do you observe now? How do the CPU and GPU times scale?
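One common way to time the GPU side is with CUDA events, which measure elapsed time on the device rather than on the host. The self-contained sketch below times a dummy kernel (busy_kernel is a placeholder, not part of the exercise); in the exercise you would wrap the matrix-multiplication launch the same way:

```cuda
#include <cstdio>

__global__ void busy_kernel(float *x, int n) {
    // Placeholder workload: one simple update per element.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] = x[i] * 2.0f + 1.0f;
}

int main() {
    const int N = 1 << 20;
    float *d_x;
    cudaMalloc(&d_x, N * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    busy_kernel<<<(N + 255) / 256, 256>>>(d_x, N);
    cudaEventRecord(stop);

    cudaEventSynchronize(stop);   // wait until the stop event has been reached
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("Kernel time: %.3f ms\n", ms);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_x);
    return 0;
}
```

On shared resources, repeat the measurement several times and take the best run, as the instructions above suggest.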
If you get stuck you can take a look at the matrix_multiplication.cu script in the solutions directory.