This is a small collection of toy HPC problems, intended both as self-exercise and as practice in using high-performance numerical libraries such as BLAS, LAPACK, and the NVIDIA Performance Libraries (NVPL).
The source code was written with experimentation in mind, so the implementations are not guaranteed to be fully correct, but they follow the approach of the official NVPL sample code (https://github.com/NVIDIA/NVPLSamples).
This repository does not accept pull requests. Any pull requests will be ignored.
Currently, this repository assumes that the user owns or has access to:
- an NVIDIA Grace Superchip with 240 GB of memory, and
- the nvhpc/25.5 module (for the nvc compiler).
The code does not have any other dependencies besides NVPL.
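The assumptions above can be checked quickly before building. The following is a minimal sketch, not part of the repository: the function name check_prereqs is hypothetical, and it assumes a Linux system where a Grace CPU reports its architecture as aarch64. It only prints what it finds and does not verify memory size or the module version.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository).
# Reports whether the assumed environment looks present.
# It prints one line per check and never aborts.
check_prereqs() {
    arch=$(uname -m)
    if [ "$arch" = "aarch64" ]; then
        echo "arch: ok ($arch)"
    else
        echo "arch: unexpected ($arch); Grace is an aarch64 CPU"
    fi

    # nvc is provided by the nvhpc module once it is loaded.
    if command -v nvc >/dev/null 2>&1; then
        echo "nvc: found"
    else
        echo "nvc: not found; try 'module load nvhpc/25.5'"
    fi
}

check_prereqs
```

If nvc is reported as missing, loading the nvhpc module as described in the steps below should make it available in the current shell.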
How to use:
- Clone the repository:
    git clone https://github.com/accable/hpc_toy.git
- Navigate to the source directory:
    cd hpc_toy
    cd src
- To compile the .c files, first load the nvhpc module:
    module load nvhpc/25.5
  Then compile a file using the nvc compiler:
    nvc 1_1_heat_diffusion_dynmem.c -o foo
  If BLAS or LAPACK is required, add -lblas or -llapack to the compiler arguments:
    nvc 1_2_heat_diffusion_dynmem_lapack.c -o foolapack -llapack
  Some files require nvplTENSOR; for these, add -lnvpl_tensor to the compiler arguments:
    nvc 2_0_multi_head_attention_cutensor.c -o footensor -lnvpl_tensor
  Some files need to be compiled with the -Mnvpl option instead:
    nvc 2_1_multi_head_attention_blas.c -o fooblas -Mnvpl=blas
- Run the application:
    ./foo
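The compile commands above can be collected into one script. This is only a sketch under a few assumptions: the script and its run and build_all functions are hypothetical and not part of the repository, it is run from the src directory, and module load nvhpc/25.5 has already been done in the calling shell. The DRY_RUN variable is also an invention of this sketch; by default it prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): builds the four
# example files named in this README. Assumes the current directory is
# src/ and that the nvhpc/25.5 module is already loaded.
set -e

# With DRY_RUN=1 (the default in this sketch) commands are only printed.
# Set DRY_RUN=0 to actually invoke the compiler.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

build_all() {
    # No extra libraries needed.
    run nvc 1_1_heat_diffusion_dynmem.c -o foo
    # Links against LAPACK.
    run nvc 1_2_heat_diffusion_dynmem_lapack.c -o foolapack -llapack
    # Links against nvplTENSOR.
    run nvc 2_0_multi_head_attention_cutensor.c -o footensor -lnvpl_tensor
    # Uses the compiler's -Mnvpl option.
    run nvc 2_1_multi_head_attention_blas.c -o fooblas -Mnvpl=blas
}

build_all
```

Saved under a name of your choosing (for example build_sketch.sh), running it as-is lists the four commands; running it with DRY_RUN=0 set in the environment performs the builds and stops at the first failure.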