Download the PDF version here: ../../cheatsheet/cheatsheet.pdf
- Getting alpaka: https://github.com/alpaka-group/alpaka
- Issue tracker, questions, support: https://github.com/alpaka-group/alpaka/issues
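- All alpaka names are in namespace alpaka and are available through the header file alpaka/alpaka.hpp
This document assumes that alpaka/alpaka.hpp is included and that using namespace alpaka; is in effect, so alpaka names appear without the alpaka:: prefix.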
- Define the in-kernel thread indexing type
- Define the accelerator type (CUDA, OpenMP, etc.)
- AcceleratorType: AccGpuCudaRt, AccGpuHipRt, AccCpuSycl, AccFpgaSyclIntel, AccGpuSyclIntel, AccCpuOmp2Blocks, AccCpuOmp2Threads, AccCpuTbbBlocks, AccCpuThreads, AccCpuSerial
- Create platform and select a device by index
- Create a queue for a device
- Property: Blocking, NonBlocking
- Put a task for execution
- Wait for all operations in the queue
- Create an event
- Put an event into the queue
- Check if the event is completed
- Wait for the event (and all operations put to the same queue before it)
Memory allocation and transfers are symmetric for host and devices; both are done via the alpaka API.
- Create a CPU device for memory allocation on the host side
- Allocate a buffer in host memory
- (Optional, affects CPU – GPU memory copies) Prepare it for asynchronous memory copies
- Create a view to host memory represented by a pointer
- Create a view to host std::vector
- Create a view to host std::array
- Get a raw pointer to a buffer or view for initialization, etc.
- Allocate a buffer in device memory
- Enqueue a memory copy from host to device
- Enqueue a memory copy from device to host
- Automatically select a valid kernel launch configuration
- Manually set a kernel launch configuration
- Instantiate a kernel and create a task that will run it (this does not launch it yet)
The acc parameter of the kernel is provided automatically and does not need to be specified here.
- Put the kernel for execution
- Define a kernel as a C++ functor
ALPAKA_FN_ACC is required for kernels and for functions called inside them; acc is the mandatory first parameter, and its type is the template parameter.
- Access multi-dimensional indices and extents of blocks, threads, and elements
- Access components of, and destructure, multi-dimensional indices and extents
- Linearize multi-dimensional vectors
- Allocate a static shared memory variable
- Get the dynamic shared memory pool; this requires specializing the trait alpaka::trait::BlockSharedMemDynSizeBytes for the kernel type
- Synchronize threads of the same block
- Atomic operations
- Memory fences on block-, grid- or device level (guarantees LoadLoad and StoreStore ordering)
- Warp-level operations
- Math functions take acc as an additional first argument, e.g. math::sin(acc, argument)
Other math functions follow the same pattern.
- Generate random numbers