Cheatsheet

Download the PDF version here <../../cheatsheet/cheatsheet.pdf>

General

Accelerator, Platform and Device

Define in-kernel thread indexing type
Define accelerator type (CUDA, OpenMP, etc.)
AcceleratorType:
AccGpuCudaRt, AccGpuHipRt, AccCpuSycl, AccFpgaSyclIntel, AccGpuSyclIntel, AccCpuOmp2Blocks, AccCpuOmp2Threads, AccCpuTbbBlocks, AccCpuThreads, AccCpuSerial

Create platform and select a device by index

Queue and Events

Create a queue for a device
Property:
Blocking, NonBlocking

Enqueue a task for execution
Wait for all operations in the queue to complete
Create an event
Enqueue an event
Check whether the event has completed
Wait for the event (and for all operations enqueued in the same queue before it)
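
The ordering rule for events (waiting for an event also waits for everything enqueued before it, but not for what comes after) can be illustrated with a toy model in plain C++. This is only a sketch and does not use alpaka: `ToyQueue`, `ToyEvent` and all member names are hypothetical, and the model defers all work until a wait is called, whereas a real non-blocking queue may run operations asynchronously.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Toy model, NOT alpaka code: every name here is hypothetical.
struct ToyEvent
{
    bool done = false;
};

class ToyQueue
{
public:
    // Enqueue a task; in this deferred model nothing runs until a wait.
    void enqueueTask(std::function<void()> task)
    {
        pending_.push_back(std::move(task));
    }

    // Enqueue an event: it is marked complete once the queue reaches it.
    void enqueueEvent(std::shared_ptr<ToyEvent> event)
    {
        pending_.push_back([event] { event->done = true; });
    }

    // Wait for the event: run pending operations in order until it completes.
    void waitFor(ToyEvent const& event)
    {
        while(!event.done && !pending_.empty())
            runNext();
    }

    // Wait for all operations in the queue.
    void waitAll()
    {
        while(!pending_.empty())
            runNext();
    }

private:
    // Pop the oldest operation and execute it (strict in-order processing).
    void runNext()
    {
        auto task = std::move(pending_.front());
        pending_.pop_front();
        task();
    }

    std::deque<std::function<void()>> pending_;
};
```

With one task enqueued before the event and one after it, `waitFor` leaves the later task pending, and `waitAll` then drains the rest.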

Memory

Memory allocation and transfers are symmetric for host and devices; both are done via the alpaka API.

Create a CPU device for memory allocation on the host side
Allocate a buffer in host memory
(Optional, affects CPU – GPU memory copies) Prepare it for asynchronous memory copies
Create a view to host memory represented by a pointer
Create a view to host std::vector
Create a view to host std::array
Get a raw pointer to a buffer or view (for initialization, etc.)
Allocate a buffer in device memory
Enqueue a memory copy from host to device
Enqueue a memory copy from device to host

Kernel Execution

Automatically select a valid kernel launch configuration
Manually set a kernel launch configuration
Instantiate a kernel and create a task that will run it (does not launch it yet)

The acc parameter of the kernel is provided automatically and does not need to be specified here.

Enqueue the kernel for execution

Kernel Implementation

Define a kernel as a C++ functor

ALPAKA_FN_ACC is required for kernels and for functions called inside them. acc is the mandatory first parameter; its type is the template parameter.
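
The shape of such a functor can be sketched in plain C++ without alpaka headers. Everything below is an illustrative assumption: `MockAcc`, `ScaleKernel` and `launchSerial` are hypothetical names, the mock accelerator only carries a thread index, and the ALPAKA_FN_ACC annotation is mentioned in a comment rather than used, so that the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an accelerator object; NOT an alpaka type.
// It only carries the index of the "thread" that is currently running.
struct MockAcc
{
    std::size_t threadIdx;
};

// Kernel written as a functor: operator() is const, templated on the
// accelerator type, and takes the accelerator as its first parameter.
// In real alpaka code the operator would additionally be marked ALPAKA_FN_ACC.
struct ScaleKernel
{
    template<typename TAcc>
    void operator()(TAcc const& acc, float* data, float factor, std::size_t size) const
    {
        std::size_t const i = acc.threadIdx;
        if(i < size) // guard against threads beyond the data extent
            data[i] *= factor;
    }
};

// Hypothetical serial launcher: calls the kernel once per thread index and
// supplies the accelerator argument itself, so callers pass only the rest.
template<typename TKernel, typename... TArgs>
void launchSerial(std::size_t numThreads, TKernel const& kernel, TArgs... args)
{
    for(std::size_t t = 0; t < numThreads; ++t)
        kernel(MockAcc{t}, args...);
}
```

A call such as `launchSerial(4, ScaleKernel{}, v.data(), 2.0f, v.size())` passes only the kernel arguments; the launcher supplies the accelerator object, and the bounds check makes the surplus fourth thread a no-op on a three-element vector.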

Access multi-dimensional indices and extents of blocks, threads, and elements
Access components of, and destructure, multi-dimensional indices and extents
Linearize multi-dimensional vectors
Allocate static shared memory variable
Get dynamic shared memory pool, requires the kernel to specialize
Synchronize threads of the same block
Atomic operations
Memory fences at block, grid, or device level (guarantee LoadLoad and StoreStore ordering)
Warp-level operations
Math functions take acc as additional first argument

Similar for other math functions.

Generate random numbers
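
For the "Linearize multi-dimensional vectors" entry in the list above, the underlying arithmetic can be sketched in plain C++. The `linearize` helper is hypothetical and not part of alpaka; it assumes that the last component of the index varies fastest (row-major order), which should be checked against the library documentation before relying on it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper, NOT part of alpaka: maps an N-dimensional index to a
// linear offset, assuming the last component varies fastest (row-major).
template<std::size_t N>
std::size_t linearize(std::array<std::size_t, N> const& idx, std::array<std::size_t, N> const& extent)
{
    std::size_t linear = 0;
    for(std::size_t d = 0; d < N; ++d)
        linear = linear * extent[d] + idx[d]; // Horner-style accumulation
    return linear;
}
```

For an extent of (4, 5, 6), the index (1, 2, 3) maps to (1 * 5 + 2) * 6 + 3 = 45, and the last valid index (3, 4, 5) maps to 119, one less than the total element count of 120.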