aurora-runtime

This package is an attempt to reproduce NVIDIA's CUDA Runtime API [1], i.e. enable the user to write device kernels and launch them in a quasi-grid structure on NEC's Aurora SX-TSUBASA vector engine.

To that end, we wrap NEC's VE Offload [2] and UDMA [3] APIs their such that the usage mimics CUDA's runtime API.

Installation and Example

The installation is as easy as a breeze! The dependencies on the target systems are:

python (>= 3.5)
cmake (>= 3.10)
reasonably new gcc/g++ (eg. from scl devtoolset-8)
NEC Aurora SDK (ncc, libs) - under /opt/nec
LLVM-VE (llvm/clang): https://sx-aurora.com/repos/veos/ef_extra under /opt/nec

For installation,

Clone this repository:

$ git clone https://github.com/dthuerck/aurora_runtime.git

Download and build dependencies:

$ cd aurora_runtime
$ chmod +x init.sh
$ ./init.sh

That's it! Now we can build an example application featuring GEMA (256x256 batched matrix addition) and GEMM (256x256 batched matrix multiplication):

$ mkdir build && cd build
$ cmake ..
$ make

Finally, run the example with ./app-test and watch your Aurora hard at work!

Using the runtime

The runtime API functions are listed in .runtime/include/aurora_runtime.h, their usage is demonstrated in the example (see app-test.cc).

The runtime centers around the concept of a (virtual) processing group; basically, we write kernels and each kernel is then executed in a batch of size n via offload and OpenMP. Roughly speaking (for people familiar with CUDA), each processing group is a block and the batch corresponds to a grid of size n. The runtime offers the following variables that are set in kernel functions:

__pg__ix: the index of the processing group (index in the batch)
__num_pgs: the batch size / number of processing groups
__pe__ix / __pg_size: reserved for future use

Lastly, the most important part: kernels are conventional C-functions with the annotation ve_kernel and saved with a .cve extension.

The build process is fully automated and supported by CMake. For details, please refer to CMakeLists.txt.

Creating a new project

Ideally, use this repository as a scaffolding:

Clone this repository and run the init.sh.
Replace gema.cve, gemm.cve by your kernels.
Replace app-test.cc by your application's source.
Change the CMakeLists.txt accordingly.

That's it!

Standing on the shoulder of giants...

This project uses the following packages:

VE Offload [1]
VE UDMA [2]
NEC's LLVM
pycparse

Name		Name	Last commit message	Last commit date
Latest commit History 24 Commits
.runtime		.runtime
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md
app-test.cc		app-test.cc
gema.cve		gema.cve
gemm.cve		gemm.cve
init.sh		init.sh
timer.cc		timer.cc
timer.h		timer.h

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

.runtime

.runtime

.gitignore

.gitignore

CMakeLists.txt

CMakeLists.txt

LICENSE

LICENSE

README.md

README.md

app-test.cc

app-test.cc

gema.cve

gema.cve

gemm.cve

gemm.cve

init.sh

init.sh

timer.cc

timer.cc

timer.h

timer.h

Repository files navigation

aurora-runtime

Installation and Example

Using the runtime

Creating a new project

Standing on the shoulder of giants...

References

About

Releases

Packages

Contributors 2

Languages

License

dthuerck/aurora-runtime

Folders and files

Latest commit

History

Repository files navigation

aurora-runtime

Installation and Example

Using the runtime

Creating a new project

Standing on the shoulder of giants...

References

About

Topics

Resources

License

Stars

Watchers

Forks

Languages