! hello world
Setting the following environment variables is highly recommended to make life easier.
$ source <install_path>/setup.source # default install_path would be $HOME/.iris
The “Hello World” program is the first step towards learning IRIS. This program displays the message “HELLO WORLD” on the screen.
$ cd iris/apps/helloworld
$ make
$ ./helloworld
HELLO WORLD
$
tab1
../../../apps/helloworld/helloworld.c
tab2
../../../apps/helloworld/helloworld.cpp
tab1
../../../apps/helloworld/kernel.cu
tab2
../../../apps/helloworld/kernel.hip.cpp
tab3
../../../apps/helloworld/kernel.cl
tab4
../../../apps/helloworld/kernel.openmp.h
! saxpy
SAXPY stands for "Single-precision A * X Plus Y". It is a combination of scalar multiplication and vector addition.
$ cd iris/apps/saxpy
$ make
$ ./saxpy-c
X [ 0. 1. 2. 3. 4. 5. 6. 7.]
Y [ 0. 1. 2. 3. 4. 5. 6. 7.]
S = 10.000000 * X + Y [ 0. 11. 22. 33. 44. 55. 66. 77.]
$
tab1
../../../apps/saxpy/saxpy.c
tab2
../../../apps/saxpy/saxpy.cpp
tab3
../../../apps/saxpy/saxpy.f90
tab4
../../../apps/saxpy/saxpy.py
tab1
../../../apps/saxpy/kernel.cu
tab2
../../../apps/saxpy/kernel.hip.cpp
tab3
../../../apps/saxpy/kernel.cl
tab4
../../../apps/saxpy/kernel.openmp.h
! data memory
One of the major benefits of using IRIS is its "data memory" feature, which automatically manage data movement independent of the device scheduling. Here is an example of the use of data memory during a vector addition code. Note how the:
call differs from the SAXPY example above. We no longer need iris_task_h2d_full
and iris_task_d2h_full
calls, instead, we only need to know when to flush the final memory transfer required by the host. This is a simpler work-flow that the conventional explicit memory movement approach.
$ cd iris/apps/vecadd
$ make
$ ./vecadd-iris
tab1
../../../apps/vecadd/vecadd-iris.cpp
tab1
../../../apps/vecadd/kernel.cu
tab2
../../../apps/vecadd/kernel.hip.cpp
tab3
../../../apps/vecadd/kernel.cl
tab4
../../../apps/vecadd/kernel.openmp.h
! device selection
IRIS opportunistically attempts to use all available devices and backends, it resolves task names to function names in the corresponding kernel binaries. It allows device selection to be set both at compile and at runtime.
The user can submit the device target(s) for when the task is submitted:
This task submission includes information about the task, such as a hint, target device parameter, synchronization mode (blocking or non-blocking), and policy selector that indicates where the task should be executed. The device
is the device submission policy. The complete list of available targets are:
Device Policy | About |
---|---|
iris_cpu | Submit the task to a CPU device |
iris_gpu | Submit the task to any GPU device |
iris_fpga | Submit the task to any FPGA (currently Intel and Xilinx) |
iris_dsp | Submit the task to any DSP device (currently Hexagon) |
iris_nvidia | Submit the task to an NVIDIA GPU device |
iris_amd | Submit the task to an AMD GPU device |
iris_gpu_intel | Submit the task to an Intel GPU device |
iris_phi | Submit the task to an Intel Xeon Phi device |
We can also submit tasks according to a scheduling policy:
Scheduling Policy | About |
---|---|
iris_default | Use the first device |
iris_roundrobin | Submit this task in a round-robin (cyclic) way, for equal work sharing |
iris_depend | Submit this task to a device that has been assigned its dependent |
iris_data | Submit task to device to minimize data movement |
iris_profile | Submit the task to the device based on execution time history |
iris_random | Randomly assign this task to any of the available devices |
iris_pending | Delay submitting the task until the memory it depends on has been assigned, then use that device |
iris_any | Submit task to the device with the fewest assigned tasks |
iris_all | Submit the task to all device queues, the device that accesses it first has exclusive execution (it is removed from the other device queues) |
iris_custom | Submit the task based on a used provided, custom policy |
The opt
parameter is for iris_custom
policies.
We can also filter out devices at runtime by setting the IRIS_ARCHS
environment variable. Modifying the selection of backends to instantiate allows dynamic device targets---without requiring recompilation. All current options are:hip
,cuda
,opencl
, and openmp
. An example of only allowing execution on openmp
and hip
devices would then be:
$ export IRIS_ARCHS=hip,openmp
$ ./helloworld
HELLO WORLD
$