# OpenMP and HIP Interoperability

In this notebook, we will see how we can write an application that makes use of both OpenMP pragmas and HIP kernels.
Used strategically, combining acceleration paradigms can make our code more flexible and more performant than either paradigm would allow on its own.

Let's begin in the traditional way: by checking that we have an appropriate GPU on the system, moving into the relevant working directory, and cleaning our build environment.

```bash
rocm-smi
cd $HOME/DiRAC-AMD-GPU/notebooks/02-OpenMP/2i-OpenMP_HIP_interportability/Fortran
make clean
```

## Why use HIP and OpenMP?

As we've learned throughout these notebooks, OpenMP is a flexible and portable pragma-based API for shared-memory parallelism which, through its `target` directives, also supports offloading to GPUs.
It allows parallelism to be expressed relatively simply across different architectures, but lacks the direct control that lower-level paradigms offer.

HIP, on the other hand, offers tighter control at the cost of OpenMP's ease of use and flexibility.
The details of HIP will not be discussed in this notebook; they are saved for the next section of this course.
For now, it is enough to know that HIP is a lower-level C++ programming model in which we write GPU kernels explicitly, giving us more direct control over how offloaded work is executed.

Both approaches have their upsides and drawbacks, so you might wonder whether there is a way to combine them in a single code and draw on the strengths of each.
There is!
Code that uses OpenMP pragmas can call HIP kernels, and HIP applications can contain OpenMP offload regions.

Using this, we can, for example, write our simple loops with OpenMP pragmas and reserve HIP kernels for the complicated offloaded routines whose performance requires fine-grained control to optimise.
By using both in the same application, we really can have the best of both worlds.

Unfortunately, HIP kernels cannot be written directly in Fortran.
Thanks to Fortran's interoperability with C, however, we can still gain HIP's benefits in our code.
By writing our HIP kernel in C++ and exposing its launcher with C linkage, we can call it from our Fortran program like any other C function.
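One detail of this interoperability is worth seeing concretely: Fortran passes arguments by reference unless the interface marks them `value`, so the C-side signature must match whichever convention the Fortran interface uses. The sketch below is a hypothetical, host-only illustration (the name `daxpy_by_ref` and the Fortran interface shown in the comment are ours, not taken from the course files) of a C-linkage function matching an interface that omits `value`:

```cpp
// Hypothetical host-only illustration of a C-linkage entry point callable
// from Fortran. A matching Fortran interface (not from the course files)
// could look like:
//
//   interface
//     subroutine daxpy_by_ref(n, a, x, y) bind(C, name="daxpy_by_ref")
//       use iso_c_binding, only: c_int, c_double
//       integer(c_int), intent(in)    :: n      ! no VALUE: passed by reference
//       real(c_double), intent(in)    :: a      ! no VALUE: passed by reference
//       real(c_double), intent(in)    :: x(*)   ! address of first element
//       real(c_double), intent(inout) :: y(*)
//     end subroutine
//   end interface

// extern "C" disables C++ name mangling, so the exported symbol is exactly
// "daxpy_by_ref", which is the name bind(C, name=...) looks for at link time.
extern "C" void daxpy_by_ref(const int* n, const double* a,
                             const double* x, double* y) {
    // The scalars arrive as pointers because the interface omits VALUE.
    for (int i = 0; i < *n; ++i) {
        y[i] = (*a) * x[i] + y[i];
    }
}
```

If the interface instead declared `n` and `a` with the `value` attribute, the C++ side would take `int n, double a` directly.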

### An example implementation

We have an example of an OpenMP-enabled Fortran code calling a HIP kernel available here.
Let's look at the files in our current directory:

```bash
ls
```

[`main.F90`](./Fortran/main.F90) is the main Fortran program.
In it, we allocate two arrays, `x` and `y`, enter a `target data` region that maps them to the device, and then modify these arrays on the host.
These host-side changes are copied to the device with the `target update` pragma, and then the external function `daxpy_hip` is called.
We then leave the `target data` region, carry out a final computation on the host, and exit the program.
Note that the `target data` and `update` pragmas are unnecessary on a device that shares memory with the host, such as an APU.
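The sequence described above can be sketched in C++ with the equivalent OpenMP pragmas. This is an illustration, not a translation of `main.F90`: it assumes the device addresses are handed to the HIP launcher through `use_device_ptr` (a common pattern; the Fortran file may use `use_device_addr` or another mechanism), and `daxpy_hip` here is a hypothetical host-only stand-in, so the sketch should be built without an offload target. With real offloading, the stand-in must be replaced by an actual HIP launcher, since it dereferences its pointers on the host.

```cpp
#include <vector>

// Hypothetical host-only stand-in for the HIP launcher in daxpy_kernel.cpp.
// It dereferences x and y on the host, so it is only valid when OpenMP runs
// the "device" side on the host (no offload target, or host fallback).
extern "C" void daxpy_hip(int n, double a, const double* x, double* y) {
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// C++ sketch of the sequence described for main.F90 (assumed structure).
std::vector<double> run_daxpy(int n, double a) {
    std::vector<double> xv(n), yv(n);
    double* x = xv.data();
    double* y = yv.data();

    // Map x and y to the device for the duration of the region;
    // y is copied back to the host when the region ends.
    #pragma omp target data map(to: x[0:n]) map(tofrom: y[0:n])
    {
        // Modify the arrays on the host...
        for (int i = 0; i < n; ++i) {
            x[i] = 1.0;
            y[i] = 2.0;
        }

        // ...then push those host-side changes to the device copies.
        #pragma omp target update to(x[0:n], y[0:n])

        // Inside this region x and y hold the device addresses of the
        // mapped arrays, which is what a HIP kernel launcher expects.
        #pragma omp target data use_device_ptr(x, y)
        {
            daxpy_hip(n, a, x, y);
        }
    }
    // After the region ends, yv holds the result copied back from the device.
    return yv;
}
```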

[`daxpy_kernel.cpp`](./Fortran/daxpy_kernel.cpp) then provides the C-callable implementation of the daxpy HIP kernel that `main.F90` calls.
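As a rough picture of what such a file can contain, here is a hypothetical daxpy kernel with a C-linkage launcher; the real `daxpy_kernel.cpp` may use a different signature, block size or error handling. The `#if defined(__HIPCC__)` guard is our addition: under `hipcc` the GPU path is compiled, while an ordinary C++ compiler builds a host loop so the sketch can be tried without ROCm. Taking `n` and `a` by value assumes the Fortran interface declares them with the `value` attribute, and the integer return code is likewise a hypothetical convention.

```cpp
#if defined(__HIPCC__)
#include <hip/hip_runtime.h>

// Each GPU thread updates one element: y[i] = a * x[i] + y[i].
__global__ void daxpy_kernel(int n, double a, const double* x, double* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard threads that fall past the end of the arrays
        y[i] = a * x[i] + y[i];
    }
}
#endif

// extern "C" gives the launcher an unmangled name that a Fortran
// bind(C, name="daxpy_hip") interface can link against.
// Under hipcc, x and y must be device addresses (for example obtained
// with OpenMP's use_device_ptr). Returns 0 on success, 1 on a HIP error.
extern "C" int daxpy_hip(int n, double a, const double* x, double* y) {
    if (n <= 0) {
        return 0;  // nothing to do; also avoids launching an empty grid
    }
#if defined(__HIPCC__)
    const int block = 256;                     // threads per block
    const int grid = (n + block - 1) / block;  // enough blocks to cover n
    daxpy_kernel<<<grid, block>>>(n, a, x, y);
    if (hipGetLastError() != hipSuccess) {
        return 1;  // the launch itself failed
    }
    // Wait for the kernel so the caller sees the updated y.
    return hipDeviceSynchronize() == hipSuccess ? 0 : 1;
#else
    // Host fallback used when compiled without hipcc.
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
    return 0;
#endif
}
```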

[`hip_interface.F90`](./Fortran/hip_interface.F90) provides the necessary interface information for Fortran to be able to call this function.

[`Makefile`](./Fortran/Makefile) can then be used to build the application.
Let's take a look at the [`Makefile`](./Fortran/Makefile) to fully understand how we compile all the necessary parts of this code:

```bash
cat Makefile
```

We can see that the default build rule runs several build steps:
 - [`daxpy_kernel.cpp`](./Fortran/daxpy_kernel.cpp) is compiled with `hipcc`, so that the HIP headers and runtime are available to the kernel code,
 - [`hip_interface.F90`](./Fortran/hip_interface.F90) is compiled with `amdflang`, AMD's LLVM-based Fortran compiler, providing the interface through which Fortran can call the daxpy kernel,
 - [`main.F90`](./Fortran/main.F90) is compiled with the same Fortran compiler as the interface code, but with the OpenMP flags enabled,
 - The executable is then created by linking these objects together.

Let's try compiling and running it now:

```bash
make daxpy
./daxpy
```

If all has worked well, the code will state whether it was compiled for the host or the device, and verify that the results are correct.
Congratulations!
You've now successfully run code with multiple forms of acceleration enabled.

This concludes the tutorial's optional lessons on OpenMP.

In the next set of notebooks, we will dive into HIP proper, examining its functionality, discovering the strengths of its programming model, and learning how to implement it in our code. As Fortran support for HIP is still in development, these notebooks will be exclusively in C/C++.