# Device functions and subroutines in C++

Until now, we have focused on how to offload data or particular calculations to the GPU.
In most real software applications, however, we will want to pass data through a pipeline of multiple functions and routines before completion.
In this notebook, we will learn how to call functions to run on the device from within existing target regions.
In so doing, we will combine many of the things we have learned up to this point, and let you try porting some code to the GPU using OpenMP yourself.

In the [C.ipynb](./C.ipynb) notebook, we learned about the `declare target` directive, which allows us to offload specific functions to the device.
This notebook looks at three further offloading cases specific to C++: member functions, externally implemented member functions, and virtual functions.
As such, it is recommended that you work through that previous notebook first before beginning this one.

But first, let's conduct our usual environment checks and move into the C++ directory for this exercise:

In [None]:
rocm-smi
cd $HOME/DiRAC-AMD-GPU/notebooks/02-OpenMP/2g-OpenMP_device_subroutines/CXX

## C++ member functions

A member function is a function that is declared as part of a class.
Our first example will look at such a function, and how we can offload it to the device.
The example is found in the `1_member_function/0_member_function_portyourself/` directory - let's look at what's there:

In [None]:
cd $HOME/DiRAC-AMD-GPU/notebooks/02-OpenMP/2g-OpenMP_device_subroutines/CXX/1_member_function/0_member_function_portyourself
make clean
ls

The directory contains two source code files: [`bigscience.cc`](./CXX/1_member_function/0_member_function_portyourself/bigscience.cc) - where `main` is implemented - and [`Science.hh`](./CXX/1_member_function/0_member_function_portyourself/Science.hh), where the `Science` class is defined and its `compute()` **member function** is implemented.
Inspect and understand the code, then compile and run it to understand the baseline CPU behaviour:

In [None]:
make clean
make
./bigscience

Try adding the `#pragma omp target teams loop` directive to the loop in `bigscience.cc` to port it to run on the device.
Note that because the `compute()` function is **implemented in the [`Science.hh`](./CXX/1_member_function/0_member_function_portyourself/Science.hh) header**, it is implicitly *inline* and its definition is visible to the compiler wherever the header is included.
This means the compiler can generate a device version when it finds the call inside the target region, and no further work is needed to port it to the device.

You can choose to either use unified shared memory - which was covered in [2e-OpenMP_managed_memory](./../2e-OpenMP_managed_memory/Managed_memory_in_C.ipynb) - in which case you must enable `HSA_XNACK`, or use unmanaged memory - as covered in [2d-OpenMP_explicit_memory_directive](./../2d-OpenMP_explicit_memory_directive/Explicit_memory_management_in_C.ipynb) - by adding an appropriate `map` clause to the directive.

You can re-run the compile-and-run cell above (`make clean`, `make`, `./bigscience`) to build and test your edited code.
After porting the code, you can compare your results with the solutions in `1_member_function/` if using unified shared memory, and `2_member_function_map/` if using unmanaged explicit memory movement.

In [None]:
cd $HOME/DiRAC-AMD-GPU/notebooks/02-OpenMP/2g-OpenMP_device_subroutines/CXX/1_member_function/1_member_function
make clean
make
export HSA_XNACK=1
./bigscience

In [None]:
cd $HOME/DiRAC-AMD-GPU/notebooks/02-OpenMP/2g-OpenMP_device_subroutines/CXX/1_member_function/2_member_function_map
make clean
make
export HSA_XNACK=0
./bigscience

## External member function

This approach works well when our member function is defined **and** implemented in the header file, since it will be compiled inline.
But what happens when we **declare** a member function in the header, but **implement** it in a separate source code file?

Our second example - in the `2_member_function_external/0_member_function_external_portyourself/` directory - will address this.
Let's look at the contents of the directory:

In [None]:
cd $HOME/DiRAC-AMD-GPU/notebooks/02-OpenMP/2g-OpenMP_device_subroutines/CXX/2_member_function_external/0_member_function_external_portyourself
make clean
ls

As with the previous example, there is the [`bigscience.cc`](CXX/2_member_function_external/0_member_function_external_portyourself/bigscience.cc) file with the `main()` definition, and [`Science.hh`](CXX/2_member_function_external/0_member_function_external_portyourself/Science.hh) with the `Science` class definition and the `compute()` member function **declaration**.
There is now, however, also an additional file - [`Science_member_functions.cc`](CXX/2_member_function_external/0_member_function_external_portyourself/Science_member_functions.cc).
This file contains the member function **implementation** for the `compute()` function, meaning it is now an external implementation.

Only the structure of the code has changed since the previous example, so we already know what to expect when running.
Let's go ahead and try porting the code now, again choosing either unified shared memory or unmanaged memory.
Remember that in addition to the changes we made in the previous example, we will now also need to add a `#pragma omp declare target` directive to the external member function (as discussed in [C.ipynb](./C.ipynb)).

You can use the first code cell below (`make clean`, `make`, `./bigscience`) to compile and run your code as you modify it. After porting the code, you can compare your results with the solutions.
The unified shared memory solution is provided in `1_member_function_external/`, and the solution using an unstructured data region in `2_member_function_external_data/`.
Note that we could also use simple explicit memory movement, as in the previous example.

In [None]:
make clean
make
./bigscience

In [None]:
cd $HOME/DiRAC-AMD-GPU/notebooks/02-OpenMP/2g-OpenMP_device_subroutines/CXX/2_member_function_external/1_member_function_external
make clean
make
export HSA_XNACK=1
./bigscience

In [None]:
cd $HOME/DiRAC-AMD-GPU/notebooks/02-OpenMP/2g-OpenMP_device_subroutines/CXX/2_member_function_external/2_member_function_external_data
make clean
make
export HSA_XNACK=0
./bigscience

## Virtual methods

Our final exercise extends the previous examples with the use of virtual functions.
Polymorphism and inheritance are intrinsic aspects of C++, and OpenMP has been written to accommodate this.
This example is contained in the `3_virtual_methods/0_virtual_methods_portyourself/` directory - let's navigate to it and take a look at what's there:

In [None]:
cd $HOME/DiRAC-AMD-GPU/notebooks/02-OpenMP/2g-OpenMP_device_subroutines/CXX/3_virtual_methods/0_virtual_methods_portyourself
make clean
ls

We now have four files:
 - The familiar [`bigscience.cc`](./CXX/3_virtual_methods/0_virtual_methods_portyourself/bigscience.cc) where `main` is implemented,
 - [`Science.hh`](./CXX/3_virtual_methods/0_virtual_methods_portyourself/Science.hh) where the `Science` class is defined, now containing a **virtual function declaration** for the `compute()` function,
 - A new file [`HotScience.hh`](./CXX/3_virtual_methods/0_virtual_methods_portyourself/HotScience.hh) that contains the definition of the `HotScience` class, which inherits from `Science` and defines its own instance of the `compute()` function,
 - and [`HotScience_member_functions.cc`](./CXX/3_virtual_methods/0_virtual_methods_portyourself/HotScience_member_functions.cc) where the `compute()` member function for the `HotScience` class is **implemented**.

You might notice that there has been a minor change in the `compute()` function implementation, so let's compile the code and see the results:

In [None]:
make clean
make
./bigscience

Now we can try porting the code, again choosing our preferred memory model.
Note that for virtual functions, only the implementation requires the `#pragma omp declare target` directive; no changes should be necessary in any header file.
You can use the code block above to compile and run the code as you make your modifications.

After porting the code, you can compare your results with the model solutions below.
The unified shared memory solution is provided in `1_virtual_methods/`, and the solution using unmanaged, explicit memory management in `2_virtual_methods_map/`.
Note that we could also use an unstructured data region, as in the previous example.
Remember, your choice depends on whether you need to reuse the memory in other parts of the code.
You can review sections [2d-OpenMP_explicit_memory_directive](./../2d-OpenMP_explicit_memory_directive/Explicit_memory_management_in_C.ipynb) and [2e-OpenMP_managed_memory](./../2e-OpenMP_managed_memory/Managed_memory_in_C.ipynb) for more insights into memory model considerations.

In [None]:
cd $HOME/DiRAC-AMD-GPU/notebooks/02-OpenMP/2g-OpenMP_device_subroutines/CXX/3_virtual_methods/1_virtual_methods
make clean
make
export HSA_XNACK=1
./bigscience

In [None]:
cd $HOME/DiRAC-AMD-GPU/notebooks/02-OpenMP/2g-OpenMP_device_subroutines/CXX/3_virtual_methods/2_virtual_methods_map
make clean
make
export HSA_XNACK=0
./bigscience

Now that you have successfully ported inline, external and virtual member functions to run on the GPU, you can experiment with making the loops larger, changing the logic in `compute()` to be more complex, and testing different memory models.

Throughout this section of the course, we've learnt the fundamentals of OpenMP, and how they can help you run performant code on an AMD GPU or APU.
You should now be equipped to start porting your own code to such architectures.

There follow two optional sections on OpenMP optimisations, and interoperability with HIP.
Feel free to go through them if they sound interesting or useful to you or your work, or continue on to the next section of this course, looking at the HIP programming model, and how dedicated kernels can be used to improve the performance of your code.