# OpenACC

## Level 0: Hello World

We start with a serial CPU code printing a range of numbers:

```cpp
for (size_t i0 = 0; i0 < nx; ++i0) {
    printf("%zu\n", i0);
}
```

The full example is available in [print-numbers-base.cpp](../src/print-numbers/print-numbers-base.cpp), and can be compiled and executed using the following cells.

In [None]:
!g++ -O3 -march=native -std=c++17 -o ../build/print-numbers/print-numbers-base ../src/print-numbers/print-numbers-base.cpp

In [None]:
!../build/print-numbers/print-numbers-base

OpenACC takes an approach similar to OpenMP target offloading: `parallel` launches parallel execution (gangs of threads), and `loop` distributes the loop iterations across them.

Whether execution occurs on the CPU or GPU is determined by compiler arguments.
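To check which way a given binary was built, the `_OPENACC` macro can be queried. The OpenACC specification defines it when OpenACC compilation is enabled. The following is a minimal sketch; the function name `openacc_status` is hypothetical and not part of the example sources.

```cpp
#include <string>

// Hypothetical helper: reports whether this translation unit was compiled
// with OpenACC support enabled.
std::string openacc_status() {
#ifdef _OPENACC
    // _OPENACC expands to the date (yyyymm) of the supported specification.
    return "OpenACC enabled, version macro " + std::to_string(_OPENACC);
#else
    // Without OpenACC support, the pragmas are ignored and loops run serially.
    return "OpenACC disabled";
#endif
}
```

When compiled without an OpenACC flag (as in the `g++` base build above), the second branch is taken.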

If not specified, the compiler chooses the number of gangs, the number of workers, and the vector length.

```cpp
#pragma acc parallel loop
for (size_t i0 = 0; i0 < nx; ++i0) {
    printf("%ld\n", i0);
}   // implicit synchronization
```
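The launch configuration can also be set by hand with the `num_gangs` and `vector_length` clauses. The sketch below writes the indices into an array instead of printing them so that the result can be checked; the function name `fill_indices` and the values 64 and 128 are illustrative assumptions, not tuned settings.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: stores i0 at position i0 using an explicit launch
// configuration instead of the compiler defaults.
std::vector<std::size_t> fill_indices(std::size_t nx) {
    std::vector<std::size_t> out(nx);
    std::size_t* ptr = out.data();

    // 64 gangs with a vector length of 128 (illustrative values);
    // copyout transfers the result back to the host after the loop.
    #pragma acc parallel loop gang vector num_gangs(64) vector_length(128) copyout(ptr[0:nx])
    for (std::size_t i0 = 0; i0 < nx; ++i0) {
        ptr[i0] = i0;
    }

    return out;
}
```

In most cases the compiler defaults are a reasonable starting point, and explicit values are mainly useful when tuning.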

As with OpenMP, this directive instructs the compiler to parallelize the loop regardless of whether doing so is safe, for example when inter-iteration dependencies would cause race conditions.
Alternatively, `kernels` can be used to leave more decisions to the compiler.
In this case, the compiler will:
* Analyze dependencies and only parallelize loops without dependencies
* Apply loop and kernel transformations, including fusion

```cpp
#pragma acc kernels
{
    for (size_t i0 = 0; i0 < nx; ++i0) {
        printf("%ld\n", i0);
    }
    /* potentially more work */
}
```
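To illustrate the safety concern with `parallel loop` mentioned above: a loop that accumulates into a shared variable has an inter-iteration dependency. With `parallel loop`, the programmer has to resolve it, for instance with a `reduction` clause. The following is a minimal sketch; the function name `sum_indices` is hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper: computes 0 + 1 + ... + (nx - 1).
std::size_t sum_indices(std::size_t nx) {
    std::size_t sum = 0;

    // Without the reduction clause, all iterations would update `sum`
    // concurrently, which is a race condition.
    #pragma acc parallel loop reduction(+:sum)
    for (std::size_t i0 = 0; i0 < nx; ++i0) {
        sum += i0;
    }

    return sum;
}
```

For example, `sum_indices(10)` returns 45.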

The complete example code is available in [print-numbers-openacc.cpp](../src/print-numbers/print-numbers-openacc.cpp).
Build and execute it using the following cells.

In [None]:
!nvc++ -O3 -std=c++17 -acc=gpu -target=gpu -o ../build/print-numbers/print-numbers-openacc ../src/print-numbers/print-numbers-openacc.cpp

In [None]:
!../build/print-numbers/print-numbers-openacc

## Level 1: Adding Managed Memory

Our next application increases all elements of an array by one.

[increase-base.cpp](../src/increase/increase-base.cpp) shows a serial CPU-only implementation.
Its key part, and our entry point, is the `increase` function.

```cpp
void increase(double* data, size_t nx) {
    for (size_t i0 = 0; i0 < nx; ++i0) {
        data[i0] += 1;
    }
}
```

As with OpenMP target offloading, the parallelization follows the previously discussed concepts: the loop in `increase` is annotated with a directive such as `parallel loop`.
Support for managed memory is enabled by adding the `-gpu=mem:managed` compiler switch.
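A minimal sketch of what this can look like is shown below. It assumes that the array is heap-allocated (here through `std::vector`) and that the managed memory switch makes such allocations accessible from the GPU. The driver function `run_increase` and its initial values are hypothetical and may differ from the complete example.

```cpp
#include <cstddef>
#include <vector>

void increase(double* data, std::size_t nx) {
    // No data clauses: with managed memory, the runtime is assumed to
    // migrate the array between host and device on demand.
    #pragma acc parallel loop
    for (std::size_t i0 = 0; i0 < nx; ++i0) {
        data[i0] += 1;
    }
}

// Hypothetical driver: initializes data[i] = i and applies increase
// `iterations` times.
std::vector<double> run_increase(std::size_t nx, std::size_t iterations) {
    std::vector<double> data(nx);
    for (std::size_t i0 = 0; i0 < nx; ++i0) {
        data[i0] = static_cast<double>(i0);
    }

    for (std::size_t it = 0; it < iterations; ++it) {
        increase(data.data(), nx);
    }

    return data;
}
```

The host code reads and writes `data` directly before and after the parallel loop, without any explicit transfers.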

The complete example code can be found in [increase-openacc-mm.cpp](../src/increase/increase-openacc-mm.cpp), and can be built and executed with the following cells.

In [None]:
!nvc++ -O3 -std=c++17 -acc=gpu -target=gpu -gpu=mem:managed -o ../build/increase/increase-openacc-mm ../src/increase/increase-openacc-mm.cpp

In [None]:
!../build/increase/increase-openacc-mm

## Level 2: Switching to Explicit Memory Management

OpenACC follows the same principles as OpenMP for explicit data management, but uses a slightly different syntax.

```cpp
#pragma acc enter data copyin (field[0:nx])
#pragma acc exit  data copyout(field[0:nx])
```
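The two directives above can be combined with the `increase` function roughly as follows. This is a sketch under assumptions: the driver `run_increase_explicit` and its initial values are hypothetical, and the `present` clause is one possible way to tell the compiler that the data already resides on the device.

```cpp
#include <cstddef>
#include <vector>

void increase(double* data, std::size_t nx) {
    // present(...) states that the caller has already placed the array
    // on the device, so no transfer happens here.
    #pragma acc parallel loop present(data[0:nx])
    for (std::size_t i0 = 0; i0 < nx; ++i0) {
        data[i0] += 1;
    }
}

// Hypothetical driver: copies the array to the device once, applies
// increase `iterations` times, and copies the result back once.
std::vector<double> run_increase_explicit(std::size_t nx, std::size_t iterations) {
    std::vector<double> host(nx);
    for (std::size_t i0 = 0; i0 < nx; ++i0) {
        host[i0] = static_cast<double>(i0);
    }
    double* field = host.data();

    #pragma acc enter data copyin(field[0:nx])   // allocate on device and copy host -> device

    for (std::size_t it = 0; it < iterations; ++it) {
        increase(field, nx);
    }

    #pragma acc exit data copyout(field[0:nx])   // copy device -> host and free device memory

    return host;
}
```

Keeping the transfers outside the iteration loop means the array crosses the host-device boundary only twice, regardless of the number of iterations.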

The complete example code is available in [increase-openacc-expl.cpp](../src/increase/increase-openacc-expl.cpp).
Build and execute it using the following cells.

In [None]:
!nvc++ -O3 -std=c++17 -acc=gpu -target=gpu -o ../build/increase/increase-openacc-expl ../src/increase/increase-openacc-expl.cpp

In [None]:
!../build/increase/increase-openacc-expl