## Example 1: Fibonacci Sequence

The Fibonacci sequence, when written iteratively as follows, contains a loop-carried dependency: each iteration reads `prev` and `curr`, which the previous iteration wrote. Here is the relevant code:

```
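int fib(int x) {
    int prevPrev = 0;
    int prev = 0;
    int curr = 1;

    // Each iteration reads prev and curr as written by the previous
    // iteration, so the iterations cannot run independently.
    #pragma acc kernels
    for (int i = 0; i < x; i++) {
        prevPrev = prev;
        prev = curr;
        curr = prevPrev + prev;
    }
    return curr;
}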
```

Let's see what the OpenACC compiler says about this code:

In [None]:
!pgcc -acc -Minfo=accel data_dep_daniel.c



## Example 2: 



```
#include <stdlib.h>  // malloc
#include <openacc.h>

int main(){
    int N = 20;
    int * b = (int*) malloc(sizeof(int) * N);
    b[1] = 5;
    int * a = (int*) malloc(sizeof(int) * N);
    
    for(int i = 0; i < N; ++i) {
          
    }
}
```


In [None]:
!pgcc -acc -Minfo=accel ex.c

## Example 3:


```
int main(){
    int N = 20;
    int * I = (int*) malloc(sizeof(int) * N);
    int ** a = (int**) malloc(sizeof(int*) * N);
    for(int i = 0; i < N; ++i)
        a[i] = (int*) malloc(sizeof(int) * N);
    
    a[0][0] = 0;
    
    #pragma acc kernels
    for(int i = 0; i < N; ++i) {
        for(int j = 0; j < N; ++j) {
           a[i][j] = (i != 0) ? ((j != 0) ? a[i-1][j-1] : a[i-1][j]) : a[i][j-1]; 
           if(i == j) I[i] = j;
        }
    }
}
```

In [2]:
!pgcc -acc -Minfo=accel ex2.c

PGC-W-0155-Pointer value created from a nonlong integral type  (ex2.c: 5)
PGC-W-0155-Pointer value created from a nonlong integral type  (ex2.c: 6)
PGC-W-0155-Pointer value created from a nonlong integral type  (ex2.c: 8)
main:
     12, Generating implicit copyin(a[-1:21][-1:21])
         Generating implicit copyout(a[:20][:20])
         Generating implicit copy(I[:20])
     13, Complex loop carried dependence of a->-> prevents parallelization
         Loop carried dependence of a->-> prevents parallelization
         Loop carried backward dependence of a->-> prevents vectorization
         Complex loop carried dependence of I-> prevents parallelization
         Accelerator serial kernel generated
         Generating Tesla code
         13, #pragma acc loop seq
         14, #pragma acc loop seq
     13, Loop carried backward dependence of a->-> prevents vectorization
         Complex loop carried dependence of I-> prevents parallelization
     14, Complex loop carried dependence of a->

## Example 4: Flow Dependency


Kobe Davis
Prof. Karavanic
CS 405
1 June 2019

This example is a matrix operation that contains a flow dependency. Despite the dependency, the code can be refactored to execute in parallel; the parallel version, in the file `flowdep_refactor_par.cpp`, demonstrates one such refactoring.


```
#include <cstdlib>
#include <ctime>

const int DIM = 30; // Height and Width

int main() {
    srand(time(NULL));

    // Declare matrix of height and width, DIM
    int data[DIM][DIM];

    // Randomly initialize each element to 0 or 1
    for(int i = 0; i < DIM; ++i)
        for(int j = 0; j < DIM; ++j)
            data[i][j] = rand() % 2;

    // If the elements directly above and directly to
    // the left are equal, set the current element to match them
    // Otherwise do nothing
    #pragma acc parallel loop
    for(int i = 1; i < DIM; ++i)
        for(int j = 1; j < DIM; ++j)
            if(!data[i-1][j] == !data[i][j-1]) // XNOR: true when both neighbors are equal
                data[i][j] = data[i-1][j] && data[i][j-1];

    // There is a flow dependency in the above code because the calculation
    // of data[i][j] reads data[i-1][j] (previous row) and data[i][j-1]
    // (previous column), both written by earlier iterations. A partial
    // parallelization will not work: with a dependency on both the previous
    // row and the previous column, neither loop can be parallelized alone.

    return 0;
}
```

In [None]:
!pgc++ -acc -Minfo=accel flowdep_refactor_seq.cpp

## Solution to Example 4


The purpose of this program is to demonstrate a refactoring solution to a flow dependency problem. The sequential version of this program (flowdep_refactor_seq.cpp) demonstrates how a flow dependency results in a compile-time error from OpenACC.


```
#include <cstdlib>
#include <ctime>

const int DIM = 30; // Height and Width

int main() {
    srand(time(NULL));

    // Declare matrix of height and width, DIM
    int data[DIM][DIM];

    // Randomly initialize each element to 0 or 1
    for(int i = 0; i < DIM; ++i)
        for(int j = 0; j < DIM; ++j)
            data[i][j] = rand() % 2;

    // In this version, we will iterate over the diagonals,
    // beginning at the top left diagonal and ending on the
    // bottom right diagonal
    int size = DIM-1;
    for(int l = 1; l < size*2; ++l) {
        int diagonal = (l <= size) ? l : size - (l % size);

        #pragma acc parallel loop
        for(int k = 0; k < diagonal; ++k) {
            int i = ((l <= size) ? diagonal : size) - k;
            int j = ((l <= size) ? 0 : (size - diagonal)) + (k+1);

            if(!data[i-1][j] == !data[i][j-1]) // XNOR: true when both neighbors are equal
                data[i][j] = data[i-1][j] && data[i][j-1];
        }
    }

    // The code now uses indices l and k, which walk the anti-diagonals of
    // the matrix. Elements on the same diagonal do not depend on each other
    // (each reads only from the previous diagonal), so this executes with
    // no complaints.

    return 0;
}


```

In [None]:
!pgc++ -acc -Minfo=accel flowdep_refactor_par.cpp
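
The claim that the diagonal ordering computes the same matrix as the original row-by-row loop can be checked on the host. The following is a minimal sketch in C, not the author's code: the names `update_sequential`, `update_wavefront`, and `wavefront_matches_sequential` are hypothetical, the diagonals are indexed by `d = i + j` rather than by `l` and `k`, and the `data`/`present` clauses are an assumption about how the matrix might be kept on the device. Without an OpenACC compiler the pragmas are ignored and the code runs sequentially, which is enough to compare the two orderings.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DIM 30 /* same height and width as the example above */

/* Reference: the original row-major update, run sequentially. */
void update_sequential(int data[DIM][DIM]) {
    for (int i = 1; i < DIM; ++i)
        for (int j = 1; j < DIM; ++j)
            if (!data[i-1][j] == !data[i][j-1])
                data[i][j] = data[i-1][j] && data[i][j-1];
}

/* Wavefront: process the anti-diagonals i + j = d in increasing d. */
void update_wavefront(int data[DIM][DIM]) {
    /* Assumed data clauses: keep the matrix on the device for all diagonals. */
    #pragma acc data copy(data[0:DIM][0:DIM])
    {
        for (int d = 2; d <= 2 * (DIM - 1); ++d) {
            /* Rows that keep both i and j = d - i inside [1, DIM-1]. */
            int i_lo = (d - (DIM - 1) > 1) ? d - (DIM - 1) : 1;
            int i_hi = (d - 1 < DIM - 1) ? d - 1 : DIM - 1;
            assert(i_lo >= 1 && i_lo <= i_hi && i_hi <= DIM - 1);

            /* Elements of one diagonal read only the previous diagonal. */
            #pragma acc parallel loop present(data)
            for (int i = i_lo; i <= i_hi; ++i) {
                int j = d - i;
                if (!data[i-1][j] == !data[i][j-1])
                    data[i][j] = data[i-1][j] && data[i][j-1];
            }
        }
    }
}

/* Returns 1 if both orderings produce the same matrix for this seed. */
int wavefront_matches_sequential(unsigned seed) {
    static int a[DIM][DIM], b[DIM][DIM];
    srand(seed);
    for (int i = 0; i < DIM; ++i)
        for (int j = 0; j < DIM; ++j)
            a[i][j] = rand() % 2;
    memcpy(b, a, sizeof a);
    update_sequential(a);
    update_wavefront(b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Calling `wavefront_matches_sequential` with a few different seeds from a small `main` is one way to gain confidence in the refactoring before timing it on an accelerator.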

## Example 5: 

These two examples are variants of each other: `loopRearrange1` has a read-after-write (RAW) dependency on `B`, and `loopRearrange2` has a write-after-read (WAR) dependency on `B`.


```
#define SIZE 1024  // assumed value; SIZE is not defined in the original listing

int A[SIZE];
int B[SIZE];
int C[SIZE];


void loopRearrange1() {
  for (int i = 1; i < SIZE; ++i) {
    A[i] =  A[i] + B[i-1];
    B[i] = B[i] + C[i];
  }
}

void loopRearrange1_opt() {
  //How can we make this parallel?
  for (int i = 1; i < SIZE; ++i) {
    A[i] =  A[i] + B[i-1];
    B[i] = B[i] + C[i];
  }
}

void loopRearrange2() {
  for (int i = 0; i < SIZE-1; ++i) {
    A[i] = A[i] + B[i+1];
    B[i] = B[i] + C[i];
  }
}

void loopRearrange2_opt() {
  //How can we make this parallel?
  for (int i = 0; i < SIZE-1; ++i) {
    A[i] = A[i] + B[i+1];
    B[i] = B[i] + C[i];
  }
}


```

In [5]:
!pgcc -acc -Minfo=accel loop_rearrange.c
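
One possible answer to the "How can we make this parallel?" questions is to split each loop in two, so that every write to `B` is on the correct side of every read of `B`. The sketch below is an assumption about what the `_opt` functions could look like, not the author's solution. The value of `SIZE`, the test data, and the helpers `fill`, `same_result`, and `self_check` are hypothetical. Without an OpenACC compiler the pragmas are ignored, so the comparison still runs on the host.

```c
#include <assert.h>
#include <string.h>

#define SIZE 1000 /* assumed value; the original does not show SIZE */

int A[SIZE], B[SIZE], C[SIZE];

/* Original loops, copied from the example above. */
void loopRearrange1(void) {
    for (int i = 1; i < SIZE; ++i) {
        A[i] = A[i] + B[i-1];
        B[i] = B[i] + C[i];
    }
}

void loopRearrange2(void) {
    for (int i = 0; i < SIZE-1; ++i) {
        A[i] = A[i] + B[i+1];
        B[i] = B[i] + C[i];
    }
}

/* RAW: A[i] needs the B[i-1] written one iteration earlier, so finish
   every B update first, then compute A. */
void loopRearrange1_opt(void) {
    #pragma acc parallel loop
    for (int i = 1; i < SIZE; ++i)
        B[i] = B[i] + C[i];
    #pragma acc parallel loop
    for (int i = 1; i < SIZE; ++i)
        A[i] = A[i] + B[i-1];
}

/* WAR: A[i] needs B[i+1] before it is overwritten, so compute every A
   first, then update B. */
void loopRearrange2_opt(void) {
    #pragma acc parallel loop
    for (int i = 0; i < SIZE-1; ++i)
        A[i] = A[i] + B[i+1];
    #pragma acc parallel loop
    for (int i = 0; i < SIZE-1; ++i)
        B[i] = B[i] + C[i];
}

/* Fill the arrays with fixed, hypothetical test values. */
void fill(void) {
    for (int i = 0; i < SIZE; ++i) {
        A[i] = i;
        B[i] = 2 * i + 1;
        C[i] = 3 * i - 7;
    }
}

/* Returns 1 if ref and opt leave A and B in the same state. */
int same_result(void (*ref)(void), void (*opt)(void)) {
    static int refA[SIZE], refB[SIZE];
    fill();
    ref();
    memcpy(refA, A, sizeof A);
    memcpy(refB, B, sizeof B);
    fill();
    opt();
    return memcmp(refA, A, sizeof A) == 0 && memcmp(refB, B, sizeof B) == 0;
}

/* Checks both rearranged loops against their originals. */
void self_check(void) {
    assert(same_result(loopRearrange1, loopRearrange1_opt));
    assert(same_result(loopRearrange2, loopRearrange2_opt));
}
```

Splitting the loop costs a second pass over the arrays, but each of the resulting loops has fully independent iterations, which is what the `parallel loop` directive asserts.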