PWD003: Missing array range in data copy to the GPU

Issue

Copying data to the GPU from an array whose size is not known to the compiler requires explicitly specifying the range of array elements to be copied.

Actions

Specify the array range to be copied to device memory.

Relevance

When offloading to the GPU, copying array data from host memory to GPU memory requires knowing which range of the array must be copied. If the compiler knows the array size (e.g., for C static arrays or Fortran assumed-shape arrays) and the whole array must be copied, specifying the data range is optional in both the OpenMP and OpenACC standards. However, for arrays whose size is not known to the compiler, specifying the array range is mandatory. Some compilers do not enforce this, which leads to undefined behavior. For instance, for C dynamic arrays the scalar pointer value might be copied instead of any pointed-to data; for Fortran assumed-size arrays, an invalid memory access might occur or data from the wrong memory locations might be copied.
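The difference between a size the compiler knows and one it does not can be seen in plain C, without any offloading. The following is a minimal sketch; the helper names `static_array_bytes` and `pointer_bytes` are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: the array type int[8] carries its extent, so the
   compiler can compute the full byte count on its own. */
size_t static_array_bytes(void) {
  static int table[8];
  return sizeof(table); /* 8 * sizeof(int) */
}

/* Hypothetical helper: through a pointer, only the size of the pointer
   itself is visible; the number of elements behind it is not part of
   the type. */
size_t pointer_bytes(const int *data) {
  return sizeof(data); /* sizeof(int *), whatever the allocation size */
}
```

This is the same information gap a data-mapping clause faces: given only a pointer, there is nothing from which to derive how many elements should be transferred.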

Code example

In the following OpenMP code, the pointers are copied to the offloading target device instead of the dynamic array data they point to.

void foo(int *a, int *b, int *sum, int size) {
  #pragma omp target map(to: a, b) map(from: sum)
  #pragma omp parallel for
  for (int i = 0; i < size; i++) {
    sum[i] = a[i] + b[i];
  }
}

In this case, it suffices to specify the array bounds in the OpenMP map clauses:

void foo(int *a, int *b, int *sum, int size) {
  #pragma omp target map(to: a[0:size], b[0:size]) map(from: sum[0:size])
  #pragma omp parallel for
  for (int i = 0; i < size; i++) {
    sum[i] = a[i] + b[i];
  }
}
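The range does not have to cover the whole array. In C, an OpenMP array section is written as `[lower-bound:length]`, so a sub-range can be mapped as well. The following is a minimal sketch, assuming a hypothetical variant `foo_range` with `offset` and `count` parameters that are not part of the original example; the caller is assumed to guarantee that the range lies inside all three arrays.

```c
/* Hypothetical variant of foo: process only the elements in
   [offset, offset + count). The caller must guarantee that this range
   lies inside all three arrays. */
void foo_range(const int *a, const int *b, int *sum, int offset, int count) {
  /* Array sections use [lower-bound:length], not [first:last]. */
  #pragma omp target map(to: a[offset:count], b[offset:count]) \
                     map(from: sum[offset:count])
  #pragma omp parallel for
  for (int i = offset; i < offset + count; i++) {
    sum[i] = a[i] + b[i];
  }
}
```

Elements of `sum` outside the mapped section are not copied back, so they keep their host values.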

The same problem occurs in the analogous OpenACC example:

void foo(int *a, int *b, int *sum, int size) {
  #pragma acc data copyin(a, b) copyout(sum)
  #pragma acc parallel loop
  for (int i = 0; i < size; i++) {
    sum[i] = a[i] + b[i];
  }
}

And again, specifying the array bounds fixes the problem:

void foo(int *a, int *b, int *sum, int size) {
  #pragma acc data copyin(a[0:size], b[0:size]) copyout(sum[0:size])
  #pragma acc parallel loop
  for (int i = 0; i < size; i++) {
    sum[i] = a[i] + b[i];
  }
}
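To check that the fixed version transfers the intended data, it can be called on dynamically allocated arrays, which is the case where the compiler cannot infer the size. The following is a minimal sketch; the driver `check_foo` and its return convention are hypothetical, and the fixed `foo` is repeated only so that the block is self-contained.

```c
#include <stdlib.h>

/* Fixed OpenACC version from the text, repeated for self-containment. */
void foo(int *a, int *b, int *sum, int size) {
  #pragma acc data copyin(a[0:size], b[0:size]) copyout(sum[0:size])
  #pragma acc parallel loop
  for (int i = 0; i < size; i++) {
    sum[i] = a[i] + b[i];
  }
}

/* Hypothetical driver: returns the number of wrong elements, or -1 if
   size is not positive or an allocation fails. */
int check_foo(int size) {
  if (size <= 0) return -1;

  int *a = malloc((size_t)size * sizeof *a);
  int *b = malloc((size_t)size * sizeof *b);
  int *sum = malloc((size_t)size * sizeof *sum);
  int mismatches = -1;

  if (a && b && sum) {
    for (int i = 0; i < size; i++) {
      a[i] = i;
      b[i] = 2 * i;
      sum[i] = -1; /* sentinel, overwritten if the copy-out works */
    }
    foo(a, b, sum, size);

    mismatches = 0;
    for (int i = 0; i < size; i++) {
      if (sum[i] != 3 * i) mismatches++;
    }
  }

  free(a); /* free(NULL) is a no-op, so partial failures are safe */
  free(b);
  free(sum);
  return mismatches;
}
```

A return value of 0 from `check_foo` indicates that every element of `sum` holds the expected result after the call.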