## Metadirective Directive

A `metadirective` directive provides a mechanism to select one of several directives, each specified in a `when` clause, depending upon one or more contexts: the implementation, the available devices, and the enclosing constructs. The directive in the `default` clause is used when no directive of a `when` clause is selected.

In the `when` clause, the _context selector_ (or just _selector_) defines traits that are evaluated to decide whether the directive that follows the selector is chosen. This "selectable" directive is called a _directive variant_. Traits are grouped into _construct_, _implementation_ and _device_ _sets_, each used by the selector set of the same name.

In the first example, the architecture trait _arch_ of the _device_ selector set specifies that if an _nvptx_ (NVIDIA) architecture is active in the OpenMP context, then the `teams` `loop` _directive variant_ is selected as the directive; otherwise, the `parallel` `loop` _directive variant_ of the `default` clause is selected as the directive. That is, if a _device_ of _nvptx_ architecture is supported by the implementation within the enclosing `target` construct, the `teams` `loop` variant is selected. The architecture names, such as _nvptx_, are implementation defined. Also, note that _device_ as used in a `target` construct specifies a device number, while _device_, as used in the `metadirective` directive as a selector set, has the traits _kind_, _isa_ and _arch_.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: metadirective.1c
* type: C
* version: omp_5.0
*/

#define N 100
#include <stdio.h>

int main()
{
   int v1[N], v2[N], v3[N];
   for(int i=0; i<N; i++){ v1[i]=(i+1); v2[i]=-(i+1); }

   #pragma omp target map(to:v1,v2) map(from:v3) device(0)
   #pragma omp metadirective \
                   when(   device={arch("nvptx")}: teams loop) \
                   default(                     parallel loop)
     for (int i= 0; i< N; i++)  v3[i] = v1[i] * v2[i];

   printf(" %d  %d\n",v3[0],v3[N-1]); //output: -1  -10000

   return 0;
}




! name: metadirective.1f90
! type: F-free
! version: omp_5.0

program main
   integer, parameter :: N= 100
   integer ::  v1(N), v2(N), v3(N);

   do i=1,N;  v1(i)=i; v2(i)=-i;  enddo   ! initialize

   !$omp  target map(to:v1,v2) map(from:v3) device(0)
   !$omp  metadirective &
   !$omp&     when(   device={arch("nvptx")}: teams loop) &
   !$omp&     default(                     parallel loop)
     do i= 1,N; v3(i) = v1(i) * v2(i); enddo
   !$omp  end target

   print *, v3(1),v3(N) !!output: -1  -10000
end program
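
The note above mentions that the _device_ selector set also has a _kind_ trait. The following is a minimal sketch, separate from the numbered examples, that selects on the device kind instead of an implementation-defined architecture name. The function name `vec_mul` is hypothetical, and the sketch assumes that the implementation recognizes the `gpu` kind.

```c
/* Hypothetical helper: element-wise product c[i] = a[i] * b[i].
 * The directive variant is chosen by device kind (gpu) rather than
 * by an implementation-defined architecture name. */
void vec_mul(const int *a, const int *b, int *c, int n)
{
   #pragma omp target map(to: a[0:n], b[0:n]) map(from: c[0:n])
   #pragma omp metadirective \
                   when(   device={kind(gpu)}: teams loop) \
                   default(                 parallel loop)
   for (int i = 0; i < n; i++) c[i] = a[i] * b[i];
}
```

When compiled without OpenMP support, the pragmas are ignored and the loop runs serially on the host, so the routine can be checked with an ordinary call such as `vec_mul(v1, v2, v3, N)`.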



In the second example, the _implementation_ selector set is specified in the `when` clause to distinguish between AMD and NVIDIA platforms. Additionally, specific architectures are specified with the _device_ selector set.

In the code, different `teams` constructs are employed as determined by the `metadirective` directive. When both the _vendor_ trait (AMD or NVIDIA) and the corresponding architecture trait match, the number of teams is restricted by a `num_teams` clause and a thread limit is set by a `thread_limit` clause. Otherwise, just the `teams` construct is used without any clauses, as prescribed by the `default` clause.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: metadirective.2c
* type: C
* version: omp_5.0
*/

#define N 100
#include <stdio.h>
#include <omp.h>

void work_on_chunk(int idev, int i);

int main()                    //Driver
{
   int i,idev;

   for (idev=0; idev<omp_get_num_devices(); idev++)
   {
      #pragma omp target device(idev)
      #pragma omp metadirective \
                  when( implementation={vendor(nvidia)}, device={arch("kepler")}: \
                        teams num_teams(512) thread_limit(32) )                   \
                  when( implementation={vendor(amd)},    device={arch("fiji"  )}: \
                        teams num_teams(512) thread_limit(64) )                   \
                  default(                                                        \
                        teams)
      #pragma omp distribute parallel for
      for (i=0; i<N; i++) work_on_chunk(idev,i);
   }
   return 0;
}





! name: metadirective.2f90
! type: F-free
! version: omp_5.0

program main                    !!Driver
  use omp_lib
  implicit none
  integer, parameter :: N=1000
  external           :: work_on_chunk
  integer            :: i,idev

  do idev=0,omp_get_num_devices()-1

    !$omp target device(idev)
    !$omp begin metadirective &
    !$omp&  when( implementation={vendor(nvidia)}, device={arch("kepler")}: &
    !$omp&        teams num_teams(512) thread_limit(32) )                   &
    !$omp&  when( implementation={vendor(amd)},    device={arch("fiji"  )}: &
    !$omp&        teams num_teams(512) thread_limit(64) )                   &
    !$omp&  default(                                                        &
    !$omp&        teams)
    !$omp distribute parallel do
    do i=1,N
       call work_on_chunk(idev,i)
    end do
    !$omp end metadirective
    !$omp end target

  end do

end program
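
The drivers above declare `work_on_chunk` but leave its definition open. The following is a minimal sketch of one possible C definition, given only so that the second example can be linked and tried out. The function body and the `chunk_result` array are illustrative assumptions, not part of the original example. Because the routine is called inside a `target` region, it is placed in a `declare target` block.

```c
#define N 100

#pragma omp declare target
/* Hypothetical per-chunk results; not part of the original example. */
double chunk_result[N];

/* Hypothetical work routine: stores a value derived from the device
 * number and the chunk index. */
void work_on_chunk(int idev, int i)
{
   chunk_result[i] = (double)(idev + 1) * i;
}
#pragma omp end declare target
```

When the region is actually offloaded, each device updates its own copy of `chunk_result`; a `target update from(chunk_result)` directive for that device would be needed before the host reads the values.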




In the third example, a _construct_ selector set is specified in the `when` clause. Here, a `metadirective` directive is used within a function that is also compiled for a target device, as directed by the `declare` `target` directive. The _target_ directive name in the _construct_ selector ensures that the `distribute` `parallel` `for/do` construct is employed for the target compilation. Otherwise, for the host-compiled version, the `parallel` `for/do` `simd` construct is used.

In the first call to the _exp_pi_diff()_ routine, the context is a `target` `teams` construct and the `distribute` `parallel` `for/do` construct version of the function is invoked, while in the second call the `parallel` `for/do` `simd` construct version is used.

This case illustrates an important point for users who may want to hoist the `target` directive out of a function that contains the usual `target` `teams` `distribute` `parallel` `for/do` construct (to provide alternate constructs through the `metadirective` directive, as here). While this combined construct can be decomposed into a `target` construct and a `teams` `distribute` `parallel` `for/do` construct, the OpenMP 5.0 specification has the restriction: "If a `teams` construct is nested within a `target` construct, that `target` construct must contain no statements, declarations or directives outside of the `teams` construct". So, the `teams` construct must immediately follow the `target` construct without any intervening code statements (which includes function calls). Since the `target` construct alone cannot be hoisted out of a function, the `target` `teams` construct has been hoisted out of the function, and the `distribute` `parallel` `for/do` construct is used as the _directive variant_ of the `metadirective` directive within the function.

//%compiler: clang
//%cflags: -fopenmp

/*
* name: metadirective.3c
* type: C
* version: omp_5.0
*/
#include <stdio.h>
#include  <math.h>
#define      N 1000

#pragma omp declare target
void exp_pi_diff(double *d, double my_pi){
   #pragma omp metadirective \
               when( construct={target}: distribute parallel for  ) \
               default(                             parallel for simd)
   for(int i = 0; i<N; i++) d[i] = exp( (M_PI-my_pi)*i );
}
#pragma omp end declare target

int main()
{
  //Calculates sequence of exponentials: (M_PI-my_pi) * index
  //M_PI is from math.h, and my_pi is user provided.

  double d[N];
  double my_pi=3.14159265358979e0;

      #pragma omp target teams map(tofrom: d[0:N])
      exp_pi_diff(d,my_pi);
                                           // value should be near 1
      printf("d[N-1] = %20.14f\n",d[N-1]); // ...= 1.00000000000311

      exp_pi_diff(d,my_pi);                // value should be near 1
      printf("d[N-1] = %20.14f\n",d[N-1]); // ...= 1.00000000000311
}




! name: metadirective.3f90
! type: F-free
! version: omp_5.0

module params
   integer, parameter :: N=1000
   DOUBLE PRECISION, PARAMETER::M_PI=4.0d0*DATAN(1.0d0) !3.1415926535897932_8
end module


subroutine exp_pi_diff(d,    my_pi)
  use params
  implicit none
  integer          ::  i
  double precision ::  d(N), my_pi
  !$omp declare target

  !$omp   metadirective &
  !$omp&      when( construct={target}: distribute parallel do  )  &
  !$omp&      default(                             parallel do simd)

  do i = 1,size(d)
     d(i) = exp( (M_PI-my_pi)*i )
  end do

end subroutine

program main
  ! Calculates sequence of exponentials: (M_PI-my_pi) * index
  ! M_PI is computed in module params, and my_pi is user provided.
  ! Fortran Standard does not provide PI

  use params
  implicit none
  double precision   :: d(N)
  double precision   :: my_pi=3.14159265358979d0

      !$omp target teams map(from: d)
      call exp_pi_diff(d,my_pi)
      !$omp end target teams
                                  ! value should be near 1
      print*, "d(N) = ",d(N)      ! 1.00000000000311

      call exp_pi_diff(d,my_pi) ! value should be near 1
      print*, "d(N) = ",d(N)      ! 1.00000000000311

end program
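
The OpenMP 5.0 restriction on a `teams` construct nested within a `target` construct, quoted in the discussion of the third example, can be illustrated with a small sketch. The hypothetical function `scale_conforming` splits the combined construct into a `target` construct followed immediately by a `teams` `distribute` `parallel` `for` construct, which is permitted because nothing else appears inside the `target` region. A non-conforming arrangement is shown only as a comment.

```c
/* Hypothetical example: multiply each element of x by a. */
void scale_conforming(double *x, int n, double a)
{
   /* Conforming: the teams construct is the only content of the
    * target region. */
   #pragma omp target map(tofrom: x[0:n])
   #pragma omp teams distribute parallel for
   for (int i = 0; i < n; i++) x[i] *= a;
}

/* Non-conforming under OpenMP 5.0 (shown only as a comment):
 *
 *    #pragma omp target map(tofrom: x[0:n])
 *    {
 *       int m = n / 2;                  // statement outside teams
 *       #pragma omp teams distribute parallel for
 *       for (int i = 0; i < m; i++) x[i] *= a;
 *    }
 */
```

A function call placed between the `target` and `teams` directives would break the rule in the same way as the declaration of `m`, which is why the third example hoists `target` `teams` together and leaves only the `distribute` `parallel` `for/do` variant inside the function.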

