 
 ###  `simd`  and  `declare`   `simd`  Constructs 

 The following example illustrates the basic use of the  `simd`  construct  to assure the compiler that the loop can be vectorized. 

In [None]:
%load ../sources/Example_SIMD.1.c 

In [None]:
%load ../sources/Example_SIMD.1.f90 
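 As a minimal sketch of the same idea, separate from the loaded example, the function below applies a `simd` construct to a simple element-wise loop. The name `scaled_add` and its parameters are hypothetical, and the code assumes that `out` does not overlap `a` or `b`. Without OpenMP support the pragma is ignored and the loop runs as ordinary scalar code.

```c
/* Hypothetical helper: out[i] = a[i] + s * b[i] for i in [0, n). */
void scaled_add(int n, const double *a, const double *b, double s, double *out)
{
    /* The simd construct asserts that the iterations can be executed
       concurrently in SIMD lanes. Assumption: out does not alias a or b. */
    #pragma omp simd
    for (int i = 0; i < n; i++)
        out[i] = a[i] + s * b[i];
}
```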

 When a function can be inlined within a loop, the compiler has an opportunity to vectorize the loop. By guaranteeing SIMD behavior of a function's operations, characterizing the arguments of the function, and privatizing temporary variables of the loop, the compiler can often create faster, vector code for the loop. In the examples below, the `declare` `simd` construct is used on the _add1_ and _add2_ functions to enable creation of their corresponding SIMD function versions for execution within the associated SIMD loop. The functions characterize two different approaches to accessing data within the function: by a single variable and as an element in a data array, respectively. The _add3_ C function uses dereferencing.

 The `declare` `simd` constructs also illustrate the use of `uniform` and `linear` clauses. The `uniform(fact)` clause indicates that the variable _fact_ is invariant across the SIMD lanes. In the _add2_ function, _a_ and _b_ are included in the `uniform` list because the C pointer and the Fortran array references are constant. The _i_ index used in the _add2_ function is included in a `linear` clause with a constant-linear-step of 1, to guarantee a unity increment of the associated loop. In the `declare` `simd` construct for the _add3_ C function, the `linear(a,b:1)` clause instructs the compiler to generate unit-stride loads across the SIMD lanes; otherwise, costly _gather_ instructions would be generated for the unknown sequence of access of the pointer dereferences.

 In the `simd` constructs for the loops, the `private(tmp)` clause is necessary to assure that each vector operation has its own _tmp_ variable.

In [None]:
%load ../sources/Example_SIMD.2.c 

In [None]:
%load ../sources/Example_SIMD.2.f90 
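 The clauses described above can be sketched in a reduced form as follows. This is not the loaded example: the function names `scaled_load`, `scaled_index`, and `combine` are hypothetical stand-ins that mirror the pointer-dereference and array-index access patterns, with `uniform` marking lane-invariant arguments and `linear(...:1)` marking arguments that advance by one per lane.

```c
/* scale is identical in every lane; p advances by one element per lane,
   so the loads through p are unit-stride. */
#pragma omp declare simd uniform(scale) linear(p:1)
double scaled_load(const double *p, double scale)
{
    return *p * scale;
}

/* data and scale are identical in every lane; i advances by 1 per lane. */
#pragma omp declare simd uniform(data, scale) linear(i:1)
double scaled_index(const double *data, int i, double scale)
{
    return data[i] * scale;
}

/* Hypothetical driver: out[i] = 2*x[i] + 3*y[i]. */
void combine(int n, const double *x, const double *y, double *out)
{
    double tmp;
    /* private(tmp) gives each SIMD lane its own copy of the temporary. */
    #pragma omp simd private(tmp)
    for (int i = 0; i < n; i++) {
        tmp = scaled_load(&x[i], 2.0) + scaled_index(y, i, 3.0);
        out[i] = tmp;
    }
}
```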

 A thread that encounters a SIMD construct executes vectorized code for the iterations. As with a worksharing loop, a loop vectorized with a SIMD construct must privatize its temporary variables and declare its reduction variables with the appropriate clauses. The example below illustrates the use of `private` and `reduction` clauses in a SIMD construct.

In [None]:
%load ../sources/Example_SIMD.3.c 

In [None]:
%load ../sources/Example_SIMD.3.f90 
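 A compact sketch of the same pattern, using a hypothetical `dot_product` function rather than the loaded example: the temporary is privatized and the accumulator is declared as a `+` reduction, so each lane works on its own partial sum that is combined at the end of the loop.

```c
/* Hypothetical helper: returns the sum of a[i] * b[i] for i in [0, n). */
double dot_product(int n, const double *a, const double *b)
{
    double sum = 0.0;
    double tmp;
    /* tmp is private to each lane; sum is combined across lanes with +. */
    #pragma omp simd private(tmp) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        tmp = a[i] * b[i];
        sum += tmp;
    }
    return sum;
}
```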

 A `safelen(N)` clause in a `simd` construct assures the compiler that there are no loop-carried dependencies for vectors of size _N_ or below. If the `safelen` clause is not specified, then the default safelen value is the number of loop iterations. The `safelen(16)` clause in the example below guarantees that the vector code is safe for vectors up to and including size 16. In the loop, _m_ must be 16 or greater for correct code execution. If the value of _m_ is less than 16, the behavior is undefined.

In [None]:
%load ../sources/Example_SIMD.4.c 

In [None]:
%load ../sources/Example_SIMD.4.f90 
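 The constraint on _m_ can be seen in a small sketch. The function name `shift_update` is hypothetical. Each iteration reads an element _m_ positions behind the one it writes, so any 16 consecutive iterations are independent only when _m_ is at least 16, which the caller is assumed to guarantee.

```c
/* Hypothetical helper: b[i] = b[i - m] - 1 for i in [m, n).
   Assumption: the caller passes m >= 16; otherwise safelen(16) is a
   false promise and the behavior is undefined. */
void shift_update(float *b, int n, int m)
{
    /* No two iterations closer than 16 apart touch the same element,
       so vectors of up to 16 lanes are safe. */
    #pragma omp simd safelen(16)
    for (int i = m; i < n; i++)
        b[i] = b[i - m] - 1.0f;
}
```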

 The following SIMD construct instructs the compiler to collapse the _i_ and _j_ loops into a single SIMD loop in which SIMD chunks are executed by threads of the team. Within each thread's chunks of the workshared loop, the SIMD chunks are executed in the lanes of the vector units.

In [None]:
%load ../sources/Example_SIMD.5.c 

In [None]:
%load ../sources/Example_SIMD.5.f90 
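 A minimal sketch of collapsing a loop nest for combined worksharing and SIMD execution follows. It is an illustration under stated assumptions, not the loaded example: the function `add_grid` is hypothetical, the matrices are stored as flat row-major arrays, and the combined `parallel for simd` construct is used so that the function creates its own team.

```c
/* Hypothetical helper: c = a + b for rows x cols row-major matrices. */
void add_grid(int rows, int cols, const double *a, const double *b, double *c)
{
    /* collapse(2) merges the i and j loops into one iteration space;
       that space is divided among threads, and each thread's chunk is
       executed in SIMD lanes. */
    #pragma omp parallel for simd collapse(2)
    for (int i = 0; i < rows; i++)
        for (int j = 0; j < cols; j++)
            c[i * cols + j] = a[i * cols + j] + b[i * cols + j];
}
```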

 


###  `inbranch`  and  `notinbranch`  Clauses 

 The following examples illustrate the use of the `declare` `simd` construct with the `inbranch` and `notinbranch` clauses. The `notinbranch` clause informs the compiler that the function _foo_ is never called conditionally in the SIMD loop of the function _myaddint_. On the other hand, the `inbranch` clause for the function _goo_ indicates that the function is always called conditionally in the SIMD loop inside the function _myaddfloat_.

In [None]:
%load ../sources/Example_SIMD.6.c 

In [None]:
%load ../sources/Example_SIMD.6.f90 
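 The two clauses can be contrasted in a short sketch. The functions below (`add_one`, `half_of`, `sum_plus_one`, `sum_halved_positive`) are hypothetical and are not the _foo_ and _goo_ of the loaded example; they only mirror the unconditional and conditional call patterns.

```c
/* notinbranch: every call site in a SIMD loop is unconditional. */
#pragma omp declare simd linear(p:1) notinbranch
int add_one(const int *p)
{
    return *p + 1;
}

/* inbranch: every call site in a SIMD loop is inside a conditional. */
#pragma omp declare simd linear(p:1) inbranch
float half_of(const float *p)
{
    return *p * 0.5f;
}

int sum_plus_one(int n, const int *a)
{
    int sum = 0;
    #pragma omp simd reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += add_one(&a[i]);          /* unconditional call */
    return sum;
}

float sum_halved_positive(int n, const float *x)
{
    float sum = 0.0f;
    #pragma omp simd reduction(+:sum)
    for (int i = 0; i < n; i++) {
        if (x[i] > 0.0f)
            sum += half_of(&x[i]);      /* call only under the condition */
    }
    return sum;
}
```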

 In the code below, the function  _fib()_  is called in the main program and  also recursively called in the function  _fib()_  within an  `if`   condition. The compiler creates a masked vector version and a non-masked vector  version for the function  _fib()_  while retaining the original scalar  version of the  _fib()_  function. 

In [None]:
%load ../sources/Example_SIMD.7.c 

In [None]:
%load ../sources/Example_SIMD.7.f90 
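 A reduced sketch of this situation is shown below, assuming a `declare` `simd` directive with neither `inbranch` nor `notinbranch`, which leaves the compiler free to generate both masked and non-masked variants. The driver `fill_fib` is hypothetical, and the loaded example may differ in its clauses and calling code.

```c
/* The recursive calls sit under the if condition, while the call from
   the SIMD loop in fill_fib is unconditional. */
#pragma omp declare simd
int fib(int n)
{
    if (n <= 1)
        return n;
    return fib(n - 1) + fib(n - 2);
}

/* Hypothetical driver: out[i] = fib(i) for i in [0, n). */
void fill_fib(int n, int *out)
{
    #pragma omp simd
    for (int i = 0; i < n; i++)
        out[i] = fib(i);
}
```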

 


### Loop-Carried Lexical Forward Dependence 

 The following example tests the restriction on a SIMD loop with a loop-carried lexical forward dependence. This dependence must be preserved for the correct execution of SIMD loops.

 A loop can be vectorized even though its iterations are not completely independent when its loop-carried dependences are lexical forward dependences, indicated in the code below by the read of _A[j+1]_ and the write to _A[j]_ in C/C++ code (or _A(j+1)_ and _A(j)_ in Fortran). That is, the ordering in which _A[j+1]_ (or _A(j+1)_ in Fortran) is read before _A[j]_ (or _A(j)_ in Fortran) is written must be preserved for each iteration in _j_ for valid SIMD code generation.

 This test assures that the compiler preserves the loop-carried lexical forward dependence when generating correct SIMD code.

In [None]:
%load ../sources/Example_SIMD.8.c 

In [None]:
%load ../sources/Example_SIMD.8.f90 
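 The dependence pattern can be reduced to a few lines. The sketch below is not the loaded test: the function `shift_left_scaled` is hypothetical and keeps only the read of `A[j+1]` followed by the write to `A[j]`. Iteration _j_ must read `A[j+1]` before iteration _j+1_ overwrites it, and the vectorized code has to keep that order.

```c
/* Hypothetical helper: A[j] = 2 * (old value of A[j+1]) for j in [0, n-1). */
void shift_left_scaled(int n, float *A)
{
    #pragma omp simd
    for (int j = 0; j < n - 1; j++) {
        float next = A[j + 1];   /* read of A[j+1] comes first ...     */
        A[j] = next * 2.0f;      /* ... then the write to A[j] follows */
    }
}
```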

---end--- 