## The `collapse` Clause

In the following example, the `k` and `j` loops are associated with the loop construct, so their iterations are collapsed into one loop with a larger iteration space, and that loop is then divided among the threads in the current team. Because the `i` loop is not associated with the loop construct, it is not collapsed; it is executed sequentially in its entirety in every iteration of the collapsed `k` and `j` loop.

The variable `j` can be omitted from the `private` clause when the `collapse` clause is used, since it is then implicitly private. However, if the `collapse` clause is omitted, `j` is shared unless it is listed in the `private` clause. In either case, `k` is implicitly private and could be omitted from the `private` clause.

```c
//%compiler: clang
//%cflags: -fopenmp

/*
* name: collapse.1c
* type: C
* version: omp_3.0
*/

void bar(float *a, int i, int j, int k);

int kl, ku, ks, jl, ju, js, il, iu, is;

void sub(float *a)
{
    int i, j, k;

    #pragma omp for collapse(2) private(i, k, j)
    for (k=kl; k<=ku; k+=ks)
       for (j=jl; j<=ju; j+=js)
          for (i=il; i<=iu; i+=is)
             bar(a,i,j,k);
}
```
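
For contrast, here is a minimal sketch of the case where the `collapse` clause is omitted, so that only the `k` loop is associated with the loop construct and `j` has to be listed in the `private` clause. The function name `count_iterations` is hypothetical, and the sketch uses a combined `parallel for` construct with a `reduction` clause (rather than the orphaned `for` construct above) purely so that its result can be checked.

```c
#include <assert.h>

/* Hypothetical helper: counts the iterations of a k/j loop nest in parallel.
 * Without collapse, only the k loop is associated with the construct, so j
 * is listed in private() to give each thread its own copy. */
long count_iterations(int kmax, int jmax)
{
    int j, k;
    long count = 0;

    assert(kmax >= 0 && jmax >= 0);

    #pragma omp parallel for private(j) reduction(+:count)
    for (k = 1; k <= kmax; k++)
        for (j = 1; j <= jmax; j++)
            count++;

    return count;
}
```

With `private(j)` in place, `count_iterations(2, 3)` returns `6` regardless of the number of threads. Dropping `private(j)` here would leave `j` shared among the threads, which is the situation the paragraph above warns about.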



```fortran
! name: collapse.1f
! type: F-fixed
! version: omp_3.0

      subroutine sub(a)

      real a(*)
      integer kl, ku, ks, jl, ju, js, il, iu, is
      common /csub/ kl, ku, ks, jl, ju, js, il, iu, is
      integer i, j, k

!$omp do collapse(2) private(i,j,k)
      do k = kl, ku, ks
        do j = jl, ju, js
          do i = il, iu, is
            call bar(a,i,j,k)
          enddo
        enddo
      enddo
!$omp end do

      end subroutine
```
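
One way to picture the collapsed iteration space is as a single logical loop whose iteration number can be mapped back to a (`k`, `j`) pair. The sketch below illustrates that idea only; it is not how any particular compiler implements `collapse`. It assumes positive strides, and the helper names `trip_count`, `collapsed_trip_count`, and `collapsed_to_kj` are hypothetical.

```c
#include <assert.h>

/* Iterations of "for (v = lo; v <= hi; v += step)", assuming step > 0. */
long trip_count(long lo, long hi, long step)
{
    assert(step > 0);
    return (hi < lo) ? 0 : (hi - lo) / step + 1;
}

/* Size of the collapsed k/j iteration space. */
long collapsed_trip_count(long kl, long ku, long ks,
                          long jl, long ju, long js)
{
    return trip_count(kl, ku, ks) * trip_count(jl, ju, js);
}

/* Recover k and j for logical iteration n (0-based, in sequential order). */
void collapsed_to_kj(long n, long kl, long ks,
                     long jl, long ju, long js,
                     long *k, long *j)
{
    long nj = trip_count(jl, ju, js);

    assert(n >= 0 && nj > 0);
    *k = kl + (n / nj) * ks;   /* outer index advances once per nj iterations */
    *j = jl + (n % nj) * js;   /* inner index cycles fastest */
}
```

For instance, with `k` running from 1 to 2 and `j` from 1 to 3 (both with stride 1), the collapsed space has 6 iterations, and logical iteration 5 maps back to `k == 2`, `j == 3`. The `i` loop plays no part in this mapping because it is not associated with the loop construct.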



In the next example, the `k` and `j` loops are again associated with the loop construct, so their iterations are collapsed into one loop with a larger iteration space, and that loop is then divided among the threads in the current team.

The sequential execution of the iterations in the `k` and `j` loops determines the order of the iterations in the collapsed iteration space. This implies that in the sequentially last iteration of the collapsed iteration space, `k` has the value `2` and `j` has the value `3`. Since `klast` and `jlast` are `lastprivate`, their values are assigned by the sequentially last iteration of the collapsed `k` and `j` loop. This example prints: `2 3`.

```c
//%compiler: clang
//%cflags: -fopenmp

/*
* name: collapse.2c
* type: C
* version: omp_3.0
*/

#include <stdio.h>
void test()
{
   int j, k, jlast, klast;
   #pragma omp parallel
   {
      #pragma omp for collapse(2) lastprivate(jlast, klast)
      for (k=1; k<=2; k++)
         for (j=1; j<=3; j++)
         {
            jlast=j;
            klast=k;
         }
      #pragma omp single
      printf("%d %d\n", klast, jlast);
   }
}
```



```fortran
! name: collapse.2f
! type: F-fixed
! version: omp_3.0

      program test
!$omp parallel
!$omp do private(j,k) collapse(2) lastprivate(jlast, klast)
      do k = 1,2
        do j = 1,3
          jlast=j
          klast=k
        enddo
      enddo
!$omp end do
!$omp single
      print *, klast, jlast
!$omp end single
!$omp end parallel
      end program test
```
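
The `lastprivate` behavior described above can also be checked programmatically rather than by reading printed output. The following is a minimal sketch under a few assumptions: the helper name `last_iteration` and its parameters are hypothetical, both loops are assumed to run at least once, and a combined `parallel for` construct is used so that the function is self-contained.

```c
#include <assert.h>

/* Hypothetical helper: runs a collapsed k/j loop nest over 1..kmax and
 * 1..jmax and reports the values seen by the sequentially last iteration. */
void last_iteration(int kmax, int jmax, int *klast_out, int *jlast_out)
{
    int j, k, jlast, klast;

    /* At least one iteration is assumed, so that the sequentially last
     * iteration exists and assigns both lastprivate variables. */
    assert(kmax >= 1 && jmax >= 1);

    #pragma omp parallel for collapse(2) lastprivate(jlast, klast)
    for (k = 1; k <= kmax; k++)
        for (j = 1; j <= jmax; j++)
        {
            jlast = j;
            klast = k;
        }

    *klast_out = klast;
    *jlast_out = jlast;
}
```

Calling `last_iteration(2, 3, &kl, &jl)` leaves `kl == 2` and `jl == 3`, whichever thread happened to execute the sequentially last iteration.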



The next example illustrates the interaction of the `collapse` and `ordered` clauses.

In the example, the loop construct has both a `collapse` clause and an `ordered` clause. The `collapse` clause causes the iterations of the `k` and `j` loops to be collapsed into one loop with a larger iteration space, and that loop is divided among the threads in the current team. The `ordered` clause is needed on the loop construct because an ordered region binds to the loop region arising from the loop construct.

According to Section 2.12.8 of the OpenMP 4.0 specification, during execution of an iteration of a loop region, a thread must not execute more than one ordered region that binds to the same loop region. The `collapse` clause is therefore required for the example to be conforming. With the `collapse` clause, the iterations of the `k` and `j` loops are collapsed into one loop, and therefore only one ordered region will bind to each iteration of the collapsed `k` and `j` loop. Without the `collapse` clause, there would be two ordered regions that bind to each iteration of the `k` loop (one arising from the first iteration of the `j` loop, and the other arising from the second iteration of the `j` loop).

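The code prints the following lines in this order; each gives the thread number followed by the values of `k` and `j`: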

`0 1 1`  `0 1 2`  `0 2 1`  `1 2 2`  `1 3 1`  `1 3 2`

```c
//%compiler: clang
//%cflags: -fopenmp

/*
* name: collapse.3c
* type: C
* version: omp_3.0
*/
#include <omp.h>
#include <stdio.h>
void work(int a, int j, int k);
void sub()
{
   int j, k, a;
   #pragma omp parallel num_threads(2)
   {
      #pragma omp for collapse(2) ordered private(j,k) schedule(static,3)
      for (k=1; k<=3; k++)
         for (j=1; j<=2; j++)
         {
            #pragma omp ordered
            printf("%d %d %d\n", omp_get_thread_num(), k, j);
            /* end ordered */
            work(a,j,k);
         }
   }
}
```



```fortran
! name: collapse.3f
! type: F-fixed
! version: omp_3.0
      program test
      include 'omp_lib.h'
!$omp parallel num_threads(2)
!$omp do collapse(2) ordered private(j,k) schedule(static,3)
      do k = 1,3
        do j = 1,2
!$omp ordered
          print *, omp_get_thread_num(), k, j
!$omp end ordered
          call work(a,j,k)
        enddo
      enddo
!$omp end do
!$omp end parallel
      end program test
```
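
The thread numbers in the printed output follow from how `schedule(static,3)` hands out the six collapsed iterations: chunks of three consecutive iterations are assigned to the threads round-robin in thread-number order. The sketch below reproduces the expected output without using OpenMP at all. It assumes the team really has two threads, and the helper names `static_owner` and `expected_output` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Thread that owns logical iteration n (0-based) under schedule(static, chunk)
 * with nthreads threads: chunks are dealt out round-robin by thread number. */
int static_owner(int n, int chunk, int nthreads)
{
    assert(n >= 0 && chunk > 0 && nthreads > 0);
    return (n / chunk) % nthreads;
}

/* Writes the lines the example is expected to print into buf and returns
 * the number of characters written (excluding the terminating null). */
int expected_output(char *buf, size_t size)
{
    size_t used = 0;
    int n = 0;   /* logical iteration number in the collapsed k/j space */

    for (int k = 1; k <= 3; k++)
        for (int j = 1; j <= 2; j++, n++)
        {
            int w = snprintf(buf + used, size - used, "%d %d %d\n",
                             static_owner(n, 3, 2), k, j);
            assert(w > 0 && (size_t)w < size - used);
            used += (size_t)w;
        }

    return (int)used;
}
```

Iterations 0 to 2 form the first chunk and go to thread 0, and iterations 3 to 5 form the second chunk and go to thread 1. The `ordered` region then guarantees that the lines appear in sequential iteration order even though two threads produce them.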

