130 changes: 65 additions & 65 deletions OPENMP/Branch/branch.c
@@ -1,32 +1,32 @@
/*
Copyright (c) 2013, Intel Corporation

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following
disclaimer in the documentation and/or other materials provided
with the distribution.
* Neither the name of Intel Corporation nor the names of its
contributors may be used to endorse or promote products
derived from this software without specific prior written
permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
*/

@@ -51,16 +51,16 @@ PURPOSE: This program tests the effect of inner-loop branches on
- the code should be continuously scalable, i.e. the user should
be able to specify the amount of work to be done.
- the code should be verifiable.
- the code should be executable with and without branches, with
otherwise identical amounts of work, to assess the impact of the
branches.
- the performance of the code should be dominated by the work in
the loops, not by memory bandwidth required to fetch data. This
means that arrays should fit in cache, and any loop over arrays
should be executed many times to amortize the initial memory load
costs and to remove noise from the timings.
- any arrays used should be initialized only once, to avoid confusing
performance impact of initialization with that of the branches.
Because the base loop over the array is short, it completes very
quickly, leading to very noisy results if it were timed separately.
Hence, we must time the ensemble of all iterations over the base
@@ -72,17 +72,17 @@ PURPOSE: This program tests the effect of inner-loop branches on
- the amount of work in the codes containing the three different
types of light-weight loops should be the same to allow fair
comparisons.
- the code should not produce overflow or underflow.
- the actual cost of computing the branch condition should be small,
so that we can assess the cost of the occurrence of the branch (as
it disrupts vectorization and the hardware pipelines). If the
condition were expensive to compute and we run the code with and
without the branch, the performance difference would be exaggerated.
- Note: Casts from integer to float or double are not always vectorizable.

APPROACH:
- to avoid casts and keep conditionals inexpensive and exact, we use
only integer operations.
- we make sure that the numerical results of the codes for the
different branch structures and for the different paths following
the branch are identical.
@@ -95,17 +95,17 @@ PURPOSE: This program tests the effect of inner-loop branches on
bounded, and verification values are easily computable.

USAGE: The program takes as input the number of threads, the number of
repetitions of the loop, the length of the vector loop, and the type of
branching

<progname> <# threads> <# iterations> <vector length> <branch_type>

The output consists of diagnostics to make sure the
algorithm worked, and of timing statistics.

FUNCTIONS CALLED:

Other than OpenMP or standard C functions, the following
functions are used in this program:

wtime()
@@ -114,7 +114,7 @@ FUNCTIONS CALLED:
func*()

HISTORY: Written by Rob Van der Wijngaart, May 2006.

**********************************************************************************/

#include <par-res-kern_general.h>
@@ -128,7 +128,7 @@ HISTORY: Written by Rob Van der Wijngaart, May 2006.
#define WITH_BRANCHES 1
#define WITHOUT_BRANCHES 0

extern int fill_vec(int *vector, int vector_length, int iterations, int branch,
int *nfunc, int *rank);

int main(int argc, char ** argv)
@@ -144,15 +144,15 @@ int main(int argc, char ** argv)
int i, iter, aux; /* dummies */
char *branch_type; /* string defining branching type */
int btype; /* integer encoding branching type */
int total=0,
total_ref; /* computed and stored verification values */
int nthread_input; /* thread parameters */
int nthread;
int num_error=0; /* flag that signals that requested and obtained
numbers of threads differ */

/**********************************************************************************
** process and test input parameters
**********************************************************************************/

if (argc != 5){
@@ -208,7 +208,7 @@ int main(int argc, char ** argv)
printf("ERROR: number of requested threads %d does not equal ",
nthread_input);
printf("number of spawned threads %d\n", nthread);
}
else {
printf("Number of threads = %d\n", nthread_input);
printf("Vector length = %d\n", vector_length);
@@ -231,17 +231,17 @@ int main(int argc, char ** argv)
/* grab the second half of vector to store index array */
index = vector + vector_length;

/* initialize the array with entries with varying signs; array "index" is only
used to confuse the compiler (i.e. it won't vectorize a loop containing
indirect referencing). It functions as the identity operator. */
for (i=0; i<vector_length; i++) {
vector[i] = 3 - (i&7);
index[i] = i;
}

#pragma omp barrier
#pragma omp master
{
branch_time = wtime();
}

@@ -252,14 +252,14 @@ int main(int argc, char ** argv)
case VECTOR_STOP:
/* condition vector[index[i]]>0 inhibits vectorization */
for (iter=0; iter<iterations; iter+=2) {
-#pragma vector always
+#pragma omp simd
for (i=0; i<vector_length; i++) {
aux = -(3 - (i&7));
if (vector[index[i]]>0) vector[i] -= 2*vector[i];
else vector[i] -= 2*aux;
}
-#pragma vector always
+#pragma omp simd
for (i=0; i<vector_length; i++) {
aux = (3 - (i&7));
if (vector[index[i]]>0) vector[i] -= 2*vector[i];
else vector[i] -= 2*aux;
@@ -270,13 +270,13 @@ int main(int argc, char ** argv)
case VECTOR_GO:
/* condition aux>0 allows vectorization */
for (iter=0; iter<iterations; iter+=2) {
-#pragma vector always
+#pragma omp simd
for (i=0; i<vector_length; i++) {
aux = -(3 - (i&7));
if (aux>0) vector[i] -= 2*vector[i];
else vector[i] -= 2*aux;
}
-#pragma vector always
+#pragma omp simd
for (i=0; i<vector_length; i++) {
aux = (3 - (i&7));
if (aux>0) vector[i] -= 2*vector[i];
@@ -288,13 +288,13 @@ int main(int argc, char ** argv)
case NO_VECTOR:
/* condition aux>0 allows vectorization, but indirect indexing inhibits it */
for (iter=0; iter<iterations; iter+=2) {
-#pragma vector always
+#pragma omp simd
for (i=0; i<vector_length; i++) {
aux = -(3 - (i&7));
if (aux>0) vector[i] -= 2*vector[index[i]];
else vector[i] -= 2*aux;
}
-#pragma vector always
+#pragma omp simd
for (i=0; i<vector_length; i++) {
aux = (3 - (i&7));
if (aux>0) vector[i] -= 2*vector[index[i]];
@@ -319,7 +319,7 @@ int main(int argc, char ** argv)

#pragma omp barrier
#pragma omp master
{
no_branch_time = wtime();
}

@@ -330,29 +330,29 @@ int main(int argc, char ** argv)
case VECTOR_STOP:
case VECTOR_GO:
for (iter=0; iter<iterations; iter+=2) {
-#pragma vector always
+#pragma omp simd
for (i=0; i<vector_length; i++) {
aux = -(3-(i&7));
vector[i] -= (vector[i] + aux);
}
for (i=0; i<vector_length; i++) {
aux = (3-(i&7));
vector[i] -= (vector[i] + aux);
}
}
break;

case NO_VECTOR:
for (iter=0; iter<iterations; iter+=2) {
-#pragma vector always
+#pragma omp simd
for (i=0; i<vector_length; i++) {
aux = -(3-(i&7));
vector[i] -= (vector[index[i]]+aux);
}
-#pragma vector always
+#pragma omp simd
for (i=0; i<vector_length; i++) {
aux = (3-(i&7));
vector[i] -= (vector[index[i]]+aux);
}
}
break;
@@ -377,17 +377,17 @@ int main(int argc, char ** argv)

if (total == total_ref) {
printf("Solution validates\n");
printf("Rate (Mops/s) with branches: %lf, time (s): %lf\n",
ops/(branch_time*1.e6), branch_time);
printf("Rate (Mops/s) without branches: %lf, time (s): %lf\n",
ops/(no_branch_time*1.e6), no_branch_time);
#ifdef VERBOSE
printf("Array sum = %d, reference value = %d\n", total, total_ref);
#endif
}
else {
printf("ERROR: array sum = %d, reference value = %d\n", total, total_ref);
}

exit(EXIT_SUCCESS);
}
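
The branch.c hunks above swap `#pragma vector always`, an Intel compiler directive, for `#pragma omp simd`, which is part of OpenMP 4.0 and later. A minimal sketch of how one might check that the hint leaves the VECTOR_GO results unchanged follows. The function names are hypothetical and not part of the PRK sources, and the loop body is copied from the hunk rather than from the full file. A compiler that is not given an OpenMP flag (for example `-fopenmp` or `-fopenmp-simd` with GCC) ignores the pragma, possibly with a warning, and runs the loop as plain scalar code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: one VECTOR_GO pass with the OpenMP simd hint.
   sign is -1 for the first loop of each pair and +1 for the second. */
static void vector_go_simd(int *vector, int n, int sign) {
  int i;
  #pragma omp simd
  for (i = 0; i < n; i++) {
    int aux = sign * (3 - (i & 7));
    if (aux > 0) vector[i] -= 2 * vector[i];
    else         vector[i] -= 2 * aux;
  }
}

/* Hypothetical helper: the same pass without any vectorization hint. */
static void vector_go_scalar(int *vector, int n, int sign) {
  int i;
  for (i = 0; i < n; i++) {
    int aux = sign * (3 - (i & 7));
    if (aux > 0) vector[i] -= 2 * vector[i];
    else         vector[i] -= 2 * aux;
  }
}

/* Returns 1 when both variants produce identical arrays. The outer loop
   mirrors the iter+=2 structure of the benchmark: two passes per step. */
int vector_go_variants_agree(int n, int iterations) {
  int *a, *b, i, iter, same;
  assert(n > 0);
  a = malloc((size_t)n * sizeof *a);
  b = malloc((size_t)n * sizeof *b);
  assert(a != NULL && b != NULL);
  for (i = 0; i < n; i++) a[i] = b[i] = 3 - (i & 7);  /* same init as branch.c */
  for (iter = 0; iter < iterations; iter += 2) {
    vector_go_simd(a, n, -1);
    vector_go_simd(a, n, +1);
    vector_go_scalar(b, n, -1);
    vector_go_scalar(b, n, +1);
  }
  same = (memcmp(a, b, (size_t)n * sizeof *a) == 0);
  free(a);
  free(b);
  return same;
}
```

Calling `vector_go_variants_agree(1000, 10)` from a test driver should return 1 whether or not the compiler honors the pragma, since the hint only affects how the loop is executed, not what it computes.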
12 changes: 5 additions & 7 deletions OPENMP/Nstream/nstream.c
@@ -232,8 +232,7 @@ int main(int argc, char **argv)
}
bail_out(num_error);

-#pragma omp for
-#pragma vector always
+#pragma omp for simd
for (j=0; j<length; j++) {
a[j] = 0.0;
b[j] = 2.0;
@@ -252,8 +251,8 @@ int main(int argc, char **argv)
nstream_time = wtime();
}

-#pragma omp for
-#pragma vector always
+#pragma omp for simd

for (j=0; j<length; j++) a[j] = b[j]+scalar*c[j];

#pragma omp master
@@ -264,8 +263,7 @@ int main(int argc, char **argv)
maxtime = MAX(maxtime, nstream_time);
}
/* insert a dependency between iterations to avoid dead-code elimination */
-#pragma omp for
-#pragma vector always
+#pragma omp for simd
for (j=0; j<length; j++) b[j] = a[j];
}
} /* end of OpenMP parallel region */
@@ -327,4 +325,4 @@ int checkTRIADresults (int iterations, long int length) {
printf ("Solution Validates\n");
return (1);
}
}
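
The nstream.c hunks above fold `#pragma omp for` plus `#pragma vector always` into the single combined construct `#pragma omp for simd`. A minimal standalone sketch of the triad with that style of directive follows. It is an assumption-laden illustration, not the PRK code: `triad_checksum` is a hypothetical name, the initial value of `c` is assumed (the hunk only shows `a` and `b`), and because there is no enclosing parallel region here the sketch uses `#pragma omp parallel for simd` instead of `#pragma omp for simd`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical standalone triad: a[j] = b[j] + scalar*c[j].
   Returns the sum of a[] so a caller can verify the result. */
double triad_checksum(long length, double scalar) {
  double *a, *b, *c, sum = 0.0;
  long j;
  assert(length > 0);
  a = malloc((size_t)length * sizeof *a);
  b = malloc((size_t)length * sizeof *b);
  c = malloc((size_t)length * sizeof *c);
  assert(a != NULL && b != NULL && c != NULL);

  /* Initialization; c[j] = 2.0 is an assumption made for this sketch. */
  #pragma omp parallel for simd
  for (j = 0; j < length; j++) {
    a[j] = 0.0;
    b[j] = 2.0;
    c[j] = 2.0;
  }

  /* The triad itself: iterations are shared among threads and each
     thread's chunk may be vectorized. */
  #pragma omp parallel for simd
  for (j = 0; j < length; j++) a[j] = b[j] + scalar * c[j];

  /* Serial checksum, kept simple on purpose. */
  for (j = 0; j < length; j++) sum += a[j];

  free(a);
  free(b);
  free(c);
  return sum;
}
```

With the assumed initial values, every element of `a` becomes `2.0 + scalar*2.0`, so the checksum for `length` elements is `length * (2.0 + 2.0*scalar)`. Built without an OpenMP flag, the pragmas are ignored and the function runs serially with the same result.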