# SIMD

## Overview

Single instruction, multiple data (SIMD) applies one instruction to multiple data elements at the same time.
SIMD instruction sets also provide special instructions for gather, scatter, masked, and other operations.

<img src="img/simd/vectorized.svg" alt="scalar-vs-vectorized" width="50%" background-color="#ffffff"/>

OpenMP supports explicit vectorization of loops with the `simd` construct ([OpenMP 5.1 - 2.11.5](https://www.openmp.org/spec-html/5.1/openmpsu49.html)).
The loop iterations are split into chunks whose size equals the vector length, and the iterations within a chunk are executed concurrently with SIMD instructions.

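Requirements: as with worksharing loops, the iterations must not have data dependencies and pointers must not alias.
The `safelen(n)` clause partly lifts this restriction: at most `n` consecutive iterations are executed concurrently, so loop-carried dependencies with a distance of `n` or more are allowed.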

```
%load_ext ice.magic
```

## SIMD Construct

```cpp
%%cpp_omp -o code/simd/vec-init.cpp -t

constexpr auto N = 1024;
int vec[N];

#pragma omp simd  // the simd construct vectorizes the following loop
for (auto i = 0; i < N; ++i)
    vec[i] = i;

std::cout << vec[N - 1] << std::endl;
```

To compile only the `simd` constructs (i.e. ignoring all other OpenMP constructs), pass `-fopenmp-simd` to GCC and Clang, or `-qopenmp-simd` to the Intel compilers.

```cpp
%%cpp -o code/simd/serial-vec-init.cpp -v -f fopenmp-simd
// note: %%cpp instead of %%cpp_omp, plus the -fopenmp-simd flag

constexpr auto N = 1024;
int vec[N];

#pragma omp simd
for (auto i = 0; i < N; ++i)
    vec[i] = i;

std::cout << vec[N - 1] << std::endl;
```

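Clauses accepted by the `simd` construct include:

| Clause      | Specifies                                                                                                        |
|-------------|------------------------------------------------------------------------------------------------------------------|
| `simdlen`   | the preferred number of iterations executed concurrently (elements in one SIMD vector)                          |
| `safelen`   | the maximum number of iterations that may be executed concurrently without breaking a dependency                |
| `aligned`   | a list of aligned pointers or arrays, with an optional alignment in bytes                                        |
| `linear`    | a list of variables whose value changes linearly with the loop iteration (optional step); they are implicitly privatized |
| `collapse`  | the number of nested loops combined into one iteration space, as for worksharing loops                          |
| `reduction` | a reduction operator and a list of reduction variables, as for worksharing loops                                |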

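```cpp
%%cpp_omp -o code/simd/simd-clauses.cpp -v -i stdlib.h

constexpr auto N = 1024;
// 64-byte aligned allocation; 4096 bytes is a multiple of the alignment, as aligned_alloc expects
int *vec = (int*)aligned_alloc(64, N * sizeof(int));

// simdlen(8): prefer 8 elements per vector; aligned(vec:64): vec is 64-byte aligned
#pragma omp simd simdlen(8) aligned(vec:64)
for (auto i = 0; i < N; ++i)
    vec[i] = i;

std::cout << vec[N - 1] << std::endl;

free(vec);
```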

## Combination with Parallel

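The `simd` construct can be combined with `parallel for` into the composite `parallel for simd` construct: the iterations are distributed among the threads, and each thread's chunk is vectorized.
In this case, the `schedule` clause accepts an optional `simd` modifier.
When it is set, chunk sizes are rounded up to a multiple of the SIMD width (except possibly for the first and the last chunk).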

```cpp
%%cpp_omp -o code/simd/parallel.cpp -t

constexpr auto N = 1024;
int vec[N];

#pragma omp parallel for simd schedule(simd:static)
for (auto i = 0; i < N; ++i)
    vec[i] = i;

std::cout << vec[N - 1] << std::endl;
```

## Vectorizing Functions

The `declare simd` directive makes the compiler generate vectorized variants of a function, so that the function can be called from vectorized code such as a `simd` loop ([OpenMP 5.1 - 2.11.5.3](https://www.openmp.org/spec-html/5.1/openmpsu49.html)).

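```cpp
#include <iostream>

// x and y differ from lane to lane, so they stay plain vector parameters (no clause);
// a, b and c have the same value in every lane of a call (uniform).
#pragma omp declare simd simdlen(2) uniform(a, b, c) notinbranch
#pragma omp declare simd simdlen(4) uniform(a, b, c)
#pragma omp declare simd simdlen(8) uniform(a, b, c) notinbranch
double eval_polynom(double x, double y, double a, double b, double c) {
    return a * x * x + b * y + c;
}

int main() {
    constexpr int n = 1024;
    static double xVec[n], yVec[n], outVec[n];
    for (int i = 0; i < n; ++i) {
        xVec[i] = i;
        yVec[i] = 0.5 * i;
    }

    // The compiler may replace these calls with one of the vector variants above.
    #pragma omp simd
    for (int i = 0; i < n; ++i) {
        outVec[i] = eval_polynom(xVec[i], yVec[i], 2, 4, 1);  // a = 2, b = 4, c = 1
    }

    std::cout << outVec[n - 1] << std::endl;
}
```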

Clauses accepted by `declare simd` include:

| Clause                      | Specifies                                                                                                                  |
|-----------------------------|----------------------------------------------------------------------------------------------------------------------------|
| `simdlen`                   | the number of concurrent arguments (SIMD lanes) of the generated vector variant                                           |
| `aligned`                   | a list of aligned pointer parameters, with an optional alignment in bytes                                                 |
| `linear`                    | a list of parameters that change linearly from lane to lane (e.g. an index or pointer), with an optional step            |
| `uniform`                   | a list of parameters that have the same value in all lanes of one vectorized call                                         |
| `inbranch` or `notinbranch` | whether the function is always (`inbranch`) or never (`notinbranch`) called from inside a conditional in the SIMD loop; with neither, the vector variant may or may not use a mask |

A function may carry several `declare simd` directives with different clauses, as in the example above; each directive requests a separate vector variant.