# SIMD

## Overview

Single instruction, multiple data (SIMD) applies one instruction to multiple data elements at once.
Special instructions exist for gather and scatter operations, masked operations, and more.

<img src="img/simd/vectorized.svg" alt="scalar-vs-vectorized" width="50%" background-color="#ffffff"/>

OpenMP supports automatic vectorization over loop iterations with the `simd` construct.
The loop is partitioned into chunks whose size equals the vector length.
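
As a rough mental model of this partitioning, the loop can be strip-mined by hand. The sketch below assumes a hypothetical vector length `VL` of 4 and a helper named `init_chunked`; neither comes from OpenMP. In practice the compiler picks the vector length and emits real vector instructions in place of the inner loop.

```cpp
#include <cstddef>

// Hypothetical vector length, chosen only for illustration; the real value
// depends on the target instruction set and the element type.
constexpr std::size_t VL = 4;

// Hand-written equivalent of the chunking applied to `vec[i] = i`.
void init_chunked(int* vec, std::size_t n) {
    std::size_t i = 0;
    // Main part: each outer iteration stands for one vector instruction
    // that covers VL consecutive elements.
    for (; i + VL <= n; i += VL)
        for (std::size_t lane = 0; lane < VL; ++lane)
            vec[i + lane] = static_cast<int>(i + lane);
    // Remainder: leftover elements when n is not a multiple of VL.
    for (; i < n; ++i)
        vec[i] = static_cast<int>(i);
}
```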

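Requirements: as with worksharing constructs, the loop must be free of data dependencies between iterations and of pointer aliasing.
The `safelen` clause can partly lift this restriction: `safelen(n)` states that only iterations fewer than `n` apart may be executed together.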
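
A minimal sketch of `safelen`, assuming a loop in which every element depends on the element four positions earlier. The function name `shifted_increment` and the distance of 4 are illustrative choices only.

```cpp
#include <vector>

// a[i] reads the value written four iterations earlier, so iterations that
// are fewer than four apart are independent of each other.
void shifted_increment(std::vector<int>& a) {
    constexpr int dist = 4;
    const int n = static_cast<int>(a.size());
    // safelen(4): at most four consecutive iterations are combined into
    // one SIMD chunk, which respects the dependency distance.
    #pragma omp simd safelen(4)
    for (int i = dist; i < n; ++i)
        a[i] = a[i - dist] + 1;
}
```

Without the clause, the compiler would be allowed to pick a larger vector length and could then read `a[i - dist]` before it has been updated.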

In [2]:
%load_ext ice.magic

In [9]:
%%cpp_omp -o code/simd/vec-init.cpp -t

constexpr auto N = 1024;
int vec[N];

#pragma omp simd
for (auto i = 0; i < N; ++i)
    vec[i] = i;

std::cout << vec[N - 1] << std::endl;

1023
Total time: 0.053898 ms


To compile only the `simd` constructs (i.e. ignoring all other OpenMP constructs), specify `-fopenmp-simd` for the gcc and clang compilers, or `-qopenmp-simd` for Intel compilers.

In [15]:
%%cpp -o code/simd/serial-vec-init.cpp -v -f fopenmp-simd

constexpr auto N = 1024;
int vec[N];

#pragma omp simd
for (auto i = 0; i < N; ++i)
    vec[i] = i;

std::cout << vec[N - 1] << std::endl;

Writing to       my-code.cpp
Compiling with   g++ -O3 -march=native -std=c++17 -Wall -fopenmp-simd -o /tmp/my-app my-code.cpp
Executing        /tmp/my-app

1023


Other clauses:
* `simdlen`
* `safelen`
* `aligned(var[:alignment])`
* `linear(list[:step])`; list variables are privatized
* `collapse`
* `reduction`
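
Two of these clauses, `reduction` and `linear`, can be sketched as follows. The function names `dot` and `gather_even` are hypothetical and serve only as illustration; the other clauses are not covered here.

```cpp
// Dot product: reduction(+ : sum) gives every SIMD lane a private partial
// sum and combines the partial sums after the loop.
double dot(const double* x, const double* y, int n) {
    double sum = 0.0;
    #pragma omp simd reduction(+ : sum)
    for (int i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

// Copies every second element of src into dst (src must hold 2 * n values).
// linear(j : 2) states that j grows by 2 per iteration, so the compiler
// can compute its value for each lane directly.
void gather_even(const int* src, int* dst, int n) {
    int j = 0;
    #pragma omp simd linear(j : 2)
    for (int i = 0; i < n; ++i) {
        dst[i] = src[j];
        j += 2;
    }
}
```

Note that a floating-point reduction may sum the elements in a different order than the serial loop, so results can differ in the last bits.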

### Combination with Parallel

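The `simd` construct can be combined with `parallel for`, so that each thread vectorizes its share of the iterations.
Optionally, the `simd` modifier in the `schedule` clause makes the chunk size a multiple of the SIMD width.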

In [18]:
%%cpp_omp -o code/simd/parallel.cpp -t

constexpr auto N = 1024;
int vec[N];

#pragma omp parallel for simd schedule(simd:static)
for (auto i = 0; i < N; ++i)
    vec[i] = i;

std::cout << vec[N - 1] << std::endl;

1023
Total time: 10.2683 ms


### Vectorizing Functions

Functions called inside a `simd` loop can be vectorized by annotating them with `declare simd simdlen(n) [clauses]`:

```cpp
#pragma omp declare simd simdlen(2)
#pragma omp declare simd simdlen(4)
#pragma omp declare simd simdlen(8)
double sumit(double a, double b) 
{
    return a + b;
}
// ...
#pragma omp simd
for (int i = 0; i < n; ++i)
    a[i] = sumit(b[i], c[i]);
```

Additional clauses include:
* `uniform`: a list of parameters whose value is the same for all concurrent invocations
* `linear`: a list of parameters that depend linearly on the iteration variable of the calling loop
* `inbranch` or `notinbranch`: specifies whether the function is always (or never) called from within a branch
* `aligned`: a list of pointers with their respective alignments

Multiple declarations with different clauses are allowed, as in the example above, which requests one variant per `simdlen`.
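
The clauses can be combined on a single declaration. The following is a minimal sketch; the names `scaled_at` and `scale_all` are hypothetical, and whether the compiler actually emits and uses a vector variant depends on the compiler and the target.

```cpp
// uniform(data, scale): both arguments are identical for all lanes.
// linear(i : 1):        i increases by one from one lane to the next.
// notinbranch:          the call is never guarded by a condition in the loop.
#pragma omp declare simd uniform(data, scale) linear(i : 1) notinbranch
double scaled_at(const double* data, double scale, int i) {
    return scale * data[i];
}

// Caller: the simd loop can invoke the vector variant of scaled_at.
// in and out are assumed not to overlap.
void scale_all(const double* in, double* out, double scale, int n) {
    #pragma omp simd
    for (int i = 0; i < n; ++i)
        out[i] = scaled_at(in, scale, i);
}
```

Marking `data` and `scale` as `uniform` and `i` as `linear` tells the compiler that consecutive lanes access consecutive elements, which allows contiguous vector loads instead of gathers.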