# SIMD

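SIMD stands for "Single Instruction, Multiple Data": one instruction operates on several data elements at once, for example adding four pairs of `Float64` values with a single 256-bit AVX instruction. It is a form of data-level parallelism, exposed by the CPU through vector instructions (SSE/AVX on x86, NEON on ARM).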

In [None]:
function mysum(X)
    acc = zero(eltype(X))
    for i in 1:length(X)
        @inbounds acc += X[i]
    end
    return acc
end

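Consider whether the loop iterations are independent. Every iteration reads and updates `acc`, so as written, iteration `i` must wait for the result of iteration `i-1`. To vectorize the loop, the compiler has to split the sum into several partial sums (one per vector lane) and combine them at the end, which changes the order of the additions.

Integer addition is **associative** (it stays associative even when it wraps around on overflow), so reordering does not change the result and the compiler may vectorize the integer loop on its own. Floating-point addition is **non-associative**, because every operation rounds its result: `(a + b) + c` can differ from `a + (b + c)`. Without explicit permission, the compiler must therefore preserve the sequential order.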
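A quick check in the REPL makes the difference concrete. The floating-point values below are the standard textbook example for IEEE 754 double precision; the integer case deliberately overflows to show that wraparound does not break associativity.

```julia
# Floating-point addition rounds after every operation, so grouping matters.
a = (0.1 + 0.2) + 0.3   # 0.30000000000000004 + 0.3
b = 0.1 + (0.2 + 0.3)   # 0.1 + 0.5
println(a)        # 0.6000000000000001
println(b)        # 0.6
println(a == b)   # false

# Integer addition is associative, even when it wraps around on overflow.
x, y, z = typemax(Int64), 1, -1
println((x + y) + z == x + (y + z))   # true
```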

By using `@simd`, we assert several properties of the loop. The compiler does not verify them; if they do not hold, the results may be wrong.

* It is safe to execute iterations in arbitrary or overlapping order, with special consideration for reduction variables.
* Floating-point operations on reduction variables can be reordered or contracted, possibly causing different results than without `@simd`.
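Roughly speaking, permission to reorder a reduction lets the compiler keep several independent partial sums, one per vector lane, and combine them after the loop. The sketch below writes that transformation out by hand. The name `mysum_4acc` and the choice of four accumulators are illustrative assumptions; the code the compiler actually emits depends on the CPU's vector width and the element type.

```julia
# Illustration only: a hand-written version of the reordering that @simd
# permits. Four independent accumulators stand in for four vector lanes.
function mysum_4acc(X::AbstractVector{T}) where {T}
    a1 = a2 = a3 = a4 = zero(T)
    i = firstindex(X)
    stop = lastindex(X)
    # Main loop: four independent dependency chains instead of one.
    while i + 3 <= stop
        @inbounds begin
            a1 += X[i]
            a2 += X[i+1]
            a3 += X[i+2]
            a4 += X[i+3]
        end
        i += 4
    end
    # Remainder: the last length(X) % 4 elements.
    while i <= stop
        @inbounds a1 += X[i]
        i += 1
    end
    # Combine the partial sums; this is where the order differs from mysum.
    return (a1 + a2) + (a3 + a4)
end

X = rand(1000)
println(isapprox(mysum_4acc(X), foldl(+, X)))   # true: equal up to rounding
```

With a single accumulator, every addition waits for the previous one. With four, the additions in one pass of the loop body are independent of each other, which is what allows them to be packed into a single vector instruction.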

In [None]:
function mysum_simd(X)
    acc = zero(eltype(X))
    @simd for i in 1:length(X)
        @inbounds acc += X[i]
    end
    return acc
end

In [None]:
using BenchmarkTools

In [None]:
X = rand(Float64, 1000)
@btime mysum($X);
@btime mysum_simd($X);

In [None]:
X = rand(Int64, 1000)
@btime mysum($X);
@btime mysum_simd($X);

In [None]:
X = rand(Float64, 1000)
s = mysum(X);
s_simd = mysum_simd(X);

In [None]:
s == s_simd  # may be false: the two functions add the elements in a different order

In [None]:
abs(s-s_simd)
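The absolute difference alone is hard to interpret. One way to judge it, sketched below, is to compare both summation orders against a `BigFloat` reference and report relative errors. `sum_seq`, `sum_simd` and `relerr` are hypothetical names; `foldl(+, X)` stands in for the strictly sequential `mysum`, and `sum_simd` repeats the loop of `mysum_simd` so the cell is self-contained.

```julia
# Compare two summation orders against an extended-precision reference.
sum_seq(X) = foldl(+, X)          # strictly left to right, like mysum

function sum_simd(X)              # same loop as mysum_simd
    acc = zero(eltype(X))
    @simd for i in eachindex(X)
        @inbounds acc += X[i]
    end
    return acc
end

# Hypothetical helper: relative error of s against a BigFloat reference sum.
function relerr(s, X)
    ref = sum(big.(X))            # BigFloat, 256-bit precision by default
    return Float64(abs(big(s) - ref) / abs(ref))
end

Y = rand(Float64, 1000)
println("sequential: ", relerr(sum_seq(Y), Y))
println("@simd:      ", relerr(sum_simd(Y), Y))
```

For 1000 nonnegative values, both relative errors are bounded by roughly n times the unit roundoff (about 1e-13), so neither order is "wrong": they are two valid roundings of the same exact sum.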

In [None]:
@code_native debuginfo=:none mysum(X)

In [None]:
@code_native debuginfo=:none mysum_simd(X)
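Reading the assembly takes some practice. On x86-64, the `@simd` version typically contains packed instructions such as `vaddpd` on `ymm` or `zmm` registers, whereas `mysum` uses scalar adds such as `vaddsd`; the details depend on the CPU. A rougher but CPU-independent check is to search the optimized LLVM IR for a vector type like `<4 x double>`. The sketch below assumes this heuristic is good enough for a quick look: `has_vector_double` is a hypothetical helper, and a match only shows that some vectorized code was generated, not that the hot loop always runs vectorized.

```julia
using InteractiveUtils  # provides code_llvm (loaded automatically in the REPL)

function mysum_simd(X)
    acc = zero(eltype(X))
    @simd for i in 1:length(X)
        @inbounds acc += X[i]
    end
    return acc
end

# Hypothetical helper: does the optimized LLVM IR for f(::T) mention a
# vector of doubles, e.g. "<4 x double>"?
function has_vector_double(f, T)
    ir = sprint(io -> code_llvm(io, f, (T,); debuginfo=:none))
    return occursin("x double>", ir)
end

println(has_vector_double(mysum_simd, Vector{Float64}))
```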

## Structure of Arrays vs. Array of Structures

In [None]:
struct MyComplex
  real::Float64
  imag::Float64
end

In [None]:
# Array of structures: real and imag parts are interleaved in memory
AoS = [MyComplex(rand(),rand()) for i in 1:100]

In [None]:
struct MyComplexes
  real::Vector{Float64}
  imag::Vector{Float64}
end

In [None]:
# Structure of arrays: all real parts contiguous, all imag parts contiguous
SoA = MyComplexes(rand(100),rand(100))
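The difference between the two layouts is visible in memory. Because `MyComplex` is an immutable `isbits` struct of two `Float64` fields, a `Vector{MyComplex}` stores the fields interleaved (`real, imag, real, imag, ...`), which `reinterpret` exposes directly; the SoA version keeps each field in its own contiguous vector. The names `aos_demo` and `soa_demo` are hypothetical, chosen so the notebook's `AoS` and `SoA` are not overwritten.

```julia
struct MyComplex
    real::Float64
    imag::Float64
end

struct MyComplexes
    real::Vector{Float64}
    imag::Vector{Float64}
end

aos_demo = [MyComplex(1.0, 2.0), MyComplex(3.0, 4.0)]
soa_demo = MyComplexes([1.0, 3.0], [2.0, 4.0])

println(isbitstype(MyComplex))   # true: elements are stored inline
println(sizeof(MyComplex))       # 16 (two Float64 fields)
# View the AoS memory as raw Float64s: the fields are interleaved.
println(collect(reinterpret(Float64, aos_demo)))   # [1.0, 2.0, 3.0, 4.0]
# In the SoA layout each field is already contiguous.
println(soa_demo.real, " ", soa_demo.imag)         # [1.0, 3.0] [2.0, 4.0]
```

Summing only the real parts therefore touches every other `Float64` in the AoS layout, but a dense run of `Float64`s in the SoA layout, which is the access pattern vector loads handle best.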

In [None]:
# Array of structures (Vector{MyComplex})
Base.:+(x::MyComplex, y::MyComplex) = MyComplex(x.real + y.real, x.imag + y.imag)
Base.:/(x::MyComplex, y::Int) = MyComplex(x.real / y, x.imag / y)
average(x::Vector{MyComplex}) = sum(x) / length(x)  # sum uses the + defined above

# Structure of arrays (MyComplexes): each sum runs over a contiguous Vector{Float64}
average(x::MyComplexes) = MyComplex(sum(x.real), sum(x.imag)) / length(x.real)
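If the SoA version turns out faster on your machine, the usual explanation is that `sum(x.real)` and `sum(x.imag)` each stream over a contiguous `Vector{Float64}`, which maps directly onto vector loads and adds. The same idea can be written as one explicit loop. `average_loop` is a hypothetical name, and reading both fields in a single `@simd` pass (one reduction variable per field) is a design choice of this sketch, not something the original code does.

```julia
struct MyComplex
    real::Float64
    imag::Float64
end

struct MyComplexes
    real::Vector{Float64}
    imag::Vector{Float64}
end

# Hypothetical single-pass SoA average: both contiguous field vectors are
# read in the same @simd loop, with one reduction variable per field.
function average_loop(x::MyComplexes)
    re_sum = 0.0
    im_sum = 0.0
    @simd for i in eachindex(x.real, x.imag)   # throws if the lengths differ
        @inbounds re_sum += x.real[i]
        @inbounds im_sum += x.imag[i]
    end
    n = length(x.real)
    return MyComplex(re_sum / n, im_sum / n)
end

demo = MyComplexes(rand(100), rand(100))
avg = average_loop(demo)
println(isapprox(avg.real, sum(demo.real) / 100))   # true: equal up to rounding
```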

In [None]:
# Interpolate the globals with $ so BenchmarkTools does not time global-variable access
@btime average($AoS);
@btime average($SoA);