## SIMD Vectorization
Does my code vectorize? Let's take a look.

In [1]:
function add(out, x, y)
    for i in 1:length(out)
        out[i] = x[i] + y[i]
    end
    return out
end

add (generic function with 1 method)

In [4]:
@code_llvm add(Vector{Float64}, Vector{Float64}, Vector{Float64})


define i8** @julia_add_63046(i8**, i8**, i8**) #0 !dbg !5 {
top:
  %3 = call i8**** @jl_get_ptls_states() #4
  %4 = alloca [25 x i8**], align 8
  %.sub = getelementptr inbounds [25 x i8**], [25 x i8**]* %4, i64 0, i64 0
  %5 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 15
  %6 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 18
  %7 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 23
  %8 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 21
  %9 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 20
  %10 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 2
  %"#temp#" = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 3
  %11 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 4
  %12 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 5
  %13 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 6
  %14 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 7
  %15 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0,

## @inbounds
Adding `@inbounds` removes the bound-checks and gives LLVM the opportunity to auto-vectorize this function.

In [7]:
function add2(out, x, y)
    @inbounds for i in 1:length(out)
        out[i] = x[i] + y[i]
    end
    return out
end

add2 (generic function with 1 method)

In [8]:
@code_llvm add2(Vector{Float64}, Vector{Float64}, Vector{Float64})


define i8** @julia_add2_63060(i8**, i8**, i8**) #0 !dbg !5 {
top:
  %3 = call i8**** @jl_get_ptls_states() #4
  %4 = alloca [25 x i8**], align 8
  %.sub = getelementptr inbounds [25 x i8**], [25 x i8**]* %4, i64 0, i64 0
  %5 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 15
  %6 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 18
  %7 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 23
  %8 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 21
  %9 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 20
  %10 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 2
  %"#temp#" = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 3
  %11 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 4
  %12 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 5
  %13 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 6
  %14 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0, i64 7
  %15 = getelementptr [25 x i8**], [25 x i8**]* %4, i64 0

## SIMD.jl
Other option is to use explicit SIMD vectorization instructions. The [SIMD.jl](https://github.com/eschnett/SIMD.jl) library gives you correct data types for this.

## Additionally:
Look [here](https://slides.com/valentinchuravy/julia-parallelism) for a lecture about levels of parallelism in Julia.

The syntactic loop fusion is discussed [here](https://julialang.org/blog/2017/01/moredots).