rip -O3 (#372)
KristofferC authored and jrevels committed Oct 30, 2018
1 parent 0fe35e3 commit 0eeb1db
Showing 1 changed file with 1 addition and 85 deletions.
86 changes: 1 addition & 85 deletions docs/src/user/advanced.md
@@ -200,90 +200,6 @@ julia> vector_hessian(cumprod, [1, 2, 3])
Likewise, you could write a version of `vector_hessian` which supports functions of the
form `f!(y, x)`, or perhaps an in-place Jacobian with `ForwardDiff.jacobian!`.

## SIMD Vectorization

Many operations on ForwardDiff's dual numbers are amenable to [SIMD
vectorization](https://en.wikipedia.org/wiki/SIMD#Hardware). For some ForwardDiff
benchmarks, we've seen SIMD vectorization yield [speedups of almost
3x](https://github.com/JuliaDiff/ForwardDiff.jl/issues/98#issuecomment-253149761).

To enable SIMD optimizations, start your Julia process with the `-O3` flag. This flag
enables [LLVM's SLPVectorizerPass](http://llvm.org/docs/Vectorizers.html#the-slp-vectorizer)
during compilation, which attempts to automatically insert SIMD instructions where possible
for certain arithmetic operations.
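To confirm which optimization level a session is actually running at, one option is to inspect the process options from inside Julia. The snippet below is a minimal sketch; it assumes the internal, unexported `Base.JLOptions()` struct and its `opt_level` field, which are not part of ForwardDiff and may change between Julia versions.

```julia
# Read the optimization level of the running Julia process.
# `Base.JLOptions` is an internal, unexported API, so treat this as a sketch.
opt_level = Int(Base.JLOptions().opt_level)

if opt_level >= 3
    println("Running with -O3.")
else
    println("Running with -O", opt_level, "; restart with `julia -O3` to try SIMD.")
end
```

Launching the session as `julia -O3` is what sets this level.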

Here's the LLVM IR generated for adding two `Dual` numbers without SIMD instructions
(i.e. when Julia is started without `-O3`):

```julia
julia> using ForwardDiff: Dual

julia> a = Dual(1., 2., 3., 4.)
Dual{Nothing}(1.0,2.0,3.0,4.0)

julia> b = Dual(5., 6., 7., 8.)
Dual{Nothing}(5.0,6.0,7.0,8.0)

julia> @code_llvm a + b

define void @"julia_+_70852"(%Dual* noalias sret, %Dual*, %Dual*) #0 {
top:
%3 = getelementptr inbounds %Dual, %Dual* %1, i64 0, i32 1, i32 0, i64 0
%4 = load double, double* %3, align 8
%5 = getelementptr inbounds %Dual, %Dual* %2, i64 0, i32 1, i32 0, i64 0
%6 = load double, double* %5, align 8
%7 = fadd double %4, %6
%8 = getelementptr inbounds %Dual, %Dual* %1, i64 0, i32 1, i32 0, i64 1
%9 = load double, double* %8, align 8
%10 = getelementptr inbounds %Dual, %Dual* %2, i64 0, i32 1, i32 0, i64 1
%11 = load double, double* %10, align 8
%12 = fadd double %9, %11
%13 = getelementptr inbounds %Dual, %Dual* %1, i64 0, i32 1, i32 0, i64 2
%14 = load double, double* %13, align 8
%15 = getelementptr inbounds %Dual, %Dual* %2, i64 0, i32 1, i32 0, i64 2
%16 = load double, double* %15, align 8
%17 = fadd double %14, %16
%18 = getelementptr inbounds %Dual, %Dual* %1, i64 0, i32 0
%19 = load double, double* %18, align 8
%20 = getelementptr inbounds %Dual, %Dual* %2, i64 0, i32 0
%21 = load double, double* %20, align 8
%22 = fadd double %19, %21
%23 = getelementptr inbounds %Dual, %Dual* %0, i64 0, i32 0
store double %22, double* %23, align 8
%24 = getelementptr inbounds %Dual, %Dual* %0, i64 0, i32 1, i32 0, i64 0
store double %7, double* %24, align 8
%25 = getelementptr inbounds %Dual, %Dual* %0, i64 0, i32 1, i32 0, i64 1
store double %12, double* %25, align 8
%26 = getelementptr inbounds %Dual, %Dual* %0, i64 0, i32 1, i32 0, i64 2
store double %17, double* %26, align 8
ret void
}
```

If we start up Julia with `-O3` instead, the call to `@code_llvm` will show that LLVM
can SIMD-vectorize the addition:

```julia
julia> @code_llvm a + b

define void @"julia_+_70842"(%Dual* noalias sret, %Dual*, %Dual*) #0 {
top:
%3 = bitcast %Dual* %1 to <4 x double>* # cast the Dual to a SIMD-able LLVM vector
%4 = load <4 x double>, <4 x double>* %3, align 8
%5 = bitcast %Dual* %2 to <4 x double>*
%6 = load <4 x double>, <4 x double>* %5, align 8
%7 = fadd <4 x double> %4, %6 # SIMD add
%8 = bitcast %Dual* %0 to <4 x double>*
store <4 x double> %7, <4 x double>* %8, align 8
ret void
}
```

Whether SIMD instructions are actually used depends on your machine and Julia build. For
example, pre-built Julia binaries might not emit vectorized LLVM IR. To work around this
specific issue, you can [locally rebuild Julia's system
image](http://docs.julialang.org/en/latest/devdocs/sysimg).
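Because the result depends on the machine and build, it can help to check for vectorization programmatically rather than reading the IR by eye. The sketch below is an assumption-laden illustration, not part of ForwardDiff: `add4` is a hypothetical stand-in that adds two 4-tuples of `Float64` (mimicking a `Dual`'s value and partials), and the regex simply looks for an LLVM vector type such as `<4 x double>` in the IR text.

```julia
using InteractiveUtils  # provides `code_llvm`

# Hypothetical stand-in for a `Dual`'s value and partials: four Float64s in a tuple.
add4(a::NTuple{4,Float64}, b::NTuple{4,Float64}) = map(+, a, b)

# Capture the optimized LLVM IR as a string instead of printing it.
ir = sprint(io -> code_llvm(io, add4, (NTuple{4,Float64}, NTuple{4,Float64})))

# A `<N x double>` vector type in the IR suggests the addition was vectorized.
vectorized = occursin(r"<\d+ x double>", ir)
println(vectorized ? "vector instructions found" : "scalar code only")
```

The same check could be applied to `+` on two `Dual` values by passing their types to `code_llvm`; which message is printed will vary with the optimization level and the CPU.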

## Custom tags and tag checking

The `Dual` type includes a "tag" parameter indicating the particular function call to
@@ -315,4 +231,4 @@ want to disable this checking.
```julia
cfg = GradientConfig(nothing, x)
gradient(f, x, cfg)
```
