
use LoopVectorization to vectorize activation functions and softmax #199

Merged: 18 commits merged into FluxML:master on Jul 7, 2020

Conversation

@AStupidBear (Contributor) commented May 10, 2020

using BenchmarkTools
using NNlib
using Zygote

Old NNlib

julia> for f in (tanh, σ)
           println(f)
           @btime $f.(x)
           @btime Zygote.gradient($x) do z
               sum($f.(z))
           end
       end
tanh
  88.858 μs (3 allocations: 16.16 KiB)
  114.096 μs (20 allocations: 32.61 KiB)
σ
  48.257 μs (3 allocations: 16.16 KiB)
  56.673 μs (5 allocations: 32.30 KiB)

julia> for d in 1:ndims(x)
           println(softmax, " dim=", d)
           @btime softmax($x, dims = $d)
           @btime Zygote.gradient($x) do z
               sum(softmax(z, dims = $d))
           end
       end
softmax dim=1
  58.959 μs (13 allocations: 32.86 KiB)
  133.949 μs (30 allocations: 98.19 KiB)
softmax dim=2
  51.954 μs (31 allocations: 34.23 KiB)
  110.524 μs (72 allocations: 101.53 KiB)

This PR:

julia> for f in (tanh, σ)
           println(f)
           @btime $f.(x)
           @btime Zygote.gradient($x) do z
               sum($f.(z))
           end
       end
tanh
  9.181 μs (1 allocation: 16.13 KiB)
  11.725 μs (20 allocations: 32.61 KiB)
σ
  2.344 μs (1 allocation: 16.13 KiB)
  4.755 μs (5 allocations: 32.30 KiB)

julia> for d in 1:ndims(x)
           println(softmax, " dim=", d)
           @btime softmax($x, dims = $d)
           @btime Zygote.gradient($x) do z
               sum(softmax(z, dims = $d))
           end
       end
softmax dim=1
  25.194 μs (13 allocations: 32.86 KiB)
  56.679 μs (30 allocations: 98.19 KiB)
softmax dim=2
  9.377 μs (31 allocations: 34.23 KiB)
  25.198 μs (72 allocations: 101.53 KiB)

Other activation functions can be sped up by overloading Base.broadcasted after the adjoint is defined in LoopVectorization (JuliaSIMD/LoopVectorization.jl#108).
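
As a rough sketch of that idea (illustrative only; `myrelu` and the exact method signature are placeholders, not the definitions used in this PR), a broadcast over a dense array can be rerouted through LoopVectorization's vmap by overloading Base.broadcasted:

using LoopVectorization: vmap

# Illustrative scalar activation; stands in for any NNlib activation function.
myrelu(x) = ifelse(x > zero(x), x, zero(x))

# Returning a materialized array from Base.broadcasted short-circuits the lazy
# broadcast machinery, so `myrelu.(x)` evaluates via vmap.
Base.broadcasted(::typeof(myrelu), x::Array{<:Union{Float32,Float64}}) = vmap(myrelu, x)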

@CarloLucibello (Member)

vreduce doesn't support a dims argument yet.

@AStupidBear (Contributor, Author)

@CarloLucibello The next release of LoopVectorization will have that.

@AStupidBear (Contributor, Author)

Tests pass now. Any ideas?

src/activation.jl: review comments (outdated, resolved)
@CarloLucibello (Member)

Is all of this Zygote-friendly?

src/softmax.jl: review comments (outdated, resolved)
@CarloLucibello (Member)

@AStupidBear Bump on this; it would be really nice to have this performance improvement.

@chriselrod commented Jul 2, 2020

FWIW, I added a much faster AVX512-Float32 tanh to SLEEFPirates.
I haven't released it yet, but if it will be used here, I can try to add AVX2 and Float64 versions as well.

These will still probably be slower than:

function tanh_fast(x)
    # tanh(x) == (exp(2x) - 1) / (exp(2x) + 1); only one exp call per element
    exp2x = exp(x + x)
    (exp2x - 1) / (exp2x + 1)
end

But they are a little more accurate.
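
For context, a quick way one might compare speed and accuracy against Base's tanh (illustrative only; the input size is arbitrary):

using BenchmarkTools

x = randn(Float32, 10_000)
@btime tanh.($x)                          # Base tanh
@btime tanh_fast.($x)                     # exp-based approximation above
maximum(abs, tanh.(x) .- tanh_fast.(x))   # rough max absolute error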

@AStupidBear (Contributor, Author)

> Is all of this Zygote-friendly?

Since all of those modifications are in kernel functions used in Zygote's forward and backward passes, they are Zygote-friendly.

@CarloLucibello (Member)

what's the status of this?

@AStupidBear (Contributor, Author)

It's ready to get merged.

julia> using NNlib, StaticArrays; x = SMatrix{2, 2}(rand(2, 2))
2×2 SArray{Tuple{2,2},Float64,2,4} with indices SOneTo(2)×SOneTo(2):
 0.604445  0.330955
 0.975996  0.909042

julia> softmax(x)
2×2 SArray{Tuple{2,2},Float64,2,4} with indices SOneTo(2)×SOneTo(2):
 0.408166  0.359373
 0.591834  0.640627

julia> σ.(x)
2×2 SArray{Tuple{2,2},Float64,2,4} with indices SOneTo(2)×SOneTo(2):
 0.646673  0.581992
 0.726313  0.712804

julia> logsoftmax(x)
2×2 SArray{Tuple{2,2},Float64,2,4} with indices SOneTo(2)×SOneTo(2):
 -0.89608  -1.02339 
 -0.52453  -0.445308

@CarloLucibello (Member)

Alright, let's merge this and tag a new release. If no problems come up, we can then extend this by applying vmap to all activations and providing custom adjoints, right @AStupidBear?

@CarloLucibello merged commit 1f0388d into FluxML:master on Jul 7, 2020
@AStupidBear (Contributor, Author)

@CarloLucibello Yes! But where should we put those definitions? Zygote?

@CarloLucibello (Member)

@AStupidBear (Contributor, Author)

Maybe it's better to dispatch the other activation functions to vmap and then define the adjoint for vmap? Any ideas, @chriselrod?

@CarloLucibello (Member)

I think we can simply copy the adjoint of map, i.e. do something similar to FluxML/Zygote.jl#728.
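
A minimal sketch of that idea, assuming Zygote's @adjoint macro and LoopVectorization's vmap (not the final implementation; the pullback deliberately falls back to a plain map for simplicity):

using Zygote, LoopVectorization

# Sketch: mirror Zygote's adjoint for `map` so calls to vmap stay differentiable.
Zygote.@adjoint function LoopVectorization.vmap(f, x::AbstractArray)
    y = LoopVectorization.vmap(f, x)
    vmap_pullback(ȳ) = (nothing, map((xi, ȳi) -> ȳi * Zygote.gradient(f, xi)[1], x, ȳ))
    return y, vmap_pullback
end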

@ChrisRackauckas (Member)

Yeah, we can just add vmap to the loop I have in that PR. That would then need a LoopVectorization dependency?

@CarloLucibello (Member)

Yes. It also needs some tests.
