zygote slower than Tracker #815
So I've been playing around with Zygote a bit, and something I've noticed is that it suffers terrible performance when you are doing type conversions in the loss you are optimizing over. Tracker also loses performance, but it isn't affected as much as Zygote. If you run the above code, except with

```julia
confs_left = rand(Float32, 64, 4000)
confs_right = rand(Float32, 64, 4000)
```

you will get much better performance. Basically, you want to make sure your inputs match the type of your model (at least I think). For me the speed-up is significant.

With Float64s:

```julia
julia> @btime Flux.train!($loss, Flux.params($m), $dataset, $opt)
  1.028 s (26084765 allocations: 522.98 MiB)
```

With Float32s:

```julia
julia> @btime Flux.train!($loss, Flux.params($m), $dataset, $opt)
  55.755 ms (484235 allocations: 54.22 MiB)
```
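The mismatch described above is visible in plain Julia: mixed-precision matrix products promote to the wider element type, which is why converting the inputs once keeps the whole computation in Float32. A minimal sketch with made-up sizes, where `W` stands in for a layer's Float32 weight matrix:

```julia
# Hypothetical sizes; `W` stands in for a layer's Float32 weights.
W = rand(Float32, 2, 4)
x64 = rand(Float64, 4, 8)   # mismatched input: the product promotes to Float64
x32 = Float32.(x64)         # converting once up front keeps everything Float32

println(eltype(W * x64))    # Float64
println(eltype(W * x32))    # Float32
```

The same promotion happens on every forward pass if the conversion isn't done up front, which is where the extra allocations come from.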
Not the core of the problem, …
From what @oxinabox mentioned in the issue I opened in Zygote (FluxML/Zygote.jl#491 (comment)), this does not seem to be an issue with Flux or Zygote, and it is even mentioned in the Flux performance tips. It might be beneficial for newcomers if this gotcha were highlighted more directly in the tutorials, potentially even in the basics, as this seems to be much less of a problem in the likes of PyTorch and TensorFlow (at least as far as I can tell).
The problem in FluxML/Zygote.jl#491 was that Flux has some logic to avoid that, at least for Dense layers, by simply converting one of the matrices before multiplying (see lines 116 to 117 at commit 5839e16).
Making this an error is one way to find problems. (And perhaps that would be a better default for Flux than silent conversions?) But the slowdown with Zygote appears to be caused by precisely how this conversion happens. Here's one way to fix it:

```julia
function (a::Flux.Dense{<:Any,W})(x::AbstractArray{<:AbstractFloat}) where {T <: Union{Float32,Float64}, W <: AbstractArray{T}}
    # error("tried to convert types")
    a(arrayconvert(T, x))
end

arrayconvert(T::Type, x::AbstractArray) = T.(x)

using Zygote: @adjoint
@adjoint arrayconvert(T, x) = T.(x), dy -> (nothing, dy)
```

Probably …
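As a quick check of the forward pass (the `@adjoint` above only changes the reverse pass), `arrayconvert` is just an element-type conversion via broadcasting:

```julia
# Forward pass of the helper from the comment above: broadcast conversion.
arrayconvert(T::Type, x::AbstractArray) = T.(x)

x = rand(Float64, 3, 3)
y = arrayconvert(Float32, x)
println(eltype(y))   # Float32
println(size(y))     # (3, 3)
```

The adjoint then simply passes the incoming gradient through unchanged, avoiding Zygote differentiating through the broadcast itself.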
I also have this problem using Flux. With

```julia
const Nf = 128
const m = Chain(GRU(Nf, DIM_H), GRU(DIM_H, DIM_H), GRU(DIM_H, DIM_H), Dense(DIM_H, Nf, relu))

function loss(data::Array{Float32,2}, label::Array{Float32,2})
    # (body elided in the original comment)
end

input = rand(Float32, Nf, FRAMES)
@time gs = gradient(() -> loss(input, label), params(m))
```

with Flux@v0.10.3 (Zygote) and with Flux@v0.9.0 (Tracker), Tracker is faster than Zygote, and the difference becomes larger as the input matrix grows; see the allocations and GC time.
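For context on the model in the comment above: each `GRU` layer applies a gated recurrent cell per time step. A hand-rolled sketch of the standard GRU equations follows (not Flux's implementation; biases omitted, names and sizes made up):

```julia
# Hand-rolled single GRU cell using the standard GRU equations.
# Sketch for intuition only, not Flux's GRU; biases omitted.
sigm(x) = 1 / (1 + exp(-x))

function gru_cell(Wz, Uz, Wr, Ur, Wh, Uh, h, x)
    z = sigm.(Wz * x .+ Uz * h)             # update gate
    r = sigm.(Wr * x .+ Ur * h)             # reset gate
    hcand = tanh.(Wh * x .+ Uh * (r .* h))  # candidate hidden state
    return (1 .- z) .* h .+ z .* hcand      # blend old state and candidate
end

nin, nh = 4, 3                 # made-up sizes
Wz = rand(Float32, nh, nin); Uz = rand(Float32, nh, nh)
Wr = rand(Float32, nh, nin); Ur = rand(Float32, nh, nh)
Wh = rand(Float32, nh, nin); Uh = rand(Float32, nh, nh)
h = zeros(Float32, nh)
x = rand(Float32, nin)
hnew = gru_cell(Wz, Uz, Wr, Ur, Wh, Uh, h, x)
println(size(hnew))   # (3,)
```

Because the weights are Float32, the same eltype-mismatch concern from earlier in the thread applies to recurrent layers too: a Float64 `x` or `h` would promote every gate computation.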
Can you check with BenchmarkTools and …
Running the code above, I got this:
Zygote allocated a lot more, but in the end it amounts to roughly the same time.
Keeping the code the same, just increasing the size of the hidden layer in …
It'd be good to understand what is happening here, but the MWE in the OP seems to be roughly equivalent?
I just asserted the data is Float32, and set the model to be …
The rest is pretty much the same. Removed the call to …
Why does …

Compare with …
Don't know why, but the relu activation function seems problematic on my setup. |
Is it that the adjoint for …?
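For reference on what an adjoint for `relu` has to do: the function and its almost-everywhere derivative are both cheap, so a slow gradient would point at how the adjoint is applied rather than at the math. A hand-written sketch (hypothetical names, not Flux's or Zygote's actual definitions):

```julia
# Hand-written sketch, not Flux's relu or Zygote's adjoint for it.
myrelu(x) = max(zero(x), x)
drelu(x) = x > zero(x) ? one(x) : zero(x)  # derivative almost everywhere

println(myrelu(-2.0))  # 0.0
println(myrelu(3.0))   # 3.0
println(drelu(3.0))    # 1.0
println(drelu(-2.0))   # 0.0
```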
The regression with respect to Tracker seems to be solved; if other problems persist, please file separate issues.
MWE:

I get

```
955.648 ms (28965221 allocations: 587.11 MiB)
```

with Flux + Zygote, and

```
47.298 ms (7471 allocations: 63.52 MiB)
```

with Flux + Tracker. Is this expected because I'm running Julia 1.1?