Unrealistically poor benchmark results #67
Thanks for the detailed issue. Zygote still does dumb things every so often (cf. the segfault); given that this code is pretty straightforward, there's probably either some simple typing issue in broadcast or we're missing a derivative, like you suggest. It would be great to have a manifest for that benchmark as I might not look at it immediately, but I'll dig into it sometime soon. One thing to note is that your Zygote adjoint definitions won't have any effect, since we use a fused forward mode for broadcasts (I assume Yota uses a simpler reverse mode?). What do you mean by "full buffering"? It doesn't seem that you're compiling a tape in this script, otherwise I'd assume you meant something like allocating array storage up front -- but I could be missing something.
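For context, here is a minimal sketch of what a "fused forward mode" broadcast means, using ForwardDiff dual numbers (Zygote's actual implementation is more elaborate; the helper name here is made up for illustration):

```julia
using ForwardDiff: Dual, value, partials

# One broadcast pass computes both f.(x) and the elementwise derivative,
# so a user-defined reverse-mode adjoint for the scalar f is never consulted.
function fused_broadcast(f, x::AbstractArray)
    duals = f.(Dual.(x, one(eltype(x))))
    return value.(duals), partials.(duals, 1)
end

x = rand(3)
y, dydx = fused_broadcast(sin, x)   # dydx ≈ cos.(x)
```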
Yep, for broadcasted operations I simply record broadcasted derivatives of scalar functions, e.g. for
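Roughly like the following hypothetical rules (illustrative only, not Yota's actual API): the backward pass for a broadcast is itself a single broadcast of the scalar derivative.

```julia
# For y = exp.(x), pulling back dy is one broadcast: dx = dy .* y.
bcast_deriv(::typeof(exp), x, y, dy) = dy .* y
# For y = sin.(x): dx = dy .* cos.(x).
bcast_deriv(::typeof(sin), x, y, dy) = dy .* cos.(x)
```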
No hurry! I think I'll do more benchmarks for Yota on CPU and GPU soon and cross-check them with Zygote to detect any unexpected performance issues.
751: Fix FFT type promotions r=CarloLucibello a=wkearn

I ran into an issue with type promotions while using `Conv` layers in Flux when I tried to run model output through FFTs. I tracked it down to the definitions of the adjoints for the various `fft` variants. The 1/N factors needed in the `irfft` adjoints automatically promoted `Float32` arrays to `Float64` arrays, which then caused the error when propagated back through the chain. A minimal example is included below.

This pull request just changes all those multiplications by 1/N into divisions by N, which prevents the promotion. I also added a few tests that assert that the gradients are the right types. The tests with `irfft(x,dims)` throw an error that is not related to the type promotion, so I commented those out.

```julia
x = randn(Float32,16,1,1)
m = Conv((5,),1=>1,relu,pad=SamePad())
loss(x) = sum(abs2,irfft(rfft(m(x)),16))
gradient(()->loss(x),Flux.params(m))

┌ Warning: Slow fallback implementation invoked for ∇conv_data! You probably don't want this; check your datatypes.
│   yT = AbstractFloat
│   T1 = AbstractFloat
│   T2 = Float32
└ @ NNlib C:\Users\wkearney\.julia\packages\NNlib\sSn9M\src\conv.jl:206
ERROR: UndefRefError: access to undefined reference
Stacktrace:
 [1] getindex at .\array.jl:745 [inlined]
 [2] #conv_direct!#149(::Float64, ::Bool, ::typeof(NNlib.conv_direct!), ::Array{AbstractFloat,5}, ::Array{AbstractFloat,5}, ::Array{Float32,5}, ::DenseConvDims{3,(5, 1, 1),1,1,(1, 1, 1),(2, 2, 0, 0, 0, 0),(1, 1, 1),false}) at C:\Users\wkearney\.julia\packages\NNlib\sSn9M\src\impl\conv_direct.jl:98
 [3] (::NNlib.var"#kw##conv_direct!")(::NamedTuple{(:alpha, :beta),Tuple{Float64,Bool}}, ::typeof(NNlib.conv_direct!), ::Array{AbstractFloat,5}, ::Array{AbstractFloat,5}, ::Array{Float32,5}, ::DenseConvDims{3,(5, 1, 1),1,1,(1, 1, 1),(2, 2, 0, 0, 0, 0),(1, 1, 1),false}) at .\none:0
 [4] #∇conv_data_direct!#152(::Float64, ::Bool, ::typeof(NNlib.∇conv_data_direct!), ::Array{AbstractFloat,5}, ::Array{AbstractFloat,5}, ::Array{Float32,5}, ::DenseConvDims{3,(5, 1, 1),1,1,(1, 1, 1),(2, 2, 0, 0, 0, 0),(1, 1, 1),false}) at C:\Users\wkearney\.julia\packages\NNlib\sSn9M\src\impl\conv_direct.jl:163
 [5] ∇conv_data_direct! at C:\Users\wkearney\.julia\packages\NNlib\sSn9M\src\impl\conv_direct.jl:158 [inlined]
 [6] #∇conv_data!#106(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(∇conv_data!), ::Array{AbstractFloat,5}, ::Array{AbstractFloat,5}, ::Array{Float32,5}, ::DenseConvDims{3,(5, 1, 1),1,1,(1, 1, 1),(2, 2, 0, 0, 0, 0),(1, 1, 1),false}) at C:\Users\wkearney\.julia\packages\NNlib\sSn9M\src\conv.jl:208
 [7] ∇conv_data!(::Array{AbstractFloat,5}, ::Array{AbstractFloat,5}, ::Array{Float32,5}, ::DenseConvDims{3,(5, 1, 1),1,1,(1, 1, 1),(2, 2, 0, 0, 0, 0),(1, 1, 1),false}) at C:\Users\wkearney\.julia\packages\NNlib\sSn9M\src\conv.jl:206
 [8] #∇conv_data!#67(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(∇conv_data!), ::Array{AbstractFloat,3}, ::Array{AbstractFloat,3}, ::Array{Float32,3}, ::DenseConvDims{1,(5,),1,1,(1,),(2, 2),(1,),false}) at C:\Users\wkearney\.julia\packages\NNlib\sSn9M\src\conv.jl:148
 [9] ∇conv_data! at C:\Users\wkearney\.julia\packages\NNlib\sSn9M\src\conv.jl:148 [inlined]
 [10] #∇conv_data#39 at C:\Users\wkearney\.julia\packages\NNlib\sSn9M\src\conv.jl:103 [inlined]
 [11] ∇conv_data at C:\Users\wkearney\.julia\packages\NNlib\sSn9M\src\conv.jl:101 [inlined]
 [12] #1241 at C:\Users\wkearney\.julia\dev\Zygote\src\lib\nnlib.jl:42 [inlined]
 [13] (::Zygote.var"#4101#back#1243"{Zygote.var"#1241#1242"{Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}},Array{Float32,3},Array{Float32,3},DenseConvDims{1,(5,),1,1,(1,),(2, 2),(1,),false}}})(::Array{AbstractFloat,3}) at C:\Users\wkearney\.julia\packages\ZygoteRules\6nssF\src\adjoint.jl:49
 [14] Conv at C:\Users\wkearney\.julia\packages\Flux\IjMZL\src\layers\conv.jl:147 [inlined]
 [15] (::typeof(∂(λ)))(::Array{Float64,3}) at C:\Users\wkearney\.julia\dev\Zygote\src\compiler\interface2.jl:0
 [16] loss at .\REPL[6]:1 [inlined]
 [17] (::typeof(∂(loss)))(::Float32) at C:\Users\wkearney\.julia\dev\Zygote\src\compiler\interface2.jl:0
 [18] #7 at .\REPL[8]:1 [inlined]
 [19] (::typeof(∂(#7)))(::Float32) at C:\Users\wkearney\.julia\dev\Zygote\src\compiler\interface2.jl:0
 [20] (::Zygote.var"#56#57"{Params,Zygote.Context,typeof(∂(#7))})(::Float32) at C:\Users\wkearney\.julia\dev\Zygote\src\compiler\interface.jl:177
 [21] gradient(::Function, ::Params) at C:\Users\wkearney\.julia\dev\Zygote\src\compiler\interface.jl:54
```

Co-authored-by: William Kearney <William.Kearney.ctr@nrlssc.navy.mil>
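The promotion this PR fixes is easy to reproduce in isolation (the array size here is arbitrary):

```julia
x = randn(Float32, 8)
eltype((1/8) .* x)   # Float64: 1/8 is a Float64 scalar, so the broadcast promotes
eltype(x ./ 8)       # Float32: an Int divisor adapts to the array's eltype
```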
While testing my own AD package, I tried to compare its performance to Zygote's, but the results turned out to be unexpectedly poor for it.
Setup
A variational autoencoder with all dense layers on the MNIST dataset. Nothing fancy, only built-in functions like matrix multiplication, elementwise operations, and aggregation functions. I also defined derivative rules for both Yota (my package) and Zygote. The full source can be found here.
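For reference, a rough sketch of the kind of model being benchmarked (all names, sizes, and the exact loss are illustrative; the linked source is authoritative):

```julia
σ(x) = 1 / (1 + exp(-x))

# Dense-layer VAE forward pass: encoder, reparameterization, decoder, loss.
function vae_cost(We, Wμ, Wlogσ², Wd, Wo, x, ϵ)
    h     = tanh.(We * x)
    μ     = Wμ * h
    logσ² = Wlogσ² * h
    z     = μ .+ exp.(logσ² ./ 2) .* ϵ                     # reparameterization trick
    x̂     = σ.(Wo * tanh.(Wd * z))
    rec   = sum((x̂ .- x) .^ 2)                             # reconstruction error
    kl    = -sum(1 .+ logσ² .- μ .^ 2 .- exp.(logσ²)) / 2  # KL divergence vs N(0, I)
    return rec + kl
end
```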
It's also worth mentioning that Zygote segfaulted on the original cost function, so for simplicity I've commented out the guilty line for now.
Benchmark results
I use some aggressive optimizations (e.g. full buffering), so I actually expected a 1.5-3x difference, but not 30x. Note that both libraries give the same results and the code has no dead branches.
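If "full buffering" means what the comment above guesses (allocating array storage up front), the idea is roughly the following (a minimal hypothetical sketch, not Yota's actual mechanism):

```julia
using LinearAlgebra

W, x = randn(100, 100), randn(100)
buf = Vector{Float64}(undef, 100)   # allocated once, e.g. when a tape is compiled
for _ in 1:1_000
    mul!(buf, W, x)                 # reused on every pass, no fresh allocation
end
```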
What could go wrong?
`sum` with keywords or `mean`?