Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

broadcast of short-circuiting boolean operator || generates runtime dispatch and per-element allocations #54141

Closed
jcphill opened this issue Apr 18, 2024 · 5 comments · Fixed by #54152
Labels
broadcast Applying a function over a collection performance Must go faster

Comments

@jcphill
Copy link

jcphill commented Apr 18, 2024

Creating a BitVector from a slightly non-trivial expression using the short-circuiting boolean operator .|| results in runtime dispatch, per-element allocations, and worse performance than the equivalent expression using the bitwise operator .|. The arrays are both in a named tuple and of SVector element type, and LinearAlgebra.norm is called on the right-hand side.

julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (x86_64-apple-darwin22.4.0)
  CPU: 8 × Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, icelake-client)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)

julia> using StaticArrays

julia> using LinearAlgebra

julia> ab = (a=zeros(SVector{3,Float64}, 1000000),
             b=zeros(SVector{3,Float64}, 1000000));

julia> f_bitwise(ab) = (first.(ab.a).>0) .| (norm.(ab.b) .!= 0)
f_bitwise (generic function with 1 method)

julia> f_boolean(ab) = (first.(ab.a).>0) .|| (norm.(ab.b) .!= 0)
f_boolean (generic function with 1 method)

julia> f_bitwise(ab) == f_boolean(ab)
true

julia> @time f_bitwise(ab);
  0.007586 seconds (4 allocations: 126.391 KiB)

julia> @time f_boolean(ab);
  0.117142 seconds (2.00 M allocations: 61.159 MiB, 69.11% gc time)

julia> using JET

julia> @report_opt f_bitwise(ab)
No errors detected


julia> @report_opt f_boolean(ab)
═════ 6 possible errors found ═════
┌ f_boolean(ab::@NamedTuple{a::Vector{SVector{3, Float64}}, b::Vector{SVector{3, Float64}}}) @ Main ./REPL[6]:1
│┌ materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, Base.Broadcast.var"#7#8"{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, Base.Broadcast.var"#12#14"{Base.Broadcast.var"#18#20"{Base.Broadcast.var"#15#16"{Base.Broadcast.var"#11#13"}, Base.Broadcast.var"#15#16"{Base.Broadcast.var"#17#19"}, Base.Broadcast.var"#25#26"{Base.Broadcast.var"#27#28"}, Base.Broadcast.var"#21#22"{Base.Broadcast.var"#23#24"}, typeof(norm)}, typeof(!=)}, Tuple{Vector{SVector{3, Float64}}, Int64}}}, Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(>), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(first), Tuple{Vector{SVector{3, Float64}}}}, Int64}}, Vector{SVector{3, Float64}}, Int64}}) @ Base.Broadcast ./broadcast.jl:903
││┌ copy(bc::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Tuple{Base.OneTo{Int64}}, Base.Broadcast.var"#7#8"{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, Base.Broadcast.var"#12#14"{Base.Broadcast.var"#18#20"{Base.Broadcast.var"#15#16"{Base.Broadcast.var"#11#13"}, Base.Broadcast.var"#15#16"{Base.Broadcast.var"#17#19"}, Base.Broadcast.var"#25#26"{Base.Broadcast.var"#27#28"}, Base.Broadcast.var"#21#22"{Base.Broadcast.var"#23#24"}, typeof(norm)}, typeof(!=)}, Tuple{Vector{SVector{3, Float64}}, Int64}}}, Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(>), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(first), Tuple{Vector{SVector{3, Float64}}}}, Int64}}, Vector{SVector{3, Float64}}, Int64}}) @ Base.Broadcast ./broadcast.jl:925
│││┌ combine_eltypes(f::Base.Broadcast.var"#7#8"{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, Base.Broadcast.var"#12#14"{Base.Broadcast.var"#18#20"{Base.Broadcast.var"#15#16"{Base.Broadcast.var"#11#13"}, Base.Broadcast.var"#15#16"{Base.Broadcast.var"#17#19"}, Base.Broadcast.var"#25#26"{Base.Broadcast.var"#27#28"}, Base.Broadcast.var"#21#22"{Base.Broadcast.var"#23#24"}, typeof(norm)}, typeof(!=)}, Tuple{Vector{SVector{3, Float64}}, Int64}}}, args::Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(>), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(first), Tuple{Vector{SVector{3, Float64}}}}, Int64}}, Vector{SVector{3, Float64}}, Int64}) @ Base.Broadcast ./broadcast.jl:760
││││┌ promote_typejoin_union(::Type) @ Base ./promotion.jl:188
│││││┌ promote_typejoin(a::Type, b::Type) @ Base ./promotion.jl:172
││││││┌ typejoin(a::Type, b::Type) @ Base ./promotion.jl:34
│││││││┌ typejoin(a::Any, b::Type) @ Base ./promotion.jl:36
││││││││┌ typejoin(a::Any, b::Any) @ Base ./promotion.jl:127
│││││││││ runtime dispatch detected: Base.UnionAll(%403::Any, %405::Any)::Any
││││││││└────────────────────
│││││││┌ typejoin(a::Any, b::Type) @ Base ./promotion.jl:127
││││││││ runtime dispatch detected: Base.UnionAll(%398::Any, %400::Any)::Any
│││││││└────────────────────
││││││┌ typejoin(a::Type, b::Type) @ Base ./promotion.jl:127
│││││││ runtime dispatch detected: Base.UnionAll(%393::Any, %395::Any)::Any
││││││└────────────────────
││││┌ typejoin_union_tuple(T::DataType) @ Base ./promotion.jl:216
│││││ runtime dispatch detected: Base.promote_typejoin_union(%40::Any)::Type
││││└────────────────────
│││┌ combine_eltypes(f::Base.Broadcast.var"#7#8"{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, Base.Broadcast.var"#12#14"{Base.Broadcast.var"#18#20"{Base.Broadcast.var"#15#16"{Base.Broadcast.var"#11#13"}, Base.Broadcast.var"#15#16"{Base.Broadcast.var"#17#19"}, Base.Broadcast.var"#25#26"{Base.Broadcast.var"#27#28"}, Base.Broadcast.var"#21#22"{Base.Broadcast.var"#23#24"}, typeof(norm)}, typeof(!=)}, Tuple{Vector{SVector{3, Float64}}, Int64}}}, args::Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(>), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(first), Tuple{Vector{SVector{3, Float64}}}}, Int64}}, Vector{SVector{3, Float64}}, Int64}) @ Base.Broadcast ./broadcast.jl:760
││││ runtime dispatch detected: Base._return_type(f::Base.Broadcast.var"#7#8"{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, Base.Broadcast.var"#12#14"{Base.Broadcast.var"#18#20"{Base.Broadcast.var"#15#16"{Base.Broadcast.var"#11#13"}, Base.Broadcast.var"#15#16"{Base.Broadcast.var"#17#19"}, Base.Broadcast.var"#25#26"{Base.Broadcast.var"#27#28"}, Base.Broadcast.var"#21#22"{Base.Broadcast.var"#23#24"}, typeof(norm)}, typeof(!=)}, Tuple{Vector{SVector{3, Float64}}, Int64}}}, ::Tuple{Bool, SVector{3, Float64}, Int64})::Type
│││└────────────────────
│││┌ combine_eltypes(f::Base.Broadcast.var"#7#8"{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, Base.Broadcast.var"#12#14"{Base.Broadcast.var"#18#20"{Base.Broadcast.var"#15#16"{Base.Broadcast.var"#11#13"}, Base.Broadcast.var"#15#16"{Base.Broadcast.var"#17#19"}, Base.Broadcast.var"#25#26"{Base.Broadcast.var"#27#28"}, Base.Broadcast.var"#21#22"{Base.Broadcast.var"#23#24"}, typeof(norm)}, typeof(!=)}, Tuple{Vector{SVector{3, Float64}}, Int64}}}, args::Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(>), Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1}, Nothing, typeof(first), Tuple{Vector{SVector{3, Float64}}}}, Int64}}, Vector{SVector{3, Float64}}, Int64}) @ Base.Broadcast ./broadcast.jl:760
││││ runtime dispatch detected: Base.Broadcast.promote_typejoin_union(%2::Type)::Type
│││└────────────────────

Here is the input code only:

using StaticArrays
using LinearAlgebra

ab = (a=zeros(SVector{3,Float64}, 1000000),
      b=zeros(SVector{3,Float64}, 1000000));

f_bitwise(ab) = (first.(ab.a).>0) .| (norm.(ab.b) .!= 0)

f_boolean(ab) = (first.(ab.a).>0) .|| (norm.(ab.b) .!= 0)

f_bitwise(ab) == f_boolean(ab)

@time f_bitwise(ab);

@time f_boolean(ab);

using JET

@report_opt f_bitwise(ab)

@report_opt f_boolean(ab)
@nsajko

This comment was marked as resolved.

@jcphill
Copy link
Author

jcphill commented Apr 18, 2024

Done.

@mbauman
Copy link
Sponsor Member

mbauman commented Apr 18, 2024

Fascinating; I was about to punt this to StaticArrays, but neither f_boolean nor f_bitwise are using StaticArray implementations here. I believe this has to do with how .|| flattens the broadcasted tree before evaluating it. Here's a slightly more minimal MWE:

julia> using StaticArrays, LinearAlgebra; using .Broadcast: instantiate, broadcasted, Broadcasted, oror

julia> bc = convert(Broadcasted{Nothing}, instantiate(broadcasted(oror, false, broadcasted(>, broadcasted(norm, zeros(SVector{3, Float64}, 1000000)), 0))));

julia> dest = falses(1000000);

julia> copyto!(dest, bc); @time copyto!(dest, bc);
  0.111565 seconds (3.00 M allocations: 76.294 MiB, 4.76% gc time)

julia> bc[1]; @time bc[1]
  0.000014 seconds (3 allocations: 80 bytes)
false

Interestingly all of ||, norm and SVector are required here; replacing any with |, sum, or Vector alternatives is not sufficient. I'm still not sure where this is going sideways.

@LilithHafner LilithHafner added the performance Must go faster label Apr 19, 2024
@LilithHafner
Copy link
Member

JET is clean on the OP in 1.11.0-beta1.

It makes sense that the broadcasted .|| would be slower than the .| because for operations as fast as computing the norm of three elements it's faster to always do it than branch and maybe do it.

The allocations are unexpected and remain present even in 1.11.0-beta1. Unexpected allocations are typically worth investigating, and in a task like this, they are also likely the bottleneck.

Aside: IMO it is more idiomatic to define a complex scalar function and broadcast that than to define a complex broadcast, but this does not make the issue you found invalid.

g_bitwise(a, b) = (first(a) > 0) | (norm(b) != 0)
g_bitwise.(ab.a, ab.b)

I concur that flatten is an issue here.

I suspect specialization/compiler heuristics are hurting this:

julia> using Chairmarks, StaticArrays, LinearAlgebra; using .Broadcast: instantiate, broadcasted, Broadcasted, oror

julia> bc = Base.Broadcast.flatten(broadcasted(>, broadcasted(norm, zeros(SVector{3, Float64}, 1000000)), 0))
Broadcasted(#12, (SVector{3, Float64}[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]  …  [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], 0))

julia> inp = first.(bc.args)
([0.0, 0.0, 0.0], 0)

julia> function slow(bc)
           bcf = Base.Broadcast.flatten(bc)
           broadcasted((args...) -> bcf.f(args...), bcf.args...)
       end
slow (generic function with 1 method)

julia> function fast(bc)
           bcf = Base.Broadcast.flatten(bc)
           broadcasted(bcf.f, bcf.args...)
       end
fast (generic function with 1 method)

julia> @b slow(bc) _.f($inp...)
20.280 ns (2 allocs: 64 bytes)

julia> @b fast(bc) _.f($inp...)
2.840 ns

@LilithHafner
Copy link
Member

LilithHafner commented Apr 19, 2024

This fixes the allocations and major runtime discrepancies:

julia> @eval Base.Broadcast function broadcasted(::OrOr, a, bc::Broadcasted)
           bcf = flatten(bc)
           broadcasted(((a, args::Vararg{Any, N}) where {N}) -> a || bcf.f(args...), a, bcf.args...)
       end

See also: https://docs.julialang.org/en/v1/manual/performance-tips/#Be-aware-of-when-Julia-avoids-specializing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
broadcast Applying a function over a collection performance Must go faster
Projects
None yet
4 participants