Labels: broadcast, compiler:simd, performance
Description
I was surprised by this slowdown when writing back into x, instead of into another array y:
julia> f23(x) = ifelse(x>0, x^2, x^3);
julia> x = randn(Float32, 100, 100); y = similar(x);
julia> @btime $y .= f23.($x);
1.958 μs (0 allocations: 0 bytes)
julia> @btime $x .= f23.($x);
6.567 μs (0 allocations: 0 bytes)
julia> @btime f23.($x); # allocating; mean time 3.010 μs still faster than x .= case
2.236 μs (2 allocations: 39.11 KiB)
This is 1.7.0-rc2, but it's similar on 1.5 and master, and on other computers. I don't think it's a benchmarking artefact, since it persists with evals=1 setup=(x=...; y=...). I don't think it's a hardware limit either, since @turbo $x .= f23.($x) with LoopVectorization.jl, or @.. $x = f23($x) with FastBroadcast.jl, don't show this difference.
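For anyone wanting to repeat the evals=1 check, here is a rough sketch of how it might be written. The setup expressions are my assumption (fresh arrays of the same size and element type as above, regenerated for every sample), not necessarily the exact ones used for the numbers in this report. The point of evals=1 with a setup is that x .= f23.(x) overwrites its own input, so without it each evaluation would run on the output of the previous one.

```julia
using BenchmarkTools  # third-party package, same one that provides @btime

f23(x) = ifelse(x > 0, x^2, x^3)

# Assumed setup: regenerate the input before every sample, and run exactly
# one evaluation per sample, so the in-place case always sees fresh randn data.
t_other = @belapsed (y .= f23.(x)) evals=1 setup=(x = randn(Float32, 100, 100); y = similar(x))
t_same  = @belapsed (x .= f23.(x)) evals=1 setup=(x = randn(Float32, 100, 100))

# @belapsed returns the minimum time in seconds.
println("y .= f23.(x): ", round(t_other * 1e6, digits=3), " μs")
println("x .= f23.(x): ", round(t_same * 1e6, digits=3), " μs")
```

Variables defined in setup are local to the benchmark, so they are used without $ interpolation here. With evals=1 the timer resolution matters more at this scale, so the absolute numbers may be noisier than the @btime results above.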
For comparison, it seems that map! is never fast here, although map is:
julia> @btime map!(f23, $y, $x);
8.055 μs (0 allocations: 0 bytes)
julia> @btime map!(f23, $x, $x);
8.055 μs (0 allocations: 0 bytes)
julia> @btime map(f23, $x);
1.917 μs (2 allocations: 39.11 KiB)