Closed
Labels
performance (Must go faster)
Description
julia> using BenchmarkTools

julia> x = rand(128); y = rand(128);

julia> @noinline function vdiv!(x, y, a)
           inva = inv(a)
           @inbounds for i ∈ eachindex(x, y)
               x[i] = y[i] * inva
           end
           x
       end
vdiv! (generic function with 1 method)

julia> @noinline function vdiv_fast!(x, y, a)
           @fastmath inva = 1 / a
           @inbounds for i ∈ eachindex(x, y)
               @fastmath x[i] = y[i] * inva
           end
           x
       end
vdiv_fast! (generic function with 1 method)

julia> @btime vdiv_fast!($x, $y, 1/1.4);
  71.293 ns (0 allocations: 0 bytes)

julia> @btime vdiv!($x, $y, 1/1.4);
  11.967 ns (0 allocations: 0 bytes)

The LICM pass moves fdiv out of loops, while instcombine moves it back in by combining the multiplication and division.
Clang does not have any instcombines following the last licm, while Julia does.
I don't know if fdiv is the only instruction with this adversarial licm-instcombine behavior, but if it isn't, it seems (based on Clang) that the LLVM passes assume licm will be ordered last.
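One way to check where the division ends up is to dump the optimized LLVM IR and look at the lines that mention `fdiv`. The following is only a rough sketch: `llvm_ir` is a hypothetical helper introduced here for illustration, and whether the `fdiv` shows up inside the loop body will depend on the Julia and LLVM versions in use.

```julia
using InteractiveUtils  # provides code_llvm

# Same definition as vdiv_fast! in the report above (without @noinline,
# which does not matter when inspecting the function's own IR).
function vdiv_fast!(x, y, a)
    @fastmath inva = 1 / a
    @inbounds for i in eachindex(x, y)
        @fastmath x[i] = y[i] * inva
    end
    x
end

# Hypothetical helper: return the optimized LLVM IR of `f` for the given
# argument types as a String, without debug-info comments.
llvm_ir(f, types) = sprint(io -> code_llvm(io, f, types; debuginfo = :none))

ir = llvm_ir(vdiv_fast!, Tuple{Vector{Float64}, Vector{Float64}, Float64})

# Print only the IR lines that mention fdiv. If they sit in the loop's
# basic blocks (possibly as vector fdivs), the division was not hoisted.
for line in split(ir, '\n')
    occursin("fdiv", line) && println(line)
end
```

Running the same check on `vdiv!` gives a point of comparison: there the reciprocal is computed with `inv(a)` outside any `@fastmath` region, so instcombine has no license to fold it back into the loop.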