`@avx` incorrectly handling variable redefinitions between iterations #266
Comments
Thanks for the issue. I think correctly handling this will wait for the rewrite, which will track dependencies between loop iterations.
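To make "dependencies between loop iterations" concrete, here is a minimal sketch in plain Julia (no LoopVectorization involved). The helper names `independent!` and `carried!` are hypothetical and only serve the illustration. The first loop touches only index `i` in each iteration, so the iteration order does not matter. The second redefines `acc` in every iteration and reads it in the next, so running the iterations in a different order changes the result. A vectorizer that does not track such dependencies can be expected to mishandle the second kind of loop.

```julia
# Hypothetical helper names, used for illustration only.

# Each iteration reads and writes only index i, so any iteration order works.
function independent!(y, x, order)
    for i in order
        y[i] = 2 * x[i]
    end
    return y
end

# `acc` is redefined in every iteration and read by the next one,
# so the result depends on the order in which the iterations run.
function carried!(y, x, order)
    acc = zero(eltype(x))
    for i in order
        acc = acc + x[i]
        y[i] = acc
    end
    return y
end

x = [1.0, 2.0, 3.0, 4.0]
fwd, rev = 1:4, 4:-1:1

# Same result in both orders: no dependency between iterations.
same_independent = independent!(similar(x), x, fwd) == independent!(similar(x), x, rev)

# Different results: forward gives a running sum, reverse does not.
same_carried = carried!(similar(x), x, fwd) == carried!(similar(x), x, rev)
```

Here `same_independent` is `true` and `same_carried` is `false`.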
I see. Does that apply as well to the
Yes.
@MasonProtter, here is how to rewrite the loops to get them to work:

```julia
using LoopVectorization, LinearAlgebra
function matmul!(u::AbstractVector{T}, A::Tridiagonal{T}, v::AbstractVector{T}) where {T}
    @assert length(u) == size(A,1) == size(A,2) == length(v)
    dl, d, du = A.dl, A.d, A.du
    N = length(u); @assert N > 2
    p = zero(T)
    @inbounds u[1] = d[1] * v[1] + du[1] * v[2]
    for i in 2:(N-1)
        p = v[i-1]
        c = v[i]
        n = v[i+1]
        u[i] = dl[i-1] * p + d[i] * c + du[i] * n
    end
    p = v[N-1]
    c = v[N]
    @inbounds u[N] = dl[N-1] * p + d[N] * c
    u
end
function matmul_turbo!(u::AbstractVector{T}, A::Tridiagonal{T}, v::AbstractVector{T}) where {T}
    @assert length(u) == size(A,1) == size(A,2) == length(v)
    dl, d, du = A.dl, A.d, A.du
    N = length(u); @assert N > 2
    p = zero(T)
    @inbounds u[1] = d[1] * v[1] + du[1] * v[2]
    @turbo for i in 2:(N-1)
        p = v[i-1]
        c = v[i]
        n = v[i+1]
        u[i] = dl[i-1] * p + d[i] * c + du[i] * n
    end
    p = v[N-1]
    c = v[N]
    @inbounds u[N] = dl[N-1] * p + d[N] * c
    u
end
let N = 5
    T = Float64
    A = Tridiagonal(rand(T, N-1), rand(T, N), rand(T, N-1))
    v = rand(T, N)
    u1 = Array{T}(undef, N)
    u2 = Array{T}(undef, N)
    matmul!(u1, A, v)
    mul!(u2, A, v)
    u1 - u2
end
let N = 5
    T = Float64
    A = Tridiagonal(rand(T, N-1), rand(T, N), rand(T, N-1))
    v = rand(T, N)
    u1 = Array{T}(undef, N)
    u2 = Array{T}(undef, N)
    matmul_turbo!(u1, A, v)
    mul!(u2, A, v)
    u1 - u2
end
```

I get:

```julia
julia> let N = 5
           T = Float64
           A = Tridiagonal(rand(T, N-1), rand(T, N), rand(T, N-1))
           v = rand(T, N)
           u1 = Array{T}(undef, N)
           u2 = Array{T}(undef, N)
           matmul!(u1, A, v)
           mul!(u2, A, v)
           u1 - u2
       end
5-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0

julia> let N = 5
           T = Float64
           A = Tridiagonal(rand(T, N-1), rand(T, N), rand(T, N-1))
           v = rand(T, N)
           u1 = Array{T}(undef, N)
           u2 = Array{T}(undef, N)
           matmul_turbo!(u1, A, v)
           mul!(u2, A, v)
           u1 - u2
       end
5-element Vector{Float64}:
 0.0
 0.0
 0.0
 1.1102230246251565e-16
 0.0
```

This obviously doesn't solve the issue, LoopVectorization should figure out how to "correctly handle variable redefinitions between iterations" itself.
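For contrast, the sketch below shows the kind of loop the issue title describes next to the rewritten form. It is a hypothetical reconstruction, not the original poster's code: `matmul_carried!` and `matmul_independent!` are illustrative names, and only the `LinearAlgebra` standard library is used. In the carried version, `p`, `c`, and `n` slide along `v`, so each iteration reuses values assigned in the previous one. In the independent version, every iteration loads `v[i-1]`, `v[i]`, and `v[i+1]` itself, which is the pattern used in the rewrite above.

```julia
using LinearAlgebra

# Hypothetical reconstruction: p, c, n carry values from one iteration
# to the next (a loop-carried dependency).
function matmul_carried!(u, A::Tridiagonal, v)
    dl, d, du = A.dl, A.d, A.du
    N = length(u)
    @assert N > 2 && N == length(v) == size(A, 1)
    p, c, n = zero(eltype(v)), v[1], v[2]
    u[1] = d[1] * c + du[1] * n
    for i in 2:(N-1)
        p = c          # previous iteration's c
        c = n          # previous iteration's n
        n = v[i+1]     # the only new load
        u[i] = dl[i-1] * p + d[i] * c + du[i] * n
    end
    # After the loop, c == v[N-1] and n == v[N].
    u[N] = dl[N-1] * c + d[N] * n
    return u
end

# Each iteration loads everything it needs, so no state is carried over.
function matmul_independent!(u, A::Tridiagonal, v)
    dl, d, du = A.dl, A.d, A.du
    N = length(u)
    @assert N > 2 && N == length(v) == size(A, 1)
    u[1] = d[1] * v[1] + du[1] * v[2]
    for i in 2:(N-1)
        u[i] = dl[i-1] * v[i-1] + d[i] * v[i] + du[i] * v[i+1]
    end
    u[N] = dl[N-1] * v[N-1] + d[N] * v[N]
    return u
end

A = Tridiagonal(rand(4), rand(5), rand(4))
v = rand(5)
u_ref = mul!(similar(v), A, v)
u_carried = matmul_carried!(similar(v), A, v)
u_independent = matmul_independent!(similar(v), A, v)
```

Run serially, both versions should agree with `mul!` up to floating-point rounding. Only the independent form avoids relying on the order in which iterations execute.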
Ah interesting, I had tried this before and was running into performance troubles, but I just tried it again and it's quite fast at least for larger matrices. Maybe a change happened in LV or maybe I was just doing something dumb. Thanks Chris!
I'm not sure if this is something that `@avx` is supposed to be able to handle, but when I write the plain loop I get correct results compared to `mul!`. However, sticking `@avx` on the loop makes it get the wrong answer.
One thing I tried in order to fix this was to use an intermediate array to store `p, c, n`, but `@avx` didn't like that, giving an `UndefVarError`.