Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should copy_transpose! also transpose elements? #51804

Closed
maltezfaria opened this issue Oct 21, 2023 · 6 comments · Fixed by #52116
Closed

Should copy_transpose! also transpose elements? #51804

maltezfaria opened this issue Oct 21, 2023 · 6 comments · Fixed by #52116
Labels

Comments

@maltezfaria
Copy link
Contributor

A while back I ran into this issue with StaticArrays:

using StaticArrays

A = @SMatrix rand(3, 3)
B = @SMatrix rand(3, 3)

blockA = fill(A, 3, 2)
blockB = fill(B, 3, 2)

val1 = blockA * transpose(blockB)
val2 = blockA * collect(transpose(blockB))

val1  val2 # false, by a lot

Recently, this came back to bite me again, and I managed to narrow down the issue to this chunk in the transpose.jl file.

for jsrc in jr_src
    jdest = first(jr_dest)
    for isrc in ir_src
        B[idest,jdest] = A[isrc,jsrc]
        jdest += step(jr_dest)
    end
    idest += step(ir_dest)
end

Replacing B[idest,jdest] = A[isrc,jsrc] by B[idest,jdest] = transpose(A[isrc,jsrc]) seems to fix the StaticArray bug, which is related to the copy_transpose! not transposing its elements. Since the transpose(x::Number) = x, this change should not break the scalar version, but I am not sure if there are further implications of such a change.

@jishnub jishnub added the domain:linear algebra Linear algebra label Oct 25, 2023
@dkarrasch
Copy link
Member

Transpose and adjoint act recursively, and I'd think that this should be handled consistently. So, what you found looks like a bug to me. OTOH, without the dependency on StaticArrays, this seems to work. What exactly is the difference then?

@maltezfaria
Copy link
Contributor Author

maltezfaria commented Oct 25, 2023

The difference is that a check is made on whether the element types are of bitstype in order to compute an appropriate tile size:

if isbitstype(R) && isbitstype(T) && isbitstype(S) && (tA == 'N' || tB != 'N')

For a matrix of non bits type, the code goes down a different branch, where it takes the transpose of the individual elements, e.g.:

Ctmp += A[i, k] * transpose(B[j, k])

@maltezfaria
Copy link
Contributor Author

Gentle bump.

I think this is indeed a bug. Would be happy to submit a PR if it may help expedite things (but the fix is a single word).

@dkarrasch
Copy link
Member

The function that you're referring to is getting a big overhaul in #52038. Could you perhaps check that one out and see if the issue is still present? Besides, as I said, I think the function should transpose elements, and a PR (including a test) would be most welcome!

@maltezfaria
Copy link
Contributor Author

Thanks for the reference! I can confirm that #52038 fixes the problem I encountered with the generic matmul as it avoids the (still buggy) LinearAlgebra.copy_transpose!:

function test_copy_transpose!(T)
    v = rand(T)
    A = fill(v,2,2)
    At = collect(transpose(A))
    B = similar(A)
    LinearAlgebra.copy_transpose!(B,axes(B,1),axes(B,2),A,axes(A,1),axes(A,2))
    B  At
end

test_copy_transpose!(Float64) # true
test_copy_transpose!(ComplexF64) # true
test_copy_transpose!(SMatrix{2,2,Float64,4}) # false

As already mentioned, the issue is that copy transpose is not recursive.

@maltezfaria
Copy link
Contributor Author

I noticed that copy_transpose! is not used very often in LinearAlgebra, but FWIW the "fix" is simply to transpose the elements during the copy:

function copy_transpose!(B::AbstractVecOrMat, ir_dest::AbstractRange{Int}, jr_dest::AbstractRange{Int},
                         A::AbstractVecOrMat, ir_src::AbstractRange{Int}, jr_src::AbstractRange{Int})
    if length(ir_dest) != length(jr_src)
        throw(ArgumentError(LazyString("source and destination must have same size (got ",
                                   length(jr_src)," and ",length(ir_dest),")")))
    end
    if length(jr_dest) != length(ir_src)
        throw(ArgumentError(LazyString("source and destination must have same size (got ",
                                   length(ir_src)," and ",length(jr_dest),")")))
    end
    @boundscheck checkbounds(B, ir_dest, jr_dest)
    @boundscheck checkbounds(A, ir_src, jr_src)
    idest = first(ir_dest)
    for jsrc in jr_src
        jdest = first(jr_dest)
        for isrc in ir_src
            B[idest,jdest] = transpose(A[isrc,jsrc]) # <-- recursive transposition
            jdest += step(jr_dest)
        end
        idest += step(ir_dest)
    end
    return B
end

I think the reason this is a corner case not caught earlier is that copy_transpose! does not get called in LinearAlgebra unless the entries of the array are of bitstype. So you need a bitstype whose transpose is not itself, such as an SMatrix.

In any case, thanks for the work in making generic_matmatmul faster!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants