
cudnn complex convolution via gauss trick #517

Merged: 10 commits into FluxML:master on Jul 15, 2023

Conversation

@nikopj (Contributor) commented Jun 30, 2023

Addresses #510 using the same method as in PyTorch (Gauss's trick: a complex conv via 3 real convs).
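
For reference, here is a minimal sketch of the trick itself, written against NNlib's real conv (illustrative only, not this PR's actual implementation; gauss_conv is a made-up name):

using NNlib

# Gauss's trick: for x = xr + im*xi and w = wr + im*wi, the product needs only
# three real convolutions, since
#   real part = t1 - t2,  imaginary part = t3 - t1 - t2.
function gauss_conv(x::AbstractArray{<:Complex}, w::AbstractArray{<:Complex}; kws...)
    xr, xi = real.(x), imag.(x)
    wr, wi = real.(w), imag.(w)
    t1 = conv(xr, wr; kws...)              # real * real
    t2 = conv(xi, wi; kws...)              # imag * imag
    t3 = conv(xr .+ xi, wr .+ wi; kws...)  # (xr + xi) * (wr + wi)
    return complex.(t1 .- t2, t3 .- t1 .- t2)
end

# e.g. gauss_conv(randn(ComplexF64, 8, 1, 1), randn(ComplexF64, 3, 1, 1))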

PR Checklist

  • Tests are added
  • Documentation: conv docstring updated

@CarloLucibello (Member) commented

Great, looks good to go. Since this is an important feature, maybe we should document it somewhere? the conv docstring?

@nikopj (Contributor, author) commented Jul 2, 2023

> Great, looks good to go. Since this is an important feature, maybe we should document it somewhere? the conv docstring?

One difference I noticed between the CPU and CUDA versions is that the CUDA version could only handle the case where both arguments are complex, whereas the CPU version can handle a mix (though with errors for some of the pullback (∇) functions). This is understandable when the mix of real and complex doesn't make sense: e.g. for dx = ∇conv_data(dy, w), we can have (w real, dy complex) but not (w complex, dy real), because a complex w implies a complex y for y = conv(x, w).
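
Illustrative only (not the PR's new methods), and assuming the mixed-eltype CPU behaviour described above: the sensible mix for ∇conv_data is a real filter with a complex dy, since a complex filter would have produced a complex y in the first place.

using NNlib

x = randn(ComplexF64, 8, 1, 1)       # complex input
w = randn(Float64, 3, 1, 1)          # real filter
cdims = DenseConvDims(x, w)
y = conv(x, w, cdims)                # complex output from (complex x, real w)
dy = ones(ComplexF64, size(y))
dx = ∇conv_data(dy, w, cdims)        # (w real, dy complex): a sensible mix
# The reverse mix, a complex w with a real dy, makes no sense here: a complex w
# could never have produced a real y.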

I've now added a few more functions so that the CUDA conv functions can also handle a mix of real and complex inputs (in the same way the CPU version can), along with an additional sentence in the conv docstring mentioning that mixed real and complex inputs are allowed. I tried to keep the implementation clean by writing an inline function _complex! which handles the beta if-statement.
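
For illustration, a hypothetical version of such a helper (the name and details are made up, not the PR's actual _complex!): it assembles the complex result from its real and imaginary parts and keeps the alpha/beta branch in one place.

# Hypothetical sketch only: write the complex result into y, applying
# cuDNN-style alpha/beta semantics in a single helper.
function _assemble_complex!(y, yre, yim; alpha=1, beta=0)
    if iszero(beta)
        y .= alpha .* complex.(yre, yim)
    else
        y .= alpha .* complex.(yre, yim) .+ beta .* y
    end
    return y
end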

There is one test that fails for the mixed case (∇conv_filter!, non-zero beta, flipkernel=true, all input sizes), though I can't figure out why. I've left it commented out for now with a note. Aside from this one bug, I think things are ready to go.

@ToucheSir (Member) commented Jul 4, 2023

I would make sure the case with non-zero beta isn't a correctness issue. We have some handling of that for single conv calls, but some of the fast paths (e.g. FluxML/NNlibCUDA.jl#53) may no longer hold if one chains a bunch of them with delegated arguments like this PR does.
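
A self-contained sketch of the hazard (illustrative, not this PR's code): naively forwarding the caller's beta to every chained in-place sub-call applies it more than once.

using NNlib

alpha, beta = 1.0, 2.0
x1 = ones(4, 1, 1); x2 = ones(4, 1, 1)
w = ones(2, 1, 1)
y = ones(3, 1, 1)                          # pre-existing output contents

# Correct semantics: the old contents of y are scaled by beta exactly once.
want = alpha .* (conv(x1, w) .+ conv(x2, w)) .+ beta .* y   # fill(6.0, 3, 1, 1)

# Naive delegation: each call rescales the running result by beta again.
y .= alpha .* conv(x1, w) .+ beta .* y     # fill(4.0, 3, 1, 1)
y .= alpha .* conv(x2, w) .+ beta .* y     # fill(10.0, 3, 1, 1), not equal to want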

@nikopj (Contributor, author) commented Jul 4, 2023

OK, I think the beta bug is an error in the CPU version. Here's an MWE for the mixed real-complex ∇conv_filter! with non-zero beta:

using NNlib, CUDA, cuDNN

for T=(Float64, ComplexF64), beta=(0,1), flip=(false, true)
    @show T, beta, flip
    x_cpu = fill(T(1), 2, 1, 1)
    w_cpu = T.([1; -1;;;])
    x_gpu = CuArray(x_cpu)
    w_gpu = CuArray(w_cpu)
    cdims = NNlib.DenseConvDims(x_cpu, w_cpu; flipkernel=flip)
    y_cpu = fill(T(1), 1, 1, 1)
    y_gpu = CuArray(y_cpu)

    w_cpu_2 = NNlib.∇conv_filter!(copy(w_cpu), real(x_cpu), y_cpu, cdims, alpha=T(1), beta=T(beta))
    w_gpu_2 = NNlib.∇conv_filter!(copy(w_gpu), real(x_gpu), y_gpu, cdims, alpha=T(1), beta=T(beta))
    @show w_cpu_2
    @show w_gpu_2
    @show w_cpu_2 ≈ Array(w_gpu_2)
end

Output:

(T, beta, flip) = (Float64, 0, false)
w_cpu_2 = [1.0; 1.0;;;]
w_gpu_2 = [1.0; 1.0;;;]
w_cpu_2 ≈ Array(w_gpu_2) = true
(T, beta, flip) = (Float64, 0, true)
w_cpu_2 = [1.0; 1.0;;;]
w_gpu_2 = [1.0; 1.0;;;]
w_cpu_2 ≈ Array(w_gpu_2) = true
(T, beta, flip) = (Float64, 1, false)
w_cpu_2 = [2.0; 0.0;;;]
w_gpu_2 = [2.0; 0.0;;;]
w_cpu_2 ≈ Array(w_gpu_2) = true
(T, beta, flip) = (Float64, 1, true)
w_cpu_2 = [2.0; 0.0;;;]
w_gpu_2 = [2.0; 0.0;;;]
w_cpu_2 ≈ Array(w_gpu_2) = true
(T, beta, flip) = (ComplexF64, 0, false)
w_cpu_2 = [1.0 + 0.0im; 1.0 + 0.0im;;;]
w_gpu_2 = [1.0 + 0.0im; 1.0 + 0.0im;;;]
w_cpu_2 ≈ Array(w_gpu_2) = true
(T, beta, flip) = (ComplexF64, 0, true)
w_cpu_2 = [1.0 + 0.0im; 1.0 + 0.0im;;;]
w_gpu_2 = [1.0 + 0.0im; 1.0 + 0.0im;;;]
w_cpu_2 ≈ Array(w_gpu_2) = true
(T, beta, flip) = (ComplexF64, 1, false)
w_cpu_2 = [2.0 + 0.0im; 0.0 + 0.0im;;;]
w_gpu_2 = [2.0 + 0.0im; 0.0 + 0.0im;;;]
w_cpu_2 ≈ Array(w_gpu_2) = true
(T, beta, flip) = (ComplexF64, 1, true)
w_cpu_2 = [0.0 + 0.0im; 2.0 + 0.0im;;;]
w_gpu_2 = [2.0 + 0.0im; 0.0 + 0.0im;;;]
w_cpu_2 ≈ Array(w_gpu_2) = false

So only the last case fails, with (T=ComplexF64, beta=1, flipkernel=true). As all of the arguments have zero imaginary part, we should expect the answer to be the same as for T=Float64, which points to the CPU version being incorrect.
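
For a quick hand check (assuming the in-place convention dw .= alpha .* ∂filter .+ beta .* dw for ∇conv_filter!): with x = [1; 1] and dy = [1], both entries of the filter gradient are x[k] * dy[1] = 1, so ∂filter = [1; 1] for either flipkernel setting (flipping an all-ones gradient changes nothing). With alpha = beta = 1 that gives [1; -1] + [1; 1] = [2; 0], which matches the GPU result above.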

Edit

I investigated this further in #518; the solution is in #519. This PR should be good to go once that is merged.

@ToucheSir (Member) left a comment


I noticed that we already have GPU-related notes in https://github.com/FluxML/NNlib.jl/blob/v0.9.3/docs/src/reference.md?plain=1#L79. It would be good to add what @CarloLucibello mentioned there too. Otherwise this LGTM.

@ToucheSir merged commit 6824c6d into FluxML:master on Jul 15, 2023
@ToucheSir (Member) commented

It's great to have this feature now, thanks!
