Skip to content
This repository has been archived by the owner on Nov 18, 2020. It is now read-only.

Implement FFTs #4

Closed
jpsamaroo opened this issue Apr 12, 2019 · 9 comments · Fixed by #17
Closed

Implement FFTs #4

jpsamaroo opened this issue Apr 12, 2019 · 9 comments · Fixed by #17
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@jpsamaroo
Copy link
Member

https://github.com/ROCmSoftwarePlatform/rocFFT

@jpsamaroo jpsamaroo added the help wanted Extra attention is needed label May 16, 2019
@antholzer
Copy link
Contributor

I started working on this. https://github.com/antholzer/ROCArrays.jl/tree/rocfft-basics

1D forward is working (inverse has wrong scaling factor)

I still need to implement region selection (i.e. batched transformation) and 2D/3D.
I will try to get this implemented and then open a pull request.

@jpsamaroo
Copy link
Member Author

Awesome! I see you built on my rocBLAS PR, which seems like a good call. Let me know if you need any help; I'm pretty short on free time these days, but I'd like to do whatever I can to help get your work merged and make this package finally usable 😄

@antholzer
Copy link
Contributor

Regions are still missing and 3D out of place does not seem to do anything.

However I have a problem with synchronization (in C++ they use hipDeviceSynchronize()).
Do you know how we can do this synchronization (or force it when copying to CPU array)?

i.e. using the following test function the test will only pass for verbose=true

function out_of_place(X::AbstractArray{T,N}, verbose=false) where {T <: Complex,N}
    fftw_X = fft(X)
    d_X = ROCArray(T, size(X))
    copyto!(d_X, X)
    p = plan_fft(d_X)
    Y = zeros(T, p.osz)
    d_Y = p * d_X;
    verbose && println(d_Y[1])
    copyto!(Y, d_Y);
    @test isapprox(Y, fftw_X, rtol = MYRTOL, atol = MYATOL)
end

@jpsamaroo
Copy link
Member Author

We should see what that HIP call does under the covers and just do that, with a nicer name of course (like sync(agent::Agent)). Opened an issue https://github.com/JuliaGPU/AMDGPUnative.jl/issues/19, I'll investigate if I can make some time this week.

@jpsamaroo
Copy link
Member Author

That device sync functionality could potentially be less trivial than I'm expecting (I couldn't figure out what HSA calls are needed from a first pass through HIP's source code), so I need to ask: is this functionality a huge blocker for your FFT functionality specifically, or do you foresee it as an overall issue for all ROCm library calls? I want to be sure you're not blocked on me (or someone) getting that functionality implemented 🙂

@antholzer
Copy link
Contributor

I do not think that it a blocker for an initial implementation, since I can work around it like above.

I'm still having some trouble with batched ffts (only 1D batches working) and with rfft, but hopefully I can fix most of it soon ;)
Once I'm done with that I can also try to look into sync functionality, not sure if it is needed for any of the other libraries.

@jpsamaroo jpsamaroo added the enhancement New feature or request label Nov 20, 2019
@antholzer
Copy link
Contributor

Current status here

Test Summary:                  | Pass  Fail  Total
ROCArrays                      |  111    48    159
  ROCArrays External Libraries |  111    48    159
    Build Information          |    1            1
    Highlevel                  |    2            2
    Level 1 BLAS               |   12           12
    Level 2 BLAS               |    2            2
    FFT                        |   94    48    142
      T = Complex{Float64}     |   30     7     37
        1D                     |    3            3
        1D inplace             |    2            2
        2D                     |    3            3
        2D inplace             |    2            2
        Batch 1D               |    6            6
        3D                     |          3      3
        3D inplace             |    2            2
        Batch 2D (in 3D)       |    5     2      7
        Batch 2D (in 4D)       |    7     2      9
      T = Complex{Float32}     |   30     7     37
        1D                     |    3            3
        1D inplace             |    2            2
        2D                     |    3            3
        2D inplace             |    2            2
        Batch 1D               |    6            6
        3D                     |          3      3
        3D inplace             |    2            2
        Batch 2D (in 3D)       |    5     2      7
        Batch 2D (in 4D)       |    7     2      9
      T = Float32              |   19    15     34
        1D                     |    4            4
        2D                     |    2     2      4
        Batch 1D               |    3     3      6
        3D                     |    2     2      4
        Batch 2D (in 3D)       |    3     4      7
        Batch 2D (in 4D)       |    5     4      9
      T = Float64              |   15    19     34
        1D                     |    4            4
        2D                     |    2     2      4
        Batch 1D               |    3     3      6
        3D                     |    1     3      4
        Batch 2D (in 3D)       |    2     5      7
        Batch 2D (in 4D)       |    3     6      9

where 3D Complex is broken: ROCm/rocFFT#270 and I am not sure if (inverse) rfft for 2/3D was really fixed: ROCm/rocFFT#128
Not sure why complex batched is not working, since the parameters should be correct.

@jpsamaroo
Copy link
Member Author

For the closed issue, can you put together a reproducer in C++, and then post that in the issue?

For the open issue, we'll just have to wait on a fix (but at least it's assigned to someone).

Aside from those, awesome work! Would you like to open a PR to gain some visibility for your work? It might make it easier for people to help with any test failures you can't figure out how to solve.

@antholzer
Copy link
Contributor

I created a PR and also posted in the rocFFT issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants