16-bit FFT #31

Closed
milankl opened this issue Jan 7, 2022 · 18 comments
Labels: low precision 🚂 Low precision arithmetics, rounding, 32 or 16-bit.
@milankl
Member

milankl commented Jan 7, 2022

We currently use FFTW.jl, the Julia bindings to the C library of the same name. This library implements functions like fft, ifft, rfft, irfft and plan_fft for Float32 and Float64, but there's currently no 16-bit FFT for Float16. Sure, we can always promote Float16 to Float32 and round back down after the (inverse) Fourier transform, but we should look out for 16-bit FFT implementations.
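The promote-and-round workaround is only a couple of lines with FFTW.jl; a minimal sketch with hypothetical helper names, not what's in the code:

using FFTW

# Float16 "rfft" by promoting to Float32 and rounding the result back down.
rfft16(x::AbstractVector{Float16}) = ComplexF16.(rfft(Float32.(x)))
irfft16(X::AbstractVector{ComplexF16}, n::Integer) = Float16.(irfft(ComplexF32.(X), n))

x = rand(Float16, 96)
X = rfft16(x)                # forward transform, computed in Float32
x2 = irfft16(X, length(x))   # inverse transform, rounded back to Float16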

I once started the pure-Julia FFT package coolFFT.jl, which is just a very simple implementation of the Cooley-Tukey algorithm and therefore only works with arrays of length 2^n, but we could use a polished version of that as a fallback for arbitrary number formats?
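For context, radix-2 Cooley-Tukey is only a handful of lines, which is why it generalises to any (complex) element type but only to lengths 2^n; a rough sketch (not the actual coolFFT.jl code):

# Recursive radix-2 Cooley-Tukey DFT: works for any complex element type,
# but only for input lengths that are a power of two.
function ct_fft(x::AbstractVector{<:Complex})
    N = length(x)
    N == 1 && return copy(x)
    @assert ispow2(N) "radix-2 Cooley-Tukey needs length 2^n"
    even = ct_fft(x[1:2:end])                    # DFT of even-indexed samples
    odd  = ct_fft(x[2:2:end])                    # DFT of odd-indexed samples
    T = real(eltype(x))
    w = [cis(-T(2π) * k / N) for k in 0:N÷2-1]   # twiddle factors in precision T
    return vcat(even .+ w .* odd, even .- w .* odd)
end

ct_fft(complex.(rand(Float16, 16)))   # 16-element Vector{ComplexF16}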

@samhatfield

@samhatfield
Contributor

There could be a performance penalty from using a home-baked FFT instead of FFTW, but I don't think this will be significant for our purposes. Which parameter would be constrained if we went for a 2^n-only FFT? The number of longitudes must be 2^n?

@milankl
Member Author

milankl commented Jan 10, 2022

Yes, so we could go for nlon=128, nlat=64 and T42 for example, or T84 with 256x128 should be possible too. But performance is an issue, and I also don't have any experience writing an rfft/irfft plan even for 2^n lengths (whereas FFTW provides plans for arbitrary lengths).
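For the Float32 route at least, FFTW.jl hands us the plans ready-made for any length; a minimal sketch (made-up sizes):

using FFTW

nlon  = 96                           # arbitrary lengths are fine for FFTW
x     = rand(Float32, nlon)
p_fwd = plan_rfft(x)                 # real-to-complex forward plan
p_inv = plan_irfft(p_fwd * x, nlon)  # inverse plan needs the original length
X  = p_fwd * x                       # plans are applied with *
x2 = p_inv * X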

I'd like to use FFTW.jl as much as possible, but I don't know how else to investigate a 16-bit Fourier transform. How did you and Mat do that in previous work?

@samhatfield
Contributor

We didn't 😂 We only looked at the Legendre transform!

@samhatfield
Contributor

This all sounds familiar... is there no native implementation of FFTW in Julia?

@milankl
Member Author

milankl commented Jan 10, 2022

The author of FFTW is a Julia guy, so probably has no interest in reinventing the wheel... Instead he wrote a really comprehensive FFTW.jl wrapper! I'll probably chat with him at some point about possibilities towards a 16-bit FFT, but for the time being I guess Float32 FFTW.jl it is, and any other format will be converted to Float32 and back afterwards...

@samhatfield
Contributor

Sounds good.

@jakebolewski

jakebolewski commented Feb 1, 2022

There is a generic FFT implementation in FastTransforms that should work (for powers of 2): https://github.com/JuliaApproximation/FastTransforms.jl/blob/master/src/fftBigFloat.jl

Steven Johnson also had a PR a long time ago for a pure-Julia DFT implementation based on FFTW, which could be revived as a starting point: https://github.com/JuliaLang/julia/pull/6193/files

@milankl
Member Author

milankl commented Feb 1, 2022

Thanks @jakebolewski for joining the discussion. It took me a while to understand that FastTransforms.jl also exports fft from FFTW.jl, but what we would probably want is FastTransforms.generic_fft, which seems to work with power-of-two as well as non-power-of-two-sized arrays:

julia> using FastTransforms
julia> a = rand(Float16,12);
julia> FastTransforms.generic_fft(a)
12-element Vector{ComplexF16}:
     Float16(6.87) - Float16(0.0006714)im
...

However, I can't seem to get the rfft working?

julia> FastTransforms.generic_rfft(a)
ERROR: MethodError: no method matching generic_rfft(::Vector{Float16})
...

Maybe a good starting point in any case!!
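(If generic_rfft stays awkward, one could also just take the non-redundant half of the full transform, since the spectrum of real input is conjugate-symmetric; a sketch with a hypothetical helper name:)

# rfft-like output from the full complex FFT: for real input of length N,
# only the first N÷2+1 coefficients are independent.
generic_rfft_fallback(x::AbstractVector{<:Real}) =
    FastTransforms.generic_fft(x)[1:length(x)÷2+1]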

@jakebolewski

jakebolewski commented Feb 1, 2022

It looks like you have to manually specify a region, which seems like an oversight.
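Presumably something like this then (untested, I haven't checked the exact signature):

julia> FastTransforms.generic_rfft(a, 1)   # 1 = region/dimension to transform along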

@milankl milankl added the low precision 🚂 Low precision arithmetics, rounding, 32 or 16-bit. label Apr 5, 2022
@milankl
Member Author

milankl commented Apr 21, 2022

@maximilian-gelbrecht shared these links for 16-bit FFTs with CUDA (cuFFT) on GPUs.

https://docs.nvidia.com/cuda/cufft/index.html
https://docs.nvidia.com/cuda/cufft/index.html#half-precision-transforms

  • requires a GPU, but is an alternative to the FFTW library we currently use
  • restricted to powers of two
  • but supports Float16 and BFloat16 (see the sketch below)
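A rough sketch of that route via CUDA.jl's CUFFT wrappers, shown in Float32 since I haven't checked whether the half-precision transforms are exposed at this level:

using CUDA, AbstractFFTs

nlon = 128                     # power of two, as cuFFT's half-precision mode requires
x = CUDA.rand(Float32, nlon)   # would be Float16/BFloat16 in the half-precision case
p = plan_rfft(x)               # plan, analogous to FFTW
X = p * x                      # forward transform executed on the GPU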

@giordano
Contributor

There is FourierTransforms.jl by @eschnett, but last time I tried it, its performance was very far from FFTW's.

@milankl
Member Author

milankl commented Apr 29, 2022

Amazing, thanks Mosè for sharing. For the time being we'll use FFTW.jl, but it's great to know that for non-Float32/64 we could fall back to FourierTransforms.jl! I hadn't come across it yet.
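That's essentially the dispatch I have in mind, something like this hypothetical helper (assuming FourierTransforms.jl offers a generic fft, I haven't checked its exact API):

import FFTW, FourierTransforms

# FFTW for the element types it supports, a pure-Julia fallback for everything else.
fourier(x::AbstractVector{<:Union{Float32, Float64}}) = FFTW.rfft(x)
fourier(x::AbstractVector{<:AbstractFloat}) =
    FourierTransforms.fft(complex.(x))[1:length(x)÷2+1]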

@eschnett

There is another, older, non-registered package that also implements fast Fourier transforms in Julia. (I forgot the name.) It's not maintained, not registered, and I don't know whether it still works, but it has a much more detailed set of algorithms implemented. It would be worthwhile brushing it up.

@giordano
Contributor

@eschnett

Yes, I think so.

@giordano
Contributor

It's mentioned in the README of your package 😛

@eschnett

Yay me!

@milankl milankl added this to the v0.4 milestone Sep 2, 2022
@milankl milankl modified the milestones: v0.4, v0.5 Oct 13, 2022
@milankl milankl modified the milestones: v0.5, v0.6 May 13, 2023
@milankl milankl modified the milestones: v0.6, v0.7 Aug 30, 2023
@milankl
Member Author

milankl commented Sep 1, 2023

I cannot remember when I addressed this problem, but this actually works now:

julia> alms
6×6 LowerTriangularMatrix{ComplexF16}:
 0.03186+0.0im      0.0+0.0im         0.0+0.0im            0.0+0.0im      0.0+0.0im
 -0.8555+0.0im     1.99-1.036im       0.0+0.0im             0.0+0.0im      0.0+0.0im
 -0.6074+0.0im  -0.1892+0.0996im  0.05646-0.5024im          0.0+0.0im      0.0+0.0im
 -0.1587+0.0im    1.364+0.1625im    0.471-0.4744im          0.0+0.0im      0.0+0.0im
 -0.1787+0.0im   -0.506+0.705im    -1.033+0.04752im       0.739+0.5156im   0.0+0.0im
 -0.2646+0.0im    2.184-1.137im    0.7017-0.6562im     0.05887-0.5557im  0.83-0.754im

julia> map = gridded(alms);

julia> alms2 = spectral(map)
6×6 LowerTriangularMatrix{ComplexF16}:
 0.03125+0.0im      0.0+0.0im          0.0+0.0im            0.0+0.0im     0.0+0.0im
  -0.857+0.0im    1.988-1.037im        0.0+0.0im             0.0+0.0im     0.0+0.0im
  -0.608+0.0im  -0.1891+0.09937im  0.05713-0.5024im          0.0+0.0im     0.0+0.0im
 -0.1602+0.0im    1.367+0.1602im    0.4707-0.4739im          0.0+0.0im     0.0+0.0im
 -0.1797+0.0im   -0.505+0.706im     -1.032+0.04785im      0.7393+0.516im   0.0+0.0im
  -0.266+0.0im    2.184-1.139im     0.7017-0.6562im     0.05902-0.555im  0.83-0.7544im

julia> alms2 ≈ alms
true

closing this therefore! 🥳

@milankl milankl closed this as completed Sep 1, 2023