-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use real FFTs #36
Comments
For bonus points, optimize on the kpoints that are real (there's gamma but also some others; see https://docs.abinit.org/theory/wavefunctions/ for a nice writeup) and store wavefunctions in packed mode. That should be relatively easy since the only place we actually use them is in the G_to_r functions: it's just that the basis field of the Kpoint struct only contains half the G vectors. NB: to get a unitary representation (orthogonal basis), a specific treatment is needed: each G component gets a factor sqrt(2) except for the DC one. |
About the dispatch: I think it's fair to only do this if the in-place version of the FFTs is used or if an appropriate kwarg is used. |
Related to #9 |
So I've been thinking about this, and the main difficulty is the in-place. First, I've been doing some more serious benchmarking on this: https://gist.github.com/antoine-levitt/3495f47c99676f26824df1fbe9462040. So if I'm interpreting this right, a reasonable mental model is that an alloc, a fill and a copy take about the same time. That unit of time is about 1/8th of an FFT. (some of that might be due to julia issue 130. So allocations do matter, unfortunately (especially when going to eg threading). The main problem is that we use in-place ffts, which is a lot more tricky with real FFTs, essentially because the input and output don't have the same size (they also don't have the same type, but we can work around that with So the simplest option is to just drop the in-place FFTs entirely. Does that hurt us very much? In my tests I see a performance difference of about 10% in favour of in-place. So acceptable for the sake of a much cleaner interface. So, roadmap: 1 loop and thread at the level of Hamiltonian application (rather than FFTs), so FFT functions only ever have to deal with one wavefunction at a time. Ie do 2 do a barebones implementation with out-of-place FFTs keeping the data format for the Psi the same 3 benchmark that implementation. Maybe we need to play some tricks like reuse temporary buffers; we can do that with thread-local buffers (see https://julialang.org/blog/2019/07/multithreading paragraph "thread-local state" for an example; there is also the convenient 4 switch to a compressed representation for the reciprocal-space representation of the psi. That involves annoying factors of sqrt(2) in both the psi and the nonlocal terms, but we can maybe abstract that away, with something like a |
So I've started on this, and hit an unexpected fun thing. I was expecting for the zero kpoint to store only half the G vectors, and have a basis like
in QE (important functions have two versions, one for gamma and the others for the other kpoints) and
in ABINIT (complex numbers are not used for the storage of the wavefunctions, and npw gets divided by 2 in the case when k is the gamma point). Dynamic languages in general and julia in particular really shine here. We just have to be careful to type Psi as a Vector{Any}, so that Psi[1] can be a matrix of reals and Psi[2] a matrix of complex. |
Done in #722. |
Add an option to perform real-to-complex and complex-to-real FFTs with a kwarg, and use that to do the potentials as real arrays. Could also do dispatch on the eltype of f_real, but it's not compatible with G_to_r(f_fourier).
The text was updated successfully, but these errors were encountered: